About The Ocean Protein Portal

The Ocean Protein Portal (OPP) is a data sharing platform for ocean metaproteomics data. A workflow diagram is provided here: OceanPortalWorkflow.pdf. A video tutorial can be found here.

The manuscript describing the Ocean Protein Portal and its use in research and education is available at https://pubs.acs.org/doi/full/10.1021/acs.jproteome.0c00382. A review publication describing the best practices for data sharing of ocean metaproteomic data that guided this portal’s development is available at https://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.8b00761. If you use Ocean Protein Portal data in a publication, please cite: Saito, M.A., Saunders, J.K., Chagnon, M., Gaylord, D.A., Shepherd, A., Held, N.A., Dupont, C., Symmonds, N., York, A., Charron, M. and Kinkade, D.B., 2020. Development of an Ocean Protein Portal for Interactive Discovery and Education. Journal of Proteome Research, 20(1), 326-336.

Introduction to Proteins

Proteins comprise roughly half of the mass in organisms and serve a variety of critical cellular functions. Enzymes are one category of proteins that catalyze chemical reactions in cellular metabolism. A number of microbial enzymatic reactions are fundamental to the global biogeochemical cycles of chemical elements and molecules, and hence learning about their distribution can inform our understanding of biogeochemistry. Proteins can also be transporters of key nutritional or toxic elements, as well as serving structural and regulatory functions. In this regard, the measurement of proteins in natural environments containing a multitude of organisms, also called metaproteomics, has become a valuable oceanographic methodology.

Introduction to the Ocean Protein Portal (OPP): Where is my protein in the Oceans?

The OPP is intended to allow those interested in a particular protein or function to explore where their protein of interest exists in the ocean. It provides a suite of search capabilities and filters to interrogate metaproteomic datasets that have been deposited into the OPP. For example, proteins can be discovered by their functional name (product name), KEGG, PFam, or Enzyme Commission number information (Boolean search terms enabled). Alternatively, sequence alignment searches of full or partial amino acid protein sequences can be conducted from the drop-down menu using peptide or DIAMOND searches. For peptide searches, sequences are digested in silico into tryptic peptides, and exact matches between those tryptic peptides and peptides discovered by mass spectrometry are returned. Text-based searches can be used to find metal-related proteins (search the word iron or nickel), important enzymes (Rubisco, superoxide dismutase), or specific Enzyme Commission numbers within the protein annotation (searching 1.15.1.1 in product name will return nickel superoxide dismutase).
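The peptide search described above can be illustrated with a minimal Python sketch. It is not the OPP's implementation: the cleavage rule (cut after K or R unless the next residue is P) is the commonly used simplified trypsin rule, and the function names, the minimum peptide length, and the example sequence are all assumptions made for illustration.

```python
import re


def tryptic_digest(sequence, min_length=6):
    """Digest a protein sequence in silico into tryptic peptides.

    Assumed rule: cleave after K or R, except when the next residue is P.
    min_length is a hypothetical cutoff, not a documented OPP setting.
    """
    seq = sequence.strip().upper()
    # Zero-width split (Python 3.7+): position follows K/R and does not precede P.
    peptides = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in peptides if len(p) >= min_length]


def match_observed(query_sequence, observed_peptides, min_length=6):
    """Return the query's tryptic peptides that exactly match observed peptides."""
    observed = set(observed_peptides)
    return [p for p in tryptic_digest(query_sequence, min_length) if p in observed]


# Hypothetical query sequence and hypothetical set of observed peptides.
query = "MSTLLEAVKGDTWIQRPELLSAYKNE"
hits = match_observed(query, {"MSTLLEAVK", "GDTWIQR"})
print(hits)
```

In this sketch "GDTWIQR" is not reported as a match, because the R is followed by P and the assumed rule does not cleave there; only exact, complete tryptic peptides count as matches.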

Introduction to the METATRYP Least Common Ancestor Search: Who makes my Protein?

Protein sequence information can provide taxonomic information regarding its biological source. This is done by comparing tryptic peptide sequences to those found within the representative microbial genomes, single amplified genomes, and metagenomes in the METATRYP database. A Least Common Ancestor (LCA) for a peptide is determined as being the highest taxonomic branch of an exact shared peptide sequence. A previous pairwise analysis of microbial genomes found that the number of shared tryptic peptides between ocean microbial species is generally less than 5% and often less than 1% of all tryptic peptides within a genome (Saito et al., 2015), allowing significant opportunity for taxonomic analysis within tryptic peptide space if metaproteomic taxonomy tools such as METATRYP are employed. The METATRYP database is searched for shared tryptic peptides by OPP users in real time using an API to an independent METATRYP site. The standalone METATRYP site is available at the METATRYP page.
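As a rough illustration of the LCA idea, the Python sketch below takes the lineages of every genome that contains a given peptide and returns the part of the lineage they all share. This is one common reading of a least common ancestor, not METATRYP's actual code; the function name and the simplified three-rank lineages are assumptions for illustration only.

```python
def least_common_ancestor(lineages):
    """Return the longest root-to-tip prefix shared by all lineages.

    Each lineage is a sequence of taxon names ordered from the broadest
    rank to the most specific one. An empty tuple means nothing is shared.
    """
    lineages = [tuple(lineage) for lineage in lineages]
    if not lineages:
        return ()
    shared = []
    # Walk rank by rank and stop at the first rank where the lineages disagree.
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break
    return tuple(shared)


# Simplified, illustrative lineages (domain, phylum, genus) for genomes
# assumed to contain the same tryptic peptide.
peptide_lineages = [
    ("Bacteria", "Cyanobacteria", "Prochlorococcus"),
    ("Bacteria", "Cyanobacteria", "Synechococcus"),
]
print(least_common_ancestor(peptide_lineages))
```

A peptide found in only one genus would keep its full lineage, while a peptide shared across phyla would resolve only to the domain, which is why rarely shared tryptic peptides are the most taxonomically informative.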

Ocean Datasets Ingested into the Ocean Protein Portal

A list of datasets currently within the OPP is on the ABOUT page.

Submit Data to the Ocean Protein Portal

We welcome ocean metaproteomic data submissions. Data file types, templates, and contact and processing information are available at the data-file-templates GitHub page.

If you are interested in depositing data to the OPP please contact us at oceanproteinportal@whoi.edu. Laboratory culture studies are recommended for deposition at other repositories, as the OPP is not currently scoped/funded to host laboratory studies.

Metaproteomic Data Units

The OPP currently hosts datasets that report each protein's abundance in total spectral counts. The OPP has also been designed with the ability to host a variety of data types that can be accommodated in the future.

Data Use Policies

The current OPP platform is intended to provide insights into where proteins exist in the oceans. The OPP is adopting data use policies similar to those of the GEOTRACES program, where correct attribution and citation are viewed as an important aspect of the data policy. Moreover, the 2018 Workshop participants for Best Practices in Data Sharing (see the review at https://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.8b00761) recommended that users interested in using metaproteomic datasets in publications contact the data generators and consider discussing collaboration. This serves two important purposes. First, given the youth of the metaproteomic data type, there is a danger that non-expert users will misinterpret or misuse data, especially when making cross-dataset comparisons and normalizations. Second, attribution to and collaboration with the data generators will create a valuable incentive to share future datasets in the OPP’s data search and visualization environment, rather than solely depositing data in raw spectra repositories where data reuse is more challenging.

Selection of Datasets using Data and Physical (Membrane) Filters

The OPP has filter options to subselect the datasets to be searched. These include expeditions, dates, geographic locations, depth, and filter size(s). The last criterion allows a selection of the microbial communities that are being searched, with >0.2 micron being a typical filter pore size that captures the entire microbial community. Use of larger filter fractions (>3.0 micron) can select for or against eukaryotic protists (phytoplankton and mixotrophs) and even sinking particles. Most datasets within the OPP are from the 0.2-3.0 micron size fraction; see the ABOUT page for specific dataset information.
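The kind of subselection described above can be sketched in Python as follows. This is only an illustration of filtering sample metadata before a search: the Sample fields, the select_samples function, and the cruise names are hypothetical and do not reflect the OPP's actual data model or API.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    # All field names are hypothetical, chosen for this sketch only.
    expedition: str
    depth_m: float
    pore_min_um: float            # lower bound of the size fraction
    pore_max_um: Optional[float]  # None means no upper prefilter was used


def select_samples(samples, expedition=None, depth_range_m=None, pore_min_um=None):
    """Keep samples that satisfy every filter that was supplied."""
    selected = []
    for s in samples:
        if expedition is not None and s.expedition != expedition:
            continue
        if depth_range_m is not None:
            shallow, deep = depth_range_m
            if not (shallow <= s.depth_m <= deep):
                continue
        if pore_min_um is not None and not math.isclose(s.pore_min_um, pore_min_um):
            continue
        selected.append(s)
    return selected


# Hypothetical sample metadata.
samples = [
    Sample("CruiseA", 50.0, 0.2, 3.0),
    Sample("CruiseA", 500.0, 0.2, 3.0),
    Sample("CruiseB", 50.0, 3.0, None),
]
upper_ocean_small_fraction = select_samples(
    samples, depth_range_m=(0.0, 100.0), pore_min_um=0.2
)
print(upper_ocean_small_fraction)
```

Leaving a filter unset keeps all samples for that criterion, so filters can be combined freely, for example restricting to one expedition while keeping every depth and size fraction.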

(Tryptic) Peptides as a Base Unit

Proteomic identifications are based on the determination of exact molecular weights from mass spectrometry spectra that are then matched to genome or metagenome sequences. Hence the sequence variation within metaproteomics is captured through exact mapping to genomic and metagenomic sequencing and the sequence variation therein. As a result, tryptic peptides (peptides cleaved by the enzyme trypsin prior to mass spectrometry analysis) are the basal unit of information within these metaproteomic datasets. The OPP allows users to examine the peptides attributed to a protein identification and to search them on METATRYP for taxonomic analyses. Users can also conduct an NCBI BLAST-P search of identified proteins of interest from the Protein Data/Sequence popup.

OPP Sustainability

The OPP was designed to have modest operational costs to allow it to be a sustainable data sharing portal. It is the hope of the OPP team that the Ocean Protein Portal will be viewed as a useful resource by the ocean chemistry and biology, biochemical protein science, and educational communities in the coming years. Please support the OPP and provide submissions and feedback, so that this prototype can continue to serve a broad scientific community.

OPP Release Schedule

The Ocean Protein Portal Prototype was released 2/26/19 at a Town Hall at the Aquatic Sciences meeting in Puerto Rico. Prior to that, the OPP Beta Technical version was released in the summer of 2018 (6/7/18) for the EarthCube all hands meeting (DC), the American Society for Mass Spectrometry meeting (San Diego), and the SciPy scientific computing meeting in Austin, Texas. The METATRYP software (2.0) that operates behind the OPP was released as a standalone web interface in February 2018 at the Ocean Sciences Meeting. The METATRYP command line software was released in 2015 (Saito et al., 2015; Proteomics). Feedback and improvements were planned for incorporation in the fall of 2018, along with ocean metaproteomic data ingestion, with a target date for a Scientific Release at the February 24th 2019 Aquatic Sciences Meeting. Version 1.5 was released in 2023 with DIAMOND search and improved backend database architecture and visualizations.

About OPP: Software Infrastructure

The OPP and METATRYP are hosted on virtual machines at WHOI, using Python, ElasticSearch, Django, Javascript, PostGres, OceanMap, Plotly, and Matplotlib. OceanMap attribution Satellite: Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community

Acknowledgements and Support

The OPP was developed with funding from the U.S. National Science Foundation as an EarthCube prototype on two grants: the prototype was funded by an initial NSF EarthCube grant starting in 2016 and a renewal EarthCube grant for expanding functionality. The underlying METATRYP peptide taxonomic software was developed under a grant from the Gordon and Betty Moore Foundation Marine Microbiology Initiative program. The OPP team is a collaboration between the Saito laboratory, the Information Services Application group, and the Biological and Chemical Oceanography Data Management Office, all at the Woods Hole Oceanographic Institution. Consulting services were provided by Kaimika. The OPP development team consists of Mak Saito (PI), Jaci Saunders (Co-PI), Noelle Held, Nick Symmonds, David Gaylord (WHOI IS Applications group), Danie Kinkade (Co-PI, BCO-DMO), Adam Shepherd (Co-PI, BCO-DMO), and Michael Chagnon, Colin Nelson, and Paul Duffy (Kaimika). The efforts of the participants of the Data Sharing Workshop for Ocean Metaproteomics (May 2017) were also instrumental in developing best practices for ocean metaproteomics data sharing. Metagenomic resources for METATRYP were provided by Chris Dupont at the J.C. Venter Institute.

Feedback

Feedback regarding the Ocean Protein Portal is actively encouraged so we may improve its functionality. Please write us at oceanproteinportal@whoi.edu.