Spectral Libraries and Databases from the Ocean Protein Portal

The proteins discovered in the Ocean Protein Portal (OPP) can serve as a valuable resource for future metaproteomic analyses. To create this community resource, the OPP data have been converted into spectral libraries using machine learning spectrum prediction, as shown in the flowchart. Briefly, the full open reading frame (ORF) amino acid sequences of the discovered peptides were passed through a machine learning algorithm to generate predicted spectra. These predicted spectra were then compiled into one spectral library file (DLIB file) per geographic region, which can be used in Data Independent Acquisition (DIA) experiments (Brisbin et al., in prep). The machine learning algorithm was trained on the human proteome dataset. Building the libraries from predicted spectra, rather than directly from the spectra of the discovered peptides, avoids propagating false discoveries from the original Data Dependent Acquisition (DDA) analyses into the spectral library.
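Spectrum prediction models generally work on peptide sequences rather than whole proteins, so full ORF sequences are usually digested in silico before prediction. The text above does not state the digestion rules used for the OPP libraries. The sketch below assumes fully tryptic cleavage (after K or R, not before P) with no missed cleavages and an illustrative length window; `tryptic_digest` and `peptides_for_library` are hypothetical names, and the prediction step itself is not shown.

```python
import re
from typing import Dict, List


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 30) -> List[str]:
    """Cleave a protein after K or R unless the next residue is P.

    Assumption (not the documented OPP setting): fully tryptic peptides,
    no missed cleavages, and an illustrative length window.
    """
    # Zero-width split: positions preceded by K/R and not followed by P.
    # (Splitting on empty matches requires Python 3.7+.)
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    # Drop the empty trailing piece and peptides outside the length window.
    return [p for p in pieces if min_len <= len(p) <= max_len]


def peptides_for_library(orfs: Dict[str, str]) -> List[str]:
    """Collect unique peptides from ORF sequences (hypothetical helper).

    Each peptide would then be passed to a spectrum predictor; peptides
    shared by several ORFs are kept only once.
    """
    unique = set()
    for sequence in orfs.values():
        unique.update(tryptic_digest(sequence))
    return sorted(unique)


if __name__ == "__main__":
    # The K before P is not a cleavage site, so the middle peptide stays joined.
    print(tryptic_digest("MKAAAAAAAKPLLLLLLLRGGGGGGGK", min_len=1))
```

Deduplicating before prediction means each peptide is predicted once, even when it occurs in several ORFs.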

Geographic Region       Data Source*            Spectral Library Access   Number of Peptides in Library
Central Pacific Ocean   KM1128 (3.0-51 µm)      DLIB                      350,429
Central Pacific Ocean   KM1128 (0.2-3.0 µm)     DLIB                      198,789
Central Pacific Ocean   FK160115 (0.2-3.0 µm)   DLIB                      1,240,875
South Atlantic Ocean    KN192                   DLIB                      2,800
Atlantic Ocean          KN210-04                DLIB                      1,176,560
Ross Sea                NBP0601 Net Tow         DLIB                      43,968
Arctic and Bering Sea   HLY1301                 DLIB                      190,486
Canada Basin            JOIS2015                DLIB                      179,449
North Atlantic Ocean    AE1913                  DLIB                      1,107,679
All Regions             All Datasets            DLIB                      3,451,405

*See the About Datasets tab for further information about each dataset.
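The All Regions library (3,451,405 peptides) is smaller than the sum of the regional libraries (4,491,035), which is consistent with peptides shared between regions being counted once. The sketch below counts distinct peptides in one or more downloaded DLIB files. It assumes that a DLIB is an SQLite database with an `entries` table containing a peptide-sequence column (called `PeptideSeq` here). This schema is an assumption to verify with `.schema` in the `sqlite3` shell, and the function names are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Iterable, Set


def distinct_peptides(dlib_path: str, column: str = "PeptideSeq") -> Set[str]:
    """Return the distinct peptide sequences stored in one DLIB file.

    Assumption: the file is SQLite with an `entries` table that has a
    peptide-sequence column named `column`; check the schema first.
    """
    # Open read-only via an SQLite URI so the library cannot be modified.
    uri = Path(dlib_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(f'SELECT DISTINCT "{column}" FROM entries')
        return {row[0] for row in rows}
    finally:
        con.close()


def union_peptide_count(dlib_paths: Iterable[str]) -> int:
    """Count peptides across several libraries, counting shared peptides once."""
    combined: Set[str] = set()
    for path in dlib_paths:
        combined |= distinct_peptides(path)
    return len(combined)
```

Taking the union of sets, rather than adding per-file counts, mirrors how a combined library would report each shared peptide only once.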