The proteins discovered in the Ocean Protein Portal (OPP) can serve as a valuable resource for future metaproteomic analyses. To create this community resource, the OPP data have been converted into spectral libraries using machine learning spectrum prediction, as shown in this flowchart. Briefly, the full open reading frame (ORF) amino acid sequences containing the discovered peptides were passed through a machine learning algorithm to generate predicted spectra. These were then compiled into one spectral library file (DLIB) per geographic region, which can be used in Data Independent Acquisition (DIA) experiments (Brisbin et al., in prep). The machine learning algorithm was trained on the human proteome dataset. Generating libraries from predicted spectra, rather than directly from the discovered peptide spectra, avoids propagating false discoveries from the original Data Dependent Acquisition (DDA) analyses into the spectral library.
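Spectrum predictors of this kind typically operate on peptides rather than whole proteins, so ORF sequences are usually digested in silico before prediction. The Python sketch below shows one common way to derive candidate peptides from a source FASTA file. The trypsin rule (cleave after K or R unless followed by P), the missed-cleavage limit, and the 7 to 30 residue length window are assumptions chosen for illustration; the portal's actual digestion parameters and prediction model are not specified here, and all function names are hypothetical.

```python
import re

# Assumed trypsin rule: cut after K or R unless the next residue is P.
# The zero-width pattern marks cut points; re.split on empty matches needs Python 3.7+.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}, keyed by the first word of each header."""
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def tryptic_peptides(seq, missed_cleavages=1, min_len=7, max_len=30):
    """Return the set of tryptic peptides of one ORF translation (hypothetical helper)."""
    # Drop a trailing stop-codon symbol that ORF translations often carry.
    pieces = [p for p in TRYPSIN_SITE.split(seq.upper().rstrip("*")) if p]
    peptides = set()
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` following pieces to model missed cleavages.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            pep = "".join(pieces[i : j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return peptides


def library_peptides(fasta_text, **kwargs):
    """Unique peptides across all ORFs in a FASTA file: candidate inputs for spectrum prediction."""
    unique = set()
    for seq in read_fasta(fasta_text).values():
        unique |= tryptic_peptides(seq, **kwargs)
    return unique
```

Deduplicating across ORFs mirrors the idea of one library per region, but the peptide counts in the table below were not necessarily produced this way.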
Geographic Region | Data Source* | Spectral Library Access | Number of Peptides in Library | Source FASTA file |
---|---|---|---|---|
Central Pacific Ocean | KM1128 3.0-51 µm | DLIB | 350429 | FASTA |
Central Pacific Ocean | KM1128 0.2-3.0 µm | DLIB | 198789 | FASTA |
Central Pacific Ocean | FK160115 0.2-3.0 µm | DLIB | 1240875 | FASTA |
South Atlantic Ocean | KN192 | DLIB | 2800 | FASTA |
Atlantic Ocean | KN210-04 | DLIB | 1176560 | FASTA |
Ross Sea | NBP0601 Net Tow | DLIB | 43968 | FASTA |
Arctic and Bering Sea | HLY1301 | DLIB | 190486 | FASTA |
Canada Basin | JOIS2015 | DLIB | 179449 | FASTA |
North Atlantic Ocean | AE1913 | DLIB | 1107679 | FASTA |
All Regions | All Datasets | DLIB | 3451405 | FASTA |
*See the About Datasets tab for further information about each dataset.
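A downloaded library can be inspected directly to compare it with the peptide counts above. DLIB is the SQLite-based spectral library format used by the EncyclopeDIA DIA search tool; the sketch below assumes the OPP files follow that layout, with one row per predicted spectrum in an `entries` table that has `PeptideSeq` (unmodified) and `PeptideModSeq` (modified) columns. Because this page does not say whether "Number of Peptides in Library" counts spectra, modified peptides, or unmodified sequences, the hypothetical `summarize_dlib` function reports all three.

```python
import os
import sqlite3


def summarize_dlib(dlib_path):
    """Count rows and unique peptides in a DLIB spectral library (hypothetical helper).

    Assumes an EncyclopeDIA-style SQLite layout with an `entries` table
    containing `PeptideSeq` and `PeptideModSeq` columns.
    """
    if not os.path.exists(dlib_path):
        # sqlite3.connect would otherwise silently create an empty database.
        raise FileNotFoundError(dlib_path)
    con = sqlite3.connect(dlib_path)
    try:
        def scalar(sql):
            # Each query returns a single row with a single count.
            return con.execute(sql).fetchone()[0]

        return {
            "entries": scalar("SELECT COUNT(*) FROM entries"),
            "modified_peptides": scalar("SELECT COUNT(DISTINCT PeptideModSeq) FROM entries"),
            "peptide_sequences": scalar("SELECT COUNT(DISTINCT PeptideSeq) FROM entries"),
        }
    finally:
        con.close()
```

If the real files use a different schema, the table and column names in the queries are the only parts that need to change.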