File Formats

Spectrum Input

For inference, Cascadia takes spectra in mzML format as input. mzML is a standardized, widely used, machine-readable XML file format developed by the Proteomics Standards Initiative (PSI).

Raw vendor files can be converted to mzML format using the msConvert tool in ProteoWizard. We suggest adding the peakPicking filter at both the MS1 and MS2 levels using the vendor algorithm, and otherwise using the default parameters.
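The conversion step can be scripted. The sketch below is one possible way to do it from Python and is not part of Cascadia: it assumes the msconvert executable is on the PATH, and the filter string "peakPicking vendor msLevel=1-2" is our reading of the recommendation above. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_msconvert_command(raw_file, out_dir="."):
    """Return the argument list for an msConvert run that writes mzML."""
    return [
        "msconvert",
        str(raw_file),
        "--mzML",
        # Vendor peak picking on both MS1 and MS2 scans (assumed filter string).
        "--filter", "peakPicking vendor msLevel=1-2",
        "-o", str(out_dir),
    ]


def convert_to_mzml(raw_file, out_dir="."):
    """Run msConvert and return the expected output path.

    Requires a working ProteoWizard installation; raises
    subprocess.CalledProcessError if msconvert exits with an error.
    """
    subprocess.run(build_msconvert_command(raw_file, out_dir), check=True)
    return Path(out_dir) / (Path(raw_file).stem + ".mzML")
```

Building the argument list separately from running it makes it easy to print and check the command before starting a long conversion.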

Output Format

Cascadia writes its output as a generic tab-delimited text file referred to as ssl (spectrum sequence list). A full description of this file format is available here. The file contains one row for each identification. The columns contain the following information for each prediction:

file

The mzML file containing the identified spectrum

scan

The ScanID for the center of the detection

charge

Charge state of the precursor

sequence

Peptide sequence

score

Cascadia confidence score assigned to this prediction

retention-time

The retention time (in minutes) when this peptide was detected
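Because the output is plain tab-delimited text, it can be read with standard tools. The following is a minimal sketch using Python's csv module. It assumes the file begins with a header row whose names match the columns listed above. The read_ssl function and its min_score parameter are hypothetical helpers, not part of Cascadia, and the scan value is kept as a string because its exact form is not specified here.

```python
import csv
import io


def read_ssl(handle, min_score=None):
    """Parse a tab-delimited .ssl file into a list of dicts with typed values.

    Rows scoring below min_score are dropped when min_score is given.
    """
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        record = {
            "file": row["file"],
            "scan": row["scan"],  # kept as text
            "charge": int(row["charge"]),
            "sequence": row["sequence"],
            "score": float(row["score"]),
            "retention-time": float(row["retention-time"]),  # minutes
        }
        if min_score is None or record["score"] >= min_score:
            records.append(record)
    return records


# Made-up rows for illustration only; real files come from Cascadia.
example = (
    "file\tscan\tcharge\tsequence\tscore\tretention-time\n"
    "run1.mzML\t1204\t2\tPEPTIDEK\t0.93\t12.5\n"
    "run1.mzML\t1310\t3\tSAMPLER\t0.41\t14.2\n"
)
confident = read_ssl(io.StringIO(example), min_score=0.8)
```

For a file on disk, pass an open handle instead, for example `open(path, newline="")`.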

To visualize the results from Cascadia sequencing, this .ssl file can be converted into a spectral library and loaded into Skyline.

.asf format

To train Cascadia, the user must supply a collection of labeled augmented spectra. Augmented spectra are represented in .asf format, a simple extension of the Mascot Generic Format (MGF) for spectra with annotations. Each augmented spectrum starts with a set of annotations, which for Cascadia must contain at least the fields SEQ, CHARGE, and PEPMASS. PEPMASS should be the center m/z of the isolation window, while SEQ and CHARGE should contain the sequence and charge state of the precursor identified in the augmented spectrum.

Following the annotations is a list of tab-delimited 4-tuples, with one row for each peak in the augmented spectrum. The augmented spectrum should contain the peaks from the 2w+1 MS2 scans surrounding the central scan where the peptide is identified (where w is the augmentation width), along with the peaks from the corresponding MS1 scans that fall within the isolation window. Each row contains the m/z and intensity of a single peak, as in an MGF file, followed by the position of the peak's scan relative to the central scan (an integer in the range [-w, w]) and either a 1 or a 2 indicating whether it is an MS1 or MS2 peak.
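The layout described above can be produced with a few lines of Python. The sketch below is a hypothetical writer, not Cascadia's own code: the format_asf name and its arguments are assumptions, the BEGIN IONS and END IONS delimiters follow the MGF convention, and the order of the annotation lines is assumed not to matter.

```python
def format_asf(seq, charge, pepmass, peaks, w, extra=None):
    """Format one augmented spectrum as .asf text.

    peaks is an iterable of (mz, intensity, offset, ms_level) tuples, where
    offset is the scan position relative to the central scan and ms_level
    is 1 or 2. extra holds optional annotations such as TITLE or RT.
    """
    lines = ["BEGIN IONS"]
    for key, value in (extra or {}).items():
        lines.append(f"{key}={value}")
    # The three annotations required for Cascadia training.
    lines += [f"PEPMASS={pepmass}", f"CHARGE={charge}", f"SEQ={seq}"]
    for mz, intensity, offset, ms_level in peaks:
        if not -w <= offset <= w:
            raise ValueError(f"scan offset {offset} outside [-{w}, {w}]")
        if ms_level not in (1, 2):
            raise ValueError(f"MS level must be 1 or 2, got {ms_level}")
        lines.append(f"{mz}\t{intensity}\t{offset}\t{ms_level}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"
```

Checking the offset and MS level while writing catches malformed peaks before they reach training.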

An example augmented spectrum with w=1 is shown below:

BEGIN IONS
TITLE=1
PEPMASS=402.5
CHARGE=1
SCAN=1
RT=0.31070598402
SEQ=PEPTIDEK
185.07986450195312      0.666241404428991       0     2
367.8453063964844       0.4087404090593467      0     2
400.98358154296875      0.8838386047632094      0     2
401.1849060058594       0.4504917813994659      0     2
401.2586669921875       0.42375381912879023     0     2
401.9834289550781       0.6479885582525154      0     2
418.99261474609375      0.6245163448036832      0     2
606.7568359378328       0.41198908195979644     0     2
606.756835937573        0.41198908195979644     0     2
843.2614135742188       0.41591027114618606     0     2
946.1708374023438       0.430206516710194       0     2
185.07986450195312      0.666241404428991       1     2
367.8453063964844       0.4087404090593467      1     2
400.98358154296875      0.8838386047632094      1     2
401.1849060058594       0.4504917813994659      1     2
401.2586669921875       0.42375381912879023     1     2
401.9834289550781       0.6479885582525154      1     2
418.99261474609375      0.6245163448036832      1     2
185.07986450195312      0.666241404428991       -1    2
367.8453063964844       0.4087404090593467      -1    2
400.98358154296875      0.8838386047632094      -1    2
401.1849060058594       0.4504917813994659      -1    2
401.2586669921875       0.42375381912879023     -1    2
401.9834289550781       0.6479885582525154      -1    2
418.99261474609375      0.6245163448036832      -1    2
606.7568359375737       0.41198908195979644     -1    2
843.2614135742188       0.41591027114618606     -1    2
946.1708374023438       0.430206516710194       -1    2
400.1812744140625       0.2918983249625358      0     1
400.3501892089844       0.3196363715025925      0     1
401.2091979980469       0.34867104752184375     0     1
403.2320556640625       0.30672175589489503     0     1
402.2713624687576       0.2951177103313615      -1    1
403.17193603515625      0.362321942870444       -1    1
END IONS
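A file in this layout can be read back with a small parser. The sketch below is a minimal reader written for illustration, not Cascadia's own loader, and parse_asf and missing_fields are hypothetical names. It treats any line containing "=" between BEGIN IONS and END IONS as an annotation and every other non-empty line as a peak. Peak rows are split on any whitespace, so both tab- and space-separated columns are accepted.

```python
REQUIRED_FIELDS = ("SEQ", "CHARGE", "PEPMASS")


def parse_asf(text):
    """Parse .asf text into a list of spectra.

    Each spectrum is a dict with "annotations" (str -> str) and "peaks",
    a list of (mz, intensity, offset, ms_level) tuples.
    """
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"annotations": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # ignore anything outside a spectrum block
        elif "=" in line:
            key, value = line.split("=", 1)
            current["annotations"][key] = value
        else:
            mz, intensity, offset, ms_level = line.split()
            current["peaks"].append(
                (float(mz), float(intensity), int(offset), int(ms_level))
            )
    return spectra


def missing_fields(spectrum):
    """Return the required annotations that a parsed spectrum lacks."""
    return [k for k in REQUIRED_FIELDS if k not in spectrum["annotations"]]
```

Calling missing_fields on each parsed spectrum is a quick way to confirm that a training file carries the SEQ, CHARGE, and PEPMASS annotations before it is used.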