Getting Started

Installation

We recommend using conda to manage dependencies for Cascadia. Create a new conda environment with:

    conda create --name cascadia_env python=3.10

This will create an environment called cascadia_env with Python 3.10 installed. Activate it by running:

    conda activate cascadia_env

Finally, you can install Cascadia and all of its dependencies with:

    pip install cascadia
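
As a quick sanity check after installation, the sketch below tests whether the python and cascadia commands are on the PATH of the active environment. The helper name check_cmd is illustrative and not part of Cascadia; it relies only on the standard shell builtin command -v.

```shell
# Hypothetical helper (not part of Cascadia): report whether a command
# can be found on the PATH of the current shell.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Run these with cascadia_env activated.
check_cmd python
check_cmd cascadia
```

If cascadia is reported as not found, the most likely cause is that the environment was not activated before running pip install.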

Run de novo sequencing on new data with a trained model

Note

We recommend running Cascadia on Linux with a dedicated GPU for the best runtime performance.

Most users will want to use a pretrained Cascadia model to perform de novo sequencing on a new dataset. Cascadia takes mass spectrometry (MS) data in mzML format as input. A small demo dataset, along with the pretrained model checkpoints from the paper, is available here. The following example, which uses the provided demo dataset, should take approximately 1 minute to run on a GPU:

    cascadia sequence \
      demo.mzML \
      cascadia.ckpt \
      --out demo_results

Cascadia will produce an output file, demo_results.ssl, containing the de novo sequencing results. This file contains one row for each prediction and can be loaded into Skyline as a spectral library to visualize the results.
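
Before loading the results into Skyline, it can be useful to glance at the file from the command line. The sketch below assumes the output follows the usual .ssl convention of a tab-delimited text file with a single header row; the exact columns Cascadia writes are not specified here, and count_predictions is a hypothetical helper.

```shell
# Hypothetical helper: count data rows in a tab-delimited file,
# skipping the header row (assumed to be line 1) and any blank lines.
count_predictions() {
    awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$1"
}

ssl_file="demo_results.ssl"
if [ -f "$ssl_file" ]; then
    # Show the header and the first few predictions.
    head -n 5 "$ssl_file"
    echo "predictions: $(count_predictions "$ssl_file")"
fi
```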

A full description of the optional parameters for Cascadia sequencing is available here.
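
To sequence several runs with the same checkpoint, the command above can be wrapped in a shell loop. This is a minimal sketch, assuming that --out takes an output prefix as in the demo; result_prefix, sequence_all, and the DRY_RUN variable are hypothetical helpers and not part of Cascadia.

```shell
# Hypothetical helper: derive an output prefix from an mzML file name,
# e.g. run1.mzML -> run1_results.
result_prefix() {
    local base
    base=$(basename "$1" .mzML)
    echo "${base}_results"
}

# Hypothetical helper: sequence every mzML file in the current directory
# with one checkpoint. Set DRY_RUN=1 to print the commands instead of
# running them.
sequence_all() {
    local ckpt="$1"
    local f
    for f in ./*.mzML; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo cascadia sequence "$f" "$ckpt" --out "$(result_prefix "$f")"
        else
            cascadia sequence "$f" "$ckpt" --out "$(result_prefix "$f")"
        fi
    done
}

# Usage: sequence_all cascadia.ckpt
```

Running with DRY_RUN=1 first is a cheap way to confirm the file list and output names before committing GPU time.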

Train a new model from scratch

To train Cascadia on new data, pass a labeled training set and a labeled validation set, both in .asf format, as positional arguments:

    cascadia train training_data.asf validation_data.asf

A full list of the optional arguments for training is described here.

Fine tune a model on new data

To fine-tune a pretrained model checkpoint on new data, pass the checkpoint to train with the --model option:

    cascadia train training_data.asf validation_data.asf \
        --model pretrained_checkpoint.ckpt