Getting Started
Installation
We recommend using conda to manage dependencies for Cascadia. Create a new conda environment with:
conda create --name cascadia_env python=3.10
This will create an environment called cascadia_env with Python 3.10 installed. Activate it by running:
conda activate cascadia_env
Finally, you can install Cascadia and all of its dependencies with:
pip install cascadia
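To confirm that the package landed in the active environment, one option is to query the installed package metadata from Python. This is a minimal sketch rather than part of Cascadia: the helper name `installed_version` is ours, and it assumes the distribution is registered under the same name used in the pip command, `cascadia`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution name matches the name passed to pip.
    found = installed_version("cascadia")
    if found:
        print(f"cascadia version: {found}")
    else:
        print("cascadia is not installed in this environment")
```

If the check reports that the package is missing, the usual cause is that pip ran outside the activated cascadia_env environment.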
Run de novo sequencing on new data with a trained model
Note
We recommend using Linux and a dedicated GPU to achieve optimal runtime performance.
Most users will want to use a pretrained Cascadia model to perform de novo sequencing on a new dataset. Cascadia takes MS data in the mzML format as input. A small demo dataset, along with the pretrained model checkpoints from the paper, is available here. The following example, run on the provided demo dataset, should take approximately 1 minute on a GPU:
cascadia sequence \
demo.mzML \
cascadia.ckpt \
--out demo_results
Cascadia will produce an output file, demo_results.ssl, containing the de novo sequencing results. This file contains one row per prediction and can be loaded into Skyline as a spectral library to visualize the results.
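For a quick look at the predictions outside Skyline, the output can also be read as a plain table. The sketch below assumes the file follows the usual .ssl convention of tab-separated text with a header row; the column names are not listed here, so the code prints whatever header it finds instead of relying on specific names. The helper `read_ssl` is hypothetical and not part of Cascadia.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_ssl(path: Path) -> List[Dict[str, str]]:
    """Read a tab-separated .ssl file into one dictionary per row, keyed by column name."""
    # Assumption: the first line is a header row and fields are tab-separated.
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    ssl_path = Path("demo_results.ssl")
    if ssl_path.exists():
        rows = read_ssl(ssl_path)
        print(f"{len(rows)} predictions")
        if rows:
            print("columns:", ", ".join(rows[0].keys()))
```

All values come back as strings, so any numeric column needs converting before it is sorted or filtered.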
A full description of the optional parameters for Cascadia sequencing is available here.
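When a dataset spans several mzML files, the same command can be repeated once per file. The following is a sketch under stated assumptions, not a Cascadia feature: it uses only the positional arguments and the --out option shown in the command above, and the `<file stem>_results` naming scheme and both helper functions are our own. By default it only prints the commands; pass `dry_run=False` to run them.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_sequence_command(mzml: Path, checkpoint: Path) -> List[str]:
    """Build the `cascadia sequence` call for one input file."""
    out_prefix = f"{mzml.stem}_results"  # hypothetical naming scheme for --out
    return ["cascadia", "sequence", str(mzml), str(checkpoint), "--out", out_prefix]


def sequence_directory(directory: Path, checkpoint: Path, dry_run: bool = True) -> List[List[str]]:
    """Build, print, and optionally run one command per .mzML file in `directory`."""
    commands = [
        build_sequence_command(path, checkpoint)
        for path in sorted(directory.glob("*.mzML"))
    ]
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing file.
            subprocess.run(command, check=True)
    return commands
```

For example, `sequence_directory(Path("runs"), Path("cascadia.ckpt"))` lists the commands for every mzML file in a directory named `runs` without launching anything.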
Train a new model from scratch
To train Cascadia on new data, pass a labeled training set and a labeled validation set, both in .asf format, as positional arguments:
cascadia train training_data.asf validation_data.asf
A full list of the optional arguments for training is described here.
Fine-tune a model on new data
To fine-tune a pretrained model checkpoint on new data, pass it to train with the additional --model option:
cascadia train training_data.asf validation_data.asf \
--model pretrained_checkpoint.ckpt
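Because training runs can be long, it may help to check that every input file exists before launching one. The sketch below wraps the two train invocations shown in this guide and uses only the arguments that appear in them; the function `build_train_command` is hypothetical and not part of Cascadia.

```python
from pathlib import Path
from typing import List, Optional


def build_train_command(
    train: Path, validation: Path, checkpoint: Optional[Path] = None
) -> List[str]:
    """Build a `cascadia train` call; supply `checkpoint` to fine-tune from it."""
    # Fail early if any supplied input file is missing.
    for path in (train, validation, checkpoint):
        if path is not None and not path.is_file():
            raise FileNotFoundError(f"missing input file: {path}")
    command = ["cascadia", "train", str(train), str(validation)]
    if checkpoint is not None:
        command += ["--model", str(checkpoint)]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` to start training from scratch, or fine-tuning when a checkpoint is given.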