Usage

Sequence

cascadia sequence SPECTRUM_FILE -m MODEL [-o OUTFILE] [-t SCORE_THRESHOLD] [-b BATCH_SIZE] [-w WIDTH] [-c MAX_CHARGE]

Argument

Description

spectrum_file

(required) The mzML file to perform de novo sequencing on.

-o, –outfile

The output file to save de novo sequencing results to. (default: cascadia_results.ssl)

-t, –score_threshold]

The score threshold applied to predictions. (default: 0.8)

-b, –batch_size

The batch size for inference. For the fastest inference the largest batch size that fits in GPU memory is recommended. (default: 32)

-w, –width

The number of adjacent scans to use when construcing augmented spectra. (default: 2)

-c, –max_charge

The maximum precursor charge to consider when making predictions. (default 4)

-p, –modifications

A path to the json file containing a list of PTMs in Proforma format. If not provided, the PTMs present in the Massive-KB dataset will be used by default. The list of PTMs needs to match those used to train the model. (default mskb)

Train

cascadia train TRAIN_SPECTRUM_FILE VAL_SPECTRUM_FILE [-m MODEL] [-b BATCH_SIZE] [-w WIDTH] [-c MAX_CHARGE] [-e MAX_EPOCHS] [-lr LEARNING_RATE]

Argument

Description

spectrum_file

(required) A labeled .asf file to use for model training.

spectrum_file

(required) A labeled .asf file used for validation during training.

-m, –model

A pre-trained model checkpoint to use for fine-tuning. If none is provided, the model is trained from scratch. (default: None)

-b, –batch_size

The batch size to use for training. (default: 32)

-w, –width

The number of adjacent scans used when constructing augmented spectra in the training data. (default: 2)

-c, –max_charge

The maximum precursor charge to be considered by the model. (default 4)

-e, –max_epochs

The maximum number of epochs to train the model for. The model checkpoint with the lowest validation loss after max_epochs will be saved. (default 10)

-lr, –learning_rate

The learning rate to use for model training. (default 1e-5)

-p, –modifications

A path to the json file containing a list of PTMs in Proforma format. If not provided, the PTMs present in the Massive-KB dataset will be used by default. (default mskb)