GitHub - Pfeeder/GPTime: Chromatographic Retention Time prediction with Gaussian Processes

GPTime Python package

-----------------------------------

0.0 - Installation

To install the package, first add LFS support to Git by following the
instructions at:

https://git-lfs.github.com

Then run the command:

git lfs clone https://github.com/statisticalbiotechnology/GPTime.git

The repository will be cloned into a directory named GPTime, inside the
directory in which you ran this command.

1.0 - Dependencies

The GPTime package depends on several other Python packages, including:

numpy, matplotlib, sklearn, joblib, GPy

Each of these packages can be installed with the following command, replacing
package_name with the name of the package (on PyPI, the sklearn module is
published as scikit-learn):

sudo pip install --upgrade package_name
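
Before training, it can be useful to confirm that the dependencies are
importable. The following is a minimal sketch, not part of GPTime: the helper
name find_missing is hypothetical, and the list holds the import names given
above.

```python
import importlib.util

# Import names of the dependencies listed in this README.
REQUIRED_MODULES = ["numpy", "matplotlib", "sklearn", "joblib", "GPy"]


def find_missing(modules):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```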

2.0 - Training

To train a model using GPTime, you need a file containing peptides and
their recorded retention times. Each line of the file holds one peptide
followed by its retention time, as follows:

K.HLNICGTVGSIDNDMSTTDATIGAYSALDRICK.A   245.754
K.AANSVSQDSSYTDFSFTIAGTAHNAHSVTQSASK.V  184.938
K.FATVPTGGASSAAAGAAGAAAGGDAAEEEK.E      150.038
K.IGSGSFGDIYHGTNLISGEEVAIK.L    225.381
K.AASELRILYGGSANGSNAVTFK.D      191.693
K.DAGAISGLNVLRIINEPTAAAIAYGLGAGK.S      256.446
K.ATVDEFPLCVHLVSNELEQLSSEALEAARICANK.Y  256.898
K.GVLGYTEDAVVSSDFLGDSHSSIFDASAGIQLSPK.F 255.529
K.VNLQISDGQPTMCQLEQDYQASDFSVNVK.T       253.647
K.ISAVSTYFESFPYRVNPETGIIDYDTLEK.N       255.285
K.VTDCGDFSYTDLDGSVSDHQGLYVK.L   199.155
K.IPAVEYFGGESPVDVQSQVDSSSVSEDSAVFK.A    252.335

The training files used in our paper can be found in ./Data .
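
To inspect such a file outside GPTime, the two columns can be read as in the
sketch below. It assumes the columns are separated by whitespace (tabs or
spaces); the function name parse_training_lines is hypothetical and is not
part of the GPTime code. The two sample lines are taken from the example
above.

```python
def parse_training_lines(lines):
    """Parse 'peptide<whitespace>retention_time' lines into (str, float) pairs."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or malformed lines instead of failing.
            continue
        peptide, retention_time = fields
        records.append((peptide, float(retention_time)))
    return records


sample = [
    "K.FATVPTGGASSAAAGAAGAAAGGDAAEEEK.E      150.038",
    "K.IGSGSFGDIYHGTNLISGEEVAIK.L    225.381",
    "",
]
records = parse_training_lines(sample)
for peptide, retention_time in records:
    print(peptide, retention_time)
```

To read a real file, pass an open file object instead of the list, for
example parse_training_lines(open(path)).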

The following command is an example of how a model is trained. The output model
is saved to model.pk .

python gptime.py --operation train --peptides ./Data/20110922_EXQ4_NaNa_SA_YeastEasy_Labelfree_06.rtimes_q_0.001.tsv --model ./model.pk --ntrain 100

This model is trained on the first 100 peptides of the data file ./Data/20110922_EXQ4_NaNa_SA_YeastEasy_Labelfree_06.rtimes_q_0.001.tsv
and is saved to ./model.pk .
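
When training has to be repeated, for instance with different values of
--ntrain, the command can be assembled from Python. This is a sketch only: the
helper names build_train_command and run_training are hypothetical, and the
only flags used are those shown in the command above. It assumes gptime.py is
in the current working directory and that "python" is on the PATH.

```python
import shlex
import subprocess


def build_train_command(peptides_path, model_path, ntrain):
    """Return the argument list for one gptime.py training run."""
    return [
        "python", "gptime.py",
        "--operation", "train",
        "--peptides", peptides_path,
        "--model", model_path,
        "--ntrain", str(ntrain),
    ]


def run_training(peptides_path, model_path, ntrain):
    """Run the training command; raises CalledProcessError if it fails."""
    subprocess.run(build_train_command(peptides_path, model_path, ntrain), check=True)


# Print the command instead of running it, so the sketch has no side effects.
command = build_train_command(
    "./Data/20110922_EXQ4_NaNa_SA_YeastEasy_Labelfree_06.rtimes_q_0.001.tsv",
    "./model.pk",
    100,
)
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with long file
names.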

3.0 - Prediction

Similarly, to predict the retention times for the peptides in a file, call
gptime.py with the predict operation:

python gptime.py --operation predict --peptides ./Data/20110922_EXQ4_NaNa_SA_YeastEasy_Labelfree_06.rtimes_q_0.001.tsv --model ./model.pk

This computes the predicted retention time (RT) and the predictive standard
deviation for each peptide in the file, using the model ./model.pk .

The output contains one row per peptide, with the following fields:
peptide actual_rt predicted_rt predicted_variance predicted_std
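
The rows can be post-processed, for example to compute the mean absolute error
or an approximate 95% interval around each prediction. The sketch below assumes
the five fields are separated by whitespace and that the predictive
distribution is Gaussian, so the interval is predicted_rt plus or minus 1.96
times predicted_std. The helper names are hypothetical, and the example rows
are made-up values for illustration, not real GPTime output.

```python
from collections import namedtuple

Prediction = namedtuple(
    "Prediction",
    "peptide actual_rt predicted_rt predicted_variance predicted_std",
)


def parse_prediction_line(line):
    """Parse one output row into a Prediction (assumes whitespace-separated fields)."""
    fields = line.split()
    return Prediction(fields[0], *(float(value) for value in fields[1:5]))


def mean_absolute_error(predictions):
    """Average absolute difference between actual and predicted retention time."""
    return sum(abs(p.actual_rt - p.predicted_rt) for p in predictions) / len(predictions)


def interval_95(prediction):
    """Approximate 95% interval, assuming a Gaussian predictive distribution."""
    half_width = 1.96 * prediction.predicted_std
    return (prediction.predicted_rt - half_width, prediction.predicted_rt + half_width)


# Made-up rows for illustration only; not real GPTime output.
example_output = [
    "PEPTIDEA 100.0 98.0 4.0 2.0",
    "PEPTIDEB 200.0 203.0 9.0 3.0",
]
predictions = [parse_prediction_line(line) for line in example_output]
print("MAE:", mean_absolute_error(predictions))
print("95% interval for first row:", interval_95(predictions[0]))
```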

4.0 - Generating the plots of the manuscript

To generate the plots of the manuscript, see the Jupyter notebook ./Codes/Manuscript_plots.ipynb