Tutorial for a Yale training session: downloading and analyzing TCGA RNA-seq data, all on your laptop.
See the slides used during the workshop here.
Go to the TCGA data hub
- Navigate and select files to basket
- Download metadata and manifest from basket
- Download the files with GDC-client
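The download step can be sketched as a small shell helper. This is only a minimal sketch, assuming gdc-client is installed and on your PATH and that the manifest is a tab-separated file with one header line. The function names `gdc_download_manifest` and `manifest_file_count` and the file name `gdc_manifest.txt` are hypothetical, chosen for illustration.

```shell
# Hypothetical helper: download every file listed in a GDC manifest.
# Assumes gdc-client is on your PATH; file and directory names are examples.
gdc_download_manifest() {
  manifest=$1
  outdir=${2:-.}
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi
  if ! command -v gdc-client >/dev/null 2>&1; then
    echo "gdc-client not found on PATH" >&2
    return 1
  fi
  mkdir -p "$outdir"
  # -m points at the manifest, -d at the directory that receives the files.
  gdc-client download -m "$manifest" -d "$outdir"
}

# Hypothetical helper: count the files listed in a manifest.
# Assumes one header line followed by one line per file.
manifest_file_count() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

For example, `manifest_file_count gdc_manifest.txt` shows how many files to expect, and `gdc_download_manifest gdc_manifest.txt .` then downloads them into the current folder, one sub-directory per file.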
Preprocess the metadata
- Convert to CSV using the online tool json-to-csv
- Metadata Description here
- Choose and rename fields in a spreadsheet or an R script.
.. Note: To run the R script, you can install RStudio.
Preprocess the FPKM matrix
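- Convert the downloaded files to an FPKM matrix in a Unix shell/terminal

      # Run from the folder that contains one sub-directory per downloaded file.
      # gunzip -c replaces zcat, which on some macOS versions expects a .Z suffix.
      for f in */*.gz; do
        id=$(dirname "$f")
        echo "$id" > "$id.tmp"                  # column name: the file ID
        gunzip -c "$f" | cut -f2 >> "$id.tmp"   # column values: FPKM
      done
      echo 'featureId' > tmp.index
      gunzip -c "$f" | cut -f1 >> tmp.index     # row names, taken from the last file read
      paste tmp.index *.tmp > ../geneId_fileId_FPKM.txt
      rm tmp.index *.tmp

.. Note: To use a Linux shell, run Terminal on a Mac (OS X); on a PC (Windows), install and run babun.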
- Description of the Barcode
- Description of the pipeline
- Download the GENCODE gene annotation file
- Map the FPKM matrix to gene symbols and barcodes with preprocess_count_matrix.R.
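To make the barcode and annotation steps concrete, here is a rough shell sketch of two lookups. It is not the logic of preprocess_count_matrix.R. It assumes the usual TCGA barcode layout, where the first two digits of the fourth dash-separated field give the sample type (01-09 tumor, 10-19 normal, 20-29 control), and a GENCODE GTF whose ninth column carries `gene_id` and `gene_name` attributes. The function names `tcga_sample_type` and `gtf_gene_table` are hypothetical.

```shell
# Hypothetical helper: classify a TCGA barcode by its sample-type code,
# e.g. TCGA-02-0001-01C-01D-0182-01 has sample field "01C", so code "01".
tcga_sample_type() {
  code=$(printf '%s' "$1" | cut -d- -f4 | cut -c1-2)
  case "$code" in
    0[1-9]) echo tumor ;;
    1[0-9]) echo normal ;;
    2[0-9]) echo control ;;
    *)      echo unknown ;;
  esac
}

# Hypothetical helper: read an uncompressed GTF on stdin and print one
# "gene_id<TAB>gene_name" line for every record whose feature type is "gene".
gtf_gene_table() {
  awk -F'\t' '$3 == "gene" {
    id = ""; name = ""
    n = split($9, attrs, ";")
    for (i = 1; i <= n; i++) {
      a = attrs[i]
      gsub(/^ +/, "", a)   # drop the space that follows each ";"
      if (a ~ /^gene_id /)   { id = a;   sub(/^gene_id "/, "", id);     sub(/"$/, "", id) }
      if (a ~ /^gene_name /) { name = a; sub(/^gene_name "/, "", name); sub(/"$/, "", name) }
    }
    if (id != "") print id "\t" name
  }'
}
```

A possible use, with a placeholder file name: `gunzip -c your_gencode_annotation.gtf.gz | gtf_gene_table > geneId_symbol.txt`.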
Introduction to analyses in R
Use the script to:
- Filter the genes and convert FPKM to log scale
- Identify genes co-expressed with your gene of interest
- Identify genes differentially expressed between paired normal and tumor samples
- PCA plot
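The first two analyses can be approximated directly in the shell. The sketch below is not the R script itself: it assumes a tab-separated matrix with a header row and feature IDs in the first column, a log2(FPKM + 1) transform, a mean-FPKM cutoff (default 1), and Pearson correlation as the co-expression measure. All of these are illustrative choices, and the function names `fpkm_filter_log2` and `coexpressed_with` are hypothetical.

```shell
# Hypothetical helper: keep genes whose mean FPKM is at least $1 (default 1)
# and convert the kept values to log2(FPKM + 1). Reads the matrix on stdin.
fpkm_filter_log2() {
  awk -F'\t' -v OFS='\t' -v min_mean="${1:-1}" '
    NR == 1 { print; next }               # pass the header through unchanged
    {
      sum = 0
      for (i = 2; i <= NF; i++) sum += $i
      if (NF < 2 || sum / (NF - 1) < min_mean) next
      for (i = 2; i <= NF; i++) $i = sprintf("%.4f", log($i + 1) / log(2))
      print
    }'
}

# Hypothetical helper: Pearson correlation of every gene with gene $1.
# Reads the matrix on stdin and prints "gene<TAB>r", highest r first.
# The whole matrix is held in memory, so filter it first.
coexpressed_with() {
  awk -F'\t' -v goi="$1" '
    NR == 1 { next }
    {
      n = NF - 1; ids[NR] = $1; last = NR
      for (i = 2; i <= NF; i++) v[NR, i] = $i
      if ($1 == goi) g = NR
    }
    END {
      if (!g) { print "gene not found: " goi > "/dev/stderr"; exit 1 }
      for (r = 2; r <= last; r++) {
        if (r == g) continue
        sx = sy = sxx = syy = sxy = 0
        for (i = 2; i <= n + 1; i++) {
          x = v[g, i]; y = v[r, i]
          sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
        }
        den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
        if (den > 0) printf "%s\t%.4f\n", ids[r], (n * sxy - sx * sy) / den
      }
    }' | sort -t "$(printf '\t')" -k2,2nr
}
```

For instance, `fpkm_filter_log2 1 < geneId_fileId_FPKM.txt | coexpressed_with YOUR_GENE_ID | head` would list the ten most positively correlated genes, where YOUR_GENE_ID is a placeholder for a feature ID present in the first column.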
Introduction to the analyses by FireHose
- Gene
- Cohort summary
- Cohort data and workflow
- Cohort analysis
FAQs
Quick fix to get the miRNA FPM matrix for Queen Okoro
- Convert the downloaded files to an FPM matrix in a Unix shell/terminal

      # cd to the folder with all the txt files under each sample directory.
      for f in */*.txt; do
        id=$(dirname "$f")
        echo "$id" > "$id.tmp"            # the column names to be
        cut -f3 "$f" >> "$id.tmp"         # the cell values to be
      done
      echo 'featureId' > tmp.index
      cut -f1 "$f" >> tmp.index           # the row names to be
      paste tmp.index *.tmp > ../featureId_fileId_FPKM.txt
      rm tmp.index *.tmp