GitHub - zz2liu/TCGA-RNASeq-tutorial

Tutorial for a Yale training session: TCGA RNA-seq data download and analyses, all on your laptop.

See the slides used during the workshop here.

Go to the TCGA data hub

Navigate the portal and add the files you need to the basket
Download the metadata and the manifest from the basket
Download the files with the GDC Data Transfer Tool (gdc-client)
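The download step can be wrapped in a small helper. This is a minimal sketch, assuming the manifest is the tab-separated file saved from the basket (one header line, then one row per file) and that gdc-client is on your PATH; the function names and the file names in the usage line are hypothetical.

```shell
# Sketch: count the files listed in a GDC manifest, then fetch them with gdc-client.
# Assumes a tab-separated manifest: one header line, then one row per file.

# Print the number of files in a manifest (non-empty lines after the header).
count_manifest_files() {
  tail -n +2 "$1" | grep -c .
}

# Download every file in the manifest into a target directory (default: current directory).
# gdc-client saves each file in its own subdirectory named after the file UUID.
download_manifest() {
  manifest="$1"
  outdir="${2:-.}"
  if ! command -v gdc-client >/dev/null 2>&1; then
    echo "gdc-client not found on PATH" >&2
    return 1
  fi
  echo "Downloading $(count_manifest_files "$manifest") files into $outdir"
  gdc-client download -m "$manifest" -d "$outdir"
}
```

Usage (hypothetical names): `download_manifest gdc_manifest.txt tcga_data`. Counting first is a cheap check that the basket contained what you expected before a long download starts.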

Preprocess the metadata

Convert the metadata from JSON to CSV with the online tool json-to-csv
See the metadata description here
Choose and rename fields in a spreadsheet or an R script.

.. Note: To run the R script, you can install RStudio.
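If you prefer the shell to a spreadsheet, choosing and renaming fields can be sketched with awk. This assumes a simple CSV (a header row and no quoted fields containing commas); the function name `select_rename` and the column names below are hypothetical, since the actual headers depend on how json-to-csv flattens the metadata.

```shell
# Sketch: keep selected columns of the metadata CSV and give them new names.
# Assumes a simple CSV: a header row and no quoted fields containing commas.
# $1: input CSV; $2: comma-separated "old:new" pairs, in the desired output order.
select_rename() {
  awk -F',' -v OFS=',' -v spec="$2" '
    NR == 1 {
      n = split(spec, pairs, ",")
      for (j = 1; j <= NF; j++) idx[$j] = j        # header name -> column number
      line = ""; sep = ""
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, ":")
        if (!(kv[1] in idx)) { print "missing column: " kv[1] > "/dev/stderr"; exit 1 }
        col[i] = idx[kv[1]]
        line = line sep kv[2]; sep = OFS
      }
      print line                                   # renamed header
      next
    }
    {
      line = ""; sep = ""
      for (i = 1; i <= n; i++) { line = line sep $(col[i]); sep = OFS }
      print line                                   # selected values, same order
    }' "$1"
}
```

Usage (hypothetical columns): `select_rename metadata.csv "file_id:fileId,sample_type:sampleType" > metadata_selected.csv`. The function stops with an error if a requested column is missing, rather than silently writing empty values.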

Preprocess the FPKM matrix

Convert the downloaded files to an FPKM matrix in a Unix shell/terminal:

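# cd to the folder where gdc-client saved the files (one subdirectory per file UUID).
# gzip -dc is used instead of zcat, because zcat on macOS expects .Z files.
for f in */*.gz; do
  id=$(dirname "$f"); echo "$id" > "$id.tmp"   # column header: the file UUID
  gzip -dc "$f" | cut -f2 >> "$id.tmp"         # column values: FPKM (2nd column)
done
echo 'featureId' > tmp.index
gzip -dc "$f" | cut -f1 >> tmp.index          # row names: gene IDs, taken from the last file
paste tmp.index *.tmp > ../geneId_fileId_FPKM.txt
rm tmp.index *.tmp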

.. Note: To use a Linux-style shell, run Terminal on a Mac (OS X); on a Windows PC, install and run babun.
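The paste-based merge above takes the row names from a single file, so it silently assumes every sample file lists the same gene IDs in the same order. A minimal check, assuming gzipped per-sample files as downloaded; the function name `check_same_features` is hypothetical.

```shell
# Sketch: verify that all gzipped per-sample files list identical feature IDs
# (column 1) in identical order. Returns non-zero and reports any file that differs.
check_same_features() {
  ref_ids=$(mktemp); cur_ids=$(mktemp)
  status=0
  gzip -dc "$1" | cut -f1 > "$ref_ids"          # reference order: the first file
  for f in "$@"; do
    gzip -dc "$f" | cut -f1 > "$cur_ids"
    if ! cmp -s "$ref_ids" "$cur_ids"; then
      echo "feature order differs: $f vs $1" >&2
      status=1
    fi
  done
  rm -f "$ref_ids" "$cur_ids"
  return "$status"
}
```

Run it from the same folder before merging: `check_same_features */*.gz`. If it reports a difference, sort each file by ID (or join on the ID) instead of pasting columns side by side.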

Description of the TCGA barcode
Description of the pipeline
Download the GENCODE gene annotation file
Map the FPKM matrix to gene symbols and barcodes with preprocess_count_matrix.R.
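preprocess_count_matrix.R performs the mapping; the gene-symbol half of it can also be sketched in the shell. This assumes a gzipped GENCODE GTF whose "gene" lines carry `gene_id` and `gene_name` attributes, and the merged matrix with Ensembl IDs in column 1. Version suffixes (the `.13` in `ENSG00000000003.13`) are stripped on both sides so IDs match across releases. The function names are hypothetical, and the barcode mapping is not covered here.

```shell
# Sketch: build a "versionless Ensembl gene ID <TAB> gene symbol" table from a gzipped GENCODE GTF.
gtf_gene_symbols() {
  gzip -dc "$1" | awk -F'\t' '$3 == "gene" {
    id = $9;   sub(/.*gene_id "/, "", id);     sub(/".*/, "", id); sub(/\..*/, "", id)
    name = $9; sub(/.*gene_name "/, "", name); sub(/".*/, "", name)
    print id "\t" name
  }'
}

# Sketch: prepend a geneSymbol column to the merged matrix ("NA" when no symbol is found).
# $1: table from gtf_gene_symbols; $2: merged matrix with a header row.
add_gene_symbols() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { sym[$1] = $2; next }             # first file: load the ID -> symbol table
    FNR == 1  { print "geneSymbol", $0; next }   # matrix header
    { id = $1; sub(/\..*/, "", id)               # drop the version suffix
      s = (id in sym) ? sym[id] : "NA"
      print s, $0 }' "$1" "$2"
}
```

Usage (hypothetical file names): `gtf_gene_symbols gencode.annotation.gtf.gz > gene_symbols.tsv`, then `add_gene_symbols gene_symbols.tsv geneId_fileId_FPKM.txt > geneSymbol_fileId_FPKM.txt`.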

Introduction to analyses in R

Use the script to:

Filter the genes and convert FPKM values to log scale
Identify genes co-expressed with your gene of interest
Identify genes differentially expressed between paired normal and tumor samples
Draw a PCA plot
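The R script handles filtering and the log transform itself. For a quick look without R, the first step can be sketched with awk, assuming a log2(FPKM + 1) transform and a hypothetical cutoff on the mean log value across samples (default 1); the script's actual filtering rule may differ. The function name `log_filter` is hypothetical.

```shell
# Sketch: log2(FPKM + 1) transform, keeping genes whose mean log value across
# samples is at least the cutoff (a hypothetical rule; default 1).
# $1: tab-separated matrix with a header row and feature IDs in column 1; $2: cutoff.
log_filter() {
  awk -F'\t' -v OFS='\t' -v min_mean="${2:-1}" '
    NR == 1 { print; next }                      # keep the header row
    {
      sum = 0
      for (i = 2; i <= NF; i++) { $i = log($i + 1) / log(2); sum += $i }
      if (NF > 1 && sum / (NF - 1) >= min_mean) print
    }' "$1"
}
```

Adding 1 before taking the log keeps genes with FPKM = 0 finite (they map to 0) instead of producing -inf.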

Introduction to the analyses by Firehose

Gene
Cohort summary
Cohort data and workflow
Cohort analysis

FAQs

Quick fix to get the miRNA FPM matrix (for Queen Okoro)

Convert the downloaded miRNA files to an FPM matrix in a Unix shell/terminal:

# cd to the folder that holds one subdirectory per sample, each with its quantification .txt file.
for f in */*.txt; do
  id=$(dirname "$f"); echo "$id" > "$id.tmp"   # colnames to-be
  tail -n +2 "$f" | cut -f3 >> "$id.tmp"       # cell values to-be: column 3, reads per million (skip the header line)
done
echo 'featureId' > tmp.index
tail -n +2 "$f" | cut -f1 >> tmp.index         # the rownames to-be (miRNA IDs, header skipped)
paste tmp.index *.tmp > ../featureId_fileId_FPKM.txt
rm tmp.index *.tmp
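A quick sanity check for either merged matrix is to report its shape and flag empty cells. `paste` pads a shorter input with empty fields instead of failing, so a truncated sample file shows up as blanks. A minimal sketch; the function name `matrix_shape` is hypothetical.

```shell
# Sketch: print "<features> features x <samples> samples" for a merged matrix and
# return non-zero if any row has the wrong number of fields or an empty value.
matrix_shape() {
  awk -F'\t' '
    NR == 1 { ncol = NF; next }                  # header: featureId + one column per sample
    {
      if (NF != ncol) { printf "line %d has %d fields, expected %d\n", NR, NF, ncol > "/dev/stderr"; bad = 1 }
      for (i = 2; i <= NF; i++)
        if ($i == "") { printf "line %d: empty value in column %d\n", NR, i > "/dev/stderr"; bad = 1; break }
    }
    END {
      printf "%d features x %d samples\n", NR - 1, ncol - 1
      exit bad
    }' "$1"
}
```

Usage: `matrix_shape ../featureId_fileId_FPKM.txt`. The sample count should equal the number of files in the manifest.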