Hi there, I'm Aditya Elayavalli
Computational Scientist | Bioinformatics Researcher | Machine Learning Enthusiast
Cambridge, MA / Leander, TX | elayavalli.a@northeastern.edu | LinkedIn
About Me
I'm a computational scientist specializing in bioinformatics with a passion for translating complex biological datasets into actionable insights. My work combines deep learning, statistical modeling, and data visualization to advance cancer research and genomic analysis.
- Currently learning advanced ensemble methods and explainable AI techniques
- Looking to collaborate on open-source bioinformatics tools and medical AI projects
- How to reach me: elayavalli.a@northeastern.edu
Featured Projects
Blood Cancer Image Detection
Deep learning CNN model for automated classification of blood cells across 8 cell types, specifically identifying abnormal cells (immature granulocytes, erythroblasts) that may indicate leukemia or blood disorders.
Tech Stack: TensorFlow, Keras, OpenCV, scikit-learn, Python
Highlights:
- 4-block CNN architecture with batch normalization
- Data augmentation for robust training
- Automated cancer cell detection with confidence scoring
- Comprehensive reporting with visualization grids
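The confidence scoring mentioned above could look roughly like the sketch below. It is an illustration, not the project's code: it assumes the CNN emits one raw score (logit) per class, and the label list is an assumption, since only erythroblasts and immature granulocytes are named in this README. The function names and the 0.5 threshold are also hypothetical.

```python
import math

# Assumed label set: only "erythroblast" and "immature_granulocyte" are named
# in this README; the other six names are placeholders for illustration.
CLASS_NAMES = [
    "basophil", "eosinophil", "erythroblast", "immature_granulocyte",
    "lymphocyte", "monocyte", "neutrophil", "platelet",
]
ABNORMAL = {"erythroblast", "immature_granulocyte"}


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_cell(logits, threshold=0.5):
    """Return the predicted label, its confidence, and an abnormal-cell flag."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASS_NAMES[best]
    confidence = probs[best]
    # Flag only when the top class is abnormal AND the model is confident enough.
    flagged = label in ABNORMAL and confidence >= threshold
    return {"label": label, "confidence": confidence, "flagged": flagged}


if __name__ == "__main__":
    print(score_cell([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```

Keeping the threshold as a parameter lets a reviewer trade sensitivity against false alarms without retraining the model.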
RNA-seq Differential Expression Pipeline
Complete bioinformatics pipeline for analyzing cytokine and chemokine gene expression from RNA-seq data, converting mouse Ensembl IDs to gene symbols and performing treatment-focused differential expression analysis.
Tech Stack: mygene, Pandas, Matplotlib, Seaborn, SciPy, Python
Highlights:
- 98.4% gene ID conversion success rate
- Automated PCA and heatmap generation
- FDR-adjusted statistical comparisons
- Cross-tissue treatment effect analysis
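FDR adjustment of the kind listed above is commonly done with the Benjamini-Hochberg procedure. This README does not say which method the pipeline uses, so the following is a minimal pure-Python sketch under that assumption; the function name and the example p-values are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value can be capped
    # by the ones above it, which keeps the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values, one per gene
    print(benjamini_hochberg(raw))
```

A gene would then be called significant when its adjusted value falls below the chosen FDR level, for example 0.05.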
Single-Cell RNA-seq Cancer Analysis
Comprehensive exploratory analysis of CNS cancer cell lines from CCLE, identifying three distinct molecular subtypes through unsupervised machine learning.
Tech Stack: DESeq2, biomaRt, R, ComplexHeatmap, ggplot2
Highlights:
- Analyzed 63 CNS cancer cell lines across ~19,000 genes
- Identified ECM/Stromal-like, Neural/Glial-like, and Intermediate subtypes
- Variance stabilizing transformation (VST) normalization
- Differential expression with negative binomial modeling
Computational Phylogenetics Toolkit
Suite of bioinformatics tools including Markov chain sequence classifiers, HMM-based ORF simulators, UPGMA phylogenetic tree construction, and coding/non-coding gene prediction models.
Tech Stack: Python, Biopython, NumPy, SciPy
Highlights:
- First-order Markov models for sequence classification
- Hidden Markov Models for gene structure simulation
- ROC curve analysis for pseudogene prediction
- Phylogenetic comparative analysis
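A first-order Markov classifier of the kind listed above can be sketched as follows. This is an illustration rather than the toolkit's code: the function names, the add-one pseudocount, the DNA alphabet, and the toy training sequences are all assumptions.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate log transition probabilities P(next | previous) from sequences."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            # Skip transitions involving symbols outside the alphabet (e.g. N).
            if prev in counts and cur in counts[prev]:
                counts[prev][cur] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: math.log(counts[a][b] / total) for b in ALPHABET}
    return model


def log_likelihood(seq, model):
    """Sum the log transition probabilities along the sequence."""
    return sum(
        model[prev][cur]
        for prev, cur in zip(seq, seq[1:])
        if prev in model and cur in model[prev]
    )


def classify(seq, model_a, model_b):
    """Return ("A" or "B", log-likelihood ratio); a positive ratio favors A."""
    score = log_likelihood(seq, model_a) - log_likelihood(seq, model_b)
    return ("A" if score > 0 else "B"), score


if __name__ == "__main__":
    model_a = train_markov(["ACACACAC", "CACACA"])  # toy class A
    model_b = train_markov(["GTGTGTGT", "TGTGTG"])  # toy class B
    print(classify("ACACAC", model_a, model_b))
```

The log-likelihood ratio returned by `classify` can also serve as the score that is thresholded when drawing a ROC curve.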
Data Science Practicum Projects
Collection of advanced data science and machine learning projects from DA 5020 and DA 5030 coursework, including diabetes prediction models and comprehensive statistical analyses.
Tech Stack: R, RMarkdown, tidyverse, caret, Machine Learning
Key Projects:
- Diabetes Prediction Model: Machine learning classification with feature engineering and model evaluation
- Statistical Analysis Pipeline: Comprehensive EDA, hypothesis testing, and predictive modeling
- Data Wrangling & Visualization: Advanced R programming for data manipulation and insights
Highlights:
- End-to-end data science workflows from raw data to deployment-ready models
- Statistical inference and hypothesis testing
- Model validation and performance optimization
- Professional reporting with RMarkdown
Viral Proteome Amino Acid Analysis
Python-based bioinformatics analysis of the herpesvirus proteome to quantify and compare amino acid composition across viral structural proteins (capsid, envelope, membrane).
Tech Stack: Python, NumPy, Matplotlib, JSON, Jupyter Notebook
Highlights:
- Automated parsing of viral genome JSON to extract protein sequences
- Global amino acid frequency analysis across full viral proteome
- Comparative analysis of envelope, membrane, and capsid proteins
- Visualization of amino acid distributions using bar plots
- Identification of enriched/depleted amino acids by viral structure
- Reproducible workflow for viral proteomics and computational biology analysis
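The frequency and enrichment steps listed above can be sketched in a few lines. The layout of the viral genome JSON is not described in this README, so the sketch starts from plain protein strings; the function names and the toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_frequencies(sequences):
    """Return the relative frequency of each standard amino acid."""
    counts = Counter()
    for seq in sequences:
        # Ignore anything that is not a standard residue (gaps, X, stop marks).
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def enrichment(group_seqs, background_seqs):
    """Ratio of group frequency to background frequency per amino acid.

    Values above 1 suggest enrichment in the group, values below 1 depletion.
    Residues absent from the background are left out to avoid dividing by zero.
    """
    group = aa_frequencies(group_seqs)
    background = aa_frequencies(background_seqs)
    return {
        aa: group[aa] / background[aa]
        for aa in AMINO_ACIDS
        if background[aa] > 0
    }


if __name__ == "__main__":
    proteome = ["AAKL", "KK"]  # toy stand-in for the full proteome
    capsid = ["KK"]            # toy stand-in for one structural class
    print(enrichment(capsid, proteome))
```

Comparing each structural class against the whole proteome, rather than against another class, keeps the ratios on a common baseline.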
Cancer Genomics MAF Analysis Pipeline
Complete cancer genomics workflow for reading, summarizing, visualizing, and analyzing somatic mutation data from MAF files, integrating clinical information and identifying key mutation patterns, driver genes, and survival-associated biomarkers.
Tech Stack: R, maftools, Bioconductor, ggplot2, Survival Analysis, Cancer Genomics
Highlights:
- End-to-end MAF file processing and summarization
- Automated oncoplots, rainfall plots, and mutation burden visualization
- Detection of co-occurring and mutually exclusive mutations
- Driver gene identification using positional clustering (Oncodrive)
- Protein domain and mutation hotspot analysis
- Kaplan-Meier survival analysis linked to mutation status
- Cohort comparison for differential mutation profiling
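The pipeline itself is written in R, so the Python below is not its code. It is a small sketch of the Kaplan-Meier estimator behind the survival analysis listed above, with a hypothetical function name and made-up follow-up times, to show what the curve for one mutation group computes.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Made-up data for a hypothetical "mutated" group (time in months).
    print(kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]))
```

Running the function once for mutated samples and once for wild-type samples gives the two curves that a mutation-status survival plot compares.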
Currently Working On
- Mutation Impact Predictor: Predicts whether a mutation is likely to disrupt viral structure.
Technical Skills
Languages: Python, R, Bash, SQL
Bioinformatics: RNA-seq analysis, Single-cell sequencing, Differential expression, Gene annotation, Phylogenetics
Machine Learning: TensorFlow, Keras, scikit-learn, CNN architectures, Transfer learning
Data Analysis: Pandas, NumPy, DESeq2, biomaRt, mygene
Visualization: Matplotlib, Seaborn, ggplot2, ComplexHeatmap, Plotly
Tools: Git, Docker, Jupyter, RStudio, Quarto, SLURM
Let's Connect
I'm always interested in collaborating on computational biology projects, discussing bioinformatics challenges, or exploring new applications of machine learning in healthcare.
- Email: elayavalli.a@northeastern.edu
- LinkedIn: linkedin.com/in/adityaelayavallia368a8158
- GitHub: @adihusky99
"Transforming biological complexity into computational clarity"