
Hi there, I'm Aditya Elayavalli 👋

Computational Scientist | Bioinformatics Researcher | Machine Learning Enthusiast

📍 Cambridge, MA / Leander, TX | 📧 elayavalli.a@northeastern.edu | 💼 LinkedIn

🧬 About Me

I'm a computational scientist specializing in bioinformatics with a passion for translating complex biological datasets into actionable insights. My work combines deep learning, statistical modeling, and data visualization to advance cancer research and genomic analysis.

🌱 Learning advanced ensemble methods and explainable AI techniques
👯 Looking to collaborate on open-source bioinformatics tools and medical AI projects
📫 How to reach me: elayavalli.a@northeastern.edu

🚀 Featured Projects

🩸 Blood Cancer Image Detection

Deep learning (CNN) model for automated classification of blood cells into 8 cell types, flagging abnormal cells (immature granulocytes, erythroblasts) that may indicate leukemia or other blood disorders.

Tech Stack: TensorFlow, Keras, OpenCV, scikit-learn, Python

Highlights:

4-block CNN architecture with batch normalization
Data augmentation for robust training
Automated cancer cell detection with confidence scoring
Comprehensive reporting with visualization grids

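A minimal sketch of what the confidence-scoring step could look like. It is not the project's code. It assumes the model emits one raw score (logit) per class, converts the scores to probabilities with a softmax, and flags an image only when the top class is one of the abnormal types named above and its probability clears a threshold. The label strings, the `score_cell` helper, and the 0.8 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label strings for the two abnormal cell types named in the project description.
ABNORMAL_CLASSES = {"immature_granulocyte", "erythroblast"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_cell(
    logits: Sequence[float],
    class_names: Sequence[str],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the project
) -> Dict[str, object]:
    """Pick the top class, report its probability, and flag abnormal or low-confidence calls."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    label = class_names[best]
    confidence = probs[best]
    return {
        "label": label,
        "confidence": confidence,
        # Flag only confident predictions of an abnormal cell type.
        "flag_abnormal": label in ABNORMAL_CLASSES and confidence >= threshold,
        # Anything below the threshold is routed for manual review.
        "needs_review": confidence < threshold,
    }


if __name__ == "__main__":
    classes = ["neutrophil", "erythroblast", "immature_granulocyte"]  # hypothetical subset of the 8 classes
    print(score_cell([0.1, 3.0, 0.2], classes))
```

Separating "abnormal" from "needs review" keeps uncertain predictions from being silently reported as normal cells.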
🧬 RNA-seq Differential Expression Pipeline

Complete bioinformatics pipeline for analyzing cytokine and chemokine gene expression from RNA-seq data, converting mouse Ensembl IDs to gene symbols and performing treatment-focused differential expression analysis.

Tech Stack: mygene, Pandas, Matplotlib, Seaborn, SciPy, Python

Highlights:

98.4% gene ID conversion success rate
Automated PCA and heatmap generation
FDR-adjusted statistical comparisons
Cross-tissue treatment effect analysis
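The pipeline description does not say which FDR correction it uses. Benjamini-Hochberg is the most common choice, so the sketch below assumes it. It is a dependency-free illustration; in practice `scipy.stats.false_discovery_control` (SciPy 1.11+) or statsmodels' `multipletests(..., method="fdr_bh")` does the same job. The helper names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Mark tests whose adjusted p-value is at or below the chosen FDR level."""
    return [q <= alpha for q in benjamini_hochberg(pvalues)]


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]  # illustrative values, not project results
    print(benjamini_hochberg(pvals))
    print(significant(pvals, alpha=0.03))
```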

🧠 Single-Cell RNA-seq Cancer Analysis

Comprehensive exploratory analysis of CNS cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE), identifying three distinct molecular subtypes through unsupervised machine learning.

Tech Stack: DESeq2, biomaRt, R, ComplexHeatmap, ggplot2

Highlights:

Analyzed 63 CNS cancer cell lines across ~19,000 genes
Identified ECM/Stromal-like, Neural/Glial-like, and Intermediate subtypes
Variance-stabilizing transformation (VST) of count data
Differential expression with negative binomial modeling

🧪 Computational Phylogenetics Toolkit

Suite of bioinformatics tools including Markov chain sequence classifiers, HMM-based ORF simulators, UPGMA phylogenetic tree construction, and coding/non-coding gene prediction models.

Tech Stack: Python, Biopython, NumPy, SciPy

Highlights:

First-order Markov models for sequence classification
Hidden Markov Models for gene structure simulation
ROC curve analysis for pseudogene prediction
Phylogenetic comparative analysis
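The first-order Markov classifier idea can be sketched as a log-odds score between two trained models. This is a generic textbook version, not the toolkit's actual code: it assumes DNA input, add-one pseudocounts, and conditions on the first base rather than modelling initial-state probabilities. Function names and labels are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Iterable, Sequence, Tuple

ALPHABET = "ACGT"
Model = Dict[Tuple[str, str], float]


def train_markov(seqs: Iterable[str], pseudocount: float = 1.0) -> Model:
    """Estimate first-order transition probabilities P(next | prev) with pseudocount smoothing."""
    counts = {pair: pseudocount for pair in product(ALPHABET, repeat=2)}
    for seq in seqs:
        seq = seq.upper()
        for prev, nxt in zip(seq, seq[1:]):
            if prev in ALPHABET and nxt in ALPHABET:  # skip N and other ambiguity codes
                counts[(prev, nxt)] += 1
    probs: Model = {}
    for prev in ALPHABET:
        row_total = sum(counts[(prev, n)] for n in ALPHABET)
        for nxt in ALPHABET:
            probs[(prev, nxt)] = counts[(prev, nxt)] / row_total
    return probs


def log_odds(seq: str, model_a: Model, model_b: Model) -> float:
    """Sum of log2 transition-probability ratios; a positive score favours model_a."""
    seq = seq.upper()
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in ALPHABET and nxt in ALPHABET:
            score += math.log2(model_a[(prev, nxt)] / model_b[(prev, nxt)])
    return score


def classify(seq: str, model_a: Model, model_b: Model,
             labels: Sequence[str] = ("class_a", "class_b")) -> str:
    """Assign the sequence to whichever model explains its transitions better."""
    return labels[0] if log_odds(seq, model_a, model_b) > 0 else labels[1]


if __name__ == "__main__":
    cpg_like = train_markov(["CGCGCGCGCG"])  # toy training data
    at_rich = train_markov(["ATATATATAT"])
    print(classify("CGCGCG", cpg_like, at_rich, labels=("cpg_like", "at_rich")))
```

Pseudocounts keep unseen transitions from producing zero probabilities, which would otherwise make the log ratio undefined.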

📊 Data Science Practicum Projects

Collection of advanced data science and machine learning projects from DA 5020 and DA 5030 coursework, including diabetes prediction models and comprehensive statistical analyses.

Tech Stack: R, RMarkdown, tidyverse, caret, Machine Learning

Key Projects:

Diabetes Prediction Model: Machine learning classification with feature engineering and model evaluation
Statistical Analysis Pipeline: Comprehensive EDA, hypothesis testing, and predictive modeling
Data Wrangling & Visualization: Advanced R programming for data manipulation and insights

Highlights:

End-to-end data science workflows from raw data to deployment-ready models
Statistical inference and hypothesis testing
Model validation and performance optimization
Professional reporting with RMarkdown

🦠 Viral Proteome Amino Acid Analysis

Python-based bioinformatics analysis of the herpesvirus proteome to quantify and compare amino acid composition across viral structural proteins (capsid, envelope, membrane).

Tech Stack: Python, NumPy, Matplotlib, JSON, Jupyter Notebook

Highlights:

Automated parsing of viral genome JSON to extract protein sequences
Global amino acid frequency analysis across full viral proteome
Comparative analysis of envelope, membrane, and capsid proteins
Visualization of amino acid distributions using bar plots
Identification of enriched/depleted amino acids by viral structure
Reproducible workflow for viral proteomics and computational biology analysis
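A minimal sketch of the composition and enrichment steps described above. The JSON layout (a top-level `"proteins"` list whose entries carry `"category"` and `"sequence"` fields) is a hypothetical schema, since the project's genome file format is not documented here. Enrichment is taken as the ratio of a category's amino acid frequency to the proteome-wide frequency, which is one simple definition among several.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
Freqs = Dict[str, float]


def aa_frequencies(sequences: List[str]) -> Freqs:
    """Relative frequency of each standard amino acid pooled across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_by_category(genome_json: str) -> Tuple[Freqs, Dict[str, Freqs]]:
    """Global proteome composition plus one composition per structural category."""
    proteins = json.loads(genome_json)["proteins"]  # hypothetical schema
    global_freq = aa_frequencies([p["sequence"] for p in proteins])
    by_category = {
        cat: aa_frequencies([p["sequence"] for p in proteins if p["category"] == cat])
        for cat in sorted({p["category"] for p in proteins})
    }
    return global_freq, by_category


def enrichment(category_freq: Freqs, global_freq: Freqs) -> Freqs:
    """Ratio > 1 means the residue is enriched in the category relative to the whole proteome."""
    return {aa: category_freq[aa] / global_freq[aa]
            for aa in AMINO_ACIDS if global_freq[aa] > 0}


if __name__ == "__main__":
    toy = json.dumps({"proteins": [
        {"name": "p1", "category": "capsid", "sequence": "AAAG"},
        {"name": "p2", "category": "envelope", "sequence": "GGGA"},
    ]})
    glob, cats = composition_by_category(toy)
    print(enrichment(cats["capsid"], glob))
```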

🧬 Cancer Genomics MAF Analysis Pipeline

Complete cancer genomics workflow for reading, summarizing, visualizing, and analyzing somatic mutation data from MAF files, integrating clinical information and identifying key mutation patterns, driver genes, and survival-associated biomarkers.

Tech Stack: R, maftools, Bioconductor, ggplot2, Survival Analysis, Cancer Genomics

Highlights:

End-to-end MAF file processing and summarization
Automated oncoplots, rainfall plots, and mutation burden visualization
Detection of co-occurring and mutually exclusive mutations
Driver gene identification using positional clustering (Oncodrive)
Protein domain and mutation hotspot analysis
Kaplan-Meier survival analysis linked to mutation status
Cohort comparison for differential mutation profiling
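Mutation burden is the simplest of these summaries to show outside R. maftools itself provides a `tmb()` function; the Python sketch below only illustrates the underlying arithmetic (non-synonymous variants per sample divided by the capture size in megabases). The variant-class set is intended to mirror maftools' default non-synonymous classes, and the 50 Mb capture size is an assumption: the real value depends on the sequencing panel or exome kit.

```python
import csv
import io
from typing import Dict, Iterable

# Variant classes counted as non-synonymous (intended to match maftools' defaults).
NONSYNONYMOUS = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation",
}


def tumor_mutation_burden(maf_lines: Iterable[str],
                          capture_size_mb: float = 50.0) -> Dict[str, float]:
    """Non-synonymous mutations per megabase for each Tumor_Sample_Barcode."""
    # MAF files may begin with '#' version/comment lines before the column header.
    rows = csv.DictReader((ln for ln in maf_lines if not ln.startswith("#")), delimiter="\t")
    counts: Dict[str, int] = {}
    for row in rows:
        sample = row["Tumor_Sample_Barcode"]
        counts.setdefault(sample, 0)  # keep samples that carry only silent variants
        if row["Variant_Classification"] in NONSYNONYMOUS:
            counts[sample] += 1
    return {sample: n / capture_size_mb for sample, n in counts.items()}


if __name__ == "__main__":
    toy_maf = io.StringIO(  # toy three-column MAF, not real data
        "#version 2.4\n"
        "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
        "TP53\tMissense_Mutation\tS1\n"
        "TTN\tSilent\tS1\n"
        "EGFR\tNonsense_Mutation\tS2\n"
    )
    print(tumor_mutation_burden(toy_maf))
```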

Currently Working On

Mutation Impact Predictor: Predicts whether a mutation is likely to disrupt viral protein structure.

🛠️ Technical Skills

Languages: Python, R, Bash, SQL

Bioinformatics: RNA-seq analysis, Single-cell sequencing, Differential expression, Gene annotation, Phylogenetics

Machine Learning: TensorFlow, Keras, scikit-learn, CNN architectures, Transfer learning

Data Analysis: Pandas, NumPy, DESeq2, biomaRt, mygene

Visualization: Matplotlib, Seaborn, ggplot2, ComplexHeatmap, Plotly

Tools: Git, Docker, Jupyter, RStudio, Quarto, SLURM

📫 Let's Connect

I'm always interested in collaborating on computational biology projects, discussing bioinformatics challenges, or exploring new applications of machine learning in healthcare.

📧 Email: elayavalli.a@northeastern.edu
💼 LinkedIn: linkedin.com/in/adityaelayavallia368a8158
🐙 GitHub: @adihusky99

"Transforming biological complexity into computational clarity"