adihusky99 - Overview

Hi there, I'm Aditya Elayavalli 👋

Computational Scientist | Bioinformatics Researcher | Machine Learning Enthusiast

📍 Cambridge, MA / Leander, TX | 📧 elayavalli.a@northeastern.edu | 💼 LinkedIn


🧬 About Me

I'm a computational scientist specializing in bioinformatics with a passion for translating complex biological datasets into actionable insights. My work combines deep learning, statistical modeling, and data visualization to advance cancer research and genomic analysis.

  • 🌱 Learning advanced ensemble methods and explainable AI techniques
  • 👯 Looking to collaborate on open-source bioinformatics tools and medical AI projects
  • 📫 How to reach me: elayavalli.a@northeastern.edu

🚀 Featured Projects

🩸 Blood Cancer Image Detection

Deep learning CNN model for automated classification of blood cells across 8 cell types, specifically identifying abnormal cells (immature granulocytes, erythroblasts) that may indicate leukemia or blood disorders.

Tech Stack: TensorFlow, Keras, OpenCV, scikit-learn, Python

Highlights:

  • 4-block CNN architecture with batch normalization
  • Data augmentation for robust training
  • Automated cancer cell detection with confidence scoring
  • Comprehensive reporting with visualization grids

🧬 RNA-seq Differential Expression Pipeline

Complete bioinformatics pipeline for analyzing cytokine and chemokine gene expression from RNA-seq data, converting mouse Ensembl IDs to gene symbols and performing treatment-focused differential expression analysis.

Tech Stack: mygene, Pandas, Matplotlib, Seaborn, SciPy, Python

Highlights:

  • 98.4% gene ID conversion success rate
  • Automated PCA and heatmap generation
  • FDR-adjusted statistical comparisons
  • Cross-tissue treatment effect analysis
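As a rough illustration of what FDR adjustment does to a set of per-gene p-values, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is one common choice; the pipeline's actual correction method is not stated above, and the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here for illustration only; the project may
    apply a different FDR procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"p={p:.3f}  adjusted={q:.3f}")
```

Genes whose adjusted value falls below the chosen threshold (0.05 is a common convention) would be reported as significant.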

🧠 Single-Cell RNA-seq Cancer Analysis

Comprehensive exploratory analysis of CNS cancer cell lines from CCLE, identifying three distinct molecular subtypes through unsupervised machine learning.

Tech Stack: DESeq2, biomaRt, R, ComplexHeatmap, ggplot2

Highlights:

  • Analyzed 63 CNS cancer cell lines across ~19,000 genes
  • Identified ECM/Stromal-like, Neural/Glial-like, and Intermediate subtypes
  • Variance stabilizing transformation (VST) normalization
  • Differential expression with negative binomial modeling

🧪 Computational Phylogenetics Toolkit

Suite of bioinformatics tools including Markov chain sequence classifiers, HMM-based ORF simulators, UPGMA phylogenetic tree construction, and coding/non-coding gene prediction models.

Tech Stack: Python, Biopython, NumPy, SciPy

Highlights:

  • First-order Markov models for sequence classification
  • Hidden Markov Models for gene structure simulation
  • ROC curve analysis for pseudogene prediction
  • Phylogenetic comparative analysis
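To make the first-order Markov classification idea concrete, the sketch below trains one transition model per class and assigns a sequence to whichever model gives it the higher log-likelihood. It is an illustration rather than the toolkit's actual code: the function names, the pseudocount of 1, and the toy training sequences are all assumptions, and the initial-state probability is ignored for brevity.

```python
import math

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition log-probabilities from sequences."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        # Count every adjacent pair (previous base -> current base)
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: math.log(counts[a][b] / total) for b in ALPHABET}
    return model


def log_likelihood(seq, model):
    """Sum transition log-probabilities along the sequence."""
    return sum(model[prev][curr] for prev, curr in zip(seq, seq[1:]))


def classify(seq, model_a, model_b):
    """Return 'A' if the log-odds score favours model_a, otherwise 'B'."""
    score = log_likelihood(seq, model_a) - log_likelihood(seq, model_b)
    return "A" if score > 0 else "B"


# Hypothetical toy training sets for two sequence classes
model_gc = train_markov(["CGCGCGCG", "GCGCGCGC"])
model_at = train_markov(["ATATATAT", "TATATATA"])
print(classify("CGCGCG", model_gc, model_at))
print(classify("ATATAT", model_gc, model_at))
```

The log-odds score used in `classify` is also the natural quantity to threshold when building a ROC curve.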

📊 Data Science Practicum Projects

Collection of advanced data science and machine learning projects from DA 5020 and DA 5030 coursework, including diabetes prediction models and comprehensive statistical analyses.

Tech Stack: R, RMarkdown, tidyverse, caret, Machine Learning

Key Projects:

  • Diabetes Prediction Model: Machine learning classification with feature engineering and model evaluation
  • Statistical Analysis Pipeline: Comprehensive EDA, hypothesis testing, and predictive modeling
  • Data Wrangling & Visualization: Advanced R programming for data manipulation and insights

Highlights:

  • End-to-end data science workflows from raw data to deployment-ready models
  • Statistical inference and hypothesis testing
  • Model validation and performance optimization
  • Professional reporting with RMarkdown

🦠 Viral Proteome Amino Acid Analysis

Python-based bioinformatics analysis of the herpesvirus proteome to quantify and compare amino acid composition across viral structural proteins (capsid, envelope, membrane).

Tech Stack: Python, NumPy, Matplotlib, JSON, Jupyter Notebook

Highlights:

  • Automated parsing of viral genome JSON to extract protein sequences
  • Global amino acid frequency analysis across full viral proteome
  • Comparative analysis of envelope, membrane, and capsid proteins
  • Visualization of amino acid distributions using bar plots
  • Identification of enriched/depleted amino acids by viral structure
  • Reproducible workflow for viral proteomics and computational biology analysis
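The composition comparison described above can be sketched with the standard library alone. The example below assumes the protein sequences have already been extracted and grouped by structural category; the dictionary layout, function names, and toy sequences are hypothetical and do not reflect the project's real JSON schema or data.

```python
import math
from collections import Counter


def aa_frequencies(sequences):
    """Return the relative frequency of each amino acid across sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def log2_enrichment(group_freq, background_freq):
    """log2 ratio of group frequency to background frequency per residue.

    Positive values suggest enrichment in the group, negative values depletion.
    """
    return {
        aa: math.log2(freq / background_freq[aa])
        for aa, freq in group_freq.items()
        if aa in background_freq
    }


# Hypothetical toy sequences grouped by structural category
proteins = {
    "capsid": ["MKKLA", "MAAKL"],
    "envelope": ["MGGSL", "MSSGL"],
}

all_sequences = [seq for group in proteins.values() for seq in group]
background = aa_frequencies(all_sequences)
capsid = aa_frequencies(proteins["capsid"])
enrichment = log2_enrichment(capsid, background)

for aa in sorted(enrichment):
    print(f"{aa}: {enrichment[aa]:+.2f}")
```

The resulting per-residue values map directly onto a bar plot, one bar per amino acid.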

🧬 Cancer Genomics MAF Analysis Pipeline

Complete cancer genomics workflow for reading, summarizing, visualizing, and analyzing somatic mutation data from MAF files, integrating clinical information and identifying key mutation patterns, driver genes, and survival-associated biomarkers.

Tech Stack: R, maftools, Bioconductor, ggplot2, Survival Analysis, Cancer Genomics

Highlights:

  • End-to-end MAF file processing and summarization
  • Automated oncoplots, rainfall plots, and mutation burden visualization
  • Detection of co-occurring and mutually exclusive mutations
  • Driver gene identification using positional clustering (Oncodrive)
  • Protein domain and mutation hotspot analysis
  • Kaplan-Meier survival analysis linked to mutation status
  • Cohort comparison for differential mutation profiling
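For readers unfamiliar with the Kaplan-Meier estimator behind the survival analysis step, here is a small pure-Python sketch of the calculation. It is an illustration of the estimator only, not the R code used in the pipeline; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up duration for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Running this separately for mutated and wild-type patients gives the two curves that a mutation-status survival comparison would plot.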

Currently Working On

  1. Mutation Impact Predictor: Predicts whether a mutation is likely to disrupt viral structure.

🛠️ Technical Skills

Languages: Python, R, Bash, SQL

Bioinformatics: RNA-seq analysis, Single-cell sequencing, Differential expression, Gene annotation, Phylogenetics

Machine Learning: TensorFlow, Keras, scikit-learn, CNN architectures, Transfer learning

Data Analysis: Pandas, NumPy, DESeq2, biomaRt, mygene

Visualization: Matplotlib, Seaborn, ggplot2, ComplexHeatmap, Plotly

Tools: Git, Docker, Jupyter, RStudio, Quarto, SLURM


📫 Let's Connect

I'm always interested in collaborating on computational biology projects, discussing bioinformatics challenges, or exploring new applications of machine learning in healthcare.


"Transforming biological complexity into computational clarity"