Projects
Large-Scale Genomic Analysis of Psychiatric and Substance Use Disorders
Role: Postdoctoral Associate, Yale School of Medicine
Biobank-scale genomics to inform risk stratification, disease biology, and translational research in psychiatry.
Quick summary
- Lead large-scale genomic analyses of Million Veteran Program (MVP) and UK Biobank data
- Apply GWAS, PheWAS, Mendelian Randomization, PRS, and Genomic SEM
- Build secure, reproducible HPC pipelines for real-world genomic data
- Translate complex genetic findings into clinician-friendly insights
Methods & Tools: R, Python, HPC, GWAS/PRS tools, Bayesian & ML methods
Outputs: Manuscripts (in preparation), grant contributions, collaborative research products
Deep dive
Situation
Psychiatric and substance use disorders are highly polygenic and heterogeneous, so target discovery, patient stratification, and translational interpretation all depend on large-scale, statistically robust analyses.
Task
Design and implement scalable statistical and machine-learning pipelines to characterize genetic risk architecture and behavioral correlates using biobank-scale real-world data.
Action
- Led end-to-end analyses of MVP and UK Biobank genomic and longitudinal phenotype data
- Conducted GWAS, PheWAS, Mendelian Randomization, PRS, and Genomic SEM analyses
- Integrated frequentist, Bayesian, and machine-learning approaches to assess robustness and uncertainty
- Developed secure, reproducible HPC workflows compliant with data governance and HIPAA requirements
- Collaborated with psychiatrists, geneticists, and data scientists to align analyses with translational goals
Result
- Generated population-scale insights relevant to precision psychiatry and therapeutic development
- Contributed to manuscripts, grant proposals, and collaborative initiatives
- Delivered interpretable, decision-ready summaries to clinical collaborators
Sensitivity of Bayesian Kernel Machine Regression to Data Distribution
Role: Lead Author (PhD Dissertation Research)
Improving reliability and interpretability of Bayesian mixture models used in biomedical research.
Quick summary
- Evaluated robustness of Bayesian kernel machine regression (BKMR) across realistic data-generating scenarios
- Identified limitations of fixed posterior inclusion probability (PIP) thresholds
- Proposed adaptive, empirically grounded thresholding strategies
Methods & Tools: Bayesian modeling, large-scale simulation, R
Outputs: Peer-reviewed journal article, arXiv preprint, conference presentations
Deep dive
Situation
BKMR is widely used for high-dimensional exposure mixture analysis, yet common defaults are applied without validation across realistic data distributions.
Task
Assess sensitivity of BKMR inference to data-generating assumptions and provide practical guidance to reduce misleading results.
Action
- Designed large-scale simulation studies varying correlation, sparsity, effect size, and distributional form
- Quantified instability in effect estimates and variable selection under misspecification
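- Demonstrated that the conventional variable-selection cutoff, posterior inclusion probability (PIP) ≥ 0.5, does not perform reliably across data distributions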
- Proposed adaptive thresholding based on empirical performance
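The idea behind adaptive thresholding can be illustrated with a short sketch. This is not the published procedure. It assumes a simulation setting where the truly active exposures are known, and it picks the PIP cutoff that maximizes the F1 score for variable selection. The function names, the candidate grid, and the toy PIP values are all hypothetical.

```python
def f1_at_threshold(pips, truth, threshold):
    """F1 score when exposures with PIP >= threshold are declared selected."""
    tp = sum(1 for p, t in zip(pips, truth) if p >= threshold and t)
    fp = sum(1 for p, t in zip(pips, truth) if p >= threshold and not t)
    fn = sum(1 for p, t in zip(pips, truth) if p < threshold and t)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_threshold(pips, truth, grid=None):
    """Return the grid cutoff with the highest F1 (ties go to the larger cutoff)."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(grid, key=lambda c: (f1_at_threshold(pips, truth, c), c))


# Toy PIPs pooled from simulated fits (hypothetical values, not study results).
toy_pips = [0.92, 0.81, 0.47, 0.38, 0.12, 0.33, 0.05, 0.41]
toy_truth = [True, True, True, True, False, False, False, False]

best = choose_threshold(toy_pips, toy_truth)
print("empirical cutoff:", best)
print("F1 at 0.5:", round(f1_at_threshold(toy_pips, toy_truth, 0.5), 3))
print("F1 at empirical cutoff:", round(f1_at_threshold(toy_pips, toy_truth, best), 3))
```

In this toy example the fixed 0.5 cutoff misses two weakly detected but truly active exposures, while the empirically chosen cutoff recovers them at the cost of one false positive. Other criteria, such as a target false discovery rate, could replace F1 in the same structure.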
Result
- Published in Journal of Statistical Computation and Simulation
- Influenced best practices for applied Bayesian mixture modeling
- Reduced risk of false discoveries in environmental and clinical studies
simBKMRdata: Statistical Software for BKMR Evaluation
Role: Lead Developer
Open-source software enabling reproducible simulation studies for Bayesian mixture modeling.
Quick summary
- Developed an R package, published on CRAN, for simulating realistic exposure-mixture data
- Enabled reproducible benchmarking of BKMR and related methods
Methods & Tools: R, package development, reproducible research workflows
Outputs: CRAN R package, software citation
Deep dive
Situation
Researchers lacked standardized, reproducible tools to evaluate BKMR under realistic exposure scenarios.
Task
Develop open-source software to support transparent and reproducible method evaluation.
Action
- Designed modular simulation functions for correlated mixtures and nonlinear effects
- Implemented documentation, examples, and reproducible workflows
- Released and maintained the package on CRAN
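As a rough illustration of what simulating a correlated mixture with a nonlinear effect involves, the sketch below generates equicorrelated exposures through a shared latent factor and an outcome that depends nonlinearly on the first two exposures. It is written in Python for illustration only and does not reproduce the simBKMRdata API. The function names, the exposure-response form, and all default parameter values are assumptions.

```python
import math
import random


def simulate_mixture(n, n_exposures=3, rho=0.6, noise_sd=0.5, seed=0):
    """Simulate (exposures, outcome) pairs for a hypothetical exposure mixture.

    Each exposure is standard normal and every pair has correlation rho.
    The outcome depends nonlinearly on the first two exposures only.
    """
    if n_exposures < 2:
        raise ValueError("need at least two exposures")
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # latent factor that induces the correlation
        x = [
            math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
            for _ in range(n_exposures)
        ]
        signal = math.sin(x[0]) + 0.5 * x[1] ** 2  # assumed nonlinear effect
        y = signal + rng.gauss(0, noise_sd)
        rows.append((x, y))
    return rows


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


data = simulate_mixture(2000, rho=0.6, seed=42)
x0 = [x[0] for x, _ in data]
x1 = [x[1] for x, _ in data]
print("sample correlation between exposures 1 and 2:", round(pearson(x0, x1), 2))
```

Fixing the seed makes each simulated dataset reproducible, which is what allows different methods to be benchmarked on identical replicates.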
Result
- Lowered barriers to rigorous method validation
- Supported translation of methodological research into applied biomedical studies
CTN-0094 Data Analysis Dashboard – Research on Opioid Use Disorder
Role: Data Scientist / Biostatistician
Interactive analytics and machine learning to support personalized treatment strategies for opioid use disorder.
Quick summary
- Developed machine learning models to predict recovery outcomes in opioid use disorder (OUD)
- Harmonized and analyzed multi-trial clinical data across heterogeneous study designs
- Built an interactive R Shiny dashboard to explore patient characteristics and treatment outcomes
Methods & Tools: R, machine learning, R Shiny, data harmonization, clinical trial analytics
Outputs: Interactive Shiny dashboard, predictive modeling framework, analytical reports
Deep dive
Situation
Opioid use disorder (OUD) is a chronic brain disease affecting more than 2 million people in the United States. Clinical trials generate rich data, but heterogeneity across studies limits integrated analysis and personalized treatment insights.
Task
Develop a unified analytic framework and interactive dashboard to predict recovery outcomes and support data-driven, personalized treatment strategies for OUD.
Action
- Analyzed and harmonized clinical trial data from CTN-0027 (n = 1,269), CTN-0030 (n = 653), and CTN-0051 (n = 570)
- Standardized variables across trials to enable consistent modeling of patient characteristics and treatment outcomes
- Developed machine learning models to predict recovery trajectories and treatment response
- Built an interactive R Shiny application to allow researchers to explore outcomes, covariate patterns, and model results dynamically
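The variable standardization step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline. The per-trial column names, the sex codings, and the treatment labels are hypothetical and do not reflect the real CTN data dictionaries. The sketch only shows the general pattern of mapping each trial's variables onto a shared schema before pooling.

```python
# Hypothetical source column names for each trial, mapped to a shared schema.
COLUMN_MAPS = {
    "CTN-0027": {"AGE": "age", "SEX": "sex", "TRT": "treatment"},
    "CTN-0030": {"age_yrs": "age", "gender": "sex", "arm": "treatment"},
    "CTN-0051": {"Age": "age", "Sex": "sex", "Treatment": "treatment"},
}

# Hypothetical codings collapsed to one vocabulary.
SEX_CODES = {
    "m": "male", "male": "male", "1": "male",
    "f": "female", "female": "female", "2": "female",
}


def harmonize_record(trial, record):
    """Rename one record's fields to the shared schema and recode values."""
    out = {"trial": trial}
    for source_name, shared_name in COLUMN_MAPS[trial].items():
        out[shared_name] = record.get(source_name)
    if out["sex"] is not None:
        # Unrecognized codes become None rather than being guessed.
        out["sex"] = SEX_CODES.get(str(out["sex"]).strip().lower())
    if out["age"] is not None:
        out["age"] = float(out["age"])
    return out


def harmonize(trials):
    """Pool records from all trials into one list with a trial identifier."""
    return [
        harmonize_record(trial, record)
        for trial, records in trials.items()
        for record in records
    ]


pooled = harmonize({
    "CTN-0027": [{"AGE": 40, "SEX": 1, "TRT": "arm_a"}],
    "CTN-0030": [{"age_yrs": "34", "gender": "F", "arm": "arm_b"}],
})
for row in pooled:
    print(row)
```

Keeping the trial identifier on every pooled record lets downstream models adjust for, or stratify by, the source study.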
Result
- Enabled integrated, cross-trial analysis of OUD treatment outcomes at scale
- Provided an interactive decision-support tool for researchers to explore recovery predictors
- Demonstrated the value of combining machine learning with clinical trial data to inform personalized treatment approaches
Statistical Modeling & Simulation for Drug Development
Role: Computational Modeling & Simulation Intern, Johnson & Johnson
Applying quantitative modeling to support clinical and development decisions.
Quick summary
- Supported modeling and simulation across preclinical and clinical stages
- Contributed to trial design and biomarker evaluation efforts
Methods & Tools: R, Python, modeling & simulation frameworks
Outputs: Internal analytical tools, technical reports
Deep dive
Situation
Drug development decisions require integrating evidence across stages under uncertainty.
Task
Support modeling and simulation workflows that inform trial design and development strategy.
Action
- Developed computational tools supporting modeling workflows
- Assisted in analysis of clinical trial and pharmacokinetic data
- Conducted targeted literature reviews on oncology biomarkers
- Collaborated with quantitative and clinical scientists
Result
- Contributed to internal analyses supporting development decisions
- Strengthened communication between quantitative and clinical teams
Bayesian Analysis of Heavy Metal Mixtures and Cardiovascular Risk
Role: Statistical Contributor / Co-Author
Quantifying joint environmental exposure effects using Bayesian mixture models.
Quick summary
- Modeled correlated heavy metal exposures using BKMR
- Assessed joint effects on heart attack risk with uncertainty-aware inference
Methods & Tools: Bayesian modeling, BKMR, R
Outputs: Research Square preprint, conference presentations
Deep dive
Situation
Single-exposure models fail to capture combined effects of correlated environmental toxicants.
Task
Quantify joint effects of heavy metal mixtures on cardiovascular risk using Bayesian methods.
Action
- Applied BKMR to blood and urine heavy metal biomarker data
- Conducted sensitivity analyses and diagnostic checks
- Collaborated with epidemiologists and clinicians on interpretation
Result
- Advanced understanding of environmental cardiovascular risk
- Demonstrated applied utility of Bayesian mixture models
Mortality Modeling Under Stochastic Frailty
Role: Lead Author (MS Thesis)
Modeling unobserved heterogeneity in survival and mortality data.
Quick summary
- Developed stochastic frailty survival models
- Improved representation of latent mortality risk
Methods & Tools: Survival analysis, stochastic processes, R
Outputs: Peer-reviewed journal article
Deep dive
Situation
Standard survival models underestimate risk variation when unobserved heterogeneity exists.
Task
Develop and evaluate stochastic frailty models for mortality data.
Action
- Formulated stochastic frailty survival models
- Studied theoretical properties under varying assumptions
- Applied models to real mortality datasets
Result
- Published in Missouri Journal of Mathematical Sciences
- Established strong foundations in theory-driven modeling
COVID-19 Lockdown Policy: Economy and Mental Health Trade-offs
Role: Co-Author / Applied Analyst
Quantifying policy trade-offs during a global public health crisis.
Quick summary
- Evaluated economic and mental health impacts of lockdown policies
- Supported evidence-based policy discussion
Methods & Tools: Statistical modeling, applied data analysis
Outputs: Peer-reviewed journal article
Deep dive
Situation
COVID-19 lockdowns created competing economic and mental health pressures.
Task
Quantify trade-offs between economic outcomes and mental health indicators.
Action
- Analyzed population-level economic and mental health data
- Applied statistical models to compare policy scenarios
Result
- Published in Journal of Biomedical Analytics
- Informed data-driven public health policy discussions
Data Visualization for Climate & Public Health Risk
Role: Research Assistant, University of Dhaka
Dashboards and visual analytics for communicating climate-related health risks.
Quick summary
- Built dashboards and visual analytics for public health impact studies
- Designed outputs for non-technical stakeholders
Methods & Tools: R, data visualization, dashboard development
Outputs: Interactive dashboards, analytical reports
Deep dive
Situation
Public health stakeholders needed clear, interpretable views of climate-related health risks.
Task
Translate complex data into accessible visual summaries.
Action
- Developed dashboards and visual analytics
- Supported climate–health risk assessments with statistical input
- Collaborated with interdisciplinary teams
Result
- Improved accessibility of public health findings
- Supported evidence-based communication and reporting