Projects

Large-Scale Genomic Analysis of Psychiatric and Substance Use Disorders

Role: Postdoctoral Associate, Yale School of Medicine

Biobank-scale genomics to inform risk stratification, disease biology, and translational research in psychiatry.

Quick summary

Lead large-scale genomic analyses using Million Veteran Program (MVP) and UK Biobank data
Apply GWAS, PheWAS, Mendelian randomization, polygenic risk scores (PRS), and Genomic SEM
Build secure, reproducible HPC pipelines for real-world genomic data
Translate complex genetic findings into clinician-friendly insights

Methods & Tools: R, Python, HPC, GWAS/PRS tools, Bayesian & ML methods
Outputs: Manuscripts (in preparation), grant contributions, collaborative research products

Deep dive

Situation
Psychiatric and substance use disorders are highly polygenic and heterogeneous, so target discovery, patient stratification, and translational interpretation depend on large-scale, statistically robust analyses.

Task
Design and implement scalable statistical and machine-learning pipelines to characterize genetic risk architecture and behavioral correlates using biobank-scale real-world data.

Action
- Led end-to-end analyses of MVP and UK Biobank genomic and longitudinal phenotype data
- Conducted GWAS, PheWAS, Mendelian randomization, polygenic risk score (PRS), and Genomic SEM analyses
- Integrated frequentist, Bayesian, and machine-learning approaches to assess robustness and uncertainty
- Developed secure, reproducible HPC workflows compliant with data governance and HIPAA requirements
- Collaborated with psychiatrists, geneticists, and data scientists to align analyses with translational goals
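
For readers unfamiliar with PRS, the core calculation is a weighted sum of effect-allele dosages. The sketch below is illustrative only and is not the pipeline used in this project: the function names, variant IDs, weights, and dosages are all hypothetical toy values, and production pipelines would normally rely on dedicated GWAS/PRS tooling.

```python
from typing import Dict, List


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs."""
    shared = sorted(dosages.keys() & weights.keys())  # sorted for a deterministic sum order
    return sum(weights[v] * dosages[v] for v in shared)


def standardize(scores: List[float]) -> List[float]:
    """Center and scale scores to mean 0 and SD 1 (population SD)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mean) / sd for s in scores]


# Hypothetical per-variant effect sizes (e.g. log odds ratios from a GWAS).
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}

# Hypothetical effect-allele dosages (0, 1, or 2 copies) for three people.
cohort = {
    "person_a": {"rs1": 2, "rs2": 0, "rs3": 1},
    "person_b": {"rs1": 0, "rs2": 2, "rs3": 0},
    "person_c": {"rs1": 1, "rs2": 1, "rs3": 2},
}

raw = {pid: polygenic_risk_score(d, weights) for pid, d in cohort.items()}
standardized = standardize(list(raw.values()))
```

Standardizing within the cohort is one common way to report scores in SD units; it is a design choice of this sketch, not a statement about how scores were reported in the analyses above.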

Result
- Generated population-scale insights relevant to precision psychiatry and therapeutic development
- Contributed to manuscripts, grant proposals, and collaborative initiatives
- Delivered interpretable, decision-ready summaries to clinical collaborators

Sensitivity of Bayesian Kernel Machine Regression to Data Distribution

Role: Lead Author (PhD Dissertation Research)

Improving reliability and interpretability of Bayesian exposure-mixture models used in biomedical research.

Quick summary

Evaluated robustness of BKMR across realistic data-generating scenarios
Identified limitations of fixed posterior inclusion probability thresholds
Proposed adaptive, empirically grounded thresholding strategies

Methods & Tools: Bayesian modeling, large-scale simulation, R
Outputs: Peer-reviewed journal article, arXiv preprint, conference presentations

Deep dive

Situation
BKMR is widely used for high-dimensional exposure-mixture analysis, yet common defaults, such as a fixed posterior inclusion probability cutoff, are often applied without validation across realistic data distributions.

Task
Assess sensitivity of BKMR inference to data-generating assumptions and provide practical guidance to reduce misleading results.

Action
- Designed large-scale simulation studies varying correlation, sparsity, effect size, and distributional form
- Quantified instability in effect estimates and variable selection under misspecification
- Demonstrated that a fixed posterior inclusion probability (PIP) threshold of 0.5 is not universally valid
- Proposed adaptive thresholding based on empirical performance
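
One way to picture an empirically grounded threshold is sketched below. It assumes that PIPs for exposures with no true effect are available from simulation, and it sets the cutoff at an upper quantile of those null PIPs. This is a simplified illustration, not the procedure from the published article: the function names, the 5% false-inclusion target, and every PIP value are hypothetical.

```python
from typing import Dict, List


def empirical_quantile(values: List[float], q: float) -> float:
    """Quantile of `values` at level q, using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def adaptive_pip_threshold(null_pips: List[float], false_inclusion_rate: float = 0.05) -> float:
    """Cutoff that null exposures exceed at roughly `false_inclusion_rate`."""
    return empirical_quantile(null_pips, 1 - false_inclusion_rate)


def select_exposures(pips: Dict[str, float], threshold: float) -> List[str]:
    """Names of exposures whose PIP meets or exceeds the threshold."""
    return sorted(name for name, p in pips.items() if p >= threshold)


# Hypothetical PIPs for null exposures, standing in for simulation output.
null_pips = [0.02 * k for k in range(41)]  # evenly spaced from 0.00 to 0.80

# Hypothetical PIPs from a fitted model on real data.
observed_pips = {
    "exposure_1": 0.91,
    "exposure_2": 0.62,
    "exposure_3": 0.80,
    "exposure_4": 0.30,
}

threshold = adaptive_pip_threshold(null_pips, false_inclusion_rate=0.05)
fixed_selection = select_exposures(observed_pips, 0.5)
adaptive_selection = select_exposures(observed_pips, threshold)
```

In this toy setup the null PIPs frequently exceed 0.5, so the fixed cutoff keeps an exposure that the simulation-calibrated cutoff drops.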

Result
- Published in Journal of Statistical Computation and Simulation
- Influenced best practices for applied Bayesian mixture modeling
- Reduced risk of false discoveries in environmental and clinical studies

simBKMRdata: Statistical Software for BKMR Evaluation

Role: Lead Developer

Open-source software enabling reproducible simulation studies for Bayesian mixture modeling.

Quick summary

Developed an R package, published on CRAN, for simulating realistic exposure-mixture data
Enabled reproducible benchmarking of BKMR and related methods

Methods & Tools: R, package development, reproducible research workflows
Outputs: CRAN R package, software citation

Deep dive

Situation
Researchers lacked standardized, reproducible tools to evaluate BKMR under realistic exposure scenarios.

Task
Develop open-source software to support transparent and reproducible method evaluation.

Action
- Designed modular simulation functions for correlated mixtures and nonlinear effects
- Implemented documentation, examples, and reproducible workflows
- Released and maintained the package on CRAN
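
The general idea of simulating a correlated mixture with a nonlinear effect can be sketched as follows. This is written in Python for illustration and does not reproduce the simBKMRdata API: the function names, the shared-factor construction, and the specific outcome function are assumptions made for this example.

```python
import math
import random
from typing import List, Tuple


def simulate_mixture(
    n: int, p: int, rho: float, noise_sd: float = 1.0, seed: int = 0
) -> Tuple[List[List[float]], List[float]]:
    """Simulate n rows of p equicorrelated standard-normal exposures and an outcome.

    Each exposure is sqrt(rho) * shared + sqrt(1 - rho) * own, so every exposure
    has unit variance and every pair has correlation rho.
    The outcome depends nonlinearly on the first two exposures only (assumed form).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    if p < 2:
        raise ValueError("need at least two exposures")
    rng = random.Random(seed)
    exposures, outcome = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        row = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ]
        signal = math.sin(row[0]) + 0.5 * row[1] ** 2  # nonlinear exposure-response
        exposures.append(row)
        outcome.append(signal + rng.gauss(0.0, noise_sd))
    return exposures, outcome


def pearson(a: List[float], b: List[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Usage: 2,000 observations, 4 exposures, target pairwise correlation 0.6.
X, y = simulate_mixture(n=2000, p=4, rho=0.6, seed=1)
r01 = pearson([row[0] for row in X], [row[1] for row in X])
```

Fixing the seed makes each simulated dataset reproducible, which is what allows benchmarking runs to be compared across methods.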

Result
- Lowered barriers to rigorous method validation
- Supported translation of methodological research into applied biomedical studies

CTN-0094 Data Analysis Dashboard – Research on Opioid Use Disorder

Role: Data Scientist / Biostatistician

Interactive analytics and machine learning to support personalized treatment strategies for opioid use disorder.

Quick summary

Developed machine learning models to predict recovery outcomes in opioid use disorder (OUD)
Harmonized and analyzed multi-trial clinical data across heterogeneous study designs
Built an interactive R Shiny dashboard to explore patient characteristics and treatment outcomes

Methods & Tools: R, machine learning, R Shiny, data harmonization, clinical trial analytics
Outputs: Interactive Shiny dashboard, predictive modeling framework, analytical reports

Deep dive

Situation
Opioid use disorder (OUD) is a chronic brain disease affecting over 2 million people in the United States. Clinical trials generate rich data, but heterogeneity across studies limits integrated analysis and personalized treatment insights.

Task
Develop a unified analytic framework and interactive dashboard to predict recovery outcomes and support data-driven, personalized treatment strategies for OUD.

Action
- Analyzed and harmonized clinical trial data from CTN-0027 (n = 1,269), CTN-0030 (n = 653), and CTN-0051 (n = 570)
- Standardized variables across trials to enable consistent modeling of patient characteristics and treatment outcomes
- Developed machine learning models to predict recovery trajectories and treatment response
- Built an interactive R Shiny application that lets researchers explore outcomes, covariate patterns, and model results dynamically

Result
- Enabled integrated, cross-trial analysis of OUD treatment outcomes at scale
- Provided an interactive decision-support tool for researchers to explore recovery predictors
- Demonstrated the value of combining machine learning with clinical trial data to inform personalized treatment approaches

Statistical Modeling & Simulation for Drug Development

Role: Computational Modeling & Simulation Intern, Johnson & Johnson

Applying quantitative modeling to support clinical and development decisions.

Quick summary

Supported modeling and simulation across preclinical and clinical stages
Contributed to trial design and biomarker evaluation efforts

Methods & Tools: R, Python, modeling & simulation frameworks
Outputs: Internal analytical tools, technical reports

Deep dive

Situation
Drug development decisions require integrating evidence across stages under uncertainty.

Task
Support modeling and simulation workflows that inform trial design and development strategy.

Action
- Developed computational tools supporting modeling workflows
- Assisted in analysis of clinical trial and pharmacokinetic data
- Conducted targeted literature reviews on oncology biomarkers
- Collaborated with quantitative and clinical scientists

Result
- Contributed to internal analyses supporting development decisions
- Strengthened cross-functional communication and translation

Bayesian Analysis of Heavy Metal Mixtures and Cardiovascular Risk

Role: Statistical Contributor / Co-Author

Quantifying joint environmental exposure effects using Bayesian mixture models.

Quick summary

Modeled correlated heavy metal exposures using BKMR
Assessed joint effects on heart attack risk with uncertainty-aware inference

Methods & Tools: Bayesian modeling, BKMR, R
Outputs: Research Square preprint, conference presentations

Deep dive

Situation
Single-exposure models fail to capture combined effects of correlated environmental toxicants.

Task
Quantify joint effects of heavy metal mixtures on cardiovascular risk using Bayesian methods.

Action
- Applied BKMR to blood and urine heavy metal biomarker data
- Conducted sensitivity analyses and diagnostic checks
- Collaborated with epidemiologists and clinicians on interpretation

Result
- Advanced understanding of environmental cardiovascular risk
- Demonstrated applied utility of Bayesian mixture models

Mortality Modeling Under Stochastic Frailty

Role: Lead Author (MS Thesis)

Modeling unobserved heterogeneity in survival and mortality data.

Quick summary

Developed stochastic frailty survival models
Improved representation of latent mortality risk

Methods & Tools: Survival analysis, stochastic processes, R
Outputs: Peer-reviewed journal article

Deep dive

Situation
Standard survival models underestimate risk variation when unobserved heterogeneity exists.

Task
Develop and evaluate stochastic frailty models for mortality data.

Action
- Formulated stochastic frailty survival models
- Studied theoretical properties under varying assumptions
- Applied models to real mortality datasets

Result
- Published in Missouri Journal of Mathematical Sciences
- Established strong foundations in theory-driven modeling

COVID-19 Lockdown Policy: Economy and Mental Health Trade-offs

Role: Co-Author / Applied Analyst

Quantifying policy trade-offs during a global public health crisis.

Quick summary

Evaluated economic and mental health impacts of lockdown policies
Supported evidence-based policy discussion

Methods & Tools: Statistical modeling, applied data analysis
Outputs: Peer-reviewed journal article

Deep dive

Situation
COVID-19 lockdowns created competing economic and mental health pressures.

Task
Quantify trade-offs between economic outcomes and mental health indicators.

Action
- Analyzed population-level economic and mental health data
- Applied statistical models to compare policy scenarios

Result
- Published in Journal of Biomedical Analytics
- Informed data-driven public health policy discussions

Data Visualization for Climate & Public Health Risk

Role: Research Assistant, University of Dhaka

Dashboards and visual analytics for communicating climate-related health risks.

Quick summary

Built dashboards and visual analytics for public health impact studies
Designed outputs for non-technical stakeholders

Methods & Tools: R, data visualization, dashboard development
Outputs: Interactive dashboards, analytical reports

Deep dive

Situation
Public health stakeholders needed clear, interpretable views of climate-related health risks.

Task
Translate complex data into accessible visual summaries.

Action
- Developed dashboards and visual analytics
- Supported climate–health risk assessments with statistical input
- Collaborated with interdisciplinary teams

Result
- Improved accessibility of public health findings
- Supported evidence-based communication and reporting