About

I have been an assistant professor in the Department of Biostatistics at the University of Pittsburgh since 2018. I received my Ph.D. in Biostatistics from the University of Michigan. Before that, I earned my B.A. in Mathematics and M.S. in Statistics from the University of Virginia.

My research lies at the intersection of biostatistics and machine learning, with the broad goal of advancing health data science. I am particularly interested in developing statistical methods for integrative data analysis, which combines data sets from multiple sources or knowledge of different types to achieve higher precision and power. With this in mind, my current research program focuses on methods that support regression, prediction, and decision making based on large-scale (distributed) data sets. I also develop data processing tools for analyzing high-dimensional data. Most of my work is inspired by, and closely related to, applications in bioinformatics, clinical trials, electronic health records, environmental health sciences, fairness and disparities, and health policy.

  • Data integration and meta-analysis
  • Unsupervised learning and subgroup analysis
  • High-dimensional data analysis
  • Longitudinal data analysis
  • Causal inference and precision medicine

Education

  • University of Michigan - Ph.D. in Biostatistics (2018)
  • University of Virginia - M.S. in Statistics (2013)
  • University of Virginia - B.A. in Mathematics (2012)
  • Sun Yat-sen University - Information and Computational Science (2008-2010)

Selected Publications

See my Google Scholar page for the complete list and citation metrics.
* student first author, + corresponding author

Methods
  • A tree-based model averaging approach for personalized treatment effect estimation from heterogeneous data sources
    [Link] -- Tan, X.*, Chang, C.H., Zhou, L., and Tang, L.+
    2022 -- International Conference on Machine Learning (ICML 2022)
  • Method of contraction-expansion (MOCE) for simultaneous inference in linear models
    [Link] -- Wang, F., Zhou, L., Tang, L., and Song, P.X.
    2021 -- Journal of Machine Learning Research
  • A sparse negative binomial mixture model for clustering RNA-seq count data
    [Link] -- Li, Y., Rahman, T., Ma, T., Tang, L., and Tseng, G.
    2021 -- Biostatistics
  • Post-stratification fusion learning in longitudinal data analysis
    [Link] -- Tang, L.+, and Song, P.X.
    2021 -- Biometrics
  • An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China (with discussion)
    [Link] -- Wang, L., Zhou, Y., He, J., Zhu, B., Wang, F., Tang, L., Kleinsasser, M., Barker, D., Eisenberg, M.C., and Song, P.X.
    2020 -- Journal of Data Science
  • Distributed simultaneous inference in generalized linear models via confidence distribution
    [Link] -- Tang, L., Zhou, L., and Song, P.X.
    2020 -- Journal of Multivariate Analysis
  • Fused lasso approach in regression coefficients clustering -- learning parameter heterogeneity in data integration
    [Link] -- Tang, L., and Song, P.X.
    2016 -- Journal of Machine Learning Research

Applications
  • Duration of medication treatment for opioid-use disorder and risk of overdose among Medicaid enrollees in eleven states: A retrospective cohort study
    [Link] -- Burns, M., Tang, L., Chang, C.H., Kim, J.Y., Ahrens, K., Lindsay, A., Cunningham, P., Gordon, A., Jarlenski, M.P., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Talbert, J., Zivin, K., and Donohue, J.
    2022 -- Addiction
  • Use of medications for treatment of opioid use disorder among US Medicaid enrollees in 11 states, 2014-2018
    [Link] -- Donohue, J.M., Jarlenski, M., Kim, J.Y., Tang, L., Ahrens, K., Allen, L., Austin, A., Barnes, A.J., Burns, M., Chang, C.H., Clark, S., Cole, E., Crane, D., Cunningham, P., Idala, D., Junker, S., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Pauly, N., Sheets, L., Talbert, J., Zivin, K., Gordon, A.J., and Kennedy, S.
    2021 -- Journal of the American Medical Association
  • Integrative analysis of gene-specific DNA methylation and untargeted metabolomics data from the ELEMENT cohort
    [Link] -- Goodrich, J.M., Hector, E.C., Tang, L., LaBarre, J.L., Dolinoy, D.C., Mercado-Garcia, A., Cantoral, A., Song, P.X., Téllez-Rojo, M.M., and Peterson, K.E.
    2020 -- Epigenetics Insights
  • Mitochondrial nutrient utilization underlying the association between metabolites and insulin resistance in adolescents
    [Link] -- LaBarre, J.L., Peterson, K.E., Kachman, M.T., Perng, W., Tang, L., Hao, W., Zhou, L., Karnovsky, A., Cantoral, A., Téllez-Rojo, M.M., Song, P.X., and Burant, C.F.
    2020 -- The Journal of Clinical Endocrinology & Metabolism
  • Urate and nonanoate mark the relationship between sugar-sweetened beverage intake and blood pressure in adolescent girls: A metabolomics analysis in the ELEMENT cohort
    [Link] -- Perng, W., Tang, L., Song, P.X., Goran, M., Tellez-Rojo, M.M., Cantoral, A., and Peterson, K.E.
    2019 -- Metabolites
  • Metabolomic profiles and development of metabolic risk during the pubertal transition: a prospective study in the ELEMENT project
    [Link] -- Perng, W., Tang, L., Song, P.X., Tellez-Rojo, M.M., Cantoral, A., and Peterson, K.E.
    2019 -- Pediatric Research
  • Lipid metabolism is a key mediator of developmental epigenetic programming
    [Link] -- Marchlewicz, E.H., Dolinoy, D.C., Tang, L., Milewski, S., Jones, T.R., Goodrich, J.M., Soni, T., Domino, S.E., Song, P.X., Burant, C., and Padmanabhan, V.
    2016 -- Scientific Reports

Software

  • ifedtree: tree-based federated learning for heterogeneous model harmonization
    The package harmonizes models derived from heterogeneous data sources to boost the power of the current study, without requiring individual-level data from the other sources. It also visualizes the heterogeneous association structure across studies. [GitHub]
  • metafuse: fused lasso approach for data integration
    The package detects heterogeneous effects across multiple independent datasets analyzed jointly. It visualizes covariate-specific effect subgroups via dendrograms and supports variable selection. [CRAN]
  • modac: method of divide-and-combine for penalized GLM
    Map-reduce functions in Python for fitting GLMs to large datasets stored on distributed Hadoop clusters. The method provides stable inference. [GitHub]
  • eSIR: extended SIR (Susceptible-Infectious-Removed) model
    An R package implementing an epidemiological forecast model for assessing interventions using COVID-19 data. [GitHub]
  • pgee: R implementation of penalized GEE with LASSO, SCAD and MCP [GitHub]

Students

Current Students
Past Students
  • [PhD] Peng Liu (co-advised with George Tseng)
  • [MS] Liling Lu
  • [MS] Jason N. Kennedy (co-advised with Jeanine Buchanich)
  • [MS] Zhuxuan Fu
  • [MS] Ruishen Lyu

Miscellaneous

  • Outside of work, I like to swim, run, and spend time with my family.
  • I took up woodworking during the pandemic. Check here for some of my work.

This page was last modified on: 5/28/2022