PhD - Biostatistics and Data Science

The Doctor of Philosophy (PhD) program in Biostatistics & Data Science prepares graduates to conduct cutting-edge research, teach the next generation of biostatisticians and data scientists, and collaborate with basic research scientists, clinicians, epidemiologists, and population and public health organizations. The program combines competencies in statistics, computer science, and epidemiology, a critical set of skills for analyzing increasingly complex health-related data. It offers three areas of emphasis:

  • Biostatistics
  • Bioinformatics & Genomics
  • Data Science

Target audiences
College graduates and professionals from:

  • Mathematics
  • Statistics
  • Biology
  • Computer science
  • Other math-intensive fields (e.g., engineering) who have completed courses in calculus (through multivariable integration and differentiation) and linear algebra.

Primary Objective

Students must complete a dissertation expanding knowledge in one or more emphasis areas:

  1. Students selecting the Biostatistics track as their primary emphasis area will be expected to develop new statistical methods to accurately interpret biomedical and population health data.
  2. Students completing the Bioinformatics & Genomics track will be equipped to analyze a broad range of biological data (including genomics, transcriptomics, proteomics, metabolomics, and epigenomics) to investigate the molecular and environmental basis of human health traits and diseases.
  3. Students completing the Data Science track will be able to create systems that turn vast amounts of data into actionable evidence, which requires additional knowledge in computer science, data mining, applied mathematics, predictive analytics, and data visualization.

The doctoral course of study includes supervised consulting, internships, and the dissertation. It offers students ample opportunities to work with high-quality data and reputable researchers from two epidemiologic studies supported by the National Institutes of Health. The Jackson Heart Study (JHS) is the largest-ever single-site study of cardiovascular disease and its causes in African Americans. The Atherosclerosis Risk in Communities (ARIC) study is designed to investigate the causes of atherosclerosis and its clinical outcomes, as well as the variation in cardiovascular risk factors and disease by race, gender, and location.

Graduates of the program will be able to:

  • Efficiently collect, clean, organize, and appropriately analyze biomedical, clinical, and population health data;
  • Use standard statistical software (R, SAS, and Stata) and general-purpose programming languages (Python) to reproducibly explore and visualize data, fit models, conduct inference, and translate analysis results;
  • Conduct all facets of big data analysis, including the extraction, storage, manipulation, and analysis of massive genetic and bioinformatics datasets;
  • Convert information contained in databases and data warehouses into actionable findings using machine learning and other data science techniques;
  • Adhere to rigorous ethical and methodological standards when analyzing real-world data;
  • Collaborate with non-statisticians and communicate findings to the scientific and general community to improve health care and prevent disease;
  • Lead cutting-edge methodological, genetic epidemiological, or data science research;
  • Act as a consummate resource in the design, analysis, and interpretation of a wide array of studies.

You may complete the doctoral program in 5 years (60 credit hours) and earn a Master of Science (MS) in Biostatistics & Data Science along the way.