PhD - Biostatistics and Data Science

The Doctor of Philosophy (PhD) program in Biostatistics & Data Science will prepare graduates to conduct cutting-edge research, teach the next generation of biostatisticians and data scientists, and collaborate with basic research scientists, clinicians, epidemiologists, and population and public health organizations. The program integrates competencies in statistics, computer science, and epidemiology, a critical combination of skills for analyzing increasingly complex health-related data. The target audience includes college graduates and professionals from mathematics, statistics, biology, computer science, or other math-intensive fields (e.g., engineering) who have completed courses in calculus (through multivariable integration and differentiation) and linear algebra. Enrolled students will be able to complete the doctoral program in 5 years, earning a total of 60 credit hours and a master of science (MS) in Biostatistics & Data Science along the way.

The primary objective of the program is to educate students in statistical theory, practical data analysis, big data management and manipulation, and communication with the scientific and general community. These are vital competencies that all biostatisticians and data scientists must master in order to support basic science, clinical, and population health studies and to conduct independent methodological or epidemiological research.

The doctoral program offers three areas of emphasis: 1) Biostatistics; 2) Bioinformatics & Genomics; and 3) Data Science. Students must demonstrate basic competencies in all three emphasis areas through coursework and qualifying exams. Students must additionally complete a dissertation expanding knowledge in one or more emphasis areas. Students selecting the Biostatistics track as their primary emphasis area will be expected to develop new statistical methods to accurately interpret biomedical and population health data. Students completing the Bioinformatics & Genomics track will be equipped to analyze a broad range of biological data (including genomics, transcriptomics, proteomics, metabolomics, and epigenomics) to investigate the molecular and environmental basis of human health traits and diseases. Students completing the Data Science track will be able to create systems that turn vast amounts of data into actionable evidence, which requires additional knowledge of computer science, data mining, applied mathematics, predictive analytics, and data visualization.

The doctoral course of study includes supervised consulting, internships, and the aforementioned dissertation, offering students ample opportunities to work with high-quality data and reputable researchers from two epidemiologic studies supported by the National Institutes of Health. The Jackson Heart Study (JHS) is the largest ever single-site study of cardiovascular disease and its causes in African-Americans. The Atherosclerosis Risk in Communities study (ARIC) is designed to investigate the causes of atherosclerosis and its clinical outcomes, as well as the variation in cardiovascular risk factors and disease by race, gender, and location.

Program goals

Graduates of the program will be able to:
  • Efficiently collect, clean, organize, and appropriately analyze biomedical, clinical, and population health data;
  • Use standard statistical (R, SAS, and Stata) and computer (Python) programming languages to reproducibly explore and visualize data, fit models, conduct inference, and translate analysis results;
  • Conduct all facets of big data analysis, including the extraction, storage, manipulation, and analysis of massive genetic and bioinformatics datasets;
  • Convert information contained in databases and data warehouses into actionable findings using machine learning and other data science techniques;
  • Adhere to rigorous ethical and methodological standards when analyzing real-world data;
  • Collaborate with non-statisticians and communicate findings to the scientific and general community to improve health care and prevent disease;
  • Lead cutting-edge methodological, genetic epidemiological, or data science research;
  • Act as a consummate resource in the design, analysis, and interpretation of a wide array of studies.

Admission criteria

Admission into the Bower School of Population Health is based upon a baccalaureate degree in a relevant scientific discipline, past academic performance in undergraduate and (if applicable) graduate degree programs (a grade point average ≥ 3.0 on a 4.0 scale is preferred), official scores on the Graduate Record Examination (GRE; a combined score ≥ 300 on the verbal and quantitative sections is preferred), three letters of recommendation, and a personal statement. Applicants whose native language is not English and/or who completed their tertiary education primarily outside the United States must demonstrate proficiency in written and spoken English through the Test of English as a Foreign Language (TOEFL), International English Language Testing System (IELTS), or Pearson Test of English-Academic (PTE-A). This requirement may be waived for students who are currently enrolled at a college or university in the United States and/or who demonstrate proficiency in written and spoken English during a personal interview.

Students matriculating into the doctoral program in Biostatistics & Data Science must have documented training in calculus (covering multiple integration and differentiation) and linear algebra. Additional training in statistical or computer programming languages is preferred. Applicants may submit code demonstrating their knowledge of a statistical or computer programming language and/or slides presenting a completed data analysis project. These materials are optional but may strengthen the overall application.