Doctor of Philosophy in Biostatistics & Data Science

The Doctor of Philosophy (PhD) program in Biostatistics & Data Science prepares graduates to conduct cutting-edge research, teach the next generation of biostatisticians and data scientists, and collaborate with basic research scientists, clinicians, epidemiologists, and population and public health organizations. The program integrates competencies in statistics, computer science, and epidemiology, a critical combination of skills for analyzing increasingly complex health-related data. The target audience includes college graduates and professionals from mathematics, statistics, biology, computer science, or other math-intensive fields (e.g., engineering) who have completed courses in calculus (through multivariable integration and differentiation) and linear algebra. Enrolled students will be able to complete the doctoral program in five years, earning a total of 60 credit hours and a Master of Science (MS) in Biostatistics & Data Science along the way.

The primary objective of the program is to educate students in statistical theory, practical data analysis, big data management and manipulation, and communication with the scientific and general community. These are crucial skills that biostatisticians and data scientists use daily in their service and research roles in academic, government, industrial, population health, or other health-related organizations. The doctoral program offers three areas of emphasis: 1) Biostatistics; 2) Bioinformatics & Genomics; and 3) Data Science. Students must demonstrate basic competencies in all three emphasis areas through coursework and qualifying exams. In addition, students must complete a dissertation that expands knowledge in one or more emphasis areas. Students selecting the Biostatistics track as their primary emphasis area will be expected to develop new statistical methods to accurately interpret biomedical and population health data. Students completing the Bioinformatics & Genomics track will be equipped to analyze a broad range of biological data (including genomics, transcriptomics, proteomics, metabolomics, and epigenomics) to investigate the molecular and environmental basis of human health traits and diseases. Students completing the Data Science track will be able to create systems that turn vast amounts of data into actionable evidence, which requires additional knowledge in computer science, data mining, applied mathematics, predictive analytics, and data visualization.

The doctoral course of study includes supervised consulting, internships, and the dissertation, offering students ample opportunities to work with high-quality data and reputable researchers from two epidemiologic studies supported by the National Institutes of Health. The Jackson Heart Study (JHS) is the largest single-site study ever conducted of cardiovascular disease and its causes in African-Americans. The Atherosclerosis Risk in Communities study (ARIC) is designed to investigate the causes of atherosclerosis and its clinical outcomes, as well as the variation in cardiovascular risk factors and disease by race, gender, and location.

Program goals

Graduates of the program will be able to:

  • Efficiently collect, clean, organize, and appropriately analyze biomedical, clinical, and population health data;
  • Use standard statistical (R, SAS, and Stata) and computer (Python) programming languages to reproducibly explore and visualize data, fit models, conduct inference, and translate analysis results;
  • Conduct all facets of big data analysis, including the extraction, storage, manipulation, and analysis of massive genetic and bioinformatics datasets;
  • Convert information contained in databases and data warehouses into actionable findings using machine learning and other data science techniques;
  • Adhere to rigorous ethical and methodological standards when analyzing real-world data;
  • Collaborate with non-statisticians and communicate findings to the scientific and general community to improve health care and prevent disease.

Admission criteria

Admission into the Bower School of Population Health is based upon a baccalaureate degree in a relevant scientific discipline, past academic performance in undergraduate and graduate (if applicable) degree programs (a grade point average ≥ 3.0 on a 4.0 scale is preferred), official scores on the Graduate Record Examination (GRE; a combined score ≥ 300 on the verbal and quantitative sections is preferred), three letters of recommendation, and a personal statement. Applicants whose native language is not English and/or who have completed their tertiary education primarily outside of the USA must demonstrate proficiency in written and spoken English through the Test of English as a Foreign Language (TOEFL), International English Language Testing System (IELTS), or Pearson Test of English-Academic (PTE-A). This requirement may be waived for students who are currently enrolled at a college or university in the United States and/or who demonstrate proficiency in written and spoken English in a personal interview.

Students matriculating into the doctoral program in Biostatistics & Data Science must have documented training in calculus (covering multiple integration and differentiation) and linear algebra. Additional training in statistical or computer programming languages is preferred. Applicants may submit code demonstrating their knowledge of a statistical or computer programming language and/or slides presenting a completed data analysis project. These materials are optional but may strengthen the overall application.

Plan of study

Course    Title                                    Credit hours

Year 1 - Fall
BDS 721   Analytics                                3
BDS 741   Statistical Inference I                  3
BDS XXX   Principles of Programming                3

Year 1 - Spring
BDS 722   Advanced Analytics                       3
BDS 723   Statistical Computation                  3
BDS 751   Statistical Inference in Genetics        3

Year 2 - Summer
BDS 797   Biostatistics & Data Science Internship  1

Year 2 - Fall
BDS 725   Survival Analysis                        3
BDS 765   Advanced Machine Learning                3
BDS 792   Statistical Consulting                   1
MSCI 710  Epidemiology I                           3

Year 2 - Spring
BDS 724   Longitudinal and Multilevel Models       3
BDS 761   Data Science                             3
BDS XXX   Study Design                             3
ID 709    Responsible Conduct of Research          1

Year 3 - Summer
BDS 797   Biostatistics & Data Science Internship  1

Year 3 - Fall
BDS 798   Dissertation Research                    1
PHS 700   Essentials of Population Health Science  3

Year 3 - Spring
BDS 798   Dissertation Research                    1
ID 714    Professional Skills                      3

Year 4 - Summer
BDS 797   Biostatistics & Data Science Internship  1

Year 4 - Fall
BDS 798   Dissertation Research                    1

Year 4 - Spring
BDS 798   Dissertation Research                    1

Year 5 - Summer
BDS 797   Biostatistics & Data Science Internship  1

Year 5 - Fall
BDS 798   Dissertation Research                    1

Year 5 - Spring
BDS 798   Dissertation Research                    1

*Elective courses will be chosen from those offered by the Department of Data Science or other UMMC graduate departments, upon approval of the program director.

Course descriptions

  • BDS 711. Statistical Methods in Research. Provides an introduction to selected important topics in statistical concepts and reasoning. This course serves as an introduction to the field and provides a survey of data types and analysis techniques. Specific topics include applications of statistical techniques such as point and interval estimation, hypothesis testing (tests of significance), correlation and regression, relative risks and odds ratios, sample size/power calculations, and study designs. While the course emphasizes interpretation and concepts, it also includes formulae and computational elements so that, upon completion, class participants will have gained real-world applied skills. Traditional Lecture (3 credit hours)
  • BDS 713. Intro to Data Management and Programming. Provides an introduction to programming and data management. The course will focus on planning and organizing programs to handle and process data, as well as the grammar of particular programming languages. Traditional Lecture (3 credit hours)
  • BDS 714. Statistical Methods for Clinical Trials. Provides a basic understanding of the statistical concepts important in the design, conduct and analysis of clinical trials. Traditional Lecture (3 credit hours)
  • BDS 715. Intro to Sample Survey Analyses. Provides an introduction to statistical concepts in the design and analyses of sample surveys. Covers topics such as instrument design, sampling procedures, variance estimation, reliability, validity, scaling and scoring, complex samples and weighting procedures. Traditional Lecture (3 credit hours)
  • BDS 721. Analytics. Provides an introduction to basic statistical and data analytic methods. This course covers topics such as data archetypes; exploratory data analysis; basic statistical paradigms including frequentist, likelihood and Bayesian approaches; contingency tables, sampling distributions, the Central Limit Theorem, point and interval estimation, sufficiency, tests of statistical significance including large sample, likelihood ratio and resampling approaches, basic random variable linear combinations, ANOVA, correlation, linear regression, logistic regression and Poisson regression. Course content will be delivered through lectures, hands-on lab instruction and team-based learning using statistical packages including R, SAS and Stata. Traditional Lecture (3 credit hours)
  • BDS 722. Advanced Analytics. Continues the introduction to statistical analysis for biomedical research with intermediate and advanced methods. This course covers advanced regression topics, generalized linear models (GLM), generalized additive models (GAM), splines and smoothing techniques, decision trees, basic survival models, and introduces machine learning techniques (clustering, classification, regularization/penalized regression, feature selection, Bayesian methods, and unbiased estimators). Course content will be delivered through lectures and hands-on lab instruction. Traditional Lecture (3 credit hours)
  • BDS 723. Statistical Computation. This course is designed to provide students with an introduction to statistical computing. Students will learn the core ideas of programming — functions, objects, data structures, flow control, input and output, debugging, logical design and abstraction — through writing code to assist in numerical and graphical statistical analyses. Students will learn descriptive statistics, graphical presentation, estimation (EM algorithm), and computational methods for optimization. This course will emphasize the learning of statistical methods and concepts through hands-on experience with real data. Since code is also an important form of communication among scientists, students will learn how to comment and organize code. Traditional Lecture (3 credit hours)
  • BDS 724. Longitudinal and Multilevel Models. Covers statistical models for drawing scientific inferences from clustered/correlated data such as longitudinal and multilevel data. Topics include longitudinal study design; exploring clustered data; linear and generalized linear regression models for correlated data, including marginal, random effects, and transition models; and handling missing data. Traditional Lecture (3 credit hours)
  • BDS 725. Survival Analysis. This course will give an overview of modern survival analysis methods. Topics included are survival functions, hazard functions, censoring and truncation, competing risks, estimation of survival and related functions, hypothesis testing and semi-parametric regression methods with survival data. Traditional Lecture (3 credit hours)
  • BDS 726. Generalized Linear Models. Provides a foundation in the theory and application of generalized linear models and related statistical topics. A generalized linear model (GLM) is characterized by (1) a response variable with a distribution in an exponential dispersion family and (2) a mean response related to linear combinations of covariates through a link function. GLMs allow a unified theory for many of the models used in statistical practice, including normal theory regression and ANOVA models, many categorical data models including logit and probit models for binary data, loglinear models, and models for gamma responses and survival data. Traditional Lecture (3 credit hours)
  • BDS 727. Nonparametric Analyses. Provides an introduction to modern topics in nonparametric data analysis for estimation and inference. Topics include kernel estimation, rank based methods, nonparametric regression, confidence sets and random processes. Methodology and theory are presented together. Traditional Lecture (3 credit hours)
  • BDS 728. Multivariate Analysis. Provides an introduction to the analysis of multivariate data, balancing theory, implementation and translation of these methods. Topics covered include matrix computations, visualization techniques, the multivariate normal distribution, MANOVA, principal components analysis, factor analysis, and other clustering techniques. Traditional Lecture (3 credit hours)
  • BDS 741. Statistical Inference I. Introduces probability and distribution theory, including axioms of probability; random variables; probability mass and density functions; common discrete and continuous distributions; transformations and sums of random variables; expectations, variances, and moments; hierarchical models and mixture distributions; and properties of random samples. Traditional Lecture (3 credit hours)
  • BDS 742. Statistical Inference II. This course is a continuation of Statistical Inference I and continues to introduce modern statistical theory and principles of inference based on decision theory and likelihood (evidence) theory. Traditional Lecture (3 credit hours)
  • BDS 743. Theory of Linear Models. Provides an introduction to the development and use of general linear models including frameworks for parameter estimation and inference in a variety of settings. Theoretical foundations of the models will be reinforced with areas in which the models are applied to answer scientific questions. Topics covered include matrix algebra, distribution theory for quadratic forms of normal random vectors, properties of OLS estimators, estimable functions and related themes. Traditional Lecture (3 credit hours)
  • BDS 750. Study Design. This course will equip doctoral-level biostatisticians and data scientists with the skills necessary to participate in the planning and analysis of biomedical, clinical, and population-based health studies. This course will cover a wide array of study designs, one- and two-way classifications, nesting, blocking, factorial designs, multiple comparisons, confounding, power, sample size, and selected issues (randomization, blinding, adherence, dropout, phases) from clinical trials. Traditional Lecture (3 credit hours)
  • BDS 751. Statistical Inference in Genetics. This course will present fundamental theoretical concepts and statistical inference with emphasis on genetic epidemiology research for common human diseases. Five modules will be covered, including an introduction to statistical inference methods used on genetic data, familial aggregation methods, segregation analysis, linkage analysis, and testing associations between genetic variants and disease. Traditional Lecture (3 credit hours)
  • BDS 752. Advanced Statistical Genetics. An advanced course on modeling and methodology in statistical genetics for human diseases and traits. The course will cover topics including linkage analysis, population structure and stratification, admixture mapping, heritability and genetic risk prediction, familial aggregation, association analysis and others. On successful completion, participants will have the skills to develop and apply statistical methods towards a variety of genetic questions. Traditional Lecture (3 credit hours)
  • BDS 753. Bioinformatics. Provides an introduction to selected important topics in bioinformatics. The course focuses on integrating bioinformatics resources with basic biology and clinical applications to enhance research in population health. It includes methods for the analysis of high-throughput NGS data and the use of bioinformatics databases in precision medicine and population health. It also covers common programs and algorithms for sequence alignment, evolutionary tree construction, database searching, functional interpretation of expressed genes, and finding mutations in DNA for human disease. Traditional Lecture (3 credit hours)
  • BDS 754. Principles of Programming. This course will introduce fundamental programming concepts such as data structures and algorithms, object-oriented programming, and the basics of building interactive applications in the Python programming language. Traditional Lecture (3 credit hours)
  • BDS 761. Data Science. Provides a modern introduction to data science, including data wrangling and dynamic data visualization processes, while reinforcing advanced analytics, reproducible research, and applied statistical methods. Course content will be delivered through lectures and hands-on lab instruction. Traditional Lecture (3 credit hours)
  • BDS 762. Advanced Data Science. Provides a continuation into advanced Data Science topics with deeper programming and additional concepts. Topics include simulation, bootstrap, prediction, machine learning, and tool development. Course content will be delivered through lectures and hands-on lab instruction. Traditional Lecture (3 credit hours)
  • BDS 763. Database Systems. Review of database systems with special emphasis on data description and manipulation languages; data normalization; functional dependencies; database design; data integrity and security; distributed data processing; design and implementation of a comprehensive project. Traditional Lecture (3 credit hours)
  • BDS 764. Data Visualization. Provides an introduction to principles and techniques for creating effective interactive visualizations of quantitative information. Primary topics include principles for designing effective visualizations and implementing interactive visualizations using web-based frameworks. Traditional Lecture (3 credit hours)
  • BDS 765. Advanced Machine Learning. This course introduces students to the basic theories, concepts, and techniques of machine learning and gives them a glimpse of the state-of-the-art methods in this area. Topics covered include Bayesian estimation and decision theory, maximum likelihood estimation, nonparametric techniques, linear discriminant analysis, computational learning theory, support vector machines and kernel methods, boosting, clustering, dimensionality reduction, and deep learning. Traditional Lecture (3 credit hours)
  • BDS 766. Advanced Computational Methods. Provides a blend of software engineering, stochastic processes and optimization for creating and deploying efficient analytic tools. Topics covered include software engineering paradigms, robust software design, data structures, object-oriented design, parallel computation, and distributed computing, with a focus on implementation. Traditional Lecture (3 credit hours)
  • BDS 791. Special Topics. This course is intended to meet special needs of individual students. Students who wish to learn more about a particular topic can approach a mentor to determine an advanced course of study for a particular topic. The structure of an individual course is decided upon by the individual course instructor with approval from the program committee. Traditional Independent Study (1-9 credit hours)
  • BDS 792. Statistical Consulting. Provides hands-on training and experience in statistical consulting. Written and oral communication skills are emphasized. Ethical aspects of consulting are also discussed. Traditional Practicum/Internship (1 credit hour)
  • BDS 796. Directed Research. Provides students the opportunity to conduct research under the guidance of a faculty member from the Department of Data Science. (3 credit hours)
  • BDS 797. Biostatistics & Data Science Internship. A work experience conducted in the Department of Data Science, an affiliated department, center, or institute at the University of Mississippi Medical Center, or a public or private organization. The internship is focused on the development of real-world analytic, programming, and communication skills. Traditional Practicum/Internship (1-9 credit hours)
  • BDS 798. Dissertation Research. Research and preparation of a dissertation. Traditional Dissertation (1-9 credit hours)

Approved electives offered by other departments

  • BMS 730. Grant Writing and Management. An introduction to acquiring and managing extramural funding for sponsored projects with emphasis on NIH research grants. The following topics will be covered: searching for sponsors, including an overview of NIH funding mechanisms; grant writing, including development of specific aims and hypotheses, writing a literature review, presenting preliminary data, describing methods and timelines, and making a budget; the submission and review process; revising unsuccessful applications; starting a new laboratory; and submitting progress reports and competing continuations. Students will write and revise a grant application during this course. Traditional Lecture (2 credit hours)
  • ID 713. Bioinformatics & Genomics. This multidisciplinary and interdepartmental course is designed to provide students in the School of Graduate Studies in the Health Sciences, and other related programs at UMMC, with sound training and knowledge in the use and application of bioinformatics tools and genomics resources for the analysis, visualization, and interpretation of high-throughput "omics", genotype, proteomics, sequence, methylation and other biological data on cancer and other complex human diseases. Traditional Lecture (3 credit hours)
  • MSCI 711. Epidemiology II. This course will present and illustrate key methods used in epidemiologic research at an intermediate level. Topics will include causal inference in epidemiology, additional study designs, measures of disease frequency and association, methods to assess and handle confounding and bias, and analysis and statistical modeling in epidemiologic studies. Traditional Lecture (3 credit hours)
  • PHYSIO 725. Fundamental Physiology. A fundamental course designed to provide students with knowledge of the basic functions of the cells, tissues, organs and organ systems, and how they interrelate to accomplish the many and diverse functions of the human body. The course is intended for students for whom physiology is not their primary area of study. Traditional Lecture (4 credit hours)