Degree Requirements

Students must complete 42 credit hours:

MSBDS

  • 33 credits of required core course work
  • 6 credits of Internship or Mentored Research
  • 3 credits of elective course work
  • Fall, Year 1 – Python weekly lectures (content under preparation). Lecture series is a 0-credit program requirement.

MSIBS

  • 30 credits of required course work
  • 3 credits of Internship
  • 6 credits of Mentored Research or elective course work
  • 3 credits of elective course work

Program Length

The MSBDS and MSIBS degrees are 18-month programs beginning in the summer semester. Full-time students graduate at the end of their second fall semester (Fall 2). Part-time status is available on a case-by-case basis, and students have 3 years from the start of the program to complete the degree.

MSBDS/MSIBS curriculum at-a-glance

Academic Calendar

The 2024-25 academic year begins with an abbreviated online orientation held in late July. The summer semester course schedule is still to be determined.

Our master’s degree programs follow the Danforth Campus academic calendar. Semester dates, including holidays and breaks, can be found here.

Computer Access and Unix

All students, including MSBDS, MSIBS, and non-degree students, are required to attend computer access and Unix training. This training is offered during orientation week in the summer semester. Arrangements will be made for courses not offered during the summer.

All students are required to bring a laptop. Information on minimum specifications can be found here.

Course Descriptions

Biostatistics I

This course is designed for students who want to develop a working knowledge of basic methods in biostatistics. The course is focused on biostatistical and epidemiological concepts and on practical hints and hands-on approaches to data analysis rather than on details of the theoretical methods. We will cover basic concepts in hypothesis testing, will introduce students to several of the most widely used probability distributions, and will discuss classical statistical methods that include t-tests, chi-square tests, regression analysis, and analysis of variance. Both in-class examples and homework assignments will involve extensive use of SAS®. Auditors will not have access to the computer lab sessions. Prerequisite: M21-502 Statistical Computing with SAS® or practical experience with SAS®.

Biostatistics II

This course is designed for students who have taken Biostatistics I or the equivalent and who want to extend their knowledge of biostatistical applications to more modern and more advanced methods. Biostatistical methods to be discussed include logistic and Poisson regression, survival analysis, Cox regression analysis, and several methods for analyzing longitudinal data. Students will be introduced to modern topics that include statistical genetics and bioinformatics. The course will also discuss clinical trial design, the practicalities of sample size and power computation, and meta-analysis, and will ask students to read journal articles with a view towards encouraging a critical reading of the medical literature. Both in-class examples and homework assignments will involve extensive use of SAS®. Auditors will not have access to the computer lab sessions. Prerequisite: M21-560 Biostatistics I or its equivalent as judged by the course masters.

Biomedical Data Mining

This course introduces methods and applications in biomedical data mining. Various computational and statistical methods will be introduced, e.g., data wrangling and visualization, model selection and regularization, tree-based methods. Besides common applications of the covered methods in biomedical sciences, this course will prepare students for future challenges and opportunities in data science. Prerequisites are R for Data Science, Introduction to Bioinformatics, Biostatistics I and Biostatistics II.

Computational Statistical Genetics

This course covers the theory and application of both classical and advanced algorithms for estimating parameters and testing genomic hypotheses connecting genotype to phenotype. Students learn the key methods by writing their own program in SAS to do (simplified) linkage analysis in pedigrees for a simulated dataset provided by the course master. Topics covered in the course include Maximum Likelihood theory for pedigrees and unrelated individuals, Maximization routines such as Newton-Raphson and the E-M Algorithm, Path analysis, Variance components, Mixed model algorithms, the Elston-Stewart and Lander-Green Algorithms, Simulated Annealing and the Metropolis-Hastings algorithm, Bayesian and MCMC methods, Hidden Markov Models, Coalescent Theory, Haplotyping Algorithms, Genetic Imputation Algorithms, Permutation/Randomization Tests, Classification and Data Mining Algorithms, Population Stratification and Admixture Mapping Methods, Loss of Heterozygosity models, Gene Networks, Copy Number Variation methods, Multiple comparisons corrections, and Power and Monte Carlo simulation experiments.

Ethics in Biostatistics and Data Science

This course prepares biostatisticians to analyze and address ethical and professional issues in the practice of biostatistics across the range of professional roles and responsibilities of a biostatistician. The primary goals are for biostatisticians to recognize complex situational dynamics and ethical issues in their work and to develop professional and ethical problem-solving skills. The course specifically examines ethical challenges related to research design, data collection, data management, ownership, security, and sharing, data analysis and interpretation, and data reporting and provides practical guidance on these issues. The course also examines fundamentals of the broader research environment in which biostatisticians work, including principles of ethics in human subjects and animal research, regulatory and compliance issues in biomedical research, publication and authorship, and collaboration in science. 

Fundamentals of Genetic Epidemiology

This course is designed for students to understand basic concepts, methods and analytical approaches in genetic epidemiology. Lectures cover causes of phenotypic variation, familial resemblance and heritability, Hardy-Weinberg Equilibrium, ascertainment, study designs and basic concepts in genetic segregation, linkage and association. The computer laboratory portion is designed as hands-on practice of fundamental concepts. Students will gain practical experience with various genetics computer programs (e.g. MERLIN, QTDT, and PLINK) and data QC using R-programming. Prerequisites: Must have taken the R-programming course (M21-506) or have equivalent R-programming experience, and must have experience with a Unix/Linux computing environment. Auditors will not have access to the computer lab sessions.

Human Genetic Analysis

Basic Genetic concepts: meiosis, inheritance, Hardy-Weinberg Equilibrium, Linkage, segregation analysis; Linkage analysis: definition, crossing over, map functions, phase, LOD scores, penetrance, phenocopies, liability classes, multi-point analysis, non-parametric analysis (sibpairs and pedigrees), quantitative trait analysis, determination of power for Mendelian and complex trait analysis; Linkage Disequilibrium analyses: allelic association (case-control designs and family-based studies), QQ and Manhattan plots, whole genome association analysis; population stratification; Quantitative Trait Analysis: measured genotypes and variance components. Hands-on computer lab experience doing parametric linkage analysis with the program LINKAGE, model-free linkage analyses with Genehunter and Merlin, power computations with SLINK, quantitative trait analyses with SOLAR, LD computations with Haploview and WGAViewer, and family-based and case-control association analyses with PLINK and SAS. The methods and exercises are coordinated with the lectures and students are expected to understand underlying assumptions and limitations and the basic calculations performed by these computer programs.

Internship

The primary goal of the Internship program is for all students enrolled in Internship to acquire critical professional experience so that they will be well prepared to enter the job market upon graduation. This provides an opportunity for students to test-drive the job market, develop contacts, build marketable skills, and figure out likes and dislikes in the chosen field.

MSBDS Internship
Students are required to spend a total of 440 hours in the research centers of their Internship match. Students will meet on a regular basis over the two-semester course with the Course Master to discuss the progress of their experience. At the end of the Fall 2 semester, students will present their findings through a written project and oral presentation. Internship presentations will be scheduled late in the last semester (Fall 2). Grading will be determined in consultation with the mentor. MSBDS students who do not enroll in this course option are required to take 6 credit hours of Mentored Research.

MSIBS Internship
Students are required to spend a minimum total of 220 hours in the research centers of their Internship match. Students will meet on a regular basis over the summer semester with the Course Master to discuss the progress of their experience. At the end of the semester, students will present their findings through a written project and oral presentation. Grading will be determined in consultation with the mentor. A full-time option (440 hours – 6 credits) is also available to students.

Introduction to Bioinformatics

This course provides a broad exposure to basic concepts, methodologies and applications of bioinformatics. Students will learn online databases & mining tools, and acquire understanding of mathematical algorithms in sequence analysis (sequence alignment, gene finding, and hidden Markov models), gene expression microarray analysis (data QC & normalization, univariate & multivariate differential expression analysis), next generation sequence analysis (short-read data format and processing, variant calling algorithms), and topics on other high-throughput biomedical experiments. Students will become familiar with popular bioinformatics software, online tools, and R/BioConductor packages. We will discuss methods for high-dimensional data analysis including classification and clustering analysis, principal component analysis (PCA), statistical/machine learning, and Bayesian inference. There also will be seasonal additional lectures on topics such as proteomics and applications of bioinformatics to real studies of complex diseases.

As an important component of this course, students will conduct hands-on computer labs to learn basics of online bioinformatics databases and tools, and to practice computer programming. The labs require using the statistical computing environment R, though an introduction to BioConductor basics will be provided. Students will use specialized software and R packages to accomplish tasks including designing experiments, low-level analysis of expression levels, univariate differential expression analysis, and various multivariate analysis techniques taught in class. A variety of software will be used for NGS data analysis covering alignment, variant calling, differential analysis, and visualization of results. Through the lab exercises, students will learn how state-of-the-art computational tools are applied to solving bioinformatics problems in real studies of human diseases. Prerequisite: M21-506, Introduction to R for Data Science. Auditors will not have access to the computer lab sessions.

Introduction to Biomedical Informatics I: Foundations

This survey course provides an overview of the theories and methods that comprise the field of biomedical informatics. Topics to be covered include: 1) information architecture as applied to the biomedical computing domain; 2) data and interoperability standards; 3) biological, clinical, and population health relevant data analytics; 4) healthcare information systems; 5) human factors and cognitive science; 6) evaluation of biomedical computing applications; and 7) ethical, legal, and social implications of technology solutions as applied to the field of biomedicine. The course will consist of didactic lectures as well as experiential learning opportunities, including “hands on” laboratory sessions and journal club style discussion. The course will culminate with a capstone project requiring the in-depth examination, critique and presentation of a student-selected topic related to the broad field of biomedical informatics. Biomedical Informatics I is designed primarily for individuals who have a background in the health and/or life sciences and who have completed a course in introductory statistics (e.g., MATH 1011). No assumptions are made about computer science or clinical background; however, some experience with computers and a high-level familiarity with health care will be useful. This course does not require any programming knowledge, and it will not teach students how to program.

Introduction to Biomedical Informatics II: Methods

This course introduces students to the methods needed in order to apply the foundational theories covered in Biomedical Informatics I. The course will cover a broad spectrum of such methods including both computational and quantitative science techniques that can be employed in the design, conduct, and analysis of basic science, clinical, and translational research programs. This course is intended to enable individuals to critically select such methods and evaluate their results as part of both the design of new projects and the review of results available in the public domain (e.g., literature, public data sets, etc.). Core concepts to be reviewed during this course include: basic computational skills, data modelling and integration, formal knowledge representation, in silico hypothesis generation, quantitative data analysis principles, and critical thinking skills surrounding the ability to ask and answer questions about complex and heterogeneous biomedical data. Prerequisite: M17-5302 or instructor permission.

Introduction to Epidemiology

This course introduces the basic principles and methods of epidemiology, with an emphasis on critical thinking, analytic skills, and application to clinical practice. Topics include outcome measures, methods of adjustment, surveillance, quantitative study designs, and sources of data. Designed for those with a clinical background, the course will provide tools for critically evaluating the literature and skills to practice evidence-based medicine.

Introduction to R for Data Science

This is a short 2-credit primer to introduce the R Statistical Environment to new users. R is “a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, etc.” The goal is to give students a set of tools to perform statistical analysis in medicine, biology, or epidemiology. At the conclusion of this primer, students will be able to manipulate and analyze data, write basic models, understand the R environment for using packages, and create standard or customized graphics. This primer assumes some knowledge of basic statistics as taught in a first semester undergraduate or graduate sequence. Topics should include: probability, cross-tabulation, basic statistical summaries, and linear regression in either scalar or matrix form. This course meets the prerequisite for two summer courses: M21-515 Fundamentals of Genetic Epidemiology and M21-550 Introduction to Bioinformatics. May not be taken for audit.

Mentored Research

All students enrolled in the Mentored Research course will complete a Master’s thesis, which may involve conducting and reporting a comprehensive data analysis or conducting research and reporting on a focused methodological problem; the latter may include a computer simulation approach to solve a problem, an in depth review of available methods in a certain topical area, or developing new methods. Each student will work closely with a Mentor who has expertise in biostatistics or a related quantitative field. The grade for each student will be determined in consultation with the mentor.

MSBDS Mentored Research
Students will meet regularly over the two-semester course with the Course Master to discuss the progress of their project. As part of the Mentored Research requirements at the end of the Fall 2 semester, each student will submit a written thesis and give an oral presentation of the thesis research. MSBDS students who do not enroll in this course option are required to take 6 credit hours of Internship.

MSIBS Mentored Research
Students will meet regularly over the fall semester with the Course Master to discuss the progress of their project. As part of the Mentored Research requirements at the end of the Fall 2 semester, each student will submit a written thesis and give an oral presentation of the thesis research. MSIBS students who do not enroll in this course option are required to take 6 credit hours of elective course work.

Statistical Computing with SAS®

Intensive hands-on summer training in SAS® over 7 full weekdays. Students will learn how to use the SAS® System for handling, managing, and analyzing data. Instruction is provided in the use of the SAS® programming language and procedures. The course will teach students how to become effective, self-reliant SAS® users, and will instruct the students in data management and basic exploratory data analysis using SAS®.  Topics include, but are not limited to: Reading External Files into SAS; Examining and Manipulating the Contents of SAS Datasets; and SAS Macro Variables and Programs.  Students will learn how to output results, and create high quality tables and graphs in SAS.  A brief introduction to statistics in SAS will also be included.  Instruction manual and computer lab will be provided. This course meets the prerequisite for M21-560 Biostatistics I. May not be taken for audit. 

Study Design and Clinical Trials

This course will focus on statistical and epidemiological concepts of study design and clinical trials. Topics include: different phases of clinical trials, various types of medical studies (observational studies, retrospective studies, cross-over design, factorial design, and group sequential design), and power analysis, along with statistical methods for the various types of studies. Study management, randomization methods and survey data analysis are also addressed. Students will be expected to write up a proposed design for a study of their choice, and to practice power analysis/sample size estimation during lab sessions. Permission of the Course Master required. Prerequisites: M21-560 Biostatistics I and M21-570 Biostatistics II or the equivalent as determined by the course masters.

Survival Analysis

This course will cover the basic applied and theoretical aspects of survival analysis techniques to analyze time-to-event data. Basic concepts will be introduced and topics include survival function, hazard function, censoring and truncation, Kaplan-Meier and Nelson-Aalen estimators, cohort life table, likelihood construction for censored and truncated data, estimating hazard and survival functions, Cox proportional hazards (PH) model with fixed and time-dependent covariates, and model selection. Additional topics will include regression diagnostics for survival models, the stratified PH model, parametric regression models and competing risk. Computer lab sessions are designed to provide intensive hands-on experience analyzing real-life datasets. Prerequisites: Biostat I and II, mathematical statistics (covers probabilities, distributions, likelihood, etc.), Calculus II or III, and SAS programming, or permission from the course master.

Elective

Students will select one or more elective courses from an approved list of statistics, genetics, analysis and data science courses. Course examples include:

  • Human Genetic Analysis
  • Data Mining
  • Machine Learning
  • Introduction to Epidemiology