Statistics and Data Science
Statistics and Data Science are fields that help us make sense of information. They focus on how to collect, organize, analyze, and explain data in ways that support better decision-making.
As the amount of data in the world continues to grow, so does the need for tools and techniques to understand it. These skills are used in nearly every area of life – from finance and medicine to sports, law, and the sciences. In fact, it's hard to find a field today that doesn't rely on data in some way.
Statistics is often used to design surveys and experiments, and to analyze the results. It is an essential tool for understanding patterns and relationships in data. Statistical methods help uncover cause-and-effect relationships and support evidence-based conclusions. Statistics plays a key role in areas like medical research, political polling, insurance, and financial planning.
Data Science focuses on working with large and complex data sets, often using computers to find patterns and make predictions. It emphasizes speed, efficiency, and helping people make informed choices. New fields like artificial intelligence, deep learning, and bioinformatics have grown out of Data Science methods.
Together, Statistics and Data Science provide powerful tools for understanding the world and solving real-world problems.
Data Science Learning Objectives
Our Data Science major and minor are designed to equip learners with the foundational knowledge, practical skills, and analytical mindset necessary to thrive in a data-driven world.
Through a blend of hands-on projects and theoretical instruction, students will explore the full data science lifecycle – from data collection and cleaning to visualization, statistical analysis, and predictive modeling. Emphasis is placed on real-world applications, ethical data use, and effective communication of results to both technical and non-technical audiences.
- Students will demonstrate the ability to effectively and efficiently import, clean, transform, and combine data (a.k.a. data preparation or data wrangling/munging); they will demonstrate the ability to handle data of various formats.
- Students will demonstrate the ability to both select and construct visual representations of the data for both exploration and communication.
- Students will demonstrate the ability to write analytic code that is well documented, follows coding standards, and is independently reproducible.
- Students will demonstrate the ability to perform an exploratory data analysis.
- Students will demonstrate the ability to propose and fit various competing predictive models and appropriately evaluate them.
- Students will be able to understand statistical theory and principles of statistical inference and apply them to data science.
- Students will demonstrate understanding of fundamental concepts of computer programming, algorithms, and data structures.
- Students will be able to work independently and/or on teams to design and conduct analysis of complex, large data sets using statistical and machine learning methods to answer questions and solve problems in real-world contexts.
- Students will understand and demonstrate ethical practices within data science, such as incorporating multiple perspectives, especially those of marginalized communities; questioning traditional hierarchies that dehumanize and objectify data; and promoting transparency, equity, and community partnership.
Statistics Learning Objectives
Our Statistics major and minor are designed to build a strong foundation in statistical theory while emphasizing practical applications and data interpretation skills. Whether pursuing careers in research, industry, or further academic study, graduates will be prepared to apply statistical thinking in a data-rich world.
Students will explore core concepts such as probability, inference, and regression. The curriculum balances mathematical rigor with real-world relevance, preparing learners to tackle complex problems using both classical and modern statistical methods.
- Students will demonstrate ability in mathematical and statistical theory, including basic probability, inference, and modeling principles necessary to understand statistical methods typically applied in data analysis.
- Students will demonstrate ability in computational methods, including basic statistical programming, data analysis, and reproducibility necessary to do applied data analysis.
- Students will demonstrate the ability to use appropriate statistical methodologies for real-world data analysis settings.
- Students will evaluate the ethical implications of aspects related to statistical inquiry including study design, data collection, and data analysis.
- Students will develop skills in written communication of statistical findings.
- Students will develop skills in conducting research in statistics.
Courses
STAT 101-7 College Seminar (1 Unit) Small, writing and discussion-oriented course exploring a specific topic or theme, and introducing skills necessary for thriving at Northwestern. Not eligible to be applied towards a WCAS major or minor except where specifically indicated.
STAT 101-8 First-Year Writing Seminar (1 Unit) Small, writing and discussion-oriented course exploring a specific topic or theme, and focused on the fundamentals of effective, college-level written communication. Not eligible to be applied towards a WCAS major or minor except where specifically indicated.
STAT 201-0 Introduction to Programming for Data Science (1 Unit) This course is an introduction to programming for Data Science. It prepares students to use essential programming methods, as implemented in either Python or R, as a tool in subsequent data science courses, including STAT 301-1, STAT 302-0, STAT 303-1, STAT 304-0, STAT 305-0, and STAT 350-0. Prerequisite: High School Algebra. Empirical and Deductive Reasoning Foundational Dis Formal Studies Distro Area
STAT 202-0 Introduction to Statistics and Data Science (1 Unit) Data collection, summarization, correlation, regression, sampling, confidence intervals, tests of significance. Introduction to data analysis techniques using R programming, no prior programming experience required. May not receive credit for both STAT 202-0 and STAT 210-0. Prerequisite: makes minimal use of high school algebra (calculus is not required). Empirical and Deductive Reasoning Foundational Dis Formal Studies Distro Area
STAT 202-SG Peer-Guided Study Group: Introduction to Statistics and Data Science (0 Unit) Peer-guided study group for students enrolled in STAT 202-0. Meets weekly in small groups, along with a peer facilitator, to collaboratively review material, work through practice problems, and clarify course concepts. Enrollment optional. Graded S/U.
STAT 210-0 Introduction to Probability and Statistics (1 Unit) A mathematical introduction to probability theory and statistical methods, including properties of probability distributions, sampling distributions, estimation, confidence intervals, and hypothesis testing. STAT 210-0 is primarily intended for economics majors. May not receive credit for both STAT 202-0 and STAT 210-0. Prerequisite: strong background in high school algebra (calculus is not required). Empirical and Deductive Reasoning Foundational Dis Formal Studies Distro Area
STAT 210-SG Peer-Guided Study Group: Introduction to Probability and Statistics (0 Unit) Peer-guided study group for students enrolled in STAT 210-0. Meets weekly in small groups, along with a peer facilitator, to collaboratively review material, work through practice problems, and clarify course concepts. Enrollment optional. Graded S/U.
STAT 228-0 Series and Multiple Integrals (1 Unit) Sequences and series, and convergence tests. Power series, Taylor polynomials and error. Double integrals, triple integrals, and change of variables. Students may receive credit for only one of MATH 235-0, MATH 226-0, or STAT 228-0. Prerequisite: MATH 218-3 or MATH 220-2, and MATH 228-1 or MATH 230-1 or MATH 281-1 or MATH 285-2 or MATH 290-2 or MATH 291-2 or ES_APPM 252-1. Empirical and Deductive Reasoning Foundational Dis Formal Studies Distro Area
STAT 232-0 Applied Statistics (1 Unit) Basic concepts of using statistical models to draw conclusions from experimental and survey data. Topics include simple linear regression, multiple regression, analysis of variance, and analysis of covariance. Practical application of the methods and the interpretation of the results will be emphasized. Prerequisites: STAT 202-0, STAT 210-0, or equivalent; MATH 220-1. Formal Studies Distro Area
STAT 301-1 Data Science 1 with R (1 Unit)
First course in Data Science with a focus on data management, manipulation, and visualization skills and techniques for exploratory data analysis. The course also introduces the R programming language. Students may not receive credit for both this course and STAT 303-1.
Prerequisite: STAT 201-0 or COMP_SCI 110-0 and STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0 or equivalent.
Formal Studies Distro Area
STAT 301-2 Data Science 2 with R (1 Unit)
Introduction to supervised machine/statistical learning with a focus on application using R. Course covers essential concepts in machine learning while surveying standard machine learning models such as linear and logistic regression. Course provides a foundation for learning more machine learning methods. Students may not receive credit for both this course and STAT 303-2.
Prerequisite: STAT 301-1.
Formal Studies Distro Area
STAT 301-3 Data Science 3 with R (1 Unit)
An intermediate course that covers machine learning methods in R, including supervised and unsupervised learning. It provides the knowledge and skills necessary to tackle real world problems with machine learning. Students may not receive credit for both this course and STAT 303-3.
Prerequisite: STAT 301-2.
Formal Studies Distro Area
STAT 302-0 Data Visualization (1 Unit)
Introduction to the knowledge, skills, and tools required to visualize data of various formats across statistical domains and to create quality visualizations for both data exploration and presentation.
Prerequisite: STAT 202-0 or equivalent.
Formal Studies Distro Area
STAT 303-1 Data Science 1 with Python (1 Unit)
First course in Data Science, with focus on data management, manipulation, and visualization skills and techniques for exploratory data analysis. The course also introduces the Python programming language in the context of Data Science. Students may not receive credit for both this course and STAT 301-1.
Prerequisite: STAT 201-0 or COMP_SCI 110-0 and STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0 or equivalent.
Formal Studies Distro Area
STAT 303-2 Data Science 2 with Python (1 Unit)
Introduction to supervised machine/statistical learning with a focus on application using Python. Course covers essential concepts in machine learning while surveying standard machine learning models such as linear and logistic regression. Course provides a foundation for learning more machine learning methods. Students may not receive credit for both this course and STAT 301-2.
Prerequisite: STAT 303-1.
Formal Studies Distro Area
STAT 303-3 Data Science 3 with Python (1 Unit)
An intermediate course that covers machine learning methods in Python, including supervised and unsupervised learning. It provides the knowledge and skills necessary to tackle real world problems with machine learning. Students may not receive credit for both this course and STAT 301-3.
Prerequisite: STAT 303-2.
Formal Studies Distro Area
STAT 304-0 Data Structures and Algorithms for Data Science (1 Unit)
This course will introduce students to the design, implementation, analysis, and proper application of abstract data types, data structures, and their algorithms. Python will be used to implement and explore various algorithms and data structures. Students should be prepared for a significant amount of hands-on programming.
Prerequisite: STAT 201-0 or COMP_SCI 110-0 and STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0 or equivalent.
Formal Studies Distro Area
STAT 305-0 Information Management for Data Science (1 Unit)
This course aims to give students an extensive data processing and visualization skillset using various Python libraries. It will also focus on relational databases and queries in SQL. Students will learn data scraping from online sources and mobile applications and will receive a brief introduction to statistical and predictive analysis once the data is clean and ready to use.
Prerequisite: STAT 201-0 or COMP_SCI 110-0 and STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0 or equivalent.
Formal Studies Distro Area
STAT 320-1 Statistical Theory & Methods 1 (1 Unit)
Sample spaces, computing probabilities, random variables, distribution functions, expected values, variance, correlation, limit theory. May not receive credit for both STAT 320-1 and any of STAT 383-0, MATH 310-1, MATH 311-1, MATH 314-0, MATH 385-0, ELEC_ENG 302-0, or IEMS 302-0. Co-requisites: STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0 or equivalent, STAT 228-0 or MATH 235-0 or both MATH 226-0 and MATH 230-2 or MATH 228-2 or MATH 281-2 or MATH 285-3 or MATH 290-3 or MATH 291-3 or ES_APPM 252-2.
Formal Studies Distro Area
STAT 320-2 Statistical Theory & Methods 2 (1 Unit)
Parameter estimation, confidence intervals, hypothesis tests.
Prerequisite: STAT 320-1 or MATH 310-1 or MATH 311-1 or MATH 314-0 or MATH 385-0 or ELEC_ENG 302-0 or IEMS 302-0 or STAT 383-0.
Formal Studies Distro Area
STAT 320-3 Statistical Theory & Methods 3 (1 Unit)
Comparison of parameters, goodness-of-fit tests, regression analysis, analysis of variance, and nonparametric methods.
Prerequisites: STAT 320-2 and MATH 240-0 or MATH 285-1 or MATH 281-3 or MATH 290-1 or MATH 291-1 or GEN_ENG 205-1 or GEN_ENG 206-1.
Formal Studies Distro Area
STAT 325-0 Survey Sampling (1 Unit)
Probability sampling, simple random sampling, error estimation, sample size, stratification, systematic sampling, replication methods, ratio and regression estimation, cluster sampling.
Prerequisites: MATH 230-1 and 2 quarters of statistics, or consent of instructor.
Formal Studies Distro Area
STAT 328-0 Causal Inference (1 Unit)
Introduction to modern statistical thinking about causal inference. Topics include completely randomized experiments, confounding, ignorability of assignment mechanisms, matching, observational studies, noncompliance, and Bayesian methods.
Prerequisites: STAT 320-2, STAT 350-0.
Formal Studies Distro Area
STAT 330-1 Applied Statistics for Research 1 (1 Unit)
First Quarter: Design of experiments and surveys, numerical summaries of data, graphical summaries of data, correlation and regression, probability, sample mean, sample proportion, confidence intervals and tests of significance, one and two sample problems, ANOVA. Second Quarter: Simple linear regression, inference, diagnostics, multiple regression diagnostics, autocorrelation, 1-way ANOVA, power and sample size determination, 2-way ANOVA, ANCOVA, randomized block designs.
STAT 332-0 Statistics for Life Sciences (1 Unit) Application of statistical methods and data analysis techniques to the life sciences. Parametric statistics, nonparametric approaches, resampling-based approaches. Prerequisite: 1 introductory statistics course. Formal Studies Distro Area
STAT 344-0 Statistical Computing (1 Unit)
Exploration of theory and practice of computational statistics with emphasis on statistical programming in R.
Prerequisite: STAT 320-2 or equivalent. Some R programming experience is desired.
Formal Studies Distro Area
STAT 348-0 Applied Multivariate Analysis (1 Unit)
Statistical methods for describing and analyzing multivariate data. Principal component analysis, factor analysis, canonical correlation, clustering. Emphasis on statistical and geometric motivation, practical application, and interpretation of results.
Prerequisites: STAT 320-2 and MATH 240-0 or MATH 285-1 or MATH 281-3 or MATH 290-1 or MATH 291-1 or GEN_ENG 205-1 or GEN_ENG 206-1. Co-requisite: STAT 350-0. The course uses R extensively; hence, some experience with R will be useful.
Formal Studies Distro Area
STAT 350-0 Regression Analysis (1 Unit)
Simple linear regression and correlation, multiple regression, residual analysis, model building, variable selection, multi-collinearity and shrinkage estimation, nonlinear regression.
Prerequisites: STAT 201-0 or COMP_SCI 110-0 and STAT 202-0 or STAT 210-0 or STAT 232-0 or PSYCH 201-0 or IEMS 201-0 or IEMS 303-0. Co-requisite: STAT 320-1 or STAT 383-0 or MATH 310-1 or MATH 311-1 or MATH 314-0 or MATH 385-0 or ELEC_ENG 302-0 or IEMS 302-0.
Formal Studies Distro Area
STAT 351-0 Design and Analysis of Experiments (1 Unit)
Methods of designing experiments and analyzing data obtained from them: one-way and two-way layouts, incomplete block designs, factorial designs, random effects, split-plot and nested designs.
Prerequisite: STAT 320-1 or STAT 383-0 or MATH 310-1 or MATH 311-1 or MATH 314-0 or MATH 385-0 or ELEC_ENG 302-0 or IEMS 302-0 or equivalent.
Formal Studies Distro Area
STAT 352-0 Nonparametric Statistical Methods (1 Unit)
Survey of nonparametric methods, with emphasis on understanding their application. Estimation of a distribution function, density estimation, and nonparametric regression.
Prerequisite: STAT 350-0.
Formal Studies Distro Area
STAT 353-0 Advanced Regression (1 Unit)
This course covers modern regression methods, including (1) generalized linear models (binary, categorical, and count data), (2) random effects, mixed effects, and nonlinear models, and (3) model selection. The course emphasizes both the theoretical development of the methods and their application, including the communication of models and results both verbally and in writing.
Prerequisites: STAT 350-0 and STAT 320-2 or STAT 420-2 or MATH 310-2.
Formal Studies Distro Area
STAT 354-0 Time Series Modeling (1 Unit)
Introduction to modern time series analysis. Autocorrelation, time series regression and forecasting, ARIMA and GARCH models.
Prerequisites: STAT 320-1 or STAT 383-0 or MATH 310-1 or MATH 311-1 or MATH 314-0 or MATH 385-0 or ELEC_ENG 302-0 or IEMS 302-0. Co-requisite: STAT 350-0.
Formal Studies Distro Area
STAT 355-0 Analysis of Qualitative Data (1 Unit)
Introduction to the analysis of qualitative data. Measures of association, loglinear models, logits, and probits.
Prerequisite: STAT 320-2 or equivalent.
Formal Studies Distro Area
STAT 356-0 Hierarchical Linear Models (1 Unit)
Introduction to the theory and application of hierarchical linear models. Two and three level linear models, hierarchical generalized linear models, and application of hierarchical models to organizational research and growth models.
Prerequisites: STAT 320-2 and STAT 350-0.
Formal Studies Distro Area
STAT 357-0 Introduction to Bayesian Statistics (1 Unit)
Introduction to basic concepts and principles in Bayesian inference such as the prior, likelihood, posterior and predictive distributions, as well as an introduction to a variety of computational algorithms for Bayesian inference. Students learn how to develop, describe, implement and critique statistical models from a Bayesian perspective.
Prerequisites: STAT 320-2 and STAT 301-2 or STAT 350-0 or consent of instructor.
Formal Studies Distro Area
STAT 359-0 Topics in Statistics (1 Unit)
Topics in theoretical and applied statistics to be chosen by instructor.
Prerequisite: varies by course topic.
Formal Studies Distro Area
STAT 360-0 Introduction to Generative AI (1 Unit)
This course will provide an introduction to generative AI. In particular, we will cover large language models and diffusion models. By the end of the course, students should have a thorough understanding of all major components underpinning modern large language models. Students should be able to train their own large language models after taking this class.
Prerequisites: Linear algebra (MATH 240-0), STAT 320-2 or equivalent, some familiarity with deep learning, Python experience.
STAT 362-0 Advanced Machine Learning for Data Science (1 Unit) This course focuses on the theory and applications of advanced Machine Learning (ML) and Deep Learning (DL) topics. It also includes an introduction to Bayesian Modeling and Reinforcement Learning (RL). Students are expected to have a basic understanding of ML from STAT 301-1-2-3/303-1-2-3. The coding language for the homework projects is Python. Prerequisites: STAT 301-3 or STAT 303-3. Formal Studies Distro Area
STAT 365-0 Introduction to the Analysis of Financial Data (1 Unit)
Statistical methods for analyzing financial data. Models for asset returns, portfolio theory, parameter estimation. The statistical software R is used.
Prerequisites: STAT 320-3 and MATH 240-0 or MATH 285-1 or MATH 281-3 or MATH 290-1 or MATH 291-1 or GEN_ENG 205-1 or GEN_ENG 206-1.
Formal Studies Distro Area
STAT 370-0 Human Rights Statistics (1 Unit)
Development, analysis, interpretation, use, and misuse of statistical data and methods for description, evaluation, and political action regarding war, disappearances, justice, violence against women, trafficking, profiling, elections, hunger, refugees, discrimination, etc.
Prerequisites: Two of STAT 325-0, STAT 350-0, STAT 320-2, STAT 320-3; or ECON 381-1, ECON 381-2; or MATH 386-1, MATH 386-2; or IEMS 303-0, IEMS 304-0.
Formal Studies Distro Area
STAT 383-0 Probability and Statistics for ISP (1 Unit) Probability and statistics. Ordinarily taken only by students in ISP; permission required otherwise. May not receive credit for both STAT 383-0 and any of STAT 320-1, MATH 310-1, MATH 311-1, MATH 314-0, MATH 385-0, ELEC_ENG 302-0, or IEMS 302-0. Prerequisites: MATH 281-1 and MATH 281-2 and MATH 281-3 and PHYSICS 125-1 and PHYSICS 125-2 and PHYSICS 125-3. Formal Studies Distro Area
STAT 390-0 Data Science Project (1 Unit) An opportunity to develop and create solutions for stakeholders with data needs. Students will work in teams to appropriately scope and solve data problems. Students should expect to spend significant amounts of time coordinating and working with teammates outside of class. Prerequisites: STAT 301-3 or STAT 303-3.
STAT 398-0 Undergraduate Seminar (1 Unit)
STAT 399-0 Independent Study (1-3 Units) Independent work under the guidance of a faculty member. Consent of department required.