Data Science Curriculum

Required courses for the Master's in Data Science degree.

Master of Science in Applied Data Science Curriculum

The Applied Data Science program is a two-year, 36-credit program that includes 11 full-semester 3-credit courses, two required practicums (one of which is a full-semester 3-credit course), an industrial seminar series in the first three semesters, and industrial workshops during the January interterm. The curriculum below applies to students who apply and enter with a bachelor’s degree. New College students who apply and enter via the 3+2 pathway should consult the separate 3+2 pathway curriculum.

Year One: Pre-Semester

Introduction to Data Science Bootcamp: The bootcamp equips all students entering the graduate program with the introductory skills and knowledge needed for further coursework in the program. It gives students from diverse backgrounds a common knowledge base as a cohort. Topics include a review of Python and R programming; common data science tools, resources, and platforms; operating systems; and database concepts and systems, among others.

Year One: Fall Semester

Applied Statistics I: A statistics course focusing on descriptive and inferential statistics. Topics include probability theory, linear regression, confidence intervals, hypothesis testing, and modern approaches such as resampling. All methods are illustrated in R, with a focus on methods relevant for data science and on industrial datasets.

Data Munging and Exploratory Data Analysis: A course on practical approaches for reshaping, reorganizing, and summarizing relationships in data through exploratory analysis. Principles and methods for preprocessing, normalizing, and validating data are covered, with an emphasis on collaborative and reproducible research.

Algorithms for Data Science: Fundamentals of algorithms and measures of performance. Taught in Python, the course includes an exploration of efficient algorithms for sorting and retrieving data, graph algorithms and combinatorial optimization, dynamic programming, randomized algorithms and approximation algorithms.

Databases for Data Science: Fundamentals of traditional database design and management. Types and comparison of databases, including SQL databases (e.g., PostgreSQL, SQLite), NoSQL databases, column-oriented databases (e.g., HBase), and document-oriented databases (e.g., MongoDB). Consistency, availability, scalability, efficiency, and performance in data retrieval and storage.

Industrial Seminar Series I: The first offering of a three-semester-long seminar series that hosts professionals and executives from a variety of industrial domains as guest speakers. Each weekly or biweekly seminar covers applications of data science techniques to diverse business problems.

Year One: January Interterm

Industrial Workshops: This course offers content modules that complement the regular coursework of the graduate program in applied data science. Examples include, but are not limited to, ethics, emerging or trending techniques in data science, domain-specific applications, industrial software platforms or tools, and professional certification modules and exams widely recognized in industry.

Year One: Spring Semester

Applied Statistics II: A course on statistical modeling, including multiple linear and logistic regression, and more generally, generalized linear models. Emphasis is placed on model formulation, building, assumptions, interpretations, predictions and assessments, with implementation carried out in R and a focus on methods and models relevant for data science using industrial datasets.

Data Visualization: A project-centered introduction to the visual display of quantitative information for both knowledge discovery and the communication of results. Over the course of the semester, students develop a visual application in an area of their interest, using data collected from an industrial application or project.

Applied Machine Learning: A project-based course covering supervised and unsupervised learning, with an emphasis on working with real industrial data. Topics include Bayesian analysis and other specific learning paradigms, including regression, clustering, random forests, support vector machines, kernel methods, and neural networks.

Distributed Computing: Fundamentals concerning the design and maintenance of massively parallel data sets. Non-relational databases and their management. Algorithms for parallel architectures and associated software tools, including the MapReduce/Hadoop framework and Bigtable.

Industrial Seminar Series II: The second offering of a three-semester-long seminar series that hosts professionals and executives from a variety of industrial domains as guest speakers. Each weekly or biweekly seminar covers applications of data science techniques to diverse business problems.

Year One: Summer or Year Two: January Interterm

Industrial Practicum I: Intended as a summer internship or interterm applied project, this course is the first extensive industry experience offered to students who would like to put their data science knowledge and skills to practical use. It must be completed with an industrial partner of the program or with a company or organization of the student's choice, under the supervision of a data science faculty member.

Year Two: Fall Semester

Advanced Applied Statistics: A second statistical modeling course, with a mix of topics such as generalized additive models, models for longitudinal responses, time series models, survival analysis, statistical learning or Bayesian statistics, with a focus on models relevant for data science. Taught with a project-based focus using real industrial data in an applied business context.

Advanced Applied Computing: Advanced topics in computing, such as image processing and object detection, text mining, natural language processing, recurrent neural networks, and reinforcement learning. Taught with a project-based focus using real industrial data in an applied business context.

Practical Data Science: Analysis of data and creation of a data science pipeline and deliverable for industry. Working in small groups, students analyze an industry-submitted data set starting with exploratory analysis, followed by statistical or machine learning-based model building, and the construction and presentation of a data product to an industry partner.

Industrial Seminar Series III: The third and final offering of a three-semester-long seminar series that hosts professionals and executives from a variety of industrial domains as guest speakers. Each weekly or biweekly seminar covers applications of data science techniques to diverse business problems.

Year Two: Spring Semester

Industrial Practicum II: A full semester working in industry as part of a data science team, under the weekly supervision of a data science faculty member, to whom the student submits reports. This is the second and final stage of the industrial practicum, in which the student works at an industrial partner company or organization or at a company of their choice. Performance is assessed by both a faculty advisor and a company supervisor.