APPLIED DATA SCIENCE CURRICULUM: 3+2 PATHWAY

New College students who have completed their second year of study in any area of concentration and want to complete both the undergraduate and graduate programs in five years are encouraged to follow the accelerated 3+2 curriculum below. Students become eligible for this option only after entering the New College undergraduate program and showing strong academic performance. Applicants must satisfy the following minimum conditions before they can be admitted via the 3+2 pathway:

  • Complete 2 years of study with Satisfactory evaluations in all academic undertakings.
  • Complete prerequisite courses (see below).
  • Be recommended for the 3+2 pathway by a faculty member.

The regular curriculum for students who apply and enter with a Bachelor’s degree is available here.

Prerequisite Courses

The following courses must be completed during the first two years of undergraduate study:

MATH 2400 – Calculus I

MATH 3250 – Calculus II

CSCI 2200 – Introduction to Programming in Python

CSCI 3250 – Intermediate Python or CSCI 2400 – Object Oriented Programming

MATH 2200 – Probability 1 (Mod 1)

MATH 4550 – Probability 2 (Mod 2)

MATH 2320 – Linear Algebra

These courses also count towards satisfying the ADS 5000 Introduction to Data Science Bootcamp course in the graduate program.

Year Three: Fall Semester

IDC 5100 – Applied Statistics I: A statistics course focusing on descriptive and inferential statistics. Topics include probability theory, linear regression, confidence intervals, hypothesis testing, and modern approaches such as resampling. All methods are illustrated in R, with a focus on methods relevant for data science using industrial datasets.

IDC 5110 – Data Munging and Exploratory Data Analysis: A course on practical approaches for reshaping, reorganizing, and summarizing relationships in data through exploratory analysis. Principles and methods for preprocessing, normalizing, and validating data are covered, with an emphasis on collaborative and reproducible research.

Year Four: Fall Semester

IDC 5120 – Algorithms for Data Science: Fundamentals of algorithms and measures of performance. Taught in Python, the course includes an exploration of efficient algorithms for sorting and retrieving data, graph algorithms and combinatorial optimization, dynamic programming, randomized algorithms and approximation algorithms.

IDC 5130 – Databases for Data Science: Fundamentals of traditional database design and management. Types of databases and their comparison, including SQL databases (e.g., PostgreSQL, SQLite), NoSQL databases, column-oriented databases (e.g., HBase), and document-oriented databases (e.g., MongoDB). Consistency, availability, scalability, efficiency, and performance in data retrieval and storage.

IDC 5290 – Industrial Seminar Series I: The first offering of a three-semester-long seminar series that hosts professionals and executives as guest speakers from a variety of industrial domains. Each weekly or biweekly seminar covers applications of data science techniques to diverse problems in business.

Year Four: January Interterm

IDC 5295 – Industrial Workshops: This course offers content modules complementary to the regular coursework of the graduate program in applied data science. Examples include, but are not limited to, ethics, emerging or trending techniques in data science, domain-specific applications, industrial software platforms or tools, and professional certification modules and exams widely acknowledged in the industry.

Year Four: Spring Semester

IDC 5102 – Applied Statistics II: A course on statistical modeling, including multiple linear and logistic regression, and more generally, generalized linear models. Emphasis is placed on model formulation, building, assumptions, interpretations, predictions and assessments, with implementation carried out in R and a focus on methods and models relevant for data science using industrial datasets.

IDC 5112 – Data Visualization: A project-centered introduction to the visual display of quantitative information for both knowledge discovery and the communication of results. Over the course of the semester, students develop a visual application in an area of their interest, using data collected from an industrial application or project.

IDC 5122 – Applied Machine Learning: A project-based course covering supervised and unsupervised learning, with an emphasis on working with real industrial data. Topics include Bayesian analysis and other specific learning paradigms, including regression, clustering, random forests, support vector machines, kernel methods, and neural networks.

IDC 5132 – Distributed Computing: Fundamentals concerning the design and maintenance of massively parallel data sets. Non-relational databases and their management. Algorithms for parallel architectures and associated software tools, including the MapReduce/Hadoop framework and Bigtable.

IDC 5291 – Industrial Seminar Series II: The second offering of a three-semester-long seminar series that hosts professionals and executives as guest speakers from a variety of industrial domains. Each weekly or biweekly seminar covers applications of data science techniques to diverse problems in business.

Year Four Summer or Year Five January Interterm

IDC 6298 – Industrial Practicum I: Intended as a summer internship or interterm applied project, this course is the first extensive real-world industry experience offered to students who would like to put their data science knowledge and skills to practical use. It must be completed with an industrial partner of the program or a company or organization of the student's choosing, under the supervision of a data science faculty member.

Year Five: Fall Semester

IDC 6200 – Advanced Applied Statistics: A second statistical modeling course, with a mix of topics such as generalized additive models, models for longitudinal responses, time series models, survival analysis, statistical learning or Bayesian statistics, with a focus on models relevant for data science. Taught with a project-based focus using real industrial data in an applied business context.

IDC 6220 – Advanced Applied Computing: Advanced topics in computing, such as image processing and object detection, text mining, natural language processing, recurrent neural networks, and reinforcement learning. Taught with a project-based focus using real industrial data in an applied business context.

IDC 6250 – Practical Data Science: Analysis of data and creation of a data science pipeline and deliverable for industry. Working in small groups, students analyze an industry-submitted data set starting with exploratory analysis, followed by statistical or machine learning-based model building, and the construction and presentation of a data product to an industry partner.

IDC 6293 – Industrial Seminar Series III: The third and final offering of a three-semester-long seminar series that hosts professionals and executives as guest speakers from a variety of industrial domains. Each weekly or biweekly seminar covers applications of data science techniques to diverse problems in business.

Year Five: Spring Semester

IDC 6299 – Industrial Practicum II: A full semester working in industry as part of a data science team, with weekly supervision by, and reports submitted to, a data science faculty member. This is the second and final stage of the industrial practicum, in which the student works at an industrial partner company or organization or at a company of their choice. Performance is assessed by both a faculty advisor and a company supervisor.