Data Science Curriculum

Required courses for the Master's in Data Science Degree.

Year One: First Semester

  • CAP 5300 – Statistical Inference for Data Science I: A rapid review of probability followed by an introduction to R. Fundamentals of statistical inference including parameter estimation and maximum likelihood, hypothesis testing, regression and linear models with a focus on working with large data sets. An introduction to resampling and nonparametric methods.
  • CAP 5322 – Data Storage and Retrieval: Fundamentals of traditional database design and management. Data warehousing, extraction and transformation of structured and unstructured data. Concurrency, stability and efficiency in data retrieval and storage. An introduction to massively parallel data structures and software tools used in their management (MapReduce, Hadoop, etc.).
  • CAP 5328 – Algorithms and Optimization: Fundamentals of algorithms and measures of performance. Taught in Python, the course includes an exploration of efficient algorithms for sorting and retrieving data. Material covered over the course of the semester includes graph algorithms and combinatorial optimization, dynamic programming, randomized algorithms and approximation algorithms.
  • CAP 5320 – Data Munging and Exploratory Data Analysis: Exploratory data analysis in the context of knowledge discovery, including the use of data visualization software. Inference, prediction and causal relationships. Multivariate models and independence. Resampling methods and nonparametric statistics with a focus on application to real data.

Year One: Second Semester

  • CAP 5302 – Statistical Inference for Data Science II: Nonparametric methods and multivariate inference. Linear and nonlinear methods for dimension reduction; an introduction to Bayesian methods; graphical models and causal inference.
  • CAP 5738 – Data Visualization, Presentation, Reporting, and Reproducible Research: A project-centered introduction to the visual display of quantitative information for both knowledge discovery and the communication of results. Fundamentals of reproducible research in the context of consulting.
  • CAP 5327 – Distributed Computing for Data Science: Fundamentals concerning the design and maintenance of massively parallel data sets. Nonrelational databases and their management. Algorithms for parallel architectures and associated software tools including the MapReduce/Hadoop framework and BigTable.
  • CAP 5610 – Optimization and Machine Learning: Fundamentals of supervised and unsupervised learning with an emphasis on working with real data. An introduction to Bayesian analysis. Implementation of specific learning paradigms including regression, clustering, random forests, support vector machines, kernel methods and neural networks. Construction of hybrid classifiers.

Year Two: Third Semester

  • CAP 5323 – Practical Data Science: Analysis of data and creation of a data product for industry. Working in small groups, students analyze an industry-submitted data set from exploratory analysis, through construction and testing of hypotheses, to the construction and presentation of a data product to inform an industry-driven decision.
  • CAP 5931 – Topics in Computing for Data Science: Deep Learning: Advanced material involving computing and data science. Topics vary and may include image processing, text mining, nonrelational databases and their management, and software engineering for massively parallel structures.
  • CAP 5303 – Topics in Statistical Inference for Data Science: Time-Series and Forecasting: Advanced material involving statistical inference and massive data sets. Topics vary and may include survival analysis, time series and prediction, risk analysis, decision theory, the theory of social networks, distributed software for statistical inference and advanced topics in machine learning.

Year Two: Fourth Semester

  • CAP 5940 – Practicum: A full semester working in industry as part of a data science team, under the weekly supervision of Data Science faculty, to whom students submit reports.