Data Science Curriculum

Required courses for the Master's in Data Science degree.

Year One: First Semester

  • CAP 5300 – Statistical Inference for Data Science I: A rapid review of probability followed by an introduction to R. Fundamentals of statistical inference including parameter estimation and maximum likelihood, hypothesis testing, regression and linear models with a focus on working with large data sets. An introduction to resampling and nonparametric methods.
  • CAP 5322 – Databases for Data Science: Fundamentals of traditional database design and management. Data warehousing, extraction and transformation of structured and unstructured data. Concurrency, stability and efficiency in data retrieval and storage. An introduction to massively parallel data structures and software tools used in their management (MapReduce, Hadoop, etc.).
  • CAP 5328 – Algorithms and Optimization: Fundamentals of algorithms and measures of performance. Taught in Python, the course includes an exploration of efficient algorithms for sorting and retrieving data. Material covered over the semester includes graph algorithms and combinatorial optimization, dynamic programming, randomized algorithms and approximation algorithms.
  • CAP 5320 – Data Munging: Exploratory data analysis in the context of knowledge discovery, including the use of data visualization software. Inference, prediction and causal relationships. Multivariate models and independence. Resampling methods and nonparametric statistics with a focus on application to real data.

Year One: January Interterm

  • CAP 6303 – Graduate Independent Study Period: Topics and concepts that complement the Data Science graduate curriculum, such as recent trends or emerging concepts and technologies in Data Science. The period includes one or more workshops, seminars, or short courses on a variety of topics, generally delivered by guest speakers, instructors, or corporate partners. This is a required, zero-credit component of the graduate program.

Year One: Second Semester

  • CAP 5302 – Statistical Inference for Data Science II: Nonparametric methods and multivariate inference. Linear and nonlinear methods for dimension reduction; an introduction to Bayesian methods; graphical models and causal inference.
  • CAP 5738 – Data Visualization, Presentation, Reporting, and Reproducible Research: A project-centered introduction to the visual display of quantitative information for both knowledge discovery and the communication of results. Fundamentals of reproducible research in the context of consulting.
  • CAP 5327 – Distributed Computing for Data Science: Fundamentals concerning the design and maintenance of massively parallel data sets. Nonrelational databases and their management. Algorithms for parallel architectures and associated software tools including the MapReduce/Hadoop framework and BigTable.
  • CAP 5610 – Optimization and Machine Learning: Fundamentals of supervised and unsupervised learning with an emphasis on working with real data. An introduction to Bayesian analysis. Implementation of specific learning paradigms including regression, clustering, random forests, support vector machines, kernel methods and neural networks. Construction of hybrid classifiers.

Year Two: Third Semester

  • CAP 5323 – Practical Data Science: Analysis of data and creation of a data product for industry. Working in small groups, students analyze an industry-submitted data set from exploratory analysis, through construction and testing of hypotheses, to the construction and presentation of a data product to inform an industry-driven decision.
  • CAP 5931 – Topics in Computing for Data Science: Deep Learning: Advanced material involving computing and data science. Topics vary and may include image processing, text mining, nonrelational databases and their management, and software engineering for massively parallel structures.
  • CAP 5303 – Topics in Statistical Inference for Data Science: Time-Series and Forecasting: Advanced material involving statistical inference and massive data sets. Topics vary and may include survival analysis, time series and prediction, risk analysis, decision theory, the theory of social networks, distributed software for statistical inference and advanced topics in machine learning.

Year Two: Fourth Semester

  • CAP 5940 – Practicum: A full semester working in industry as part of a data science team, under the weekly supervision of Data Science faculty, to whom students submit reports.