DATA_O 101 (3) Making Predictions with Data
Introduction to the techniques and software for handling real-world data. Topics include data cleaning, visualization, simulation, basic modelling, and prediction making. [3-1-0]
Techniques for computation, analysis, and visualization of data using software. Manipulation of small and large data sets. Databases. Automation using scripting. Real-world applications from life sciences, physical sciences, economics, engineering, or psychology. No prior computing background is required. Cannot be used for credits toward a major in Computer Science, Data Science, Mathematics, or Statistics. Credit will be granted for only one of COSC 301, DATA 301 or DATA 501. [3-2-0] Prerequisite: Third-year standing.
Theory and application of simple and multiple linear regression models, estimation, inference (confidence intervals, prediction intervals and hypothesis testing), polynomial regression, ANOVA and ANCOVA, variable selection, model adequacy and residual diagnostics. [3-0-0] Prerequisite: MATH 221 and one of STAT 205, STAT 230.
Regression, classification, resampling, model selection and validation, fundamental properties of matrices, dimension reduction, tree-based methods, unsupervised learning. [3-2-0] Prerequisite: Either (a) one of STAT 205, STAT 230 or (b) a score of more than 75% in one of APSC 254, BIOL 202, PSYO 373; and one of COSC 111, APSC 177.
Trends, stationary and nonstationary time series models, forecasting, seasonal models. [3-0-0] Prerequisite: One of STAT 205, STAT 230.
Pseudorandom number generation and testing. Simulation and modelling of univariate and multivariate data; stochastic models, including Poisson processes and Markov chains; MCMC simulation, hidden Markov models, and queuing systems. Credit will be granted for only one of COSC 405, DATA 405, COSC 505, or DATA 505. [3-2-0] Prerequisite: One of STAT 205, STAT 230 [with 60% or above].
Planning/practice of data collection. Pros/cons of both observational and experimental data. Survey samples: random sampling; bias and variance; unequal probability sampling; systematic, multistage, and stratified sampling; ratio and regression estimators. Experimental design: simple one-way comparisons; designs with randomization restrictions including blocking, split-plots, nested and repeated measures designs. Credit will be granted for only one of DATA 407 or STAT 507. [3-1-0] Prerequisite: One of STAT 205, STAT 230, PSYO 372, BIOL 202.
Regression, linear models, generalized linear models, additive models, generalized additive models, mixed models, theory and numerical performance. Credit will be granted for only one of DATA 410 or STAT 538. [3-0-0] Prerequisite: DATA 310.
Advanced or specialized topics in data science. Consult the department for the specific topic to be offered in any given year. This course may be taken more than once for credit with different topics. [3-0-0] Prerequisite: Fourth-year standing.
Investigation of a specific topic as agreed upon by the student and the faculty supervisor. Completion of a project and an oral presentation are required. Prerequisite: Third-year standing in the Data Science major or Honours, and permission of the department head.
Students will undertake a research project as agreed upon by the student, supervising faculty member, and unit head. A written thesis and a public presentation (poster or seminar) are required. Restricted to students in the B.Sc. Data Science Honours Program. Prerequisite: Fourth-year standing and permission of the department head.
Effective consulting practices, ethical considerations, methodology selection, data preparation, effective software development. Credit will be granted for only one of DATA 500 or STAT 400 when the subject matter is of the same nature.
Techniques for computation, analysis, and visualization of data using software. Manipulation of small and large data sets. Automation using scripting. Real-world applications from life sciences, physical sciences, engineering, or psychology. Credit will be granted for only one of COSC 301, DATA 301 or DATA 501.
Simulation methodology: data collection, model design, output analysis, optimization, validation. Credit will be granted for only one of COSC 405, DATA 405, COSC 505, or DATA 505.
Introduction to software and tools for Data Science. Setup process. Restricted to students in the MDS program.
Programming including decisions, loops, functions, and using data structures and libraries. Restricted to students in the MDS program.
Data structures including lists, queues, stacks, hash tables, trees and graphs. Recursion. Searching and sorting. Asymptotic complexity. Restricted to students in the MDS program.
Software life cycle. Licensing. Packaging. Testing and quality control. Version control. Collaborative environments. Restricted to students in the MDS program. Prerequisite: DATA 532.
Parallel and cloud computing architectures and program deployment. Restricted to students in the MDS program.
Using and querying relational and NoSQL databases for analysis. Experience with SQL, JSON, and programs that use databases. Restricted to students in the MDS program. Prerequisite: DATA 531.
Scripting engines for data science. Reporting tools. Automation. Restricted to students in the MDS program.
Manipulation of data using software tools. Data conversion, filtering, sorting, grouping, cleaning, parsing. Automation. Restricted to students in the MDS program. Prerequisite: All of DATA 532, DATA 540, DATA 541.
Fundamental techniques in the collection of data. Focus will be devoted to understanding the effects of randomization, restrictions on randomization, repeated measures, and blocking on model fitting. Restricted to students in the MDS program. Prerequisite: All of DATA 540, DATA 570.
Data visualization to produce graphs and images. Advanced data analysis on spreadsheets. Restricted to students in the MDS program. Prerequisite: All of DATA 530, DATA 531.
Data visualization using business intelligence and data analysis software. Interactive visualization. Production of visualizations for mobile and web. Restricted to students in the MDS program. Prerequisite: All of DATA 534, DATA 543, DATA 550.
Interpretation of data. Argumentation: hypothesis, claim, evidence and inference. Model limitations: bias, validity, reliability, sensitivity analysis. Communication of recommendations to decision-makers. Restricted to students in the MDS program.
Data privacy laws and expectations. Freedom of information. Ethics board. Licensing. Data security. Restricted to students in the MDS program.
Introduction to regression for Data Science. Simple linear regression, multiple linear regression, interactions, mixed variable types, model assessment, simple variable selection, k-nearest-neighbours regression. Restricted to students in the MDS program. Prerequisite: DATA 580.
Resampling techniques and regularization for linear models. Bootstrap, jackknife, cross-validation, ridge regression, lasso, discussion of tuning parameters. Restricted to students in the MDS program. Prerequisite: DATA 570.
Analysis of data with categorical responses. Logistic regression, k-nearest-neighbours classification, discriminant analysis, decision trees and random forests. Restricted to students in the MDS program. Prerequisite: DATA 571.
Analyses for data with unknown responses. Distance measures, hierarchical clustering, k-means, mixture models. Restricted to students in the MDS program. Prerequisite: DATA 572.
Pseudorandom number generation, testing and transformation to other discrete and continuous data types. Introduction to Poisson processes and the simulation of data from predictive models, as well as temporal and spatial models. Restricted to students in the MDS program.
Markov chains and their applications, for example, queueing and Markov Chain Monte Carlo. Restricted to students in the MDS program. Prerequisite: DATA 580.
Introduction to the Bayesian paradigm and tools for Data Science. Topics include Bayes' theorem, prior, likelihood and posterior. Detailed analysis of the binomial, normal-sample, and normal linear regression cases. A significant focus will be on computational aspects of Bayesian problems using software packages. Restricted to students in the MDS program. Prerequisite: All of DATA 572, DATA 581.
Splines. Smoothing. Generalized linear models. Generalized additive models. An introduction to mixed models. Restricted to students in the MDS program. Prerequisite: All of DATA 572, DATA 581.
Modelling using mathematical programming. Fundamental continuous and discrete optimization algorithms. Optimization software for small to medium scale problems. Optimization algorithms for data science. Restricted to students in the MDS program. Prerequisite: DATA 580.
Neural networks, backpropagation, deep learning. Restricted to students in the MDS program. Prerequisite: DATA 580.
Advanced or specialized topic in Data Science with applications to specific data sets. Restricted to students in the MDS program. Prerequisite: DATA 543.
A capstone design project intended to give students experience in performing data science on a complex multi-disciplinary project. Restricted to students in the MDS program. Prerequisite: All of DATA 583, DATA 586.