This course focuses on data analysis in settings where the data is so large, dispersed or messy that machine processing is required to gather, clean and transform it into forms suitable for analysis. We also study computer-based techniques for the analysis of such data, including data visualization and machine learning. Finally, we consider how the practice of reproducible research and the development of interactive web-based applications can enhance communication of the results of data analysis.
MAT111 or CSC115 or PSY211 or permission of the instructor.
This course continues the Data Analysis thread of the minor. The language of instruction is R, which you studied in CSC 115, but since the course may be taken by students who have not had CSC 115, we'll start from scratch with the elements of R that we need for this course. You can think of this as a gentle review.
What will be new to you are the special contributed R packages, such as those that facilitate manipulation of data sets, and ggplot2, an elegant system of computer graphics for producing graphical summaries of data.
You'll also become acquainted with the new field known as machine learning, in which we use the computer to build models that make predictions (sometimes astonishingly accurate ones) in practical situations. The models we study, including classification and regression trees as well as random forests, bring us to the doorstep of the influential new discipline of data science.
Since Data Analysis encompasses the effective communication of what one has learned from data, we take a closer look at R Markdown—first studied in CSC 115—as a tool for writing data analysis reports quickly and easily. We also learn to write simple Shiny applications that permit non-technical users to do simple data analysis interactively over the web.
Students majoring in Biology or one of the social sciences may find that this course prepares them well for projects in their major that require statistics and analysis of data.