# GC Computer Science

Train your mind, create your career.

## CSC 303 Fundamentals of Data Computing

### Course Description

This course focuses on data analysis in settings where the data is so large,
dispersed, or messy that machine processing is required to gather, clean, and
transform it into forms suitable for analysis. We also study computer-based
techniques for analyzing such data, including data visualization
and machine learning. Finally, we consider how the practice of reproducible research
and the development of interactive web-based applications can enhance
communication of the results of data analysis.

### Prerequisites

MAT 111, CSC 115, or PSY 211, or permission of the instructor.

### When Offered

Fall semesters.

### More Info

This course continues the Data Analysis thread of the minor. The language of instruction
is R, which you studied in CSC 115, but since the course may be taken by students who
have not had CSC 115, we'll start from scratch with the elements of R that we need
for this course. If you have taken CSC 115, you can think of this as a gentle review.

What will be new to you are the contributed R packages, such as `dplyr`,
which facilitates manipulation of data sets, and `ggplot2`, an elegant
graphics system for producing graphical summaries of data.

You'll also become acquainted with the new field known as *machine learning*,
in which we use the computer to build models that make predictions (sometimes astonishingly
accurate ones) in practical situations. The models we study, including classification
and regression trees, as well as random forests, bring us to the doorstep of the
influential new discipline of *data science*.

Since Data Analysis encompasses the effective communication of what one has learned from data,
we take a closer look at R Markdown—first studied
in CSC 115—as a tool for writing data analysis reports quickly and easily. We also learn
to write simple Shiny applications that permit non-technical
users to do simple data analysis interactively over the web.

Students majoring in Biology or one of the social sciences may find that this
course prepares them well for projects in their major that require statistics and analysis of
data.
