# R

##### University of Chicago Courses

## CMSC 11900: Introduction to Data Science II *Intermediate*

Background on probability and statistical methodology. Training in RStudio will be provided. Advanced topics include data privacy and ethics, reproducibility in science, data encryption, and basic machine learning. These topics are explored through real-world problems. No prerequisites.

## CMSC 25300: Mathematical Foundations of Machine Learning *Intermediate*

Introduction to mathematical foundations of machine learning, focusing on matrix methods. Mathematical topics include linear equations, regression, regularization, the singular value decomposition, and iterative algorithms. Machine learning topics include the lasso, support vector machines, kernel methods, clustering, dictionary learning, neural networks, and deep learning. Features real-world applications ranging from classification and clustering to denoising and data analysis. Background in calculus and exposure to numerical computing via Matlab, Python, Julia, or R is preferred. Prerequisites: 2 quarters of calculus (MATH 153 or higher) or equivalent, and CMSC 11900 or CMSC 12200 or CMSC 15200 or CMSC 16200.

## CMSC 25025: Machine Learning and Large-Scale Data *Advanced*

Introduction to machine learning and the analysis of large data sets using distributed computation and storage infrastructure. Lectures present basic machine learning methodology and relevant statistical theory. Homework exercises will give hands-on experience with different types of data. Methods include algorithms for clustering, binary classification, and hierarchical Bayesian modeling. Data types include images, archives of scientific articles, online ad clickthrough logs, and public records of the City of Chicago. Programming will be based on Python and R, but previous exposure to these languages is not assumed. Prerequisites: CMSC 15400 or CMSC 12200, and STAT 22000 or STAT 23400.

##### Online Courses

## Statistics and R with Harvard *Beginner*

Introductory course to statistics and R programming for analyzing data in the life sciences. Topics covered include random variables, distributions, inference (p-values and confidence intervals), exploratory data analysis, and non-parametric statistics. 4 weeks with 2-4 hours per week to complete. Self-paced.

## R for Data Science with Microsoft *Beginner*

R is a language widely used for data science and statistics. This is an introductory course that will help you master the basics of R, including vectors, matrices, factors, data frames, lists, and basic data visualization. No prior knowledge in programming or data science is required. 4 weeks with 2-3 hours per week to complete. Self-paced.

## Software Carpentry: R for Reproducible Scientific Analysis *Beginner*

Teaches programmers how to write modular code and perform data analysis in R, focusing on the fundamentals of the language.

## USGS Introduction to R Curriculum *Beginner*

A curriculum that covers the basic syntax of R and how to use it in practice. Installation, functions, packages, and data analysis are all covered.

## Data Science for Ecologists and Environmental Scientists Course with Coding Club *Beginner, Intermediate*

Free and self-paced journey through a tailored selection of Coding Club tutorials, quizzes and practical challenges.

## Software Carpentry: Programming with R *Intermediate*

Data analysis using basic programming principles.

## Linear Models and Matrix Algebra with Harvard *Intermediate*

This course provides an introduction to using R to apply linear models to analyze data for the life sciences. Topics include matrix algebra notation, matrix algebra operations, application of matrix algebra to data analysis, linear models, and an introduction to the QR decomposition.

Basic statistics knowledge will help for this course. 4 weeks with 2-4 hours per week to complete. Self-paced.

## The Analytics Edge with MIT *Intermediate*

This MIT course teaches analytics methods including linear regression, logistic regression, trees, text analytics, clustering, visualization, and optimization. R will be used to build models and work with the data. Real-world examples will be detailed, including Netflix, Moneyball, IBM, Twitter, and more. Course consists of lecture videos, homework assignments using R, recitations, and a final exam.

## Earth Analytics Course: Learn Data Science *Advanced*

An upper-level online course that teaches students to use computationally intensive techniques to address scientific questions. This course will use the R scientific programming environment and the RStudio graphical interface to work with data.

##### Video Lectures

## (CDA) Intro to R Beginner *Beginner*

Introductory course that covers basic R syntax, input and output, and basic statistical analysis.

## Geostats Guy YouTube Lectures *Beginner, Intermediate*

Covers material ranging from introductory data analytics and geostatistics to spatial data analytics and machine learning. He also covers programming languages such as R and Python and gives lectures on how to apply them to real-world data.

##### Tutorials

## Coding Club Tutorials *Beginner*

Tutorials on topics such as basics of R, Python, data manipulation and visualisation, modelling, spatial data, Google Earth Engine, and Fortran.

## R Tutorial at CU Boulder *Intermediate*

R is an open-source programming language that can help you speed up and automate tedious tasks like downloading large datasets, visualizing data, or performing repetitive calculations that you might otherwise have to do manually. These free tutorials explore how the R programming language can be used for earth data science.

## Karl Broman Tutorials *Intermediate*

These tutorials cover the basics of R and data management with spreadsheets and Git/GitHub. He also provides links to some useful textbooks and manuals.

## EDI GitHub *Intermediate, Advanced*

A collection of repositories for training in working with data sets, including topics like data set construction, data management, data package design, data portal-related software, and more.

##### Workshops

## Workshops at CU Boulder *Intermediate*

Explore a variety of workshops covering a wide range of topics, from finding and managing data to spatial data analysis. Each workshop teaches a specific workflow using a tool commonly used in the earth data science field.

## EDI Workshops *Intermediate*

Notes on various EDI workshops ranging from data publication training to hackathons.

##### Books

## Introduction to Open Data Science *Beginner*

A free textbook created by the Ocean Health Index (OHI) to train people to use R and GitHub to improve their data science analyses. OHI also provides its own set of additional literature resources.

## Advanced R *Intermediate*

This is the website for the 2nd edition of “Advanced R”, a book in Chapman & Hall’s R Series. The book is designed primarily for R users who want to improve their programming skills and understanding of the language. It should also be useful for programmers coming to R from other languages, as it helps you to understand why R works the way it does.

##### Newsletters

## R-bloggers *Intermediate, Advanced*

A blog aggregator of content contributed by bloggers who write about R (in English). The site helps R bloggers and users to connect and follow the “R blogosphere”.