Prerequisites

R is installed
Working knowledge of basic algebra
- you know what PEMDAS stands for
- you know what substitution is
  - e.g. solve for y when y = 3x - 3 and x = 1
Working knowledge of basic statistics
- mean, median, mode
- you kind of remember what a t-test is and what a p-value is
Working knowledge of basic computer science principles
- logical values / statements (e.g. true, false, true and false)
- iteration

If you feel iffy about any of that, a quick Google search will help, or just send it and continue.
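
As a quick self-check for the prerequisites above, here is a minimal sketch in R. The variable names (`x`, `y`, `pemdas`, `both`, `total`) are only illustrative, and the PEMDAS expression is an example of my own choosing. If every line makes sense to you, you are probably ready to continue.

```r
# Substitution: solve for y when y = 3x - 3 and x = 1
x <- 1
y <- 3 * x - 3
print(y)  # 0

# PEMDAS: exponent first, then multiplication, then addition
pemdas <- 2 + 3 * 4^2
print(pemdas)  # 50

# Logical values: "true and false" is false
both <- TRUE & FALSE
print(both)  # FALSE

# Iteration: add up the numbers 1 through 3
total <- 0
for (i in 1:3) {
  total <- total + i
}
print(total)  # 6
```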

Outline

This outline guides what I will write and cover. It lists the things I think are important to know as a data scientist.

the basics
- R as a calculator
- data types
- functions
- indexing into objects
- tidyverse
- tibble
- selecting
- filter
- grouping
- summarising
- plotting
- reshaping
- presenting (Rmd)
data visualization
- univariate
  - histogram
  - density plot (pdf / cdf)
  - violin plot
  - boxplot
- multivariate
  - scatter plot (2 numeric)
  - bar plot (1 nominal 1 numeric)
  - boxplot (1 nominal 1 numeric)
  - violin plot (1 nominal 1 numeric)
map making (GIS)
- projections
- coordinate reference systems
- vector data
  - shapefiles
  - sf package
  - oh look, it's a data frame
  - making the map
- density maps
  - point density
  - kernel density
statistical modeling (machine learning)
- supervised learning
  - Linear Regression (continuous prediction)
    - evaluation: R^2, MSE, RMSE, MAE
  - Logistic Regression (classification)
    - evaluation: confusion matrix, AUC, ROC
data preprocessing
- cross validation (k-fold)
  - train, test, validate
- scaling & centering (standardization)
- normalization
- imputation
- dimensionality reduction
  - PCA / LDA
  - backward feature selection
supervised continued
- k-nearest neighbors
- Support Vector Machines (classification)
- Naive Bayes (classification)
- random forest (regression / classification)
- gradient boosted trees (regression / classification)
unsupervised learning
- hierarchical clustering
- k-means
- neural networks
natural language processing / text mining