R for Social Science
Josiah Parry
2018-12-29
Prerequisites
- R is installed
- Working knowledge of basic algebra
- you know what PEMDAS stands for
- you know what substitution is
- e.g. solve for `y` when `y = 3x - 3` and `x = 1`
- Working knowledge of basic statistics
- mean, median, mode
- you kind of remember what a t-test is and what a p-value is
- Working knowledge of basic computer science principles
- logical values / statements (e.g. `true`, `false`, `true and false`, etc.)
- iteration
If you feel iffy about any of that, a quick Google search will help, or just send it and continue.
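As a quick self-check, the prerequisites above can be sketched in a few lines of base R. This is only an illustrative sketch: the `scores` vector is made-up data, and the variable names are assumptions chosen for the example.

```r
# Substitution: solve for y when y = 3x - 3 and x = 1
x <- 1
y <- 3 * x - 3
y              # 0

# PEMDAS: multiplication happens before subtraction unless parentheses say otherwise
3 * x - 3      # 0
3 * (x - 3)    # -6

# Logical values and statements
TRUE & FALSE   # FALSE (true and false)
TRUE | FALSE   # TRUE  (true or false)
x == 1         # TRUE

# Basic statistics on a small, made-up vector (hypothetical data)
scores <- c(2, 4, 4, 7, 10)
mean(scores)   # 5.4
median(scores) # 4

# Iteration: visit each value in turn
for (s in scores) {
  print(s)
}
```

If each of these lines does what you expect, you have the background this material assumes.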
Outline
This outline guides what I will write and cover. It lists the things I think are important to know as a data scientist.
- the basics
- R as a calculator
- data types
- functions
- selecting object indexes
- tidyverse
- tibble
- selecting
- filter
- grouping
- summarising
- plotting
- reshaping
- presenting (Rmd)
- data visualization
- univariate
- histogram
- density plot (pdf / cdf)
- violin plot
- boxplot
- multivariate
- scatter plot (2 numeric)
- bar plot (1 nominal 1 numeric)
- boxplot (1 nominal 1 numeric)
- violin plot (1 nominal 1 numeric)
- map making (GIS)
- projections
- coordinate reference systems
- vector data
- shapefiles
- sf package
- oh look, it's a data frame
- making the map
- density maps
- point density
- kernel density
- statistical modeling (machine learning)
- supervised learning
- Linear Regression (continuous prediction)
- evaluation: R^2, MSE, RMSE, MAE
- Logistic Regression (classification)
- evaluation: confusion matrix, AUC, ROC
- data preprocessing
- cross validation (k-fold)
- train, test, validate
- scaling & centering (standardization)
- normalization
- imputation
- dimensionality reduction
- PCA / LDA
- backward feature selection
- supervised continued
- k-nearest neighbors
- Support Vector Machines (classification)
- Naive Bayes (classification)
- random forest (regression / classification)
- gradient boosted trees (regression / classification)
- Unsupervised learning
- hierarchical clustering
- k-means
- neural networks
- natural language processing / text mining