Regression and Other Stories - Examples
Introduction
The code and data are provided to fully reproduce the examples and figures in the book. They can be a good way to see what the code does. Different people have different styles of code. The code here is not supposed to be a model. The statistical analyses and graphs in the book are intended to be models for good practice, but the code here is meant to be simple with minimal dependencies.
For R programming basics see Appendix A of Regression and Other Stories. If you want to learn more, see our recommendations for R programming and visualization with R.
The folders below (ending in /) point to the code (.R and .Rmd) and data folders (.csv or .txt plus codebooks) on GitHub, and the .html files point to knitted notebooks.
Most examples have cleaned data in a .csv file in the data subfolder for easy experimenting. For completeness and reproducibility, the data subfolders also contain the raw data and a *_setup.R file showing how the data were pre-processed (to do the exercises and follow along with the examples, you don’t need to worry about the setup code). Most data folders also include a codebook explaining the column names.
For easy access to the data sets, there is an R package, rosdata. You can install it with the command remotes::install_github("avehtari/ROS-Examples", subdir = "rpackage"). Then you can access the data with, for example, library(rosdata), data(wells), head(wells). You can get the list of data sets with ?rosdata.
When running the notebooks, the rprojroot package is used to set the project root directory, so there is no need to switch the working directory. The downloaded git repository can be placed anywhere you like, and you can rename the ROS-Examples directory if you wish. When running the code, it is sufficient that the working directory is any directory inside ROS-Examples (or its renamed equivalent). Running
library("rprojroot")
root <- has_file(".ROS-Examples-root")$make_fix_file()
finds the file .ROS-Examples-root, which is in the ROS-Examples directory, and sets the full path relative to it. Then, for example,
wells <- read.csv(root("Arsenic/data","wells.csv"))
finds the wells.csv file, no matter where you have placed or renamed the ROS-Examples directory. When you switch to another example, there is no need to switch the working directory.
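The root lookup described above can be tried without cloning the repository. The following is a minimal sketch, assuming only that the rprojroot package is installed. It builds a throwaway directory that imitates the repository layout, starts the search from a nested folder, and reads a small file through root(). The demo_root directory and the two-row wells.csv are invented stand-ins for illustration only, not the real data.

```r
library(rprojroot)

# Hypothetical stand-in for the repository, created in a temporary directory.
demo_root <- file.path(tempdir(), "ROS-demo")
data_dir <- file.path(demo_root, "Arsenic", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# The marker file that identifies the project root.
invisible(file.create(file.path(demo_root, ".ROS-Examples-root")))

# Toy data with made-up values; the real wells.csv has more columns and rows.
toy <- data.frame(arsenic = c(1.5, 0.8), dist = c(20, 45))
write.csv(toy, file.path(data_dir, "wells.csv"), row.names = FALSE)

# Start the search two levels below the root. The path argument plays the
# role that the working directory plays in the notebooks.
root <- has_file(".ROS-Examples-root")$make_fix_file(path = data_dir)

# root() builds paths relative to the directory holding the marker file.
wells <- read.csv(root("Arsenic/data", "wells.csv"))
print(wells)
```

In the notebooks the path argument is omitted, so the search starts from the current working directory and walks upward until it finds the marker file.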
Examples by chapters
1 Introduction
- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- ElectricCompany/
- electric.html - Analysis of “Electric company” data
- Peacekeeping/
- peace.html - Outcomes after civil war in countries with and without United Nations peacekeeping
- SimpleCausal/
- causal.html - Simple graphs illustrating regression for causal inference
- Helicopters/
- helicopters.html - Example data file for helicopter flying time exercise
2 Data and measurement
- HDI/
- hdi.html - Human Development Index - Looking at data in different ways
- Pew/
- pew.html - Miscellaneous analyses using raw Pew data
- HealthExpenditure/
- healthexpenditure.html - Discovery through graphs of data and models
- Names/
- names.html - Names - Distributions of names of American babies
- lastletters.html - Last letters - Distributions of last letters of names of American babies
- AgePeriodCohort/
- births.html - Age adjustment
- Congress/
- congress_plots.html - Predictive uncertainty for congressional elections
3 Some basic methods in mathematics and probability
- Mile/
- mile.html - Trend of record times in the mile run
- Metabolic/
- metabolic.html - How to interpret a power law or log-log regression
- Earnings/
- height_and_weight.html - Predict weight
- CentralLimitTheorem/
- heightweight.html - Illustrate central limit theorem and normal distribution
- Stents/
- stents.html - Stents - comparing distributions
4 Generative models and statistical inference
- Coverage/
- coverage.html - Example of coverage
- Death/
- polls.html - Proportion of American adults supporting the death penalty
- Coop/
- riverbay.html - Example of hypothesis testing
- Girls/
5 Simulation
- ProbabilitySimulation/
- probsim.html - Simulation of probability models
- Earnings/
- earnings_bootstrap.html - Bootstrapping to simulate the sampling distribution
6 Background on regression modeling
- Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor
- Earnings/
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.
- PearsonLee/
- heights.html - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.
- FakeMidtermFinal/
- simulation.html - Fake dataset of 1000 students’ scores on a midterm and final exam
7 Linear regression with a single predictor
- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- hills.html - Present uncertainty in parameter estimates
- hibbs_coverage.html - Checking the coverage of intervals
- Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor
8 Fitting regression models
- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- Influence/
- influence.html - Influence of individual points in a fitted regression
9 Prediction and Bayesian inference
- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- bayes.html - Demonstration of Bayesian information aggregation
- SexRatio/
- sexratio.html - Example where an informative prior makes a difference
- Earnings/
- height_and_weight.html - Predict weight
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.
10 Linear regression with multiple predictors
- KidIQ/
- kidiq.html - Linear regression with multiple predictors
- Earnings/
- height_and_weight.html - Predict weight
- Congress/
- congress.html - Predictive uncertainty for congressional elections
- NES/
- nes_linear.html - Fitting the same regression to many datasets
- Beauty/
- beauty.html - Student evaluations of instructors’ beauty and teaching quality
11 Assumptions, diagnostics, and model evaluation
- KidIQ/
- kidiq.html - Linear regression with multiple predictors
- kidiq_loo.html - Linear regression and leave-one-out cross-validation
- kidiq_R2.html - Linear regression and Bayes-R2 and LOO-R2
- kidiq_kcv.html - Linear regression and K-fold cross-validation
- Residuals/
- residuals.html - Plotting the data and fitted model
- Introclass/
- residual_plots.html - Plot residuals vs. predicted values, or residuals vs. observed values?
- Newcomb/
- newcomb.html - Posterior predictive checking of Normal model for Newcomb’s speed of light data
- Unemployment/
- unemployment.html - Time series fit and posterior predictive model checking for unemployment series
- Rsquared/
- rsquared.html - Bayesian R^2
- CrossValidation/
- crossvalidation.html - Demonstration of cross validation
- FakeKCV/
- fake_kcv.html - Demonstration of K-fold cross-validation using simulated data
- Pyth/
12 Transformations
- KidIQ/
- kidiq.html - Linear regression with multiple predictors
- Earnings/
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.
- Gay/
- gay_simple.html - Simple models (linear and discretized age) and political attitudes as a function of age
- Mesquite/
- mesquite.html - Predicting the yields of mesquite bushes
- Student/
- student.html - Models for regression coefficients
- Pollution/
- pollution.html - Pollution data.
13 Logistic regression
- NES/
- nes_logistic.html - Logistic regression, identifiability, and separation
- LogisticPriors/
- logistic_priors.html - Effect of priors in logistic regression
- Arsenic/
- arsenic_logistic_building.html - Building a logistic regression model: wells in Bangladesh
14 Working with logistic regression
- LogitGraphs/
- logitgraphs.html - Different ways of displaying logistic regression
- NES/
- nes_logistic.html - Logistic regression, identifiability, and separation
- Rodents/
- Arsenic/
- arsenic_logistic_residuals.html - Residual plots for a logistic regression model: wells in Bangladesh
- arsenic_logistic_apc.html - Average predictive comparisons for a logistic regression model: wells in Bangladesh
15 Other generalized linear models
- PoissonExample/
- PoissonExample.html - Demonstrate Poisson regression with simulated data.
- Roaches/
- roaches.html - Analyse the effect of integrated pest management on reducing cockroach levels in urban apartments
- Storable/
- storable.html - Ordered categorical data analysis with a study from experimental economics, on the topic of “storable votes.”
- Robit/
- robit.html - Comparison of robit and logit models for binary data
- Earnings/
- earnings_compound.html - Compound discrete-continuous model
- RiskyBehavior/
- risky.html - Risky behavior data.
- NES/
- Lalonde/
- Congress/
- AcademyAwards/
16 Design and sample size decisions
- ElectricCompany/
- electric.html - Analysis of “Electric company” data
- SampleSize/
- simulation.html - Sample size simulation
- FakeMidtermFinal/
- simulation_based_design.html - Fake dataset of a randomized experiment on student grades
17 Poststratification and missing-data imputation
- Poststrat/
- poststrat.html - Poststratification after estimation
- poststrat2.html - Poststratification after estimation
- Imputation/
- imputation.html - Regression-based imputation for the Social Indicators Survey
- imputation_gg.html - Regression-based imputation for the Social Indicators Survey, dplyr/ggplot version
18 Causal inference basics and randomized experiments
- Sesame/
- sesame.html - Causal analysis of Sesame Street experiment
19 Causal inference using regression on the treatment variable
- ElectricCompany/
- electric.html - Analysis of “Electric company” data
- Incentives/
- incentives.html - Simple analysis of incentives data
- Cows/
20 Observational studies with all confounders assumed to be measured
- ElectricCompany/
- electric.html - Analysis of “Electric company” data
- Childcare/
- childcare.html - Infant Health and Development Program (IHDP) example.
21 More advanced topics in causal inference
- Sesame/
- sesame.html - Causal analysis of Sesame Street experiment
- Bypass/
- ChileSchools/
- chile_schools.html - ChileSchools example.
22 Advanced regression and multilevel models
- Golf/
- golf.html - Golf putting accuracy: Fitting a nonlinear model using Stan
- Gay/
- gay.html - Nonlinear models (Loess, B-spline, GP-spline, and BART) and political attitudes as a function of age
- ElectionsEconomy/
- hibbs.html - Predicting presidential vote share from the economy
- Scalability/
- scalability.html - Demonstrate computation speed with 100 000 observations.
Appendix A
- Coins/
- Mile/
- mile.html - Trend of record times in the mile run
- Parabola/
- parabola.html - Demonstration of using Stan for optimization
- Restaurant/
- restaurant.html - Demonstration of using Stan for optimization
- DifferentSoftware/
- linear.html - Linear regression using different software options
Examples alphabetically
- AcademyAwards/
- AgePeriodCohort/
- births.html - Age adjustment
- Arsenic/
- arsenic_logistic_building.html - Building a logistic regression model: wells in Bangladesh
- arsenic_logistic_residuals.html - Residual plots for a logistic regression model: wells in Bangladesh
- arsenic_logistic_apc.html - Average predictive comparisons for a logistic regression model: wells in Bangladesh
- arsenic_logistic_building_optimizing.html - Building a logistic regression model: wells in Bangladesh. A version with normal approximation at the mode.
- Balance/
- Beauty/
- beauty.html - Student evaluations of instructors’ beauty and teaching quality
- Bypass/
- CausalDiagram/
- diagrams.html - Plot causal diagram
- CentralLimitTheorem/
- heightweight.html - Illustrate central limit theorem and normal distribution
- Childcare/
- childcare.html - Infant Health and Development Program (IHDP) example.
- ChileSchools/
- chile_schools.html - ChileSchools example.
- Coins/
- Congress/
- congress.html - Predictive uncertainty for congressional elections
- congress_plots.html - Predictive uncertainty for congressional elections
- Coop/
- riverbay.html - Example of hypothesis testing
- Coverage/
- coverage.html - Example of coverage
- Cows/
- CrossValidation/
- crossvalidation.html - Demonstration of cross validation
- SampleSize/
- simulation.html - Sample size simulation
- Death/
- polls.html - Proportion of American adults supporting the death penalty
- DifferentSoftware/
- linear.html - Linear regression using different software options
- Earnings/
- earnings_regression.html - Predict respondents’ yearly earnings using survey data from 1990.
- earnings_bootstrap.html - Bootstrapping to simulate the sampling distribution
- earnings_compound.html - Compound discrete-continuous model
- height_and_weight.html - Predict weight
- ElectionsEconomy/
- bayes.html - Demonstration of Bayesian information aggregation
- hibbs.html - Predicting presidential vote share from the economy
- hills.html - Present uncertainty in parameter estimates
- hibbs_coverage.html - Checking the model-fitting procedure using fake-data simulation.
- ElectricCompany/
- electric.html - Analysis of “Electric company” data
- FakeKCV/
- fake_kcv.html - Demonstration of K-fold cross-validation using simulated data
- FakeMidtermFinal/
- simulation.html - Fake dataset of 1000 students’ scores on a midterm and final exam
- simulation_based_design.html - Fake dataset of a randomized experiment on student grades
- FrenchElection/
- ps_primaire.html - French Election data
- Gay/
- gay_simple.html - Simple models (linear and discretized age) and political attitudes as a function of age
- gay.html - Nonlinear models (Loess, B-spline, GP-spline, and BART) and political attitudes as a function of age
- Girls/
- Golf/
- golf.html - Golf putting accuracy: Fitting a nonlinear model using Stan
- HDI/
- hdi.html - Human Development Index - Looking at data in different ways
- HealthExpenditure/
- healthexpenditure.html - Discovery through graphs of data and models
- Helicopters/
- helicopters.html - Example data file for helicopter flying time exercise
- Imputation/
- imputation.html - Regression-based imputation for the Social Indicators Survey
- imputation_gg.html - Regression-based imputation for the Social Indicators Survey, dplyr/ggplot version
- Incentives/
- incentives.html - Simple analysis of incentives data
- Influence/
- influence.html - Influence of individual points in a fitted regression
- Interactions/
- interactions.html - Plot interaction example figure
- Introclass/
- residual_plots.html - Plot residuals vs. predicted values, or residuals vs. observed values?
- KidIQ/
- kidiq.html - Linear regression with multiple predictors
- kidiq_loo.html - Linear regression and leave-one-out cross-validation
- kidiq_R2.html - Linear regression and Bayes-R2 and LOO-R2
- kidiq_kcv.html - Linear regression and K-fold cross-validation
- Lalonde/
- LogisticPriors/
- logistic_priors.html - Effect of priors in logistic regression
- Mesquite/
- mesquite.html - Predicting the yields of mesquite bushes
- Metabolic/
- metabolic.html - How to interpret a power law or log-log regression
- Mile/
- mile.html - Trend of record times in the mile run
- Names/
- names.html - Names - Distributions of names of American babies
- lastletters.html - Last letters - Distributions of last letters of names of American babies
- NES/
- nes_linear.html - Fitting the same regression to many datasets
- nes_logistic.html - Logistic regression, identifiability, and separation
- Newcomb/
- newcomb.html - Posterior predictive checking of Normal model for Newcomb’s speed of light data
- Parabola/
- parabola.html - Demonstration of using Stan for optimization
- Peacekeeping/
- peace.html - Outcomes after civil war in countries with and without United Nations peacekeeping
- PearsonLee/
- heights.html - The heredity of height. Published in 1903 by Karl Pearson and Alice Lee.
- Pew/
- pew.html - Miscellaneous analyses using raw Pew data
- PoissonExample/
- poissonexample.html - Demonstrate Poisson regression with simulated data.
- Pollution/
- pollution.html - Pollution data.
- Poststrat/
- poststrat.html - Poststratification after estimation
- poststrat2.html - Poststratification after estimation
- ProbabilitySimulation/
- probsim.html - Simulation of probability models
- Pyth/
- Redistricting/
- Residuals/
- residuals.html - Plotting the data and fitted model
- Restaurant/
- restaurant.html - Demonstration of using Stan for optimization
- RiskyBehavior/
- risky.html - Risky behavior data.
- Roaches/
- roaches.html - Analyse the effect of integrated pest management on reducing cockroach levels in urban apartments
- Robit/
- robit.html - Comparison of robit and logit models for binary data
- Rodents/
- Rsquared/
- rsquared.html - Bayesian R^2
- Sesame/
- sesame.html - Causal analysis of Sesame Street experiment
- SexRatio/
- sexratio.html - Example where an informative prior makes a difference
- SimpleCausal/
- causal.html - Simple graphs illustrating regression for causal inference
- Simplest/
- simplest.html - Linear regression with a single predictor
- simplest_lm.html - Linear least squares regression with a single predictor
- Stents/
- stents.html - Stents - comparing distributions
- Storable/
- storable.html - Ordered categorical data analysis with a study from experimental economics, on the topic of “storable votes.”
- Student/
- student.html - Models for regression coefficients
- Unemployment/
- unemployment.html - Time series fit and posterior predictive model checking for unemployment series
Python code
Ravin Kumar, Tomás Capretto, and Osvaldo Martin are porting the ROS examples to Python using bambi (BAyesian Model-Building Interface), which has a formula syntax similar to that of rstanarm and brms.