Using Data Science To Lay Odds On The Middletown Bike Rally

Posted By: ELK1nG
Published 12/2023
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 254.56 MB | Duration: 1h 7m

Grit, genes or privilege - which matters most?

What you'll learn

How correlation between variables is measured and interpreted

How simple and multiple regression equations are estimated and interpreted

How and why multiple trials of a random variable that is a sum of random variables approach a normal distribution

How probability distributions for random variables are constructed from regression equations

How probability distributions are used to predict outcomes for random variables
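
The first two topics above can be illustrated with a minimal sketch. The grit scores and miles below are invented for illustration and are not taken from the course dataset; the helper names (`pearson_r`, `simple_regression`, `r_squared`) are likewise hypothetical. The sketch computes a correlation coefficient, fits a one-variable regression by least squares, and derives R-squared from the fitted line.

```python
from math import sqrt

# Hypothetical data for six riders (invented for illustration only,
# not the 30-rider dataset used in the course).
grit = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
miles = [18.0, 22.0, 27.0, 30.0, 36.0, 41.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares coefficient (slope) and constant (intercept) for y = constant + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    my = mean(y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - my) ** 2 for b in y)
    return 1 - ss_residual / ss_total


r = pearson_r(grit, miles)
slope, intercept = simple_regression(grit, miles)
r2 = r_squared(grit, miles, slope, intercept)

print(f"correlation: {r:.3f}")
print(f"miles = {intercept:.2f} + {slope:.2f} * grit")
print(f"R-squared: {r2:.3f}")
```

With a single explanatory variable, R-squared equals the square of the correlation coefficient, which is one way to check the two calculations against each other.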

Requirements

First-year algebra. No prior knowledge of probability, statistics or data science is required. Familiarity with Microsoft Excel may be helpful, but is not required.

Description

The 10th grade math class at Middletown High School is using data science tools to explain the results of last summer’s Middletown bike rally. To construct a dataset, they use the miles covered by each of the 30 riders in the rally as the dependent variable. The independent explanatory variables they choose are motivation (“grit”), aptitude (“genes”) and the amounts riders’ parents spend on their kids’ equipment and training for the rally (“privilege”). Based on answers to a questionnaire, each rider is given a grit score, a gene score and a privilege score.

The class uses several data science tools to analyze the dataset to determine how much of the variation in rider performance is explained by grit, genes and privilege, and to predict rider performance in next summer’s rally.

They begin by looking at the correlations between rider performance and the explanatory variables. They learn how correlation is calculated, and how to interpret strong, weak, positive and negative correlation.

The class then performs simple regressions on rider performance using each of the explanatory variables in turn. Each regression produces an equation whose coefficient and constant describe the relationship between rider performance and grit, genes or privilege.

The class then looks at how the R-squared value reported in each regression is calculated. R-squared measures the percentage of variation in the dependent variable that is explained by variation in the explanatory variable.

To understand the combined explanatory effect of grit, genes and privilege on rider performance, the class proceeds to use multiple regression on the dataset. Multiple regression estimates coefficients and a constant for a single equation that includes all three explanatory variables.

Having estimated the equation that best explains rider performance last summer, the class then learns how the regression equation can be used to predict rider performance in next summer’s rally.

The starting point here is to understand that rider performance next summer can be seen as a random variable, because it is the sum of random variables, each represented by one of the terms of the regression equation.

The class then looks at frequency distributions that result after multiple trials of a random variable that is the sum of random variables. They see that as the number of trials increases, the distribution takes on the bell shape of the so-called normal distribution.

Moving to the next step, the class considers how a frequency distribution can also be thought of as a probability distribution. The class learns how to build a normal probability distribution for a random variable by using the mean or expected value of the variable together with the variable's standard error, which measures how widely multiple trials of the variable are spread around the mean value.

The class is now ready to use the multiple regression equation to build the probability distribution for a rider's performance next summer. For any given rider, the equation calculates the expected number of miles the rider will cover based on his or her scores. The regression also calculates the standard error of the estimate.

In the final stage of the analysis, the class uses probability distributions to calculate the odds of various outcomes in next summer's rally: for example, the odds that Gina will ride more than 35 miles, or the odds that Gina will ride further than her brother Joey.
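
The final step described above can be sketched as follows. The expected miles for Gina and Joey and the standard error are invented numbers, not results from the course, and the comparison between the two riders additionally assumes that their prediction errors are independent and share the same standard error. The sketch uses `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (invented for illustration): expected miles that a
# regression equation might produce for each rider, and the standard error
# of the estimate.
gina_expected = 32.0
joey_expected = 30.0
std_error = 4.0

# Probability distribution for Gina's miles next summer.
gina = NormalDist(mu=gina_expected, sigma=std_error)

# Probability that Gina rides more than 35 miles.
p_gina_over_35 = 1 - gina.cdf(35.0)

# Probability that Gina rides further than Joey. Assuming independent
# errors, the difference (Gina - Joey) is normal with mean equal to the
# difference in expected miles and standard deviation sqrt(2) * std_error.
difference = NormalDist(
    mu=gina_expected - joey_expected,
    sigma=sqrt(2) * std_error,
)
p_gina_beats_joey = 1 - difference.cdf(0.0)


def to_odds(probability):
    """Convert a probability into odds in favor (p divided by 1 - p)."""
    return probability / (1 - probability)


print(f"P(Gina > 35 miles): {p_gina_over_35:.3f}")
print(f"P(Gina beats Joey): {p_gina_beats_joey:.3f}")
print(f"Odds Gina beats Joey: {to_odds(p_gina_beats_joey):.2f} to 1")
```

If the two riders' errors were correlated, for instance because both depend on the same weather on rally day, the standard deviation of the difference would change and the independence assumption would need to be revisited.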

Overview

Section 1: Introduction

Lecture 1 Introduction

Section 2: The Middletown Bike Rally

Lecture 2 Day one: the Middletown Bike Rally

Lecture 3 Day two: Introducing the rally dataset

Section 3: Explaining rider performance

Lecture 4 Day 3: Correlation

Lecture 5 Day 4: Simple regression analysis

Lecture 6 Day 5: Multiple regression analysis

Section 4: Predicting rider performance

Lecture 7 Day 6: Random variables and the normal distribution

Lecture 8 Day 7: Probability and Prediction

Lecture 9 Day 8: What are the odds?

Lecture 10 Day 9: Beating the odds

9th and 10th grade math students preparing to take AP statistics and data science in 11th or 12th grade