Statistics for Historians

A practical introduction to statistical reasoning designed specifically for students in the humanities. The course teaches how to interpret data, evaluate historical claims, and understand quantitative methods without assuming advanced mathematical background.

Explore economic history through statistical methods: analyze the long-lasting effects of geography on African development (Nunn & Puga), test the Protestant-ethic hypothesis using county-level data (Becker & Woessmann), and examine the persistent impact of forced labor in Peru's mines (Dell). The video also surveys the dataset structures historians commonly encounter: cross-sectional, time series, panel, and repeated cross-sectional. Next up: variable types.

The tutor explores variable types in datasets: ordinal (rankable but without units, like Likert-scale responses), interval (equally spaced values with units, e.g., years of education), and categorical (no intrinsic ordering, e.g., ethnicity). The video also warns against misreading the arbitrary numeric codes often assigned to these variables. Subscribe to @AxiomTutoring.

This video introduces descriptive statistics, an essential first step when analyzing a new dataset. You'll learn how to get a foundational understanding of your data's variables, distributions, and potential patterns. The tutorial focuses on histograms, explaining how to interpret them, the significance of bin size, and their crucial role in detecting outliers that can skew analysis. It also differentiates histograms from bar charts, demonstrating when to use each for various data types, such as categorical or ordinal variables. The video begins by demonstrating how histograms provide insights into data concentration and distribution tails, using the example of Russian household sizes. It then illustrates the power of histograms in identifying extreme outliers, referencing the historical dataset of slave sale prices in Louisiana and explaining why such outliers necessitate data cleaning before further analysis. Finally, the video transitions to bar charts, showing their application for categorical variables like prisoner literacy levels, and clearly distinguishing their function from histograms by highlighting how bar charts represent each distinct value individually. Subscribe to @AxiomTutoringCourses for more tutorials.

In this video, we explore the fundamental measures of central tendency: mean, median, and mode. The sample mean is introduced as the average value of a variable within a dataset, with a detailed explanation of its formula and summation notation. We then delve into the median, defining it as the middle value of an ordered dataset, and illustrate its calculation for both odd and even sample sizes. Percentiles and quartiles are presented as extensions of the median concept, with the 50th percentile being the median itself. Finally, the mode is explained as the most frequently occurring value in a dataset, with considerations for grouped and continuous data, and the possibility of multiple modes. Subscribe to @AxiomTutoringCourses for more educational content.
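The three measures described above can be sketched in a few lines with Python's standard-library `statistics` module, using a small hypothetical sample (illustrative values, not course data):

```python
import statistics

# Hypothetical sample of household sizes (illustrative, not course data).
households = [1, 2, 2, 3, 3, 3, 4, 6]

# Sample mean: sum of the values divided by the sample size n.
mean = statistics.mean(households)      # 24 / 8 = 3.0

# Median: middle value of the ordered data; with an even n, the average
# of the two middle values.
median = statistics.median(households)  # (3 + 3) / 2 = 3.0

# Mode: the most frequently occurring value.
mode = statistics.mode(households)      # 3

# Quartiles extend the median idea; the 50th percentile is the median itself.
q1, q2, q3 = statistics.quantiles(households, n=4)
```

Note that `q2`, the second quartile, comes out equal to the median, matching the video's point that the 50th percentile is the median.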

In this video, we explore which types of variables are suitable for calculating the mean. We begin with a quick recap of how to calculate a sample mean using a simple data set. Then, we define and differentiate between interval, ordinal, and categorical variables, providing clear examples for each. We examine why the mean cannot be calculated for categorical and ordinal variables, highlighting the arbitrary nature of their numerical coding. Finally, we confirm that only interval variables are appropriate for mean calculations. Subscribe to @AxiomTutoringCourses for more educational content.

This video explains how to calculate the median for different types of variables. We start with a recap of what the median is for numerical data and then explore its applicability to categorical, ordinal, and interval variables. You'll learn why the median is not suitable for categorical data but is a valid measure for ordinal and interval variables, with clear examples for each. The video emphasizes the importance of ordering your data before finding the median. Subscribe to @AxiomTutoringCourses for more educational content.

This video explains which types of variables are compatible with calculating the mode. We begin by revisiting the definition of the mode as the most frequent value in a dataset. The video then explores whether the mode can be used with categorical, ordinal, and interval variables, providing clear examples for each. You'll learn that the mode is applicable to all three variable types. Subscribe to @AxiomTutoringCourses for more helpful tutorials.

In this video, we explore the distinctions between the mean, median, and mode and when to use each measure. We start with a data set representing household sizes to illustrate how these three central tendency measures yield different results and convey unique insights. Discover how the mode highlights the most common household size, the median indicates the midpoint of the population, and the mean provides an average that can be influenced by extreme values. We then examine scenarios where the mean can be misleading, such as when dealing with data entry errors or genuine outliers like extremely high incomes. Learn why the median is often preferred for skewed distributions, offering a more robust representation of typical values. Understand the crucial first step in data analysis: identifying whether an outlier is a mistake to be removed or a genuine observation to be considered when choosing your measure of central tendency. Subscribe to @AxiomTutoringCourses for more educational content.
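The video's point about outliers can be checked numerically. A minimal sketch with hypothetical income figures (one extreme value standing in for a data-entry error or a genuinely high income):

```python
import statistics

# Hypothetical incomes with one extreme outlier.
incomes = [20, 22, 25, 27, 30, 1000]

mean = statistics.mean(incomes)      # (124 + 1000) / 6, roughly 187: misleadingly high
median = statistics.median(incomes)  # (25 + 27) / 2 = 26.0: robust to the outlier

# Whether to drop 1000 depends on whether it is a data-entry error or a
# genuine observation; removing it barely moves the median but halves
# nothing about the mean -- it collapses it to a typical value.
mean_wo = statistics.mean(incomes[:-1])      # 24.8
median_wo = statistics.median(incomes[:-1])  # 25
```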

This video delves into measures of dispersion, exploring how to quantify the spread of data around a central value. We begin by comparing two distributions with the same mean but vastly different spreads, illustrating the need for dispersion measures. The discussion covers the range and its limitations due to outliers, followed by the interquartile range as a more robust alternative. We then introduce the variance, explaining its calculation and purpose in measuring spread around the mean. Finally, the standard deviation is presented as the square root of the variance, offering a more interpretable measure of dispersion with the same units as the original data. Subscribe to @AxiomTutoringCourses for more educational content.
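The four dispersion measures can be sketched on a small hypothetical sample (standard-library `statistics` only):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Range: simple but fragile -- a single outlier can inflate it.
data_range = max(data) - min(data)   # 9 - 2 = 7

# Interquartile range: spread of the middle 50%, robust to outliers.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance: squared deviations from the mean, n - 1 denominator.
var = statistics.variance(data)      # 32 / 7, roughly 4.57

# Standard deviation: square root of the variance, in the data's own units.
sd = statistics.stdev(data)
```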

This video introduces the fundamental statistical concept of correlation, exploring how researchers investigate the co-variance of two variables. We examine real-world historical research questions, such as the link between Protestant ethics and economic growth, or unemployment benefits and unemployment rates, and how statistical methods can help answer them. The video demonstrates the utility of scatter plots as a visual tool to identify trends and patterns between variables, using historical census data to illustrate both positive and negative correlations, as well as the absence of correlation. It also highlights how scatter plots can reveal non-linear relationships and potential issues with data superposition. Subscribe to @AxiomTutoringCourses for more educational content.

This video explains how to calculate and interpret covariance and Pearson's correlation coefficients. Learn the mathematical formulas for both measures and see them applied to concrete examples with real data. We'll explore how positive and negative covariance indicate the direction of the relationship between variables and discuss the limitations of covariance, leading to the introduction of Pearson's correlation as a normalized and more interpretable measure. Finally, we'll examine various scatter plots to understand how correlation values reflect different types of linear relationships and highlight the importance of visualizing data alongside statistical measures. Subscribe to @AxiomTutoringCourses for more educational content.
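The two formulas can be written out directly. A sketch on hypothetical paired data (heights in cm, weights in kg), using the sample (n − 1) versions:

```python
import math

# Hypothetical paired data: heights (cm) and weights (kg).
x = [160, 165, 170, 175, 180]
y = [55, 60, 62, 68, 75]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Sample covariance: average product of deviations (n - 1 denominator).
# Its sign gives the direction of the relationship, but its size depends
# on the units of both variables.
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# Pearson's r: covariance normalized by both standard deviations,
# so it is unit-free and always lies between -1 and 1.
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
r = cov / (sx * sy)
```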

This video explains Spearman's correlation coefficient, a statistical measure used to assess the strength and direction of a monotonic relationship between two variables. It details how Spearman's correlation differs from Pearson's correlation, particularly when relationships are non-linear or when dealing with ordered variables and outliers. The explanation includes a step-by-step guide on calculating Spearman's correlation by ranking the data and then applying the Pearson correlation formula to these ranks. The video further illustrates the interpretation of Spearman's correlation coefficients, highlighting their range from -1 to 1 and what values indicate perfect, strong, or no monotonic relationships. It also discusses scenarios where Spearman's correlation is advantageous, such as with monotonic but non-linear data or when outliers are present. Subscribe to @AxiomTutoringCourses.
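The rank-then-Pearson recipe can be sketched directly. Hypothetical data where y = x² makes the relationship non-linear but perfectly monotonic:

```python
# Spearman's rho: rank each variable, then apply Pearson's formula to the ranks.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # y = x**2: non-linear, but perfectly monotonic


def ranks(values):
    # Rank from 1 (smallest) upward; tied values share the average rank.
    sv = sorted(values)
    return [(2 * sv.index(v) + 1 + sv.count(v)) / 2 for v in values]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5


rho = pearson(ranks(x), ranks(y))  # 1.0: a perfect monotonic relationship
r = pearson(x, y)                  # below 1: Pearson is dragged down by the curvature
```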

In this video, we demystify the crucial statistical difference between correlation and causation. Discover why observing a pattern between two variables doesn't automatically mean one causes the other, and how easily this misconception can arise. We explore humorous examples of spurious correlations and delve into real-world scenarios, like a study on children's myopia, to illustrate how a third, hidden variable can influence apparent relationships. Learn to be skeptical and always question whether a correlation truly implies causation, or if something else is at play. Subscribe to @AxiomTutoringCourses for more essential statistics insights.

This video explains standardization, the process of transforming any random variable into one with a mean of 0 and a variance of 1. It details the two key steps: first subtracting the population mean to center the distribution, then dividing by the standard deviation. Learn how each transformation affects the mean and variance, including a short proof of why the resulting variance equals one. Subscribe to @AxiomTutoringCourses for more insightful videos.
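The two steps can be verified numerically; a minimal sketch on hypothetical data, using the population (n-denominator) moments to match the video's setup:

```python
import statistics

# Standardization: subtract the mean, then divide by the standard deviation.
data = [2, 4, 6, 8, 10]  # hypothetical values
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)  # population standard deviation

z = [(v - mu) / sigma for v in data]

# After the transformation, the mean is 0 and the variance is 1
# (up to floating-point rounding).
z_mean = statistics.fmean(z)
z_var = statistics.pvariance(z)
```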

Discover the power of the Central Limit Theorem, a cornerstone of statistics. This theorem reveals that the distribution of sample means will approximate a normal distribution as the sample size increases, regardless of the original data's distribution. Learn the broad assumptions required for this theorem and explore practical examples, like rolling a die, to visualize how the sample mean's distribution transforms with larger sample sizes. We delve into the nuances, showing how a uniform distribution can lead to an approximately normal distribution of sample means. Subscribe to @AxiomTutoringCourses.
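The die example can be simulated in a few lines. A sketch (seeded for reproducibility; the sample size 30 and 2000 repetitions are arbitrary choices, not from the video):

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# Each observation is a fair die roll: uniform on 1..6, population mean 3.5.
# The CLT says the distribution of sample means is approximately normal
# for moderately large n, despite the uniform original distribution.
def sample_mean(n):
    return statistics.fmean(random.randint(1, 6) for _ in range(n))

means = [sample_mean(30) for _ in range(2000)]

# The means cluster around 3.5 with spread near sigma / sqrt(n)
# = sqrt(35 / 12) / sqrt(30), roughly 0.31. A histogram of `means`
# shows the bell shape.
center = statistics.fmean(means)
spread = statistics.stdev(means)
```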

This video dives into the essential concept of hypothesis testing in statistics, explaining its fundamental role in analyzing data. We explore what a statistical hypothesis is, using a compelling example about the average age of slaves in 1800 to illustrate how sample data is used to infer characteristics of a larger population. The process of formulating null and alternative hypotheses, calculating test statistics, and determining p-values is clearly laid out. Finally, the video discusses the significance level and how it guides the decision to either reject or fail to reject the null hypothesis, emphasizing the crucial distinction between these two outcomes. Subscribe to @AxiomTutoringCourses for more educational content.

In hypothesis testing, understanding the distribution of sample means is crucial, especially when the population standard deviation is unknown. This video explains how the Central Limit Theorem allows us to approximate the distribution of sample means with a normal distribution. We then delve into the process of standardizing these variables and introduce the concept of the t-statistic when the population variance must be estimated using the sample variance. This leads to the t-distribution, which differs from the standard normal distribution, particularly in small sample sizes, and depends on degrees of freedom. Subscribe to @AxiomTutoringCourses for more helpful tutorials.
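The construction above can be sketched for a one-sample test on hypothetical data. The final step (critical value or p-value from the t-distribution) needs a t-table or a statistics library, so it is only noted in a comment:

```python
import math
import statistics

# One-sample t-statistic: distance of the sample mean from the value
# hypothesized under H0, measured in estimated standard errors.
# Hypothetical sample of ages; H0: the population mean age is 25.
ages = [22, 27, 24, 30, 26, 23, 28, 25]
mu0 = 25

n = len(ages)
xbar = statistics.fmean(ages)  # 25.625
s = statistics.stdev(ages)     # sample std. dev. estimates the unknown sigma

t = (xbar - mu0) / (s / math.sqrt(n))
# Under H0, t follows a t-distribution with n - 1 = 7 degrees of freedom;
# compare |t| with the critical value (or look up the p-value) to decide.
```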

In this video, we explore the concept of the two-sample t-test, a statistical method used to determine if the means of two independent groups are significantly different. We delve into how to formulate null and alternative hypotheses, such as comparing the average age of male and female slaves. The discussion covers the theoretical underpinnings, including the central limit theorem and variance calculations, to understand the distribution of the difference between sample means. We then demonstrate how to standardize this difference and introduce the t-statistic, which is derived from estimated population parameters and follows a t-distribution with specific degrees of freedom. Subscribe to @AxiomTutoringCourses for more educational content.
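A sketch of the test statistic on hypothetical group data. (The unpooled, Welch-style standard error is used here; the video may instead present the pooled-variance version, which differs only in how the two variances are combined.)

```python
import math
import statistics

# Hypothetical ages for two independent groups; H0: equal population means.
group_a = [24, 26, 29, 31, 25, 27]
group_b = [22, 23, 26, 24, 21, 25]

na, nb = len(group_a), len(group_b)
ma, mb = statistics.fmean(group_a), statistics.fmean(group_b)
va, vb = statistics.variance(group_a), statistics.variance(group_b)

# Standard error of the difference between the two sample means
# (variances of independent sample means add).
se = math.sqrt(va / na + vb / nb)

t = (ma - mb) / se
# Compare with the t-distribution at the appropriate degrees of freedom.
```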

In statistical hypothesis testing, understanding Type I and Type II errors is crucial for drawing accurate conclusions. This video breaks down Type I errors, where a true null hypothesis is incorrectly rejected, and Type II errors, where we fail to reject a null hypothesis that is in fact false. We explore what influences the probability of each type of error: learn how the significance level, sample size, effect size, and data variability affect your testing results. Subscribe to @AxiomTutoringCourses for more educational content.
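The link between the significance level and the Type I error rate can be illustrated by simulation. A sketch assuming a z-test with known population standard deviation (sigma = 1), an assumption made here purely to keep the example self-contained:

```python
import random
import statistics

random.seed(1)  # reproducible

# When H0 is true and we reject whenever |z| > 1.96 (the 5% two-sided
# critical value for a normal test), we commit a Type I error in roughly
# 5% of repetitions -- the significance level IS the Type I error rate.
def rejects_h0(n=50, mu0=0.0):
    sample = [random.gauss(mu0, 1.0) for _ in range(n)]
    z = (statistics.fmean(sample) - mu0) / (1.0 / n ** 0.5)  # sigma = 1 known
    return abs(z) > 1.96

rate = sum(rejects_h0() for _ in range(2000)) / 2000
print(rate)  # close to the significance level 0.05
```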

This video introduces simple regression as a way to understand the relationship between two variables, going beyond the correlation coefficient. While correlation shows association, it doesn't quantify how a unit change in one variable affects another. Simple regression assumes a direction of causality, designating one variable as dependent and the other as independent. The video explains the mathematical model for a linear relationship, including the crucial error term that accounts for unexplained variance. An example of height and weight is used to illustrate how this model is applied. Visit AxiomTutoring.com and subscribe to @AxiomTutoringCourses.

In this video, we explore how to estimate the population parameters alpha and beta in a regression model, specifically linking weight as the dependent variable to height as the independent variable. We delve into the concept of a fitted line, written as weight-hat = a + b · height, where 'a' and 'b' are our estimates of the true population parameters. The core of the discussion focuses on finding the best fitted line for our data, explaining how minimizing the sum of absolute deviations, or more commonly the sum of squared deviations, leads to ordinary least squares. We also differentiate between residuals and error terms, a crucial distinction in regression analysis. Visit AxiomTutoring.com and subscribe to @AxiomTutoringCourses.
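The least-squares estimates have a closed form in the simple-regression case: b = Sxy / Sxx and a = ȳ − b·x̄. A sketch on hypothetical height/weight data (the numbers are illustrative, not the video's):

```python
# Hypothetical heights (cm) and weights (kg).
height = [160, 165, 170, 175, 180]
weight = [55, 60, 62, 68, 75]

n = len(height)
xbar = sum(height) / n
ybar = sum(weight) / n

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(height, weight))
sxx = sum((x - xbar) ** 2 for x in height)

b = sxy / sxx        # slope: estimated kg per extra cm of height
a = ybar - b * xbar  # intercept: the fitted line passes through (xbar, ybar)

# Residuals are observed minus FITTED values -- computable from the data,
# unlike the unobservable error terms in the population model.
residuals = [y - (a + b * x) for x, y in zip(height, weight)]
```

A side effect of the OLS formulas is that the residuals sum to zero, which is a quick sanity check on any hand computation.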

This video dives into the assumptions required for Ordinary Least Squares (OLS) estimates to be the best linear unbiased estimators in simple linear regression. We explore the linearity in parameters, the expected value of the error term being zero, and constant variance. The crucial assumption of the error term being independent of the independent variable is highlighted as foundational for accurate estimations. Learn how to calculate the OLS estimates for the intercept (alpha) and slope (beta) and understand their interpretations. Visit AxiomTutoring.com and subscribe to @AxiomTutoringCourses.

Learn how to measure the effectiveness of your regression models with R-squared. This video explains the coefficient of determination, its formula, and its range from 0 to 1. Discover what a high or low R-squared value signifies about your model's predictive power. We'll also explore the relationship between R-squared and correlation in simple regression models and examine extreme cases of zero and perfect correlation. Visit AxiomTutoring.com and subscribe to @AxiomTutoringCourses for more educational content.
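R-squared can be computed by hand as 1 − SSR/SST on any small hypothetical sample, and in simple regression it equals the squared correlation, as the video notes:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical sample

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# OLS fit (closed form for simple regression).
b = sxy / sxx
a = ybar - b * xbar

fitted = [a + b * xi for xi in x]
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
sst = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares

r_squared = 1 - ssr / sst  # share of the variation in y explained by the line

# Check: in simple regression, R-squared equals the squared correlation.
corr_sq = sxy ** 2 / (sxx * sst)
```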

This video explains how to test hypotheses in simple linear regression, focusing on the parameter beta. We start with a regression model and explore why testing beta's value is crucial for understanding the relationship between variables. The process involves setting up null and alternative hypotheses, calculating a t-statistic using sample estimates and standard errors, and determining the p-value from the t-distribution. Ultimately, we learn how to interpret the p-value relative to significance levels to decide whether to reject the null hypothesis. Visit AxiomTutoring.com and subscribe to @AxiomTutoringCourses.
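The t-statistic for the slope can be sketched end to end on hypothetical data; as with the earlier tests, the final p-value lookup needs a t-table or a statistics library and is left as a comment:

```python
import math

# t-statistic for the slope: t = b / SE(b), where
# SE(b) = sqrt(SSR / (n - 2)) / sqrt(Sxx). H0: beta = 0.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]  # hypothetical, roughly linear

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar

# Residual sum of squares from the fitted line.
ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

se_b = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
t = b / se_b
# Compare with the t-distribution with n - 2 = 4 degrees of freedom
# (or read off the p-value) to decide whether to reject H0: beta = 0.
```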

This video introduces multiple linear regression, expanding on simple linear regression to account for the impact of multiple independent variables on a dependent variable. We learn how to mathematically represent this model, incorporating additional explanatory variables and understanding the role of the error term. An illustrative example using prisoner height, age, and weight demonstrates the concept, visualizing how a plane is fitted through three-dimensional data to represent the relationship. Visit AxiomTutoring.com and subscribe to @AxiomTutoringCourses.
