Calculating Residuals In Regression Analysis [Manually And With Codes]

Page content

  • What is residuals?
  • How to calculate residuals?
  • Calculate residuals in Python
  • Calculate residuals in R

What is residuals?

In regression analysis, we model the linear relationship between one or more independent (X) variables with that of the dependent variable (y).

The simple linear regression model is given as,

\( y = a + bX + \epsilon \)Where, a = y-intercept, b = slope of the regression line (unbiased estimate) and \( \epsilon \) = error term (residuals)

This regression model has two parts viz. fitted regression line (a + bX) and error term (ε)

The error term (ε) in regression model is called as residuals, which is difference between the actual value of y and predicted value of y (regression line).

\( residuals = actual \ y (y_i) - predicted \ y \ (\hat{y}_i) \)

Regression plot

How to calculate residuals?

The residual is calculated by subtracting the actual value of the data point from the predicted value of that data point. The predicted value can be obtained from regression analysis.

For example, let’s take an example of the height and weight of students (source)

Height (X)Weight (y)
1.3652
1.4750
1.5467
1.5662
1.5969
1.6374
1.6659
1.6787
1.6977
1.7473
1.8167

If we perform simple linear regression on this dataset, we get fitted line with the following regression equation,

ŷ = -22.4 + (55.48 * X)

Learn more here how to perform the simple linear regression in Python

With the regression equation, we can predict the weight of any student based on their height.

For example, if the height of student is 1.36, its predicted weight is 53.08

ŷ = -22.37 + (55.48 * 1.36) = 53.08

Similarly, we can calculate the predicted weight (ŷ) of all students,

Height (X)Weight (y)Predicted weight (ŷ)
1.365253.08
1.475059.18
1.546763.07
1.566264.18
1.596965.84
1.637468.06
1.665969.72
1.678770.28
1.697771.39
1.747374.16
1.816778.04

Now, we have actual weight (y) and predicted weight (ŷ) for calculating the residuals,

Calculate residual when height is 1.36 and weight is 52,

\( residuals = actual \ y (y_i) - predicted \ y \ (\hat{y}_i) = 52-53.08 = -1.07 \)

Similarly, we can calculate the residuals of all students,

Height (X)Weight (y)Predicted weight (y_pred)Residual
1.365253.08-1.07
1.475059.18-9.18
1.546763.073.93
1.566264.18-2.18
1.596965.843.15
1.637468.065.93
1.665969.72-10.71
1.678770.2816.72
1.697771.395.61
1.747374.16-1.15
1.816778.05-11.04

The sum and mean of residuals is always equal to zero

If you plot the predicted data and residual, you should get residual plot as below,

Residual plot in python

The residual plot helps to determine the relationship between X and y variables. If residuals are randomly distributed (no pattern) around the zero line, it indicates that there linear relationship between the X and y (assumption of linearity). If there is a curved pattern, it means that there is no linear relationship and data is not appropriate for regression analysis.

In addition, residuals are used to assess the assumptions of normality and homogeneity of variance (homoscedasticity).

Calculate residuals in Python

Here are the steps involved in calculating residuals in regression analysis using Python,

For following steps, you need to install pandas, statsmodels, matplotlib, and seaborn Python packages. Check how to install Python packages

Get the dataset

import pandas as pd df = pd.read_csv("https://reneshbedre.github.io/assets/posts/reg/height.csv") # view first two rows df.head(2) Height Weight 0 1.36 52 1 1.47 50

Fit the regression model

We will fit the simple linear regression model as there is only one independent variableimport statsmodels.api as sm X = df['Height'] # independent variable y = df['Weight'] # dependent variable # to get intercept -- this is optional X = sm.add_constant(X) # fit the regression model reg = sm.OLS(y, X).fit() # to get output summary, use reg.summary()

Get the residuals

reg.resid # output 0 -1.079016 1 -9.182056 2 3.934191 3 -2.175453 4 3.160082 5 5.940795 6 -10.723671 7 16.721507 8 5.611864 9 -1.162245 10 -11.045998 dtype: float64

Create residuals plot

Create scatterplot of predicted values and residuals,from bioinfokit import visuz import seaborn as sns import matplotlib.pyplot as plt # create a DataFrame of predicted values and residuals df["predicted"] = reg.predict(X) df["residuals"] = reg.resid sns.scatterplot(data=df, x="predicted", y="residuals") plt.axhline(y=0)

Residual plot in python

Calculate residuals in R

Here are the steps involved in calculating residuals in regression analysis using R,

Get the dataset

library(tidyverse) df <- read.csv("https://reneshbedre.github.io/assets/posts/reg/height.csv") # view first two rows head(df, 2) Height Weight 1 1.36 52 2 1.47 50

Fit the regression model

reg <- lm(Weight ~ Height, data = df) # to get output summary, use summary(reg)

Get the residuals

resid(reg) 1 2 3 4 5 6 7 -1.079016 -9.182056 3.934191 -2.175453 3.160082 5.940795 -10.723671 8 9 10 11 16.721507 5.611864 -1.162245 -11.045998

Create residuals plot

Create scatterplot of predicted values and residuals in R,# create a data frame df1 <- data.frame(fitted(reg), resid(reg)) ggplot(df1, aes(fitted.reg., resid.reg.)) + geom_point(size = 3) + geom_hline(yintercept = 0)

Residual plot in R

Enhance your skills with courses on regression

  • Machine Learning Specialization
  • Linear Regression and Modeling
  • Linear Regression with Python
  • Linear Regression with NumPy and Python
  • Building and analyzing linear regression model in R
  • Machine Learning: Regression

References

  1. Kim HY. Statistical notes for clinical researchers: simple linear regression 3–residual analysis. Restorative dentistry & endodontics. 2019 Feb 1;44(1).
  • Linear regression basics and implementation in Python
  • Multiple linear regression (MLR)
  • Mixed ANOVA using Python and R (with examples)
  • Repeated Measures ANOVA using Python and R (with examples)
  • ANCOVA using R (with examples and code)
  • Multiple hypothesis testing problem in Bioinformatics

If you have any questions, comments or recommendations, please email me at [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License

Some of the links on this page may be affiliate links, which means we may get an affiliate commission on a valid purchase. The retailer will pay the commission at no additional cost to you.

Tag » How To Calculate The Residual