How To Standardize Data In Python (With Examples) - Statology
To standardize a dataset means to scale all of the values in the dataset such that the mean value is 0 and the standard deviation is 1.
We use the following formula to standardize the values in a dataset:
$x_{new} = (x_i - \bar{x}) / s$
where:
- $x_i$: The $i$th value in the dataset
- $\bar{x}$: The sample mean
- $s$: The sample standard deviation
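As a rough illustration of the formula, the sketch below standardizes a single list of numbers using only Python's standard library. The list `values` is just an example input (the same numbers appear in the x1 column of the examples further down), and `statistics.stdev` is used because it computes the sample standard deviation.

```python
import statistics

# example input values (chosen for illustration)
values = [5, 7, 7, 9, 12, 9, 9, 4]

# sample mean and sample standard deviation (n - 1 in the denominator)
mean = statistics.mean(values)
s = statistics.stdev(values)

# apply (value - mean) / s to every value
standardized = [(x - mean) / s for x in values]

print(round(standardized[0], 6))  # -1.078639
```

The standardized list has a mean of 0 and a sample standard deviation of 1, up to floating-point rounding.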
We can use the following syntax to quickly standardize all of the columns of a pandas DataFrame in Python:
    (df-df.mean())/df.std()

The following examples show how to use this syntax in practice.
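One detail of this syntax is worth knowing: the pandas `std()` method divides by n - 1 by default (`ddof=1`), which matches the sample standard deviation in the formula above. Some other tools, such as scikit-learn's `StandardScaler`, divide by n instead, so their results differ slightly on small datasets. The sketch below compares the two conventions on a single example Series; the variable names are illustrative only.

```python
import pandas as pd

# example Series (values chosen for illustration)
s = pd.Series([5, 7, 7, 9, 12, 9, 9, 4])

# sample standard deviation (pandas default, ddof=1)
z_sample = (s - s.mean()) / s.std()

# population standard deviation (ddof=0)
z_pop = (s - s.mean()) / s.std(ddof=0)

# each version has unit standard deviation under its own convention
print(z_sample.std())        # uses ddof=1
print(z_pop.std(ddof=0))     # uses ddof=0
```

Because the population standard deviation is slightly smaller, the values in `z_pop` are slightly larger in magnitude than those in `z_sample`.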
Example 1: Standardize All Columns of DataFrame
The following code shows how to standardize all columns in a pandas DataFrame:
    import pandas as pd

    #create data frame
    df = pd.DataFrame({'y': [8, 12, 15, 14, 19, 23, 25, 29],
                       'x1': [5, 7, 7, 9, 12, 9, 9, 4],
                       'x2': [11, 8, 10, 6, 6, 5, 9, 12],
                       'x3': [2, 2, 3, 2, 5, 5, 7, 9]})

    #view data frame
    df

        y  x1  x2  x3
    0   8   5  11   2
    1  12   7   8   2
    2  15   7  10   3
    3  14   9   6   2
    4  19  12   6   5
    5  23   9   5   5
    6  25   9   9   7
    7  29   4  12   9

    #standardize the values in each column
    df_new = (df-df.mean())/df.std()

    #view new data frame
    df_new

              y        x1        x2        x3
    0 -1.418032 -1.078639  1.025393 -0.908151
    1 -0.857822 -0.294174 -0.146485 -0.908151
    2 -0.437664 -0.294174  0.634767 -0.525772
    3 -0.577717  0.490290 -0.927736 -0.908151
    4  0.122546  1.666987 -0.927736  0.238987
    5  0.682756  0.490290 -1.318362  0.238987
    6  0.962861  0.490290  0.244141  1.003746
    7  1.523071 -1.470871  1.416019  1.768505

We can verify that the mean and standard deviation of each column are equal to 0 and 1, respectively (the means are on the order of 1e-17 rather than exactly 0 because of floating-point rounding):
    #view mean of each column
    df_new.mean()

    y     0.000000e+00
    x1    2.775558e-17
    x2   -4.163336e-17
    x3    5.551115e-17
    dtype: float64

    #view standard deviation of each column
    df_new.std()

    y     1.0
    x1    1.0
    x2    1.0
    x3    1.0
    dtype: float64

Example 2: Standardize Specific Columns of DataFrame
Sometimes you may only want to standardize specific columns in a DataFrame.
For example, for many machine learning algorithms you may only want to standardize the predictor variables before fitting a certain model to the data.
The following code shows how to standardize specific columns in a pandas DataFrame:
    import pandas as pd

    #create data frame
    df = pd.DataFrame({'y': [8, 12, 15, 14, 19, 23, 25, 29],
                       'x1': [5, 7, 7, 9, 12, 9, 9, 4],
                       'x2': [11, 8, 10, 6, 6, 5, 9, 12],
                       'x3': [2, 2, 3, 2, 5, 5, 7, 9]})

    #view data frame
    df

        y  x1  x2  x3
    0   8   5  11   2
    1  12   7   8   2
    2  15   7  10   3
    3  14   9   6   2
    4  19  12   6   5
    5  23   9   5   5
    6  25   9   9   7
    7  29   4  12   9

    #define predictor variable columns
    df_x = df[['x1', 'x2', 'x3']]

    #standardize the values for each predictor variable
    df[['x1', 'x2', 'x3']] = (df_x-df_x.mean())/df_x.std()

    #view new data frame
    df

        y        x1        x2        x3
    0   8 -1.078639  1.025393 -0.908151
    1  12 -0.294174 -0.146485 -0.908151
    2  15 -0.294174  0.634767 -0.525772
    3  14  0.490290 -0.927736 -0.908151
    4  19  1.666987 -0.927736  0.238987
    5  23  0.490290 -1.318362  0.238987
    6  25  0.490290  0.244141  1.003746
    7  29 -1.470871  1.416019  1.768505

Notice that the “y” column remains unchanged, but the columns “x1”, “x2”, and “x3” are all standardized.
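If you need to repeat this step for different column selections, the same logic can be wrapped in a small function. The sketch below is one possible way to do it; `standardize_columns` is a hypothetical helper name, not part of pandas, and it returns a modified copy so that the original DataFrame is left untouched.

```python
import pandas as pd

def standardize_columns(df, columns):
    """Return a copy of df with the listed columns standardized (hypothetical helper)."""
    out = df.copy()
    subset = out[columns]
    # same formula as above: subtract the column mean, divide by the sample std
    out[columns] = (subset - subset.mean()) / subset.std()
    return out

# same data as in the example above
df = pd.DataFrame({'y': [8, 12, 15, 14, 19, 23, 25, 29],
                   'x1': [5, 7, 7, 9, 12, 9, 9, 4],
                   'x2': [11, 8, 10, 6, 6, 5, 9, 12],
                   'x3': [2, 2, 3, 2, 5, 5, 7, 9]})

# standardize only the predictor columns; 'y' is carried over unchanged
df_scaled = standardize_columns(df, ['x1', 'x2', 'x3'])
print(df_scaled)
```

Note that a column in which every value is identical has a standard deviation of 0, so this formula would produce NaN values for that column.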
We can verify that the mean and standard deviation of each predictor variable column are equal to 0 and 1, respectively:
    #view mean of each predictor variable column
    df[['x1', 'x2', 'x3']].mean()

    x1    2.775558e-17
    x2   -4.163336e-17
    x3    5.551115e-17
    dtype: float64

    #view standard deviation of each predictor variable column
    df[['x1', 'x2', 'x3']].std()

    x1    1.0
    x2    1.0
    x3    1.0
    dtype: float64

Additional Resources
- How to Normalize Columns in a Pandas DataFrame
- How to Remove Outliers in Python
- Standardization vs. Normalization: What’s the Difference?