2 Easy Ways To Normalize Data In Python - DigitalOcean
Using the scikit-learn preprocessing.normalize() Function to Normalize Data
You can use the scikit-learn preprocessing.normalize() function to normalize an array-like dataset.
The normalize() function scales vectors individually to a unit norm so that each vector has a length of one. The default norm for normalize() is L2, also known as the Euclidean norm. The L2 norm of a vector is the square root of the sum of the squares of its values. For non-negative data, normalize() produces values between 0 and 1 (in general, the results fall between -1 and 1). Even so, this is not the same as simply scaling the values to fall between 0 and 1.
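To make the L2 formula concrete, here is a minimal NumPy-only sketch of the same computation done by hand. It reuses the sample values from the next example and, for contrast, applies min-max scaling (an assumption added for comparison; it is not part of normalize()), which is what "simply scaling the values to fall between 0 and 1" usually means.

```python
import numpy as np

# Same sample values as the one-dimensional example in this tutorial
x = np.array([2, 3, 5, 6, 7, 4, 8, 7, 6], dtype=float)

# L2 (Euclidean) norm: square root of the sum of the squared values
l2 = np.sqrt(np.sum(x ** 2))

# Unit-norm scaling: divide every value by the L2 norm
unit = x / l2

# Min-max scaling (for comparison only): smallest value -> 0, largest -> 1
minmax = (x - x.min()) / (x.max() - x.min())

print(l2)
print(unit.round(4))
print(minmax.round(4))
```

With unit-norm scaling, the largest value (8) becomes about 0.471 rather than 1, and the smallest (2) becomes about 0.118 rather than 0, which is why the two approaches are not interchangeable.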
Normalizing an Array Using the normalize() Function
You can normalize a one-dimensional NumPy array using the normalize() function.
Import the sklearn.preprocessing module:
from sklearn import preprocessing
Import NumPy and create an array:
import numpy as np
x_array = np.array([2,3,5,6,7,4,8,7,6])
Use the normalize() function on the array to normalize the data along a row. Wrapping the one-dimensional array in a list turns it into a single-row, two-dimensional input:
normalized_arr = preprocessing.normalize([x_array])
print(normalized_arr)
Run the complete example code to demonstrate how to normalize a NumPy array using the normalize() function:
norm_numpy.py
from sklearn import preprocessing
import numpy as np

x_array = np.array([2,3,5,6,7,4,8,7,6])
normalized_arr = preprocessing.normalize([x_array])
print(normalized_arr)
The output is:
Output
[[0.11785113 0.1767767 0.29462783 0.35355339 0.41247896 0.23570226 0.47140452 0.41247896 0.35355339]]
The output shows that all the values are in the range 0 to 1. If you square each value in the output and then add the squares together, the result is 1, or very close to 1 because of floating-point rounding.
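The sum-of-squares check can be done in code. This sketch assumes you prefer `reshape(1, -1)` (not used in the original example) as an alternative to wrapping the array in a list; both give normalize() the two-dimensional, single-row input it expects. It also uses `np.linalg.norm` to confirm the row has unit length, and shows that passing the 1-D array directly is rejected.

```python
import numpy as np
from sklearn import preprocessing

x_array = np.array([2, 3, 5, 6, 7, 4, 8, 7, 6])

# normalize() expects shape (n_samples, n_features); both forms give one row
wrapped = preprocessing.normalize([x_array])
reshaped = preprocessing.normalize(x_array.reshape(1, -1))

# L2 norm of the normalized row: should be 1 up to floating-point error
row_norm = np.linalg.norm(reshaped, axis=1)
print(row_norm)

# A plain 1-D array is rejected by scikit-learn's input validation
try:
    preprocessing.normalize(x_array)
    rejected_1d = False
except ValueError:
    rejected_1d = True
```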
Normalizing Columns from a DataFrame Using the normalize() Function
In a pandas DataFrame, features are columns and rows are samples. You can convert a DataFrame column into a NumPy array and then normalize the data in the array.
The examples in this section and the following one use the California Housing dataset.
The first part of the example code imports the modules, loads the dataset, creates the DataFrame, and prints the description of the dataset:
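import numpy as np
from sklearn import preprocessing
from sklearn.datasets import fetch_california_housing

# load the dataset; as_frame=True returns the features as a pandas DataFrame
california_housing = fetch_california_housing(as_frame=True)

# print the dataset description
print(california_housing.DESCR)
Note that the as_frame parameter is set to True so that california_housing.data is a pandas DataFrame (and california_housing.target is a pandas Series). The california_housing object itself is a scikit-learn Bunch, so the feature columns are accessed through california_housing.data.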
The output includes the following excerpt from the dataset description, which you can use to choose a feature to normalize:
Output
.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block group
        - HouseAge      median house age in block group
        - AveRooms      average number of rooms per household
        - AveBedrms     average number of bedrooms per household
        - Population    block group population
        - AveOccup      average number of household members
        - Latitude      block group latitude
        - Longitude     block group longitude
...
Next, convert a column (feature) to an array and print it. This example uses the HouseAge column:
x_array = np.array(california_housing.data['HouseAge'])
print("HouseAge array: ",x_array)
Finally, use the normalize() function to normalize the data and print the resulting array:
normalized_arr = preprocessing.normalize([x_array])
print("Normalized HouseAge array: ",normalized_arr)
Run the complete example to demonstrate how to normalize a feature using the normalize() function:
norm_feature.py
from sklearn import preprocessing
import numpy as np
from sklearn.datasets import fetch_california_housing

california_housing = fetch_california_housing(as_frame=True)
# print(california_housing.DESCR)

x_array = np.array(california_housing.data['HouseAge'])
print("HouseAge array: ",x_array)

normalized_arr = preprocessing.normalize([x_array])
print("Normalized HouseAge array: ",normalized_arr)
The output is:
Output
HouseAge array: [41. 21. 52. ... 17. 18. 16.]
Normalized HouseAge array: [[0.00912272 0.00467261 0.01157028 ... 0.00378259 0.0040051 0.00356009]]
The output shows that the normalize() function changed the array of median house age values so that the square root of the sum of the squares of the values equals one. In other words, the values were scaled to a unit length using the L2 norm.
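Because L2 is only the default, it can help to see the norm parameter side by side. The sketch below uses a small made-up row (hypothetical values, not from the dataset) and compares the default L2 norm with `norm="l1"` (divide by the sum of absolute values) and `norm="max"` (divide by the largest value). The row is kept non-negative so the comparison does not depend on how negative values are handled.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical single-sample row; values chosen so the L2 norm is exactly 13
x = np.array([[3.0, 4.0, 12.0]])

l2_row = preprocessing.normalize(x)               # default norm="l2": sqrt(9 + 16 + 144) = 13
l1_row = preprocessing.normalize(x, norm="l1")    # divide by 3 + 4 + 12 = 19
max_row = preprocessing.normalize(x, norm="max")  # divide by the largest value, 12

print(l2_row)
print(l1_row)
print(max_row)
```

Only the L2 variant gives a row whose squared values sum to 1, which is the property checked in the examples above.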
Normalizing Datasets by Row or by Column Using the normalize() Function
When you pass a whole dataset to normalize() instead of converting individual features (columns) into arrays, the data is normalized by row. The default value of the axis parameter is 1, which means that each sample (row) is normalized independently.
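The effect of the axis parameter is easiest to see on a tiny matrix. This sketch uses a hypothetical 3-sample by 2-feature array and checks which direction ends up with unit L2 norm. It also shows that `axis=0` gives the same result as normalizing the transpose by row and transposing back; that equivalence is an illustration, not how the library is documented to implement it.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical data: rows are samples, columns are features
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [2.0, 2.0]])

by_row = preprocessing.normalize(X)          # axis=1 (default): each row gets unit L2 norm
by_col = preprocessing.normalize(X, axis=0)  # axis=0: each column gets unit L2 norm

# Column 0 has L2 norm sqrt(1 + 4 + 4) = 3, so it becomes [1/3, 2/3, 2/3]
print(by_col[:, 0])

# axis=0 matches normalizing the transpose row by row and transposing back
same = np.allclose(by_col, preprocessing.normalize(X.T).T)
```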
The following example demonstrates normalizing the California Housing dataset using the default axis:
norm_dataset_sample.py
from sklearn import preprocessing
import pandas as pd
from sklearn.datasets import fetch_california_housing

california_housing = fetch_california_housing(as_frame=True)

d = preprocessing.normalize(california_housing.data)
scaled_df = pd.DataFrame(d, columns=california_housing.data.columns)
print(scaled_df)
The output is:
Output
         MedInc  HouseAge  AveRooms  ...  AveOccup  Latitude  Longitude
0      0.023848  0.117447  0.020007  ...  0.007321  0.108510  -0.350136
1      0.003452  0.008734  0.002594  ...  0.000877  0.015745  -0.050829
2      0.014092  0.100971  0.016093  ...  0.005441  0.073495  -0.237359
3      0.009816  0.090449  0.010119  ...  0.004432  0.065837  -0.212643
4      0.006612  0.089394  0.010799  ...  0.003750  0.065069  -0.210162
...         ...       ...       ...  ...       ...       ...        ...
20635  0.001825  0.029242  0.005902  ...  0.002995  0.046179  -0.141637
20636  0.006753  0.047539  0.016147  ...  0.008247  0.104295  -0.320121
20637  0.001675  0.016746  0.005128  ...  0.002291  0.038840  -0.119405
20638  0.002483  0.023932  0.007086  ...  0.002823  0.052424  -0.161300
20639  0.001715  0.011486  0.003772  ...  0.001879  0.028264  -0.087038

[20640 rows x 8 columns]
The output shows that the values are normalized along the rows, so each sample is normalized instead of each feature. Because normalize() only divides by the norm and does not shift the data, negative values such as those in the Longitude column stay negative.
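To confirm the row-wise behavior without downloading the dataset, here is a sketch on a hypothetical two-row DataFrame (made-up values, with a negative column standing in for Longitude). It wraps the result back into a DataFrame the same way as the example above and checks that every row has unit L2 norm while signs are preserved.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for california_housing.data; values are illustrative only
df = pd.DataFrame({"HouseAge": [3.0, 0.0], "Longitude": [-4.0, -5.0]})

d = preprocessing.normalize(df)  # default axis=1: one unit vector per sample
scaled_df = pd.DataFrame(d, columns=df.columns)

# Row 0 is [3, -4] / 5 and row 1 is [0, -5] / 5
row_norms = np.linalg.norm(scaled_df.to_numpy(), axis=1)
print(scaled_df)
print(row_norms)
```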
However, you can normalize by feature instead by setting the axis parameter.
The following example demonstrates normalizing the California Housing dataset using axis=0 to normalize by feature:
norm_dataset_feature.py
from sklearn import preprocessing
import pandas as pd
from sklearn.datasets import fetch_california_housing

california_housing = fetch_california_housing(as_frame=True)

d = preprocessing.normalize(california_housing.data, axis=0)
scaled_df = pd.DataFrame(d, columns=california_housing.data.columns)
print(scaled_df)
The output is:
Output
         MedInc  HouseAge  AveRooms  ...  AveOccup  Latitude  Longitude
0      0.013440  0.009123  0.008148  ...  0.001642  0.007386  -0.007114
1      0.013401  0.004673  0.007278  ...  0.001356  0.007383  -0.007114
2      0.011716  0.011570  0.009670  ...  0.001801  0.007381  -0.007115
3      0.009110  0.011570  0.006787  ...  0.001638  0.007381  -0.007116
4      0.006209  0.011570  0.007329  ...  0.001402  0.007381  -0.007116
...         ...       ...       ...  ...       ...       ...        ...
20635  0.002519  0.005563  0.005886  ...  0.001646  0.007698  -0.007048
20636  0.004128  0.004005  0.007133  ...  0.002007  0.007700  -0.007055
20637  0.002744  0.003783  0.006073  ...  0.001495  0.007689  -0.007056
20638  0.003014  0.004005  0.006218  ...  0.001365  0.007689  -0.007061
20639  0.003856  0.003560  0.006131  ...  0.001682  0.007677  -0.007057

[20640 rows x 8 columns]
When you examine the output, you’ll notice that the results for the HouseAge column match the output you got when you converted the HouseAge column to an array and normalized it in a preceding example.
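The match between the two HouseAge results can be checked directly. This sketch uses a hypothetical two-row DataFrame in place of california_housing.data (the values are made up) and compares the HouseAge column from `normalize(..., axis=0)` with normalizing that column on its own, as in the single-feature example.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for california_housing.data
df = pd.DataFrame({"MedInc": [8.0, 6.0], "HouseAge": [41.0, 21.0]})

# Normalize every feature (column) at once
by_feature = pd.DataFrame(preprocessing.normalize(df, axis=0), columns=df.columns)

# Normalize only HouseAge, passed as a single-row 2-D input
house_age_alone = preprocessing.normalize([np.array(df["HouseAge"])])[0]

# The column-wise result and the single-feature result agree
match = np.allclose(by_feature["HouseAge"], house_age_alone)
print(by_feature)
print(match)
```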