Normalizing data means transforming it so that the values of all records lie on the same scale.
You can normalize data to the range 0 to 1 by using the formula (data - np.min(data)) / (np.max(data) - np.min(data)).
In this tutorial, you’ll learn how to normalize data to the range 0 to 1 using different options in Python.
If You’re in a Hurry
You can use the code snippet below to normalize data to the range 0 to 1.
The snippet stores the values in a NumPy array and defines a function that normalizes the data using the minimum and maximum values in the array.
Snippet
import numpy as np
def NormalizeData(data):
    # Min-max scaling using the global minimum and maximum of the array
    return (data - np.min(data)) / (np.max(data) - np.min(data))
X = np.array([
[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]
])
scaled_x = NormalizeData(X)
print(scaled_x)
When you print the normalized array, you’ll see the output below.
The minimum value in the array is always normalized to 0 and the maximum value is normalized to 1. All the other values fall between 0 and 1.
Output
[[0. 0.06666667]
[0.13333333 0.2 ]
[0.26666667 0.33333333]
[0.4 0.46666667]
[0.53333333 0.6 ]
[0.66666667 0.73333333]
[0.8 0.86666667]
[0.93333333 1. ]]
This is how you can normalize the data in a NumPy array to the range 0 to 1.
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn the different methods available to normalize data to the range 0 to 1.
Why You Need to Normalize Data
You need to normalize data when you’re analysing a dataset whose variables are measured on different scales.
For example, your dataset may have one column that stores the length of an object in meters and another column that stores the width of the object in inches.
Let’s consider one record with Length = 2 meters and Width = 78 inches.
One meter is roughly 39 inches, so a width of 78 inches is about 2 meters, the same as the length.
However, if you pass this data to a statistical analysis or a machine learning algorithm without normalizing it, there is a high chance that the width becomes overly influential, simply because its value of 78 is much larger than the length value of 2. Hence, the data must be scaled.
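As a rough illustration of that effect, the sketch below compares three made-up records by Euclidean distance on the raw values. The records a, b and c are hypothetical and are not part of the tutorial’s dataset.

```python
import numpy as np

# Hypothetical records: column 0 is length in meters, column 1 is width in inches.
a = np.array([2.0, 78.0])
b = np.array([3.0, 78.0])  # one meter longer than a
c = np.array([2.0, 80.0])  # two inches wider than a

# Euclidean distances computed on the raw, mixed-unit values.
dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

print(dist_ab)  # 1.0
print(dist_ac)  # 2.0
```

On the raw numbers, c looks twice as far from a as b does, even though a difference of one meter is physically much larger than a difference of two inches.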
What Does It Mean To Normalize Data
When you normalize data measured on different scales, the values are transformed to the same scale/range. For example, both values will be in the range 0 to 1.
The lowest value in the data becomes 0, the highest value becomes 1, and the other values fall between 0 and 1.
Normalization Formula
The formula for normalizing data to the range 0 to 1 is given below.
z_i = (x_i - min(x)) / (max(x) - min(x))
where,
x_i – The current value in your dataset
min(x) – Minimum value in the dataset
max(x) – Maximum value in the dataset
z_i – Normalized value of the current value
To normalize a value, subtract the minimum value of the dataset from it and divide the result by the difference between the maximum and minimum values of the dataset.
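To make the arithmetic concrete, here is a minimal sketch that applies the formula to a short list in plain Python. The sample values are made up for illustration.

```python
values = [10, 20, 25, 40]  # hypothetical sample values

lo = min(values)  # min(x) = 10
hi = max(values)  # max(x) = 40

# Apply z_i = (x_i - min(x)) / (max(x) - min(x)) to every value.
normalized = [(v - lo) / (hi - lo) for v in values]

print(normalized)  # [0.0, 0.3333333333333333, 0.5, 1.0]
```

For instance, 25 becomes (25 - 10) / (40 - 10) = 15 / 30 = 0.5.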
Using SKLearn MinMaxScaler
When you’re doing data analysis in Python, there are multiple libraries available to perform the normalization. One such library is Sklearn.
It has a scaler object known as MinMaxScaler, which normalizes the dataset using its minimum and maximum values.
Note: When you scale the training data, you also need to scale the test data on the same scale. The training data and the test data will usually have different minimum and maximum values, but the test data must be scaled with the minimum and maximum values of the training dataset for the scaling to be consistent.
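The note above can be sketched as follows: fit the scaler on the training data only, then reuse it for the test data. The arrays X_train and X_test are hypothetical and contain a single feature.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train/test split with one feature.
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[2.5], [12.0]])

scaler = MinMaxScaler()
scaler.fit(X_train)  # learns the minimum (0) and maximum (10) from the training data only

train_scaled = scaler.transform(X_train)
test_scaled = scaler.transform(X_test)  # reuses the training minimum and maximum

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Because the test value 12.0 is larger than the training maximum, its scaled value is greater than 1. That is expected when the test data is scaled with the training statistics.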
Use the below snippet to normalize the data using the Sklearn MinMaxScaler in Python.
Snippet
import numpy as np
from sklearn import preprocessing
X = np.array([
[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]
])
min_max_scaler = preprocessing.MinMaxScaler()
scaled_x = min_max_scaler.fit_transform(X)
scaled_x
Where,
numpy – Used to create the array.
sklearn.preprocessing – The module that provides the MinMaxScaler class.
min_max_scaler.fit_transform(X) – Scales the array X using the MinMaxScaler object.
When you print scaled_x, you’ll see that the values are in the range 0 to 1. MinMaxScaler scales each column independently using that column’s own minimum and maximum, which is why both columns have identical values here.
Output
array([[0. , 0. ],
[0.14285714, 0.14285714],
[0.28571429, 0.28571429],
[0.42857143, 0.42857143],
[0.57142857, 0.57142857],
[0.71428571, 0.71428571],
[0.85714286, 0.85714286],
[1. , 1. ]])
This is how you can normalize the data to the range 0 to 1 using the sklearn library.
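MinMaxScaler works column by column, which is why its output differs from the NormalizeData() function, which uses one minimum and one maximum for the whole array. As a rough NumPy-only sketch of the same per-column behaviour (the variable names are illustrative):

```python
import numpy as np

X = np.array([[0, 1], [2, 3], [4, 5], [6, 7],
              [8, 9], [10, 11], [12, 13], [14, 15]])

col_min = X.min(axis=0)  # minimum of each column
col_max = X.max(axis=0)  # maximum of each column

# Broadcasting applies each column's own minimum and maximum to that column.
scaled_per_column = (X - col_min) / (col_max - col_min)
print(scaled_per_column)
```

Each column now runs from 0 to 1 on its own, so the two columns end up with the same values.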
Using np.linalg.norm()
In this section, you’ll learn how to normalize the data using the norm() method available in the NumPy library.
This method returns one of eight different matrix norms or one of an infinite number of vector norms, depending on the value of the ord parameter. If you do not pass the ord parameter, it uses the Frobenius norm for a 2-D array such as X.
Once you have the matrix norm, you can divide the values by it to normalize the data.
Use the below snippet to normalize data using the matrix norms.
Snippet
import numpy as np
X = np.array([
[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]
])
normalized_x= X/np.linalg.norm(X)
print(normalized_x)
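Where,
np.linalg.norm(X) – Gets the matrix norm of the dataset.
X/np.linalg.norm(X) – Divides each value in the dataset by the matrix norm.
print(normalized_x) – Prints the normalized array.
When you print the normalized array, you’ll see that the values lie between 0 and 1. This holds because X has no negative values. Note that dividing by the norm scales the array to unit norm, so the largest value does not become 1 as it does with min-max scaling.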
Output
[[0. 0.02839809]
[0.05679618 0.08519428]
[0.11359237 0.14199046]
[0.17038855 0.19878664]
[0.22718473 0.25558283]
[0.28398092 0.31237901]
[0.3407771 0.36917519]
[0.39757328 0.42597138]]
This is how you can normalize the data using the np.linalg.norm() method.
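Dividing by the norm is not the same as min-max scaling. The short sketch below, using a made-up array that contains a negative value, shows that the result has a norm of 1 but is not confined to the range 0 to 1.

```python
import numpy as np

data = np.array([-3.0, 0.0, 4.0])  # hypothetical values, including a negative one

# For a 1-D array the default norm is the Euclidean length: sqrt(9 + 0 + 16) = 5.
unit = data / np.linalg.norm(data)

print(unit)                  # the values are -0.6, 0.0 and 0.8
print(np.linalg.norm(unit))  # the result has length 1, up to rounding
```

If you need the smallest value to map to 0 and the largest to 1, min-max scaling is the better fit.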
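Using a Maths Formula
You can also normalize the data by dividing it by the square root of the sum of squares of the data, using the snippet below. This is the same quantity that np.linalg.norm() returns by default for this array.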
Snippet
import numpy as np
X = np.array([
[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]
])
normalized_x = X / np.sqrt(np.sum(X**2))
print(normalized_x)
When you print the normalized values, you’ll see that they lie between 0 and 1, because the data has no negative values.
Output
[[0. 0.02839809]
[0.05679618 0.08519428]
[0.11359237 0.14199046]
[0.17038855 0.19878664]
[0.22718473 0.25558283]
[0.28398092 0.31237901]
[0.3407771 0.36917519]
[0.39757328 0.42597138]]
This is how you can normalize the data using the maths formula.
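The output matches the np.linalg.norm() section because both approaches divide by the same number. A quick check, as a sketch:

```python
import numpy as np

X = np.array([[0, 1], [2, 3], [4, 5], [6, 7],
              [8, 9], [10, 11], [12, 13], [14, 15]])

by_formula = X / np.sqrt(np.sum(X**2))  # square root of the sum of squares
by_norm = X / np.linalg.norm(X)         # default (Frobenius) norm of the 2-D array

print(np.allclose(by_formula, by_norm))  # True
```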
Using Min and Max Values
In this section, you’ll learn how to normalize data using the minimum and maximum values of the dataset. You won’t need any library other than NumPy for this min-max normalization.
Use the NumPy library to find the minimum and maximum values of the dataset.
np.min – Finds the minimum value of the dataset.
np.max – Finds the maximum value of the dataset.
You can use these minimum and maximum values to normalize a value by subtracting the minimum from it and dividing the result by the difference between the maximum and minimum values.
Use the snippet below to normalize the data using the min and max values.
Snippet
import numpy as np
def NormalizeData(data):
    # Min-max scaling using the global minimum and maximum of the array
    return (data - np.min(data)) / (np.max(data) - np.min(data))
X = np.array([
[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]
])
scaled_x = NormalizeData(X)
print(scaled_x)
When you print the array, you’ll see that the data is in the range 0 to 1.
Output
[[0. 0.06666667]
[0.13333333 0.2 ]
[0.26666667 0.33333333]
[0.4 0.46666667]
[0.53333333 0.6 ]
[0.66666667 0.73333333]
[0.8 0.86666667]
[0.93333333 1. ]]
This is how you can normalize the data using the minimum and maximum values.
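One edge case to keep in mind: if every value in the array is the same, the maximum equals the minimum and the formula divides by zero. The sketch below shows one possible guard. The name normalize_safe and the choice to return zeros are assumptions made for illustration, not part of NumPy or of the function above.

```python
import numpy as np

def normalize_safe(data):
    """Min-max scale to the range 0 to 1; return zeros when all values are equal."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # No spread to scale by, so avoid dividing by zero (assumed behaviour).
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)

print(normalize_safe([5, 5, 5]))  # all zeros
print(normalize_safe([2, 4, 6]))  # the values are 0.0, 0.5 and 1.0
```

Depending on your use case, you might prefer to raise an error or return a constant such as 0.5 instead.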
Conclusion
To summarize, you’ve learned how to normalize values to the range 0 to 1. You’ve used the sklearn MinMaxScaler to normalize using the minimum and maximum values, the NumPy norm() method to normalize the data using the matrix norm, and plain NumPy functions to apply the min-max formula directly.
If you’ve any questions, comment below.