How to Normalize Data Between 0 and 1 Range in Python?

Normalizing data means transforming it so that the values of every record are on the same scale.

You can normalize data to the range 0 to 1 by subtracting the minimum value of the dataset from each value and dividing the result by the difference between the maximum and minimum values of the dataset.

In this tutorial, you’ll learn how to normalize data to the range 0 to 1 using different options in Python.

If You’re in a Hurry…

You can use the code snippet below to normalize data to the range 0 to 1.

The snippet stores the values in a NumPy array and defines a function that normalizes the data using the minimum and maximum values in the array.

Snippet

import numpy as np

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))

X = np.array([
    [ 0,  1],
    [ 2,  3],
    [ 4,  5],
    [ 6,  7],
    [ 8,  9],
    [10, 11],
    [12, 13],
    [14, 15]
])

scaled_x = NormalizeData(X)

print(scaled_x)

When you print the normalized array, you’ll see the below output.

The minimum value in the array will always be normalized to 0 and the maximum value in the array will be normalized to 1. All the other values will be in the range between 0 and 1.

Output

    [[0.         0.06666667]
     [0.13333333 0.2       ]
     [0.26666667 0.33333333]
     [0.4        0.46666667]
     [0.53333333 0.6       ]
     [0.66666667 0.73333333]
     [0.8        0.86666667]
     [0.93333333 1.        ]]

This is how you can normalize the data in a NumPy array between 0 and 1.

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn the different methods available to normalize data between 0 and 1.

Why You Need to Normalize Data

You need to normalize data when you’re analyzing a dataset whose variables are measured on different scales.

For example, your dataset may have one column that stores the length of an object in meters and another column that stores its width in inches.

Let’s consider one record.

Length = 2 Meters and Width = 78 Inches.

One meter equals approximately 39.37 inches.

So when you convert the width of 78 inches to meters, you get approximately 2 meters (about 1.98 meters).

However, if you pass this data to a statistical analysis or a machine learning algorithm without normalizing it, there is a high chance that the width becomes overly influential, because its value of 78 is much larger than the length value of 2. Hence, the data must be scaled.

What Does It Mean To Normalize Data

When you normalize the data of the different scales, both the values will be transformed to the same scale/range. For example, both values will be in the range between 0 and 1.

The lowest value in the data will have the value 0 and the highest value in the data will have the value 1 and the other values will be within the range 0 and 1.

Normalization Formula

The formula for normalizing the data between 0 and 1 range is given below.

zi = (xi - min(x)) / (max(x) - min(x))

where,

  • xi – The current value in your dataset
  • min(x) – Minimum value in the dataset
  • max(x) – Maximum value in the dataset
  • zi – Normalized value of xi

To normalize a value, subtract the minimum value of the dataset from it and divide the result by the difference between the maximum and minimum values of the dataset.
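As a quick worked example of the formula, the sketch below applies it to a small list using only built-in Python functions. The sample values and variable names are made up for illustration.

```python
# Hypothetical sample values, chosen only for illustration.
values = [10, 20, 30, 40, 50]

lowest = min(values)    # min(x)
highest = max(values)   # max(x)

# Apply zi = (xi - min(x)) / (max(x) - min(x)) to every value.
normalized = [(x - lowest) / (highest - lowest) for x in values]

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The smallest value, 10, becomes 0 and the largest value, 50, becomes 1, with the other values spaced evenly in between.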

Using SKLearn MinMaxScaler

When you’re doing data analysis in Python, there are multiple libraries available to perform the normalization. One such library is scikit-learn (sklearn).

It has a scaler object known as MinMaxScaler, which normalizes each column (feature) of the dataset using that column’s minimum and maximum values.

Note: When you scale the training data, you need to scale the test data on the same scale. The training data and the test data usually have different minimum and maximum values, so for proper scaling the test data must be scaled with the minimum and maximum values of the training dataset.
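The note about training and test data can be sketched as follows. This is a minimal example, assuming two small made-up arrays named X_train and X_test; the scaler learns the minimum and maximum from the training data with fit_transform() and reuses them on the test data with transform().

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train and test splits, made up for illustration.
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test = np.array([[2.5, 15.0], [12.0, 35.0]])

scaler = MinMaxScaler()

# Learn the per-column minimum and maximum from the training data only.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training minimum and maximum; do not call fit again here.
X_test_scaled = scaler.transform(X_test)

print(X_test_scaled)
```

Because the second test row contains values larger than anything in the training data, its scaled values are greater than 1. That is expected when the test data is scaled with the training minimum and maximum.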

Use the below snippet to normalize the data using the Sklearn MinMaxScaler in Python.

Snippet

import numpy as np

from sklearn import preprocessing

X = np.array([
    [ 0,  1],
    [ 2,  3],
    [ 4,  5],
    [ 6,  7],
    [ 8,  9],
    [10, 11],
    [12, 13],
    [14, 15]
])

min_max_scaler = preprocessing.MinMaxScaler()

scaled_x = min_max_scaler.fit_transform(X)

scaled_x

Where,

  • numpy – Used to create an array
  • sklearn preprocessing – The module that provides the MinMaxScaler class.
  • min_max_scaler.fit_transform(X) – Learns the minimum and maximum of each column of X and scales the array with them.

When you print scaled_x, you can see that the values are in the range 0 to 1. Both columns produce the same values because MinMaxScaler scales each column separately.

Output

    array([[0.        , 0.        ],
           [0.14285714, 0.14285714],
           [0.28571429, 0.28571429],
           [0.42857143, 0.42857143],
           [0.57142857, 0.57142857],
           [0.71428571, 0.71428571],
           [0.85714286, 0.85714286],
           [1.        , 1.        ]])

This is how you can normalize the data between the range 0 and 1 using the sklearn library.
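If you want to see why this output differs from the one in the quick snippet at the top, the sketch below reproduces the per-column scaling with plain NumPy. It assumes the same array X and uses axis=0 to take the minimum and maximum of each column; the variable names are chosen only for illustration.

```python
import numpy as np

# Same values as X in the snippet above: [[0, 1], [2, 3], ..., [14, 15]].
X = np.arange(16).reshape(8, 2)

# Minimum and maximum of each column, not of the whole array.
col_min = X.min(axis=0)
col_max = X.max(axis=0)

# Min-max formula applied column by column through broadcasting.
scaled_per_column = (X - col_min) / (col_max - col_min)

print(scaled_per_column)
```

Each column is scaled with its own minimum and maximum, so every column contains both a 0 and a 1. The NormalizeData() function, by contrast, uses a single minimum and maximum for the whole array.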

Using np.linalg.norm()

In this section, you’ll learn how to normalize the data using the method norm() available in the NumPy library.

This method returns one of eight different matrix norms or one of an infinite number of vector norms, depending on the value of the ord parameter. If you do not pass the ord parameter, it uses the Frobenius norm for a 2-D array such as the one below.

Once you have this norm, you can divide the values by it to normalize the data.

Use the below snippet to normalize data using the matrix norms.

Snippet

import numpy as np

X = np.array([
    [ 0,  1],
    [ 2,  3],
    [ 4,  5],
    [ 6,  7],
    [ 8,  9],
    [10, 11],
    [12, 13],
    [14, 15]
])

normalized_x= X/np.linalg.norm(X)

print(normalized_x)

Where,

  • np.linalg.norm(X) – Gets the matrix norm of the dataset
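  • X/np.linalg.norm(X) – Divides each value in the dataset by the matrix norm
  • print(normalized_x) – Prints the normalized array.

When you print the normalized array, you’ll see that the values lie between 0 and 1. Unlike min-max normalization, this method does not map the largest value to 1, and the values stay non-negative only because the input has no negative values.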

Output

    [[0.         0.02839809]
     [0.05679618 0.08519428]
     [0.11359237 0.14199046]
     [0.17038855 0.19878664]
     [0.22718473 0.25558283]
     [0.28398092 0.31237901]
     [0.3407771  0.36917519]
     [0.39757328 0.42597138]]

This is how you can normalize the data between 0 and 1 using the np.linalg.norm() method.
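To see what dividing by the norm does to the data, the sketch below checks two properties of the result. It assumes the same array X as above; the variable names are chosen only for illustration.

```python
import numpy as np

# Same values as X in the snippet above: [[0, 1], [2, 3], ..., [14, 15]].
X = np.arange(16).reshape(8, 2)

normalized_x = X / np.linalg.norm(X)

# The normalized array has a norm of (approximately) 1.
has_unit_norm = np.isclose(np.linalg.norm(normalized_x), 1.0)

# The largest value is not stretched to 1, unlike min-max scaling.
largest_value = normalized_x.max()

print(has_unit_norm)
print(largest_value)
```

So this method rescales the array to unit norm rather than stretching it to cover the full range from 0 to 1.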

Using Maths Formula

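You can also normalize the data by dividing it by the square root of the sum of squares of the data, as shown in the snippet below. This computes the same norm as np.linalg.norm() with its default settings.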

Snippet

import numpy as np

X = np.array([
    [ 0,  1],
    [ 2,  3],
    [ 4,  5],
    [ 6,  7],
    [ 8,  9],
    [10, 11],
    [12, 13],
    [14, 15]
])


normalized_x = X / np.sqrt(np.sum(X**2))

print(normalized_x)

When you print the normalized value, you’ll see that the values will be in the range 0 and 1.

Output

    [[0.         0.02839809]
     [0.05679618 0.08519428]
     [0.11359237 0.14199046]
     [0.17038855 0.19878664]
     [0.22718473 0.25558283]
     [0.28398092 0.31237901]
     [0.3407771  0.36917519]
     [0.39757328 0.42597138]]

This is how you can normalize the data using the maths formula.
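The output above is identical to the np.linalg.norm() output. A short check like the following, assuming the same array X, confirms that the two approaches agree; the names manual and with_norm are used only for illustration.

```python
import numpy as np

# Same values as X in the snippet above: [[0, 1], [2, 3], ..., [14, 15]].
X = np.arange(16).reshape(8, 2)

# Square root of the sum of squares, written out by hand.
manual = X / np.sqrt(np.sum(X**2))

# The same normalization using NumPy's norm function.
with_norm = X / np.linalg.norm(X)

print(np.allclose(manual, with_norm))  # True
```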

Using Min and Max Values

In this section, you’ll learn how to normalize data using the minimum and maximum values of the dataset. You won’t use scikit-learn for this min-max normalization, only NumPy.

Use the NumPy library to find the minimum and maximum values of the dataset.

np.min – Finds the minimum value of the dataset.

np.max – Finds the maximum value of the dataset.

You can use these minimum and maximum values to normalize a value by subtracting the minimum from it and dividing the result by the difference between the maximum and minimum values.

Use the below snippet to normalize the data using min and max values.

Snippet

import numpy as np

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))

X = np.array([
    [ 0,  1],
    [ 2,  3],
    [ 4,  5],
    [ 6,  7],
    [ 8,  9],
    [10, 11],
    [12, 13],
    [14, 15]
])

scaled_x = NormalizeData(X)

print(scaled_x)

When you print the array, you’ll see that the data will be in the range 0 and 1.

Output

    [[0.         0.06666667]
     [0.13333333 0.2       ]
     [0.26666667 0.33333333]
     [0.4        0.46666667]
     [0.53333333 0.6       ]
     [0.66666667 0.73333333]
     [0.8        0.86666667]
     [0.93333333 1.        ]]

This is how you can normalize the data using the minimum and maximum values.
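One edge case to be aware of: if every value in the array is the same, the maximum equals the minimum and the formula divides by zero. The sketch below shows one possible way to guard against that. The function name normalize_data_safe and the choice to return zeros for a constant array are assumptions made for illustration, not part of the original function.

```python
import numpy as np

def normalize_data_safe(data):
    """Hypothetical min-max normalization that tolerates constant input."""
    data = np.asarray(data, dtype=float)
    value_range = np.max(data) - np.min(data)

    # Assumption: map a constant array to all zeros instead of dividing by zero.
    if value_range == 0:
        return np.zeros_like(data)

    return (data - np.min(data)) / value_range

print(normalize_data_safe([5, 5, 5]))
print(normalize_data_safe([2, 4, 6]))
```

The first call returns an array of zeros, and the second returns the usual min-max result with 0 for the smallest value and 1 for the largest.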

Conclusion

To summarize, you’ve learned how to normalize values to the range 0 to 1. You’ve used the sklearn MinMaxScaler to normalize using the minimum and maximum values, and the NumPy method norm() to normalize the data using the matrix norm.

If you have any questions, comment below.
