How to Normalize a NumPy Array to a Unit Vector in Python

NumPy arrays are grids of values of the same type. You can use them to store values needed for data analysis or machine learning activities.

You can normalize a NumPy array to a unit vector using the sklearn.preprocessing.normalize() function.

Many machine learning algorithms perform better when their input is on a comparable scale, so normalizing the data is a common preprocessing step. A unit vector is a vector that has a magnitude of 1.

In this tutorial, you’ll learn how to normalize a NumPy array to a unit vector using the sklearn.preprocessing.normalize() function and the numpy.linalg.norm() function.

If You’re in a Hurry

You can use the below code snippet to normalize an array in NumPy to a unit vector.

The np.linalg.norm() function returns one of eight different matrix norms or one of an infinite number of vector norms, depending on the value of the ord parameter. If you do not pass the ord parameter, it uses the 2-norm (Euclidean length) for a one-dimensional array and the Frobenius norm for a matrix.

When you divide the data by this norm, you’ll get normalized data as shown below.

Snippet

import numpy as np

x = np.random.rand(10)*10

normalized_x = x / np.linalg.norm(x)

print(normalized_x)

Output

    [0.46925769 0.12092959 0.37642505 0.09316824 0.38277321 0.07894217
     0.36265182 0.28934431 0.49484541 0.04406218]

This is how you can get a unit vector of a NumPy array.

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn how to get the unit vector from a NumPy array using different methods.

Sample NumPy Array

First, let’s create a sample NumPy array with 10 random values. You can use this in the later steps to learn how to normalize the data.

Snippet

import numpy as np

from sklearn.preprocessing import normalize

x = np.random.rand(10)*10

x

Output

    array([4.59743528, 2.49994446, 5.45313476, 2.22769086, 3.19143523,
           8.56257209, 7.01471989, 6.23370745, 7.21487837, 8.86694182])
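Because the sample array is random, the values differ on every run, which is why the outputs in the later sections do not match each other. If you want repeatable results while following along, one option is a seeded generator. This is a minimal sketch; the seed value 42 is an arbitrary choice and not part of the original example.

```python
import numpy as np

# Seeded generator: the seed 42 is an arbitrary value chosen for illustration.
rng = np.random.default_rng(42)

# Ten random floats in the range [0, 10), mirroring np.random.rand(10) * 10.
x = rng.random(10) * 10

print(x.shape)  # (10,)
```

Running the snippet again with the same seed produces the same ten values.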

Using sklearn normalize()

In this section, you’ll learn how to normalize a NumPy array using the normalize() function from sklearn.preprocessing.

The normalize() function scales each input vector individually to unit norm.

Parameters

It accepts one mandatory parameter.

X – Array-like input. Pass the data to be normalized in this parameter.

It also accepts four optional parameters.

norm – {'l1', 'l2', 'max'}, default='l2' – The norm used to normalize the data.

axis – {0, 1}, default=1 – The axis along which to normalize the data. If 1, each sample (row) is normalized individually. If 0, each feature (column) is normalized.

copy – bool, default=True – If False, the function tries to normalize the input array in place. Otherwise, a new copy of the array is created and normalized.

return_norm – bool, default=False – Whether the computed norms should also be returned.
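To make the effect of these parameters concrete, here is a small sketch using a single sample with the illustrative values 3 and 4 (chosen only because the arithmetic is easy to check by hand).

```python
import numpy as np
from sklearn.preprocessing import normalize

# One sample (row) with two features; the values are illustrative only.
sample = np.array([[3.0, 4.0]])

l2_scaled = normalize(sample, norm="l2")    # divides by sqrt(3^2 + 4^2) = 5
l1_scaled = normalize(sample, norm="l1")    # divides by |3| + |4| = 7
max_scaled = normalize(sample, norm="max")  # divides by max(|3|, |4|) = 4

# With return_norm=True the function returns a tuple:
# the normalized array and the norm computed for each sample.
scaled, norms = normalize(sample, norm="l2", return_norm=True)

print(l2_scaled)
print(l1_scaled)
print(max_scaled)
print(norms)
```

Only the 'l2' option produces a unit vector in the usual Euclidean sense. The other two norms rescale the data so that the absolute values sum to 1 ('l1') or the largest absolute value is 1 ('max').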

Snippet

normalize(x[:, np.newaxis], axis=0) is used to normalize the data in the variable x.

Where,

np.newaxis adds a dimension to the NumPy array. Using it on the one-dimensional array x turns it into a two-dimensional array with a single column, which is the two-dimensional shape that normalize() expects.

  • x[:, np.newaxis] – Takes all values of x and places them in a single column, so each value becomes one row.
  • axis=0 – Normalizes each feature (column) in the array.

import numpy as np

from sklearn.preprocessing import normalize

x = np.random.rand(10)*10

normalized_x = normalize(x[:,np.newaxis], axis=0)

print(normalized_x)

When you print the array, you’ll see the array is in a normalized form.

Output

    [[0.05341832]
     [0.42901918]
     [0.34359858]
     [0.00150131]
     [0.48057246]
     [0.3178608 ]
     [0.27146542]
     [0.27559803]
     [0.37805814]
     [0.26545377]]
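The reshaping step is easy to miss, so here is a small sketch that prints the shapes involved. The values 3 and 4 are illustrative, and flattening the result back with ravel() is an optional extra step that the original snippet does not perform.

```python
import numpy as np
from sklearn.preprocessing import normalize

x = np.array([3.0, 4.0])     # illustrative values, shape (2,)
column = x[:, np.newaxis]    # shape (2, 1): two samples, one feature

normalized_column = normalize(column, axis=0)  # still shape (2, 1)
unit_vector = normalized_column.ravel()        # flattened back to shape (2,)

print(x.shape)       # (2,)
print(column.shape)  # (2, 1)
print(unit_vector)
```

If you need a one-dimensional result for later calculations, flattening the column this way gives you the same layout as the original array.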

Using np.linalg.norm()

You can also use the np.linalg.norm() method from the NumPy library to normalize the NumPy array into a unit vector.

The np.linalg.norm() function returns one of eight different matrix norms or one of an infinite number of vector norms, depending on the value of the ord parameter. If you do not pass the ord parameter, it uses the 2-norm (Euclidean length) for a one-dimensional array and the Frobenius norm for a matrix.

You can divide the data by the returned norm to get the unit vector of the NumPy array.

Snippet

import numpy as np

x = np.random.rand(10)*10

normalized_x = x / np.linalg.norm(x)

print(normalized_x)

When you print the normalized vector, you’ll see the normalized value as shown below.

Output

    [0.46925769 0.12092959 0.37642505 0.09316824 0.38277321 0.07894217
     0.36265182 0.28934431 0.49484541 0.04406218]

This is how you can use the np.linalg.norm() method to normalize the NumPy array to a unit vector.
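The sketch below shows how the ord parameter changes the result for a small vector, and wraps the division in a helper that guards against a zero vector. The helper name to_unit_vector is hypothetical and not part of NumPy, and returning a zero vector unchanged is just one possible convention.

```python
import numpy as np

v = np.array([3.0, -4.0])  # illustrative values

print(np.linalg.norm(v))              # default 2-norm: sqrt(9 + 16)
print(np.linalg.norm(v, ord=1))       # 1-norm: |3| + |-4|
print(np.linalg.norm(v, ord=np.inf))  # infinity norm: max(|3|, |-4|)


def to_unit_vector(vec):
    """Hypothetical helper: scale vec to length 1 using the 2-norm."""
    norm = np.linalg.norm(vec)
    if norm == 0:
        # A zero vector has no direction, so it is returned unchanged
        # instead of dividing by zero.
        return vec
    return vec / norm


unit = to_unit_vector(v)
print(unit)
print(np.linalg.norm(unit))
```

Without the guard, dividing a zero vector by its norm produces NaN values along with a runtime warning.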

Using Maths Formula

In this section, you’ll use the mathematical formula directly to normalize the NumPy array to a unit vector.

You’ll compute the vector norm by taking the square root of the sum of the squared values in the array. Then, by dividing the array by this norm, you get the normalized form of the data.

Use the snippet below to normalize the NumPy array using the mathematical formula.

Snippet

import numpy as np

x = np.random.rand(10)*10

normalized_x = x / np.sqrt(np.sum(x**2))

print(normalized_x)

Output

    [0.12280124 0.36840538 0.05669781 0.27392538 0.43742201 0.45143303
     0.20542178 0.03980713 0.13138495 0.5610464 ]

This is how you can normalize a NumPy array into a unit vector by using the mathematical formula.
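As a quick sanity check, the sketch below applies the same formula to a small array whose norm is easy to compute by hand, and compares it with np.linalg.norm(). The values 1, 2, 2 are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])  # illustrative values

# Square root of the sum of squares: sqrt(1 + 4 + 4) = 3
manual_norm = np.sqrt(np.sum(x**2))
unit_x = x / manual_norm

print(manual_norm)
print(np.isclose(manual_norm, np.linalg.norm(x)))  # same value as the 2-norm
print(np.isclose(np.sum(unit_x**2), 1.0))          # squared entries sum to 1
```

Both checks print True, which confirms that the manual formula and the default np.linalg.norm() call compute the same quantity for a one-dimensional array.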

Normalize NumPy Array Along Axis

In this section, you’ll learn how to normalize a NumPy array into unit vectors along a given axis, namely the row axis or the column axis.

Normalize NumPy Array By Columns

You can pass axis=0 to the normalize() function to normalize the NumPy array by columns. With this setting, each feature (column) of the dataset is scaled to unit norm.

Snippet

import numpy as np

from sklearn.preprocessing import normalize

x = np.random.rand(10)*10

normalized_x = normalize(x[:,np.newaxis], axis=0)

print(normalized_x)

This array has only one feature. Hence, when you print the normalized array, you’ll see the below values.

Output

    [[0.23542553]
     [0.38018535]
     [0.05725614]
     [0.01711471]
     [0.59367405]
     [0.58159005]
     [0.04489816]
     [0.09942305]
     [0.1961091 ]
     [0.23538758]]

Normalize NumPy Array By Rows

You can pass axis=1 to the normalize() function to normalize the NumPy array by rows. With this setting, each sample (row) of the dataset is normalized individually.

Snippet

import numpy as np

from sklearn.preprocessing import normalize

x = np.random.rand(10)*10

normalized_x = normalize(x[:,np.newaxis], axis=1)

print(normalized_x)

The array has only one column, so each row contains a single value. Normalizing by row divides each value by its own magnitude, which is why every non-zero entry becomes 1, as shown below.

Output

    [[1.]
     [1.]
     [1.]
     [1.]
     [1.]
     [1.]
     [1.]
     [1.]
     [1.]
     [1.]]

This is how you can normalize the NumPy array by rows. Each sample will be normalized individually.
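The difference between the two axes is easier to see on an array with more than one column. The following sketch uses a small 2×2 array with illustrative values and also shows a NumPy-only equivalent based on np.linalg.norm() with the axis and keepdims arguments.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two samples (rows) and two features (columns); illustrative values.
data = np.array([[3.0, 0.0],
                 [4.0, 5.0]])

by_columns = normalize(data, axis=0)  # each column has length 1
by_rows = normalize(data, axis=1)     # each row has length 1

# NumPy-only equivalents: keepdims=True keeps the norms two-dimensional
# so that they broadcast correctly against the data.
col_norms = np.linalg.norm(data, axis=0, keepdims=True)  # shape (1, 2)
row_norms = np.linalg.norm(data, axis=1, keepdims=True)  # shape (2, 1)

print(by_columns)
print(by_rows)
print(np.allclose(by_columns, data / col_norms))
print(np.allclose(by_rows, data / row_norms))
```

Note that plain division does not handle all-zero rows or columns; you would need to guard against a zero norm yourself in that case.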

Conclusion

To summarize, you’ve learned how to normalize a NumPy array into a unit vector for use in various data analysis tasks.

You’ve also learned how to get the unit vector from a NumPy array using the mathematical formula, the np.linalg.norm() function, and the sklearn normalize() function.

If you have any questions, comment below.
