How To Save And Load Machine Learning Models To Disk in Python?

Machine learning models are built to classify or predict new data. To reuse a trained model for later predictions, you must save it to disk so that it can be loaded again when new data arrives.

You can save and load machine learning models by using the pickle library.

In this tutorial, you’ll learn how to save and load machine learning models to disk using pickle or joblib, and how to save scaler objects to disk so that new data can be scaled on the same scale as the training data.

If You’re in a Hurry

You can use the below code snippet to save the machine learning model to disk.

Snippet

import pickle

model_filename = "My_KNN_model.sav"

# Open the file in binary write mode; the with block closes it automatically
with open(model_filename, 'wb') as model_file:
    pickle.dump(knn, model_file)

print('Model is saved to disk successfully using pickle')

Use the below snippet to load the machine learning model from the disk.

import pickle

model_filename = "My_KNN_model.sav"

# Open the file in binary read mode and deserialize the model
with open(model_filename, 'rb') as model_file:
    my_knn_model = pickle.load(model_file)

result = my_knn_model.predict(X_test)

result

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn the different methods available to save the machine learning model to disk and load it later when you want to predict the new data.

This applies to any type of model you create. For example, you can use the same steps to save the classifier model to disk and use it later.

Machine learning models are commonly created using the scikit-learn (sklearn) library. scikit-learn does not provide its own function for saving a model, so you need a library such as pickle or joblib to save models created using sklearn.

Creating the Model

To save a machine learning model, you first need to create one. In this section, you’ll create a model using the iris dataset and the k-nearest neighbors classification algorithm (KNeighborsClassifier), which classifies iris flowers based on sepal length, sepal width, petal length, and petal width.

The model will be stored in the variable called knn. You’ll save this model to disk and load it at a later point in time.

Snippet

import numpy as np

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.neighbors import KNeighborsClassifier as KNN

iris = load_iris()

X = iris.data
y = iris.target

# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)


knn = KNN(n_neighbors = 3)

# Train the model
knn.fit(X_train, y_train)

print('Model is Created')

Now the model is created.

Output

    Model is Created

Using Pickle To Save And Load The Model

Pickle is a Python library that serializes Python objects to disk and later deserializes them from disk to load them back into a Python program.

You can use the Pickle library to save and load the machine learning model.

Pickle is part of the Python standard library in both Python 2 and Python 3. Hence, no explicit installation is required to use it.

Saving The Model Using Pickle

You can save the machine learning model using the pickle.dump() method. It’ll serialize the object to the disk.

It takes two required parameters.

  • object_to_be_serialized – Model object which needs to be serialized to disk
  • A File_Object – A binary file object opened in write mode using open(model_filename, 'wb'). model_filename is the name of the file that will be saved on disk, w means the file is opened in write mode, and b means it is opened in binary mode.

When you execute the below program, the pickle.dump() call will serialize the model object knn to the file named My_KNN_model.sav.

Snippet

import pickle

model_filename = "My_KNN_model.sav"

# Open the file in binary write mode; the with block closes it automatically
with open(model_filename, 'wb') as model_file:
    pickle.dump(knn, model_file)

print('Model is saved to disk successfully using pickle')

Output

    Model is saved to disk successfully using pickle

This is how you can save the classifier model to disk using pickle.

Dumping a machine learning model to disk replaces any existing file with the same name. Hence, check the target directory first to make sure a file with that name doesn’t already exist.
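
One way to guard against accidental overwrites, sketched here under stated assumptions, is to open the file in exclusive-creation mode ('xb'), which makes open() raise FileExistsError when the file already exists. The helper name save_without_overwriting and the dictionary standing in for the trained model are hypothetical and used only so that the sketch runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def save_without_overwriting(obj, path):
    """Pickle obj to path, refusing to replace an existing file (hypothetical helper)."""
    try:
        # 'x' requests exclusive creation: open() fails if the file already exists
        with open(path, "xb") as f:
            pickle.dump(obj, f)
        return True
    except FileExistsError:
        return False


# Assumption: a plain dict stands in for the trained model
stand_in_model = {"n_neighbors": 3}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "My_KNN_model.sav"
    first_save = save_without_overwriting(stand_in_model, target)   # file is created
    second_save = save_without_overwriting(stand_in_model, target)  # file already exists

print(first_save, second_save)  # True False
```

With this approach, an existing model file is never replaced silently; you decide explicitly whether to delete it or pick a new name.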

Next, you’ll learn how to load the saved model using pickle and use it for prediction.

Loading The Model Using Pickle

To classify new data, you need to load the model that you trained and saved to disk.

You can load the saved machine learning model using the pickle.load() method. Only load pickle files that you trust, because unpickling can execute arbitrary code.

It takes one required parameter.

  • File_Object – A file object opened in read mode using open(file_name, 'rb'), where file_name is the name of the file to be loaded. r means the file is opened in read mode and b means it is opened in binary mode.

To learn more about reading a binary file, read How to Read Binary File in Python?

When you execute the below script, the model will be read into the object my_knn_model and you can use the same model to predict the new data.

Snippet

import pickle

model_filename = "My_KNN_model.sav"

# Open the file in binary read mode and deserialize the model
with open(model_filename, 'rb') as model_file:
    my_knn_model = pickle.load(model_file)

result = my_knn_model.predict(X_test)

result

The model is read into the object my_knn_model. It predicts the classes for the test data in X_test, and the predictions are stored in the variable result and displayed as shown below.

Output

    array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
           0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
           0, 1, 1, 2, 1, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 1])

This is how you can save the machine learning model to disk and load the model to predict the new data using the pickle library.
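
As a quick sanity check, the following minimal sketch rebuilds the same KNN model, saves and reloads it with pickle in a temporary directory, and confirms that the restored model gives the same predictions as the original. The temporary directory and the names restored and predictions_match are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Recreate the model from the "Creating the Model" section
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Save and reload the model inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "My_KNN_model.sav"
    with open(model_path, "wb") as f:
        pickle.dump(knn, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should predict exactly what the original predicts
predictions_match = np.array_equal(knn.predict(X_test), restored.predict(X_test))
print(predictions_match)  # True
```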

Next, you’ll learn about another library called joblib.

Using Joblib To Save And Load The Model

You can use the joblib library to save and load the machine learning model.

joblib is installed as a dependency of scikit-learn, so it is usually already available. You can import it using the following import statement.

import joblib

Saving The Model Using JobLib

You can use the dump() method available in the joblib library to save the machine learning model. It’ll serialize the object to the disk.

It takes two required parameters.

  • object_to_be_serialized – Model object which needs to be serialized to disk
  • File_name – Target file name to which the model should be saved. You can pass just the filename; there is no need to create a file object.

When you execute the below program, the line joblib.dump(knn, model_filename) will serialize the model object knn to the file name My_KNN_model_job_lib.sav.

Snippet

import joblib

model_filename = "My_KNN_model_job_lib.sav"

joblib.dump(knn, model_filename)

print('Model is saved to disk successfully using joblib')

Output

    Model is saved to disk successfully using joblib

This is how you can save the machine learning model to disk using the joblib library.
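
joblib.dump() also accepts an optional compress argument and returns the list of file names it wrote. The sketch below assumes a plain dictionary as a stand-in for the trained model and a temporary directory as the target location; both are illustrative choices and not part of the workflow above.

```python
import tempfile
from pathlib import Path

import joblib

# Assumption: a plain dict stands in for the trained model
stand_in_model = {"n_neighbors": 3, "weights": list(range(1000))}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "My_KNN_model_job_lib.sav"

    # compress=3 trades a little CPU time for a smaller file
    written_files = joblib.dump(stand_in_model, target, compress=3)

    # Load the object back while the temporary file still exists
    restored = joblib.load(target)

print(len(written_files))          # 1
print(restored == stand_in_model)  # True
```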

Loading The Model Using JobLib

To classify new data, you need to load the model that you trained and saved to disk.

You can load the saved model using the joblib.load() method. It takes one required parameter.

  • File_Name – The filename of the saved model.

When you execute the below script, the model will be read into the object my_knn_model and you can use the same model to predict the new data.

Snippet

import joblib

model_filename = "My_KNN_model_job_lib.sav"

my_knn_model = joblib.load(model_filename)

result = my_knn_model.predict(X_test)

result

Output

    array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
           0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
           0, 1, 1, 2, 1, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 1])

This is how you can load the model using the joblib library and use it for predicting future data.

Saving And Loading Scaler Object

When you save a machine learning model to disk and load it again to predict new data, it is important to normalize the new data appropriately.

The new data needs to be scaled using the same scale as the training data so that prediction or classification works correctly.

Example

Consider that you are using MinMaxScaler to scale the data. The scaler learns the minimum and maximum values of each feature from the training data.

If you create a new MinMaxScaler and fit it on the new data that needs to be classified or predicted, it learns different minimum and maximum values. The new data is then scaled differently from the training data. No error is raised, but the model receives inputs on a scale it was not trained on, so the predictions become unreliable.

Hence, you need to use the same MinMaxScaler that was fitted on the training data, so that the new data is scaled the same way. To achieve this, you must also save the scaler object to disk.
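
To illustrate the difference, here is a minimal sketch with made-up one-feature data. The arrays X_train_demo and X_new_demo are hypothetical, and the scaler is round-tripped through pickle in memory rather than through a file to keep the example short.

```python
import pickle

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: the training feature ranges from 0 to 10
X_train_demo = np.array([[0.0], [10.0]])
X_new_demo = np.array([[5.0], [20.0]])

# Fit on the training data, then save and restore the fitted scaler
train_scaler = MinMaxScaler().fit(X_train_demo)
restored_scaler = pickle.loads(pickle.dumps(train_scaler))

# Reusing the training scaler keeps the training scale (min 0, max 10)
reused = restored_scaler.transform(X_new_demo)

# Fitting a fresh scaler on the new data uses its own min 5 and max 20
refit = MinMaxScaler().fit_transform(X_new_demo)

print(reused.ravel())  # values 0.5 and 2.0
print(refit.ravel())   # values 0.0 and 1.0
```

The same raw value 5.0 becomes 0.5 with the training scaler but 0.0 with the refitted one, which is why the model would see it as a different input.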

Read Why you need to normalize data to understand more about scaling the training and test data.

Saving the Scaler Object

You can use the below script to save the scaler object which you used to scale the training data.

Snippet

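import pickle

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Fit the scaler on the training data and scale it
scaled_array = scaler.fit_transform(X_train)

# Save the fitted scaler so the same scaling can be reused later
with open("scaler.pkl", 'wb') as scaler_file:
    pickle.dump(scaler, scaler_file)

print("Scaler object is successfully stored to disk")

Output

    Scaler object is successfully stored to disk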

Loading the Scaler Object

You can use the below script to load the scaler object that you used to scale the training data, so that you can scale the new data using the same scaler object.

Snippet

import pickle

# Load the scaler that was fitted on the training data
with open("scaler.pkl", 'rb') as scaler_file:
    scalerObj = pickle.load(scaler_file)

# X_train is reused here for demonstration; pass your new data instead
scaled_array = scalerObj.transform(X_train)

print(scaled_array)

print("Data is scaled using the same scaler that was fitted on the training data")

Output

    [[0.58823529 0.25       0.67857143 0.70833333]
     [0.14705882 0.6        0.14285714 0.04166667]
     [0.20588235 0.4        0.08928571 0.04166667]
     [0.08823529 0.5        0.05357143 0.04166667]
   ...
   ...
     [0.44117647 0.9        0.01785714 0.04166667]
     [0.44117647 0.2        0.51785714 0.45833333]
     [0.82352941 0.4        0.85714286 0.83333333]]
    Data is scaled using the same scaler that was fitted on the training data

Common Errors And Solutions

1. Different Number of Features Error

ValueError: The number of features of the model must match the input. Model n_features is 8 and input n_features is 7.

Solution

You need to use the same number of features in your training data and in the new data. For example, if you used 8 features to train the model, you need to provide the same 8 features, in the same order, in the new prediction data as well. If a specific feature is unavailable, create a substitute value for it using a suitable feature engineering technique.
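
You can also catch this mismatch yourself before calling predict(). The following is a minimal sketch that assumes a recent scikit-learn version, where fitted estimators expose the n_features_in_ attribute. The helper check_feature_count and the random demo arrays are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data with 8 features
rng = np.random.default_rng(0)
X_train_demo = rng.random((20, 8))
y_train_demo = np.arange(20) % 2
model = KNeighborsClassifier(n_neighbors=3).fit(X_train_demo, y_train_demo)


def check_feature_count(model, X_new):
    """Raise ValueError if X_new does not have the feature count the model was fitted on."""
    expected = model.n_features_in_  # set by scikit-learn during fit()
    actual = X_new.shape[1]
    if actual != expected:
        raise ValueError(
            f"Model expects {expected} features, but input has {actual}."
        )
    return X_new


# Hypothetical new data that is missing one feature
X_new_demo = rng.random((5, 7))

try:
    check_feature_count(model, X_new_demo)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)  # Model expects 8 features, but input has 7.
```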

Conclusion

To summarize, you’ve learned how to save and load machine learning models by using the pickle and joblib libraries.

You have also learned how to save the scaler object and why it’s important to use the same scaler object.

You can use the same method to save any type of model, such as a random forest classifier, GridSearchCV, or a support vector machine (SVM), and load it later on.

If you have any questions, comment below.
