Visualizations play an essential role in the exploratory data analysis activity of machine learning.

**You can compute a confusion matrix using the `confusion_matrix()` function from the `sklearn.metrics` module and plot it as a heatmap with seaborn.**

**Why Confusion Matrix?**

After you create a machine learning model, accuracy is the metric most often used to evaluate it. However, accuracy alone can be misleading. An accuracy of 99% looks good as a percentage, but consider a machine learning model used for fraud detection or drug consumption detection.

In such critical scenarios, **the 1% failure rate can have a significant impact.**

For example, if a model predicts a fraudulent transaction of $10,000 as *Not Fraud*, then it is not a good model and cannot be used in production.

In the drug consumption model, suppose the model predicts that a person has consumed the drug when they actually have not. Because of this *false prediction*, the person may be punished for a crime they did not commit.

In such scenarios, *you need a better metric than accuracy to validate the machine learning model.*

This is where the confusion matrix comes into the picture.
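To make the problem concrete, here is a minimal sketch with made-up data: 1,000 hypothetical transactions, 10 of them fraudulent, and a hypothetical "model" that labels everything as *Not Fraud*. The numbers are illustrative assumptions, not results from a real model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical data: 990 legitimate transactions (0) and 10 frauds (1)
y_true = np.array([0] * 990 + [1] * 10)
# A useless "model" that always predicts Not Fraud (0)
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)   # share of real frauds that were caught
cm = confusion_matrix(y_true, y_pred)

print(f"Accuracy: {accuracy:.2%}")      # 99.00%
print(f"Fraud recall: {recall:.2%}")    # 0.00%
print(cm)
```

The accuracy looks excellent, yet the bottom row of the matrix shows that all 10 frauds were missed.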

In this tutorial, you’ll learn what a confusion matrix is and how to plot a confusion matrix for a binary classification model and a multiclass classification model.

## What is Confusion Matrix?

A confusion matrix is a table that lets you visualize the performance of a classification model by comparing the actual classes with the predicted classes. With this visualization, you can see not only how often the model is wrong, but also which kinds of mistakes it makes.
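As a small illustration, the sketch below builds a confusion matrix from two short, made-up label lists. For binary labels 0 and 1, scikit-learn puts the actual classes on the rows and the predicted classes on the columns, so the layout is `[[TN, FP], [FN, TP]]`.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
actual    = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
# [[3 1]
#  [1 3]]
```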

## Creating Binary Class Classification Model

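In this section, you’ll create a classification model that predicts whether a patient has breast cancer or not, denoted by the output classes `True` or `False`. Note that the scikit-learn dataset stores the target as `0` (malignant) and `1` (benign), so keep that encoding in mind when you name the classes.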

The breast cancer dataset is available in the sklearn dataset library.

It contains 569 rows. Each row includes 30 numeric features and one output class. If you want to manipulate or visualize the sklearn dataset, you can convert it into a pandas DataFrame and use the pandas DataFrame functionality.
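One possible way to do that conversion is sketched below. The variable names `data` and `df` are chosen for this example only.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Build a DataFrame from the feature matrix and add the target as a column
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

print(df.shape)                      # (569, 31)
print(data.target_names.tolist())    # ['malignant', 'benign']
print(df["target"].value_counts())   # number of rows per class
```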

To create the model, you’ll load the sklearn dataset, split it into training and testing sets, and fit the `KNeighborsClassifier` model on the **train data**.

After creating the model, you can use the **test data** to predict the values and check how the model is performing.

You can use the actual output classes from your test data and the predicted output returned by the `predict()` method to plot the confusion matrix and evaluate the model accuracy.

Use the below snippet to create the model.

**Snippet**

```
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN
breastCancer = load_breast_cancer()
X = breastCancer.data
y = breastCancer.target
# Split the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)
knn = KNN(n_neighbors = 3)
# train the model
knn.fit(X_train, y_train)
print('Model is Created')
```

The KNeighborsClassifier model is created for the breast cancer training data.

**Output**

` Model is Created`

To test the model created, you can use the test data obtained from the train test split and predict the output. Then, you’ll have the predicted values.

**Snippet**

```
y_pred = knn.predict(X_test)
y_pred
```

**Output**

```
array([0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1,
0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1,
0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0,
0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1,
1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1,
0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
0, 1, 0, 0, 1, 1, 0, 1])
```

Now use the predicted classes and the actual output classes from the test data to visualize the confusion matrix.

You’ll learn how to plot the confusion matrix for the binary classification model in the next section.

## Plot Confusion Matrix for Binary Classes

You can create the confusion matrix using the `confusion_matrix()` function from the `sklearn.metrics` module. It returns an array that holds the counts of *True Negatives*, *False Positives*, *False Negatives*, and *True Positives*, laid out as `[[TN, FP], [FN, TP]]` for binary labels.

**Snippet**

```
from sklearn.metrics import confusion_matrix
#Generate the confusion matrix
cf_matrix = confusion_matrix(y_test, y_pred)
print(cf_matrix)
```

**Output**

```
[[ 73 7]
[ 7 141]]
```
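If you want the four counts as separate numbers, one option is to flatten the matrix with `ravel()`. The sketch below hard-codes the counts from the output above so that it runs on its own. The metric formulas are the standard definitions, added here for illustration.

```python
import numpy as np

# Counts copied from the output above: [[TN, FP], [FN, TP]]
cf_matrix = np.array([[73, 7],
                      [7, 141]])

tn, fp, fn, tp = cf_matrix.ravel()

accuracy = (tp + tn) / cf_matrix.sum()
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```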

Once you have the confusion matrix, you can use the `heatmap()` function available in the seaborn library to plot it.

The seaborn `heatmap()` function accepts one mandatory parameter and several optional parameters.

- `data`: A rectangular dataset that can be coerced into a 2D array. Here, you can pass the confusion matrix you already have.
- `annot=True`: Writes the data value in each cell of the plotted matrix. By default, this is `False`.
- `cmap='Blues'`: The name of a matplotlib colormap. Here, the plot uses shades of blue.

The `heatmap()` function returns the matplotlib axes, which you can store in a variable. Here, you’ll store it in the variable `ax`. Now, you can set the *title*, the *x-axis* and *y-axis* labels, and the *tick labels* for the x-axis and y-axis.

- **Title**: Used to label the complete image. Use the `set_title()` method to set the title.
- **Axes labels**: Used to name the `x` axis or `y` axis. Use `set_xlabel()` to set the *x-axis* label and `set_ylabel()` to set the *y-axis* label.
- **Tick labels**: Used to denote the data points on the axes. Pass the tick labels as a list in ascending order of the class values, because the confusion matrix arranges its rows and columns in that order. Use `xaxis.set_ticklabels()` to set the tick labels for the *x-axis* and `yaxis.set_ticklabels()` to set the tick labels for the *y-axis*.

Finally, use the `plt.show()` function from `matplotlib.pyplot` to display the confusion matrix plot.

Use the below snippet to create a confusion matrix, set *title* and *labels* for the axis, and set the tick labels, and plot it.

**Snippet**

```
import seaborn as sns
import matplotlib.pyplot as plt

# Plot the confusion matrix as an annotated heatmap
ax = sns.heatmap(cf_matrix, annot=True, cmap='Blues')
ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Values')
ax.set_ylabel('Actual Values ')
# Tick labels - list must follow the ascending order of the class values (0, 1)
ax.xaxis.set_ticklabels(['False','True'])
ax.yaxis.set_ticklabels(['False','True'])
# Display the visualization of the confusion matrix
plt.show()
```

**Output**

Alternatively, if you want to avoid using seaborn, you can plot the confusion matrix using the `ConfusionMatrixDisplay.from_predictions()` method available in scikit-learn itself.

Next, you’ll learn how to plot a confusion matrix with percentages.

### Plot Confusion Matrix for Binary Classes With Percentage

The objective of creating and plotting the confusion matrix is to check the accuracy of the machine learning model. It’ll be good to visualize the accuracy with percentages rather than using just the number. In this section, you’ll learn how to plot a confusion matrix for binary classes with percentages.

To plot the confusion matrix with percentages, you first need to calculate the percentage of *True Positives*, *False Positives*, *False Negatives*, and *True Negatives*. You can calculate these percentages by dividing each value by the sum of all values.

Using the `np.sum()` function, you can sum all values in the confusion matrix.

Then pass the percentage of each value as data to the `heatmap()` function by using the expression `cf_matrix/np.sum(cf_matrix)`.
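As a side note, `confusion_matrix()` also has a `normalize` parameter that can do this division for you. The sketch below uses made-up labels to check that `normalize="all"` matches the manual division.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

counts = confusion_matrix(y_true, y_pred)
by_hand = counts / np.sum(counts)                           # manual division
built_in = confusion_matrix(y_true, y_pred, normalize="all")

print(by_hand)
print(np.allclose(by_hand, built_in))  # True
```

Passing `normalize="true"` instead divides each row by its own total, which gives the share of each actual class that was predicted correctly.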

Use the below snippet to plot the confusion matrix with percentages.

**Snippet**

```
ax = sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True,
                 fmt='.2%', cmap='Blues')
ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Values')
ax.set_ylabel('Actual Values ')
# Tick labels - list must follow the ascending order of the class values
ax.xaxis.set_ticklabels(['False','True'])
ax.yaxis.set_ticklabels(['False','True'])
# Display the visualization of the confusion matrix
plt.show()
```

**Output**

### Plot Confusion Matrix for Binary Classes With Labels

In this section, you’ll plot a confusion matrix for Binary classes with labels *True Positives*, *False Positives*, *False Negatives*, and *True negatives*.

You need to create a list of the labels and convert it into an array of shape `(2, 2)` using the `np.asarray()` function followed by `reshape()`. Then, pass this array of labels to the `annot` parameter. This will plot the confusion matrix annotated with the labels.

Use the below snippet to plot the confusion matrix with labels.

**Snippet**

```
labels = ['True Neg','False Pos','False Neg','True Pos']
labels = np.asarray(labels).reshape(2,2)
# fmt='' is required because the annotations are strings, not numbers
ax = sns.heatmap(cf_matrix, annot=labels, fmt='', cmap='Blues')
ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Values')
ax.set_ylabel('Actual Values ')
# Tick labels - list must follow the ascending order of the class values
ax.xaxis.set_ticklabels(['False','True'])
ax.yaxis.set_ticklabels(['False','True'])
# Display the visualization of the confusion matrix
plt.show()
```

**Output**

### Plot Confusion Matrix for Binary Classes With Labels And Percentages

In this section, you’ll learn how to plot a confusion matrix with labels, counts, and percentages.

You can use this to see the share of each outcome, for example, what percentage of the predictions are *True Positives*, *False Positives*, *False Negatives*, and *True Negatives*.

For this, you first create a list of labels, then a list with the count for each label, and another list with the percentage for each label.

Then you can zip these lists to create the final labels. Zipping pairs up the items at the same position in each list, so you can join them into one string per cell. Then, convert this list into an array using the `np.asarray()` function.

Then pass the final array to the `annot` parameter. This will create a confusion matrix with the label, count, and percentage information for each class.
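To see what the zipped labels look like before they are plotted, here is a small sketch that builds them from the binary counts printed earlier. The counts are hard-coded so that the example runs on its own.

```python
import numpy as np

# Counts copied from the binary confusion matrix printed earlier
cf_matrix = np.array([[73, 7],
                      [7, 141]])

group_names = ["True Neg", "False Pos", "False Neg", "True Pos"]
group_counts = [str(value) for value in cf_matrix.flatten()]
group_percentages = [f"{value:.2%}" for value in cf_matrix.flatten() / cf_matrix.sum()]

# zip() yields one (name, count, percentage) tuple per cell
labels = [f"{name}\n{count}\n{pct}"
          for name, count, pct in zip(group_names, group_counts, group_percentages)]
labels = np.asarray(labels).reshape(2, 2)

print(repr(labels[0, 0]))  # the text that will appear in the top-left cell
```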

Use the below snippet to visualize the confusion matrix with all the details.

**Snippet**

```
group_names = ['True Neg','False Pos','False Neg','True Pos']
group_counts = ["{0:0.0f}".format(value) for value in
                cf_matrix.flatten()]
group_percentages = ["{0:.2%}".format(value) for value in
                     cf_matrix.flatten()/np.sum(cf_matrix)]
# Join name, count, and percentage into one string per cell
labels = [f"{v1}\n{v2}\n{v3}" for v1, v2, v3 in
          zip(group_names,group_counts,group_percentages)]
labels = np.asarray(labels).reshape(2,2)
ax = sns.heatmap(cf_matrix, annot=labels, fmt='', cmap='Blues')
ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Values')
ax.set_ylabel('Actual Values ')
# Tick labels - list must follow the ascending order of the class values
ax.xaxis.set_ticklabels(['False','True'])
ax.yaxis.set_ticklabels(['False','True'])
# Display the visualization of the confusion matrix
plt.show()
```

**Output**

This is how you can create a confusion matrix for the binary classification machine learning model.

Next, you’ll learn about creating a confusion matrix for a classification model with multiple output classes.

## Creating Classification Model For Multiple Classes

In this section, you’ll create a classification model for multiple output classes. This is also called multiclass classification.

You’ll be using the iris dataset available in the sklearn dataset library.

It contains 150 rows. Each row includes four numeric features and one output class. The output class is one of three Iris flower types: *Iris Setosa*, *Iris Versicolor*, and *Iris Virginica*.
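You can check these facts directly on the loaded dataset. The short sketch below also prints `target_names`, which is listed in the same order as the class values 0, 1, and 2.

```python
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)              # (150, 4)
print(iris.target_names.tolist())   # ['setosa', 'versicolor', 'virginica']
print(iris.feature_names)           # names of the four numeric features
```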

To create the model, you’ll load the sklearn dataset, split it into training and testing sets, and fit the `KNeighborsClassifier` model on the **train data**.

After creating the model, you can use the **test data** to predict the values and check how the model is performing.

You can use the actual output classes from your test data and the predicted output returned by the `predict()` method to plot the confusion matrix and evaluate the model accuracy.

Use the below snippet to create the model.

**Snippet**

```
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)
knn = KNN(n_neighbors = 3)
# train the model
knn.fit(X_train, y_train)
print('Model is Created')
```

**Output**

` Model is Created`

Now the model is created.

Use the test data from the train test split and predict the output value using the `predict()` method as shown below.

**Snippet**

```
y_pred = knn.predict(X_test)
y_pred
```

You’ll have the predicted output as an array. The values 0, 1, and 2 show the predicted category for each row of the test data.

**Output**

```
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
0, 1, 1, 2, 1, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 1])
```

Now, you can use the predicted data available in `y_pred` to create a confusion matrix for multiple classes.

## Plot Confusion matrix for Multiple Classes

In this section, you’ll learn how to plot a confusion matrix for multiple classes.

You can use the `confusion_matrix()` function available in the sklearn library to create a confusion matrix. It’ll contain three rows and three columns, representing the actual flower category and the predicted flower category in ascending order.

**Snippet**

```
from sklearn.metrics import confusion_matrix
#Get the confusion matrix
cf_matrix = confusion_matrix(y_test, y_pred)
print(cf_matrix)
```

**Output**

```
[[23 0 0]
[ 0 19 0]
[ 0 1 17]]
```

The output above shows the confusion matrix for actual and predicted flower category counts.
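With more than two classes, the diagonal holds the correct predictions for each class. As an illustration, the sketch below derives per-class recall and precision from the counts above, which are hard-coded so that it runs on its own. The formulas are the standard definitions.

```python
import numpy as np

# Counts copied from the output above (rows: actual, columns: predicted)
cf_matrix = np.array([[23, 0, 0],
                      [0, 19, 0],
                      [0, 1, 17]])
class_names = ["Setosa", "Versicolor", "Virginica"]

correct = np.diag(cf_matrix)
recall = correct / cf_matrix.sum(axis=1)      # divide by row totals (actual)
precision = correct / cf_matrix.sum(axis=0)   # divide by column totals (predicted)

for name, r, p in zip(class_names, recall, precision):
    print(f"{name:<11} recall={r:.3f} precision={p:.3f}")
```

The single off-diagonal count is one Virginica flower predicted as Versicolor, which lowers the recall for Virginica and the precision for Versicolor.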

You can use this matrix to plot the confusion matrix using the seaborn library, as shown below.

**Snippet**

```
import seaborn as sns
import matplotlib.pyplot as plt
ax = sns.heatmap(cf_matrix, annot=True, cmap='Blues')
ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Flower Category')
ax.set_ylabel('Actual Flower Category ')
# Tick labels - list must follow the ascending order of the class values (0, 1, 2)
ax.xaxis.set_ticklabels(['Setosa','Versicolor', 'Virginica'])
ax.yaxis.set_ticklabels(['Setosa','Versicolor', 'Virginica'])
# Display the visualization of the confusion matrix
plt.show()
```

**Output**

### Plot Confusion Matrix for Multiple Classes With Percentage

In this section, you’ll plot the confusion matrix for multiple classes with the percentage of each output class. You can calculate the percentage by dividing the values in the confusion matrix by the sum of all values.

Use the below snippet to plot the confusion matrix for multiple classes with percentages.

**Snippet**

```
ax = sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True,
                 fmt='.2%', cmap='Blues')
ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Flower Category')
ax.set_ylabel('Actual Flower Category ')
# Tick labels - list must follow the ascending order of the class values (0, 1, 2)
ax.xaxis.set_ticklabels(['Setosa','Versicolor', 'Virginica'])
ax.yaxis.set_ticklabels(['Setosa','Versicolor', 'Virginica'])
# Display the visualization of the confusion matrix
plt.show()
```

**Output**

### Plot Confusion Matrix for Multiple Classes With Numbers And Percentages

In this section, you’ll learn how to plot a confusion matrix with labels, counts, and percentages for the multiple classes.

You can use this to see the share of each cell, for example, what percentage of the predictions belong to each category of flowers.

For this, you first create a list with the count for each cell and another list with the percentage for each cell.

Then you can zip these lists to create concatenated labels. Zipping pairs up the items at the same position in each list, so you can join them into one string per cell. Then, convert this list into an array using the `np.asarray()` function.

Pass this final array to the `annot` parameter. This will create a confusion matrix with the count and percentage information for each category of flowers.

Use the below snippet to visualize the confusion matrix with all the details.

**Snippet**

```
group_counts = ["{0:0.0f}".format(value) for value in
                cf_matrix.flatten()]
group_percentages = ["{0:.2%}".format(value) for value in
                     cf_matrix.flatten()/np.sum(cf_matrix)]
# Join count and percentage into one string per cell
labels = [f"{v1}\n{v2}\n" for v1, v2 in
          zip(group_counts,group_percentages)]
labels = np.asarray(labels).reshape(3,3)
ax = sns.heatmap(cf_matrix, annot=labels, fmt='', cmap='Blues')
ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Flower Category')
ax.set_ylabel('Actual Flower Category ')
# Tick labels - list must follow the ascending order of the class values (0, 1, 2)
ax.xaxis.set_ticklabels(['Setosa','Versicolor', 'Virginica'])
ax.yaxis.set_ticklabels(['Setosa','Versicolor', 'Virginica'])
# Display the visualization of the confusion matrix
plt.show()
```

**Output**

This is how you can plot a confusion matrix for multiple classes with percentages and numbers.

## Plot Confusion Matrix Without Classifier

To plot the confusion matrix without a classifier model, refer to this StackOverflow answer.
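As a rough sketch of the idea, and not necessarily the approach taken in that answer, you only need two sequences of labels: the actual values and the predicted values. The labels and class names below are made up for illustration. The `matplotlib.use("Agg")` line is only there so the example also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Made-up labels, for example collected from a manual review
actual    = ["cat", "dog", "dog", "cat", "bird", "dog", "bird", "cat"]
predicted = ["cat", "dog", "cat", "cat", "bird", "dog", "dog", "cat"]
class_names = ["bird", "cat", "dog"]

# labels= fixes the row and column order of the matrix
cm = confusion_matrix(actual, predicted, labels=class_names)
print(cm)
# [[1 0 1]
#  [0 3 0]
#  [0 1 2]]

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(cmap="Blues")
plt.show()
```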

## Conclusion

To summarize, you’ve learned how to plot a confusion matrix for the machine learning model with binary output classes and multiple output classes.

You’ve also learned how to annotate the confusion matrix with more details such as labels, count of each label, and percentage of each label for better visualization.

If you’ve any questions, comment below.