Sklearn datasets come in handy for learning machine learning concepts. When using them, you may need to convert them to a pandas dataframe to manipulate and clean the data.

You can convert a sklearn dataset to a pandas dataframe by using the pd.DataFrame(data=iris.data) constructor.

In this tutorial, you’ll learn how to convert sklearn datasets into pandas dataframes.

If You’re in a Hurry

You can use the following code to convert the sklearn dataset to a pandas dataframe.

Code

import pandas as pd

from sklearn import datasets

iris = datasets.load_iris()

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

df["target"] = iris.target

df.head()

Dataframe Will Look Like

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
0	5.1	3.5	1.4	0.2	0
1	4.9	3.0	1.4	0.2	0
2	4.7	3.2	1.3	0.2	0
3	4.6	3.1	1.5	0.2	0
4	5.0	3.6	1.4	0.2	0

This is how you can convert the sklearn dataset to a pandas dataframe.

If You Want to Understand Details, Read on…

In this tutorial, you’ll learn how to convert sklearn datasets to pandas dataframes while using them to create machine learning models.

Converting Sklearn Datasets To Dataframe Without Column Names

In this section, you’ll convert the sklearn datasets to dataframes without column names.

You can use this when you want to convert the dataset to a pandas dataframe for visualization purposes.
The columns will be named with the default indexes 0, 1, 2, 3, 4, and so on.

Code

import pandas as pd

from sklearn import datasets

iris = datasets.load_iris()

df = pd.DataFrame(data=iris.data)

df["target"] = iris.target

df.head()

Dataframe Will Look Like

	0	1	2	3	target
0	5.1	3.5	1.4	0.2	0
1	4.9	3.0	1.4	0.2	0
2	4.7	3.2	1.3	0.2	0
3	4.6	3.1	1.5	0.2	0
4	5.0	3.6	1.4	0.2	0
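
One practical consequence of the default labels is that the feature columns are addressed by integer, not by string. The short sketch below illustrates this; the variable name first_feature is only an illustration.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# No `columns` argument, so pandas assigns the integer labels 0..3
df = pd.DataFrame(data=iris.data)

# The default labels are integers, not strings
print(df.columns.tolist())

# Select the first feature by its integer label
first_feature = df[0]
print(first_feature.head())
```

Selecting df["0"] with a string would raise a KeyError here, which is one reason named columns are usually easier to work with.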

Next, you’ll learn about the column names.

Converting Sklearn Datasets To Dataframe Using Feature Names As Columns

Sklearn provides the names of the features in the attribute feature_names.

You can pass this attribute to the pd.DataFrame() constructor to create the dataframe with column headers.
If the dataset is a classification dataset, sklearn also provides the target variable for the samples in the attribute target. You can use it to fetch the target values and append them to your dataframe.

Code

import pandas as pd

from sklearn import datasets

iris = datasets.load_iris()

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

df["target"] = iris.target

df.head()

Dataframe Will Look Like

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
0	5.1	3.5	1.4	0.2	0
1	4.9	3.0	1.4	0.2	0
2	4.7	3.2	1.3	0.2	0
3	4.6	3.1	1.5	0.2	0
4	5.0	3.6	1.4	0.2	0

This is how you can convert the sklearn dataset to pandas dataframe with column headers by using the sklearn datasets’ feature_names attribute.
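
As an alternative, the bundled loaders accept an as_frame parameter that returns pandas objects directly. The sketch below assumes scikit-learn 0.23 or later, where this parameter is available for load_iris().

```python
from sklearn import datasets

# as_frame=True asks scikit-learn to build the pandas objects itself
iris = datasets.load_iris(as_frame=True)

# `frame` holds the feature columns plus a "target" column
df = iris.frame
print(df.head())
```

This avoids the manual pd.DataFrame() call, though building the dataframe yourself gives you more control over the column names.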

Later, if you want to rename the features, you can also rename the dataframe columns.
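
A minimal sketch of such a rename, using the dataframe’s rename() method. The new names in the dictionary are hypothetical and chosen only for illustration.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Hypothetical new names, chosen only for illustration
new_names = {
    "sepal length (cm)": "sepal_length",
    "petal length (cm)": "petal_length",
}

# rename() returns a new dataframe; columns not listed keep their names
df = df.rename(columns=new_names)
print(df.columns.tolist())
```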

Using Custom Column Headers

In some cases, you may need to use custom headers as columns rather than the sklearn dataset’s feature_names attribute.

You can do this by passing a list of column headers to the columns parameter of the pd.DataFrame() constructor.

In the following example, you’ll use column headers that contain only the feature names and leave out the unit of the data (cm). Here, the unit doesn’t make a big difference.

Code

import pandas as pd

from sklearn import datasets

# Load the IRIS dataset
iris = datasets.load_iris()

df = pd.DataFrame(data=iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])

df["target"] = iris.target

df.head()

Dataframe Will Look Like

	sepal_length	sepal_width	petal_length	petal_width	target
0	5.1	3.5	1.4	0.2	0
1	4.9	3.0	1.4	0.2	0
2	4.7	3.2	1.3	0.2	0
3	4.6	3.1	1.5	0.2	0
4	5.0	3.6	1.4	0.2	0

Converting Only Specific Columns from Sklearn Dataset

In some scenarios, you may not need all the columns in the sklearn datasets to be available in the pandas dataframe.

In that case, you need to create a pandas dataframe with specific columns from the sklearn datasets.

There is no method directly available to do this, because the sklearn dataset loaders return a Bunch object. Its data attribute is a NumPy array without column labels, so you cannot select a column from it by name.
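
A quick look at the object returned by load_iris() may make this clearer. Strictly speaking, the underlying array can be sliced by position, but it carries no column labels, which is why going through a dataframe is more convenient. The variable name sepal_length below is only an illustration.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The loader returns a Bunch, a dictionary-like container
print(type(iris).__name__)

# The features are stored as a plain NumPy array
print(type(iris.data).__name__)
print(iris.data.shape)

# Without labels, a column can only be picked by its position
sepal_length = iris.data[:, 0]
print(sepal_length[:5])
```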

First, convert the entire dataset to a dataframe.
Then, either drop the unnecessary columns or select only the columns you need and create another dataframe from them.

Code

import pandas as pd

from sklearn import datasets

iris = datasets.load_iris()

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

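# Select by the actual column labels; copy() makes the result an independent dataframe
df = df[["sepal length (cm)", "petal length (cm)"]].copy()

df["target"] = iris.target

df.head()

Dataframe Will Look Like

	sepal length (cm)	petal length (cm)	target
0	5.1	1.4	0
1	4.9	1.4	0
2	4.7	1.3	0
3	4.6	1.5	0
4	5.0	1.4	0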

This is how you can convert only specific columns from the sklearn datasets to pandas dataframe.

Display Names of Target Instead Of Numbers

To display the names of the target instead of the numbers in the target column, you can use the pandas map function.

Having names in the column makes the dataset more descriptive and easier to understand when you visualise it.

To map the target numbers to their names after creating a dataframe:

Create a dictionary that maps each target number to its name.
Apply the map() function with the dictionary on the target column.

You’ll then see the names of the target instead of numbers.

import pandas as pd

from sklearn import datasets

iris = datasets.load_iris()

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

df["target"] = iris.target

target_names = {0: "Iris-Setosa", 1: "Iris-Versicolour", 2: "Iris-Virginica"}

df["target"] = df["target"].map(target_names)

df.head()

Dataframe Will Look Like

The target column in the dataframe will have the actual name of the target instead of the numbers.

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target
0	5.1	3.5	1.4	0.2	Iris-Setosa
1	4.9	3.0	1.4	0.2	Iris-Setosa
2	4.7	3.2	1.3	0.2	Iris-Setosa
3	4.6	3.1	1.5	0.2	Iris-Setosa
4	5.0	3.6	1.4	0.2	Iris-Setosa
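
Rather than typing the dictionary by hand, the mapping could also be built from the dataset’s own target_names attribute. This is a sketch of that variation. Note that the names sklearn ships are "setosa", "versicolor" and "virginica", so the labels differ slightly from the hand-written ones above.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Build the number-to-name mapping from the dataset itself
target_names = {i: str(name) for i, name in enumerate(iris.target_names)}

df["target"] = df["target"].map(target_names)
print(df["target"].unique())
```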

Conclusion

To summarize, you’ve learned how to convert a sklearn dataset to a pandas dataframe. The same approach applies to the other datasets you may use, such as:

Boston house prices dataset (removed in scikit-learn 1.2)
Iris plants dataset
Diabetes dataset
Linnerrud dataset
Wine recognition dataset
Breast cancer dataset
The Olivetti faces dataset
California Housing dataset
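
To reuse the same steps across datasets, they could be wrapped in a small function. The helper bunch_to_frame below is hypothetical and not part of sklearn or pandas. It assumes the dataset exposes feature_names and a one-dimensional target, so datasets with several target columns or without feature names would need adjusting.

```python
import pandas as pd
from sklearn import datasets


def bunch_to_frame(bunch):
    """Hypothetical helper: build a dataframe from a loaded sklearn dataset."""
    # Assumes `feature_names` exists and `target` is one-dimensional
    df = pd.DataFrame(data=bunch.data, columns=bunch.feature_names)
    df["target"] = bunch.target
    return df


wine_df = bunch_to_frame(datasets.load_wine())
diabetes_df = bunch_to_frame(datasets.load_diabetes())

print(wine_df.shape)      # (178, 14)
print(diabetes_df.shape)  # (442, 11)
```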

If you’ve any questions, comment below.
