How To Split A Column Of Lists Into Multiple Columns in Pandas DataFrame – Definitive Guide

A column in pandas might contain a list of values.

You can split a column of lists into multiple columns in a pandas dataframe using the statement pd.DataFrame(df.Values.tolist(), index=df.index), where Values is the name of the column that holds the lists.

This tutorial teaches you the different methods to split a column of lists into multiple columns and when it is appropriate to use them.

Sample Dataframe

Create the following dataframe.

  • One column with a string value
  • Another column with a list of values (all the lists are the same size)

Code

import pandas as pd

df = pd.DataFrame({
    'Name' : ['A', 'B', 'C'],
    'Values': [[10,20,30], [40,50,60], [70,80,90]]
})

df

DataFrame Will Look Like

  Name        Values
0    A  [10, 20, 30]
1    B  [40, 50, 60]
2    C  [70, 80, 90]
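Since the methods below assume a known number of elements per list, it can help to confirm that all lists have the same size first. The following is a minimal sketch of one way to check; the variable names lengths and same_size are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Values': [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
})

# Series.str.len() also works on list elements and returns each list's length
lengths = df['Values'].str.len()

# True when every list has the same number of elements
same_size = lengths.nunique() == 1

print(lengths.tolist(), same_size)
```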

Using to_list()

The to_list() method returns a list of values from a pandas series.

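This is generally the fastest method to split a column of lists into multiple columns, because it builds the new dataframe in one step instead of creating a Series for every row.

To split a column of lists into multiple columns,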

  • Create a new dataframe with the list of values using df.Values.to_list()
  • Use the same index as the existing dataframe using index= df.index
  • Assign the resultant dataframe to the existing dataframe by specifying the column names for the new columns

Code

The following code demonstrates how to split the column Values and assign it to the new columns in the dataframe with the column names Value_1, Value_2, Value_3.

df[['Value_1','Value_2', 'Value_3']] = pd.DataFrame(df.Values.to_list(), index= df.index)

df

DataFrame Will Look Like

  Name        Values  Value_1  Value_2  Value_3
0    A  [10, 20, 30]       10       20       30
1    B  [40, 50, 60]       40       50       60
2    C  [70, 80, 90]       70       80       90
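If you would rather not type the new column names by hand, they can be derived from the width of the expanded dataframe. The following is a sketch under that assumption; the Value_ prefix follows the naming used above, and expanded and result are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Values': [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
})

# Expand the lists; the new columns are labelled 0, 1, 2 by default
expanded = pd.DataFrame(df['Values'].to_list(), index=df.index)

# Build one name per expanded column: Value_1, Value_2, ...
expanded.columns = [f'Value_{i + 1}' for i in range(expanded.shape[1])]

# join() aligns on the index and returns a new dataframe; df itself is unchanged
result = df.join(expanded)

print(result)
```

Because join() returns a new dataframe, this variant leaves the original df untouched, which may be preferable when the source data is reused later.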

Drop the List Column

After splitting the list column into multiple columns, you can drop the list column using the following code.

df.drop('Values', axis=1, inplace=True)

df

DataFrame Will Look Like

  Name  Value_1  Value_2  Value_3
0    A       10       20       30
1    B       40       50       60
2    C       70       80       90
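As an alternative to splitting first and dropping afterwards, the list column can be removed while it is being expanded. This sketch uses DataFrame.pop(), which removes a column and returns it as a Series; it is one possible variation, not the only way to combine the two steps.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Values': [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
})

# pop() removes 'Values' from df and returns it as a Series
values = df.pop('Values')

# Expand the popped lists and assign them to the new columns
df[['Value_1', 'Value_2', 'Value_3']] = pd.DataFrame(values.to_list(), index=df.index)

print(df)
```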

Using Apply and Series

The apply() method applies a function to each value in a pandas column and returns the results.

To split a column of lists into multiple columns (recreate the sample dataframe first if you have already dropped the Values column),

  • Invoke the apply() method on the list column and pass the pd.Series constructor. This calls pd.Series on each list, so each list becomes one row of a new dataframe.
  • Assign the result to new columns in the existing dataframe.

Code

df[['Value_1','Value_2', 'Value_3']] = df.Values.apply(pd.Series)

df

DataFrame Will Look Like

  Name        Values  Value_1  Value_2  Value_3
0    A  [10, 20, 30]       10       20       30
1    B  [40, 50, 60]       40       50       60
2    C  [70, 80, 90]       70       80       90
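Both methods produce the same values, so the choice mostly comes down to speed. The following is a rough sketch for comparing them on your own machine; the 2,000-row dataframe and the helper names with_to_list and with_apply are made up for illustration, and the timings will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical larger frame: 2,000 rows, each holding a three-element list
df = pd.DataFrame({'Values': [[i, i + 1, i + 2] for i in range(2_000)]})


def with_to_list():
    # Build the new dataframe from a plain list of lists
    return pd.DataFrame(df['Values'].to_list(), index=df.index)


def with_apply():
    # Build one Series per row and combine them
    return df['Values'].apply(pd.Series)


# Confirm that both approaches yield the same values
same = bool((with_to_list().to_numpy() == with_apply().to_numpy()).all())

# Time three runs of each approach
t_to_list = timeit.timeit(with_to_list, number=3)
t_apply = timeit.timeit(with_apply, number=3)

print(f'same values: {same}')
print(f'to_list: {t_to_list:.3f}s  apply(pd.Series): {t_apply:.3f}s')
```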

Handling List with Different Sizes

Sometimes, the lists in the column might be of different sizes.

In this case, the pd.DataFrame constructor pads the shorter lists with NaN to denote the missing values. No extra code is required to handle lists of different sizes, but the number of new column names must match the length of the longest list.

Code

In the sample dataframe,

  • two lists contain two elements
  • one list contains three elements

import pandas as pd

df = pd.DataFrame({
    'Name' : ['A', 'B', 'C'],
    'Values': [[10,20], [40,50,60], [70,90]]
})

df

The split operation uses NaN to denote the missing values, and no error is raised. Because NaN is a floating-point value, the Value_3 column is stored as floats (60.0 instead of 60).

Code

df[['Value_1','Value_2', 'Value_3']] = pd.DataFrame(df.Values.tolist(), index= df.index)

df

DataFrame Will Look Like

  Name        Values  Value_1  Value_2  Value_3
0    A      [10, 20]       10       20      NaN
1    B  [40, 50, 60]       40       50     60.0
2    C      [70, 90]       70       90      NaN
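If you want the padded column to keep whole numbers instead of floats, one option is the pandas nullable integer dtype Int64. The following is a minimal sketch, assuming every value in the lists is an integer; it also derives the column names from the expanded width so they always match the longest list.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Values': [[10, 20], [40, 50, 60], [70, 90]]
})

# Shorter lists are padded with NaN, which makes the last column float64
expanded = pd.DataFrame(df['Values'].tolist(), index=df.index)

# The nullable Int64 dtype stores whole numbers and marks the gaps as <NA>
expanded = expanded.astype('Int64')

# One name per expanded column, so the count always matches the longest list
expanded.columns = [f'Value_{i + 1}' for i in range(expanded.shape[1])]

result = df.join(expanded)

print(result)
```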
