A column in a pandas dataframe might contain a list of values in each row.
You can split a column of lists into multiple columns using the statement pd.DataFrame(df.Values.tolist(), index=df.index).
This tutorial teaches you the different methods to split a column of lists into multiple columns and when it is appropriate to use them.
Sample Dataframe
Create the following dataframe.
- One column with a string value
- Another column with a list of values (all the lists are the same size)
Code
import pandas as pd
df = pd.DataFrame({
'Name' : ['A', 'B', 'C'],
'Values': [[10,20,30], [40,50,60], [70,80,90]]
})
df
DataFrame Will Look Like
|   | Name | Values |
|---|---|---|
| 0 | A | [10, 20, 30] |
| 1 | B | [40, 50, 60] |
| 2 | C | [70, 80, 90] |
Using to_list()
The to_list() method returns the values of a pandas Series as a Python list. tolist() is an equivalent spelling and is also used in this tutorial.
This is typically the fastest of the methods shown here, because all the rows are converted in a single DataFrame constructor call.
To split a column of lists into multiple columns,
- Create a new dataframe from the list of values using df.Values.to_list()
- Use the same index as the existing dataframe using index=df.index
- Assign the resulting dataframe to the existing dataframe, specifying the names of the new columns
Code
The following code demonstrates how to split the column Values and assign it to the new columns in the dataframe with the column names Value_1, Value_2, Value_3.
df[['Value_1','Value_2', 'Value_3']] = pd.DataFrame(df.Values.to_list(), index= df.index)
df
DataFrame Will Look Like
|   | Name | Values | Value_1 | Value_2 | Value_3 |
|---|---|---|---|---|---|
| 0 | A | [10, 20, 30] | 10 | 20 | 30 |
| 1 | B | [40, 50, 60] | 40 | 50 | 60 |
| 2 | C | [70, 80, 90] | 70 | 80 | 90 |
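The reason for passing index=df.index is easier to see on a dataframe whose index is not the default 0, 1, 2. The following is a minimal sketch, assuming a hypothetical index of 5, 6, 7 (as you might have after filtering rows). It uses join() rather than column assignment so that the index matching is explicit.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (for example, after filtering rows).
df = pd.DataFrame(
    {"Name": ["A", "B", "C"],
     "Values": [[10, 20, 30], [40, 50, 60], [70, 80, 90]]},
    index=[5, 6, 7],
)
cols = ["Value_1", "Value_2", "Value_3"]

# Without index=..., the new frame gets a fresh 0..2 index that does not match df.
unaligned = pd.DataFrame(df["Values"].tolist(), columns=cols)
# With index=df.index, each expanded row keeps the label of its source row.
aligned = pd.DataFrame(df["Values"].tolist(), index=df.index, columns=cols)

# join() matches rows by index label, so only the aligned frame lines up.
print(df.join(unaligned))  # Value_* columns are all NaN
print(df.join(aligned))    # Value_* columns hold the list elements
```

With the default index used in the sample dataframe, both versions would give the same result, so the argument mainly protects you when the index has been changed.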
Drop the List Column
After splitting the list column into multiple columns, you can drop the list column using the following code.
df.drop('Values', axis=1, inplace=True)
df
DataFrame Will Look Like
|   | Name | Value_1 | Value_2 | Value_3 |
|---|---|---|---|---|
| 0 | A | 10 | 20 | 30 |
| 1 | B | 40 | 50 | 60 |
| 2 | C | 70 | 80 | 90 |
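If you would rather keep the original dataframe unchanged, one possible variation is to build the result as a new dataframe instead of using inplace=True. The sketch below is an alternative to the code above, not a replacement for it; the names expanded and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Values": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
})

# Expand the lists into their own frame, naming the columns up front.
expanded = pd.DataFrame(
    df["Values"].tolist(),
    index=df.index,
    columns=["Value_1", "Value_2", "Value_3"],
)

# drop(columns=...) returns a new frame; df itself still has the Values column.
result = df.drop(columns="Values").join(expanded)
print(result)
```

This leaves df available for the later examples that still need the Values column.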
Using Apply and Series
The apply() method applies a function to each value in a pandas column and returns the results.
To split a column of lists into multiple columns,
- Invoke the apply() method on the list column and pass pd.Series as the function. This converts each list into a pandas Series, and the results are combined into a dataframe with one column per list element.
- Assign the result to new columns in the existing dataframe.
This example uses the original sample dataframe, so recreate it first if you have already dropped the Values column.
Code
df[['Value_1','Value_2', 'Value_3']] = df.Values.apply(pd.Series)
df
DataFrame Will Look Like
|   | Name | Values | Value_1 | Value_2 | Value_3 |
|---|---|---|---|---|---|
| 0 | A | [10, 20, 30] | 10 | 20 | 30 |
| 1 | B | [40, 50, 60] | 40 | 50 | 60 |
| 2 | C | [70, 80, 90] | 70 | 80 | 90 |
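To see how the two methods compare on your own machine, you can time them with the standard timeit module. This is a rough sketch, assuming a hypothetical dataframe of 500 rows; the helper names split_with_tolist and split_with_apply are made up for this example, and the timings will vary with the pandas version and hardware.

```python
import timeit

import pandas as pd

# Hypothetical larger frame: 500 rows, each holding a three-element list.
big = pd.DataFrame({"Values": [[i, i + 1, i + 2] for i in range(500)]})

def split_with_tolist():
    # One DataFrame constructor call over the whole list of lists.
    return pd.DataFrame(big["Values"].tolist(), index=big.index)

def split_with_apply():
    # Builds a separate Series for every row, then combines them.
    return big["Values"].apply(pd.Series)

# Check that both approaches yield the same values before comparing speed.
same_values = (
    split_with_tolist().values.tolist() == split_with_apply().values.tolist()
)
print("same values:", same_values)

# Total seconds for three runs of each; absolute numbers depend on the machine.
print("tolist:", timeit.timeit(split_with_tolist, number=3))
print("apply :", timeit.timeit(split_with_apply, number=3))
```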
Handling List with Different Sizes
Sometimes, the lists in the column might be of different sizes.
In this case, the pd.DataFrame constructor fills the missing positions in the shorter lists with NaN. No explicit code is required to handle lists of different sizes.
Code
In the sample dataframe,
- two lists contain two elements
- one list contains three elements
import pandas as pd
df = pd.DataFrame({
'Name' : ['A', 'B', 'C'],
'Values': [[10,20], [40,50,60], [70,90]]
})
df
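The split operation uses NaN to denote the missing values, and no error is thrown. The number of new column names must match the length of the longest list (three here). A column that contains NaN is stored as float, which is why 60 appears as 60.0 in the output.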
Code
df[['Value_1','Value_2', 'Value_3']] = pd.DataFrame(df.Values.tolist(), index= df.index)
df
DataFrame Will Look Like
|   | Name | Values | Value_1 | Value_2 | Value_3 |
|---|---|---|---|---|---|
| 0 | A | [10, 20] | 10 | 20 | NaN |
| 1 | B | [40, 50, 60] | 40 | 50 | 60.0 |
| 2 | C | [70, 90] | 70 | 90 | NaN |
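When the longest list length is not known in advance, the column names can be generated from the expanded dataframe instead of being typed by hand. The following is a minimal sketch under that assumption; the Value_N naming pattern follows the examples above, and the conversion to the nullable Int64 type is an optional extra that is not part of the original method.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Values": [[10, 20], [40, 50, 60], [70, 90]],
})

expanded = pd.DataFrame(df["Values"].tolist(), index=df.index)
# One name per column; the column count equals the length of the longest list.
expanded.columns = [f"Value_{i + 1}" for i in range(expanded.shape[1])]
# Optional: nullable integers keep 60 as an integer and mark gaps as <NA>.
expanded = expanded.astype("Int64")

print(df.join(expanded))
```

The Int64 conversion only works here because every value is a whole number; skip it if the lists can contain fractional values.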