Pandas allow you to store values as rows and columns.
You can split the dataframe string columns into multiple columns using df[‘Col Name’].str.split(expand=True) statement.
Basic Example
df['Name'].str.split(expand=True)
- By default, the string will be split using the space.
Dataframe Will Look Like
0 | 1 | |
---|---|---|
0 | Shivam.Pandey.A | None |
1 | Kumar.Ram | None |
2 | Felix | John |
This tutorial teaches you how to split the dataframe String columns into multiple columns using different methods.
Sample Dataframe
The following sample dataframe is used to demonstrate the split operation.
It contains a string column, and the string column contains:
- a string with three words separated using
.
- a string with two words separated using
.
- a string with two words separated using
space
.
Code
import pandas as pd
# List of Tuples
users = [ ('Shivam.Pandey.A', 'India', 1),
('Kumar.Ram' , 'US', 2 ),
('Felix John' , 'Germany', 3 ),
]
#Create a DataFrame object
df = pd.DataFrame( users,
columns = ['Name', 'Country', 'Country Code']
)
df
Dataframe Will Look Like
Name | Country | Country Code | |
---|---|---|---|
0 | Shivam.Pandey.A | India | 1 |
1 | Kumar.Ram | US | 2 |
2 | Felix John | Germany | 3 |
Using Split
The series String split() method splits a single column into multiple columns.
It accepts a few parameters to customise the split operation.
pat
–Optional
. String or compiled regex to split. If no pattern is specified, the string is split based on whitespace by default.expand
– To expand the split strings into a new column.True
– expands into a new column.False
returns a series containing a list of strings. Default isFalse
regex
– Determines if thepat
is a regular expression or a literal string
Code
The following code demonstrates
- how the
Name
column is split into multiple columns using thesplit()
method. - The pattern is
None
; hence split will be done based on white space.
df['Name'].str.split(pat=None, expand=True)
Dataframe Will Look Like
0 | 1 | |
---|---|---|
0 | Shivam.Pandey.A | None |
1 | Kumar.Ram | None |
2 | Felix | John |
To set the column headers after splitting, you can refer to How to Add Header To Pandas Dataframe
Specify the Number Of Splits
This section teaches you how to specify the number of splits using the split()
method.
Specifying the number of splits will be helpful when the columns contain delimiters more than once.
For example, a string may contain more than one .
in it.
- To specify the number of splits, use the
n=1
parameter where1
sets the count. - n=
None
or0
or-1
will return all the splits.
Code
The following code specifies how to limit the number of splits using the n
parameter.
df['Name'].str.split('.', n=1, expand=True)
Since n=1
, there will be only one split though the column contains more than one delimiter.
Dataframe Will Look Like
0 | 1 | |
---|---|---|
0 | Shivam | Pandey.A |
1 | Kumar | Ram |
2 | Felix John | None |
Split One Column Into Multiple Columns With Delimiter
To split one column into multiple columns with the delimiter, specify the delimiter to the split()
method directly.
- The column will be split into multiple columns based on the maximum number of delimiters available in that column.
- If a string contains less delimiter, then the
None
value will be used to fill the columns.
Code
The following code demonstrates how to use the .
delimiter to split one column into multiple columns.
df['Name'].str.split('.', expand=True)
Dataframe Will Look Like
0 | 1 | 2 | |
---|---|---|---|
0 | Shivam | Pandey | A |
1 | Kumar | Ram | None |
2 | Felix John | None | None |
Split Columns Based on Conditions
To split columns based on conditions, use any methods that return True
or False
.
For example, the string contains()
method returns:
True
when the specified value is present in the StringFalse
when the value is not present.
To learn the possible String methods check out the String methods.
Split based on Condition
- First, create a mask based on the result of the
contains()
method. - While invoking the
split()
method, use where() to find the rows that match theTrue
mask.
Code
The following code demonstrates how to split the string that contains .
in them.
msk = df['Name'].str.contains('.')
df[['First Name','Last Name']] = df['Name'].str.split('.', 1, expand=True).where(msk, '')
df
Dataframe Will Look Like
Name | Country | Country Code | First Name | Last Name | |
---|---|---|---|---|---|
0 | Shivam.Pandey.A | India | 1 | Shivam | Pandey.A |
1 | Kumar.Ram | US | 2 | Kumar | Ram |
2 | Felix John | Germany | 3 | Felix John | None |
This is how you can split string columns into multiple columns using the split()
method and the masks.