How To Split the Dataframe String Columns into Multiple Columns – With Examples

Pandas allow you to store values as rows and columns.

You can split the dataframe string columns into multiple columns using df[‘Col Name’].str.split(expand=True) statement.

Basic Example

df['Name'].str.split(expand=True)
  • By default, the string will be split using the space.

Dataframe Will Look Like

01
0Shivam.Pandey.ANone
1Kumar.RamNone
2FelixJohn

This tutorial teaches you how to split the dataframe String columns into multiple columns using different methods.

Sample Dataframe

The following sample dataframe is used to demonstrate the split operation.

It contains a string column, and the string column contains:

  • a string with three words separated using .
  • a string with two words separated using .
  • a string with two words separated using space.

Code

import pandas as pd 

# List of Tuples
users = [ ('Shivam.Pandey.A', 'India', 1),
             ('Kumar.Ram' , 'US', 2 ),
         ('Felix John' , 'Germany', 3 ),
              ]

#Create a DataFrame object
df = pd.DataFrame(  users, 
                    columns = ['Name', 'Country', 'Country Code']
                    ) 

df

Dataframe Will Look Like

NameCountryCountry Code
0Shivam.Pandey.AIndia1
1Kumar.RamUS2
2Felix JohnGermany3

Using Split

The series String split() method splits a single column into multiple columns.

It accepts a few parameters to customise the split operation.

  • patOptional. String or compiled regex to split. If no pattern is specified, the string is split based on whitespace by default.
  • expand – To expand the split strings into a new column. True – expands into a new column. False returns a series containing a list of strings. Default is False
  • regex – Determines if the pat is a regular expression or a literal string

Code

The following code demonstrates

  • how the Name column is split into multiple columns using the split() method.
  • The pattern is None; hence split will be done based on white space.
df['Name'].str.split(pat=None, expand=True)

Dataframe Will Look Like

01
0Shivam.Pandey.ANone
1Kumar.RamNone
2FelixJohn

To set the column headers after splitting, you can refer to How to Add Header To Pandas Dataframe

Specify the Number Of Splits

This section teaches you how to specify the number of splits using the split() method.

Specifying the number of splits will be helpful when the columns contain delimiters more than once.

For example, a string may contain more than one . in it.

  • To specify the number of splits, use the n=1 parameter where 1 sets the count.
  • n= None or 0 or -1 will return all the splits.

Code

The following code specifies how to limit the number of splits using the n parameter.

df['Name'].str.split('.', n=1, expand=True)

Since n=1, there will be only one split though the column contains more than one delimiter.

Dataframe Will Look Like

01
0ShivamPandey.A
1KumarRam
2Felix JohnNone

Split One Column Into Multiple Columns With Delimiter

To split one column into multiple columns with the delimiter, specify the delimiter to the split() method directly.

  • The column will be split into multiple columns based on the maximum number of delimiters available in that column.
  • If a string contains less delimiter, then the None value will be used to fill the columns.

Code

The following code demonstrates how to use the . delimiter to split one column into multiple columns.

df['Name'].str.split('.', expand=True)

Dataframe Will Look Like

012
0ShivamPandeyA
1KumarRamNone
2Felix JohnNoneNone

Split Columns Based on Conditions

To split columns based on conditions, use any methods that return True or False.

For example, the string contains() method returns:

  • True when the specified value is present in the String
  • False when the value is not present.

To learn the possible String methods check out the String methods.

Split based on Condition

  • First, create a mask based on the result of the contains() method.
  • While invoking the split() method, use where() to find the rows that match the True mask.

Code

The following code demonstrates how to split the string that contains . in them.

msk = df['Name'].str.contains('.')

df[['First Name','Last Name']] = df['Name'].str.split('.', 1, expand=True).where(msk, '')

df

Dataframe Will Look Like

NameCountryCountry CodeFirst NameLast Name
0Shivam.Pandey.AIndia1ShivamPandey.A
1Kumar.RamUS2KumarRam
2Felix JohnGermany3Felix JohnNone

This is how you can split string columns into multiple columns using the split() method and the masks.

Additional Resources

Leave a Comment