How To Split the Dataframe String Columns into Multiple Columns – With Examples

Pandas allow you to store values as rows and columns.

You can split the dataframe string columns into multiple columns using df[‘Col Name’].str.split(expand=True) statement.

This tutorial teaches you how to split the dataframe String columns into multiple columns using different methods.

If You’re in Hurry…

You can use the str.split() method as explained in the following code to split the dataframe string columns into multiple columns.

By default, the string will be split using the space.

Code

df['Name'].str.split(expand=True)

Dataframe Will Look Like

01
0Shivam.Pandey.ANone
1Kumar.RamNone
2FelixJohn

If You Want to Understand Details, Read on…

Different methods are available to split dataframe string columns into two or multiple columns. You can also use other delimiters to separate the string values. Let us learn each method in detail.

To combine the columns after splitting, you can refer to How To Combine Two Columns In Pandas

Sample Dataframe

The following sample dataframe is used to demonstrate the split operation.

It contains a string column, and the string column contains:

  • a string with three words separated using .
  • a string with two words separated using .
  • a string with two words separated using space.

Code

import pandas as pd 

# List of Tuples
users = [ ('Shivam.Pandey.A', 'India', 1),
             ('Kumar.Ram' , 'US', 2 ),
         ('Felix John' , 'Germany', 3 ),
              ]

#Create a DataFrame object
df = pd.DataFrame(  users, 
                    columns = ['Name', 'Country', 'Country Code']
                    ) 

df

Dataframe Will Look Like

NameCountryCountry Code
0Shivam.Pandey.AIndia1
1Kumar.RamUS2
2Felix JohnGermany3

Using Split

You can use the series String split() method to split a single column into multiple columns.

It accepts a few parameters to customise the split operation.

  • patOptional. String or compiled regex to split. If no pattern is specified, the string is split based on whitespace by default.
  • expand – To expand the split strings into a new column. True – expands into a new column. False returns a series containing a list of strings. Default is False
  • regex – Determines if the pat is a regular expression or a literal string

Code

The following code demonstrates how the Name column is split into multiple columns using the split() method.

The pattern is None; hence split will be done based on white space.

df['Name'].str.split(pat=None, expand=True)

Dataframe Will Look Like

01
0Shivam.Pandey.ANone
1Kumar.RamNone
2FelixJohn

This is how you can use the split method.

To set the column headers after splitting, you can refer to How to Add Header To Pandas Dataframe

Specify the Number Of Splits

This section teaches you how to specify the number of splits using the split() method.

Specifying the number of splits will be helpful when the columns contain delimiters more than once.

For example, a string may contain more than one . in it.

  • To specify the number of splits, use the n=1 parameter where 1 sets the count.
  • n= None or 0 or -1 will return all the splits.

Code

The following code specifies how to limit the number of splits using the n parameter.

df['Name'].str.split('.', n=1, expand=True)

Since n=1, there will be only one split though the column contains more than one delimiter.

Dataframe Will Look Like

01
0ShivamPandey.A
1KumarRam
2Felix JohnNone

Split One Column Into Multiple Columns With Delimiter

This section teaches you how to split one column into multiple columns with the delimiter.

You can specify the delimiter to the split method directly.

  • The column will be split into multiple columns based on the maximum number of delimiters available in that column.
  • If a string contains less delimiter, then the None value will be used to fill the columns.

Code

The following code demonstrates how to use the . delimiter to split one column into multiple columns.

df['Name'].str.split('.', expand=True)

Dataframe Will Look Like

012
0ShivamPandeyA
1KumarRamNone
2Felix JohnNoneNone

Split Columns Based on Conditions

This section teaches you how to split columns based on conditions.

You can use any methods that return True or False.

For example, the string contains() method returns

  • True when the specified value is present in the String
  • False when the value is not present.

To learn the possible String methods check out the String methods.

Split based on Condition

  • First, create a mask based on the result of the contains() method.
  • While invoking the split() method, use where() to find the rows that match the True mask.

Code

The following code demonstrates how to split the string that contains . in them.

msk = df['Name'].str.contains('.')

df[['First Name','Last Name']] = df['Name'].str.split('.', 1, expand=True).where(msk, '')

df

Dataframe Will Look Like

NameCountryCountry CodeFirst NameLast Name
0Shivam.Pandey.AIndia1ShivamPandey.A
1Kumar.RamUS2KumarRam
2Felix JohnGermany3Felix JohnNone

This is how you can split string columns into multiple columns using the split() method and the masks.

Conclusion

You’ve learned how to split the dataframe string columns into two or multiple columns using the split() method. You’ve also learned how to specify the number of splits and how to split columns based on conditions.

If you have any questions, please feel free to comment below.

You May Also Like

Leave a Comment