When working with data using Pandas, you may need to combine two columns in Pandas to create another column.
You can combine two columns in Pandas using df[“new column name“] = df[“column 1”] + df["column 2”]
statement.
In this tutorial, you’ll learn how to combine or concatenate two or more columns in Pandas dataframe to create another column.
If You’re in Hurry…
You can use the +
operator to concatenate two columns in the pandas dataframe.
This is the simplest method to join two columns in Pandas dataframe.
df[“New Column Name“] = df[“Column 1“] + " " + df[“Column 2“]
df
This will concatenate Column 1
and Column 2
and add the value to the new column
.
If You Want to Understand Details, Read on…
The dataset might contain a text value in the dataframe, or some cells might have missing values.
Different methods are available to concatenate two or more columns in the dataframe in different scenarios.
Let us learn each one.
Table of Contents
Sample Dataframe
For the demonstration, you’ll use the below dataframe. It has four columns.
First Name – String
Type column
Last Name – String
Type column
Country – String
Type column
Country Code – Number
Type column
You’ll use these columns to concatenate and create a new column.
Code
import pandas as pd
# List of Tuples
users = [ ('Shivam', 'Pandey', 'India', 1),
('Kumar', 'Ram' , 'US', 2 ),
('Felix','John' , 'Germany', 3 ),
]
#Create a DataFrame object
df = pd.DataFrame( users,
columns = ['First Name' , 'Last Name', 'Country', 'Country Code']
)
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | |
---|---|---|---|---|
0 | Shivam | Pandey | India | 1 |
1 | Kumar | Ram | US | 2 |
2 | Felix | John | Germany | 3 |
You’ll use the columns in the dataframe df
to learn different concatenation methods.
Using + Operator
You can use the plus operator to concatenate two Strings.
Similarly, you can pass the two dataframe columns with the plus operator to combine two Strings and form a new column.
You can assign the result value to the same dataframe using the new column name. This will add a new column to the dataframe with the combined string value.
Using the plus(+) operator is the simplest method of concatenation.
The following code demonstrates how the First Name Column and the Second Name column are combined to form a column called Full Name.
Also, a space is added as a separator between the two-column values.
Code
df["Full Name"] = df["First Name"] + " " + df["Last Name"]
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | Shivam Pandey |
1 | Kumar | Ram | US | 2 | Kumar Ram |
2 | Felix | John | Germany | 3 | Felix John |
This is how you can use the plus (+) operator to combine two columns of the Pandas dataframe.
Using Map() Method
You can use the map() function to combine two columns of the Pandas dataframe.
When to Use
You can use this method when you want to handle the Null or missing values in the Pandas Dataframe.
There are two actions available to handle the NaN
values.
None
– Default option. It uses None when a missing value is available in the column.’ignore’
– Used to ignore the concatenation operation when any values are missing in the dataframe.
The following code uses the map()
method to handle the Null
values and uses the Plus (+) operator to concatenate the values.
Code
df["Full Name"] = df["First Name"].map(str, na_action=None) + df["Last Name"].map(str, na_action='ignore')
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | ShivamPandey |
1 | Kumar | Ram | US | 2 | KumarRam |
2 | Felix | John | Germany | 3 | FelixJohn |
Now, you’ll add another row to the dataframe with None
values to learn how the map()
method handles these values during concatenation.
Code
df.loc[3] = ['Borris', None, 'UK', 4, None]
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | ShivamPandey |
1 | Kumar | Ram | US | 2 | KumarRam |
2 | Felix | John | Germany | 3 | FelixJohn |
3 | Borris | None | UK | 4 | None |
Now, you’ll use the map()
method and (+) operator to combine two string columns of the dataframe.
The na_action = None
parameter in the map()
method denotes that the Null
or missing values must be ignored during concatenation.
Code
df["Full Name"] = df["First Name"].map(str, na_action=None) + df["Last Name"].map(str, na_action='ignore')
df
Dataframe Will Look Like
The output dataframe will show that the rows with the Null
or missing values are not combined. Instead, the NaN
value is added to the column.
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | ShivamPandey |
1 | Kumar | Ram | US | 2 | KumarRam |
2 | Felix | John | Germany | 3 | FelixJohn |
3 | Borris | None | UK | 4 | NaN |
Now, drop the rows that are added to demonstrate the missing values.
Code
df.drop([3], axis=0, inplace=True)
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | ShivamPandey |
1 | Kumar | Ram | US | 2 | KumarRam |
2 | Felix | John | Germany | 3 | FelixJohn |
This is how you can use the map()
method to combine two String columns of the dataframe and handle null values or missing values differently.
Using Cat() method
You can use the cat method to concatenate two strings.
The cat()
method concatenates two strings in the series with the given separator.
The default separator is ‘’. So if you do not specify any separator, the Strings will be concatenated without any separator.
When to Use
You can use the cat()
method when you want to specify a representation character that needs to be used instead of Null
or missing values.
map() vs cat()
The difference between the map()
method and the cat()
method is that the map method doesn’t allow specifying the replacement character for the missing values. It only allows specifying if the missing values MUST be ignored or used.
Code
df['Full Name'] = df['First Name'].str.cat(df['Last Name'],sep=“,”, na_rep='----')
df
The missing value is represented with the ----
value and concatenated as shown in the fourth row.
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | Shivam,Pandey |
1 | Kumar | Ram | US | 2 | Kumar,Ram |
2 | Felix | John | Germany | 3 | Felix,John |
3 | Borris | None | UK | 4 | Borris,—- |
This is how you can use the cat()
method to concatenate two String columns in Pandas Dataframe.
Using Apply() Method
You can use the apply method to combine two columns of the Pandas Dataframe.
The apply()
method applies the defined function across the dataframe in the specified axis(Columns or Rows).
To concatenate two columns, you can select the two columns and apply the join() function.
Join method concatenates two strings with a separator. If you do not want to use any separator while combining two strings, you can use ’’.join()
.
The following code combines the First Name and Last Name columns and assigns the result to the Full Name column.
Code
df['Full Name'] = df[['First Name', 'Last Name']].apply(','.join , axis=1)
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | Shivam,Pandey |
1 | Kumar | Ram | US | 2 | Kumar,Ram |
2 | Felix | John | Germany | 3 | Felix,John |
This is how you can use the apply()
method and concatenate two string columns of the Pandas Dataframe.
Using agg() Method
You can also use the agg() method to combine two strings columns of the Pandas Dataframe.
This method is used to aggregate the columns or rows. There are No specific advantages of using this method over the other methods to combine two string columns.
Code
#concat two columns with space
df['Full Name'] = df[['First Name', 'Last Name']].agg(' '.join, axis=1)
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | |
---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | Shivam Pandey |
1 | Kumar | Ram | US | 2 | Kumar Ram |
2 | Felix | John | Germany | 3 | Felix John |
These are the different methods to combine two string columns of the Pandas dataframe and the appropriate use-cases.
Now let us see how these are applied in different scenarios.
Convert to Text and Combine Two Columns
In this section, you’ll learn how to convert a number column to a String before concatenating it with the String column.
The sample dataframe contains a number type column called Country Code. This column cannot be concatenated directly with another String column. Hence, you need to convert this number column to a string value before concatenating.
You can use the astype(str)
to convert a number to a string.
The following code demonstrates how to convert a column to text and combine two columns. The country and the country code columns are combined with the separator ‘-‘.
Code
df['Country & Code'] = df['Country'] +'-' + df['Country Code'].astype(str)
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | Country & Code | |
---|---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | Shivam Pandey | India-1 |
1 | Kumar | Ram | US | 2 | Kumar Ram | US-2 |
2 | Felix | John | Germany | 3 | Felix John | Germany-3 |
Concatenate Two Columns With a Separator
You can concatenate two columns of Pandas data frame with a separator using the cat()
method below.
Code
df['Full Name'] = df['First Name'].str.cat(df['Last Name'],sep=",")
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | Country & Code | |
---|---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | Shivam,Pandey | India-1 |
1 | Kumar | Ram | US | 2 | Kumar,Ram | US-2 |
2 | Felix | John | Germany | 3 | Felix,John | Germany-3 |
3 | Borris | None | UK | 4 | NaN | None |
Concatenate Two Columns If Not Null
This section will teach you how to concatenate two columns only if the columns are not null.
You can use the map()
function because it supports defining the action for the Null
or missing values using the na_action
parameter.
The following code shows how to combine two columns of the Pandas dataframe only if the values are not null.
Code
df.loc[3] = ['Borris', None, 'UK', 4, None, None]
df["Full Name"] = df["First Name"].map(str, na_action= 'ignore') + df["Last Name"].map(str, na_action='ignore')
df
Since the fourth row of the dataframe has Null
values, that specific row is not combined. It has the value of NaN
as shown in the output.
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | Country & Code | |
---|---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | ShivamPandey | India-1 |
1 | Kumar | Ram | US | 2 | KumarRam | US-2 |
2 | Felix | John | Germany | 3 | FelixJohn | Germany-3 |
3 | Borris | None | UK | 4 | NaN | None |
Now, drop the fourth row that contains the missing values.
df.drop([3], axis=0, inplace=True)
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | Country & Code | |
---|---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | ShivamPandey | India-1 |
1 | Kumar | Ram | US | 2 | KumarRam | US-2 |
2 | Felix | John | Germany | 3 | FelixJohn | Germany-3 |
Combine More Than Two Columns
In this section, you’ll learn how to combine more than two columns of the Pandas Dataframe.
You can use the agg()
method to combine more than two columns using the join()
method.
Code
#concat two columns with space
df['Full Name'] = df[['First Name', 'Last Name', 'Country']].agg(' '.join, axis=1)
df
Dataframe Will Look Like
First Name | Last Name | Country | Country Code | Full Name | Country & Code | |
---|---|---|---|---|---|---|
0 | Shivam | Pandey | India | 1 | Shivam Pandey India | India-1 |
1 | Kumar | Ram | US | 2 | Kumar Ram US | US-2 |
2 | Felix | John | Germany | 3 | Felix John Germany | Germany-3 |
Conclusion
To summarise, you’ve learned how to combine two columns in Pandas Dataframe.
Among the different methods available such as cat()
, map()
and the plus
operator, you’ve learned when it’s appropriate to use each technique.
If you have any questions, please comment below.