When working with data in Pandas, you may need to combine two columns to create a new column.
You can combine two columns in Pandas using the df["new column name"] = df["column 1"] + df["column 2"] statement.
Basic Example
df["New Column Name"] = df["Column 1"] + " " + df["Column 2"]
df
- This concatenates Column 1 and Column 2 with a space between them and stores the result in the new column.
In this tutorial, you’ll learn how to combine or concatenate two or more columns in Pandas dataframe to create another column.
Sample Dataframe
For the demonstration, you’ll use the dataframe below. It has four columns.
- First Name – String type column
- Last Name – String type column
- Country – String type column
- Country Code – Number (integer) type column
Use these columns to concatenate and create a new column.
Code
import pandas as pd
# List of tuples, one per user
users = [('Shivam', 'Pandey', 'India', 1),
         ('Kumar', 'Ram', 'US', 2),
         ('Felix', 'John', 'Germany', 3)]
# Create a DataFrame object
df = pd.DataFrame(users,
                  columns=['First Name', 'Last Name', 'Country', 'Country Code'])
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code |
|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 |
| 1 | Kumar | Ram | US | 2 |
| 2 | Felix | John | Germany | 3 |
You’ll use the columns in the dataframe df to learn different concatenation methods.
Using + Operator
When applied to two string columns, the plus operator concatenates the values row by row. This is the simplest method of concatenation.
Using the plus operator, you can:
- Combine two columns and form a new column
- Assign the result value to the same dataframe using the new column name.
- This will add a new column to the dataframe with the combined string value.
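A minimal, self-contained sketch of the plus-operator approach. It assumes a cut-down copy of the sample data with only the two name columns; the scalar separator " " is applied to every row.

```python
import pandas as pd

# Cut-down copy of the sample data: only the two name columns
df = pd.DataFrame({
    "First Name": ["Shivam", "Kumar", "Felix"],
    "Last Name": ["Pandey", "Ram", "John"],
})

# + works element-wise: row i of the result is First Name[i] + " " + Last Name[i]
df["Full Name"] = df["First Name"] + " " + df["Last Name"]
print(df["Full Name"].tolist())
```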
Code
The following code demonstrates how the First Name and Last Name columns are combined to form a column called Full Name.
df["Full Name"] = df["First Name"] + " " + df["Last Name"]
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name |
|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | Shivam Pandey |
| 1 | Kumar | Ram | US | 2 | Kumar Ram |
| 2 | Felix | John | Germany | 3 | Felix John |
This is how you can use the plus (+) operator to combine two columns of the Pandas dataframe.
Using map() Method
The map() function maps the values of the series according to an input function.
- Pass the str function to map() to convert each value to a string, then combine the resulting series with the + operator.
Use this method when you want to control how Null or missing values are handled during concatenation.
The na_action parameter offers two ways to handle NaN values:
- None – Default option. The function is also applied to missing values, so map(str) turns None into the string 'None'.
- 'ignore' – Missing values are not passed to the function and stay missing. Because concatenating a string with a missing value produces NaN, the combined value for that row is NaN.
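The two na_action settings can be compared on a small series. This is a sketch; the series and variable names are illustrative and not part of the sample dataframe.

```python
import pandas as pd

last_names = pd.Series(["Pandey", None])

# na_action=None (default): str() is also called on the missing value,
# so None becomes the literal string "None"
default_mapped = last_names.map(str)

# na_action='ignore': the missing value skips str() and stays missing
ignored_mapped = last_names.map(str, na_action="ignore")

print(default_mapped.tolist())
print(ignored_mapped.tolist())
```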
Code
The following code uses the map() method to handle Null values and the plus (+) operator to concatenate the values.
df["Full Name"] = df["First Name"].map(str, na_action=None) + df["Last Name"].map(str, na_action='ignore')
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name |
|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | ShivamPandey |
| 1 | Kumar | Ram | US | 2 | KumarRam |
| 2 | Felix | John | Germany | 3 | FelixJohn |
Add another row with None values to the dataframe to learn how the map() method handles these values during concatenation.
Code
df.loc[3] = ['Borris', None, 'UK', 4, None]
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name |
|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | ShivamPandey |
| 1 | Kumar | Ram | US | 2 | KumarRam |
| 2 | Felix | John | Germany | 3 | FelixJohn |
| 3 | Borris | None | UK | 4 | None |
Now, use the map() method and the + operator again to combine the two string columns of the dataframe.
The na_action='ignore' setting on the Last Name column means the missing value is not passed to str() and stays missing, so the combined value for that row is NaN.
Code
The following code demonstrates how to ignore missing values during concatenation.
df["Full Name"] = df["First Name"].map(str, na_action=None) + df["Last Name"].map(str, na_action='ignore')
df
Dataframe Will Look Like
The output dataframe shows that the row with the missing value is not combined. Instead, NaN is added to the Full Name column.
| | First Name | Last Name | Country | Country Code | Full Name |
|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | ShivamPandey |
| 1 | Kumar | Ram | US | 2 | KumarRam |
| 2 | Felix | John | Germany | 3 | FelixJohn |
| 3 | Borris | None | UK | 4 | NaN |
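To see both na_action settings side by side on the concatenated result, here is a minimal sketch with a two-row dataframe (the data and variable names are assumed for illustration). It relies on pandas returning NaN when a string is added to a missing value in an object column.

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Shivam", "Borris"], "Last Name": ["Pandey", None]})

# Default na_action: None is turned into the string "None" before concatenation
with_default = df["First Name"].map(str) + df["Last Name"].map(str)

# na_action='ignore': None stays missing, and string + missing gives NaN
with_ignore = df["First Name"].map(str) + df["Last Name"].map(str, na_action="ignore")

print(with_default.tolist())
print(with_ignore.tolist())
```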
Now, drop the row that was added to demonstrate the missing values.
df.drop([3], axis=0, inplace=True)
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name |
|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | ShivamPandey |
| 1 | Kumar | Ram | US | 2 | KumarRam |
| 2 | Felix | John | Germany | 3 | FelixJohn |
This is how you can use the map() method to combine two string columns of the dataframe and handle null or missing values differently.
Using cat() Method
You can use the str.cat() method to concatenate two string columns.
- The cat() method concatenates the strings in the series element-wise with the given separator.
- The default separator is an empty string (''). So if you do not specify a separator, the strings are concatenated without one.
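A quick sketch of the separator behaviour, assuming two small illustrative series:

```python
import pandas as pd

first = pd.Series(["Shivam", "Kumar"])
last = pd.Series(["Pandey", "Ram"])

# No sep given: the default separator is the empty string
no_sep = first.str.cat(last)

# An explicit separator is inserted between the two values of each row
with_space = first.str.cat(last, sep=" ")

print(no_sep.tolist(), with_space.tolist())
```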
Use the cat() method when you want to specify a replacement string (through the na_rep parameter) to be used instead of Null or missing values.
map() vs cat()
The difference between the map() method and the cat() method is that map() doesn’t let you specify a replacement string for missing values. Through na_action, it only lets you choose whether missing values are passed to the function or left as missing.
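The effect of na_rep can be sketched as follows, assuming a two-row example where one last name is missing. Without na_rep, a row that has a missing value in any input produces a missing result.

```python
import pandas as pd

first = pd.Series(["Shivam", "Borris"])
last = pd.Series(["Pandey", None])

# Without na_rep, the row with the missing last name becomes NaN
without_rep = first.str.cat(last, sep=",")

# With na_rep, the missing value is replaced by "----" before joining
with_rep = first.str.cat(last, sep=",", na_rep="----")

print(without_rep.tolist())
print(with_rep.tolist())
```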
Code
df.loc[3] = ['Borris', None, 'UK', 4, None]  # re-add the row with a missing last name
df['Full Name'] = df['First Name'].str.cat(df['Last Name'], sep=",", na_rep='----')
df
Dataframe Will Look Like
The missing value is represented by ---- and concatenated, as shown in the fourth row.
| | First Name | Last Name | Country | Country Code | Full Name |
|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | Shivam,Pandey |
| 1 | Kumar | Ram | US | 2 | Kumar,Ram |
| 2 | Felix | John | Germany | 3 | Felix,John |
| 3 | Borris | None | UK | 4 | Borris,---- |
Now, drop the row that was added to demonstrate the missing value.
df.drop([3], axis=0, inplace=True)
This is how you can use the cat() method to concatenate two string columns in a Pandas dataframe.
Using apply() Method
You can use the apply method to combine two columns of the Pandas Dataframe.
The apply() method applies a function along the specified axis of the dataframe (axis=0 for columns, axis=1 for rows).
- To concatenate two columns, select the two columns and apply the str.join() method to each row using axis=1.
- The join() method concatenates the strings it receives, inserting the separator between them. If you do not want a separator, use ''.join.
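One caveat worth checking: str.join() accepts only strings, so a row containing None raises a TypeError. A minimal sketch, assuming an illustrative two-row dataframe, that fills missing values with an empty string before joining (fillna('') is one option among several):

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Shivam", "Kumar"], "Last Name": ["Pandey", None]})

# str.join needs every item to be a string; the None in row 1 raises TypeError
try:
    df[["First Name", "Last Name"]].apply(",".join, axis=1)
    join_failed = False
except TypeError:
    join_failed = True

# Filling missing values first makes the row-wise join safe
full_name = df[["First Name", "Last Name"]].fillna("").apply(",".join, axis=1)
print(join_failed, full_name.tolist())
```

Whether an empty string is the right substitute depends on the data; dropping the incomplete rows is another option.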
Code
The following code combines the First Name and Last Name columns and assigns the result to the Full Name column.
df['Full Name'] = df[['First Name', 'Last Name']].apply(','.join , axis=1)
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name |
|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | Shivam,Pandey |
| 1 | Kumar | Ram | US | 2 | Kumar,Ram |
| 2 | Felix | John | Germany | 3 | Felix,John |
This is how you can use the apply() method to concatenate two string columns of the Pandas dataframe.
Convert to Text and Combine Two Columns
In this section, you’ll learn how to convert a number column to a String before concatenating it with the String column.
The sample dataframe contains a number-type column called Country Code. This column cannot be concatenated directly with another String column.
- Convert the number column to string using astype(str)
- Concatenate with the desired columns
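A minimal sketch of why the conversion is needed, assuming a two-row copy of the country columns: the direct concatenation raises a TypeError, while astype(str) works.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["India", "US"], "Country Code": [1, 2]})

# Adding a string column to an integer column raises TypeError
try:
    df["Country"] + "-" + df["Country Code"]
    mixed_failed = False
except TypeError:
    mixed_failed = True

# Converting the numbers to strings first makes the concatenation work
df["Country & Code"] = df["Country"] + "-" + df["Country Code"].astype(str)
print(mixed_failed, df["Country & Code"].tolist())
```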
Code
The following code demonstrates how to convert a column to text and combine two columns. The Country and Country Code columns are combined with the separator '-'.
df['Country & Code'] = df['Country'] + '-' + df['Country Code'].astype(str)
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name | Country & Code |
|---|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | Shivam,Pandey | India-1 |
| 1 | Kumar | Ram | US | 2 | Kumar,Ram | US-2 |
| 2 | Felix | John | Germany | 3 | Felix,John | Germany-3 |
Concatenate Two Columns With a Separator
You can concatenate two columns of a Pandas dataframe with a separator using the cat() method, as in the following code.
Code
df['Full Name'] = df['First Name'].str.cat(df['Last Name'],sep=",")
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name | Country & Code |
|---|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | Shivam,Pandey | India-1 |
| 1 | Kumar | Ram | US | 2 | Kumar,Ram | US-2 |
| 2 | Felix | John | Germany | 3 | Felix,John | Germany-3 |
Concatenate Two Columns If Not Null
This section will teach you how to concatenate two columns only if neither value is null.
- Use the map() method, because it supports defining the action for Null or missing values using the na_action parameter.
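As an alternative to map() with na_action (a different technique, not the one this tutorial uses), you can build an explicit notna() mask and keep the concatenated value only where both columns have values. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Shivam", "Borris"], "Last Name": ["Pandey", None]})

# True only for rows where both name columns hold a value
both_present = df["First Name"].notna() & df["Last Name"].notna()

# Concatenate, then keep the result only where both values exist;
# Series.where puts NaN in the other rows
df["Full Name"] = (df["First Name"] + " " + df["Last Name"]).where(both_present)
print(df["Full Name"].tolist())
```

The mask makes the "only if not null" rule explicit, which can be easier to read than relying on NaN propagation.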
Code
The following code shows how to combine two columns of the Pandas dataframe only if the values are not null.
df.loc[3] = ['Borris', None, 'UK', 4, None, None]
df["Full Name"] = df["First Name"].map(str, na_action= 'ignore') + df["Last Name"].map(str, na_action='ignore')
df
Since the fourth row of the dataframe has a Null value, that row is not combined. Its Full Name value is NaN, as shown in the output.
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name | Country & Code |
|---|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | ShivamPandey | India-1 |
| 1 | Kumar | Ram | US | 2 | KumarRam | US-2 |
| 2 | Felix | John | Germany | 3 | FelixJohn | Germany-3 |
| 3 | Borris | None | UK | 4 | NaN | None |
Now, drop the fourth row that contains the missing values.
df.drop([3], axis=0, inplace=True)
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name | Country & Code |
|---|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | ShivamPandey | India-1 |
| 1 | Kumar | Ram | US | 2 | KumarRam | US-2 |
| 2 | Felix | John | Germany | 3 | FelixJohn | Germany-3 |
Combine More Than Two Columns
In this section, you’ll learn how to combine more than two columns of the Pandas Dataframe.
You can use the agg() method with the join() method to combine more than two columns.
Code
# Concatenate three columns, separated by a space
df['Full Name'] = df[['First Name', 'Last Name', 'Country']].agg(' '.join, axis=1)
df
Dataframe Will Look Like
| | First Name | Last Name | Country | Country Code | Full Name | Country & Code |
|---|---|---|---|---|---|---|
| 0 | Shivam | Pandey | India | 1 | Shivam Pandey India | India-1 |
| 1 | Kumar | Ram | US | 2 | Kumar Ram US | US-2 |
| 2 | Felix | John | Germany | 3 | Felix John Germany | Germany-3 |
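str.cat() can also combine more than two columns, because its others argument accepts a list of Series. A sketch, assuming a two-row copy of the sample columns:

```python
import pandas as pd

df = pd.DataFrame({
    "First Name": ["Shivam", "Kumar"],
    "Last Name": ["Pandey", "Ram"],
    "Country": ["India", "US"],
})

# others can be a list of Series; each row is joined in column order with sep
df["Full Name"] = df["First Name"].str.cat([df["Last Name"], df["Country"]], sep=" ")
print(df["Full Name"].tolist())
```

Unlike agg(' '.join, axis=1), str.cat() also accepts na_rep, so the same missing-value handling shown in the cat() section carries over.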