How to Change Column Type in Pandas Dataframe: Definitive Guide

A pandas dataframe is a powerful two-dimensional data structure that you can use to store and manipulate data for your data analysis tasks.

You can change the column type in a pandas dataframe using the df.astype() method.

Once you create a dataframe, you may need to change the type of a column, for example to convert a column to a numeric format that can be used for modeling and classification.

In this tutorial, you’ll learn how to change the column type of a pandas dataframe using

  • pandas astype()
  • pandas to_numeric()

If You’re in a Hurry…

You can use the code snippet below to change the column type of a pandas dataframe using the astype() method.

df = df.astype({"Column_name": str}, errors='raise') 

df.dtypes

Where,

  • df.astype() – Method that casts the dataframe columns to the given types and returns a new dataframe.
  • {"Column_name": str} – Dictionary that maps each column to be cast to its target datatype. Column_name is the column to be cast. str is the target datatype to which the column values should be converted. You can use any of the built-in datatypes of Python or the datatypes available in NumPy.
  • errors='raise' – Specifies how exceptions are handled during conversion. raise raises the error, and ignore suppresses it and returns the original, unconverted data.

This is how you can convert data types of columns in the dataframe.
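As a minimal sketch of the snippet above, the following example casts two columns at once by passing a dictionary to astype(). The dataframe df_demo and its column names (price, units) are made up for illustration and are not part of the sample data used in this tutorial.

```python
import pandas as pd

# Hypothetical two-column dataframe used only for this illustration
df_demo = pd.DataFrame({"price": [500, 200, 5000], "units": [5.0, 5.0, 10.0]})

# Map each column name to its target datatype; astype() returns a new dataframe
converted = df_demo.astype({"price": "float64", "units": "int64"})

print(converted.dtypes)
```

Columns that are not listed in the dictionary keep their current type.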

If You Want to Understand Details, Read on…

In this detailed tutorial, you’ll learn how to change the column type in a pandas dataframe using the different methods provided by pandas itself.

You’ll also see examples of different types of conversion.

Sample Dataframe

This is the sample dataframe used throughout the tutorial.

  • import pandas as pd to use the functionalities provided by pandas.
  • import numpy as np to use the functionalities provided by NumPy. You’ll specifically use the datatype int64 from np to request a 64-bit integer explicitly, as Python itself has only a single built-in int type.

Snippet

import pandas as pd
import numpy as np

# Creating a Dictionary
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU", "Speakers"],
        "Unit_Price":[500,200, 5000, 10000, 250.50],
        "No_Of_Units":[5,5, 10, 20, 8],
        "Available_Quantity":[5,10,11,15, "Not Available"],
        "Available_Since_Date":['11/5/2021', '4/23/2021', '08/21/2021','09/18/2021','01/05/2021']
       }

# Creating a dataframe from the dictionary
df = pd.DataFrame(data)

# Printing the datatype of the columns
df.dtypes

You can check the datatype of each column using df.dtypes, which prints the type of every column.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

The dataframe consists of types object, float64 and int64.

Note: The String types are displayed as objects.
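To illustrate the note above, here is a small sketch showing that an object column simply holds references to Python objects, strings included. The series name mixed and its values are assumptions made for this example; dtype=object is passed explicitly so that the result does not depend on the pandas version.

```python
import pandas as pd

# An object column stores references to arbitrary Python objects
mixed = pd.Series(["Keyboard", 5, 2.5], dtype=object)

print(mixed.dtype)               # object
print(mixed.map(type).tolist())  # the Python type of each element
```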

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021

You have the sample dataframe created with different data types.

Next, you’ll see how different types of columns can be cast to another format.

Pandas Change Column Type To String

In this section, you’ll learn how to change column type to String.

You can do this by calling the astype() method with str as the target datatype.

In the sample dataframe, the column Unit_Price is float64. When the line below is executed, the Unit_Price column is converted to String format.

Snippet

df = df.astype({"Unit_Price": str})

df.dtypes

Where,

  • df.astype – Method to convert to another datatype.
  • {"Unit_Price": str} – Unit_Price is the column name and str is the target datatype.

df.dtypes prints the type of each column.

Datatypes of Columns

    product_name            object
    Unit_Price              object
    No_Of_Units              int64
    Available_Quantity      object
    Available_Since_Date    object
    dtype: object

Before conversion, the column Unit_Price was float64.

Now you can see that Unit_Price is converted to String, and it is displayed as object type.

In the pandas version used for this tutorial, strings are stored in columns of the object dtype, which is why String is displayed as object.

You’ve learned how to cast a column type to String.

Next, you’ll see how to convert column type to int.

Pandas Change Column Type To Int

In this section, you’ll learn how to change the column type to int.

You can convert a column to int using the to_numeric() method or astype() method.

Let’s look at both methods in detail.

Using to_numeric()

The to_numeric() method converts a column to int or float based on the values available in the column.

  • If the column contains only whole numbers, to_numeric() converts it to int64.
  • If the column contains numbers with decimal points, to_numeric() converts it to float64.

Example: The Unit_Price column in the sample dataframe contains decimal numbers and the No_Of_Units column contains only whole numbers.

Hence the to_numeric() method converts the Unit_Price column to float64 and the No_Of_Units column to int64.

# Convert the columns "Unit_Price" and "No_Of_Units" to numeric types

df["Unit_Price"] = pd.to_numeric(df["Unit_Price"])

df["No_Of_Units"] = pd.to_numeric(df["No_Of_Units"])

df.dtypes

Datatypes after converting it using the to_numeric() method.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021
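The inference rule described above can be checked in isolation with a short sketch. The two series below (whole and decimal) are hypothetical string columns created only for this example; they are stored as object so that to_numeric() has to parse the text.

```python
import pandas as pd

# Hypothetical string columns (stored as object) holding numeric text
whole = pd.Series(["5", "10", "20"], dtype=object)
decimal = pd.Series(["500", "250.50"], dtype=object)

print(pd.to_numeric(whole).dtype)    # int64
print(pd.to_numeric(decimal).dtype)  # float64
```

A single value with a decimal point is enough to make the whole column float64.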

Now, you’ll see how to handle exceptions while using to_numeric() method.

Error Handling in to_numeric

Error handling is a good programming practice, as any operation in a program is prone to errors.

While converting a column to int, errors may occur because the column can contain non-numeric values. In that case, the conversion cannot take place, so you need to specify how to handle the errors that occur during conversion.

You can use the optional parameter errors to specify how the errors should be handled.

errors='raise' raises the error. This is the default behavior.

For example, the Available_Quantity column in the sample dataframe contains the String value Not Available in one of the cells. It cannot be converted to a number, so the conversion raises an error.

Snippet

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='raise')

A ValueError with the message Unable to parse string "Not Available" is raised as follows.

Error Output

    ---------------------------------------------------------------------------

    ValueError                                Traceback (most recent call last)

    pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()


    ValueError: Unable to parse string "Not Available"


    During handling of the above exception, another exception occurred:

    pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()


    ValueError: Unable to parse string "Not Available" at position 4

This is how you can raise the error and stop the conversion if there is any problem during conversion.
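If you want the program to continue after a failed conversion, one option is to catch the exception yourself. The sketch below assumes a standalone series named quantity that mimics the Available_Quantity column; it is not the tutorial's dataframe.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a number
quantity = pd.Series([5, 10, "Not Available"], dtype=object)

try:
    quantity = pd.to_numeric(quantity, errors="raise")
except ValueError as exc:
    # The assignment never happens, so the original series is left untouched
    print(f"Conversion failed: {exc}")
```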

Next, you’ll see how to ignore the errors.

Ignoring the errors

You can ignore the errors that occur during the conversion by using errors='ignore'.

For example, when you convert the Available_Quantity column, which has a String value, to int, errors will occur.

When errors='ignore' is used, the conversion stops silently without raising any errors, and the original column is returned intact. Note that errors='ignore' is deprecated for to_numeric() in recent pandas releases (2.2 and later), so you may see a warning there.

Snippet

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='ignore')

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021

This is how you can ignore the errors while converting.

Coercing the Error

Coercing means forcing something to happen. In this context, you force the to_numeric() method to convert the column even though it has some invalid values.

It converts the valid cell values and replaces the invalid values with NaN.

Snippet

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='coerce')

df.dtypes

You can see that the Available_Quantity column is converted to float64. The String values in the column are converted to NaN, which denotes Not a Number.

You can see this in the dataframe printed below.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity      float64
    Available_Since_Date     object
    dtype: object

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                 5.0             11/5/2021
    1         Mouse       200.0            5                10.0             4/23/2021
    2       Monitor      5000.0           10                11.0            08/21/2021
    3           CPU     10000.0           20                15.0            09/18/2021
    4      Speakers       250.5            8                 NaN            01/05/2021

This is how you can use the to_numeric() to convert the column to any of the number types.
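After coercing, it can be useful to check how many values were lost. The sketch below does that on a standalone series (quantity, an assumption for this example). It also shows one optional follow-up that this tutorial does not otherwise cover: casting to the nullable integer dtype "Int64" from pandas, which keeps whole numbers next to missing values.

```python
import pandas as pd

# Hypothetical column mirroring Available_Quantity
quantity = pd.Series([5, 10, 11, 15, "Not Available"], dtype=object)

# Invalid entries become NaN, which forces a float64 result
coerced = pd.to_numeric(quantity, errors="coerce")
print(coerced.isna().sum())  # number of values that could not be parsed

# Optional: the nullable integer dtype keeps integers alongside missing values
nullable = coerced.astype("Int64")
print(nullable.dtype)
```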

Next, you’ll learn about the astype() method.

Using astype()

The astype() method converts columns to the type specified in the method parameter.

You can convert a column to int by specifying int in the method parameter as shown below.

Snippet

df = df.astype({"No_Of_Units": int})

df.dtypes

Where,

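  • df.astype() – Method that casts the dataframe columns to the given types and returns a new dataframe.
  • {"No_Of_Units": int} – Dictionary that maps each column to be cast to its target datatype. No_Of_Units is the column to be cast into int format. int is the target datatype to which the column values should be converted. In the environment used here, the column is converted to int32; the width that int maps to depends on the platform and NumPy version, and it is int64 on most Linux and macOS setups.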

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int32
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

Note: In this environment, astype(int) converts to int32, whereas to_numeric() converts to int64. The width chosen for int depends on the platform and NumPy version.

If you want a 64-bit integer regardless of the platform, pass np.int64 explicitly.
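To avoid depending on the platform default, you can always name the width explicitly. The following sketch uses a standalone series called units (an assumption for this example) and shows both an explicit 64-bit and an explicit 32-bit cast.

```python
import numpy as np
import pandas as pd

# Hypothetical integer column
units = pd.Series([5, 5, 10, 20, 8])

# Explicit widths give the same result on every platform
print(units.astype(np.int64).dtype)  # int64
print(units.astype("int32").dtype)   # int32
```

Both the NumPy type object and its string name are accepted by astype().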

Now, let’s see how to handle errors during astype() conversion.

Error Handling in astype()

As mentioned before, errors can occur during any conversion. You need to specify how they should be handled when they occur.

You can do this by using the optional parameter errors.

errors='raise' raises the error.

For example, the Available_Quantity column in the sample dataframe contains the String value Not Available in one of the cells. It cannot be converted to a number, so the conversion raises an error.

Snippet

df = df.astype({"Available_Quantity": float}, errors='raise')

df.dtypes

Error will be raised as below.

Error Output

    ---------------------------------------------------------------------------

    ValueError                                Traceback (most recent call last)

    <ipython-input-13-616dd5b910d4> in <module>
    ----> 1 df = df.astype({"Available_Quantity": float},errors='raise')
          2 
          3 df.dtypes

    ValueError: could not convert string to float: 'Not Available'

You’ve raised the error during conversion.

Next, you’ll see how to ignore the errors.

Ignoring the errors

You can ignore the errors that occur during the conversion by using errors='ignore'.

For example, when you convert the Available_Quantity column, which has a String value, to float, errors will occur.

When errors='ignore' is used, the conversion stops silently without raising any errors, and you’ll have the original dataframe intact.

df = df.astype({"Available_Quantity": float}, errors='ignore')

df.dtypes

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int32
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

You can see that the Available_Quantity column is still of type object, which means it was not converted, and no error was raised either.

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021

This is how you can ignore the errors during conversion.

Note:

astype() does not coerce: it cannot convert only the valid values and replace the rest. It either converts the whole column or, with errors='ignore', returns the original values. Hence, you cannot use errors='coerce' with the astype() method.
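Because astype() has no coerce option, one possible workaround is to coerce with to_numeric() first and cast afterwards. The sketch below is only one way to do it; the series quantity and the placeholder value 0 are arbitrary choices made for this example.

```python
import pandas as pd

# Hypothetical column with one invalid value
quantity = pd.Series([5, 10, "Not Available"], dtype=object)

# Step 1: coerce invalid values to NaN with to_numeric()
# Step 2: replace NaN with a placeholder (0 is an arbitrary choice here)
# Step 3: cast to the final integer type with astype()
cleaned = pd.to_numeric(quantity, errors="coerce").fillna(0).astype("int64")

print(cleaned.tolist())
```

Whether a placeholder such as 0 is acceptable depends on your data, since it hides the fact that a value was missing.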

You’ve learned how to cast column type to int.

Next, you’ll see how to convert objects to int64.

Pandas Change Column Type From Object to Int64

In this section, you’ll learn how to change column type from object to int64.

You can do it by using the to_numeric() method as shown below. It converts whole numbers to int64 by default.

Snippet

df["No_Of_Units"] = pd.to_numeric(df["No_Of_Units"])

df.dtypes

You’re converting the No_Of_Units column to int. You can see that it is converted to int64.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

Now, let’s see the default behavior of the astype() method and how it can be used to convert objects to int64.

If you just specify int in astype(), the column is converted to int32 in this environment. The width depends on the platform and NumPy version.

Snippet

df = df.astype({"No_Of_Units": int})

df.dtypes

You’re converting the No_Of_Units column to int. You can see that it is converted to int32.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int32
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

Now, you’ll convert object to int64 using astype().

You can pass np.int64 as the target type to convert the column to int64.

Snippet

df = df.astype({"No_Of_Units": np.int64})

df.dtypes

You’re converting the No_Of_Units column to np.int64. You can see that it is converted to int64.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

This is how you can use to_numeric() and astype() to cast a column type from object to int64.

Next, you’ll see how to convert column type from int to string.

Pandas Change Column Type From Int To String

In this section, you’ll learn how to change the column type from int to String.

You can use the astype() method to convert an int column to a String.

In the sample dataframe, the column No_Of_Units is of a numeric type. Now you’ll convert it to String.

Snippet

df = df.astype({"No_Of_Units": str}, errors='raise')

df.dtypes

Where,

  • df.astype() – Method that casts the dataframe columns to the given types and returns a new dataframe.
  • {"No_Of_Units": str} – Dictionary that maps each column to be cast to its target datatype. No_Of_Units is the column to be cast. str is the target datatype to which the column values should be converted.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units              object
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

Now you can see that No_Of_Units is converted to String, and it is displayed as object type.

Note: As before, String columns are displayed as object because pandas stores the strings in an object column here.

This is how you can cast an int column to String (object).
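One reason to cast numbers to String is to use text operations on them. As a small illustration, the sketch below pads the values with leading zeros; the series units and the padding width of 3 are assumptions made for this example.

```python
import pandas as pd

# Hypothetical integer column
units = pd.Series([5, 10, 20])

# After the cast, string methods become available through the .str accessor
as_text = units.astype(str)
print(as_text.str.zfill(3).tolist())
```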

Next, you’ll see how to convert column type to float.

Pandas Change Column Type To Float

In this section, you’ll learn how to change column type to float.

You can use the astype() method to convert a column to float.

In the sample dataframe, the column Unit_Price has numbers with decimal values. Assume its column type is String, for example after the conversion to String shown earlier. Now you’ll convert it to float.

df = df.astype({"Unit_Price": float})

df.dtypes

Where,

  • df.astype() – Method that casts the dataframe columns to the given types and returns a new dataframe.
  • {"Unit_Price": float} – Dictionary that maps each column to be cast to its target datatype. Unit_Price is the column to be cast. float is the target datatype to which the column values should be converted.

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

You can see that the Unit_Price column is converted into float64.

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021

You’ve converted a column that has only numbers to float.

Now, let’s try to convert the column Available_Quantity, which has the non-numeric value Not Available in one of the cells, to float.

Note that you’re using to_numeric() with errors='coerce', which forces the conversion of the valid values.

df["Available_Quantity"] = pd.to_numeric(df["Available_Quantity"], errors='coerce')

df.dtypes

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity      float64
    Available_Since_Date     object
    dtype: object

The column is converted to float64 without any problems. The non-numeric values are converted to NaN, which means Not a Number.

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                 5.0             11/5/2021
    1         Mouse       200.0            5                10.0             4/23/2021
    2       Monitor      5000.0           10                11.0            08/21/2021
    3           CPU     10000.0           20                15.0            09/18/2021
    4      Speakers       250.5            8                 NaN            01/05/2021

This is how you can cast column type to float.

Next, you’ll learn how to cast column type to Datetime.

Pandas Change Column Type To Datetime64

In this section, you’ll learn how to change the column type to Datetime64.

You can use the to_datetime() method to convert a String column to datetime.

In the sample dataframe, the column Available_Since_Date has the date value as a String type.

You’ll convert the column type to datetime using the below snippet.

Snippet

df['Available_Since_Date'] = pd.to_datetime(df['Available_Since_Date'])

df.dtypes

Datatypes of Columns

    product_name                    object
    Unit_Price                     float64
    No_Of_Units                      int64
    Available_Quantity              object
    Available_Since_Date    datetime64[ns]
    dtype: object

You can see that the Available_Since_Date column is converted into datetime64[ns].

to_datetime() also supports error handling, where

  • errors='raise' raises an error if any cell contains an invalid date value.
  • errors='ignore' silently ignores invalid date values and returns the column intact. This option is deprecated in recent pandas releases (2.2 and later).
  • errors='coerce' converts the valid dates to datetime type and sets the other cells to NaT.

This is how you can convert column type to DateTime.
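The coerce option for dates can be sketched as follows. The series dates is hypothetical, and the explicit format string "%m/%d/%Y" is an assumption about how the dates are written (month first); adjust it if your data uses a different layout.

```python
import pandas as pd

# Hypothetical date strings; the last entry is deliberately invalid
dates = pd.Series(["11/05/2021", "04/23/2021", "not a date"], dtype=object)

# An explicit format avoids ambiguity between month-first and day-first dates
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")

print(parsed.isna().tolist())  # which entries became NaT
```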

Next, you’ll see how to convert multiple columns to int.

Pandas Convert Multiple Columns to Int

In this section, you’ll learn how to convert multiple columns to int using the astype() method.

It’s similar to how you converted a single column to int using the astype(). You can just add the additional columns as shown below.

df[['column_1','column_2']] = df[['column_1','column_2']].astype(np.int64)

df.dtypes

column_1 and column_2 will be converted to int64 using astype().

The example below uses only one column, as the sample dataframe has only one integer column.

df[['No_Of_Units']] = df[['No_Of_Units']].astype(np.int64)

df.dtypes

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

You can see that the column No_Of_Units is converted into int64.

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021

Next, let’s convert multiple columns using the to_numeric() method.

You need to use the apply() method to apply the to_numeric() function to the specified columns as shown below.

df[['column_1','column_2']] = df[['column_1','column_2']].apply(pd.to_numeric)

df.dtypes

The example below uses only one column, as the sample dataframe has only one integer column.

df[["No_Of_Units"]] = df[["No_Of_Units"]].apply(pd.to_numeric)

df.dtypes

Datatypes of Columns

    product_name             object
    Unit_Price              float64
    No_Of_Units               int64
    Available_Quantity       object
    Available_Since_Date     object
    dtype: object

You can see that the column No_Of_Units is converted into int64.

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021

This is how you can convert multiple column types to another format.
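When several columns are converted with apply(), the error handling option can be passed along as well. The sketch below assumes a small made-up dataframe (df_demo with the columns units and quantity) rather than the sample dataframe, so that two columns are actually converted.

```python
import pandas as pd

# Hypothetical dataframe with two numeric-looking object columns
df_demo = pd.DataFrame(
    {
        "units": pd.Series(["5", "10", "20"], dtype=object),
        "quantity": pd.Series([5, 10, "Not Available"], dtype=object),
    }
)

cols = ["units", "quantity"]

# Extra keyword arguments given to apply() are forwarded to pd.to_numeric
converted = df_demo[cols].apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```

Each column gets its own result type: units becomes int64, while quantity becomes float64 because of the coerced NaN.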

Next, you’ll see how to cast all columns to another type.

Pandas Convert All Columns

In this section, you’ll learn how to change the column type of all columns in a dataframe, for example, converting all columns to String.

You can also use the astype() method to convert all columns.

First, create a list of all columns called columns_list by using list(df).

Then select these columns from the dataframe and invoke the astype() method with the target datatype.

For example, use str to convert all columns to String.

Snippet

columns_list = list(df)

df[columns_list] = df[columns_list].astype(str)

df.dtypes

Datatypes of Columns

    product_name            object
    Unit_Price              object
    No_Of_Units             object
    Available_Quantity      object
    Available_Since_Date    object
    dtype: object

You can see that all the columns of the dataframe are converted to String and are displayed as object.

As before, String columns are displayed as object because pandas stores the strings in object columns here.

Printing the dataframe

df

Dataframe Looks Like

       product_name  Unit_Price  No_Of_Units  Available_Quantity  Available_Since_Date
    0      Keyboard       500.0            5                   5             11/5/2021
    1         Mouse       200.0            5                  10             4/23/2021
    2       Monitor      5000.0           10                  11            08/21/2021
    3           CPU     10000.0           20                  15            09/18/2021
    4      Speakers       250.5            8       Not Available            01/05/2021
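As a shorter alternative sketch, astype() can also be called on the whole dataframe with a single datatype, which casts every column without building a column list first. The dataframe df_demo below is made up for this example.

```python
import pandas as pd

# Hypothetical dataframe with text, integer, and float columns
df_demo = pd.DataFrame(
    {"name": ["Keyboard", "Mouse"], "units": [5, 10], "price": [500.0, 200.0]}
)

# Passing a single datatype casts every column, so no column list is needed
as_text = df_demo.astype(str)

print(as_text.iloc[0].tolist())
```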

Conclusion

To summarize, you’ve learned how to change the column type in a pandas dataframe.
You’ve used the to_numeric() and astype() methods to perform various type conversions, along with the error handling options they provide.

If you have any questions, comment below.
