How to Iterate over rows in Pandas Dataframe – Definitive Guide

A pandas DataFrame is a two-dimensional data structure that stores data in tabular form.

You can iterate over the rows of a pandas DataFrame using the df.itertuples() method.

This tutorial teaches you how to iterate over rows using different methods.

If You’re in a Hurry…

Use the code below to iterate over the rows of a pandas DataFrame.

itertuples() is generally one of the fastest ways to loop over rows in pandas.

Example

# df is the sample DataFrame created in the Sample DataFrame section below
for row in df.itertuples():
    print(row)

Each row of the DataFrame is yielded as a named tuple and printed.

With the sample DataFrame used in this tutorial, you’ll see the following output.

Output

    Pandas(Index=0, Lang='Java', Difficulty='Medium', Difficulty_Score=5, Type='Statically Typed')
    Pandas(Index=1, Lang='Python', Difficulty='Easy', Difficulty_Score=2, Type='Dynamically Typed')
    Pandas(Index=2, Lang='Cobol', Difficulty='Hard', Difficulty_Score=10, Type='NA')
    Pandas(Index=3, Lang='Javascript', Difficulty='Medium', Difficulty_Score=8, Type='Dynamically typed')

If You Want to Understand the Details, Read On…

In this tutorial, you’ll learn the various methods available to iterate over the rows of a pandas DataFrame.

Sample DataFrame

import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}

df = pd.DataFrame(data)

print(df)

DataFrame Visualization

             Lang Difficulty  Difficulty_Score               Type
    0        Java     Medium                 5   Statically Typed
    1      Python       Easy                 2  Dynamically Typed
    2       Cobol       Hard                10                 NA
    3  Javascript     Medium                 8  Dynamically typed

Now let’s discuss the various methods available to iterate over the rows in the pandas dataframe.

Using itertuples() Method

In this section, you’ll learn how to iterate over the rows of a pandas DataFrame using the itertuples() method.

The itertuples() method iterates over the DataFrame rows and yields each row as a named tuple.

It accepts two parameters.

  • index – If True, the row’s index is included as the first element of the tuple. If False, the index is left out. Defaults to True.
  • name – The name of the returned named tuples. Defaults to 'Pandas'. Pass None to get regular tuples instead.
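Here is a minimal sketch of both parameters, rebuilt on the sample data so it runs on its own. The tuple name "Language" is an arbitrary choice for illustration; passing name=None makes itertuples() return plain tuples instead of named tuples.

```python
import pandas as pd

# Same data as the sample DataFrame used throughout this tutorial
df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
})

# index=False leaves the row index out; name="Language" renames the tuple type
named_rows = list(df.itertuples(index=False, name="Language"))
for row in named_rows:
    print(row)

# name=None returns regular tuples without field names
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows[0])
```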

Among the row-iteration methods covered in this tutorial, itertuples() is generally the fastest.

Example

for row in df.itertuples():
    print(row)

Because neither the index nor the name parameter is passed, both defaults are used.

Each tuple is therefore named Pandas, and its first field, Index, holds the row’s index.

Output

    Pandas(Index=0, Lang='Java', Difficulty='Medium', Difficulty_Score=5, Type='Statically Typed')
    Pandas(Index=1, Lang='Python', Difficulty='Easy', Difficulty_Score=2, Type='Dynamically Typed')
    Pandas(Index=2, Lang='Cobol', Difficulty='Hard', Difficulty_Score=10, Type='NA')
    Pandas(Index=3, Lang='Javascript', Difficulty='Medium', Difficulty_Score=8, Type='Dynamically typed')

This is how you can iterate over rows in Pandas DataFrame using itertuples().
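Because each row is a named tuple, you can read individual fields as attributes named after the columns. A small sketch on the sample data; the summary strings are only for illustration. This works here because every column name is a valid Python identifier; pandas renames columns that are not (for example, names containing spaces) to positional field names.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
})

summaries = []
for row in df.itertuples():
    # row.Index holds the row label; the other attributes match the column names
    summaries.append(f"{row.Index}: {row.Lang} scores {row.Difficulty_Score}")

print("\n".join(summaries))
```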

Using iterrows() Method

In this section, you’ll learn how to use the iterrows() method to iterate through the rows of a pandas DataFrame.

The iterrows() method iterates over the DataFrame rows as (index, Series) pairs.

  • index is the label of each row.
  • Series holds the data in each row, indexed by the column names.

Example

for index, row in df.iterrows():
    print(index)
    print("*****")
    print(row)

Output

    0
    *****
    Lang                            Java
    Difficulty                    Medium
    Difficulty_Score                   5
    Type                Statically Typed
    Name: 0, dtype: object
    1
    *****
    Lang                           Python
    Difficulty                       Easy
    Difficulty_Score                    2
    Type                Dynamically Typed
    Name: 1, dtype: object
    2
    *****
    Lang                Cobol
    Difficulty           Hard
    Difficulty_Score       10
    Type                   NA
    Name: 2, dtype: object
    3
    *****
    Lang                       Javascript
    Difficulty                     Medium
    Difficulty_Score                    8
    Type                Dynamically typed
    Name: 3, dtype: object

This is how you can use the iterrows() method to iterate through the rows of a pandas DataFrame and access each row’s index and its data as a Series.
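Inside the loop, row is a Series indexed by the column names, so row["Lang"] reads a single value. One caveat from the pandas documentation: because iterrows() builds a Series for each row, it does not preserve dtypes across the rows. The second part of this sketch uses a small hypothetical DataFrame with one integer column and one float column to show the integer value coming back as a float.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
})

# Access a value in each row by its column label
languages = []
for index, row in df.iterrows():
    languages.append(row["Lang"])
print(languages)

# Hypothetical two-column frame mixing an int column and a float column
numbers = pd.DataFrame([[1, 1.5]], columns=["int", "float"])
_, first_row = next(numbers.iterrows())
print(first_row["int"].dtype)  # the int value is upcast to float64 in the row
print(numbers["int"].dtype)    # the column itself keeps its integer dtype
```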

Using iteritems() Method

In this section, you’ll use the iteritems() method to iterate over the DataFrame.

The iteritems() method iterates over the DataFrame columns, not the rows, and yields a tuple of the column name and the column’s contents as a Series.

iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0, so the example below runs only on older pandas versions. Use the items() method instead, which yields the same pairs.

Example

# Runs only on pandas < 2.0, where iteritems() still exists
for item in df.iteritems():
    print(item)

Output

    ('Lang', 0          Java
    1        Python
    2         Cobol
    3    Javascript
    Name: Lang, dtype: object)
    ('Difficulty', 0    Medium
    1      Easy
    2      Hard
    3    Medium
    Name: Difficulty, dtype: object)
    ('Difficulty_Score', 0     5
    1     2
    2    10
    3     8
    Name: Difficulty_Score, dtype: int64)
    ('Type', 0     Statically Typed
    1    Dynamically Typed
    2                   NA
    3    Dynamically typed
    Name: Type, dtype: object)

This is how you can use the iteritems() method.

Using items() Method

In this section, you’ll use the DataFrame’s items() method to iterate over its columns.

The items() method iterates over the DataFrame columns and yields a tuple of the column name and the column’s contents as a Series.

Example

for item in df.items():
    print(item)

Output

    ('Lang', 0          Java
    1        Python
    2         Cobol
    3    Javascript
    Name: Lang, dtype: object)
    ('Difficulty', 0    Medium
    1      Easy
    2      Hard
    3    Medium
    Name: Difficulty, dtype: object)
    ('Difficulty_Score', 0     5
    1     2
    2    10
    3     8
    Name: Difficulty_Score, dtype: int64)
    ('Type', 0     Statically Typed
    1    Dynamically Typed
    2                   NA
    3    Dynamically typed
    Name: Type, dtype: object)
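Since items() yields (column name, Series) pairs, unpacking them in the loop is a convenient way to work column by column. As an illustration, this sketch builds a plain dictionary of lists from the sample data; the result matches what df.to_dict("list") returns.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
})

# Unpack each (column name, Series) pair yielded by items()
columns_as_lists = {}
for column_name, column_data in df.items():
    columns_as_lists[column_name] = column_data.tolist()

print(columns_as_lists["Difficulty_Score"])
```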

pandas iterate over columns by column name

In this subsection, you’ll use items() to iterate over the DataFrame and unpack each tuple into columnName and columnData to access the column data.

Example

for columnName, columnData in df.items():
    print('Column Name : ', columnName)
    print('Column Contents : ', columnData.values)

Output

    Column Name :  Lang
    Column Contents :  ['Java' 'Python' 'Cobol' 'Javascript']
    Column Name :  Difficulty
    Column Contents :  ['Medium' 'Easy' 'Hard' 'Medium']
    Column Name :  Difficulty_Score
    Column Contents :  [ 5  2 10  8]
    Column Name :  Type
    Column Contents :  ['Statically Typed' 'Dynamically Typed' 'NA' 'Dynamically typed']

Because this loops over columns rather than rows, it is how you iterate over the columns of a pandas DataFrame.

pandas iterate over columns with condition

In this subsection, you’ll use items() to iterate over the DataFrame with a condition.

  • Use an if condition to check whether the current column is a specific column.
  • Access the column data if the condition is True. Otherwise, skip the column.

Example

for columnName, columnData in df.items():
    if columnName == "Lang":
        print('Column Name : ', columnName)
        print('Column Contents : ', columnData.values)

Output

    Column Name :  Lang
    Column Contents :  ['Java' 'Python' 'Cobol' 'Javascript']
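The same if pattern works for rows. The sketch below collects the languages whose Difficulty_Score is below 8, first with an itertuples() loop and then with a boolean mask, a vectorized alternative that avoids the Python loop. The threshold of 8 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
})

# Loop over rows and keep only those that meet the condition
easier_langs = []
for row in df.itertuples():
    if row.Difficulty_Score < 8:
        easier_langs.append(row.Lang)

# Vectorized equivalent: a boolean mask selects the matching rows in one step
easier_langs_mask = df.loc[df["Difficulty_Score"] < 8, "Lang"].tolist()

print(easier_langs)
print(easier_langs_mask)
```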

Iteration Performance

The pandas documentation notes that iterating through pandas objects is generally slow, and that in many cases iterating manually over the rows is not needed.

  • Prefer vectorized solutions: many operations can be done with built-in pandas methods or NumPy functions.
  • When you have a function that cannot work on the full DataFrame or Series at once, use apply() instead of iterating over the values.
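As a rough illustration on the sample data: a simple comparison can run on the whole column at once, while a row-wise function that needs several columns can go through apply() with axis=1. The new column names Is_Hard and Label, and the describe() helper, are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
})

# Vectorized: compare the whole column at once, no Python loop
df["Is_Hard"] = df["Difficulty_Score"] >= 8

# Hypothetical row-wise helper; apply() with axis=1 passes each row as a Series
def describe(row):
    return f"{row['Lang']} ({row['Difficulty']})"

df["Label"] = df.apply(describe, axis=1)
print(df[["Lang", "Is_Hard", "Label"]])
```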

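As the pandas documentation warns, you should never modify something you are iterating over. This is not guaranteed to work in all cases: depending on the data types, the iterator may return a copy rather than a view, and writing to it then has no effect.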
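A small sketch of why this matters: with the sample data’s mixed column types, each row that iterrows() yields is a copy, so assigning to it leaves the DataFrame unchanged. Updating the column directly, outside of any loop, is the reliable alternative. Adding 1 to every score is an arbitrary example.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
})

# Assigning to the row only changes a temporary copy, not df
for index, row in df.iterrows():
    row["Difficulty_Score"] = 0

after_loop = df["Difficulty_Score"].tolist()
print(after_loop)  # unchanged: [5, 2, 10, 8]

# Vectorized update of the column itself
df["Difficulty_Score"] = df["Difficulty_Score"] + 1
print(df["Difficulty_Score"].tolist())
```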

Speed Comparison

The speed of iteration depends on factors such as the size of the dataset, the operating system, and available memory.

In general, itertuples() is one of the fastest ways to iterate over the rows of a pandas DataFrame, and it is generally faster than iterrows().
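To measure this on your own setup, here is a rough timing sketch using the standard timeit module. The 5,000-row DataFrame of random scores is hypothetical benchmark data, and no timings are quoted because they vary by machine and pandas version. The vectorized sum is included as a baseline with no Python-level loop.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 5,000 random difficulty scores between 1 and 10
rng = np.random.default_rng(0)
big = pd.DataFrame({"Difficulty_Score": rng.integers(1, 11, size=5_000)})

def total_iterrows(frame):
    # Builds a Series for every row
    return sum(row["Difficulty_Score"] for _, row in frame.iterrows())

def total_itertuples(frame):
    # Yields lightweight named tuples
    return sum(row.Difficulty_Score for row in frame.itertuples(index=False))

def total_vectorized(frame):
    # No Python-level loop at all
    return frame["Difficulty_Score"].sum()

for func in (total_iterrows, total_itertuples, total_vectorized):
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(f"{func.__name__:>18}: {seconds:.4f} s for 3 runs")
```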

Conclusion

To summarize, you’ve learned how to iterate over the rows and columns of a pandas DataFrame using itertuples(), iterrows(), iteritems(), and items().

Among the row-iteration methods, itertuples() is generally the fastest. When performance matters, consider whether a vectorized operation can replace the loop altogether.

If you have any questions, feel free to comment below.


2 thoughts on “How to Iterate over rows in Pandas Dataframe – Definitive Guide”

    • Hello Nick,

      I appreciate you taking the time to write the feedback and am glad that you found it helpful.

      Yes. Both iteritems() and items() are the same. iteritems() yields the results of items() internally.

      iteritems() would be removed in future versions. Hence, items() is the recommended method.

      I have updated the tutorial with this information.

      Regards,
      Vikram

