A pandas DataFrame is a two-dimensional data structure that stores data in tabular format.
You can iterate over its rows using the df.itertuples() method.
This tutorial teaches you how to iterate over rows using different methods.
If You’re in a Hurry
Use the code below to iterate over the rows of a pandas DataFrame.
It is one of the fastest ways to loop over DataFrame rows.
Example
# df is the sample DataFrame created in the Sample DataFrame section below
for row in df.itertuples():
    print(row)
Each row of the DataFrame is yielded as a named tuple and printed with print().
You’ll see the output below.
Output
Pandas(Index=0, Lang='Java', Difficulty='Medium', Difficulty_Score=5, Type='Statically Typed')
Pandas(Index=1, Lang='Python', Difficulty='Easy', Difficulty_Score=2, Type='Dynamically Typed')
Pandas(Index=2, Lang='Cobol', Difficulty='Hard', Difficulty_Score=10, Type='NA')
Pandas(Index=3, Lang='Javascript', Difficulty='Medium', Difficulty_Score=8, Type='Dynamically typed')
If You Want to Understand Details, Read on…
In this tutorial, you’ll learn the various methods available to iterate over rows in a pandas DataFrame.
Sample DataFrame
import pandas as pd
data = {
"Lang":["Java","Python","Cobol","Javascript"],
"Difficulty":["Medium","Easy","Hard","Medium"],
"Difficulty_Score":[5,2,10,8],
"Type":["Statically Typed","Dynamically Typed","NA","Dynamically typed"],
}
df = pd.DataFrame(data)
print(df)
DataFrame Visualization
Lang Difficulty Difficulty_Score Type
0 Java Medium 5 Statically Typed
1 Python Easy 2 Dynamically Typed
2 Cobol Hard 10 NA
3 Javascript Medium 8 Dynamically typed
Now let’s discuss the various methods available to iterate over the rows in the pandas dataframe.
Using Itertuples() Function
The itertuples() method iterates over the DataFrame rows and yields a named tuple for each row.
It accepts two parameters:
- index – If True, the row's index label is included as the first element of the tuple, in a field called Index. If False, the index is left out. Defaults to True.
- name – The name of the returned named tuples. Defaults to "Pandas". Pass None to get regular tuples instead.
It is generally faster than iterrows(), because it does not build a Series object for every row.
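A short sketch of both parameters in action on the sample data. The tuple name "Row" is an arbitrary choice made here for illustration, not a pandas default.

```python
import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}
df = pd.DataFrame(data)

# index=False leaves out the row label; name="Row" renames the namedtuple type
named_rows = list(df.itertuples(index=False, name="Row"))
print(named_rows[0])

# name=None yields plain tuples; the index is still the first element
plain_rows = list(df.itertuples(name=None))
print(plain_rows[0])
```

Plain tuples are handy when you only need positional access, while named tuples let you read fields by column name.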
Example
for row in df.itertuples():
    print(row)
You haven’t passed the index or name parameter, so the default value is used for both.
Each tuple is named Pandas, and its first field, Index, holds the row's index label.
Output
Pandas(Index=0, Lang='Java', Difficulty='Medium', Difficulty_Score=5, Type='Statically Typed')
Pandas(Index=1, Lang='Python', Difficulty='Easy', Difficulty_Score=2, Type='Dynamically Typed')
Pandas(Index=2, Lang='Cobol', Difficulty='Hard', Difficulty_Score=10, Type='NA')
Pandas(Index=3, Lang='Javascript', Difficulty='Medium', Difficulty_Score=8, Type='Dynamically typed')
This is how you can iterate over rows in a pandas DataFrame using itertuples().
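Because each row is a named tuple, you can read its fields as attributes. In the sketch below, the column name "Difficulty Score" (with a space) is a hypothetical variation of the sample data: column names that are not valid Python identifiers are replaced by positional names such as _2.

```python
import pandas as pd

df = pd.DataFrame({
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty Score": [5, 2, 10, 8],  # hypothetical name containing a space
})

labels = []
for row in df.itertuples():
    # row.Index holds the row label; row.Lang reads the "Lang" column.
    # "Difficulty Score" is not a valid identifier, so it is exposed as _2
    # (its position among the fields Index, Lang, Difficulty Score).
    labels.append(f"{row.Index}: {row.Lang} ({row._2})")

print(labels)
```

If you rely on attribute access, keeping column names as valid identifiers (like Difficulty_Score in the sample data) avoids these positional names.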
Using Iterrows() Function
The iterrows() method iterates over the DataFrame rows as (index, Series) pairs:
- index is the label of the row.
- Series holds the row's data, with the column names as its index.
Example
for index, row in df.iterrows():
    print(index)
    print("*****")
    print(row)
Output
0
*****
Lang Java
Difficulty Medium
Difficulty_Score 5
Type Statically Typed
Name: 0, dtype: object
1
*****
Lang Python
Difficulty Easy
Difficulty_Score 2
Type Dynamically Typed
Name: 1, dtype: object
2
*****
Lang Cobol
Difficulty Hard
Difficulty_Score 10
Type NA
Name: 2, dtype: object
3
*****
Lang Javascript
Difficulty Medium
Difficulty_Score 8
Type Dynamically typed
Name: 3, dtype: object
This is how you can use the iterrows() method to iterate through a pandas DataFrame and access the index and the Series of data for each row.
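Two details are worth knowing about iterrows(). You can read values from each row Series by column label, and, as the pandas documentation notes, iterrows() does not preserve dtypes across rows, because each row is returned as a single Series. The small numeric DataFrame below, including its Weight column, is a hypothetical example chosen to make the effect visible.

```python
import pandas as pd

df = pd.DataFrame({"Difficulty_Score": [5, 2], "Weight": [0.5, 1.5]})  # hypothetical data

for index, row in df.iterrows():
    # Read individual values by column label
    print(index, row["Difficulty_Score"], row["Weight"])

first_row = next(df.iterrows())[1]
# The int64 column comes back as float64, because a row Series holds a single dtype
print(df["Difficulty_Score"].dtype, first_row["Difficulty_Score"].dtype)
```

If dtypes matter, itertuples() or column-wise operations are safer choices.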
Using Iteritems() Function
The iteritems() function iterates over the DataFrame columns and yields a tuple with the column name and the column content as a Series.
iteritems() was deprecated in pandas 1.5.0 and removed in pandas 2.0, so the example below works only in older versions. Use the items() method instead.
Example
# Works only in pandas versions earlier than 2.0
for item in df.iteritems():
    print(item)
Output
('Lang', 0 Java
1 Python
2 Cobol
3 Javascript
Name: Lang, dtype: object)
('Difficulty', 0 Medium
1 Easy
2 Hard
3 Medium
Name: Difficulty, dtype: object)
('Difficulty_Score', 0 5
1 2
2 10
3 8
Name: Difficulty_Score, dtype: int64)
('Type', 0 Statically Typed
1 Dynamically Typed
2 NA
3 Dynamically typed
Name: Type, dtype: object)
This is how you can use the iteritems() method.
Using Items() Function
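The items() method iterates over the DataFrame columns and yields a tuple with the column name and the column content as a Series.
iteritems() simply yielded the results of items(), so both produce the same output, and items() is the recommended method.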
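Unlike iterrows(), items() hands back each column as its own Series, so every column keeps its dtype. A minimal sketch that records the dtype of each column in the sample data:

```python
import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}
df = pd.DataFrame(data)

column_dtypes = {}
for column_name, column_data in df.items():
    # column_data is a Series that keeps the column's own dtype
    column_dtypes[column_name] = str(column_data.dtype)

print(column_dtypes)
```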
Example
for item in df.items():
    print(item)
Output
('Lang', 0 Java
1 Python
2 Cobol
3 Javascript
Name: Lang, dtype: object)
('Difficulty', 0 Medium
1 Easy
2 Hard
3 Medium
Name: Difficulty, dtype: object)
('Difficulty_Score', 0 5
1 2
2 10
3 8
Name: Difficulty_Score, dtype: int64)
('Type', 0 Statically Typed
1 Dynamically Typed
2 NA
3 Dynamically typed
Name: Type, dtype: object)
pandas iterate over columns by column name
In this subsection, you’ll use the items() method to iterate over the DataFrame and unpack each tuple into the columnName and columnData loop variables to access the column data.
Example
for columnName, columnData in df.items():
    print('Column Name : ', columnName)
    print('Column Contents : ', columnData.values)
Output
Column Name : Lang
Column Contents : ['Java' 'Python' 'Cobol' 'Javascript']
Column Name : Difficulty
Column Contents : ['Medium' 'Easy' 'Hard' 'Medium']
Column Name : Difficulty_Score
Column Contents : [ 5 2 10 8]
Column Name : Type
Column Contents : ['Statically Typed' 'Dynamically Typed' 'NA' 'Dynamically typed']
This is also known as Pandas Iterate Over Columns.
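A common use of this pattern is collecting each column into a plain Python structure. The sketch below builds a dictionary of lists with items(); for this particular task, the built-in df.to_dict(orient="list") gives the same result without an explicit loop.

```python
import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}
df = pd.DataFrame(data)

# Map each column name to a list of that column's values
columns_as_lists = {name: values.tolist() for name, values in df.items()}
print(columns_as_lists["Difficulty_Score"])
```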
pandas iterate over columns with condition
You can combine items() with a condition to process only specific columns.
- Use an if condition to check whether the current column is the one you want.
- Access the column data if the condition is True. Otherwise, skip the column.
Example
for columnName, columnData in df.items():
    if columnName == "Lang":
        print('Column Name : ', columnName)
        print('Column Contents : ', columnData.values)
Output
Column Name : Lang
Column Contents : ['Java' 'Python' 'Cobol' 'Javascript']
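The example above filters columns. If you need to filter rows by a condition instead, a loop over itertuples() works, but a boolean mask usually does the same job without explicit iteration. A sketch that selects the languages whose Difficulty is "Medium" both ways:

```python
import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}
df = pd.DataFrame(data)

# Loop version: check the condition on each row
medium_langs_loop = []
for row in df.itertuples():
    if row.Difficulty == "Medium":
        medium_langs_loop.append(row.Lang)

# Vectorized version: a boolean mask selects the matching rows in one step
medium_langs_mask = df.loc[df["Difficulty"] == "Medium", "Lang"].tolist()

print(medium_langs_loop, medium_langs_mask)
```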
Iteration Performance
The pandas documentation states that iterating through pandas objects is generally slow, and that in many cases iterating manually over the rows is not needed:
- Many operations can be performed with vectorized solutions, using built-in pandas methods or NumPy functions.
- When you have a function that cannot work on the full DataFrame or Series at once, it’s better to use apply() than to iterate over the values.
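A sketch of both alternatives on the sample data. The new column names Score_Percent and Label and the label_difficulty helper are hypothetical, chosen only for illustration.

```python
import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}
df = pd.DataFrame(data)

# Vectorized: operate on the whole column at once, no Python loop
df["Score_Percent"] = df["Difficulty_Score"] * 10

# apply(): call a Python function once per row (axis=1) when the logic is hard to vectorize
def label_difficulty(row):
    # Hypothetical rule combining two columns of the same row
    if row["Difficulty_Score"] >= 8:
        return f"{row['Lang']} is challenging"
    return f"{row['Lang']} is approachable"

df["Label"] = df.apply(label_difficulty, axis=1)
print(df[["Lang", "Score_Percent", "Label"]])
```

apply() still calls Python code for each row, so prefer a vectorized expression whenever one exists.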
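As per the documentation, you should never modify something you are iterating over. This is not guaranteed to work in all cases: depending on the data types, the iterator may return a copy rather than a view, and writing to it will have no effect.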
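Instead of writing to the rows produced by an iterator, assign through the DataFrame itself, for example with df.loc and a boolean mask. A minimal sketch; adding 1 to the score of the Medium rows is a hypothetical change for illustration.

```python
import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}
df = pd.DataFrame(data)

# Avoid: changing `row` inside iterrows(), since `row` may be a copy of the data.
# Prefer: select the target cells with df.loc and assign to them directly.
is_medium = df["Difficulty"] == "Medium"
df.loc[is_medium, "Difficulty_Score"] += 1

print(df["Difficulty_Score"].tolist())
```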
Speed Comparison
The speed of iteration depends on various factors, such as the size of the dataset, the data types of its columns, and your hardware.
In general, itertuples() is faster than iterrows(), and a vectorized operation is usually faster than either.
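To check the relative speed on your own data and machine, you can time each approach with Python's timeit module. A rough sketch that sums Difficulty_Score three ways; the 1,000-row DataFrame built by repeating the sample rows is an assumption, and the absolute timings will vary from system to system.

```python
import timeit

import pandas as pd

data = {
    "Lang": ["Java", "Python", "Cobol", "Javascript"],
    "Difficulty": ["Medium", "Easy", "Hard", "Medium"],
    "Difficulty_Score": [5, 2, 10, 8],
    "Type": ["Statically Typed", "Dynamically Typed", "NA", "Dynamically typed"],
}
# Repeat the 4 sample rows 250 times to get 1,000 rows
df = pd.concat([pd.DataFrame(data)] * 250, ignore_index=True)

def sum_iterrows():
    return sum(row["Difficulty_Score"] for _, row in df.iterrows())

def sum_itertuples():
    return sum(row.Difficulty_Score for row in df.itertuples())

def sum_vectorized():
    return df["Difficulty_Score"].sum()

for func in (sum_iterrows, sum_itertuples, sum_vectorized):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {seconds:.4f} s for 3 runs")
```

All three functions return the same total; only the time taken differs.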