How to Import Multiple CSV Files and Concatenate Them into a Single Pandas DataFrame

Data might be in multiple CSVs, and you may need it in a single dataframe for analysis.

You can import multiple CSV files and concatenate them into a single pandas dataframe by reading each file with read_csv() inside a comprehension and passing the results to the pandas concat() function.

Using read_csv() and a List of Files

The read_csv() method reads the CSV file and creates a pandas dataframe from it.

To read multiple CSVs and merge them into a single dataframe:

  • Iterate over the list of CSV files and read each one with read_csv(). The code in this section does this with a generator expression; a plain for loop works as well.
  • Pass header=None when the CSV files have no header row, so that the first line is read as data. If your CSV files do have a header row, pass header=0 (the default) to use the first row as the column names.
  • Concatenate the dataframes using the concat() function with ignore_index=True. Each dataframe carries its own index starting at 0, so the index labels would otherwise repeat in the result.

Use this method when you know the list of CSV file names and their location.
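
The effect of the header parameter can be checked on a small in-memory CSV. This is a minimal sketch: the csv_text content and its column names are made up for illustration and are not the article's addresses.csv file.

```python
import io

import pandas as pd

# Hypothetical CSV content with a header row, used only for illustration.
csv_text = "id,first_name\na-1,John\na-2,Vikram\n"

# header=None: the first line is kept as data and columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# header=0 (the default): the first line becomes the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

print(no_header.shape)            # (3, 2)
print(list(with_header.columns))  # ['id', 'first_name']
```

If the files contain a header row and you read them with header=None, every file contributes its header line as an extra data row in the merged dataframe.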

Code

The following code demonstrates how to use the read_csv() method to read multiple CSV files with a generator expression and merge them into a single dataframe.

import pandas as pd

files = ["addresses.csv","addresses-2.csv"]

df_files = (pd.read_csv(f, header=None) for f in files)

df = pd.concat(df_files, ignore_index=True)

df

DataFrame Will Look Like

   0    1        2         3                       4           5          6
0  a-1  John     Doe       120 jefferson st.       Riverside   NJ         8075
1  a-2  Vikram   Aruchamy  Chepauk, second street  Coimbatore  TamilNadu  600100
2  a-3  Felix    John      120 jefferson st.       Riverside   NJ         8075
3  a-4  Heiko    A         150 Daimler street      Stuttgart   Germany    50012
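
To see why ignore_index=True matters, compare the index of the result with and without it. The two small frames below are hypothetical stand-ins for the dataframes read from two CSV files.

```python
import pandas as pd

# Two small hypothetical frames standing in for two CSV files.
first = pd.DataFrame({0: ["a-1", "a-2"]})
second = pd.DataFrame({0: ["a-3", "a-4"]})

# Without ignore_index, each frame keeps its own 0..n-1 labels.
kept = pd.concat([first, second])

# With ignore_index=True, the result gets a fresh 0..n-1 index.
renumbered = pd.concat([first, second], ignore_index=True)

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

With repeated labels, a lookup such as kept.loc[0] returns one row per source file, which is rarely what you want after merging.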

Using read_csv() and glob

In this section, you’ll use the read_csv() method and glob library to import multiple CSV files and merge them into a single pandas dataframe.

The glob module finds files whose names match a Unix shell-style wildcard pattern (*, ?, and [] character ranges). These patterns are not regular expressions.

To use the glob module and read multiple CSV files:

  • Find all the CSV files using the glob.glob() function.
  • Iterate over the list of files and read each CSV file with the appropriate header configuration.
  • Concatenate the dataframes using the pandas concat() function with ignore_index=True.

Use this method when you want to select multiple CSV files by a wildcard pattern instead of listing the exact file names.
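
The wildcard matching can be tried without touching your own data. The sketch below creates three empty files in a temporary directory; the file name orders.csv is an assumption added only to show a non-matching file.

```python
import glob
import os
import tempfile
from fnmatch import fnmatch

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; orders.csv is a hypothetical non-match.
    for name in ["addresses.csv", "addresses-2.csv", "orders.csv"]:
        open(os.path.join(tmp, name), "w").close()

    # glob.escape() protects any special characters in the directory path.
    pattern = os.path.join(glob.escape(tmp), "address*.csv")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)  # ['addresses-2.csv', 'addresses.csv']

# The same wildcard rules are available for plain strings through fnmatch.
print(fnmatch("addresses-2.csv", "addresses-?.csv"))      # True
print(fnmatch("addresses-2.csv", "addresses-[0-9].csv"))  # True
```

Here * matches any run of characters, ? matches exactly one character, and [0-9] matches one character from the range.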

Code

The following code demonstrates how to read the CSV files whose names start with address and merge them into a single dataframe.

import pandas as pd

import glob

files = sorted(glob.glob("address*.csv"))  # glob returns files in arbitrary order; sort for a stable row order

df_files = (pd.read_csv(f, header=None) for f in files)

df  = pd.concat(df_files, ignore_index=True)

df

DataFrame Will Look Like

Two CSV files are read and merged into a single dataframe.

   0    1        2         3                       4           5          6
0  a-1  John     Doe       120 jefferson st.       Riverside   NJ         8075
1  a-2  Vikram   Aruchamy  Chepauk, second street  Coimbatore  TamilNadu  600100
2  a-3  Felix    John      120 jefferson st.       Riverside   NJ         8075
3  a-4  Heiko    A         150 Daimler street      Stuttgart   Germany    50012

Reading Multiple CSVs from Multiple Folders

In this section, you’ll use the read_csv() method and glob library to read multiple CSV files from multiple folders.

The glob module matches file names against a wildcard pattern, and with the ** pattern it can also search subdirectories recursively.

To use the glob module and read multiple CSV files from multiple folders:

  • Find all the CSV files using the glob.glob() function with the pattern **/address*.csv, which covers the current directory and its subdirectories.
  • Pass recursive=True so that ** matches any number of nested subfolders. Without it, ** behaves like a single *.
  • Iterate over the list of files and read each CSV file with the appropriate header configuration.
  • Concatenate the dataframes using the pandas concat() function with ignore_index=True.

Use this method when you want to find multiple CSV files in various folders and read them into a single dataframe.
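
The difference that recursive=True makes can be seen on a throwaway directory tree. In this sketch the folder names archive and 2023 and the file addresses-3.csv are assumptions made up for the illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one file at the top, one per nested folder.
    os.makedirs(os.path.join(tmp, "archive", "2023"))
    relative_paths = [
        "addresses.csv",
        os.path.join("archive", "addresses-2.csv"),
        os.path.join("archive", "2023", "addresses-3.csv"),
    ]
    for rel in relative_paths:
        open(os.path.join(tmp, rel), "w").close()

    pattern = os.path.join(glob.escape(tmp), "**", "address*.csv")

    # recursive=True: "**" matches zero or more directory levels.
    recursive = glob.glob(pattern, recursive=True)

    # Default: "**" acts like "*", so exactly one directory level is matched.
    non_recursive = glob.glob(pattern)

print(len(recursive))      # 3
print(len(non_recursive))  # 1
```

Only the recursive call finds the top-level file and the file two folders deep; the default call finds just the file in archive.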

Code

The following code demonstrates how to read the CSV files whose names start with address, in the current directory and its subdirectories, and merge them into a single dataframe.

import pandas as pd

import glob

files = sorted(glob.glob('**/address*.csv', recursive=True))  # sort for a stable row order

df_files = (pd.read_csv(f, header=None) for f in files)

df  = pd.concat(df_files, ignore_index=True)

df

DataFrame Will Look Like

Three CSV files are read and merged into a single dataframe.

   0    1        2         3                       4           5          6
0  a-1  John     Doe       120 jefferson st.       Riverside   NJ         8075
1  a-2  Vikram   Aruchamy  Chepauk, second street  Coimbatore  TamilNadu  600100
2  a-3  Felix    John      120 jefferson st.       Riverside   NJ         8075
3  a-4  Heiko    A         150 Daimler street      Stuttgart   Germany    50012
4  a-5  Frank    H         120 jefferson st.       Riverside   Germany    50026
5  a-6  Michael  H         150 Daimler street      Stuttgart   Germany    50012
