Data is often spread across multiple CSV files, and you may need it in a single dataframe for analysis.
You can import multiple CSV files and concatenate them into a single pandas dataframe using a generator expression (or a list comprehension) and the pandas concat() function.
Using Read_CSV and List of Files
The read_csv() function reads a CSV file and creates a pandas dataframe from it.
To read multiple CSVs and merge them into a single dataframe,
- Iterate over the list of CSV files and read each one with read_csv().
- Pass header=None if the CSV files have no header row, so that the first line is read as data. If your CSV files have a header row, pass header=0 (the default) to use the first row as the column names.
- Concatenate the dataframes using the concat() function with ignore_index=True. Each dataframe carries its own index starting at 0, so without this option the row labels would repeat.
Use this method when you know the list of CSV file names and their location.
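The effect of the header argument can be checked on in-memory data before touching any files. The following is a minimal sketch; the sample text and the column names id and first_name are hypothetical and only stand in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file with a header row
csv_text = "id,first_name\na-1,John\na-2,Vikram\n"

# header=0 (the default): the first line supplies the column names
with_names = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: every line is treated as data and columns are numbered 0, 1, ...
all_data = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_names.columns), len(with_names))
print(list(all_data.columns), len(all_data))
```

With header=None the header line ends up as the first data row, which is why this option only suits files without a header row.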
Code
The following code demonstrates how to use the read_csv() function to read multiple CSV files with a generator expression and merge them into a single dataframe.
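import pandas as pd
# CSV files to combine (names and locations are known in advance)
files = ["addresses.csv", "addresses-2.csv"]
# Read each file lazily; header=None treats the first row as data
df_files = (pd.read_csv(f, header=None) for f in files)
# Stack the dataframes and renumber the rows from 0
df = pd.concat(df_files, ignore_index=True)
df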
DataFrame Will Look Like
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
0 | a-1 | John | Doe | 120 jefferson st. | Riverside | NJ | 8075 |
1 | a-2 | Vikram | Aruchamy | Chepauk, second street | Coimbatore | TamilNadu | 600100 |
2 | a-3 | Felix | John | 120 jefferson st. | Riverside | NJ | 8075 |
3 | a-4 | Heiko | A | 150 Daimler street | Stuttgart | Germany | 50012 |
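To see why the index is ignored while concatenating, compare the row labels with and without ignore_index=True. This is a small sketch using two hand-built dataframes (hypothetical stand-ins for two CSV files that were already read).

```python
import pandas as pd

# Two small dataframes, each with its own default index 0, 1
first = pd.DataFrame({0: ["a-1", "a-2"]})
second = pd.DataFrame({0: ["a-3", "a-4"]})

# Without ignore_index the original row labels are kept and repeat
kept = pd.concat([first, second])

# With ignore_index=True the rows are renumbered from 0
renumbered = pd.concat([first, second], ignore_index=True)

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

With repeated labels, a lookup such as kept.loc[0] returns two rows instead of one, which is usually not what you want after a merge.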
Using Read_CSV and Glob
In this section, you’ll use the read_csv() function and the glob module to import multiple CSV files and merge them into a single pandas dataframe.
The glob module finds files whose names match a Unix shell-style wildcard pattern, using *, ? and character ranges such as [0-9]. It does not support regular expressions.
To use the glob module and read multiple CSV files,
- Find all the CSV files using the glob.glob() function.
- Iterate over the list of files and read the CSV files with the appropriate header configuration.
- Concatenate the dataframes using the pandas concat() function and ignore the index while concatenating.
Use this method when you want to identify multiple CSV files by wildcard pattern matching instead of listing the exact file names.
Code
The following code demonstrates how to read the CSV files whose names start with address and merge them into a single dataframe.
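import glob
import pandas as pd
# Sort the matches because glob returns them in arbitrary order
files = sorted(glob.glob("address*.csv"))
# Read each file lazily; header=None treats the first row as data
df_files = (pd.read_csv(f, header=None) for f in files)
# Stack the dataframes and renumber the rows from 0
df = pd.concat(df_files, ignore_index=True)
df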
DataFrame Will Look Like
Two CSV files are read and merged into a single dataframe.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
0 | a-1 | John | Doe | 120 jefferson st. | Riverside | NJ | 8075 |
1 | a-2 | Vikram | Aruchamy | Chepauk, second street | Coimbatore | TamilNadu | 600100 |
2 | a-3 | Felix | John | 120 jefferson st. | Riverside | NJ | 8075 |
3 | a-4 | Heiko | A | 150 Daimler street | Stuttgart | Germany | 50012 |
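The pattern matching can be tried out safely in a temporary folder. The sketch below assumes three hypothetical files (addresses.csv, addresses-2.csv and contacts.csv) created only for the demonstration, and shows that a regex-style pattern is treated as literal text plus wildcards.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Create hypothetical sample files for the demonstration
    samples = {
        "addresses.csv": "a-1,John\n",
        "addresses-2.csv": "a-3,Felix\n",
        "contacts.csv": "c-1,Ann\n",
    }
    for name, content in samples.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(content)

    # Shell-style wildcard: * matches any run of characters
    matches = sorted(glob.glob(os.path.join(folder, "address*.csv")))
    names = [os.path.basename(path) for path in matches]

    # Regex syntax is not interpreted: "." is a literal dot here
    regex_style = glob.glob(os.path.join(folder, "address.*csv"))

    # Read and merge only the matched files
    df = pd.concat(
        (pd.read_csv(path, header=None) for path in matches),
        ignore_index=True,
    )

print(names)
print(regex_style)
print(df)
```

The contacts.csv file is left out because its name does not match the pattern, and the regex-style pattern matches nothing because no file name contains the literal text "address." at the start.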
Reading Multiple CSVs from Multiple Folders
In this section, you’ll use the read_csv() function and the glob module to read multiple CSV files from multiple folders.
The glob module finds files whose names match a Unix shell-style wildcard pattern (not a regular expression), and it can search the subdirectories recursively.
To use the glob module and read multiple CSV files from multiple folders,
- Find all the CSV files by passing the pattern **/address*.csv to the glob.glob() function, which matches files in the current directory and its subdirectories.
- Pass recursive=True so that ** matches any number of nested subfolders. Without it, ** behaves like a single *.
- Iterate over the list of files and read the CSV files with the appropriate header configuration.
- Concatenate the dataframes using the pandas concat() function and ignore the index while concatenating.
Use this method when you want to find multiple CSV files in various folders and read them into a single dataframe.
Code
The following code demonstrates how to read the CSV files whose names start with address in the current directory and its subdirectories and merge them into a single dataframe.
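import glob
import pandas as pd
# ** with recursive=True also searches every subdirectory; sort for a stable order
files = sorted(glob.glob("**/address*.csv", recursive=True))
# Read each file lazily; header=None treats the first row as data
df_files = (pd.read_csv(f, header=None) for f in files)
# Stack the dataframes and renumber the rows from 0
df = pd.concat(df_files, ignore_index=True)
df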
DataFrame Will Look Like
Three CSV files are read and merged into a single dataframe.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
0 | a-1 | John | Doe | 120 jefferson st. | Riverside | NJ | 8075 |
1 | a-2 | Vikram | Aruchamy | Chepauk, second street | Coimbatore | TamilNadu | 600100 |
2 | a-3 | Felix | John | 120 jefferson st. | Riverside | NJ | 8075 |
3 | a-4 | Heiko | A | 150 Daimler street | Stuttgart | Germany | 50012 |
4 | a-5 | Frank | H | 120 jefferson st. | Riverside | Germany | 50026 |
5 | a-6 | Michael | H | 150 Daimler street | Stuttgart | Germany | 50012 |
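The difference that recursive=True makes can be checked in a temporary folder tree. This is a sketch under assumed names: the folders sub and sub/deep and the three address files are hypothetical and exist only for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small hypothetical tree: root, root/sub, root/sub/deep
    os.makedirs(os.path.join(root, "sub", "deep"))
    sample_paths = [
        ("addresses.csv",),
        ("sub", "addresses-2.csv"),
        ("sub", "deep", "addresses-3.csv"),
    ]
    for parts in sample_paths:
        with open(os.path.join(root, *parts), "w") as handle:
            handle.write("a-1,John\n")

    pattern = os.path.join(root, "**", "address*.csv")

    # recursive=True: ** matches zero or more nested directories
    recursive_hits = sorted(
        os.path.relpath(path, root)
        for path in glob.glob(pattern, recursive=True)
    )

    # Default (recursive=False): ** acts like *, so exactly one folder level
    shallow_hits = sorted(
        os.path.relpath(path, root) for path in glob.glob(pattern)
    )

print(recursive_hits)
print(shallow_hits)
```

The recursive search finds all three files, while the default search only finds the file that sits exactly one folder below the starting directory.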