How to Read Multiple Files in One Python Function
In this short tutorial, I'll prove you how to use Python to combine multiple Excel files into one principal spreadsheet. Imagine that you take dozens of Excel files with the same data fields, and your chore is to aggregate sheets from those files. Manually doing this job is super inefficient, and Python will salve you a lot of fourth dimension in the long run, then let'south all work smarter!
Note that this article talks about appending Excel files with the same format/data fields. Merging multiple dataset is a different task.
If y'all are new to Python, this series Integrate Python with Excel offers some tips on how to use Python to supercharge your Excel spreadsheets.
The workflow
To solve the problem, we'll need to follow the below work menstruum:
- Place the files we demand to combine
- Get data from the file
- Motility information from footstep 2) to a primary dataset (nosotros will call it "dataframe")
- Report 2-three for the number of files
- Save the chief dataset into an Excel spreadsheet
Import libraries
Alright, allow'south see how to code the above work menstruum in Python. For this exercise, we'll need to use two Python libraries: os
and pandas
. If y'all want to follow forth, feel free to catch the source code and files used in this tutorial from hither. Although you can combine as many Excel files as you lot wish, nosotros'll use three files to demonstrate the procedure.
If yous need help with installing Python or libraries, here'south a guide on how to exercise that.
os
library gives a way of using operating system dependent functionalities. Such every bit manipulating folder and file paths. We utilise this library to get all the Excel file names, including their paths.
pandas
library is the gold standard for data analysis and manipulation. It is fast, powerful, and flexible. We use this library to load Excel information into Python, manipulate data, and recreate the master spreadsheet.
We'll starting time by importing these ii libraries. And so find the current working directory, as well every bit all the file names inside the directory.
import os import pandas as pd cwd = bone.path.abspath('') files = os.listdir(cwd)
The variable cwd
shows the path to the current working directory, and the variable files
is a list of all the file names inside the current working directory. Notice there are non-Excel files, and we don't want to open those, so we'll handle that shortly.
Combine multiple Excel files into one spreadsheet
Next, we create an empty dataframe df
for storing the data for principal spreadsheet. Nosotros loop through all the files within the current working directory, but only process the Excel files whose name ends with ".xlsx". This is washed past this line of code
if file.endswith('.xlsx'):
pd.read_excel()
will read Excel data into Python and store it equally a pandas DataFrame object. Be aware that this method reads merely the outset tab/sheet of the Excel file past default. If your Excel file contains more than 1 canvass, continue reading to the next section.
df.append()
will suspend/combine information from one file to another. Recall virtually copying a cake of data from one Excel file and pasting it into another. Instead of opening up Excel, information is stored inside your computer's retention.
df = pd.DataFrame() for file in files: if file.endswith('.xlsx'): df = df.append(pd.read_excel(file), ignore_index=True) df.head()
The above code does the following:
- Loop through all the files in the electric current working directory, determine if a file is Excel by checking the file name ends with ".xlsx".
- If yes, read the file content (data), and append/add it to the master dataframe variable chosen
df
. - Save the main dataframe into an Excel spreadsheet.
We can examine the primary dataframe by checking df.head() , which shows the first five rows of the data.
Seems good! Just another quick check to brand sure we accept loaded everything in the DataFrame. df.shape
volition prove us the dimension (36 rows, v columns) of the data:
Everything looks good, so let's output the data dorsum into Excel. The last line df.to_excel()
will do that.
Combine multiple sheets from the same Excel file
I talked virtually the ii techniques to read multiple sheets from the same Excel file, then I won't repeat it. However, I'll walk through an example here with a slightly different setting.
We take 2 files each contains a number of sheets. We don't know how many sheets are in each file, simply nosotros know the format is the aforementioned for all sheets. Our goal is to aggregate all sheets into one spreadsheet (and one file).
The workflow is similar:
- Get all Excel files
- Loop through the Excel files
- For each file, loop through all sheets
- Read each canvass into a dataframe, then combine all dataframes together.
df_total = pd.DataFrame() for file in files: # loop through Excel files if file.endswith('.xlsx'): excel_file = pd.ExcelFile(file) sheets = excel_file.sheet_names for sheet in sheets: # loop through sheets inside an Excel file df = excel_file.parse(sheet_name = sheet) df_total = df_total.append(df) df_total.to_excel('combined_file.xlsx')
Putting it all together
Below is the total lawmaking put together. 10 lines of code will assistance you combine all your Excel files or sheets into one chief spreadsheet. Enjoy!
import bone import pandas as pd cwd = os.path.abspath('') files = os.listdir(cwd)
## Method one gets the first sheet of a given filedf = pd.DataFrame() for file in files: if file.endswith('.xlsx'): df = df.append(pd.read_excel(file), ignore_index=Truthful) df.head() df.to_excel('total_sales.xlsx')
## Method 2 gets all sheets of a given filedf_total = pd.DataFrame() for file in files: # loop through Excel files if file.endswith('.xlsx'): excel_file = pd.ExcelFile(file) sheets = excel_file.sheet_names for sheet in sheets: # loop through sheets inside an Excel file df = excel_file.parse(sheet_name = canvas) df_total = df_total.append(df) df_total.to_excel('combined_file.xlsx')
Source: https://pythoninoffice.com/use-python-to-combine-multiple-excel-files/
0 Response to "How to Read Multiple Files in One Python Function"
Post a Comment