How to Read Multiple Files in One Python Function

In this short tutorial, I'll prove you how to use Python to combine multiple Excel files into one principal spreadsheet. Imagine that you take dozens of Excel files with the same data fields, and your chore is to aggregate sheets from those files. Manually doing this job is super inefficient, and Python will salve you a lot of fourth dimension in the long run, then let'south all work smarter!

Note that this article talks about appending Excel files with the same format/data fields. Merging multiple dataset is a different task.

If y'all are new to Python, this series Integrate Python with Excel offers some tips on how to use Python to supercharge your Excel spreadsheets.

The workflow

To solve the problem, we'll need to follow the below work menstruum:

  1. Place the files we demand to combine
  2. Get data from the file
  3. Motility information from footstep 2) to a primary dataset (nosotros will call it "dataframe")
  4. Report 2-three for the number of files
  5. Save the chief dataset into an Excel spreadsheet

Import libraries

Alright, allow'south see how to code the above work menstruum in Python. For this exercise, we'll need to use two Python libraries: os and pandas . If y'all want to follow forth, feel free to catch the source code and files used in this tutorial from hither. Although you can combine as many Excel files as you lot wish, nosotros'll use three files to demonstrate the procedure.

If yous need help with installing Python or libraries, here'south a guide on how to exercise that.

os library gives a way of using operating system dependent functionalities. Such every bit manipulating folder and file paths. We utilise this library to get all the Excel file names, including their paths.

pandas library is the gold standard for data analysis and manipulation. It is fast, powerful, and flexible. We use this library to load Excel information into Python, manipulate data, and recreate the master spreadsheet.

We'll starting time by importing these ii libraries. And so find the current working directory, as well every bit all the file names inside the directory.

            import os import pandas as pd cwd = bone.path.abspath('')  files = os.listdir(cwd)                      
python code showing current working directory
Getting current working directory and files inside it

The variable cwd shows the path to the current working directory, and the variable files is a list of all the file names inside the current working directory. Notice there are non-Excel files, and we don't want to open those, so we'll handle that shortly.

Combine multiple Excel files into one spreadsheet

Next, we create an empty dataframe df for storing the data for principal spreadsheet. Nosotros loop through all the files within the current working directory, but only process the Excel files whose name ends with ".xlsx". This is washed past this line of code
if file.endswith('.xlsx'):

pd.read_excel() will read Excel data into Python and store it equally a pandas DataFrame object. Be aware that this method reads merely the outset tab/sheet of the Excel file past default. If your Excel file contains more than 1 canvass, continue reading to the next section.


df.append() will suspend/combine information from one file to another. Recall virtually copying a cake of data from one Excel file and pasting it into another. Instead of opening up Excel, information is stored inside your computer's retention.

            df = pd.DataFrame() for file in files:      if file.endswith('.xlsx'):          df = df.append(pd.read_excel(file), ignore_index=True)  df.head()          

The above code does the following:

  1. Loop through all the files in the electric current working directory, determine if a file is Excel by checking the file name ends with ".xlsx".
  2. If yes, read the file content (data), and append/add it to the master dataframe variable chosen df.
  3. Save the main dataframe into an Excel spreadsheet.

We can examine the primary dataframe by checking df.head() , which shows the first five rows of the data.

python code showing the first 5 rows of data
Checking the first five rows of data in the dataframe

Seems good! Just another quick check to brand sure we accept loaded everything in the DataFrame. df.shape volition prove us the dimension (36 rows, v columns) of the data:

python code showing the size of the master dataframe

Everything looks good, so let's output the data dorsum into Excel. The last line df.to_excel() will do that.

Combine multiple sheets from the same Excel file

I talked virtually the ii techniques to read multiple sheets from the same Excel file, then I won't repeat it. However, I'll walk through an example here with a slightly different setting.

We take 2 files each contains a number of sheets. We don't know how many sheets are in each file, simply nosotros know the format is the aforementioned for all sheets. Our goal is to aggregate all sheets into one spreadsheet (and one file).

The workflow is similar:

  1. Get all Excel files
  2. Loop through the Excel files
  3. For each file, loop through all sheets
  4. Read each canvass into a dataframe, then combine all dataframes together.
            df_total = pd.DataFrame() for file in files:  # loop through Excel files     if file.endswith('.xlsx'):         excel_file = pd.ExcelFile(file)         sheets = excel_file.sheet_names         for sheet in sheets: # loop through sheets inside an Excel file             df = excel_file.parse(sheet_name = sheet)             df_total = df_total.append(df) df_total.to_excel('combined_file.xlsx')                      

Putting it all together

Below is the total lawmaking put together. 10 lines of code will assistance you combine all your Excel files or sheets into one chief spreadsheet. Enjoy!

            import bone import pandas as pd cwd = os.path.abspath('')  files = os.listdir(cwd)                        ## Method one gets the first sheet of a given file              df = pd.DataFrame() for file in files:     if file.endswith('.xlsx'):         df = df.append(pd.read_excel(file), ignore_index=Truthful)  df.head()  df.to_excel('total_sales.xlsx')            ## Method 2 gets all sheets of a given file            df_total = pd.DataFrame() for file in files:                         # loop through Excel files     if file.endswith('.xlsx'):         excel_file = pd.ExcelFile(file)         sheets = excel_file.sheet_names         for sheet in sheets:               # loop through sheets inside an Excel file             df = excel_file.parse(sheet_name = canvas)             df_total = df_total.append(df) df_total.to_excel('combined_file.xlsx')          

cervantesthaught.blogspot.com

Source: https://pythoninoffice.com/use-python-to-combine-multiple-excel-files/

0 Response to "How to Read Multiple Files in One Python Function"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel