Skip to main content

Command Palette

Search for a command to run...

Batch Excel File Processor Using Python

Published
3 min read
Batch Excel File Processor Using Python

Introduction

Manually opening, reading, and analyzing dozens of Excel files one by one is time-consuming and prone to error. With Python, we can automate this task by building a batch processor that can:

  • Read all Excel files in a folder

  • Perform operations like filtering, aggregation, or transformation

  • Combine and save results to a new Excel file

This tool is ideal for:

  • Sales teams consolidating monthly reports

  • Researchers analyzing repeated survey files

  • Finance teams tracking expenses across departments

What You’ll Need

Install the required packages:

pip install pandas openpyxl
  • pandas handles data manipulation

  • openpyxl enables reading/writing .xlsx files

What We'll Build

The script will:

  1. Loop through all .xlsx files in a folder

  2. Read each file into a Pandas DataFrame

  3. Apply filtering or data transformations (optional)

  4. Combine results from all files

  5. Save the final output to a new Excel file

Step-by-Step Script with Explanations

Step 1: Import Required Libraries

import os
import pandas as pd

Explanation:
We import os to navigate through folders and pandas to handle Excel files.

Step 2: Set Folder Path and Get Excel Files

folder_path = r"C:\Users\YourName\Documents\ExcelData"  # Update this
excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]

Explanation:
This gets a list of all .xlsx files in the specified folder. You can change the path as needed.

Step 3: Read and Process Each File

combined_data = []

for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path)

    # Optional: filter rows where "Sales" > 1000
    if "Sales" in df.columns:
        df = df[df["Sales"] > 1000]

    df["Source File"] = file  # Add filename for tracking
    combined_data.append(df)

Explanation:

  • Each Excel file is read into a DataFrame

  • Filters can be added based on your needs

  • A column for "Source File" is added to track which data came from which file

Step 4: Combine All DataFrames

final_df = pd.concat(combined_data, ignore_index=True)

Explanation:
Combines all filtered data into one master DataFrame.

Step 5: Save the Final Output

output_path = os.path.join(folder_path, "combined_output.xlsx")
final_df.to_excel(output_path, index=False)
print("Batch processing complete! Saved to:", output_path)

Full Script: Batch Excel Processor

import os
import pandas as pd

# Step 1: Set folder path
folder_path = r"C:\Users\YourName\Documents\ExcelData"  # Change this

# Step 2: Get all .xlsx files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]

# Step 3: Read and process each file
combined_data = []

for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path)

    # Optional: Filter rows where "Sales" > 1000
    if "Sales" in df.columns:
        df = df[df["Sales"] > 1000]

    df["Source File"] = file  # Track origin file
    combined_data.append(df)

# Step 4: Combine all data
if combined_data:
    final_df = pd.concat(combined_data, ignore_index=True)

    # Step 5: Save the combined result
    output_path = os.path.join(folder_path, "combined_output.xlsx")
    final_df.to_excel(output_path, index=False)
    print("Batch processing complete! File saved to:", output_path)
else:
    print("No Excel files found or files are empty.")

Output Example

The final Excel file (combined_output.xlsx) might look like this:

DateProductSalesRegionSource File
2024-01-03WidgetA1500Eastsales_jan.xlsx
2024-02-10WidgetB1200Westsales_feb.xlsx

Customizations

You can easily extend this script to:

  • Process CSV files: Use pd.read_csv()

  • Apply multiple filters:

      df = df[(df["Sales"] > 1000) & (df["Region"] == "West")]
    
  • Create summary reports:

      summary = final_df.groupby("Region")["Sales"].sum()
    

Final Thoughts

Batch Excel processing is a powerful automation technique that saves hours of repetitive work. With a bit of Python and pandas, you can:

  • Scale data processing across multiple files

  • Add filters, summaries, and exports

  • Customize workflows for business, finance, or research

Happy Scripting ! !

More from this blog

PyScript Academy

29 posts

PyScript Academy is a blog sharing practical Python scripts, tips, and mini projects—helping you learn Python by doing, one useful script at a time.