Batch Excel File Processor Using Python

Introduction
Manually opening, reading, and analyzing dozens of Excel files one by one is time-consuming and prone to error. With Python, we can automate this task by building a batch processor that can:
Read all Excel files in a folder
Perform operations like filtering, aggregation, or transformation
Combine and save results to a new Excel file
This tool is ideal for:
Sales teams consolidating monthly reports
Researchers analyzing repeated survey files
Finance teams tracking expenses across departments
What You’ll Need
Install the required packages:
pip install pandas openpyxl
pandashandles data manipulationopenpyxlenables reading/writing.xlsxfiles
What We'll Build
The script will:
Loop through all
.xlsxfiles in a folderRead each file into a Pandas DataFrame
Apply filtering or data transformations (optional)
Combine results from all files
Save the final output to a new Excel file
Step-by-Step Script with Explanations
Step 1: Import Required Libraries
import os
import pandas as pd
Explanation:
We import os to navigate through folders and pandas to handle Excel files.
Step 2: Set Folder Path and Get Excel Files
folder_path = r"C:\Users\YourName\Documents\ExcelData" # Update this
excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]
Explanation:
This gets a list of all .xlsx files in the specified folder. You can change the path as needed.
Step 3: Read and Process Each File
combined_data = []
for file in excel_files:
file_path = os.path.join(folder_path, file)
df = pd.read_excel(file_path)
# Optional: filter rows where "Sales" > 1000
if "Sales" in df.columns:
df = df[df["Sales"] > 1000]
df["Source File"] = file # Add filename for tracking
combined_data.append(df)
Explanation:
Each Excel file is read into a DataFrame
Filters can be added based on your needs
A column for "Source File" is added to track which data came from which file
Step 4: Combine All DataFrames
final_df = pd.concat(combined_data, ignore_index=True)
Explanation:
Combines all filtered data into one master DataFrame.
Step 5: Save the Final Output
output_path = os.path.join(folder_path, "combined_output.xlsx")
final_df.to_excel(output_path, index=False)
print("Batch processing complete! Saved to:", output_path)
Full Script: Batch Excel Processor
import os
import pandas as pd
# Step 1: Set folder path
folder_path = r"C:\Users\YourName\Documents\ExcelData" # Change this
# Step 2: Get all .xlsx files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]
# Step 3: Read and process each file
combined_data = []
for file in excel_files:
file_path = os.path.join(folder_path, file)
df = pd.read_excel(file_path)
# Optional: Filter rows where "Sales" > 1000
if "Sales" in df.columns:
df = df[df["Sales"] > 1000]
df["Source File"] = file # Track origin file
combined_data.append(df)
# Step 4: Combine all data
if combined_data:
final_df = pd.concat(combined_data, ignore_index=True)
# Step 5: Save the combined result
output_path = os.path.join(folder_path, "combined_output.xlsx")
final_df.to_excel(output_path, index=False)
print("Batch processing complete! File saved to:", output_path)
else:
print("No Excel files found or files are empty.")
Output Example
The final Excel file (combined_output.xlsx) might look like this:
| Date | Product | Sales | Region | Source File |
| 2024-01-03 | WidgetA | 1500 | East | sales_jan.xlsx |
| 2024-02-10 | WidgetB | 1200 | West | sales_feb.xlsx |
Customizations
You can easily extend this script to:
Process CSV files: Use
pd.read_csv()Apply multiple filters:
df = df[(df["Sales"] > 1000) & (df["Region"] == "West")]Create summary reports:
summary = final_df.groupby("Region")["Sales"].sum()
Final Thoughts
Batch Excel processing is a powerful automation technique that saves hours of repetitive work. With a bit of Python and pandas, you can:
Scale data processing across multiple files
Add filters, summaries, and exports
Customize workflows for business, finance, or research
Happy Scripting ! !




