Excel Data Processor in Python

Introduction

Excel is the go-to tool for data storage and reporting — but manual editing, filtering, and summarizing can be tedious and error-prone. What if you could process Excel data with just a few lines of Python?

This article walks you through building a Python Excel Data Processor that can:

Read Excel files
Filter rows based on conditions
Perform calculations (like sum, average)
Create new columns
Export cleaned data to a new Excel file

All using Pandas — a powerful and beginner-friendly Python library for data analysis.

What You’ll Need

Before running the script, install these two libraries if you haven’t already:

pip install pandas openpyxl

pandas: To load and manipulate Excel data
openpyxl: To read/write .xlsx files (Excel 2010+ format)

Step-by-Step Script with Explanations

Step 1: Import Required Modules

import pandas as pd

Explanation:
We only need pandas. It handles loading, transforming, and saving Excel files.

Step 2: Load the Excel File

file_path = "sales_data.xlsx"  # Replace with your file name
df = pd.read_excel(file_path)

Explanation:

pd.read_excel() reads the spreadsheet into a DataFrame (a table-like structure).
This assumes the data is in the first sheet.

Step 3: View the First Few Rows (Optional)

print(df.head())

Explanation:
This lets you preview the dataset structure: column names, sample values, etc.

Step 4: Filter Rows Based on Condition

Let’s filter data to only include sales above $1000:

filtered_df = df[df["Sales"] > 1000]

Explanation:
This line keeps only the rows where the value in the Sales column is greater than 1000.

Step 5: Create a New Column

Let’s add a Tax column at 10% rate:

filtered_df["Tax"] = filtered_df["Sales"] * 0.10

Explanation:
Creates a new column where each value is 10% of the corresponding Sales value.

Step 6: Perform Aggregation (Total, Average)

total_sales = filtered_df["Sales"].sum()
average_sales = filtered_df["Sales"].mean()

print("Total Sales:", total_sales)
print("Average Sales:", average_sales)

Explanation:
You can compute statistics like total and average on any numeric column.

Step 7: Save the Processed Data to a New File

output_file = "filtered_sales_data.xlsx"
filtered_df.to_excel(output_file, index=False)

Explanation:
Exports the cleaned/filtered DataFrame to a new Excel file.
index=False avoids writing row numbers as a separate column.

Full Script: Excel Data Processor

import pandas as pd

# 📂 Step 1: Load the Excel file
file_path = "sales_data.xlsx"  # Change this to your Excel file
df = pd.read_excel(file_path)

# 👀 Step 2: Preview data
print("Original Data:")
print(df.head())

# 🎯 Step 3: Filter rows (e.g., Sales > 1000)
filtered_df = df[df["Sales"] > 1000]

# ➕ Step 4: Add a new column for Tax (10%)
filtered_df["Tax"] = filtered_df["Sales"] * 0.10

# 📈 Step 5: Calculate total and average sales
total_sales = filtered_df["Sales"].sum()
average_sales = filtered_df["Sales"].mean()

print("\nFiltered Data:")
print(filtered_df.head())

print("\nTotal Sales:", total_sales)
print("Average Sales:", average_sales)

# 💾 Step 6: Export to a new Excel file
output_file = "filtered_sales_data.xlsx"
filtered_df.to_excel(output_file, index=False)
print(f"\nProcessed data saved to: {output_file}")

What You Can Customize

Filter by multiple conditions:

  df[(df["Sales"] > 1000) & (df["Region"] == "West")]

Use Excel column letters/names
Ensure headers match the actual Excel file structure.

Work with multiple sheets:

  pd.read_excel(file_path, sheet_name="Sheet2")

Apply formatting or charts: Use openpyxl or xlsxwriter for advanced Excel features.

Use Cases

Sales reporting automation
HR or payroll data analysis
Cleaning survey results or user submissions
Academic performance summaries

Safety Tips

Always keep a backup of your original Excel file
Check column names and data types before processing
Test filters on small datasets before using in production

Final Thoughts

With just a few lines of Python and pandas, you can automate and scale any Excel-based task. Whether you're managing monthly reports or large-scale datasets, Python gives you the flexibility and speed Excel can't.

Happy Scripting ! !

Excel Data Processor in Python — Automate, Analyze, and Export Like a Pro

Introduction

What You’ll Need

Step-by-Step Script with Explanations

Step 1: Import Required Modules

Step 2: Load the Excel File

Step 3: View the First Few Rows (Optional)

Step 4: Filter Rows Based on Condition

Step 5: Create a New Column

Step 6: Perform Aggregation (Total, Average)

Step 7: Save the Processed Data to a New File

Full Script: Excel Data Processor

What You Can Customize

Use Cases

Safety Tips

Final Thoughts

Comments

More from this blog

Build a Network Sniffer in Python Using Scapy

Building a Port Scanner with Python

Build a Simple URL Shortener in Python

QR Code Generator with Python

Create a Secure Password Generator Using Python

Command Palette

Introduction

What You’ll Need

Step-by-Step Script with Explanations

Step 1: Import Required Modules

Step 2: Load the Excel File

Step 3: View the First Few Rows (Optional)

Step 4: Filter Rows Based on Condition

Step 5: Create a New Column

Step 6: Perform Aggregation (Total, Average)

Step 7: Save the Processed Data to a New File

Full Script: Excel Data Processor

What You Can Customize

Use Cases

Safety Tips

Final Thoughts

Comments

More from this blog