If you’ve recently jumped into Python and started using it to transform Excel files, read different types of data sources like Parquet, CSV, cloud services, or SQL databases, and your goal is just to apply a few transformations and export a report in Excel to share, then this might be the post you need.
There’s a Python tool called DuckDB that can really help you.
This little gem allows you to write SQL queries directly against your data sources. I’ve found it to be great, easy to understand, and very useful when you come from a SQL-heavy background. Personally, I’ve worked in SQL for many years, creating reports for stakeholders, and I still find it easier to think in terms of queries when I need to pull and shape data.
Even though DuckDB doesn’t work exactly like Oracle SQL or SQL Server Management Studio (SSMS), it’s very close. You can always prepare your query and review it with an AI assistant if needed.
How Does It Work?
You only need 3 steps to create your report:
- Load your source datasets into a folder
- Write your SQL query using DuckDB
- Export the result to Excel
Here’s an example using Parquet files as the source:
import duckdb
import pandas as pd
# Step 1: Load your parquet files
duckdb.sql("CREATE TABLE sales AS SELECT * FROM 'data/sales.parquet'")
duckdb.sql("CREATE TABLE customers AS SELECT * FROM 'data/customers.parquet'")
# OR From a Excel File
# Load Excel file into a DataFrame
df_targets = pd.read_excel('data/targets.xlsx', sheet_name='Sheet1')
# OR From a Python DF
# Register the DataFrame as a DuckDB table
duckdb.register('targets', df_targets)
# Step 2: Write your SQL query
query = """
SELECT
c.customer_name,
s.product,
s.quantity,
s.sale_date
FROM sales s
LEFT JOIN customers c ON s.customer_id = c.customer_id
WHERE s.sale_date >= DATE '2024-01-01'
"""
result_df = duckdb.sql(query).df()
# Step 3: Export to Excel
result_df.to_excel("sales_report.xlsx", index=False)
You can also use CSV or DataFrames instead of Parquet, DuckDB supports many input types.
DuckDB vs SQL Server: A Quick Comparison
If you’re coming from SQL Server, here are some of the differences you’ll notice:
| Transformation / Function | SQL Server (SSMS) | DuckDB |
| Convert string to date | CAST(date_string AS DATE) | CAST(date_string AS DATE) |
| Convert varchar to integer | CAST(col AS INT) | CAST(col AS INTEGER) |
| Format date to YYYY-MM-DD | FORMAT(col, ‘yyyy-MM-dd’) | STRFTIME(col, ‘%Y-%m-%d’) |
| Get current date | GETDATE() | CURRENT_DATE |
| String contains | col LIKE ‘%abc%’ | col ILIKE ‘%abc%’ |
| Left Join | LEFT JOIN … ON … | LEFT JOIN … ON … |
Most of it is very similar, but DuckDB follows more of the PostgreSQL standards, so functions like string handling and formatting might vary slightly.
Final Thoughts
DuckDB for Python is a great alternative for those of us who have spent years working in SQL, creating reports and answering business questions with data. It allows you to continue thinking in queries while using the flexibility of Python to automate and export your reports.
If SQL comes naturally to you, DuckDB will feel like home. But if you’re just starting out and Python is your comfort zone, don’t stress about learning SQL right away. Keep using data frames and build your skills from there. Both paths lead to the same goal, using data to create insights and impact.
I hope you enjoyed this post. If you want to learn more, feel free to join one of our free webinars at www.excelinbi.com/webinars