CodeGym /Courses /Python SELF EN /Preparing Data for Reports with pandas Methods

Preparing Data for Reports with pandas Methods

Python SELF EN
Level 28 , Lesson 3
Available

1. Structuring Data for Reports

When building a report, it’s not just about throwing a bunch of data into a pretty package, but making the information clear and accessible. How you prep the data directly impacts how easily and quickly your report will be understood.

Pivot Tables: Your Reporting Swiss Army Knife

Pivot tables are a powerful way to summarize, aggregate, and present data, often used for analytics in Excel. In pandas, we can easily create pivot tables using the pivot_table method. Let’s check out a simple example.

Imagine you have some sales data from a store:

Python

import pandas as pd

data = {
    'Product': ['Orange', 'Apple', 'Banana', 'Orange', 'Banana', 'Apple'],
    'Month': ['January', 'January', 'January', 'February', 'February', 'February'],
    'Sales': [150, 200, 250, 130, 180, 210]
}

df = pd.DataFrame(data)

To create a pivot table showing the total sales for each product by month, use pivot_table.

Python

pivot_df = df.pivot_table(values='Sales', index='Product', columns='Month', aggfunc='sum')
print(pivot_df)

This simple example shows how you can quickly aggregate data by categories and dates — a perfect base for reports.

2. Using pandas Methods for Data Preparation

Method pivot_table

The pivot_table method is super flexible and can not only sum data, but also apply other aggregation functions like mean, count, etc. Let’s see how to use it to calculate the average sales:

Python

pivot_mean_df = df.pivot_table(values='Sales', index='Product', columns='Month', aggfunc='mean')
print(pivot_mean_df)

Method crosstab

The crosstab method in pandas works like pivot tables but is used for frequency calculation. Imagine we want to count how many times each product was sold in different months:

Python

cross_df = pd.crosstab(df['Product'], df['Month'])
print(cross_df)

This method is handy when you need to explore data distribution across categories or analyze event frequency.

3. Customizing Data Display with Multi-Indexing

Multi-Indices

Yep, pandas lets you go beyond two-dimensional data and use multi-indices. This comes in handy for representing data with multiple aggregation levels. For example, let’s add "Year" to our dataset and create a multi-index.

Python

data['Year'] = [2023, 2023, 2023, 2024, 2024, 2024]
df = pd.DataFrame(data)

multi_pivot_df = df.pivot_table(values='Sales', index=['Year', 'Product'], columns='Month', aggfunc='sum')
print(multi_pivot_df)

Now our data is aggregated by year and product, making it more insightful.

4. Practice and Analysis

Let’s apply what we’ve learned in practice. Task: prepare data for a sales report.

Imagine you have the following sales data:

Python

data = {
    'Product': ['Orange', 'Orange', 'Apple', 'Apple', 'Banana', 'Banana'],
    'Month': ['January', 'February', 'January', 'February', 'January', 'February'],
    'Year': [2023, 2023, 2024, 2024, 2023, 2024],
    'Sales': [150, 200, 180, 220, 120, 130]
}

df = pd.DataFrame(data)

Your task is to create a pivot table showing the total sales for each product by month, including year totals.

Python

pivot_total_df = df.pivot_table(values='Sales', index=['Year', 'Product'], columns='Month', aggfunc='sum', margins=True, margins_name='Total')
print(pivot_total_df)

This "Total" row will show the sum across all categories, which is often needed for a complete overview.

5. Discussing Strategies for Data Representation and Analysis

Preparing data for reports with pandas isn’t just math; it’s an art. A big part of this process is choosing the most informative way to display the data, whether it's simple sum tables or complex charts. In real life, structuring data like this not only helps you understand current metrics but also makes data-driven decisions possible. You’ll likely find yourself feeling like a wizard turning boring rows of numbers into a clear picture.

If you want to master automated report preparation, don’t forget about the capabilities of pandas and don’t be shy about experimenting with different methods and techniques. This skill will be useful not only in building analytical summaries but also during interviews and in your future career. Dive into the world of data and make your reports colorful, informative, and easy to understand!

1
Task
Python SELF EN, level 28, lesson 3
Locked
Creating a Pivot Table with Basic Data
Creating a Pivot Table with Basic Data
2
Task
Python SELF EN, level 28, lesson 3
Locked
Data Analysis on Sales Using Aggregation
Data Analysis on Sales Using Aggregation
3
Task
Python SELF EN, level 28, lesson 3
Locked
Frequency analysis of sales by products and months using crosstab
Frequency analysis of sales by products and months using crosstab
4
Task
Python SELF EN, level 28, lesson 3
Locked
Creating a multidimensional pivot table with totals
Creating a multidimensional pivot table with totals
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION