Filtering and Sorting Data for Analysis and Reports

1. Basics of Filtering Data in DataFrame

Why do we need filtering?

Just like filters in your favorite selfie app, filtering in pandas helps us focus on what really matters. We can exclude unnecessary data and leave only what’s relevant. It’s an essential tool for data analysis, especially when we need to prepare reports or analyze large datasets.

Our first filter

Let’s start with a simple example. We have a DataFrame with product sales data:

Python

import pandas as pd

data = {
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5]
}

df = pd.DataFrame(data)

# Filtering products by name
filtered_df = df[df["Product"] == "Oranges"]
print(filtered_df)
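
To see what happens inside the square brackets, it can help to build the condition as a separate step. This is a minimal sketch on the same sample data; the variable name `mask` is only an illustrative choice, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5],
})

# The comparison returns a boolean Series: True where the row matches
mask = df["Product"] == "Oranges"
print(mask.tolist())  # [True, False, False, True, False]

# Passing the mask in square brackets keeps only the True rows;
# the original index labels (0 and 3) are preserved
filtered_df = df[mask]
print(filtered_df)
```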

Using conditions

We can combine conditions, just like in a multi-filter! For example, let’s find all rows where more than 6 units were sold and the price is no higher than 3.5:

Python

# Combining conditions using & (logical "and")
filtered_df = df[(df["Quantity"] > 6) & (df["Price"] <= 3.5)]
print(filtered_df)

Remember important points

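Don’t forget that in pandas the filter itself goes in square brackets, and each condition goes in its own pair of round brackets. The parentheses matter because & binds more tightly than comparison operators such as > and <=, so without them Python groups the expression differently from what you meant. Don’t be "that person" who forgets these brackets, okay?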
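
Besides & for "and", pandas conditions can be combined with | for "or" and inverted with ~ for "not". The sketch below reuses the sample data; the thresholds and the names `either_df` and `not_oranges_df` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5],
})

# | means "or": keep rows where at least one condition holds
either_df = df[(df["Quantity"] > 8) | (df["Price"] < 2.5)]
print(either_df)

# ~ means "not": keep every row that is NOT oranges
not_oranges_df = df[~(df["Product"] == "Oranges")]
print(not_oranges_df)
```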

2. Sorting Data

Why is sorting important?

Sorting data lets us see the structure and order in chaos (if datasets even have chaos, that is). Imagine being in a library where books are randomly scattered — yikes! Sorting helps organize the information, whether it’s for analysis, reports, or just finding the right value.

Sorting by one column

Let’s sort our DataFrame by product price in ascending order:

Python

# Sorting by price in ascending order
sorted_df = df.sort_values(by="Price")
print(sorted_df)
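
Two details are easy to miss here: sort_values returns a new DataFrame and leaves the original untouched, and the sorted rows keep their old index labels. A small sketch on the same data, with `desc_df` and `renumbered_df` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5],
})

# Descending order: most expensive first
desc_df = df.sort_values(by="Price", ascending=False)
print(desc_df.index.tolist())  # [4, 2, 0, 3, 1] - old labels travel with the rows

# df itself is unchanged; sort_values returned a new object
print(df.index.tolist())  # [0, 1, 2, 3, 4]

# Optional: renumber the rows 0..n-1 after sorting
renumbered_df = desc_df.reset_index(drop=True)
print(renumbered_df)
```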

Sorting by multiple columns

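Imagine you need to sort your data first by quantity, and if the quantities are the same, by price in descending order. The second column only matters when two rows tie on the first one (our sample data happens to have no repeated quantities, so here the result is ordered by quantity alone). Here's how it's done: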

Python

# Sorting by quantity and price
sorted_df = df.sort_values(by=["Quantity", "Price"], ascending=[True, False])
print(sorted_df)

Sorting nuances

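Pay attention to the ascending parameter. It defaults to True (ascending order); pass False for descending order, or a list with one True/False value per column in by to set the direction for each column separately. Handy when you want to "command" your data in style.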
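
The effect of a per-column ascending list is easiest to see when the first column contains ties. The lesson's sample data has none, so this sketch uses a small hypothetical DataFrame (`tie_df`) invented for the purpose:

```python
import pandas as pd

# Hypothetical data with repeated quantities, so the second key matters
tie_df = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Quantity": [5, 5, 3, 3],
    "Price": [1.0, 2.0, 4.0, 3.0],
})

# Quantity ascending; within equal quantities, Price descending
sorted_ties = tie_df.sort_values(
    by=["Quantity", "Price"],
    ascending=[True, False],
)
print(sorted_ties["Product"].tolist())  # ['C', 'D', 'B', 'A']
```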

3. Practical Application

Hands-on Assignment

Now it’s time to put our knowledge to use! Let’s say you’re a data analyst at a gadget company. Your task is to find all phone models that sold more than 100 units and sort them by descending profit.

Python

# Sales data for phones
data = {
    "Model": ["Phone X", "Phone Y", "Phone Z", "Phone X", "Phone Y"],
    "Sold": [120, 150, 90, 130, 80],
    "Profit": [200, 180, 220, 210, 190]
}

df = pd.DataFrame(data)

# Filtering models with sales over 100
filtered_df = df[df["Sold"] > 100]

# Sorting by descending profit
sorted_df = filtered_df.sort_values(by="Profit", ascending=False)
print(sorted_df)

Analysis and Discussion

Use a combination of filters and sorts to get the data that’s actually interesting and important for you or your report. Like any tool, filtering and sorting become more powerful with practice. Also, keep in mind that well-formatted data isn’t just an analyst’s blessing but also their biggest pride.
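
Filtering and sorting can also be written as a single chain, which some people find easier to read in report scripts. This is one possible style rather than the required one; the name `report` and the final reset_index call are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Model": ["Phone X", "Phone Y", "Phone Z", "Phone X", "Phone Y"],
    "Sold": [120, 150, 90, 130, 80],
    "Profit": [200, 180, 220, 210, 190],
})

report = (
    df[df["Sold"] > 100]                         # keep rows with more than 100 units sold
    .sort_values(by="Profit", ascending=False)   # highest profit first
    .reset_index(drop=True)                      # renumber rows for a clean report
)
print(report)
```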

Common Mistakes

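Some beginners forget to wrap each filtering condition in parentheses, which leads to errors: without them, & is evaluated before the comparisons, so Python tries to compute 6 & df["Price"] first. It’s like forgetting to put a lid on a pot: things can get messy fast. A related slip is writing and or or instead of & or |, which raises a ValueError because a whole Series has no single truth value. Another common mistake is specifying the sort order incorrectly for multiple fields. When ascending is a list, it needs one value per column in by, in the same order, so check it to avoid flipping everything upside down.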
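
One of these mistakes is easy to reproduce safely: using Python's and keyword between two conditions. The sketch below catches the resulting error and then shows the working version with &; `error_message` and `fixed_df` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5],
})

error_message = None
try:
    # Mistake: `and` needs a single True/False, but each side is a whole Series
    df[(df["Quantity"] > 6) and (df["Price"] <= 3.5)]
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# Fix: use & and keep each condition in its own parentheses
fixed_df = df[(df["Quantity"] > 6) & (df["Price"] <= 3.5)]
print(fixed_df)
```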

Real-World Applications

In practice, filtering and sorting are indispensable tools for any data analyst. They help quickly retrieve the data you need from complex datasets and prep it for reports. In the real world, they’re used everywhere: from fintech to medical research. For instance, banks can filter transactions to detect suspicious activity, while researchers sort drug trial data to uncover hidden patterns. And let’s not forget, knowing these tools well can also come in handy during job interviews.

4. External Resources for Learning

For those who want to dive deeper, I recommend checking out the official pandas documentation, which is an excellent resource to understand all the capabilities of this library. You’ll find plenty of examples and explanations there to help you master this tool like a pro.
