1. Basics of Filtering Data in a DataFrame
Why do we need filtering?
Just like filters in your favorite selfie app, filtering in pandas helps us focus on what really matters. We can exclude unnecessary data and leave only what’s relevant. It’s an essential tool for data analysis, especially when we need to prepare reports or analyze large datasets.
Our first filter
Let’s start with a simple example. We have a DataFrame with product sales data:
import pandas as pd
data = {
"Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
"Quantity": [10, 5, 8, 7, 6],
"Price": [3.5, 2.0, 4.0, 3.0, 4.5]
}
df = pd.DataFrame(data)
# Filtering products by name
filtered_df = df[df["Product"] == "Oranges"]
print(filtered_df)
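To see what happens under the hood, here is a minimal sketch that splits the filter into two steps. The variable names mask and oranges are just illustrative choices; the data is the same as above.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5],
})

# The comparison produces a boolean Series: one True/False value per row.
mask = df["Product"] == "Oranges"
print(mask)

# Indexing the DataFrame with that Series keeps only the rows marked True.
oranges = df[mask]
print(oranges)
```

Naming the mask separately can make longer filters easier to read and reuse.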
Using conditions
We can combine conditions, just like in a multi-filter! For example, let’s find all products with a quantity over 6 units and a price no higher than 3.5:
# Combining conditions using & (logical "and")
filtered_df = df[(df["Quantity"] > 6) & (df["Price"] <= 3.5)]
print(filtered_df)
Remember important points
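Don’t forget that in pandas the whole filter goes inside square brackets, and each individual condition goes inside round ones. The round brackets matter because & is evaluated before comparisons such as > and <=, so without them pandas tries to combine the wrong pieces. Don’t be "that person" who forgets these brackets, okay?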
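Here is a small sketch of the bracket rule in action, using the same sales data. Besides &, it also shows | (logical "or") and ~ (logical "not"), which are not used elsewhere in this lesson but follow the same rule. The last part drops the parentheses on purpose; the exact exception type can depend on the column types, so the sketch catches both TypeError and ValueError.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5],
})

# Both conditions must hold (logical "and")
both = df[(df["Quantity"] > 6) & (df["Price"] <= 3.5)]

# At least one condition must hold (logical "or")
either = df[(df["Quantity"] > 6) | (df["Price"] <= 3.5)]

# Negating a condition (logical "not")
not_oranges = df[~(df["Product"] == "Oranges")]

# Without parentheses, 6 & df["Price"] is evaluated first and the filter fails.
try:
    df[df["Quantity"] > 6 & df["Price"] <= 3.5]
    raised = False
except (TypeError, ValueError) as err:
    raised = True
    print("Missing parentheses:", type(err).__name__)
```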
2. Sorting Data
Why is sorting important?
Sorting data lets us find structure and order in the chaos (if datasets even have chaos, that is). Imagine being in a library where books are randomly scattered. Yikes! Sorting helps organize the information, whether it’s for analysis, reports, or just finding the right value.
Sorting by one column
Let’s sort our DataFrame by product price in ascending order:
# Sorting by price in ascending order
sorted_df = df.sort_values(by="Price")
print(sorted_df)
Sorting by multiple columns
Imagine you need to sort your data first by quantity, and if the quantities are the same, by price in descending order. Here's how it's done:
# Sorting by quantity and price
sorted_df = df.sort_values(by=["Quantity", "Price"], ascending=[True, False])
print(sorted_df)
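In the sales data above every quantity is different, so the second sort key never gets a chance to act. The sketch below uses a small made-up table (the products "A" to "D" and their numbers are hypothetical) where quantities repeat, so the tie-breaking by price becomes visible.

```python
import pandas as pd

# Hypothetical data: two rows share Quantity 8 and two share Quantity 5.
ties = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Quantity": [8, 5, 8, 5],
    "Price": [1.0, 2.0, 3.0, 4.0],
})

# Quantity ascending first; within equal quantities, Price descending.
sorted_ties = ties.sort_values(by=["Quantity", "Price"], ascending=[True, False])
print(sorted_ties)
```

The rows with quantity 5 come first (D before B, because 4.0 > 2.0), followed by the rows with quantity 8 (C before A).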
Sorting nuances
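Pay attention to the ascending parameter. It lets you define whether to sort data in ascending order (True, the default) or descending order (False), and it accepts a list with one value per column when you sort by several columns. Handy when you want to "command" your data in style.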
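As a quick illustration, this sketch sorts the sales data by price in descending order. It also shows that sort_values returns a new DataFrame and, unless you ask otherwise, leaves the original untouched. The name by_price_desc is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Oranges", "Bananas", "Apples", "Oranges", "Apples"],
    "Quantity": [10, 5, 8, 7, 6],
    "Price": [3.5, 2.0, 4.0, 3.0, 4.5],
})

# ascending=False puts the most expensive rows first.
by_price_desc = df.sort_values(by="Price", ascending=False)
print(by_price_desc)

# The original DataFrame keeps its row order.
print(df)
```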
3. Practical Application
Hands-on Assignment
Now it’s time to put our knowledge to use! Let’s say you’re a data analyst at a gadget company. Your task is to find all phone models that sold more than 100 units and sort them by descending profit.
# Sales data for phones
data = {
"Model": ["Phone X", "Phone Y", "Phone Z", "Phone X", "Phone Y"],
"Sold": [120, 150, 90, 130, 80],
"Profit": [200, 180, 220, 210, 190]
}
df = pd.DataFrame(data)
# Filtering models with sales over 100
filtered_df = df[df["Sold"] > 100]
# Sorting by descending profit
sorted_df = filtered_df.sort_values(by="Profit", ascending=False)
print(sorted_df)
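The two steps can also be chained into a single expression. This is one possible variant, not the only way to write it; the reset_index(drop=True) call is an optional extra that renumbers the rows from 0 after sorting.

```python
import pandas as pd

df = pd.DataFrame({
    "Model": ["Phone X", "Phone Y", "Phone Z", "Phone X", "Phone Y"],
    "Sold": [120, 150, 90, 130, 80],
    "Profit": [200, 180, 220, 210, 190],
})

# Filter, sort, and renumber the rows in one chain.
top_sellers = (
    df[df["Sold"] > 100]
    .sort_values(by="Profit", ascending=False)
    .reset_index(drop=True)
)
print(top_sellers)
```

Note that "Phone Z" has the highest profit but is filtered out, because it sold fewer than 100 units.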
Analysis and Discussion
Use a combination of filters and sorts to get the data that’s actually interesting and important for you or your report. Like any tool, filtering and sorting become more powerful with practice. Also, keep in mind that well-formatted data isn’t just an analyst’s blessing but also their biggest pride.
Common Mistakes
Some beginners forget to wrap each filtering condition in parentheses, which leads to errors because & is evaluated before the comparisons. It’s like forgetting to put a lid on a pot: things can get messy fast. Another common mistake is improperly specifying the sort order for multiple fields. Make sure you’re using ascending correctly to avoid flipping everything upside down.
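The sort-order mistake can be sketched with a small made-up table (the products and numbers are hypothetical). A single True or False applies to every column, while a list must have exactly one entry per column; otherwise pandas raises a ValueError.

```python
import pandas as pd

# Hypothetical data with repeated quantities.
ties = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Quantity": [8, 5, 8, 5],
    "Price": [1.0, 2.0, 3.0, 4.0],
})

# A single False flips BOTH columns to descending order.
all_desc = ties.sort_values(by=["Quantity", "Price"], ascending=False)
print(all_desc)

# A list with the wrong number of entries is rejected.
try:
    ties.sort_values(by=["Quantity", "Price"], ascending=[True])
    length_error = False
except ValueError as err:
    length_error = True
    print("Mismatched lengths:", err)
```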
Real-World Applications
In practice, filtering and sorting are indispensable tools for any data analyst. They help quickly retrieve the data you need from complex datasets and prep it for reports. In the real world, they’re used everywhere: from fintech to medical research. For instance, banks can filter transactions to detect suspicious activity, while researchers sort drug trial data to uncover hidden patterns. And let’s not forget, mastering these tools can be a handy ally during job interviews.
4. External Resources for Learning
For those who want to dive deeper, I recommend checking out the official pandas documentation, which is an excellent resource to understand all the capabilities of this library. You’ll find plenty of examples and explanations there to help you master this tool like a pro.