Author
Artem Divertitto
Senior Android Developer at United Tech

Remove NaN Using Python

Published in the Learning Python group

Welcome! Today we’re diving into a topic that’s essential for data processing and pretty common when you’re working with real-world datasets: those pesky NaNs (short for "Not a Number") that tend to pop up in your data and mess with your calculations. But fear not! By the end of this article, you’ll be equipped with multiple techniques to handle and remove NaNs from both lists and arrays in Python. Ready? Let’s jump in!

Introduction

So, you’ve got a dataset, and you’re ready to work your Python magic on it. But wait, what’s this? Some of your data points aren’t numbers! Instead, they’re NaN (Not a Number), a special floating-point value. NaNs can sneak into your dataset for various reasons, such as missing values or errors during data collection. Unfortunately, arithmetic involving a NaN produces another NaN, so a single one can throw off your calculations and analyses. But don’t worry! In this article, we’ll explore several methods to remove NaNs from lists and arrays in Python.
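To see why NaNs are worth removing, here is a minimal sketch of how one NaN spreads through a calculation. The `readings` list is made-up sample data, not taken from any real dataset.

```python
import math

# Hypothetical sample data with one missing measurement
readings = [2.0, 4.0, float("nan"), 6.0]

total = sum(readings)          # NaN propagates through the addition
mean = total / len(readings)   # ...and through the division

print(total)  # nan
print(mean)   # nan

# NaN is not equal to anything, including itself,
# which is why we test with math.isnan() instead of ==
print(float("nan") == float("nan"))  # False
print(math.isnan(total))             # True
```

That last comparison is also the reason you cannot simply call `readings.remove(float("nan"))`: the equality check it relies on never matches a freshly created NaN.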

Methods of Removing NaN from a List

Let’s start with lists. Lists in Python are incredibly versatile, and removing NaN values from them can be done in a few different ways. Here are the most common methods:

1. Using List Comprehension

List comprehension is a concise way to create lists in Python, and it’s perfect for filtering out NaN values. Here’s how you can use it:


import math

data = [1, 2, float('nan'), 4, 5, float('nan'), 7]

# Remove NaN values using list comprehension
clean_data = [x for x in data if not math.isnan(x)]

print(clean_data)

Explanation:

  • We first import the math module to use the math.isnan() function, which checks if a value is NaN.
  • The list comprehension [x for x in data if not math.isnan(x)] creates a new list containing only the elements from data that are not NaN.

Output:


[1, 2, 4, 5, 7]

Excellent! As you can see, the NaN values are filtered out, and you’re left with a clean list.
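One caveat: math.isnan() only accepts numbers, so it raises a TypeError if the list also contains strings or None. The sketch below shows one possible workaround, assuming you want to keep those non-numeric values untouched. The helper name `is_nan` is our own invention for illustration, not a standard function.

```python
import math

def is_nan(value):
    """Return True only for a float NaN; any other type counts as not NaN."""
    # Hypothetical helper: the isinstance check guards math.isnan()
    # against strings, None, and other non-numeric values.
    return isinstance(value, float) and math.isnan(value)

mixed = [1, "n/a", None, float("nan"), 2.5]

clean = [x for x in mixed if not is_nan(x)]

print(clean)  # [1, 'n/a', None, 2.5]
```

If you would rather drop None and placeholder strings as well, extend the condition in the comprehension to match your own definition of "missing".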

2. Using the filter() Function

Another approach is to use the filter() function, a built-in Python function that constructs an iterator from the elements of an iterable for which a function returns True.


import math

data = [1, 2, float('nan'), 4, 5, float('nan'), 7]

# Remove NaN values using filter() and math.isnan
clean_data = list(filter(lambda x: not math.isnan(x), data))

print(clean_data)

Explanation:

  • filter() applies the lambda function lambda x: not math.isnan(x) to each element in data, and returns an iterator with only the elements for which the function returns True.
  • We then convert this iterator back into a list.

Output:


[1, 2, 4, 5, 7]

Smooth and straightforward, right?
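If the lambda feels a bit noisy, the standard library offers itertools.filterfalse(), which keeps the elements for which the predicate is False. Here is a small sketch of the same cleanup written that way; it is an alternative, not a replacement for the approach above.

```python
import math
from itertools import filterfalse

data = [1, 2, float("nan"), 4, 5, float("nan"), 7]

# Keep every element for which math.isnan() returns False
clean_data = list(filterfalse(math.isnan, data))

print(clean_data)  # [1, 2, 4, 5, 7]
```

Passing math.isnan directly avoids writing the `not` by hand.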

3. Using the pandas Library

If you’re working with data, chances are you’re already using the pandas library. It provides a simple and efficient way to handle NaNs in lists.


import pandas as pd

data = [1, 2, float('nan'), 4, 5, float('nan'), 7]

# Convert list to a pandas Series
data_series = pd.Series(data)

# Remove NaN values using dropna()
clean_data = data_series.dropna().tolist()

print(clean_data)

Explanation:

  • We first convert the list to a pandas Series, which is a one-dimensional array-like object.
  • The dropna() function removes all NaN values from the Series.
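  • Finally, we convert the Series back to a list. Because NaN is a float, pandas stores the whole Series as float64, so the integers come back as floats.

Output:


[1.0, 2.0, 4.0, 5.0, 7.0]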

Using pandas is especially helpful if you’re already using it for other data manipulations.
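One thing worth knowing is that dropna() treats None as missing too, not only NaN. The sketch below uses a made-up list to show this, along with one way to get integers back afterwards, assuming every remaining value really is a whole number.

```python
import pandas as pd

# Hypothetical input mixing None and NaN
data = [1, None, float("nan"), 4]

series = pd.Series(data)
print(series.dtype)  # float64

# dropna() removes both the None and the NaN entry
clean = series.dropna().tolist()
print(clean)  # [1.0, 4.0]

# Optional: cast back to integers once the missing values are gone
clean_ints = series.dropna().astype(int).tolist()
print(clean_ints)  # [1, 4]
```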

Methods of Removing NaN from an Array

Now let’s tackle arrays. Unlike lists, numpy arrays store all their elements with a single data type and support element-wise operations on the whole array at once, which is what the methods below rely on. Here’s how you can handle NaNs in arrays.

1. Using numpy.isnan() and Boolean Indexing

numpy provides a very intuitive way to remove NaNs from arrays using numpy.isnan() combined with Boolean indexing.


import numpy as np

data = np.array([1, 2, np.nan, 4, 5, np.nan, 7])

# Remove NaN values using Boolean indexing
clean_data = data[~np.isnan(data)]

print(clean_data)

Explanation:

  • np.isnan(data) returns a Boolean array where True corresponds to NaN values.
  • The tilde ~ operator negates the Boolean array, so we’re left with True for non-NaN values.
  • We then use this Boolean array to index the original array, effectively filtering out the NaNs.

Output:


[1. 2. 4. 5. 7.]

Neat, right? This method is both powerful and concise.
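With a two-dimensional array, the same Boolean indexing returns a flattened one-dimensional result, which is often not what you want. A common alternative is to drop every row that contains a NaN. Here is a minimal sketch with a made-up `table` array:

```python
import numpy as np

# Hypothetical 3x3 table with one missing cell
table = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
])

# Element-wise mask: the result loses its 2D shape
flat = table[~np.isnan(table)]
print(flat.shape)  # (8,)

# Row-wise mask: keep only the rows without any NaN
rows = table[~np.isnan(table).any(axis=1)]
print(rows)
```

Here `any(axis=1)` marks a row as bad if at least one of its cells is NaN, so the second row is removed and the other two keep their shape.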

2. Using numpy.compress()

The numpy.compress() function is another way to filter out NaNs from an array, especially if you prefer a more explicit approach.


import numpy as np

data = np.array([1, 2, np.nan, 4, 5, np.nan, 7])

# Create a Boolean mask for non-NaN values
mask = ~np.isnan(data)

# Remove NaN values using numpy.compress()
clean_data = np.compress(mask, data)

print(clean_data)

Explanation:

  • We first create a Boolean mask mask that indicates which elements are not NaN.
  • numpy.compress(mask, data) then compresses the array, keeping only the elements where mask is True.

Output:


[1. 2. 4. 5. 7.]
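numpy.compress() also accepts an axis argument, which makes it handy for two-dimensional data. As a rough sketch, assuming you want to drop whole columns that contain a NaN (the `table` array is invented for the example):

```python
import numpy as np

# Hypothetical 2x3 table where the middle column has a NaN
table = np.array([
    [1.0, np.nan, 3.0],
    [4.0, 5.0, 6.0],
])

# One Boolean per column: True if the column has no NaN at all
keep_column = ~np.isnan(table).any(axis=0)

# axis=1 tells compress() to select columns rather than rows
clean_columns = np.compress(keep_column, table, axis=1)

print(clean_columns)
```

The middle column is dropped, leaving a 2x2 array with the first and last columns.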

3. Using pandas with Arrays

Just like with lists, you can use pandas to handle NaNs in arrays as well.


import numpy as np
import pandas as pd

data = np.array([1, 2, np.nan, 4, 5, np.nan, 7])

# Convert numpy array to pandas Series
data_series = pd.Series(data)

# Remove NaN values using dropna()
clean_data = data_series.dropna().to_numpy()

print(clean_data)

Explanation:

  • We convert the numpy array to a pandas Series.
  • We use dropna() to remove the NaN values.
  • We convert the cleaned Series back to a numpy array with to_numpy().

Output:


[1. 2. 4. 5. 7.]

Pandas is versatile, and this method works well if you’re switching between lists and arrays.
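pandas can also help when an array has the object dtype, for example because it mixes numbers, strings, and None. numpy.isnan() raises a TypeError on such arrays, while pandas.isna() handles them. The sketch below uses an invented array to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical object array mixing numbers, None, NaN, and a string
data = np.array([1, None, np.nan, "x"], dtype=object)

try:
    np.isnan(data)
except TypeError:
    print("np.isnan() does not support object arrays")

# pd.isna() returns a Boolean array that flags both None and NaN
clean = data[~pd.isna(data)]

print(clean.tolist())  # [1, 'x']
```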

Summary and Conclusion

Congratulations! You’ve just learned several methods to remove NaN values from both lists and arrays in Python. Whether you’re using list comprehension, filter(), or pandas for lists, or numpy for arrays, you’ve got the tools to keep your data clean and your calculations accurate.

In summary:

  • For lists: You can use list comprehension, filter(), or the pandas library.
  • For arrays: numpy.isnan() with Boolean indexing, numpy.compress(), or pandas can be your go-to methods.

Remember, the method you choose depends on your specific needs and the libraries you’re already using in your project. Keep practicing, and soon enough, handling NaNs will become second nature to you. You're doing great, and this is just one more step on your Python journey!

