Pandas Unique Function - All You Need to Know (with Examples) • datagy (2024)

In this tutorial, you’ll learn how to get unique values in a Pandas DataFrame, including getting unique values for a single column and across multiple columns. Being able to understand how to work with unique values is an important skill for a data scientist or data engineer of any skill level.

By the end of this tutorial, you’ll have learned the following:

  • How to use the Pandas .unique() method to get unique values in a Pandas DataFrame column
  • How to get unique values across multiple columns
  • How to count unique values and generate frequency tables for unique values
  • And more

The Quick Answer: Use Pandas unique()

You can use the Pandas .unique() method to get the unique values in a Pandas DataFrame column. The values are returned in order of appearance and are unsorted.

Take a look at the code block below for how this method works:

    # Get Unique Values in a Pandas DataFrame Column
    import pandas as pd

    df = pd.DataFrame({'Education': ['Graduate','Graduate','Undergraduate','Postgraduate']})
    unique_vals = df['Education'].unique()
    print(unique_vals)

    # Returns: ['Graduate' 'Undergraduate' 'Postgraduate']

If you’d like to learn more, read on! This guide will teach you the ins and outs of working with unique data in a Pandas DataFrame.

Real-World Applications of Unique Data

Let’s dive into some real-world applications of working with unique data and why it matters. Take a look at the sample DataFrame that we’re creating below. We’ll be using this dataset throughout the tutorial.

    # Loading a Sample Dataset
    import pandas as pd

    dataset = {
        'Education Status': ['Graduate','Graduate','Undergraduate','Postgraduate','Graduate',
                             'Undergraduate','Postgraduate','Graduate','Undergraduate','Postgraduate',
                             'Graduate','Undergraduate','Graduate','Postgraduate','Postgraduate'],
        'Employment Status': ['Employed','employed','Unemployed','Employed','Employed',
                              'Unemployed','Employed','Employed','Employed','Employed',
                              'Unemployed','Employed','Employed','Employed','Employed'],
        'Gender': ['F','M','M','F','M','F','M','F','M','F','M','F','M','F','F']
    }

    df = pd.DataFrame(dataset)
    print(df.head())

    # Returns:
    #   Education Status Employment Status Gender
    # 0         Graduate          Employed      F
    # 1         Graduate          employed      M
    # 2    Undergraduate        Unemployed      M
    # 3     Postgraduate          Employed      F
    # 4         Graduate          Employed      M

Understanding unique data within a DataFrame allows you to understand:

  1. The data itself, such as what data are included and what data aren’t
  2. Whether or not data quality issues exist. For example, we can see that the Employment Status column has two capitalizations for the word Employed. Understanding what unique values exist allows us to better understand if we need to clean our data.
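As a rough illustration of the second point, here is a minimal sketch of how checking unique values can surface (and confirm the fix for) a capitalization problem. The small statuses Series is a hypothetical stand-in for the Employment Status column, and normalizing with .str.capitalize() is only one possible cleaning choice, assumed here for the example.

```python
import pandas as pd

# Hypothetical stand-in for the 'Employment Status' column
statuses = pd.Series(['Employed', 'employed', 'Unemployed', 'Employed'])

# Before cleaning, the two capitalizations count as different values
print(list(statuses.unique()))
# ['Employed', 'employed', 'Unemployed']

# Normalize capitalization (an assumed cleaning step), then check again
cleaned = statuses.str.capitalize()
print(list(cleaned.unique()))
# ['Employed', 'Unemployed']
```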

Let’s now dive into how to understand the Pandas .unique() method.

Understanding the Pandas unique() Method

The unique() method in Pandas takes no parameters. It is a Series method, so you call it on a single DataFrame column (which is itself a Series), and it returns an array of the unique values present in that column.

Here’s a breakdown of how the unique() method works:

  • Select the column on which unique() will be applied by specifying the column name in brackets after the DataFrame name.
  • Call the unique() method without any input parameters or arguments.
  • Obtain an array of unique values found in the selected column.

Let’s take a look at the unique() function using the sample dataset we created earlier.

Get Unique Values for a Pandas DataFrame Column

In order to get the unique values in a Pandas DataFrame column, you can simply apply the .unique() method to the column. The method will return a NumPy array, in the order in which the values appear.

Let’s take a look at how we can get the unique values in the Education Status column:

    # Get Unique Values for a Column in Pandas
    print(df['Education Status'].unique())

    # Returns:
    # ['Graduate' 'Undergraduate' 'Postgraduate']

In the example above, we applied the .unique() method to the df['Education Status'] column. This returned the three unique values as a NumPy array.

Let’s explore how we can return the unique values as a list in the next section.

Get Unique Values for a Pandas Column as a List

By default, the Pandas .unique() method returns a NumPy array of the unique values. In order to return a list instead, we can apply the .tolist() method to the array to convert it to a Python list.

Let’s see what this looks like:

    # Get Unique Values for a Column in Pandas as a List
    print(df['Education Status'].unique().tolist())

    # Returns:
    # ['Graduate', 'Undergraduate', 'Postgraduate']

In the example above, we applied the .tolist() method to our NumPy array, converting it to a list.

Let’s now take a look at how we can get unique values for multiple Pandas DataFrame columns.

Get Unique Values for Multiple Pandas DataFrame Columns

The Pandas .unique() method can only be applied to a single column. This is because it is a Pandas Series method, rather than a DataFrame method.

In order to get the unique values of multiple DataFrame columns, we can use the .drop_duplicates() method. This will return a DataFrame of all of the unique combinations.

Let’s take a look at what this looks like:

    # Get Unique Values for Multiple DataFrame Columns
    unique = df[['Education Status', 'Gender']].drop_duplicates()
    print(unique)

    # Returns:
    #   Education Status Gender
    # 0         Graduate      F
    # 1         Graduate      M
    # 2    Undergraduate      M
    # 3     Postgraduate      F
    # 5    Undergraduate      F
    # 6     Postgraduate      M

The Pandas .drop_duplicates() method can be a helpful way to identify only the unique values across two or more columns.
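If you instead want a single flat collection of the distinct values found anywhere in several columns (rather than the distinct row combinations), one option is to flatten the columns into a one-dimensional array and pass it to pd.unique(). The sketch below assumes a hypothetical teams DataFrame in which both columns hold the same kind of value; it is an illustration, not the only way to do this.

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration
teams = pd.DataFrame({
    'Home': ['Lions', 'Tigers', 'Bears'],
    'Away': ['Tigers', 'Bears', 'Wolves'],
})

# Flatten both columns into one 1-D array, then deduplicate
flat = teams[['Home', 'Away']].to_numpy().ravel()
all_teams = pd.unique(flat)

print(sorted(all_teams))
# ['Bears', 'Lions', 'Tigers', 'Wolves']
```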

Count Unique Values in a Pandas DataFrame Column

In order to count how many unique values exist in a given DataFrame column (or columns), we can apply the .nunique() method. The method will return a single value if applied to a single column, and a Pandas Series if applied to multiple columns.

Let’s see how we can use the .nunique() method to count how many unique values exist in a column:

    # Count Unique Values in a Pandas DataFrame Column
    num_statuses = df['Employment Status'].nunique()
    print(num_statuses)

    # Returns: 3

The .nunique() method can be incredibly helpful to understand the number of unique values that exist in a column.
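The multi-column case mentioned above can be sketched the same way. Calling .nunique() on a DataFrame returns a Series with one count per column. The df_small frame below is a hypothetical, shortened version of the tutorial dataset.

```python
import pandas as pd

# Hypothetical, shortened version of the tutorial dataset
df_small = pd.DataFrame({
    'Education Status': ['Graduate', 'Graduate', 'Undergraduate'],
    'Gender': ['F', 'M', 'M'],
})

# One unique-value count per column, returned as a Series
counts = df_small.nunique()

print(counts['Education Status'])  # 2
print(counts['Gender'])            # 2
```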

Count Occurrences of Unique Values in a Pandas DataFrame Column

In this section, we’ll explore how to count how many times each unique value occurs. This, in essence, generates a frequency table for the unique values in a DataFrame column.

Let’s see how we can use the .value_counts() method to count occurrences of unique values in a Pandas DataFrame column:

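    # Count Occurrences of Unique Values in a Pandas DataFrame Column
    print(df['Education Status'].value_counts())

    # Returns:
    # Graduate         6
    # Postgraduate     5
    # Undergraduate    4
    # Name: Education Status, dtype: int64
    #
    # Note: in pandas 2.0 and later, the Series is named "count" instead,
    # so the last line reads "Name: count, dtype: int64".

When we applied the .value_counts() method to our DataFrame column, it returned a Series in which each unique value is counted, sorted from most to least frequent.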
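If relative frequencies are more useful than raw counts, .value_counts() also accepts normalize=True. The following is a minimal sketch on a hypothetical four-value levels Series, chosen so the proportions are easy to verify by hand.

```python
import pandas as pd

# Hypothetical Series, small enough to check by hand
levels = pd.Series(['Graduate', 'Graduate', 'Undergraduate', 'Postgraduate'])

# normalize=True returns proportions instead of counts
shares = levels.value_counts(normalize=True)

print(shares['Graduate'])       # 0.5
print(shares['Undergraduate'])  # 0.25
```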

Frequently Asked Questions

What is the unique() method in Pandas?

The unique() method is a Pandas method that is used to find the unique values in a Series object. It can be applied on a specific DataFrame column to return an array of unique values present in that column.

How are NaN values handled by the unique() method?

By default, the unique() method includes NaN values in its output array. In order to exclude missing values, you can first apply the .dropna() method to the column.
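A short sketch of this behavior, using a hypothetical numeric scores Series that contains missing values. It also shows that .nunique() ignores missing values unless dropna=False is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with missing values
scores = pd.Series([1.0, np.nan, 2.0, 1.0, np.nan])

with_nan = scores.unique()              # NaN is kept as one of the values
without_nan = scores.dropna().unique()  # missing values removed first

print(len(with_nan))          # 3 (1.0, NaN and 2.0)
print(without_nan.tolist())   # [1.0, 2.0]

# nunique() skips NaN by default; dropna=False counts it
print(scores.nunique())              # 2
print(scores.nunique(dropna=False))  # 3
```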

How can I sort the unique values of a DataFrame column when using the unique() method?

After using the unique() method to obtain the unique values in a DataFrame column, you can sort the resulting array by employing Python’s built-in sorted() function. This function accepts a sequence (such as the array returned by unique()) and returns a sorted list of elements.
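As a minimal sketch of that approach, assuming a hypothetical levels Series of strings with no missing values (mixing NaN with strings would make sorted() raise a TypeError, so drop missing values first in that case):

```python
import pandas as pd

# Hypothetical Series of strings without missing values
levels = pd.Series(['Graduate', 'Undergraduate', 'Postgraduate', 'Graduate'])

# unique() keeps order of appearance; sorted() returns a sorted list
sorted_levels = sorted(levels.unique())

print(sorted_levels)
# ['Graduate', 'Postgraduate', 'Undergraduate']
```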

How can I find the total number of unique values in a DataFrame column?

To find the total number of unique values in a DataFrame column, use the nunique() method. It is applied the same way as unique() but returns an integer count of distinct values rather than an array of the unique values. By default, missing values are not included in the count.

Conclusion

In this tutorial, you learned how to get unique values in a Pandas DataFrame, including getting unique values for a single column and across multiple columns. You first learned how to get the unique values for a single column, as well as for multiple columns. Then, you learned how to count unique values, as well as the occurrences of unique values. To learn more about the .unique() method, check out the official documentation.

Article information

Author: Barbera Armstrong
