Retrieving Distinct with SQL

In SQL, the SELECT DISTINCT statement retrieves unique values from a column or a set of columns in a table. It is particularly useful when you want to eliminate duplicate rows from your query results. This chapter covers SELECT DISTINCT from basic syntax to more advanced techniques, with detailed examples and explanations.

What is SELECT DISTINCT?

The SELECT DISTINCT statement is used to return only distinct (different) values. In a table, a column may contain many duplicate values, and sometimes you only want to list the different (distinct) values.

    SELECT DISTINCT column1, column2, ...
    FROM table_name;

SELECT DISTINCT: Removes duplicate rows from the result, so each combination of values in the listed columns appears only once.
column1, column2, ...: The columns whose values (or combinations of values) should be unique.
FROM: Specifies the table from which to retrieve the data.
table_name: The name of the table.

Example

Consider a table named employees with the following data:

employee_id	first_name	last_name	department
1	John	Doe	HR
2	Jane	Smith	Finance
3	Mike	Johnson	IT
4	John	Doe	IT

To select distinct first names from the employees table, you would use the following SQL statement:

    SELECT DISTINCT first_name FROM employees;

Output of the Example

first_name
John
Jane
Mike
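To try this query without a database server, here is a minimal sketch that loads the sample employees rows into an in-memory SQLite database using Python's built-in sqlite3 module. The helper names make_employees_db and distinct_first_names are illustrative only. Like other engines, SQLite does not guarantee row order without ORDER BY, so the names may come back in a different order than shown above.

```python
import sqlite3

# Sample rows from the employees table above.
EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]


def make_employees_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database holding the sample employees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    return conn


def distinct_first_names(conn: sqlite3.Connection) -> list[str]:
    """Run the SELECT DISTINCT query from the text and flatten the 1-column rows."""
    rows = conn.execute("SELECT DISTINCT first_name FROM employees").fetchall()
    return [name for (name,) in rows]


conn = make_employees_db()
print(distinct_first_names(conn))  # "John" is listed once; order is not guaranteed
conn.close()
```

The two John rows (employee_id 1 and 4) produce a single "John" because only first_name is selected.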

Using SELECT DISTINCT with Multiple Columns

What Happens When Using Multiple Columns?

When you specify multiple columns with SELECT DISTINCT, the combination of values in those columns must be unique. A row is removed as a duplicate only if another row has the same values in every listed column. Rows that match in some of the listed columns but differ in others are all kept.

Example

To select distinct combinations of first names and last names from the employees table, you would use:

    SELECT DISTINCT first_name, last_name FROM employees;

Output of the Example

first_name	last_name
John	Doe
Jane	Smith
Mike	Johnson
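One way to picture this rule is to treat the selected values of each row as a tuple and keep only the first occurrence of each tuple. The Python sketch below is a conceptual model under that assumption. It is not how any particular database implements DISTINCT: real engines usually sort or hash the rows and may return them in any order. The function name distinct_rows is hypothetical.

```python
def distinct_rows(rows: list[dict], columns: list[str]) -> list[tuple]:
    """Model SELECT DISTINCT col1, col2, ...: keep one copy of each
    unique combination of values in the chosen columns."""
    seen = set()
    result = []
    for row in rows:
        # The combination of selected values is what must be unique.
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


employees = [
    {"employee_id": 1, "first_name": "John", "last_name": "Doe", "department": "HR"},
    {"employee_id": 2, "first_name": "Jane", "last_name": "Smith", "department": "Finance"},
    {"employee_id": 3, "first_name": "Mike", "last_name": "Johnson", "department": "IT"},
    {"employee_id": 4, "first_name": "John", "last_name": "Doe", "department": "IT"},
]

print(distinct_rows(employees, ["first_name", "last_name"]))
# [('John', 'Doe'), ('Jane', 'Smith'), ('Mike', 'Johnson')]
```

Rows 1 and 4 differ in employee_id and department. However, only first_name and last_name are selected, so the two rows collapse into one result.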

Practical Use Case

Consider a table named sales with the following data:

sale_id	product_id	customer_id	sale_date
1	101	1001	2023-01-01
2	102	1002	2023-01-02
3	101	1003	2023-01-01
4	101	1001	2023-01-03

To get distinct combinations of product_id and sale_date:

    SELECT DISTINCT product_id, sale_date FROM sales;

Output of the Example

product_id	sale_date
101	2023-01-01
102	2023-01-02
101	2023-01-03
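The sales example can be checked the same way. This sketch uses Python's sqlite3 module with an in-memory database. SQLite has no separate date type, so sale_date is stored as ISO-8601 text. An ORDER BY sale_date, which is not part of the original query, is added so the output order is predictable. The function name is illustrative.

```python
import sqlite3

# Sample rows from the sales table above.
SALES = [
    (1, 101, 1001, "2023-01-01"),
    (2, 102, 1002, "2023-01-02"),
    (3, 101, 1003, "2023-01-01"),
    (4, 101, 1001, "2023-01-03"),
]


def distinct_product_dates() -> list[tuple[int, str]]:
    """Return the distinct (product_id, sale_date) pairs, ordered by date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, "
        "customer_id INTEGER, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)
    # Sales 1 and 3 share (101, '2023-01-01'), so they become one result row.
    rows = conn.execute(
        "SELECT DISTINCT product_id, sale_date FROM sales ORDER BY sale_date"
    ).fetchall()
    conn.close()
    return rows


print(distinct_product_dates())
# [(101, '2023-01-01'), (102, '2023-01-02'), (101, '2023-01-03')]
```

Product 101 still appears twice because its two sale dates differ.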

Using SELECT DISTINCT with WHERE Clause

Filtering Distinct Values

You can use the WHERE clause with SELECT DISTINCT to filter the data before selecting distinct values.

Example

To select distinct departments from the employees table where the department is not 'IT':

    SELECT DISTINCT department FROM employees WHERE department <> 'IT';

Output of the Example

department
HR
Finance
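WHERE filters rows before DISTINCT removes duplicates. As a result, a value can survive the filter through one of its rows even if another of its rows is excluded. The sketch below uses Python's sqlite3 with an in-memory copy of the sample rows. It runs the query above, then a second query on first_name that is not in the original. The helper name run_query is illustrative.

```python
import sqlite3

EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]


def run_query(sql: str) -> list[tuple]:
    """Load the sample employees rows into a fresh in-memory database and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# The query from the text: the IT rows are filtered out before deduplication.
departments = run_query(
    "SELECT DISTINCT department FROM employees WHERE department <> 'IT'"
)
# John's IT row is removed by WHERE, but his HR row still qualifies,
# so "John" remains; Mike has only an IT row and disappears.
names = run_query(
    "SELECT DISTINCT first_name FROM employees WHERE department <> 'IT'"
)
print(departments, names)
```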

Using SELECT DISTINCT with ORDER BY Clause

Sorting Distinct Values

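You can use the ORDER BY clause to sort the distinct values in the result set. Without ORDER BY, SQL does not guarantee any particular row order, so add it whenever the order of the results matters.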

Example

To select distinct departments from the employees table and sort them alphabetically:

    SELECT DISTINCT department FROM employees ORDER BY department;

Output of the Example

department
Finance
HR
IT
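Here is a quick check of the sorted query, again using Python's sqlite3 with an in-memory copy of the sample rows. The helper name is illustrative, and the DESC variant is an addition that shows descending order.

```python
import sqlite3

EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]


def sorted_departments(direction: str = "ASC") -> list[str]:
    """Return the distinct departments sorted in the given direction."""
    # The sort direction is an SQL keyword, not a value, so it cannot be a
    # bound parameter; only the two valid keywords are let into the SQL text.
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    rows = conn.execute(
        f"SELECT DISTINCT department FROM employees ORDER BY department {direction}"
    ).fetchall()
    conn.close()
    return [dept for (dept,) in rows]


print(sorted_departments())        # ['Finance', 'HR', 'IT']
print(sorted_departments("DESC"))  # ['IT', 'HR', 'Finance']
```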

Advanced Sorting

Sorting by a column that is not in the SELECT DISTINCT list needs more care. One workaround is to select that column in a subquery and sort in the outer query.

Example

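To select distinct first names and order them by last name, you might try:

    SELECT DISTINCT first_name FROM employees ORDER BY last_name;

Many databases, including PostgreSQL and SQL Server, reject this query because last_name is not in the SELECT list. After duplicates are removed, one first_name can stand for several rows with different last names, so the sort key is ambiguous. Instead, you can use a subquery:

    SELECT first_name
    FROM (SELECT DISTINCT first_name, last_name FROM employees) AS subquery
    ORDER BY last_name;

The subquery deduplicates on the combination of first_name and last_name, not on first_name alone. A first name shared by people with different last names would therefore appear more than once.

Output of the Example

first_name
John
Mike
Jane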
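The sketch below runs the subquery from the text next to an alternative that is not in the original. The alternative groups by first_name and sorts by MIN(last_name), which keeps each first name exactly once even when two people share a first name but not a last name. It uses Python's sqlite3 with an in-memory copy of the sample rows, and the helper name is illustrative.

```python
import sqlite3

EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]


def run_query(sql: str) -> list[tuple]:
    """Load the sample employees rows into a fresh in-memory database and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# The subquery approach from the text.
SUBQUERY = (
    "SELECT first_name FROM "
    "(SELECT DISTINCT first_name, last_name FROM employees) AS subquery "
    "ORDER BY last_name"
)
# Alternative: exactly one row per first name, ordered by its smallest last name.
GROUPED = (
    "SELECT first_name FROM employees "
    "GROUP BY first_name ORDER BY MIN(last_name)"
)

print(run_query(SUBQUERY))  # [('John',), ('Mike',), ('Jane',)]
print(run_query(GROUPED))   # [('John',), ('Mike',), ('Jane',)]
```

Both return the same rows here because every first name in the sample has only one last name. The GROUP BY form is the safer choice when that is not true.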

SELECT DISTINCT with Aggregate Functions

Combining DISTINCT with Aggregate Functions

DISTINCT can also be used inside an aggregate function. For example, COUNT(DISTINCT column) counts the unique non-NULL values in a column instead of counting every row.

Example

To count the number of distinct departments in the employees table:

    SELECT COUNT(DISTINCT department) FROM employees;

Output of the Example

count
3
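Comparing COUNT(department) with COUNT(DISTINCT department) on the sample rows makes the difference visible: IT is counted twice by the first and once by the second. This is a minimal sketch using Python's sqlite3 with an in-memory database.

```python
import sqlite3

EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, department TEXT)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

# COUNT(department) counts every non-NULL value; COUNT(DISTINCT ...) counts unique ones.
total, unique = conn.execute(
    "SELECT COUNT(department), COUNT(DISTINCT department) FROM employees"
).fetchone()
conn.close()

print(total, unique)  # 4 3
```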

DISTINCT Inside Aggregate Functions

You can use DISTINCT inside aggregate functions to eliminate duplicate values before performing the calculation.

Example

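To add up each distinct salary value once (this example assumes the employees table also has a salary column, which the sample data above does not show):

    SELECT SUM(DISTINCT salary) FROM employees;

This does not calculate a total per department. It sums the set of distinct salary values, so if two employees earn the same amount, that amount is counted only once. A per-department total needs SUM(salary) with GROUP BY department.

Output of the Example

sum
180000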
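The sample table has no salary column, so the sketch below adds one with hypothetical values (50000, 60000, 55000, 55000) chosen only for illustration. They are not the data behind the 180000 shown above. The sketch contrasts SUM(salary), SUM(DISTINCT salary), and a per-department total with GROUP BY, using Python's sqlite3 module and an in-memory database.

```python
import sqlite3

# Hypothetical salaries added for illustration; the original table has none.
EMPLOYEES = [
    (1, "John", "Doe", "HR", 50000),
    (2, "Jane", "Smith", "Finance", 60000),
    (3, "Mike", "Johnson", "IT", 55000),
    (4, "John", "Doe", "IT", 55000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

# SUM adds every row; SUM(DISTINCT ...) adds each distinct value once,
# so the second 55000 is ignored.
totals = conn.execute(
    "SELECT SUM(salary), SUM(DISTINCT salary) FROM employees"
).fetchone()
# A total per department needs GROUP BY, not DISTINCT.
per_department = conn.execute(
    "SELECT department, SUM(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
conn.close()

print(totals)          # (220000, 165000)
print(per_department)  # [('Finance', 60000), ('HR', 50000), ('IT', 110000)]
```

SUM(DISTINCT salary) rarely answers a business question on its own, because it silently drops legitimate repeated amounts. The GROUP BY query is usually what "total salary per department" means.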

Performance Considerations

Impact on Query Performance

Using SELECT DISTINCT can slow a query down, especially on large tables. To find duplicates, the database engine typically has to sort or hash the rows being returned. This costs CPU time and memory and may spill to temporary disk storage.

Indexes

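Proper indexing can improve the performance of queries using SELECT DISTINCT. With an index on the selected column(s), the engine may be able to read the values in sorted order directly from the index, where duplicates sit next to each other, instead of sorting or hashing the table rows. Whether the index is actually used depends on the database and its query optimizer.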

Example with Index

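For example, create an index on the department column of the employees table:

    CREATE INDEX idx_department ON employees(department);

The query

    SELECT DISTINCT department FROM employees;

can then be answered from the index, which may make it faster on a large table. Check the execution plan (for example, with EXPLAIN) to confirm that the index is used.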
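In SQLite, you can check whether the index is used with EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 with an in-memory copy of the sample rows and prints the plan before and after creating idx_department. The exact plan wording varies between SQLite versions, and other databases have their own EXPLAIN syntax, so treat the output as indicative. The helper name plan_details is illustrative.

```python
import sqlite3

EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]
QUERY = "SELECT DISTINCT department FROM employees"


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT, department TEXT)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)

before = plan_details(conn, QUERY)  # typically a table scan plus a temp B-tree for DISTINCT
conn.execute("CREATE INDEX idx_department ON employees(department)")
after = plan_details(conn, QUERY)   # typically a scan of the covering index idx_department
conn.close()

print(before)
print(after)
```

On a four-row table, the timing difference is negligible. The plan is what shows whether the engine can use the index for deduplication.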

Common Pitfalls and How to Avoid Them

Selecting All Columns with DISTINCT

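Using SELECT DISTINCT * is generally not recommended. DISTINCT then compares every column, so if any column is unique, such as the primary key employee_id, no two rows are ever equal. Nothing is removed, yet the database still pays for the duplicate check.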

Example to Avoid

    SELECT DISTINCT * FROM employees;

Instead, specify only the columns you need:

    SELECT DISTINCT first_name, last_name FROM employees;
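On the sample rows, SELECT DISTINCT * removes nothing because employee_id differs on every row, while listing only the needed columns collapses the two John Doe rows. This is a minimal sketch using Python's sqlite3 with an in-memory database; the helper name run_query is illustrative.

```python
import sqlite3

EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]


def run_query(sql: str) -> list[tuple]:
    """Load the sample employees rows into a fresh in-memory database and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


all_columns = run_query("SELECT DISTINCT * FROM employees")
named_columns = run_query("SELECT DISTINCT first_name, last_name FROM employees")

# employee_id makes every full row unique, so DISTINCT * keeps all 4 rows.
print(len(all_columns), len(named_columns))  # 4 3
```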

Misunderstanding DISTINCT with Multiple Columns

Remember that SELECT DISTINCT applies to the combination of all specified columns, not each column individually.

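Example

    SELECT DISTINCT first_name, department FROM employees;

This query returns each unique combination of first_name and department. With the sample data, John appears twice, once with HR and once with IT, because those two combinations differ.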
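Running the query against the sample rows confirms that John appears once per department. This sketch uses Python's sqlite3 with an in-memory database. The ORDER BY is added (it is not in the query above) so the output order is fixed, and the helper name is illustrative.

```python
import sqlite3

EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]


def run_query(sql: str) -> list[tuple]:
    """Load the sample employees rows into a fresh in-memory database and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


rows = run_query(
    "SELECT DISTINCT first_name, department FROM employees "
    "ORDER BY first_name, department"
)
print(rows)
# [('Jane', 'Finance'), ('John', 'HR'), ('John', 'IT'), ('Mike', 'IT')]
```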

Using DISTINCT with Large Result Sets

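Avoid running SELECT DISTINCT over large result sets without checking the execution plan and indexing, because the deduplication step can become expensive. If the duplicates come from a join that multiplies rows, it is often better to fix the join than to remove the duplicates afterwards.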

The SELECT DISTINCT statement is a powerful tool in SQL that lets you eliminate duplicate rows from your query results. It is essential for data analysis, reporting, and ensuring data quality. This chapter covered the basics of SELECT DISTINCT, using it with multiple columns, filtering and sorting distinct values, combining it with aggregate functions, and understanding its performance implications. By mastering these concepts, you can write more efficient and effective SQL queries. Happy coding! ❤️
