In SQL, the SELECT DISTINCT statement is used to retrieve unique values from a column or a set of columns in a table. This is particularly useful when you want to eliminate duplicate values from your query results. This chapter will cover everything you need to know about using SELECT DISTINCT, from basic syntax to advanced techniques, with detailed examples and explanations.
The SELECT DISTINCT statement is used to return only distinct (different) values. In a table, a column may contain many duplicate values, and sometimes you only want to list the different (distinct) values.
SELECT DISTINCT column1, column2, ...
FROM table_name;
SELECT DISTINCT: Indicates that the query should return only unique values.
column1, column2, ...: Specifies the columns from which to retrieve distinct values.
FROM: Specifies the table from which to retrieve the data.
table_name: The name of the table.
Consider a table named employees with the following data:
employee_id | first_name | last_name | department |
---|---|---|---|
1 | John | Doe | HR |
2 | Jane | Smith | Finance |
3 | Mike | Johnson | IT |
4 | John | Doe | IT |
To select distinct first names from the employees table, you would use the following SQL statement:
SELECT DISTINCT first_name FROM employees;
Output of the Example
first_name |
---|
John |
Jane |
Mike |
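The query above can be tried end to end with a small script. This is a minimal sketch that assumes SQLite through Python's built-in sqlite3 module; the article does not name a database engine, and the helper name make_employees_db is hypothetical. It loads the sample rows and compares the result with and without DISTINCT.

```python
import sqlite3

# Sample rows copied from the employees table above.
EMPLOYEES = [
    (1, "John", "Doe", "HR"),
    (2, "Jane", "Smith", "Finance"),
    (3, "Mike", "Johnson", "IT"),
    (4, "John", "Doe", "IT"),
]


def make_employees_db():
    """Create an in-memory SQLite database holding the sample employees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "employee_id INTEGER PRIMARY KEY, first_name TEXT, "
        "last_name TEXT, department TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    return conn


conn = make_employees_db()

# Without DISTINCT, every row contributes a value, duplicates included.
all_names = [row[0] for row in conn.execute("SELECT first_name FROM employees")]

# With DISTINCT, each first name is returned once.
distinct_names = [
    row[0] for row in conn.execute("SELECT DISTINCT first_name FROM employees")
]

print(len(all_names), "values without DISTINCT")
print(len(distinct_names), "values with DISTINCT:", sorted(distinct_names))
```

The names are sorted in Python before printing because SELECT DISTINCT on its own does not guarantee any row order.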
When you specify multiple columns with SELECT DISTINCT, uniqueness is judged on the combination of values in those columns. Two rows count as duplicates only if they match in every listed column; rows that differ in at least one listed column are both kept.
To select distinct combinations of first names and last names from the employees table, you would use:
SELECT DISTINCT first_name, last_name FROM employees;
Output of the Example
first_name | last_name |
---|---|
John | Doe |
Jane | Smith |
Mike | Johnson |
Practical Use Case
Consider a table named sales with the following data:
sale_id | product_id | customer_id | sale_date |
---|---|---|---|
1 | 101 | 1001 | 2023-01-01 |
2 | 102 | 1002 | 2023-01-02 |
3 | 101 | 1003 | 2023-01-01 |
4 | 101 | 1001 | 2023-01-03 |
To get distinct combinations of product_id and sale_date:
SELECT DISTINCT product_id, sale_date FROM sales;
Output of the Example
product_id | sale_date |
---|---|
101 | 2023-01-01 |
102 | 2023-01-02 |
101 | 2023-01-03 |
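To see how the column list changes what counts as a duplicate, the sales example can be reproduced in a short script. This is a sketch under the assumption of SQLite via Python's sqlite3 module, with sale_date stored as text; neither choice comes from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, product_id INTEGER, "
    "customer_id INTEGER, sale_date TEXT)"
)
# Sample rows copied from the sales table above.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 101, 1001, "2023-01-01"),
        (2, 102, 1002, "2023-01-02"),
        (3, 101, 1003, "2023-01-01"),
        (4, 101, 1001, "2023-01-03"),
    ],
)

# Uniqueness is judged on the (product_id, sale_date) pair.
pairs = conn.execute("SELECT DISTINCT product_id, sale_date FROM sales").fetchall()

# With product_id alone, the three rows for product 101 collapse into one.
products = conn.execute("SELECT DISTINCT product_id FROM sales").fetchall()

print(sorted(pairs))
print(sorted(products))
```

Sales 1 and 3 share both product_id and sale_date, so they produce a single pair, while sale 4 has a different date and is kept.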
You can use the WHERE clause with SELECT DISTINCT to filter the data before selecting distinct values.
To select distinct departments from the employees table where the department is not 'IT':
SELECT DISTINCT department FROM employees WHERE department <> 'IT';
Output of the Example
department |
---|
HR |
Finance |
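When the filter value comes from application code, it is usually passed as a bound parameter rather than pasted into the SQL string. The sketch below assumes SQLite through Python's sqlite3 module, and the function name distinct_departments_excluding is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
    "last_name TEXT, department TEXT)"
)
# Sample rows copied from the employees table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "HR"),
        (2, "Jane", "Smith", "Finance"),
        (3, "Mike", "Johnson", "IT"),
        (4, "John", "Doe", "IT"),
    ],
)


def distinct_departments_excluding(conn, excluded):
    """Return the distinct departments other than `excluded`, sorted by name."""
    rows = conn.execute(
        # WHERE removes rows first; DISTINCT then de-duplicates what is left.
        "SELECT DISTINCT department FROM employees WHERE department <> ?",
        (excluded,),
    ).fetchall()
    return sorted(row[0] for row in rows)


departments = distinct_departments_excluding(conn, "IT")
print(departments)
```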
You can use the ORDER BY clause to sort the distinct values in the result set.
To select distinct departments from the employees table and sort them alphabetically:
SELECT DISTINCT department FROM employees ORDER BY department;
Output of the Example
department |
---|
Finance |
HR |
IT |
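The sorted query can be checked in both directions with a short script. This sketch assumes SQLite via Python's sqlite3 module and its default text comparison; other engines may order text differently depending on collation settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
    "last_name TEXT, department TEXT)"
)
# Sample rows copied from the employees table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "HR"),
        (2, "Jane", "Smith", "Finance"),
        (3, "Mike", "Johnson", "IT"),
        (4, "John", "Doe", "IT"),
    ],
)

# Ascending order is the default for ORDER BY.
ascending = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT department FROM employees ORDER BY department"
    )
]

# DESC reverses the order of the distinct values.
descending = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT department FROM employees ORDER BY department DESC"
    )
]

print(ascending)
print(descending)
```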
You can also sort by columns not included in the SELECT DISTINCT clause by using subqueries.
To select distinct first names and order by last names:
SELECT DISTINCT first_name FROM employees ORDER BY last_name;
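In many databases (PostgreSQL and SQL Server, for example), this results in an error because last_name is not in the SELECT list; some engines, such as SQLite, accept it. A more portable approach is a subquery. Note that the subquery de-duplicates on the pair of first_name and last_name, so a first name shared by people with different last names would appear more than once: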
SELECT first_name FROM (SELECT DISTINCT first_name, last_name FROM employees) AS subquery ORDER BY last_name;
Output of the Example
first_name |
---|
John |
Mike |
Jane |
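The subquery approach can be compared with a GROUP BY variant that keeps each first name exactly once by sorting on an aggregate of last_name. The GROUP BY form is an alternative technique, not something the article itself uses, and the sketch assumes SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
    "last_name TEXT, department TEXT)"
)
# Sample rows copied from the employees table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "HR"),
        (2, "Jane", "Smith", "Finance"),
        (3, "Mike", "Johnson", "IT"),
        (4, "John", "Doe", "IT"),
    ],
)

# Subquery form from the text: de-duplicate name pairs, then sort by last name.
subquery_sql = """
    SELECT first_name
    FROM (SELECT DISTINCT first_name, last_name FROM employees) AS sub
    ORDER BY last_name
"""

# Alternative: one row per first name, sorted by that name's smallest last name.
group_by_sql = """
    SELECT first_name
    FROM employees
    GROUP BY first_name
    ORDER BY MIN(last_name)
"""

via_subquery = [row[0] for row in conn.execute(subquery_sql)]
via_group_by = [row[0] for row in conn.execute(group_by_sql)]

print(via_subquery)
print(via_group_by)
```

On the sample data both queries return the same rows in the same order. They would differ if one first name were paired with several last names: the subquery form would list that name once per pair, while the GROUP BY form would list it once.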
You can combine SELECT DISTINCT with aggregate functions to perform calculations on distinct values.
To count the number of distinct departments in the employees table:
SELECT COUNT(DISTINCT department) FROM employees;
Output of the Example
count |
---|
3 |
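It can help to see COUNT(DISTINCT ...) next to a plain row count and next to an equivalent subquery form. The sketch below assumes SQLite via Python's sqlite3 module; the subquery alias d is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
    "last_name TEXT, department TEXT)"
)
# Sample rows copied from the employees table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "HR"),
        (2, "Jane", "Smith", "Finance"),
        (3, "Mike", "Johnson", "IT"),
        (4, "John", "Doe", "IT"),
    ],
)

# COUNT(*) counts rows; COUNT(DISTINCT department) counts different values.
total_rows = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
distinct_departments = conn.execute(
    "SELECT COUNT(DISTINCT department) FROM employees"
).fetchone()[0]

# Equivalent form for this data: de-duplicate in a subquery, then count its rows.
via_subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT department FROM employees) AS d"
).fetchone()[0]

print(total_rows, distinct_departments, via_subquery)
```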
You can use DISTINCT inside aggregate functions to eliminate duplicate values before performing the calculation.
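To add up each distinct salary value once (this assumes the employees table also has a salary column, which the sample data above does not show):
SELECT SUM(DISTINCT salary) FROM employees;
Note that this sums distinct salary values, not one salary per department: if two employees earn the same salary, that amount is counted only once.
Output of the Example (the result depends on the salary data)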
sum |
---|
180000 |
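The difference between SUM and SUM(DISTINCT ...) shows up only when values repeat. In the sketch below the salary figures are hypothetical, since the article's sample table has no salary column; they were chosen so that the distinct values add up to 180000. SQLite via Python's sqlite3 module is again an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
    "last_name TEXT, department TEXT, salary INTEGER)"
)
# Names and departments follow the sample table; the salaries are made up.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Doe", "HR", 50000),
        (2, "Jane", "Smith", "Finance", 60000),
        (3, "Mike", "Johnson", "IT", 70000),
        (4, "John", "Doe", "IT", 50000),  # same salary as employee 1
    ],
)

# Plain SUM adds every row's salary.
total = conn.execute("SELECT SUM(salary) FROM employees").fetchone()[0]

# SUM(DISTINCT salary) adds each different salary value once.
total_distinct = conn.execute(
    "SELECT SUM(DISTINCT salary) FROM employees"
).fetchone()[0]

print(total, total_distinct)
```

Because 50000 appears twice, the plain sum is 230000 while the distinct sum is 180000. For a payroll total the plain SUM is normally the one you want.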
Using SELECT DISTINCT can impact query performance, especially on large tables. To find duplicates, the database engine typically has to sort or hash the selected rows, which can be computationally expensive.
Proper indexing can improve the performance of queries using SELECT DISTINCT. An index on the columns listed after DISTINCT can let the engine read the values in index order instead of sorting or hashing them itself.
Example with Index
If the employees table has an index on the department column:
CREATE INDEX idx_department ON employees(department);
SELECT DISTINCT department FROM employees;
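The SELECT DISTINCT query may then perform better, because the engine can read the department values from the index rather than scanning and de-duplicating the full table rows. Whether the index is actually used depends on the database and its query planner.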
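One way to check whether an index is used is to look at the query plan before and after creating it. The sketch below assumes SQLite via Python's sqlite3 module and its EXPLAIN QUERY PLAN statement; the plan wording is specific to SQLite and varies between versions, and other engines have their own EXPLAIN output. The helper name plan_details is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
    "last_name TEXT, department TEXT)"
)
# Sample rows copied from the employees table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "HR"),
        (2, "Jane", "Smith", "Finance"),
        (3, "Mike", "Johnson", "IT"),
        (4, "John", "Doe", "IT"),
    ],
)


def plan_details(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT DISTINCT department FROM employees"

plan_before = plan_details(conn, query)
conn.execute("CREATE INDEX idx_department ON employees(department)")
plan_after = plan_details(conn, query)

# The index changes how the rows are found, not which rows are returned.
result = sorted(row[0] for row in conn.execute(query))

print("before:", plan_before)
print("after: ", plan_after)
print(result)
```

On a four-row table the speed difference is not measurable; the point of the sketch is only to show where the plan mentions idx_department once the index exists.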
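Using SELECT DISTINCT * is generally not recommended. It compares every column, so if any column is unique for each row (such as employee_id in the sample data), no rows are removed and the query still pays the cost of the duplicate check.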
Example to Avoid
SELECT DISTINCT * FROM employees;
Instead, specify only the columns you need:
SELECT DISTINCT first_name, last_name FROM employees;
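The effect is easy to confirm on the sample data. This sketch assumes SQLite via Python's sqlite3 module and counts the rows returned by each form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, "
    "last_name TEXT, department TEXT)"
)
# Sample rows copied from the employees table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John", "Doe", "HR"),
        (2, "Jane", "Smith", "Finance"),
        (3, "Mike", "Johnson", "IT"),
        (4, "John", "Doe", "IT"),
    ],
)

# employee_id differs on every row, so DISTINCT * cannot remove anything.
star_rows = conn.execute("SELECT DISTINCT * FROM employees").fetchall()

# Listing only the needed columns lets the duplicate John Doe collapse.
name_rows = conn.execute(
    "SELECT DISTINCT first_name, last_name FROM employees"
).fetchall()

print(len(star_rows), "rows from SELECT DISTINCT *")
print(len(name_rows), "rows from SELECT DISTINCT first_name, last_name")
```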
Remember that SELECT DISTINCT applies to the combination of all specified columns, not each column individually.
SELECT DISTINCT first_name, department FROM employees;
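This ensures that each combination of first_name and department appears only once. With the sample data, John still appears twice, once for HR and once for IT, because those two rows differ in department.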
Avoid using SELECT DISTINCT with large result sets without proper indexing or optimization, as it can lead to performance issues.
The SELECT DISTINCT statement is a powerful tool in SQL that allows you to eliminate duplicate values from your query results. It is useful for data analysis, reporting, and checking data quality. This chapter covered the basics of SELECT DISTINCT, using it with multiple columns, filtering and sorting distinct values, combining it with aggregate functions, and understanding its performance implications. By mastering these concepts, you can write more efficient and effective SQL queries. Happy coding! ❤️