Correlated Subqueries

Correlated subqueries are a powerful concept in SQL that allow dynamic interaction between the inner and outer queries. Unlike regular subqueries, correlated subqueries reference columns from the outer query, creating a dependency that evaluates the subquery for each row processed by the outer query.

What Are Correlated Subqueries?

Definition

A correlated subquery is a subquery that references one or more columns from the outer query. It executes once for every row processed by the outer query.

Importance

Correlated subqueries are used when:

  • The subquery needs to compare values row by row.
  • Dynamic relationships between the main and subquery are necessary.

Example of Dependency

				
					SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

				
			
  • The inner query (e2.salary) depends on the e1.department_id from the outer query.

Key Differences Between Correlated and Non-Correlated Subqueries

AspectCorrelated SubqueriesNon-Correlated Subqueries
DependencyReferences columns from the outer queryExecutes independently
ExecutionExecutes once per row in the outer queryExecutes only once, regardless of rows
PerformanceGenerally slower due to repeated executionFaster because of single execution

Syntax of Correlated Subqueries

				
					SELECT column1, column2
FROM table1 alias1
WHERE column3 OPERATOR (
    SELECT aggregate_function(column4)
    FROM table2 alias2
    WHERE alias2.column5 = alias1.column6
);

				
			

Explanation:

  1. Outer Query: Processes rows one by one.
  2. Inner Query: Executes for each row of the outer query using its column values.

Use Cases of Correlated Subqueries

Filtering Rows

Filter rows dynamically based on related table data.

Example: Find employees earning above the average salary of their department:

				
					SELECT name, salary, department_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

				
			

Explanation:

  • The inner query calculates the average salary for the department of each employee.
  • The outer query filters employees with a salary above this average.

Output:

NameSalaryDepartment_ID
Alice75000101
Bob90000102

Aggregated Comparisons

Find rows based on group-level calculations.

Example: Identify products priced higher than the average price in their category:

				
					SELECT product_name, price, category_id
FROM products p1
WHERE price > (
    SELECT AVG(price)
    FROM products p2
    WHERE p2.category_id = p1.category_id
);

				
			

Output:

Product_NamePriceCategory_ID
Laptop12001
Smartphone8002

Data Updates

Update rows based on related table data.

Example: Increase salaries of employees earning less than the average salary of their department

				
					UPDATE employees e1
SET salary = salary * 1.10
WHERE salary < (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

				
			

Explanation:

  • The inner query computes the department-wise average salary.
  • The outer query identifies employees earning below this threshold and updates their salary.

Data Deletion

Delete rows dynamically based on related data.

Example: Remove employees whose salary is the lowest in their department

				
					DELETE FROM employees e1
WHERE salary = (
    SELECT MIN(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

				
			

Output Before Deletion

NameSalaryDepartment_ID
John30000101
Alice75000101

Output After Deletion

NameSalaryDepartment_ID
Alice75000101

Performance Considerations

Execution Overhead

Correlated subqueries execute for every row in the outer query, which can slow performance, especially for large datasets.

Optimization Techniques

  • Indexes: Use indexes on columns referenced in the WHERE clause of the subquery.
  • Rewrite as Joins: Rewrite correlated subqueries as joins for better performance where possible.

Example: Equivalent join for a correlated subquery

				
					SELECT e1.name, e1.salary
FROM employees e1
JOIN (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) avg_dept ON e1.department_id = avg_dept.department_id
WHERE e1.salary > avg_dept.avg_salary;

				
			

Best Practices for Using Correlated Subqueries

  • Understand the Need: Use correlated subqueries only when dynamic row-by-row evaluation is required.
  • Avoid Redundancy: Rewrite as joins if the correlation is not essential.
  • Test Performance: Evaluate execution plans to identify bottlenecks.

Common Pitfalls

  1. Multiple Rows in Subquery: Ensure the inner query returns only one value per outer query row.
    • Use aggregate functions (MAX, MIN) or ensure unique values.
  2. Slow Execution: Test for alternatives like joins for large datasets.

Correlated subqueries are indispensable when querying data dynamically with interdependencies between rows and tables. While they offer flexibility, their row-by-row execution can impact performance. Happy Coding!❤️

Table of Contents