Correlated Subqueries

Correlated subqueries are a powerful concept in SQL that allow dynamic interaction between the inner and outer queries. Unlike regular subqueries, correlated subqueries reference columns from the outer query, creating a dependency that evaluates the subquery for each row processed by the outer query.

What Are Correlated Subqueries?

Definition

A correlated subquery is a subquery that references one or more columns from the outer query. It executes once for every row processed by the outer query.

Importance

Correlated subqueries are used when:

The subquery needs to compare values row by row.
Dynamic relationships between the main and subquery are necessary.

Example of Dependency

				
					SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

The inner query (e2.salary) depends on the e1.department_id from the outer query.

Key Differences Between Correlated and Non-Correlated Subqueries

Aspect	Correlated Subqueries	Non-Correlated Subqueries
Dependency	References columns from the outer query	Executes independently
Execution	Executes once per row in the outer query	Executes only once, regardless of rows
Performance	Generally slower due to repeated execution	Faster because of single execution

Syntax of Correlated Subqueries

				
					SELECT column1, column2
FROM table1 alias1
WHERE column3 OPERATOR (
    SELECT aggregate_function(column4)
    FROM table2 alias2
    WHERE alias2.column5 = alias1.column6
);

Explanation:

Outer Query: Processes rows one by one.
Inner Query: Executes for each row of the outer query using its column values.

Use Cases of Correlated Subqueries

Filtering Rows

Filter rows dynamically based on related table data.

Example: Find employees earning above the average salary of their department:

				
					SELECT name, salary, department_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

Explanation:

The inner query calculates the average salary for the department of each employee.
The outer query filters employees with a salary above this average.

Output:

Name	Salary	Department_ID
Alice	75000	101
Bob	90000	102

Aggregated Comparisons

Find rows based on group-level calculations.

Example: Identify products priced higher than the average price in their category:

				
					SELECT product_name, price, category_id
FROM products p1
WHERE price > (
    SELECT AVG(price)
    FROM products p2
    WHERE p2.category_id = p1.category_id
);

Output:

Product_Name	Price	Category_ID
Laptop	1200	1
Smartphone	800	2

Data Updates

Update rows based on related table data.

Example: Increase salaries of employees earning less than the average salary of their department

				
					UPDATE employees e1
SET salary = salary * 1.10
WHERE salary < (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

Explanation:

The inner query computes the department-wise average salary.
The outer query identifies employees earning below this threshold and updates their salary.

Data Deletion

Delete rows dynamically based on related data.

Example: Remove employees whose salary is the lowest in their department

				
					DELETE FROM employees e1
WHERE salary = (
    SELECT MIN(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

Output Before Deletion

Name	Salary	Department_ID
John	30000	101
Alice	75000	101

Output After Deletion

Name	Salary	Department_ID
Alice	75000	101

Performance Considerations

Execution Overhead

Correlated subqueries execute for every row in the outer query, which can slow performance, especially for large datasets.

Optimization Techniques

Indexes: Use indexes on columns referenced in the WHERE clause of the subquery.
Rewrite as Joins: Rewrite correlated subqueries as joins for better performance where possible.

Example: Equivalent join for a correlated subquery

				
					SELECT e1.name, e1.salary
FROM employees e1
JOIN (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) avg_dept ON e1.department_id = avg_dept.department_id
WHERE e1.salary > avg_dept.avg_salary;

Best Practices for Using Correlated Subqueries

Understand the Need: Use correlated subqueries only when dynamic row-by-row evaluation is required.
Avoid Redundancy: Rewrite as joins if the correlation is not essential.
Test Performance: Evaluate execution plans to identify bottlenecks.

Common Pitfalls

Multiple Rows in Subquery: Ensure the inner query returns only one value per outer query row.
- Use aggregate functions (MAX, MIN) or ensure unique values.
Slow Execution: Test for alternatives like joins for large datasets.

Correlated subqueries are indispensable when querying data dynamically with interdependencies between rows and tables. While they offer flexibility, their row-by-row execution can impact performance. Happy Coding!❤️

Correlated Subqueries

What Are Correlated Subqueries?

Definition

Importance

Example of Dependency

Key Differences Between Correlated and Non-Correlated Subqueries

Syntax of Correlated Subqueries

Explanation:

Use Cases of Correlated Subqueries

Filtering Rows

Aggregated Comparisons

Output:

Data Updates

Explanation:

Data Deletion

Output Before Deletion

Output After Deletion

Performance Considerations

Execution Overhead

Optimization Techniques

Best Practices for Using Correlated Subqueries

Common Pitfalls

Table of Contents

Explore

Popular Tutorials

Contact here