As the workhorse of data analytics and reporting, the SQL language enables both users and developers slice-and-dice a database for desired information. With a natural-language-based syntax, SQL is relatively easy for business analysts to learn for basic database analysis. As analysts become more comfortable with the language, however, they may require more advanced techniques, such as aggregates on subsets of data within a query. In these situations, correlated subqueries can bring new capability and virtually unlimited depth to a SQL statement.
A simple example of a correlated subquery can be found in a Book database. Let’s assume a table of authors and books as follows:
Author
- Author_id
- Author_first_name
- Author_last_name
Book
- Book_id
- Author_id
- Book_name
- Book_num_sold
At the beginning of the analysis, we will perform a query on all the authors in the database.
select first_name, last_name from author
As we start analyzing the data, the we might be interested to discover the most prolific writers with the most published books:
select first_name, last_name,
(select count(*) from book where book.author_id = author.author_id) as num_books
from author
The subquery enables querying one dataset within another, as seen by the select statement within a select statement. As we further browse the data, we might next be interested to learn the most popular book by each author. Although this would have been a challenging task in basic SQL, it is easily accomplished through correlated subqueries:
select first_name, last_name,
(select count(*) from book where book.author_id = author.author_id) as num_books,
(select top 1 book_name from book where book.author_id = author.author_id order by book_num_sold desc) as most_popular_book,
from author
Using this technique, correlated subqueries can turn complicated join expressions into simple procedural functions. This, in turn, can reduce development time by improving the reusability and maintainability of code.
The power of correlated subqueries, however, can also come with a few drawbacks. Poorly written subqueries can reduce SQL performance. If two expressions referencing the same data set are used within the same expression, the SQL interpreter may not optimize the expression correctly and result in double the execution time. If the datasets are highly intertwined, it is instead recommended to use joins or temporary tables to improve query speed. These drawbacks are mostly performance issues, and can often be fixed through optimization. A well-written correlated subquery can provide a unique tool for proceduralizing the SQL language. With its flexibility and recursive nesting, there are few tasks that cannot be accomplished within a single SQL statement through the judicious and creative use of correlated subqueries.
Check description of our Database Services at Custom Database Software.
Written by Andrew Palczewski
About the Author
Andrew Palczewski is CEO of apHarmony, a Chicago software development company. He holds a Master's degree in Computer Engineering from the University of Illinois at Urbana-Champaign and has over ten years' experience in managing development of software projects.
Google+