Sub Queries
- The SQL language is very powerful and rich, there are many ways to solve one problem:
- Join condition × Where condition,
- Join ×
IN
operator,
- Join × sub-query.
- All approaches may be combined freely.
- For some problems, some approaches cannot be used.
- Alert: Heavy thinking required!
Example – Query 1
Example – Query 2
Example – Query 3
Example – Query 4
Example Summary
- Example of 4 different approaches to retrieving persons with e-mails.
- They can be combined freely.
- Query 3 as well as 4 does not allow to select the
contact
value, otherwise results are same.
- Query 1 – join everything together, then select what you want.
- Query 2 – select first, then join together (mind aliases).
- Query 3 – use the
IN
operator and sub-queries to select sets.
- Query 4 – use the
EXISTS
operator and correlated sub-queries.
Another Example – Query 1A
- Gives you a list of persons living in Brno city.
Example – Query 1B
- Same result as Query 1A
- Sub-query returns a table named
location_brno
which contains only addresses in Brno
Example – Query 2A
- Without any condition, the same result.
- With a condition, gives you a list of persons living in Brno city or without an address.
Example – Query 2B
- Gives a list of all persons and their addresses provided that they live in Brno.
Example – Summary
- In Queries 2A and 2B the search condition is moved into a sub-query.
- Queries 1A and 2A are functionally equivalent.
- 1A is simpler, but whatever you do, you cannot change it to the result 2B.
- It is an example of a dead-end approach.
- An example of how tiny changes in the query can lead to very different results.
- An example of how tiny changes in the requirements can lead to very different queries.
Aggregation
- Aggregation in SQL is an operation which merges two rows (entities) into one:
- Needs an aggregation function (SUM, MAX, AVG, …).
- Causes loss of information.
- Used to gain overview / high-level information.
- Used when not interested in all details.
- Examples:
- What is the sum of sales on each day? (vs. list of all receipts)
- What is the average age of our customer? (vs. list of all customers)
Aggregation functions
- Commonly used functions: COUNT, SUM, MAX, MIN, AVG.
- Argument is a column (or column_expression).
- Examples:
SELECT COUNT(*) FROM person
– number of rows in the person
table (49).
SELECT COUNT(height) FROM person
– number of values in the height
column (40).
SELECT COUNT(DISTINCT height) FROM person
– number of unique values in the height
column (26).
- Combining
DISTINCT
and *
makes no sense.
- All of the above queries return a single row – they aggregate the entire table.
GROUP BY clause
SELECT [ ALL | DISTINCT ] column_expression, ...
FROM table_expression
[ WHERE search_condition ]
[ GROUP BY column_expression [, ... ] ]
[ HAVING search_condition ]
[ ORDER BY { column_expression [ ASC | DESC ] }
[, column_expression [ASC | DESC ], ... ]
Aggregation of Groups
- Aggregating the entire table is not that useful – too big loss of information.
- Use the
GROUP BY
statement to divide table in groups.
- An aggregation function is applied to each group.
- The result of aggregation contains one row per group.
- A group is defined by a column (or columns).
- Each distinct value (or combination of values) identifies one group.
Example
- Number of emails for each person having a contact.
- Source table is
contact
:
id_contact |
id_person |
id_contact_type |
contact |
2 |
1 |
4 |
john@example.com |
3 |
1 |
4 |
johnny@example.com |
8 |
1 |
4 |
master@example.com |
4 |
2 |
4 |
jill@example.com |
6 |
2 |
4 |
jj@example.com |
7 |
6 |
4 |
tom@example.com |
Example cont.
SELECT id_person, COUNT(*) FROM contact GROUP BY id_person
COUNT
function is applied to each group of id_person
values.
- How many times does
id_person = X
occur in the table?
- Multiple rows of the original table are merged into one.
id_person |
COUNT |
1 |
3 |
2 |
2 |
6 |
1 |
Example 2 – Aggregation with Optional elements
- Number of emails for each person (including those not having any).
- Non-solution (returns 1 for a person without any contact, does not return persons without an email):
Correction – All persons (sub-query)
Correction – All persons (join condition)
Correction – Count Only Values
- Column XY must appear in the GROUP BY clause or be used in an aggregate function.
person.first_name
can be added without problem.
contact.contact
makes no sense here:
HAVING clause
SELECT [ ALL | DISTINCT ] column_expression, ...
FROM table_expression
[ WHERE search_condition ]
[ GROUP BY column_expression [, ... ] ]
[ HAVING search_condition ]
[ ORDER BY { column_expression [ ASC | DESC ] }
[, column_expression [ASC | DESC ], ... ]
- A search condition applied to the result of aggregation:
SQL Summary
- SQL is rich and very complicated language.
- It has many features, but CRUD are the most important ones.
- JOINs, aggregation and basic sub-queries are essential.
- JOIN must be used whenever you need to display data from multiple tables.
- Reading and Debugging SQL queries must be done hierarchically.
SQL Summary
- Selecting the right data is complex, you need to understand the requirement.
- Nothing will help you with that.
- SQL is not a procedural language, you cannot do something and then fix it. You
need to be precise and perfect in implementation.
- If a SQL query does not return anything, it does not mean that it is wrong (and vice versa)
Checkpoint
- Does every SELECT with an aggregation function need to have a GROUP BY?
- Can you substitute LEFT JOIN with RIGHT JOIN?
- What conditions must a sub-query used with
IN
satisfy?
- Can you list more than one column in the
GROUP BY
clause?
- Can you use a sub-query without
JOIN
?
- Can you use more than one aggregation function in a single query?
- Does it make sense to use GROUP BY without an aggregation function?
- Can you fully replace JOIN with sub-queries?