Let's dig into the basics of the SQL query syntax by querying a table called invoices
. Each row in this table contains all the information related to a single invoice: customer details, the country in which the invoice was issued, its total amount, VAT rate, and so on.
Start with a simple query that retrieves (selects, in SQL terminology) all the data in the table:
SELECT * FROM learnsql.invoices;
SELECT
is exactly the same as select
; column names, on the other hand, are case-sensitive!
Note the *
wildcard symbol, which translates to "all the columns in this table"; in alternative, we can explicitly select only the columns we're interested in, for example:
SELECT id, business_name, country, total, vat_rate FROM learnsql.invoices;
Let's now change the query so that it only returns invoices issued in Germany:
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE country = 'Germany';
New syntax: the WHERE
keyword introduces the conditions’ block: now the database will test each row against our conditions (a single one, in this case) and only return the rows that fulfill them.
Think about the SELECT
and WHERE
clauses as working on two different axes of your table: the SELECT
clause works on the vertical axis, creating a projection that restricts the columns which are returned to you; the WHERE
clause works instead on the horizontal axis, creating a filter that restricts the rows which are returned to you.
Let's now do the opposite and select only the invoices which have not been issued in Germany:
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE country <> 'Germany';
Try it in the Notebook
There are multiple operators that we can use to filter data, depending on the column type. For example, on an integer column like `total` we could use:
SELECT * FROM learnsql.invoices WHERE total > 10000;
SELECT * FROM learnsql.invoices WHERE total >= 10000; -- greater or equal to
SELECT * FROM learnsql.invoices WHERE total < 10000;
SELECT * FROM learnsql.invoices WHERE total <= 10000; -- less than or equal to
Did you know? In PushMetrics you can have multiple queries in a single SQL block: just remember to add a trailing `;` to each. When you want to execute one of them, take care of selecting it before hitting the Execute button: if you forget to do so, you'll only see the results of the last one.
If you look closely at the results of the previous queries, you'll notice that some rows contain the `null` value in the `business_name` column: `null` is actually not a value, but a keyword indicating the absence of value. When writing conditions that involve `null`, we need to use a slightly different syntax:
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name IS NULL;
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name IS NOT NULL;
# don't do this!
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name = NULL;
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name <> NULL;
New syntax: IS [NOT] NULL
allows you to filter on columns that may contain null
values.
Important: using the =
/ <>
operators in combination with null
will return unexpected results, so take care to avoid doing so. Some databases will return a syntax error and outright refuse to run the query, while others might silently accept the query, but there's no guarantee that they will return the expected results.
Filtering data by a single condition is not very effective when our table has several columns, but luckily we can actually have multiple ones: when using more than one condition, we have to tell the database how we would like them to be combined.
Let's imagine that have two filters in mind:
Do we want a row to be returned only when it matches both conditions (= the invoice was issued in Germany and its total amount is greater than 100)? Then we should use the AND
keyword.
Do we want a row to be returned if it matches at least one condition? Then we should use the OR
keyword.
SELECT * FROM learnsql.invoices WHERE country = 'Germany' AND total > 100;
SELECT * FROM learnsql.invoices WHERE country = 'Germany' OR total > 100;
New syntax: AND
and OR
keywords, to combine conditions.
Deep dive: Database engines try their utmost to optimize query executions, so they'll short-circuit boolean conditions whenever possible. What this means is that they will evaluate the first condition and, depending on the chosen operator (AND
, OR
), they might decide that there's no need to evaluate any remaining condition. These are the situations in which short-circuiting will happen:
AND
: if the first condition evaluates to FALSE
, the row is immediately discarded from the result setOR
: if the first condition evaluates to TRUE
, the row is immediately added to the result setIn all other situations, it's necessary to also evaluate the next conditions performing the same steps as above, until it's clear whether the row belongs to the result set or not. In the worst case, all conditions will need to be evaluated.
References: Boolean logic
When using both AND
and OR
in the same query, we have to be aware of their precedence rules, which are best explained with a practical example.
Let's suppose that we would like to select all invoices that have already been paid and have been charged either 21% or a 19% VAT rate. Our first attempt might look like this:
SELECT paid, vat_rate
FROM learnsql.invoices
WHERE paid IS TRUE
AND vat_rate = 0.21 OR vat_rate = 0.19;
If you execute the query and inspect the results, you'll quickly spot some rows where paid
is actually false
, which is not what we wanted! The problem is that we assumed that our query would be executed as follows (note the added parentheses, to make logical precedence explicit):
SELECT paid, vat_rate
FROM learnsql.invoices
WHERE paid IS TRUE
AND (vat_rate = 0.21 OR vat_rate = 0.19);
whereas the database executed it as follows (again, note the parentheses):
SELECT paid, vat_rate
FROM learnsql.invoices
WHERE (paid IS TRUE
AND vat_rate = 0.21) OR vat_rate = 0.19;
thus returning rows for invoices which have already been paid and had 21% VAT rate, as well as those that had 19% VAT rate (regardless of their payment status).
In order to avoid this, we have to make our intended precedence explicit by using parentheses, as in the first of the above two queries.
New syntax: when filtering on boolean columns such as paid
, we have to use the IS TRUE / IS FALSE
syntax.
Side note: there’s actually a better way to write the above query, which doesn’t require us to use OR
:
SELECT paid, vat_rate
FROM learnsql.invoices
WHERE paid IS TRUE AND vat_rate IN (0.21, 0.19);
The IN
operator takes as input a comma-separated list of values that will be accepted for that column.
Are you ready to practice what you’ve just learned? Head over to the Notebook and try to solve the exercises you find there!
Note: the solution to each exercise is in the collapsed block right below it: you can expand it by clicking the arrow icon that appears on the far left when you hover it.
Sign up and start using PushMetrics for free.
Or schedule a demo and discuss your use case.