Basic query syntax

Let's dig into the basics of the SQL query syntax by querying a table called invoices. Each row in this table contains all the information related to a single invoice: customer details, the country in which the invoice was issued, its total amount, VAT rate, and so on.

Start with a simple query that retrieves (selects, in SQL terminology) all the data in the table:

SELECT * FROM learnsql.invoices;

Note the * wildcard symbol, which translates to "all the columns in this table"; in alternative, we can explicitly select only the columns we're interested in, for example:

SELECT id, business_name, country, total, vat_rate FROM learnsql.invoices;

Filtering data

Let's now change the query so that it only returns invoices issued in Germany:

SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE country = 'Germany';

New syntax: the WHERE keyword introduces the conditions’ block: now the database will test each row against our conditions (a single one, in this case) and only return the rows that fulfill them.

Think about the SELECT and WHERE clauses as working on two different axes of your table: the SELECT clause works on the vertical axis, creating a projection that restricts the columns which are returned to you; the WHERE clause works instead on the horizontal axis, creating a filter that restricts the rows which are returned to you.

Let's now do the opposite and select only the invoices which have not been issued in Germany:

SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE country <> 'Germany';

Try it in the Notebook

There are multiple operators that we can use to filter data, depending on the column type. For example, on an integer column like `total` we could use:

SELECT * FROM learnsql.invoices WHERE total  > 10000;
SELECT * FROM learnsql.invoices WHERE total >= 10000; -- greater or equal to
SELECT * FROM learnsql.invoices WHERE total  < 10000;
SELECT * FROM learnsql.invoices WHERE total <= 10000; -- less than or equal to < code>

Did you know? In PushMetrics you can have multiple queries in a single SQL block: just remember to add a trailing `;` to each. When you want to execute one of them, take care of selecting it before hitting the Execute button: if you forget to do so, you'll only see the results of the last one.

If you look closely at the results of the previous queries, you'll notice that some rows contain the `null` value in the `business_name` column: `null` is actually not a value, but a keyword indicating the absence of value. When writing conditions that involve `null`, we need to use a slightly different syntax:

SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name IS NULL;
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name IS NOT NULL;
# don't do this!
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name = NULL;
SELECT id, business_name, total, vat_rate FROM learnsql.invoices WHERE business_name <> NULL;

New syntax: IS [NOT] NULL allows you to filter on columns that may contain null values.

Important: using the =  / <> operators in combination with null will return unexpected results, so take care to avoid doing so. Some databases will return a syntax error and outright refuse to run the query, while others might silently accept the query, but there's no guarantee that they will return the expected results.

Combining conditions

Filtering data by a single condition is not very effective when our table has several columns, but luckily we can actually have multiple ones: when using more than one condition, we have to tell the database how we would like them to be combined.

Let's imagine that have two filters in mind:

  • the invoice has been issued in Germany
  • its total amount is greater than 100

Do we want a row to be returned only when it matches both conditions (= the invoice was issued in Germany and its total amount is greater than 100)? Then we should use the AND keyword.

Do we want a row to be returned if it matches at least one condition? Then we should use the OR keyword.

SELECT * FROM learnsql.invoices WHERE country = 'Germany' AND total > 100;
SELECT * FROM learnsql.invoices WHERE country = 'Germany' OR total > 100;

New syntax: AND and OR keywords, to combine conditions.

Deep dive: Database engines try their utmost to optimize query executions, so they'll short-circuit boolean conditions whenever possible. What this means is that they will evaluate the first condition and, depending on the chosen operator (AND , OR), they might decide that there's no need to evaluate any remaining condition. These are the situations in which short-circuiting will happen:

  • in case of  AND : if the first condition evaluates to FALSE , the row is immediately discarded from the result set
  • in case of OR: if the first condition evaluates to TRUE, the row is immediately added to the result set

In all other situations, it's necessary to also evaluate the next conditions performing the same steps as above, until it's clear whether the row belongs to the result set or not. In the worst case, all conditions will need to be evaluated.

References: Boolean logic

Using both AND and OR in the same query

When using both AND and OR in the same query, we have to be aware of their precedence rules, which are best explained with a practical example.

Let's suppose that we would like to select all invoices that have already been paid and have been charged either 21% or a 19% VAT rate. Our first attempt might look like this:

SELECT paid, vat_rate 
FROM learnsql.invoices 
WHERE paid IS TRUE 
AND vat_rate = 0.21 OR vat_rate = 0.19;

If you execute the query and inspect the results, you'll quickly spot some rows where paid is actually false, which is not what we wanted! The problem is that we assumed that our query would be executed as follows (note the added parentheses, to make logical precedence explicit):

SELECT paid, vat_rate 
FROM learnsql.invoices 
WHERE paid IS TRUE 
AND (vat_rate = 0.21 OR vat_rate = 0.19);

whereas the database executed it as follows (again, note the parentheses):

SELECT paid, vat_rate 
FROM learnsql.invoices 
WHERE (paid IS TRUE 
AND vat_rate = 0.21) OR vat_rate = 0.19;

thus returning rows for invoices which have already been paid and had 21% VAT rate, as well as those that had 19% VAT rate (regardless of their payment status).

In order to avoid this, we have to make our intended precedence explicit by using parentheses, as in the first of the above two queries.

New syntax: when filtering on boolean columns such as paid, we have to use the IS TRUE / IS FALSE syntax.

Side note: there’s actually a better way to write the above query, which doesn’t require us to use OR:

SELECT paid, vat_rate 
FROM learnsql.invoices 
WHERE paid IS TRUE AND vat_rate IN (0.21, 0.19);

The IN operator takes as input a comma-separated list of values that will be accepted for that column.

Practice

Are you ready to practice what you’ve just learned? Head over to the Notebook and try to solve the exercises you find there!

Note: the solution to each exercise is in the collapsed block right below it: you can expand it by clicking the arrow icon that appears on the far left when you hover it.