Why Jinja and SQL?
Jinja can be used in combination with SQL to create dynamic and flexible SQL queries. Jinja allows you to include variables and expressions in your SQL queries, and to use control structures like if / else
and for
loops to control the flow of execution and generate more complex queries.
Reuse is valuable because it promotes consistency in your data pipelines, saving time and improving maintainability: maintaining 1 configurable query is easier than maintaining 10 "very similar yet slightly different" queries.
Jinja enables you to generate SQL queries dynamically based on input data, as in the following example:
{% set columns = [ 'first_name', 'last_name', 'username', 'email' ] %}
SELECT
{%- for value in columns %}
{{ value }}
{%- if not loop.last -%},{% endif -%}
{% endfor %}
FROM users;
rendering the above template outputs
SELECT
first_name,
last_name,
username,
email
FROM users;
Declaring and using columns
instead of having to write each value in the query makes the query more readable. This query template can be even more generic, making the table name configurable as well
{% set columns = [ 'first_name', 'last_name', 'username', 'email' ] %}
{% set table_name = "users" %}
SELECT
{%- for value in columns %}
{{ value }}
{%- if not loop.last -%}, {% endif -%}
{% endfor %}
FROM {{ table_name }}
This template now works for any basic query!
You can keep going and add support for a where
clause, when provided
{% set columns = [ 'last_name', 'username', 'email' ] %}
{% set table_name = "users" %}
{% set clauses = {"first_name": "John", "age": 25} %}
SELECT
{%- for value in columns %}
{{ value }}
{%- if not loop.last -%}, {% endif -%}
{% endfor %}
FROM {{ table_name }}
{% if clauses %}
WHERE
{% for name, value in clauses.items() -%}
{{ name -}} = {%- if value is string -%}"{%- endif -%}{{- value }}{%- if value is string -%}"{%- endif -%}
{% if not loop.last %} AND {% endif -%}
{% endfor %}
{% endif %}
Rendering the above results in
SELECT
last_name,
username,
email
FROM users
WHERE
first_name="John" AND age=25
There's a lot going on in the last part of the query, let's break it down:
- first, check if any
clauses
have been defined: according to truthiness rules for dictionaries, this condition will only beTrue
if the dictionary contains at least one item - append
WHERE
to the query - loop over the items in
clauses
and for each of them - append the
name
clause to the query followed by an equal sign (=
), then - if
value
is of typestring
, surround it with quotes - otherwise, just append
value
to the query - if this is not the last item in the dictionary, append
AND
This query is still not bullet-proof (it cannot deal with IN
clauses, for example), but it can already get you pretty far.
You can do even more by using macros and/or template inheritance to reuse common template code, simplifying your ETL pipelines.
Jinja and SQL in PushMetrics
In PushMetrics SQL blocks you can preview your evaluated Jinja by clicking the Jinja button: