SQLFLUFF(1) | SQLFluff | SQLFLUFF(1) |
sqlfluff - SQLFluff 1.4.5
Bored of not having a good SQL linter that works with whichever dialect you're working with? Fluff is an extensible and modular linter designed to help you write good SQL and catch errors and bad SQL before it hits your database.
Notable changes:
Want to see where and how people are using SQLFluff in their projects? Head over to SQLFluff in the Wild for inspiration.
To get started just install the package, make a sql file and then run SQLFluff and point it at the file. For more details or if you don't have python or pip already installed see Getting Started.
$ pip install sqlfluff $ echo " SELECT a + b FROM tbl; " > test.sql $ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | L050 | Files must not begin with newlines or whitespace. L: 1 | P: 3 | L003 | First line has unexpected indent L: 1 | P: 11 | L039 | Unnecessary whitespace found. L: 1 | P: 14 | L039 | Unnecessary whitespace found. L: 1 | P: 27 | L001 | Unnecessary trailing whitespace.
To get started with SQLFluff you'll need python and pip installed on your machine, if you're already set up, you can skip straight to Installing sqlfluff.
How to install python and pip depends on what operating system you're using. In any case, the python wiki provides up to date instructions for all platforms here.
There's a chance that you'll be offered the choice between python versions. Support for python 2 was dropped in early 2020, so you should always opt for a version number starting with a 3. As for more specific options beyond that, SQLFluff aims to be compatible with all current python versions, and so it's best to pick the most recent.
You can confirm that python is working as expected by heading to your terminal or console of choice and typing python --version which should give you a sensible read out and not an error.
$ python --version Python 3.9.1
For most people, their installation of python will come with pip (the python package manager) preinstalled. To confirm this you can type pip --version similar to python above.
$ pip --version pip 21.3.1 from ...
If however, you do have python installed but not pip, then the best instructions for what to do next are on the python website.
Assuming that python and pip are already installed, then installing SQLFluff is straight forward.
$ pip install sqlfluff
You can confirm its installation by getting SQLFluff to show its version number.
$ sqlfluff version 1.4.5
To get a feel for how to use SQLFluff it helps to have a small .sql file which has a simple structure and some known issues for testing. Create a file called test.sql in the same folder that you're currently in with the following content:
SELECT a+b AS foo, c AS bar from my_table
You can then run sqlfluff lint test.sql --dialect ansi to lint this file.
$ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations
| and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is
| only one select target. L: 1 | P: 9 | L006 | Missing whitespace before + L: 1 | P: 9 | L006 | Missing whitespace after + L: 1 | P: 11 | L039 | Unnecessary whitespace found. L: 2 | P: 1 | L003 | Expected 1 indentations, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case.
You'll see that SQLFluff has failed the linting check for this file. On each of the following lines you can see each of the problems it has found, with some information about the location and what kind of problem there is. One of the errors has been found on line 1, position * (as shown by :code:`L: 1 | P: 9`) and it's a problem with rule *L006 (for a full list of rules, see Rules Reference). From this (and the following error) we can see that the problem is that there is no space either side of the + symbol in a+b. Head into the file, and correct this issue so that the file now looks like this:
SELECT a + b AS foo, c AS bar from my_table
Rerun the same command as before, and you'll see that the original error (violation of L006) no longer shows up.
$ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations
| and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is
| only one select target. L: 1 | P: 13 | L039 | Unnecessary whitespace found. L: 2 | P: 1 | L003 | Expected 1 indentations, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case.
To fix the remaining issues, we're going to use one of the more advanced features of SQLFluff, which is the fix command. This allows more automated fixing of some errors, to save you time in sorting out your sql files. Not all rules can be fixed in this way and there may be some situations where a fix may not be able to be applied because of the context of the query, but in many simple cases it's a good place to start.
For now, we only want to fix the following rules: L003, L009, L010
$ sqlfluff fix test.sql --rules L003,L009,L010 --dialect ansi ==== finding violations ==== == [test.sql] FAIL L: 2 | P: 1 | L003 | Expected 1 indentations, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case. ==== fixing violations ==== 2 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n]
...at this point you'll have to confirm that you want to make the changes by pressing y on your keyboard...
Are you sure you wish to attempt to fix these? [Y/n] ... Attempting fixes... Persisting Changes... == [test.sql] PASS Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content is now different.
SELECT a + b AS foo,
c AS bar FROM my_table
In particular:
We could also fix all of the fixable errors by not specifying --rules.
$ sqlfluff fix test.sql --dialect ansi ==== finding violations ==== == [test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations
| and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is
| only one select target. L: 1 | P: 13 | L039 | Unnecessary whitespace found. ==== fixing violations ==== 3 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n] ... Attempting fixes... Persisting Changes... == [test.sql] PASS Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content has been updated again.
SELECT
c AS bar,
a + b AS foo FROM my_table
The SQL statement is now well formatted according to all the rules defined in SQLFluff.
The --rules argument is optional, and could be useful when you or your organisation follows a slightly different convention than what we have defined.
So far we've covered the stock settings of SQLFluff, but there are many different ways that people style their sql, and if you or your organisation have different conventions, then many of these behaviours can be configured. For example, given the example above, what if we actually think that indents should only be two spaces, and rather than uppercase keywords, they should all be lowercase?
To achieve this we create a configuration file named .sqlfluff and place it in the same directory as the current file. In that file put the following content:
[sqlfluff] dialect = ansi [sqlfluff:rules] tab_space_size = 2 [sqlfluff:rules:L010] capitalisation_policy = lower
Then rerun the same command as before.
$ sqlfluff fix test.sql --rules L003,L009,L010,L034,L036,L039
Then examine the file again, and you'll notice that the file has been fixed accordingly.
select c as bar, a + b as foo from my_table
For a full list of configuration options check out Default Configuration. To see how these options apply to specific rules check out the "Configuration" section within each rule's documentation in Rules Reference.
From here, there are several more things to explore.
One last thing to note is that SQLFluff is a relatively new project and you may find bugs or strange things while using it. If you do find anything, the most useful thing you can do is to post the issue on GitHub where the maintainers of the project can work out what to do with it. The project is in active development and so updates and fixes may come out regularly.
SQL has been around for a long time, as a language for communicating with databases, like a communication protocol. More recently with the rise of data as a business function, or a domain in its own right SQL has also become an invaluable tool for defining the structure of data and analysis - not just as a one off but as a form of infrastructure as code.
As analytics transitions from a profession of people doing one-offs, and moves to building stable and reusable pieces of analytics, more and more principles from software engineering are moving in the analytics space. One of the best articulations of this is written in the viewpoint section of the docs for the open-source tool dbt. Two of the principles mentioned in that article are quality assurance and modularity.
The primary aim of SQLFluff as a project is in service of that first aim of quality assurance. With larger and larger teams maintaining large bodies of SQL code, it becomes more and more important that the code is not just valid but also easily comprehensible by other users of the same codebase. One way to ensure readability is to enforce a consistent style, and the tools used to do this are called linters.
Some famous linters which are well known in the software community are flake8 and jslint (the former is used to lint the SQLFluff project itself).
SQLFluff aims to fill this space for SQL.
SQL itself doesn't lend itself well to modularity, so to introduce some flexibility and reusability it is often templated. Typically this is done in the wild in one of the following ways:
"SELECT {foo} FROM {tbl}".format(foo="bar", tbl="mytable")
Which would evaluate to:
SELECT bar FROM mytable
All of these templating tools are great for modularity but they also mean that the SQL files themselves are no longer valid SQL code, because they now contain these configured placeholder values, intended to improve modularity.
SQLFluff supports both of the templating methods outlined above, as well as dbt projects, to allow you to still lint these "dynamic" SQL files as part of your CI/CD pipeline (which is great 🙌), rather than waiting until you're in production (which is bad 🤦, and maybe too late).
During the CI/CD pipeline (or any time that we need to handle templated code), SQLFluff needs additional info in order to interpret your templates as valid SQL code. You do so by providing dummy parameters in SQLFluff configuration files. When substituted into the template, these values should evaluate to valid SQL (so SQLFluff can check its style, formatting, and correctness), but the values don't need to match actual values used in production. This means that you can use much simpler dummy values than what you would really use. The recommendation is to use the simplest possible dummy value that still allows your code to evaluate to valid SQL so that the configuration values can be as streamlined as possible.
SQLFluff has a few components:
The core vision [1] for SQLFluff is to be really good at being the linter. The reasoning for this is outlined in SQL in the Wild.
Most of the codebase for SQLFluff is the parser, mostly because at the point of developing SQLFluff, there didn't appear to be a good option for a whitespace-aware parser that could be used instead.
With regards to the rules, SQLFluff aims to be opinionated but it also accepts that many organisations and groups have pre-existing strong conventions around how to write SQL and so ultimately SQLFluff should be flexible enough to support whichever rule set a user wishes to.
Notes
Rolling out SQLFluff, like rolling out any other linter or style guide, is not just about the technical rollout, but also how you introduce the tool to the team and organisation around you.
With that in mind, it's worth reminding ourselves what we're trying to achieve with a tool like this. A set of potential success criteria might be:
You like leading commas? Make a PR to .sqlfluff and let's discuss with the team what the implications would be.
Consider which of these success measures is most important and most desirable for your team. Write that down.
The following steps are a guide, which you should adapt to your organisation, and in particular its level of data maturity.
This step is done by you, or a small group of people who already think that linting is a good idea.
There are three sensible rollout phases:
In each of these phases you have three levers to play with:
Work out a sensible roadmap of how hard you want to go in each phase. Be clear who is responsible for changes at each phase. An example plan might look like this:
Bring your team together to introduce both linting as a concept and also SQLFluff as a tool. At this stage it's really important that the team understand *why* this is a good thing.
Consider whether to discuss the whole plan from step 2, or whether to only talk about the first few steps. Aim to make this an empowering experience that everyone can get involved with rather than another piece of admin they need to do.
At this stage, you might also want to consider other tools in the SQLFluff ecosystem such as the SQLFluff pre-commit hook and the SQLFluff VSCode plugin or SQLFluff online formatter.
Once the plan is in motion, make sure to start putting in place norms and rituals around how you change the rules. In particular:
It's normal for your usage of tools like SQLFluff to change and evolve over time. It's important to expect this change in advance, and welcome it when it happens. Always make sure you're driving toward the success measures you decided up front, rather than just resisting the change.
Did it work? If so, spread the word. Tell a friend about SQLFluff.
If you're lucky they might share your views on comma placement 🤷♀️.
If there is one part of building a linter that is going to be controversial it's going to be whitespace (closely followed by cApiTaLiSaTiOn 😁).
More specifically, whitespace divides into three key themes:
SQLFluff aims to be opinionated on this theme, but also configurable (see Configuring Layout). The tool will have a default viewpoint and will aim to have views on all of the important aspects of SQL layout, but if you (or your organisation) don't like those views then we aim to allow enough configuration that you can lint in line with your views, and still use SQLFluff. For more information on how to configure rules to your own viewpoint see Configuration.
NOTE:
Of the different elements of whitespace, spacing is likely the least controversial. By default, all elements are separated by a single space character. Except for very specific circumstances (see section on Aligned elements), any additional space between elements is usually unwanted and a distraction for the reader. There are however several common cases where no whitespace is more appropriate, which fall into two cases (for more details on where to configure these see Configuring layout and spacing).
SELECT
col_a
, col_b
-- Newline present before column
, col_c
-- When inline, comma should still touch element before.
, GREATEST(col_d, col_e) as col_f FROM tbl_a
A special case of spacing is where elements are set to be aligned within some limits. This is not enabled by default, but can be be configured to achieve layouts like:
SELECT
a AS first_column,
b AS second_column,
(a + b) / 2 AS third_column FROM foo AS bar
In this example, the alias expressions are all aligned with each other. To configure this, SQLFluff needs to know what elements to align and how far to search to find elements which should be aligned with each other. The configuration to achieve this layout is:
[sqlfluff:layout:type:alias_expression] # We want non-default spacing _before_ the alias expressions. spacing_before = align # We want to align them within the next outer select clause. # This means for example that alias expressions within the FROM # or JOIN clause would _not_ be aligned with them. align_within = select_clause # The point at which to stop searching outward for siblings, which # in this example would likely be the boundary of a CTE. Stopping # when we hit brackets is usually a good rule of thumb for this # configuration. align_scope = bracketed
Of these configuration values, the align_scope is potentially the least obvious. The following example illustrates the impact it has.
-- With -- align_scope = bracketed -- align_within = select_clause WITH foo as (
SELECT
a,
b,
c AS first_column
d + e AS second_column ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar; -- With -- align_scope = bracketed -- align_within = statement WITH foo as (
SELECT
a,
b,
c AS first_column
d + e AS second_column ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar -- Now the FROM alias is also aligned. -- With -- align_scope = file -- align_within = select_clause WITH foo as (
SELECT
a,
b,
c AS first_column -- Now the aliases here are aligned
d + e AS second_column -- with the outer query. ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar -- With -- align_scope = file -- align_within = statement WITH foo as (
SELECT
a,
b,
c AS first_column
d + e AS second_column ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar
When controlling line breaks, we are trying to achieve a few different things:
Lastly, given we have multiple lines of SQL, to what extent should we indent some lines to provide visual cues to the structure of that SQL. It's important to note that SQL is not whitespace sensitive in its interpretation and that means that any principles we apply here are entirely for the benefit of humans. Your database doesn't care.
The indentation therefore should be treated as a hint to the reader of the structure of the code. This explains the common practice within most languages that nested elements (for example the contents of a set of brackets in a function call) should be indented one step from the outer elements. It's also convention that elements with the same level in a nested structure should have the same indentation, at least with regards to their local surroundings. As an example:
SELECT
nested_within_select AS first_column,
some_function(
nested_within_function,
also_nested_within_function
) AS indented_the_same_as_opening_bracket FROM indented_the_same_as_select
NOTE:
Comments are dealt with differently, depending on whether they're block comments (/* like this */), which might optionally include newlines, or inline comments (-- like this) which are necessarily only on one line.
SELECT
/* This is a block comment starting on a new line
which contains a newline (continuing with at least
the same indent.
- potentially containing greater indents
- having no other code following it in the same line
- and aligned with the line of code following it */
this_column as what_we_align_the_column_to FROM my_table
SELECT
-- This is fine
this_column as what_we_align_to,
another_column as something_short, -- Is ok
case
-- This is aligned correctly with below
when indented then take_care
else try_harder
end as the_general_guidance -- Even here we align with the line below FROM my_table
NOTE:
One approach to indenting nested elements is a layout called a hanging indent. In this layout, there is no line break before the first nested element, but subsequent elements are indented to match the line position of that first element. Two examples might be:
-- A select statement with two hanging indents: SELECT no_line_break_before_me,
indented_to_match_the_first,
1 + (a
+ b) AS another_more_complex_example FROM my_table; -- This TSQL example is also in essence a hanging indent: DECLARE @prv_qtr_1st_dt DATETIME,
@last_qtr INT,
@last_qtr_first_mn INT,
@last_qtr_yr INT;
In some circumstances this layout can be quite neat (the DECLARE statement is a good example of this), however once indents are nested or indentation styles are mixed it can rapidly become confusing (as partially shown in the first example). Additionally, unless the leading element of the first line is very short, hanging indents use much larger indents than a traditional simple indent where a line break is used before the first element.
Hanging indents have been supported in SQLFluff up to the 1.x versions, however they will no longer by supported from 2.0.0 onwards. This is due to the ambiguity which they bring to fixing poorly formatted SQL. Take the following code:
SELECT this_is, badly_formatted, code_and,
not_obvious,
what_was, intended FROM my_table
Given the lack of line break between SELECT and this_is, it would appear that the user is intending a hanging indent, however it is also plausible that they did not and they just forgot to add a line break between them. This ambiguity is unhelpful, both for SQLFluff as a tool, but also for people who write SQL that there two ways of indenting their SQL. Given SQLFluff aims to provide consistency in SQL layout and remove some of the burden of needing to make choices like this - and that it would be very unusual to keep only hanging indents and disable traditional ones - the only route left to consistency is to not allow hanging indents. Starting in 2.0.0, any hanging indents detected will be converted to traditional indents.
SQLFluff supports templated elements in code, such as those offered by jinja2 (or dbt which relies on it). For simple cases, templated elements are handled as you would expect by introducing additional indents into the layout.
SELECT
a,
{% for n in ['b', 'c', 'd'] %}
-- This section is indented relative to 'a' because
-- it is inside a jinja for loop.
{{ n }},
{% endfor %}
e FROM my_table
This functionality can be turned off if you wish using the template_blocks_indent option in your Configuration.
It's important to note here, that SQLFluff lints the code after it has been rendered, and so only has access to code which is still present after that process.
SELECT
a,
{% if False %}
-- This section of the code cannot be linted because
-- it is never rendered due to the `if False` condition.
my + poorly
+ spaced - and/indented AS section_of_code
{% endif %}
e FROM my_table
More complex templated cases are usually characterised by templated tags cutting across the parse tree. This more formally is where the opening and closing tags of a templated section exist at different levels in the parsed structure. Starting in version 2.x, these will be treated differently (Prior to version 2.x, situations like this were sometimes handled inconsistently or incorrectly).
Indentation should act as a visual cue to the structure of the written SQL, and as such, the most important thing is that template tags belonging to the same block structure use the same indentation. In the example below, this is the opening and closing elements of the second if statement. If treated as a simple case, these tags would have different indents, because they are at different levels of the parse tree and so clearly there is a conflict to be resolved.
The view SQLFluff takes on how to resolve this conflict is to pull all of the tags in this section down to the indent of the least indented (in the example below that would be the closing endif tag). This is similar to the treatment of C Preprocessor Directives, which are treated somewhat as being outside the structure of the rest of the file. In these cases, the content is also not further indented as in the simple case because it makes it harder to line up elements within the affected section and outside (in the example below the SELECT and FROM are a good illustration).
SELECT
a,
{% if True %}
-- This is a simple case. The opening and closing tag are
-- both at the same level within the SELECT clause.
simple_case AS example,
{% endif %}
b, {% if True %}
-- This is a complex case. The opening tag is within the SELECT
-- clause, but the closing tag is outside the statement
-- entirely.
complex_case AS example FROM table_option_one {% else %}
complex_case_two AS example FROM table_option_two {% endif %}
Configuration for layout is spread across three places:
One of the key areas for this is the indentation of the JOIN expression, which we'll use as an example.
Semantically, a JOIN expression is part of the FROM expression and therefore would be expected to be indented. However according to many of the most common SQL style guides (including the dbt Labs SQL style guide and the Mozilla SQL style guide) the JOIN keyword is expected to at the same indent as the FROM keyword. By default, SQLFluff sides with the current consensus, which is to not indent the JOIN keyword, however this is one element which is configurable.
By setting values in the sqlfluff:indentation section of your config file you can control how this is parsed.
For example, the default indentation would be as follows:
SELECT
a,
b FROM my_table JOIN another_table
ON condition1
AND condition2
By setting your config file to:
[sqlfluff:indentation] indented_joins = True
Then the expected indentation will be:
SELECT
a,
b FROM my_table
JOIN another_table
ON condition1
AND condition2
There is a similar indented_using_on config (defaulted to True) which can be set to False to prevent the USING or ON clause from being indented, in which case the original SQL would become:
SELECT
a,
b FROM my_table JOIN another_table ON condition1
AND condition2
There is also a similar indented_on_contents config (defaulted to True) which can be set to False to align any AND subsections of an ON block with each other. If set to False the original SQL would become:
SELECT
a,
b FROM my_table JOIN another_table
ON condition1
AND condition2
These can also be combined, so if indented_using_on config is set to False, and indented_on_contents is also set to False then the SQL would become:
SELECT
a,
b FROM my_table JOIN another_table ON condition1 AND condition2
There is also a similar indented_ctes config (defaulted to False) which can be set to True to enforce CTEs to be indented within the WITH clause:
WITH
some_cte AS (
SELECT 1 FROM table1
),
some_other_cte AS (
SELECT 1 FROM table1
) SELECT 1 FROM some_cte
By default, SQLFluff aims to follow the most common approach to indentation. However, if you have other versions of indentation which are supported by published style guides, then please submit an issue on GitHub to have that variation supported by SQLFluff.
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. The syntax of this section is very expressive; however in normal use, only very small alterations should be necessary from the Default Configuration.
The syntax of the section headings here select by type, which corresponds to the type defined in the dialect. For example the following section applies to elements of the type comma, i.e. ,.
[sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing
Within these configurable sections there are a few key elements which are available:
[sqlfluff:layout:type:comma] line_position = leading
-- Setting line_position to just `alone` -- within [sqlfluff:layout:type:set_operator] -- would not allow: SELECT a UNION SELECT b; -- ...or... SELECT a UNION SELECT b; -- but *would* allow both of the following: SELECT a UNION SELECT b; SELECT a UNION SELECT b; -- However the default is set to `alone:strict` -- then the *only* acceptable configuration is: SELECT a UNION SELECT b;
Rules in SQLFluff are implemented as crawlers. These are entities which work their way through the parsed structure of a query to evaluate a particular rule or set of rules. The intent is that the definition of each specific rule should be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away.
Certain rules belong to the core rule group. In order for a rule to be designated as core, it must meet the following criteria:
Core rules can also make it easier to roll out SQLFluff to a team by only needing to follow a 'common sense' subset of rules initially, rather than spending time understanding and configuring all the rules, some of which your team may not necessarily agree with.
We believe teams will eventually want to enforce more than just the core rules, and we encourage everyone to explore all the rules and customize a rule set that best suites their organization.
See the Configuration section for more information on how to enable only core rules by default.
SQLFluff features inline error ignoring. For example, the following will ignore the lack of whitespace surrounding the * operator.
a.a*a.b AS bad_1 -- noqa: L006
Multiple rules can be ignored by placing them in a comma-delimited list.
a.a * a.b AS bad_2, -- noqa: L007, L006
It is also possible to ignore non-rule based errors, and instead opt to ignore templating (TMP) & parsing (PRS) errors.
WHERE
col1 = 2 AND
dt >= DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY) -- noqa: PRS
NOTE:
Should the need arise, not specifying specific rules to ignore will ignore all rules on the given line.
a.a*a.b AS bad_3 -- noqa
Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.
-- Ignore rule L012 from this line forward SELECT col_a a FROM foo -- noqa: disable=L012 -- Ignore all rules from this line forward SELECT col_a a FROM foo -- noqa: disable=all -- Enforce all rules from this line forward SELECT col_a a FROM foo -- noqa: enable=all
SQLFluff is designed to be flexible in supporting a variety of dialects. Not all potential dialects are supported so far, but several have been implemented by the community. Below are a list of the currently available dialects. Each inherits from another, up to the root ansi dialect.
For a canonical list of supported dialects, run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff.
NOTE:
Consider when adding new features to a dialect:
This is the base dialect which holds most of the definitions of common SQL commands and structures. If the dialect which you're actually using isn't specifically implemented by SQLFluff, using this dialect is a good place to start.
This dialect doesn't intend to be brutal in adhering to (and only to) the ANSI SQL spec (mostly because ANSI charges for access to that spec). It aims to be a representation of vanilla SQL before any other project adds their spin to it, and so may contain a slightly wider set of functions than actually available in true ANSI SQL.
The dialect for Amazon Athena.
The dialect for Google BigQuery.
The dialect for ClickHouse.
The dialect Databricks is an alias for the SparkSQL.
Since Databricks builds on top of Apache Spark, the Spark SQL dialect holds most of the definitions of common commands and structures.
Specifics to Databricks, such as Delta Live Table syntax, are added to the Spark SQL dialect to simplify implementation and prevent code duplication for minor syntax updates. This follows SQLFluff's philosophy of not being strict in adhering to dialect specifications to permit slightly wider set of functions than actually available in a given dialect.
The dialect for Db2.
The dialect for Exasol.
The dialect for Hive.
The dialect for Materialize.
The dialect for MySQL.
The dialect for Oracle SQL. Note: this does not include PL/SQL.
This is based around the PostgreSQL spec. Many other SQL instances are often based on PostreSQL syntax. If you're running an unsupported dialect, then this is often the dialect to use (until someone makes a specific dialect).
The dialect for Amazon Redshift.
The dialect for Snowflake, which has much of its syntax inherited from PostgreSQL.
The dialect for SOQL (Salesforce Object Query Language).
The dialect for Apache Spark SQL. It inherits from ANSI and includes relevant syntax from Hive for commands that permit Hive Format. Spark SQL extensions provided by the Delta Lake project are also implemented in this dialect.
This implementation focuses on the Ansi Compliant Mode introduced in Spark3, instead of being Hive Compliant. The introduction of ANSI Compliance provides better data quality and easier migration from traditional DBMS.
Versions of Spark prior to 3.x will only support the Hive dialect.
The dialect for SQLite.
The dialect for T-SQL (aka Transact-SQL).
The dialect for Teradata.
SQLFluff is designed to be used both as a utility for developers but also to be part of CI/CD pipelines.
The exit code provided by SQLFluff when run as a command line utility is designed to assist usefulness in deployment pipelines. If no violations are found then the exit code will be 0. If violations are found then a non-zero code will be returned which can be interrogated to find out more.
For projects with large amounts of (potentially imperfect) SQL code, the full SQLFluff output could be very large, which can be distracting -- perhaps the CI build for a one-line SQL change shouldn't encourage the developer to fix lots of unrelated quality issues.
To support this use case, SQLFluff integrates with a quality checking tool called diff-quality. By running SQLFluff using diff-quality (rather than running it directly), you can limit the the output to the new or modified SQL in the branch (aka pull request or PR) containing the proposed changes.
Currently, diff-quality requires that you are using git for version control.
NOTE: Installing SQLFluff automatically installs the diff_cover package that provides the diff-quality tool.
In your CI build script:
1. Set the current working directory to the git repository containing the SQL code to be checked.
$ diff-quality --violations sqlfluff
The output will look something like:
------------- Diff Quality Quality Report: sqlfluff Diff: origin/master...HEAD, staged and unstaged changes ------------- sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql (0.0%): sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql:5: Unquoted Identifiers must be consistently upper case. ------------- Total: 1 line Violations: 1 line % Quality: 0% -------------
These messages are basically the same as those provided directly by SQLFluff, although the format is a little different. Note that diff-quality only lists the line _numbers_, not the character position. If you need the character position, you will need to run SQLFluff directly.
For more information on diff-quality, see the documentation. It covers topics such as:
pre-commit is a framework to manage git "hooks" triggered right before a commit is made.
A git hook is a git feature to "fire off custom scripts" when specific actions occur.
Using pre-commit with SQLFluff is a good way to provide automated linting to SQL developers.
With pre-commit, you also get the benefit of only linting/fixing the files that changed.
SQLFluff comes with two pre-commit hooks:
WARNING:
Although it is not advised, you can tell SQLFluff to try and fix these files by overriding the fix_even_unparsable setting in .sqlfluff config file or using the sqlfluff fix --FIX-EVEN-UNPARSABLE command line option.
Overriding this behavior may break your SQL. If you use this override, always be sure to review any fixes applied to files with templating or parse errors to verify they are okay.
You should create a file named .pre-commit-config.yaml at the root of your git project, which should look like this:
repos: - repo: https://github.com/sqlfluff/sqlfluff
rev: 1.4.5
hooks:
- id: sqlfluff-lint
# For dbt projects, this installs the dbt "extras".
# You will need to select the relevant dbt adapter for your dialect
# (https://docs.getdbt.com/docs/available-adapters):
# additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
- id: sqlfluff-fix
# Arbitrary arguments to show an example
# args: [--rules, "L003,L014"]
# additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
When trying to use the dbt templater, uncomment the additional_dependencies to install the extras. This is equivalent to running pip install <dbt-adapter> sqlfluff-templater-dbt.
You can specify the version of dbt-adapter used in pre-commit, for example:
additional_dependencies : ['dbt-bigquery==1.0.0', 'sqlfluff-templater-dbt']
See the list of available dbt-adapters.
Note that you can pass the same arguments available through the CLI using args:.
There are two way to utilize SQLFluff to annotate Github PRs.
For more information and examples on using SQLFluff in GitHub Actions, see the sqlfluff-github-actions repository.
SQLFluff accepts configuration either through the command line or through configuration files. There is rough parity between the two approaches with the exception that templating configuration must be done via a file, because it otherwise gets slightly complicated.
For details of what's available on the command line check out the CLI Reference.
For file based configuration SQLFluff will look for the following files in order. Later files will (if found) will be used to overwrite any values read from earlier files.
Within these files, the first four will be read like a cfg file, and SQLFluff will look for sections which start with sqlfluff, and where subsections are delimited by a semicolon. For example the jinjacontext section will be indicated in the section started with [sqlfluff:jinjacontext].
For example, a snippet from a .sqlfluff file (as well as any of the supported cfg file types):
[sqlfluff] templater = jinja sql_file_exts = .sql,.sql.j2,.dml,.ddl [sqlfluff:indentation] indented_joins = false indented_using_on = true template_blocks_indent = false [sqlfluff:templater] unwrap_wrapped_queries = true [sqlfluff:templater:jinja] apply_dbt_builtins = true
For the pyproject.toml file, all valid sections start with tool.sqlfluff and subsections are delimited by a dot. For example the jinjacontext section will be indicated in the section started with [tool.sqlfluff.jinjacontext].
For example, a snippet from a pyproject.toml file:
[tool.sqlfluff.core] templater = jinja sql_file_exts = .sql,.sql.j2,.dml,.ddl [tool.sqlfluff.indentation] indented_joins = false indented_using_on = true template_blocks_indent = false [tool.sqlfluff.templater] unwrap_wrapped_queries = true [tool.sqlfluff.templater.jinja] apply_dbt_builtins = true
SQLFluff uses nesting in its configuration files, with files closer overriding (or patching, if you will) values from other files. That means you'll end up with a final config which will be a patchwork of all the values from the config files loaded up to that path. The exception to this is the value for templater, which cannot be set in config files in subdirectories of the working directory. You don't need any config files to be present to make SQLFluff work. If you do want to override any values though SQLFluff will use files in the following locations in order, with values from later steps overriding those from earlier:
This whole structure leads to efficient configuration, in particular in projects which utilise a lot of complicated templating.
In addition to configuration files mentioned above, SQLFluff also supports comment based configuration switching in files. This allows specific SQL file to modify a default configuration if they have specific needs.
When used, these apply to the whole file, and are parsed from the file in an initial step before the rest of the file is properly parsed. This means they can be used for both rule configuration and also for parsing configuration.
To use these, the syntax must start as an inline sql comment beginning with sqlfluff (i.e. -- sqlfluff). The line is then interpreted as a colon-seperated address of the configuation value you wish to set. A few common examples are shown below:
-- Set Indented Joins -- sqlfluff:indentation:indented_joins:true -- Set a smaller indent for this file -- sqlfluff:indentation:tab_space_size:2 -- Set keywords to be capitalised -- sqlfluff:rules:L010:capitalisation_policy:upper SELECT * FROM a
JOIN b USING(c)
We recommend only using this configuration approach for configuration that applies to one file in isolation. For configuration changes for areas of a project or for whole projects we recommend Nesting of configuration files.
Rules can be configured with the .sqlfluff config files.
Common rule configurations can be set in the [sqlfluff:rules] section.
For example:
[sqlfluff:rules] tab_space_size = 4 max_line_length = 80 indent_unit = space allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all
Rule specific configurations are set in rule specific subsections.
For example, enforce that keywords are upper case by configuring the rule L010:
[sqlfluff:rules:L010] # Keywords capitalisation_policy = upper
All possible options for rule sections are documented in Rules Reference.
For an overview of the most common rule configurations that you may want to tweak, see Default Configuration (and use Rules Reference to find the available alternatives).
To disable individual rules, set exclude_rules in the top level section of sqlfluff configuration. The value is a comma separated list of rule ids.
For example, to disable the rules L022 and L027:
[sqlfluff] exclude_rules = L022, L027
To enable individual rules, configure rules, respectively.
For example, to enable L027:
[sqlfluff] rules = L027
If both exclude_rules and rules have non-empty value, then the excluded rules are removed from the rules list. This allows for example enabling common rules on top level but excluding some on subdirectory level.
Rules can also be enabled/disabled by their grouping. Right now, the only rule grouping is core. This will enable (or disable) a select group of rules that have been deemed 'core rules'.
[sqlfluff] rules = core
More information about 'core rules' can be found in the Rules Reference.
Additionally, some rules have a special force_enable configuration option, which allows to enable the given rule even for dialects where it is disabled by default. The rules that support this can be found in the Rules Reference.
The default values can be seen in Default Configuration.
See also: Ignoring Errors & Files.
To keep displaying violations for specific rules, but not have those issues lead to a failed run, rules can be downgraded to warnings. Rules set as warnings won't cause a file to fail, but will still be shown in the CLI to warn users of their presence.
The configuration of this behaves very like exclude_rules above:
[sqlfluff] warnings = L019, L007
With this configuration, files with no other issues (other than those set to warn) will pass. If there are still other issues, then the file will still fail, but will show both warnings and failures.
== [test.sql] PASS L: 2 | P: 9 | L006 | WARNING: Missing whitespace before + == [test2.sql] FAIL L: 2 | P: 8 | L014 | Unquoted identifiers must be consistently upper case. L: 2 | P: 11 | L006 | WARNING: Missing whitespace before +
This is particularly useful as a transitional tool when considering the introduction on new rules on a project where you might want to make users aware of issues without blocking their workflow (yet).
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. To understand more about this section, see the section of the docs dedicated to layout: Configuring Layout.
When thinking about Jinja templating there are two different kinds of things that a user might want to fill into a templated file, variables and functions/macros. Currently functions aren't implemented in any of the templaters.
Variables are available in the jinja, python and placeholder templaters. By default the templating engine will expect variables for templating to be available in the config, and the templater will be look in the section corresponding to the context for that templater. By convention, the config for the jinja templater is found in the sqlfluff:templater:jinja:context section, the config for the *python templater is found in the sqlfluff:templater:python:context section, the one for the placeholder templater is found in the sqlfluff:templater:placeholder:context section
For example, if passed the following .sql file:
SELECT {{ num_things }} FROM {{ tbl_name }} WHERE id > 10 LIMIT 5
...and the following configuration in .sqlfluff in the same directory:
[sqlfluff:templater:jinja:context] num_things=456 tbl_name=my_table
...then before parsing, the sql will be transformed to:
SELECT 456 FROM my_table WHERE id > 10 LIMIT 5
NOTE:
Libraries such as SQLAlchemy or Psycopg use different parameter placeholder styles to mark where a parameter has to be inserted in the query.
For example a query in SQLAlchemy can look like this:
SELECT * FROM table WHERE id = :myid
At runtime :myid will be replace by a value provided by the application and escaped as needed, but this is not standard SQL and cannot be parsed as is.
In order to parse these queries is then necessary to replace these placeholders with sample values, and this is done with the placeholder templater.
Placeholder templating can be enabled in the config using:
[sqlfluff] templater = placeholder
A few common styles are supported:
-- colon
WHERE bla = :my_name
-- colon_nospaces
-- (use with caution as more prone to false positives)
WHERE bla = table:my_name
-- numeric_colon
WHERE bla = :2
-- pyformat
WHERE bla = %(my_name)s
-- dollar
WHERE bla = $my_name or WHERE bla = ${my_name}
-- question_mark
WHERE bla = ?
-- numeric_dollar
WHERE bla = $3 or WHERE bla = ${3}
-- percent
WHERE bla = %s
-- ampersand
WHERE bla = &s or WHERE bla = &{s} or USE DATABASE MARK_{ENV}
These can be configured by setting param_style to the names above:
[sqlfluff:templater:placeholder] param_style = colon my_name = 'john'
then it is necessary to set sample values for each parameter, like my_name above. Notice that the value needs to be escaped as it will be replaced as a string during parsing.
When parameters are positional, like question_mark, then their name is simply the order in which they appear, starting with 1.
[sqlfluff:templater:placeholder] param_style = question_mark 1 = 'john'
In case you need a parameter style different from the ones above, you can pass a custom regex.
[sqlfluff:templater:placeholder] param_regex = __(?P<param_name>[\w_]+)__ my_name = 'john'
N.B. quotes around param_regex in the config are interpreted literally by the templater. e.g. param_regex='__(?P<param_name>[w_]+)__' matches '__some_param__' not __some_param__
the named parameter param_name will be used as the key to replace, if missing, the parameter is assumed to be positional and numbers are used instead.
Also consider making a pull request to the project to have your style added, it may be useful to other people and simplify your configuration.
Two more advanced features of variable templating are case sensitivity and native python types. Both are illustrated in the following example:
[sqlfluff:templater:jinja:context] my_list = ['a', 'b', 'c'] MY_LIST = ("d", "e", "f") my_where_dict = {"field_1": 1, "field_2": 2}
SELECT
{% for elem in MY_LIST %}
'{{elem}}' {% if not loop.last %}||{% endif %}
{% endfor %} as concatenated_list FROM tbl WHERE
{% for field, value in my_where_dict.items() %}
{{field}} = {{value}} {% if not loop.last %}and{% endif %}
{% endfor %}
...will render as...
SELECT
'd' || 'e' || 'f' as concatenated_list FROM tbl WHERE
field_1 = 1 and field_2 = 2
Note that the variable was replaced in a case sensitive way and that the settings in the config file were interpreted as native python types.
Macros (which also look and feel like functions are available only in the jinja templater. Similar to Variable Templating, these are specified in config files, what's different in this case is how they are named. Similar to the context section above, macros are configured separately in the macros section of the config. Consider the following example.
If passed the following .sql file:
SELECT {{ my_macro(6) }} FROM some_table
...and the following configuration in .sqlfluff in the same directory (note the tight control of whitespace):
[sqlfluff:templater:jinja:macros] a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}
...then before parsing, the sql will be transformed to:
SELECT 6 + 12 FROM some_table
Note that in the code block above, the variable name in the config is a_macro_def, and this isn't apparently otherwise used anywhere else. Broadly this is accurate, however within the configuration loader this will still be used to overwrite previous values in other config files. As such this introduces the idea of config blocks which could be selectively overwritten by other configuration files downstream as required.
In addition to macros specified in the config file, macros can also be loaded from files or folders. This is specified in the config file:
[sqlfluff:templater:jinja] load_macros_from_path = my_macros
load_macros_from_path is a comma-separated list of .sql files or folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for macros in the folder /home/my_project/my_macros/ (but not subfolders). Any macros defined in the config will always take precedence over a macro defined in the path.
Note: The load_macros_from_path setting also defines the search path for Jinja include or import. Unlike with macros (as noted above), subdirectories are supported. For example, if load_macros_from_path is set to my_macros, and there is a file my_macros/subdir/my_file.sql, you can do:
{% include 'subdir/include_comment.sql' %}
NOTE:
REMEMBER: The reason SQLFluff supports macros is to enable it to parse templated sql without it being a blocker. It shouldn't be a requirement that the templating is accurate - it only needs to work well enough that parsing and linting are helpful.
One of the main use cases which inspired SQLFluff as a project was dbt. It uses jinja templating extensively and leads to some users maintaining large repositories of sql files which could potentially benefit from some linting.
NOTE:
To use the dbt templater, go to dbt Project Configuration.
SQLFluff anticipates this use case and provides some built in macro blocks in the Default Configuration which assist in getting started with dbt projects. In particular it provides mock objects for:
NOTE:
Ignoring Jinja templating errors provides a way for users to use SQLFluff while reducing or avoiding the need to spend a lot of time adding variables to [sqlfluff:templater:jinja:context].
When --ignore=templating is enabled, the Jinja templater behaves a bit differently. This additional behavior is usually but not always helpful for making the file at least partially parsable and fixable. It definitely doesn’t guarantee that every file can be fixed, but it’s proven useful for some users.
Here's how it works:
For example:
select {{ my_variable }} from {% include "my_table.sql" %}
is interpreted as:
select my_variable from my_table
The values provided by the Jinja templater act a bit (not exactly) like a mixture of several types:
Because the values behave like Undefined, it's possible to replace them using Jinja's default() filter. For example:
select {{ my_variable | default("col_a") }} from my_table
is interpreted as:
select col_a from my_table
If using SQLFluff for dbt with jinja as your templater, you may have library function calls within your sql files that can not be templated via the normal macro templating mechanisms:
SELECT foo, bar FROM baz {{ dbt_utils.group_by(2) }}
To template these libraries, you can use the sqlfluff:jinja:library_path config option:
[sqlfluff:templater:jinja] library_path = sqlfluff_libs
This will pull in any python modules from that directory and allow sqlfluff to use them in templates. In the above example, you might define a file at sqlfluff_libs/dbt_utils.py as:
def group_by(n):
return "GROUP BY 1,2"
If an __init__.py is detected, it will be loaded alongside any modules and submodules found within the library path.
SELECT
{{ custom_sum('foo', 'bar') }},
{{ foo.bar.another_sum('foo', 'bar') }} FROM
baz
sqlfluff_libs/__init__.py:
def custom_sum(a: str, b: str) -> str:
return a + b
sqlfluff_libs/foo/__init__.py:
# empty file
sqlfluff_libs/foo/bar.py:
def another_sum(a: str, b: str) -> str:
return a + b
NOTE:
dbt templating is still a relatively new feature added in 0.4.0 and is still in very active development! If you encounter an issue, please let us know in a GitHub issue or on the SQLFluff slack workspace.
dbt is not the default templater for SQLFluff (it is jinja). dbt is a complex tool, so using the default jinja templater will be simpler. You should be aware when using the dbt templater that you will be exposed to some of the complexity of dbt. Users may wish to try both templaters and choose according to how they intend to use SQLFluff.
A simple rule of thumb might be:
Pros:
Cons:
In order to get started using SQLFluff with a dbt project you will first need to install the relevant dbt adapter for your dialect and the sqlfluff-templater-dbt package using your package manager of choice (e.g. pip install dbt-postgres sqlfluff-templater-dbt) and then will need the following configuration:
In .sqlfluff:
[sqlfluff] templater = dbt
In .sqlfluffignore:
target/ # dbt <1.0.0 dbt_modules/ # dbt >=1.0.0 dbt_packages/ macros/
You can set the dbt project directory, profiles directory and profile with:
[sqlfluff:templater:dbt] project_dir = <relative or absolute path to dbt_project directory> profiles_dir = <relative or absolute path to the directory that contains the profiles.yml file> profile = <dbt profile> target = <dbt target>
NOTE:
To use builtin dbt Jinja functions SQLFluff provides a configuration option that enables usage within templates.
[sqlfluff:templater:jinja] apply_dbt_builtins = True
This will provide dbt macros like ref, var, is_incremental(). If the need arises builtin dbt macros can be customised via Jinja macros in .sqlfluff configuration file.
[sqlfluff:templater:jinja:macros] # Macros provided as builtins for dbt projects dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %} dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %} dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %} dbt_var = {% macro var(variable, default='') %}item{% endmacro %} dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %}
If your project requires that you pass variables to dbt through command line, you can specify them in template:dbt:context section of .sqlfluff. See below configuration and its equivalent dbt command:
[sqlfluff:templater:dbt:context] my_variable = 1
dbt run --vars '{"my_variable": 1}'
You already know you can pass arguments (--verbose, --exclude-rules, etc.) through the CLI commands (lint, fix, etc.):
$ sqlfluff lint my_code.sql -v --exclude-rules L022,L027
You might have arguments that you pass through every time, e.g rules you always want to ignore. These can also be configured:
[sqlfluff] verbose = 1 exclude_rules = L022,L027
Note that while the exclude_rules config looks similar to the above example, the verbose config has an integer value. This is because verbose is stackable meaning there are multiple levels of verbosity that are available for configuration. See CLI Reference for more details about the available CLI arguments. For more details about rule exclusion, see Enabling and Disabling Rules.
Similar to flake8's ignore, individual lines can be ignored by adding -- noqa to the end of the line. Additionally, specific rules can be ignored by quoting their code or the category.
-- Ignore all errors SeLeCt 1 from tBl ; -- noqa -- Ignore rule L014 & rule L030 SeLeCt 1 from tBl ; -- noqa: L014,L030 -- Ignore all parsing errors SeLeCt from tBl ; -- noqa: PRS
Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.
-- Ignore rule L012 from this line forward SELECT col_a a FROM foo -- noqa: disable=L012 -- Ignore all rules from this line forward SELECT col_a a FROM foo -- noqa: disable=all -- Enforce all rules from this line forward SELECT col_a a FROM foo -- noqa: enable=all
General categories of errors can be ignored using the --ignore command line option or the ignore setting in .sqlfluffignore. Types of errors that can be ignored include:
Similar to Git's .gitignore and Docker's .dockerignore, SQLFluff supports a .sqlfluffignore file to control which files are and aren't linted. Under the hood we use the python pathspec library which also has a brief tutorial in their documentation.
An example of a potential .sqlfluffignore placed in the root of your project would be:
# Comments start with a hash. # Ignore anything in the "temp" path /temp/ # Ignore anything called "testing.sql" testing.sql # Ignore any ".tsql" files *.tsql
Ignore files can also be placed in subdirectories of a path which is being linted and the sub files will also be applied within that subdirectory.
The default configuration is as follows, note the Builtin Macro Blocks in section [sqlfluff:templater:jinja:macros] as referred to above.
[sqlfluff] # verbose is an integer (0-2) indicating the level of log output verbose = 0 # Turn off color formatting of output nocolor = False # Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html # Or run 'sqlfluff dialects' dialect = None # One of [raw|jinja|python|placeholder] templater = jinja # Comma separated list of rules to check, default to all rules = all # Comma separated list of rules to exclude, or None exclude_rules = None # The depth to recursively parse to (0 for unlimited) recurse = 0 # Below controls SQLFluff output, see max_line_length for SQL output output_line_length = 80 # Number of passes to run before admitting defeat runaway_limit = 10 # Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating) ignore = None # Warn only for rule codes (one of more rule codes, seperated by commas: e.g. L001,L002) # Also works for templating and parsing errors by using TMP or PRS warnings = None # Ignore linting errors found within sections of code coming directly from # templated code (e.g. from within Jinja curly braces. Note that it does not # ignore errors from literal code found within template loops. ignore_templated_areas = True # can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig encoding = autodetect # Ignore inline overrides (e.g. to test if still required) disable_noqa = False # Comma separated list of file extensions to lint # NB: This config will only apply in the root folder sql_file_exts = .sql,.sql.j2,.dml,.ddl # Allow fix to run on files, even if they contain parsing errors # Note altering this is NOT RECOMMENDED as can corrupt SQL fix_even_unparsable = False # Very large files can make the parser effectively hang. # The more efficient check is the _byte_ limit check which # is enabled by default. The previous _character_ limit check # is still present for backward compatibility. This will be # removed in a future version. # Set either to 0 to disable. large_file_skip_char_limit = 0 large_file_skip_byte_limit = 20000 # CPU processes to use while linting. # If positive, just implies number of processes. # If negative or zero, implies number_of_cpus - specified_number. # e.g. -1 means use all processors but one. 0 means all cpus. processes = 1 [sqlfluff:indentation] # See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations indented_joins = False indented_ctes = False indented_using_on = True indented_on_contents = True template_blocks_indent = True # Layout configuration # See https://docs.sqlfluff.com/en/stable/layout.html#configuring-layout-and-spacing [sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing [sqlfluff:layout:type:binary_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:statement_terminator] spacing_before = touch line_position = trailing [sqlfluff:layout:type:end_of_file] spacing_before = touch [sqlfluff:layout:type:set_operator] line_position = alone:strict [sqlfluff:layout:type:start_bracket] spacing_after = touch [sqlfluff:layout:type:end_bracket] spacing_before = touch [sqlfluff:layout:type:start_square_bracket] spacing_after = touch [sqlfluff:layout:type:end_square_bracket] spacing_before = touch [sqlfluff:layout:type:casting_operator] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:comparison_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:assignment_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:object_reference] spacing_within = inline [sqlfluff:layout:type:comment] spacing_before = any spacing_after = any # Template loop tokens shouldn't dictate spacing around them. [sqlfluff:layout:type:template_loop] spacing_before = any spacing_after = any [sqlfluff:templater] unwrap_wrapped_queries = True [sqlfluff:templater:jinja] apply_dbt_builtins = True # Some rules can be configured directly from the config common to other rules [sqlfluff:rules] tab_space_size = 4 # Max line length is set by default to be in line with the dbt style guide. # https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md max_line_length = 80 indent_unit = space allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all # Some rules have their own specific config [sqlfluff:rules:L003] hanging_indents = True [sqlfluff:rules:L010] # Keywords capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L011] # Aliasing preference for tables aliasing = explicit [sqlfluff:rules:L012] # Aliasing preference for columns aliasing = explicit [sqlfluff:rules:L014] # Unquoted identifiers extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L016] # Line length ignore_comment_lines = False ignore_comment_clauses = False [sqlfluff:rules:L027] # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L026] # References must be in FROM clause # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:L028] # References must be consistently used # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:L029] # Keywords should not be used as identifiers. unquoted_identifiers_policy = aliases quoted_identifiers_policy = none # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L030] # Function names extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L031] # Avoid table aliases in from clauses and join conditions. # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:L036] wildcard_policy = single [sqlfluff:rules:L038] # Trailing commas select_clause_trailing_comma = forbid [sqlfluff:rules:L040] # Null & Boolean Literals capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L042] # By default, allow subqueries in from clauses, but not join clauses forbid_subquery_in = join [sqlfluff:rules:L047] # Consistent syntax to count all rows prefer_count_1 = False prefer_count_0 = False [sqlfluff:rules:L051] # Fully qualify JOIN clause fully_qualify_join_types = inner [sqlfluff:rules:L052] # Semi-colon formatting approach multiline_newline = False require_final_semicolon = False [sqlfluff:rules:L054] # GROUP BY/ORDER BY column references group_by_and_order_by_style = consistent [sqlfluff:rules:L057] # Special characters in identifiers unquoted_identifiers_policy = all quoted_identifiers_policy = all allow_space_in_identifier = False additional_allowed_characters = None ignore_words = None ignore_words_regex = None [sqlfluff:rules:L059] # Policy on quoted and unquoted identifiers prefer_quoted_identifiers = False ignore_words = None ignore_words_regex = None force_enable = False [sqlfluff:rules:L062] # Comma separated list of blocked words that should not be used blocked_words = None blocked_regex = None [sqlfluff:rules:L063] # Data Types extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L064] # Consistent usage of preferred quotes for quoted literals preferred_quoted_literal_style = consistent # Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres) force_enable = False [sqlfluff:rules:L066] min_alias_length = None max_alias_length = None [sqlfluff:rules:L067] # SQL type casting preferred_type_casting_style = consistent
SQLFluff exposes a public api for other python applications to use. A basic example of this usage is given here, with the documentation for each of the methods below.
"""This is an example of how to use the simple sqlfluff api.""" from typing import Any, Dict, Iterator, List, Union import sqlfluff # -------- LINTING ---------- my_bad_query = "SeLEct *, 1, blah as fOO from mySchema.myTable" # Lint the given string and return an array of violations in JSON representation. lint_result = sqlfluff.lint(my_bad_query, dialect="bigquery") # lint_result = # [ # { # "code": "L010", # "line_no": 1, # "line_pos": 1, # "description": "Keywords must be consistently upper case.", # } # ... # ] # -------- FIXING ---------- # Fix the given string and get a string back which has been fixed. fix_result_1 = sqlfluff.fix(my_bad_query, dialect="bigquery") # fix_result_1 = 'SELECT *, 1, blah AS foo FROM myschema.mytable\n' # We can also fix just specific rules. fix_result_2 = sqlfluff.fix(my_bad_query, rules=["L010"]) # fix_result_2 = 'SELECT *, 1, blah AS fOO FROM mySchema.myTable' # Or a subset of rules... fix_result_3 = sqlfluff.fix(my_bad_query, rules=["L010", "L014"]) # fix_result_3 = 'SELECT *, 1, blah AS fOO FROM myschema.mytable' # -------- PARSING ---------- # Parse the given string and return a JSON representation of the parsed tree. parse_result = sqlfluff.parse(my_bad_query) # parse_result = {'file': {'statement': {...}, 'newline': '\n'}} # This JSON structure can then be parsed as required. # An example usage is shown below: def get_json_segment(
parse_result: Dict[str, Any], segment_type: str ) -> Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
"""Recursively search JSON parse result for specified segment type.
Args:
parse_result (Dict[str, Any]): JSON parse result from `sqlfluff.fix`.
segment_type (str): The segment type to search for.
Yields:
Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
Retrieves children of specified segment type as either a string for a raw
segment or as JSON or an array of JSON for non-raw segments.
"""
for k, v in parse_result.items():
if k == segment_type:
yield v
elif isinstance(v, dict):
yield from get_json_segment(v, segment_type)
elif isinstance(v, list):
for s in v:
yield from get_json_segment(s, segment_type) # e.g. Retrieve array of JSON for table references. table_references = list(get_json_segment(parse_result, "table_reference")) print(table_references) # [[{'identifier': 'mySchema'}, {'dot': '.'}, {'identifier': 'myTable'}]] # Retrieve raw table name from last identifier in the table reference. for table_reference in table_references:
table_name = list(get_json_segment(parse_result, "naked_identifier"))[-1]
print(f"table_name: {table_name}") # table_name: myTable
The simple API presents only a fraction of the functionality present within the core SQLFluff library. For more advanced use cases, users can import the Linter() and FluffConfig() classes from sqlfluff.core. As of version 0.4.0 this is considered as experimental only as the internals may change without warning in any future release. If you come to rely on the internals of SQLFluff, please post an issue on GitHub to share what you're up to. This will help shape a more reliable, tidy and well documented public API for use.
It is recommended that the following is read in conjunction with exploring the codebase. dialect_ansi.py in particular is helpful to understand the recursive structure of segments and grammars. Some more detail is also given on our Wiki including a Contributing Dialect Changes guide.
At a high level, the behaviour of SQLFluff is divided into a few key stages. Whether calling sqlfluff lint, sqlfluff fix or sqlfluff parse, the internal flow is largely the same.
This stage only applies to templated SQL, most commonly Jinja and dbt. Vanilla SQL is sent straight to stage 2, the lexer.
In order to lint templated SQL, SQLFluff must first convert the 'raw' or pre-templated code into valid SQL, which can then be parsed. The templater returns both the raw and post-templated SQL so that any rule violations which occur in templated sections can be ignored and the rest mapped to their original line location for user feedback.
SQLFluff supports two templating engines: Jinja and dbt.
Under the hood dbt also uses Jinja, but in SQLFluff uses a separate mechanism which interfaces directly with the dbt python package.
For more details on how to configure the templater see Jinja Templating Configuration.
The lexer takes SQL and separates it into segments of whitespace and code. No meaning is imparted; that is the job of the parser.
The parser is arguably the most complicated element of SQLFluff, and is relied on by all the other elements of the tool to do most of the heavy lifting.
#. If no match is found for a segment, the contents will be wrapped in an UnparsableSegment which is picked up as a parsing error later.
When working on the parser there are a couple of design principles to keep in mind.
Given the complete parse tree, rule classes check for linting errors by traversing the tree, looking for segments and patterns of concern. If the rule discovers a violation, it returns a LintResult pointing to the segment which caused the violation.
Some rules are able to fix the problems they find. If this is the case, the rule will return a list of fixes, which describe changes to be made to the tree. This can include edits, inserts, or deletions. Once the fixes have been applied, the updated tree is written to the original file.
Many rules supported by SQLFluff involve the spacing and layout of different elements, either to enforce a particular layout or just to add or remove code elements in a way sensitive to the existing layout configuration. The way this is achieved is through some centralised utilities in the sqlfluff.utils.reflow module.
This module aims to achieve several things: * Less code duplication by implementing reflow logic in only one place.
To support this, the module provides a ReflowSequence class which allows access to all of the relevant operations which can be used to reformat sections of code, or even a whole file. Unless there is a very good reason, all rules should use this same approach to ensure consistent treatment of layout.
Rules in SQLFluff are implemented as classes inheriting from BaseRule. SQLFluff crawls through the parse tree of a SQL file, calling the rule's _eval() function for each segment in the tree. For many rules, this allows the rule code to be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away.
Some rules are a poor fit for the simple traversal pattern described above. Typical reasons include:
These rules can override BaseRule's recurse_into field, setting it to False. For these rules False, _eval() is only called once, with the root segment of the tree. This can be much more efficient, especially on large files. For example, see rules L050 and L009 , which only look at the beginning or end of the file, respectively.
By default, SQLFluff calls _eval() for all segments, even "unparsable" segments, i.e. segments that didn't match the parsing rules in the dialect. This causes issues for some rules. If so, setting _works_on_unparsable to False tells SQLFluff not to call _eval() for unparsable segments and their descendants.
These are other fields on BaseRule. Rules can override them.
needs_raw_stack defaults to False. Some rules use RuleContext.raw_stack property to access earlier segments in the traversal. This can be useful, but it adds significant overhead to the linting process. For this reason, it is disabled by default.
There are two phases of rule running.
1. The main phase is appropriate for most rules. These rules are assumed to interact and potentially cause a cascade of fixes requiring multiple passes. These rules run the runaway_limit number of times (default 10).
2. The post phase is for post-processing rules, not expected to trigger any downstream rules, e.g. capitalization fixes. They are run in a post-processing loop at the end. This loop is identical to the main loop, but is only run 2 times at the end (once to fix, and once again to confirm no remaining issues).
The two phases add complexity, but they also improve performance by allowing SQLFluff to run fewer rules during the main phase, which often runs several times.
NOTE: post rules also run on the first pass of the main phase so that any issues they find will be presented in the list of issues output by sqlfluff fix and sqlfluff lint.
These newer modules provide a higher-level API for rules working with segments and slices. Rules that need to navigate or search the parse tree may benefit from using these. Eventually, the plan is for all rules to use these modules. As of December 30, 2021, 17+ rules use these modules.
The modules listed below are submodules of sqlfluff.utils.functional.
SQLFluff is extensible through "plugins". We use the pluggy library to make linting Rules pluggable, which enable users to implement rules that are just too "organization specific" to be shared, or too platform specific to be included in the core library.
We have an example plugin in sqlfluff/plugins/sqlfluff-plugin-example which you can use as a template for rules, or the sqlfluff/plugins/sqlfluff-templater-dbt which you can use as a template for templater plugins.
Currently, only Rules and Templaters can be added through plugins. Over time we expect more elements of SQLFluff will be extensible with plugins. Each plugin can implement multiple Rules or Templaters.
We recommend that the name of a plugin should start with "sqlfluff-" to be clear on the purpose of your plugin.
A plugin may need to include a default configuration if its rules are configurable: use plugin default configurations only for that reason! We advise against overwriting core configurations by using a default plugin configuration, as there is no mechanism in place to enforce precedence between the core library configs and plugin configs, and multiple plugins could clash.
A plugin Rule class name should have the structure: "Rule_PluginName_L000". The 'L' can be any letter and is meant to categorize rules; you could use the letter 'S' to denote rules that enforce security checks for example.
An important thing to note when running custom implemented rules: Run pip install -e ., inside the plugin folder so custom rules in linting are included.
A plugin Rule code includes the PluginName, so a rule "Rule_L000" in core will have code "L000", while "Rule_PluginName_L000" will have code "PluginName_L000". Codes are used to display errors, they are also used as configuration keys.
We make it easy for plugin developers to test their rules by exposing a testing library in sqlfluff.utils.testing.
Would you like to have other parts of SQLFluff be "pluggable"? Tell us about it in a GitHub issue 😄.
Want to find other people who are using SQLFluff in production use cases? Want to brag about how you're using it? Just want to show solidarity with the project and provide a testimonial for it?
Just add a section below by raising a PR on GitHub by editing this file ✏️.
We have a fast-growing community on Slack, come and join us!
Follow us On Twitter @SQLFluff for announcements and other related posts.
Alan Cruickshank
2023, Alan Cruickshank
February 3, 2023 |