| SQLFLUFF(1) | SQLFluff | SQLFLUFF(1) |
sqlfluff - SQLFluff 3.3.1
Bored of not having a good SQL linter that works with whichever dialect you're working with? SQLFluff is an extensible and modular linter designed to help you write good SQL and catch errors and bad SQL before it hits your database.
Notable releases:
For more detail on other releases, see our Release Notes.
Want to see where and how people are using SQLFluff in their projects? Head over to SQLFluff in the Wild for inspiration.
To get started, just install the package, create a SQL file and then run SQLFluff, pointing it at the file. For more details, or if you don't have python or pip already installed, see Getting Started.
$ pip install sqlfluff
$ echo "  SELECT a  +  b FROM tbl;  " > test.sql
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.
                       | Found '  '. [layout.spacing]
L:   1 | P:   1 | LT02 | First line should not be indented.
                       | [layout.indent]
L:   1 | P:   1 | LT13 | Files must not begin with newlines or whitespace.
                       | [layout.start_of_file]
L:   1 | P:  11 | LT01 | Expected only single space before binary operator '+'.
                       | Found '  '. [layout.spacing]
L:   1 | P:  14 | LT01 | Expected only single space before naked identifier.
                       | Found '  '. [layout.spacing]
L:   1 | P:  27 | LT01 | Unnecessary trailing whitespace at end of file.
                       | [layout.spacing]
L:   1 | P:  27 | LT12 | Files must end with a single trailing newline.
                       | [layout.end_of_file]
All Finished 📜 🎉!
To get started with SQLFluff you'll need python and pip installed on your machine. If you're already set up, you can skip straight to Installing sqlfluff.
How to install python and pip depends on what operating system you're using. In any case, the python wiki provides up to date instructions for all platforms here.
There's a chance that you'll be offered the choice between python versions. Support for python 2 was dropped in early 2020, so you should always opt for a version number starting with a 3. As for more specific options beyond that, SQLFluff aims to be compatible with all current python versions, and so it's best to pick the most recent.
You can confirm that python is working as expected by heading to your terminal or console of choice and typing python --version which should give you a sensible read out and not an error.
$ python --version
Python 3.9.1
For most people, their installation of python will come with pip (the python package manager) preinstalled. To confirm this you can type pip --version similar to python above.
$ pip --version
pip 21.3.1 from ...
If however, you do have python installed but not pip, then the best instructions for what to do next are on the python website.
Assuming that python and pip are already installed, installing SQLFluff is straightforward.
$ pip install sqlfluff
You can confirm its installation by getting SQLFluff to show its version number.
$ sqlfluff version
3.3.1
To get a feel for how to use SQLFluff it helps to have a small .sql file which has a simple structure and some known issues for testing. Create a file called test.sql in the same folder that you're currently in with the following content:
SELECT a+b  AS foo,
c AS bar from my_table
You can then run sqlfluff lint test.sql --dialect ansi to lint this file.
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT09 | Select targets should be on a new line unless there is
                       | only one select target.
                       | [layout.select_targets]
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations
                       | and aggregates. [structure.column_order]
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
                       | [layout.indent]
L:   1 | P:   9 | LT01 | Expected single whitespace between naked identifier and
                       | binary operator '+'. [layout.spacing]
L:   1 | P:  10 | LT01 | Expected single whitespace between binary operator '+'
                       | and naked identifier. [layout.spacing]
L:   1 | P:  11 | LT01 | Expected only single space before 'AS' keyword.
                       | Found '  '. [layout.spacing]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   2 | P:   9 | LT02 | Expected line break and no indent before 'from'.
                       | [layout.indent]
L:   2 | P:  10 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
All Finished 📜 🎉!
You'll see that SQLFluff has failed the linting check for this file. On each of the following lines you can see each of the problems it has found, with some information about the location and what kind of problem it is. One of the errors has been found on line 1, position 9 (as shown by L: 1 | P: 9) and it's a problem with rule LT01 (for a full list of rules, see Rules Reference). From this (and the following error) we can see that the problem is that there is no space on either side of the + symbol in a+b. Head into the file and correct this issue so that the file now looks like this:
SELECT a + b  AS foo,
c AS bar from my_table
Rerun the same command as before, and you'll see that the original error (violation of LT01) no longer shows up.
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT09 | Select targets should be on a new line unless there is
                       | only one select target.
                       | [layout.select_targets]
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations
                       | and aggregates. [structure.column_order]
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
                       | [layout.indent]
L:   1 | P:  13 | LT01 | Expected only single space before 'AS' keyword.
                       | Found '  '. [layout.spacing]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   2 | P:   9 | LT02 | Expected line break and no indent before 'from'.
                       | [layout.indent]
L:   2 | P:  10 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
To fix the remaining issues, we're going to use one of the more advanced features of SQLFluff: the fix command. This allows more automated fixing of some errors, to save you time in sorting out your SQL files. Not all rules can be fixed in this way, and there may be some situations where a fix cannot be applied because of the context of the query, but in many simple cases it's a good place to start.
For now, we only want to fix the following rules: LT02, LT12, CP01
$ sqlfluff fix test.sql --rules LT02,LT12,CP01 --dialect ansi
==== finding violations ====
== [test.sql] FAIL
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
                       | [layout.indent]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   2 | P:   9 | LT02 | Expected line break and no indent before 'FROM'.
                       | [layout.indent]
L:   2 | P:  10 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
==== fixing violations ====
4 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n]
...at this point you'll have to confirm that you want to make the changes by pressing y on your keyboard...
Are you sure you wish to attempt to fix these? [Y/n] ...
Attempting fixes...
Persisting Changes...
== [test.sql] PASS
Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content is now different.
SELECT
    a + b  AS foo,
    c AS bar
FROM my_table
In particular, each select target is now on its own indented line, and the from keyword has been capitalised to FROM.
We could also fix all of the fixable errors by not specifying --rules.
$ sqlfluff fix test.sql --dialect ansi
==== finding violations ====
== [test.sql] FAIL
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations
                       | and aggregates. [structure.column_order]
L:   2 | P:  10 | LT01 | Expected only single space before 'AS' keyword.
                       | Found '  '. [layout.spacing]
==== fixing violations ====
2 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n] ...
Attempting fixes...
Persisting Changes...
== [test.sql] PASS
Done. Please check your files to confirm.
If we now open up test.sql, we'll see the content has been updated again.
SELECT
    c AS bar,
    a + b AS foo
FROM my_table
The SQL statement is now well formatted according to all the rules defined in SQLFluff.
The --rules argument is optional, and could be useful when you or your organisation follows a slightly different convention than what we have defined.
So far we've covered the stock settings of SQLFluff, but there are many different ways that people style their SQL, and if you or your organisation have different conventions, then many of these behaviours can be configured. For example, given the example above, what if we actually think that indents should only be two spaces, and that rather than uppercase keywords, they should all be lowercase?
To achieve this we create a configuration file named .sqlfluff and place it in the same directory as the current file. In that file put the following content:
[sqlfluff]
dialect = ansi

[sqlfluff:indentation]
tab_space_size = 2

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
Then rerun the same command as before.
$ sqlfluff fix test.sql --rules LT02,LT12,CP01,ST06,LT09,LT01
Then examine the file again, and you'll notice that the file has been fixed accordingly.
select
  c as bar,
  a + b as foo
from my_table
For a full list of configuration options check out Default Configuration. Note that in our example here we've only set a few configuration values and any other configuration settings remain as per the default config. To see how these options apply to specific rules check out the "Configuration" section within each rule's documentation in Rules Reference.
From here, there are several more things to explore.
One last thing to note is that SQLFluff is a relatively new project and you may find bugs or strange things while using it. If you do find anything, the most useful thing you can do is to post the issue on GitHub where the maintainers of the project can work out what to do with it. The project is in active development and so updates and fixes may come out regularly.
SQL has been around for a long time as a language for communicating with databases, like a communication protocol. More recently, with the rise of data as a business function, or a domain in its own right, SQL has also become an invaluable tool for defining the structure of data and analysis - not just as a one-off but as a form of infrastructure as code.
As analytics transitions from a profession of people doing one-offs, and moves to building stable and reusable pieces of analytics, more and more principles from software engineering are moving into the analytics space. One of the best articulations of this is written in the viewpoint section of the docs for the open-source tool dbt. Two of the principles mentioned in that article are quality assurance and modularity.
The primary aim of SQLFluff as a project is in service of that first aim of quality assurance. With larger and larger teams maintaining large bodies of SQL code, it becomes more and more important that the code is not just valid but also easily comprehensible by other users of the same codebase. One way to ensure readability is to enforce a consistent style, and the tools used to do this are called linters.
Some famous linters which are well known in the software community are flake8 and jslint (the former is used to lint the SQLFluff project itself).
SQLFluff aims to fill this space for SQL.
SQL itself doesn't lend itself well to modularity, so to introduce some flexibility and reusability it is often templated. Typically this is done in the wild in one of the following ways:
"SELECT {foo} FROM {tbl}".format(foo="bar", tbl="mytable")
Which would evaluate to:
SELECT bar FROM mytable
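The other common approach is Jinja templating, which is also the default templater in SQLFluff. A minimal sketch of the same query rendered with the jinja2 python package (the variable names here are just illustrative):

from jinja2 import Template

# Render a Jinja-templated query with dummy values.
print(Template("SELECT {{ foo }} FROM {{ tbl }}").render(foo="bar", tbl="mytable"))
# prints: SELECT bar FROM mytable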
All of these templating tools are great for modularity, but they also mean that the SQL files themselves are no longer valid SQL code, because they now contain these placeholder expressions rather than concrete values.
SQLFluff supports both of the templating methods outlined above, as well as dbt projects, to allow you to still lint these "dynamic" SQL files as part of your CI/CD pipeline (which is great 🙌), rather than waiting until you're in production (which is bad 🤦, and maybe too late).
During the CI/CD pipeline (or any time that we need to handle templated code), SQLFluff needs additional info in order to interpret your templates as valid SQL code. You do so by providing dummy parameters in SQLFluff configuration files. When substituted into the template, these values should evaluate to valid SQL (so SQLFluff can check its style, formatting, and correctness), but the values don't need to match actual values used in production. This means that you can use much simpler dummy values than what you would really use. The recommendation is to use the simplest possible dummy value that still allows your code to evaluate to valid SQL so that the configuration values can be as streamlined as possible.
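For example, if a query refers to a Jinja variable named tbl, a dummy value for it can be supplied in your .sqlfluff file. A minimal sketch (the variable name and value are purely illustrative):

[sqlfluff:templater:jinja:context]
tbl = my_table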
SQLFluff has a few components:
The core vision for SQLFluff is to be really good at being the linter. The reasoning for this is outlined in Why SQLFluff?.
Most of the codebase for SQLFluff is the parser, mostly because at the point of developing SQLFluff, there didn't appear to be a good option for a whitespace-aware parser that could be used instead.
With regards to the rules, SQLFluff aims to be opinionated, but it also accepts that many organisations and groups have pre-existing strong conventions around how to write SQL, and so ultimately SQLFluff should be flexible enough to support whichever rule set a user wishes to use.
This section of short guides and articles is meant to be read alongside the rest of the documentation, which is more reference-oriented.
Rolling out SQLFluff, like rolling out any other linter or style guide, is not just about the technical rollout, but also how you introduce the tool to the team and organisation around you.
With that in mind, it's worth reminding ourselves what we're trying to achieve with a tool like this. A set of potential success criteria might be:
You like leading commas? Make a PR to .sqlfluff and let's discuss with the team what the implications would be.
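For instance, such a PR might be as small as the following change to the shared .sqlfluff file - a sketch of the relevant layout settings for leading commas, assuming defaults elsewhere:

[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = leading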
Consider which of these success measures is most important and most desirable for your team. Write that down.
The following steps are a guide, which you should adapt to your organisation, and in particular its level of data maturity.
This step is done by you, or a small group of people who already think that linting is a good idea.
There are three sensible rollout phases:
In each of these phases you have three levers to play with:
Work out a sensible roadmap of how hard you want to go in each phase. Be clear who is responsible for changes at each phase. An example plan might look like this:
Bring your team together to introduce both linting as a concept and also SQLFluff as a tool. At this stage it's really important that the team understand *why* this is a good thing.
Consider whether to discuss the whole plan from step 2, or whether to only talk about the first few steps. Aim to make this an empowering experience that everyone can get involved with rather than another piece of admin they need to do.
At this stage, you might also want to consider other tools in the SQLFluff ecosystem such as the SQLFluff pre-commit hook and the SQLFluff VSCode plugin or SQLFluff online formatter.
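As an illustration, wiring SQLFluff into pre-commit typically only needs a few lines of .pre-commit-config.yaml. This is a sketch assuming the standard sqlfluff-lint and sqlfluff-fix hook ids - pin rev to whichever release you actually use:

repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.3.1
    hooks:
      - id: sqlfluff-lint
      - id: sqlfluff-fix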
Once the plan is in motion, make sure to start putting in place norms and rituals around how you change the rules. In particular:
It's normal for your usage of tools like SQLFluff to change and evolve over time. It's important to expect this change in advance, and welcome it when it happens. Always make sure you're driving toward the success measures you decided up front, rather than just resisting the change.
Did it work? If so, spread the word. Tell a friend about SQLFluff.
If you're lucky they might share your views on comma placement 🤷♀️.
It's quite common to have organisation- or project-specific norms and conventions you might want to enforce using SQLFluff. With a little bit of python knowledge this is very achievable with SQLFluff, and there's a plugin architecture to support that.
This guide should be read alongside the code for the SQLFluff example plugin and the more technical documentation for Developing Plugins.
When thinking about developing a rule, the following thought process will help you decide what to develop:
One of the most common questions asked with respect to custom plugins is discovery, or "how do I tell SQLFluff where my plugin is". SQLFluff uses pluggy as its plugin architecture (developed by the folks at pytest). Pluggy uses the python packaging metadata for plugin discovery. This means that your plugin must be installed as a python package for discovery. Specifically, it must define an entry point for SQLFluff. When SQLFluff runs, it inspects installed python packages for this entry point and can then run any which define one. For example you'll see in the SQLFluff example plugin that the pyproject.toml file has the following section:
[project.entry-points.sqlfluff]
# Change this name in your plugin, e.g. company name or plugin purpose.
sqlfluff_example = "sqlfluff_plugin_example"
You can find equivalent examples for setup.cfg and setup.py in the python docs for entry points. This information is registered on install of your plugin (i.e. when running pip install, or the equivalent if you're using a different package manager), so if you change it later, you may need to re-install your plugin.
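For reference, the rough equivalent in a setup.cfg file, using the same illustrative names as the pyproject.toml snippet above, would be:

[options.entry_points]
sqlfluff =
    sqlfluff_example = sqlfluff_plugin_example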
You can test whether your rule has been successfully discovered by running sqlfluff rules and reviewing whether your new rule has been included in the readout.
It can at times be tricky to troubleshoot SQLFluff as it exists within an ecosystem of other tools, and can be deployed in a wide range of ways.
This step by step guide can help you narrow down what's likely going wrong and point you toward the swiftest resolution.
There are a few error messages you may get which have relatively straightforward resolution paths.
SQLFluff needs to be able to parse your SQL to understand its structure. That means if it fails to parse the SQL it will give you an error message. The intent is that if SQLFluff cannot parse the SQL, then the SQL is likely also invalid, and the error should help you understand where and why.
For example, this is a simple query which is not valid SQL:
select 1 2 3
from my_table
When running sqlfluff lint or sqlfluff parse we get the following error message:
==== parsing violations ====
L:   1 | P:  10 |  PRS | Line 1, Position 10: Found unparsable section: '2 3'
Furthermore if we look at the full parsing output we can see an unparsable section in the parse tree:
[L: 1, P: 1] |file:
[L: 1, P: 1] |    statement:
[L: 1, P: 1] |        select_statement:
[L: 1, P: 1] |            select_clause:
[L: 1, P: 1] |                keyword: 'select'
[L: 1, P: 7] |                [META] indent:
[L: 1, P: 7] |                whitespace: ' '
[L: 1, P: 8] |                select_clause_element:
[L: 1, P: 8] |                    numeric_literal: '1'
[L: 1, P: 9] |                [META] dedent:
[L: 1, P: 9] |                whitespace: ' '
[L: 1, P: 10] |                unparsable: !! Expected: 'Nothing here.'
[L: 1, P: 10] |                    numeric_literal: '2'
[L: 1, P: 11] |                    whitespace: ' '
[L: 1, P: 12] |                    numeric_literal: '3'
[L: 1, P: 13] |            newline: '\n'
[L: 2, P: 1] |            from_clause:
[L: 2, P: 1] |                keyword: 'from'
[L: 2, P: 5] |                whitespace: ' '
[L: 2, P: 6] |                from_expression:
[L: 2, P: 6] |                    [META] indent:
[L: 2, P: 6] |                    from_expression_element:
[L: 2, P: 6] |                        table_expression:
[L: 2, P: 6] |                            table_reference:
[L: 2, P: 6] |                                naked_identifier: 'my_table'
[L: 2, P: 14] |                    [META] dedent:
[L: 2, P: 14] |    newline: '\n'
[L: 3, P: 1] |    [META] end_of_file:
SQLFluff maintains its own version of each SQL dialect, and this may not be exhaustive for some of the dialects which are newer to SQLFluff or which are in very active development themselves. This means in some scenarios you may find a query which runs fine in your environment, but cannot be parsed by SQLFluff. This is not a "bug" per se, but is an indicator of a gap in the SQLFluff dialect.
Many of the issues raised on GitHub relate to parsing errors like this, but it's also a great way to support the project if you feel able to contribute a dialect improvement yourself. We have a short guide on Contributing dialect changes to walk you through the process. In the short term you can also ignore specific files from your overall project so that this specific file doesn't become a blocker for the rest. See Ignoring Errors & Files.
If you're getting either unexpected behaviour with your config, or errors because config values haven't been set correctly, it's often due to config file discovery (i.e. whether SQLFluff can find your config file, and what order it's combining config files in).
For a more general guide to this topic see Setting Configuration.
To help troubleshoot issues, if you run sqlfluff with a more verbose logging setting (e.g. sqlfluff lint /my/model.sql -v, or -vv, or -vvvvvv) you'll get a readout of the root config that SQLFluff is using. This can help debug which values are being used.
If you're still getting strange errors, then the next most useful thing you can do, both to help narrow down the cause, but also to assist with fixing a bug if you have found one, is to isolate SQLFluff from any other tools you're using in parallel:
Often SQL scripts can get very long, and if you're getting an error on a very long script, then it can be extremely difficult to work out what the issue is. To assist with this we recommend iteratively cutting down the file (or alternatively, iteratively building a file back up) until you have the smallest file which still exhibits the issue. Often after this step, the issue can become obvious.
It is recommended that the following is read in conjunction with exploring the codebase. dialect_ansi.py in particular is helpful to understand the recursive structure of segments and grammars.
You may also need to reference the Internal API.
At a high level, the behaviour of SQLFluff is divided into a few key stages. Whether calling sqlfluff lint, sqlfluff fix or sqlfluff parse, the internal flow is largely the same.
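These stages are also visible if you drive SQLFluff from python rather than the command line. The sketch below assumes the sqlfluff.core.Linter class, which wraps the templater, lexer, parser and rules described in the following sections:

from sqlfluff.core import Linter

linter = Linter(dialect="ansi")

# Templating (if any), lexing and parsing happen here...
parsed = linter.parse_string("select 1 from my_table\n")
print(parsed.tree.stringify())

# ...while lint_string additionally runs the rules over the parse tree.
linted = linter.lint_string("select 1 from my_table\n")
for violation in linted.get_violations():
    print(violation)

The first of these stages is the templater.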
This stage only applies to templated SQL. Vanilla SQL is sent straight to stage 2, the lexer.
In order to lint templated SQL, SQLFluff must first convert the 'raw' or pre-templated code into valid SQL, which can then be parsed. The templater returns both the raw and post-templated SQL so that any rule violations which occur in templated sections can be ignored and the rest mapped to their original line location for user feedback.
SQLFluff supports multiple templating engines:
Under the hood dbt also uses Jinja, but SQLFluff uses a separate mechanism which interfaces directly with the dbt python package.
For more details on how to configure the templater see Templating Configuration.
The lexer takes SQL and separates it into segments of whitespace and code. Where we can impart some high level meaning to segments, we do, but the result of this operation is still a flat sequence of typed segments (all subclasses of RawSegment).
The parser is arguably the most complicated element of SQLFluff, and is relied on by all the other elements of the tool to do most of the heavy lifting.
When working on the parser there are a couple of design principles to keep in mind.
Given the complete parse tree, rule classes check for linting errors by traversing the tree, looking for segments and patterns of concern. If the rule discovers a violation, it returns a LintResult pointing to the segment which caused the violation.
Some rules are able to fix the problems they find. If this is the case, the rule will return a list of fixes, which describe changes to be made to the tree. This can include edits, inserts, or deletions. Once the fixes have been applied, the updated tree is written to the original file.
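The end-to-end effect of that parse, lint and fix cycle is also exposed through the simple python API, which can be a handy way to experiment with how fixes behave. A minimal sketch (the query string is illustrative, and exactly which fixes apply depends on the rules enabled):

import sqlfluff

# Returns the fixed SQL as a string.
print(sqlfluff.fix("SELECT  a+b  FROM tbl", dialect="ansi"))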
Many contributors may not be familiar with Git, and it can be a confusing world for those new to it, with perplexing terms like clone, fork, branch, merge conflict and rebase. This guide aims to give those of you new to Git some information on the way we think it is best to work with it, and also to serve as a quick reference to some of the Git terms and commands to use.
This section will give some basic background to complete newbies to Git. Feel free to skip to the next section, Recommended way to use Git for SQLFluff, where we talk about how we use it on SQLFluff if you understand the basics already.
Git is a distributed version control system. That mouthful basically means it's a way of keeping track of changes to our source code and other content - especially when many, many people are changing various parts of it. The distributed part of it is what makes Git so interesting (and so complicated!) - there can be many copies of our code, and that can cause fun and games when trying to keep it in sync!
The original and primary copy of a code base (called a repository or repo) is hosted on a server (e.g. GitHub), people will be working on copies in their local machine, and people may have forked a copy of the repo to another one also hosted on the server - and then that forked copy may also be copied locally to your machine. Add in different branches in any of those copies and it can quickly become quite confusing.
Git often involves working with the command line, which might be less familiar and a bit intimidating for those of you less technically minded. Graphical front end tools exist to try to replicate this command line functionality but it's helpful to have some familiarity with using Git on the command line and with a guide like this, hopefully that becomes less daunting a prospect!
GitHub is not Git, but it is one of the most commonly used instances of Git and adds various features on top of the core versioning of code that Git handles. The main thing GitHub gives you is a Git server to store your code, and a nice web front end to manage it all through. Using the web front end you can view (and even change!) code, raise issues, open and review pull requests, use GitHub Actions to automate things (e.g. test code) and even host wiki pages like this one.
In this Wiki I've tried to differentiate between Git concepts and commands and those specific to GitHub. Other instances of Git that you might be familiar with, or use in work or other projects, include GitLab and BitBucket. They have many of the same features as GitHub.
GitHub also has a graphical front end tool called GitHub Desktop for working on Git locally and syncing it back to GitHub. Check out the GitHub Desktop section for tips on how to use it.
SQLFluff makes extensive use of GitHub to help us manage the project and allow all the many disparate contributors to collaborate easily.
While it is possible to work just using GitHub's website - especially if you're just commenting on issues and adding your advice - managing the code really is best done locally on your own computer and then pushing changes back up to GitHub. Git is very popular and widely available (see installation instructions for Windows, Mac & Linux). You may already have it installed, so to check if that's the case, open a command line and type:
git --version
If you see a version number returned then you've passed the first step!
If not, then for Windows I recommend installing and using Git Bash which is a Linux-like command line. For MacOS the built in Terminal available under Launchpad is fine and running the above version check will prompt you to install XCode and Git. For Linux I presume you'll be familiar with how to install this.
A Git Repository or Repo is a collection of all the code that makes up a project. Well, that's not strictly true, as a project may also depend on other programs and libraries, but typically those are not stored in the project repo - only the code specific to this project is stored, along with config files that are used to install any libraries needed to run the code (e.g. using a command like npm install for node modules).
The main SQLFluff repo is available on GitHub at: https://github.com/sqlfluff/sqlfluff. However, we also have a few other repos for the VS Code extension and the like, available at https://github.com/sqlfluff.
A repo will usually contain a number of branches. These are copies of the code where you can work independently on a particular item. The name branch is used because, like a tree, these can diverge from each other - though, unlike a tree, they are usually merged back when the work is complete.
There will be one main (or master) branch which everything should be merged back into when ready. Traditionally these have been called the master branch, but many projects are trying to use more inclusive language and have switched to using the name main or similar instead. SQLFluff moved to using main in 2021.
Creating a branch is very quick and is integral to how Git works. Git stores branches in an incredibly efficient way and doesn't literally keep a full copy of the same code for each branch, only the differences. So do not feel like it's a big deal to create a branch (it's not!), and frequently creating small branches, and merging them back into the main branch when ready, is the best way to use Git. Creating large branches or reusing branches for lots of different changes is not the best way of using Git and will lead to issues.
Once your changes are ready to merge back to main you open a pull request (often shortened to PR), which creates a special type of GitHub issue which can be used to merge your changes into the main branch.
A pull request is really a GitHub concept and at the end of the day is basically a fancy way of actioning a merge in Git. Bitbucket also use the term Pull Request, while GitLab uses Merge Request. It should also not be confused with git pull, which is a Git command to pull down changes from the server (e.g. GitHub) into your local copy.
An example pull request on GitHub is shown below: [image: Screenshot of an example pull request on GitHub.]
In this pull request there are the following tabs:
You can tag people to review your pull request, assign it to someone to deal with (not used much, as it largely repeats the author and reviewers), add labels...etc.
At the bottom of the Conversation tab you will see the following: [image: Bottom of a pull request with "Squash and Merge" and "Close" buttons.]
This shows on this PR that all checks have passed and this is ready to merge. Clicking the big green "Squash and Merge" button will copy (the "Merge" part) all this code into the main branch with one single commit (the "Squash" part). Usually you don't need all the 100s of commits you may have made while developing this code change, so "Squash" is what you want, but you can change this if you prefer.
You can also close this pull request if you change your mind with the Close button at the bottom, or add a comment with the Comment button if you make a big change to it since opening that you want people following the pull request to be aware of.
Please note you do NOT need to Close and Reopen the pull request (or even open a new pull request) when you need to make changes based on review feedback - simply pushing changes to the branch will cause any open pull request from that branch to automatically be updated and checks to automatically be rerun. It is expected (and a good thing!) to change your code based on feedback and this is very much part of the workflow of pull requests.
As well as branches, GitHub has the concept of forks, which basically means taking a complete copy of the repo (and all its branches at that time) into your own GitHub account. You can then create a branch in that fork, and then open a pull request to merge code from your branch on your fork, all the way back to the original repo (called the upstream repo). It may sound like an Inception level of abstraction and confusion but it actually works quite well once you get your head around it.
NOTE:
Why would you fork when you can just work in the original repo? Well most projects don't want people messing with the original repo so restrict permissions to only allow core contributors to create branches in the original repo. Others must therefore fork to make changes and then open pull requests to the original repo for review before they are committed.
And it's important to use the correct terminology when working with forks. Tempting as it is, the original repo should always be referred to as "original" or "upstream", and never "main" or "master" - which refer to branches within a repo. Similarly a "local" copy, or "clone" refers to the copy on your PC as we shall see and that can be of the original repo or a fork.
Another extra bit of hassle with a fork, is that you must keep it reasonably up to date with the original, upstream repo. To do that you periodically merge or rebase the fork back to the original repo which pulls down changes into your fork. We'll explain how to do that later.
To work on a project in GitHub you would normally clone a repo, which simply means taking a copy of it on your local PC. It is possible to make small edits on the GitHub.com website, but it's quite limited and often doesn't allow you to run code locally to test it, for example. You can clone a repo by clicking on the green Code button on the repo's home page (make sure you do this on your fork and not on the main repo): [image: Screenshot of the clone button in GitHub.]
This offers a number of options:
Once you copy the SSH or HTTPS URL on the command line simply go to the command line on your PC, into a directory you want to create the copy in and type the following (assuming SSH):
git clone git@github.com:sqlfluff/sqlfluff.git
You can clone a local copy of the original repo, if you plan to (and have access to) work on branches of that, or you can clone a fork of the original repo. The above example command clones the original repo location, and not the fork location - you should change the git address to the forked version when working from a fork.
After running this command you'll see the repo being downloaded locally. You can then branch, edit any of the files, add new files, or even delete files to your heart's content. Any changes you make will only be on your machine until you push changes back up to GitHub. We'll cover that later.
Just like with a fork, you need to keep any local copy up to date with both the original, upstream repo and the GitHub version. This is done by using the git pull, git merge and git rebase commands. We'll explain how to do all that below.
When keeping all the different copies in sync you will inevitably run into the dreaded "merge conflict" - a rite of passage every developer must go through. This happens where you've changed some code, but so has someone else, and their changes have been merged into main, so when you attempt to merge (either by syncing main back to your branch to update it with any new changes since branching, or by attempting to open a pull request from your branch) Git will give up and say "I don't know what to do here - you deal with it!".
In actual fact, dealing with merge conflicts is usually very simple. When you open the conflicted file you'll see something like this:
If you have questions, please
<<<<<<< HEAD
open an issue
=======
ask your question in Slack
>>>>>>> branch-a
In this case someone changed the line to "open an issue" and merged that to main (aka HEAD) and you've also changed it to "ask your question in Slack". Git is warning you that it has been changed since branching but you also changed it. You simply need to decide which line you want, delete all the other lines (including the ones starting <<<<, ==== and >>>>), and then git add the "resolved" file to your branch.
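Assuming the conflicted file above was called support.md (an illustrative name), finishing the resolution from the command line might look something like this:

# After editing the file to keep only the line you want:
git add support.md
git commit -m "Resolve merge conflict in support.md"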
You can even do it directly on GitHub.
Merge conflicts get a bad name and people think they are scary to deal with but Git actually makes it fairly easy. It will also usually only complain if the exact same line has changed — two people working on different parts of the same file usually won't see any merge conflicts.
Of course if you're both working on lots of the same code, across lots of files they can be a real pain to deal with - this is one of the main reasons to resync your branch back to the original main branch frequently, and also to work on small PRs rather than big unwieldy ones!
When working on SQLFluff you must fork SQLFluff to your own copy and work on that. SQLFluff contributors do not have access to create branches in the original repo.
To create your own Fork do the following:
It is also strongly recommended not to work on the main branch of your forked repo. When creating a new branch you will usually branch from main, so once your main has extra changes in it, it is no longer possible (or at least easy!) to create a clean branch for other work.
If you are only working on one thing at once, then using main in your fork may seem the quickest and easiest thing to do, but you'd be surprised how often you may want to pause that work for a bit and work on something else instead - and that's one of the advantages of using Git. For example if you are waiting on a pull request to be reviewed, you may want to work on another feature in the meantime on a fresh branch, which is completely independent of your other work. Or perhaps someone discovers an urgent, and easily fixed, bug in the code that you can quickly fix, before coming back to the current work. See the Switching between branches section below for more info on how to switch branches.
Working on a separate branch to main allows main to be kept in sync with upstream main, which makes it easier to create new branches and to merge upstream main into your branch periodically. It also keeps each future pull request free of a long history of unrelated commits. Finally, it allows you to completely reset your main back to the same as upstream if you get it completely messed up, without losing any history in other branches.
As discussed above it is recommended to use the main branch in your fork only to sync up with the main branch in the original SQLFluff repo. To do that you can do one of several things:
I prefer the hard reset approach shown below - it's the cleanest, and it ensures main is exactly the same as upstream and that there will be no long history of commit messages in your next branch and pull request.
You can use the below commands to reset your local fork to upstream and then push those changes to GitHub.
Check if upstream already exists:
git remote -v
Add an upstream remote, if not already added previously:
git remote add upstream git@github.com:sqlfluff/sqlfluff.git
Then force reset your main branch:
git fetch upstream
git checkout main
git reset --hard upstream/main
git push origin main --force
After this you should visit your forked repo on GitHub and check that you get a message saying "This branch is even with sqlfluff:main.": [image: A forked repo which is even with upstream.]
So, when you're ready to make your first changes, do the following:
My preferred method of committing changes is to use this:
git commit -a
This takes all the changes for existing tracked files and adds them to the commit. New files still need to be added with git add but all files currently tracked by Git are automatically included without having to use git add. This then opens the default Git editor (usually vi) and lists the files for this commit, any files not included and allows you to add the message and complete the commit. If you close vi without adding a message then it cancels the commit.
However vi can be quite a barrier to people as it's quite a confusing editor, with a "command" and an "edit" mode, and the need to remember command sequences (like pressing ESC and then typing :wq! to save and quit). For those not familiar with Linux and vi this might be quite a stumbling block.
You can provide the message on the command line, so you don't have to go near vi, using a sequence of commands like this:
git status
git add file1 file2
git commit -m "Committing file1 and file2"
This does require you to add the files or folders first so a bit more painful than git commit -a. Alternatively you use the -a and -m switches together:
git status
git commit -a -m "Committing all open files"
The downside is that, unlike the vi method, it won't show you the list of files it's going to commit, so is a bit more dangerous, hence why I prefer the vi method instead. However, if you do a git status before you commit, you should see the files that will be committed. Plus you can always revert changes if you need to.
So, in summary: the vi method is preferred, but it can be more complicated for those not familiar with it, so you can give the commit message on the command line instead - just take care when doing so.
It is also recommended to merge any changes that have happened to SQLFluff code (in its main branch) into your branch periodically in case it affects your code, and particularly important to do this just before opening a PR.
To merge changes into a forked repo from upstream main do the following:
Or to merge from a branch on the main repo do the following:
Git allows working on several branches at once. This allows you to work on something else while you are stuck on one bit of work (waiting for answers, or pull request feedback, or you just fancy a break!). Use git checkout to switch between branches and use git status to ensure all your changes are committed when switching between branches.
For example, let's say you are working on feature1 branch:
git checkout main
git pull
git checkout -b feature1
# Make some changes
git commit -m "Commit my changes for feature1"
# Make some more changes
git commit -m "Commit some more changes for feature1"
# Push changes to GitHub.com if you want to (always good to do this in case your computer dies!)
git push
# Note the first time you push a new branch you will need a slightly different push command:
# `git push --set-upstream origin feature1`
# Helpfully, git will tell you this if you try using just `git push`.
And then you want to take a break from feature1, in which case you should open a new branch - in most cases you want to branch from main again, and not from feature1 branch so make sure you flip back to main again if this is the case. If working on a fork, you should also check your main is up to date first - see the Resyncing your main branch to upstream section above:
# Check your branch is clean and everything has been committed
git status
# Create a new branch from main (note I've not included the resyncing of main to upstream here for forks)
git checkout main
git pull
git checkout -b feature2
# Make some changes and push your new branch to GitHub
git commit -m "Commit some changes for feature2"
git push --set-upstream origin feature2
# Make some more changes and push those too
git commit -m "Commit more changes for feature2"
git push
You are now free to switch back to feature1 if you want using git checkout (note you don't need the -b flag, as that's only needed to create a new branch that doesn't exist; just switching between branches that already exist doesn't need it):
git checkout feature1
And then switch back to feature2 later:
git checkout feature2
The primary concern with multiple branches like this is getting yourself confused! Using git status here is your friend to double check you're on the correct branch and all outstanding changes have been committed.
As with lots of things in Git, there are many ways to handle branches (including the git branch command), but I'd encourage you to stick with few commands and use git checkout, git commit, git push, git pull and git status as much as possible as you can do most things with those and it's already getting confusing!
The other alternative is to just create a new folder and clone the whole repo again and manage it completely separately. This can be easier, safer and less confusing for those less familiar with Git if working on a limited number of branches. However it doesn't scale very well and is not the way you're going to get the most out of Git, so as soon as you go beyond a second branch I'd strongly encourage you to get used to switching between branches with checkout.
I do encourage separate folders however if you have different repos (e.g. you do some work on the original SQLFluff repo, and some work on your own fork), as switching the repo that a folder points to, while also possible, is adding yet more confusion to an already complex thing! 🙂
Once you are finished making changes, you should take the following steps to open a pull request back to the original repo to accept your code into SQLFluff:
As mentioned above, you can make more changes to your branch and push them up to GitHub and the Pull Request will automatically be updated. There is no need to close the PR and reopen a new one.
When your pull request is reviewed, the first bit of feedback you're likely to see is from the automated tests that run every time a pull request is opened. They take a few minutes to run and will then give you a satisfying green tick, or a scary red cross.
The first check that GitHub itself will do is check for any Git merge conflicts, and these must be resolved before the pull request can be merged. If you merge main into your code before submitting a pull request, then it's unlikely you'll get any of these (unless someone's got in real quick while you were opening it!), which is why it's recommended to do that, but other pull requests can be actioned while yours is being reviewed, so they can still happen. Smaller pull requests, touching fewer files, reduce the chance of this, so again, where at all possible, break up changes into smaller batches.
Another check that we perform is linting the code in the pull request. This runs automated checks for errors or code styling and formatting issues that don't match the code conventions we use.
Python code is linted with flake8 and you can run this command on any files to see similar linting issues. You can use black to auto-fix most flake8 issues, though some need to be manually addressed.
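For example, to check and then auto-format a single file (the path here is just illustrative):

$ flake8 path/to/file.py
$ black path/to/file.py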
After the automated tests have passed the code will be reviewed manually by a maintainer, or another contributor. They can ask questions, or make suggestions to change the code.
Look upon a code review as an opportunity to learn and improve your code. Feedback can be tough to hear after you've worked hard on code, but stay aware that it is meant with the best possible intention: to provide feedback to you for this and future commits, and to keep our codebase to a high standard. It is not a personal slight upon you or your code, and if you are getting annoyed with feedback I suggest you take a break, step away, and read it again later in a fresh light. Of course if you feel that a reviewer is acting inappropriately then please raise it - we have a Code of Conduct and want all contributors to feel welcome. Feel free to also reach out to a maintainer if you would like to discuss something privately.
When a reviewer makes a code suggestion you can accept it right in GitHub and it will automatically update your branch. As this happens in GitHub directly, just make sure you do a git pull next time you are working locally on your code to pull down these changes. It's quite slow to accept a lot of changes this way, so if there are many of them it is sometimes better to change the code locally in your favourite editor, push one commit with the fixes, and then mark each of the suggestions as resolved. Any suggestions on lines which have changed since they were raised will be marked as outdated, which makes them easy to spot.
Reviewers may also make small, seemingly pedantic changes - usually they will include the word "nit" in these, to indicate a "nitpick". Like the linting, they can seem needless, but they help maintain our code quality.
It should also be noted that not all suggestions may be valid! Reviewers can make a mistake just as easily as the pull request author (more easily in some ways as they often will be reviewing based on reading the code rather than testing it). Feel free to push back on suggestions when you disagree with them. However, it is best to get consensus between reviewer and pull request author where at all possible so explain why you don't think you should make the change being suggested rather than just ignoring the suggestion or resolving it without a comment.
After addressing feedback, please re-request a review by clicking the little two-arrow icon next to the reviewer name, or make a comment (e.g. "All feedback addressed. I think this is good to merge now."). Sometimes it's difficult to know if someone is still working on feedback, and a pull request may be left unintentionally, getting out of date, because reviewers are not aware that it's ready for a re-review or merge.
Only SQLFluff maintainers can merge pull requests, but every contributor can review pull requests, and merging becomes a lot easier (and so more likely!) if someone else has already had a look through the changes. Please, please, please help us by reviewing pull requests to help spread the load!
It is also possible to open a draft pull request, if you want early feedback on your code or approach. Please remember to convert to a full pull request when happy.
Additionally, if making a large number of changes (for example if you need to update every dialect), it is best to do this in only one or two files initially, open a pull request (whether draft or full) and get feedback on your approach before you spend time updating all the files! It's much easier to review code changes if they are not drowned out by lots of identical changes in the same pull request, and it's much less likely to result in merge conflicts. Then the rest of the files can be added to the pull request, or a separate one opened for those (maybe a new pull request per dialect if different people need to approve the changes).
As mentioned above we strongly encourage contributors to help review pull requests. This is a voluntary, collaborative effort for us all and depending on one or two people creates a bottleneck and a single point of failure for the project.
Even if a review pulls up nothing, it is important to approve the pull request - this indicates that it has been reviewed and is just as useful (if not more so) as commenting on code or making suggestions.
Do also be conscious of how feedback will be read. We have many first-time contributors who may not be as familiar with the language (either the programming language or the English language!) so do try to avoid technical terms, colloquialisms, etc. That said, we are aware of some very commonly used acronyms and terms (which we've included in our Glossary of terms) like LGTM ("Looks Good To Me").
Do be mindful as well that pull request authors have voluntarily spent time on this and we wish to encourage that and foster an inclusive environment. Offensive language is forbidden by our Code of Conduct.
Do remember that a code review is also about reviewing the author's changes, and not about showing off your own knowledge! Try not to get side-tracked; instead raise a new issue if you want to consider something else that comes up during a code review but is not directly related.
On that note, do also remember that code can be improved incrementally. Small changes are much better in Git. So, while it's OK to point out a fuller solution do remember that Perfect is the enemy of good and accepting a change that is an improvement and then improving further in future iterations can often be better than holding out for the perfect solution that may never come. Of course that doesn't mean we should accept code that regresses the quality, or seems like the wrong way of doing it completely!
And finally, we strongly encourage positive GitHub reactions - particularly for new contributors. They give quick, encouraging feedback and add a playful, fun tone: [image: GitHub Heart and Rocket reactions.]
We discourage the negative ones though (thumbs down 👎, or confused 😕). Better to add a comment (politely!) explaining your concerns and then if others agree with you, they can "thumbs up" your comment. This keeps things on a positive tone and also means your cause for concerns are fully understood.
GitHub Desktop is a Windows and MacOS app that provides a visual interface to GitHub. It reduces the need to use and understand Git via the command line.
This section will provide some tips on performing some common tasks via GitHub Desktop.
First make sure you have Git installed. See our section on Installing Git for more details.
You can then download the install file from https://desktop.github.com/, with further instructions from their Installing and configuring GitHub Desktop document. Your main tasks will be to Authenticate with GitHub and Configuring Git for GitHub Desktop so that the systems know who you are.
If you have not done already, you will want to clone a copy of the https://github.com/sqlfluff/sqlfluff repo into your computer. The simplest way is to follow Cloning a repository from GitHub to GitHub Desktop where you go to the repository on the website and select "Open with GitHub Desktop". This will open a window where you can click "Clone" and the job will be done.
Once you have cloned repositories you will be able to select them via the "Current repository" toolbar button, just under the menu on the left. By default the sidebar will show you what edits have been made to the repository, and the main section shows actions you may want to perform.
Over time the original repository will get updated and your copy will become out of date. GitHub Desktop will highlight if your repository is out of date, with an option to pull any changes from the origin so that you have the latest versions.
You want to create your own branch before you start as you very likely do not have permission to edit the SQLFluff main branch. A branch is a way for you to group your own edits so you can later submit (push) them for review. Then, when they are approved, they will get merged back into the main branch.
Before creating a branch, make sure you're currently on the main branch and it is up to date (see above).
If you click on the "Current branch" tab in the toolbar you will see all the public branches in play. To create your own branch, enter a new name in the textbox at the top and click the "Create new branch" button.
At the moment your branch is only known to you. If you want others to see it, then you need to publish it. GitHub Desktop will prompt you to do that.
Once published you and others can select your branch on the GitHub website.
You can edit the repository using your favourite editor. As you edit, GitHub Desktop will show you what changes you have made.
Note that you can change branches at any time, but I suggest you commit and push any edits (see next) before you switch as things can get confusing. If you are working with multiple branches, always keep an eye out to make sure you're on the right one when working.
Every once in a while you want to store and document your changes. This can help you or others in the future. You also have to commit before you can share (push) your changes with anyone. You can quickly commit your current edits via the form to the bottom left.
Once you have commits you will be prompted to push those commits to GitHub. I typically do this straight after committing.
At this point you have a branch with edits committed and everything pushed to GitHub. Once you are happy with your work, you want it to be reviewed, approved and merged back into the main repository.
For this I switch back to the website, as it is there you will be communicating with reviewers. To get this stage started you need to create a pull request. Go to the SQLFluff repository on GitHub, make sure your branch is selected, then click the Pull request link and follow the instructions. This will notify the reviewers who will help you to get your changes live.
The main branch of your fork should be kept in sync with the original repository (rebased). Especially before you create any branches to make edits. Details on how to do this are in the Resyncing your main branch to upstream section.
This is done in the exact same way as before (i.e. in Making your own edits (creating a branch)). Create a branch from your main branch (make sure main is up to date using the above process), publish the branch, edit the files in the branch, commit your edits, and push back to GitHub.
With a forked repository the process to get your edits accepted is about the same as before (i.e. in Getting your changes accepted). Go to the web page for your copy of the repository and create a pull request.
This is a list of terms for those less familiar with Git/GitHub:
One of the best ways that SQLFluff users can improve SQLFluff for themselves and others is in contributing dialect changes.
Users will likely know their syntax much better than the regular maintainers and will have access to an instance of that SQL dialect to confirm changes are valid SQL in that dialect.
If you can fix your own issues then that's often the quickest way of unblocking whatever is preventing you from using SQLFluff! The maintainers are all volunteers doing this in our spare time and (like you all I'm sure!), we only have so much time to work on this.
SQLFluff has a lexer and parser which are built in a very modular fashion that is easy to read, understand, and expand on without any advanced programming skills or deep knowledge of Python or how SQLFluff operates. For more information see the Architecture Documentation, but we will cover it briefly here to give you enough to start contributing.
We also have a robust Continuous Integration pipeline in GitHub where you can gain confidence your changes are correct and will not break other SQLFluff users, even before a regular maintainer reviews the code.
SQLFluff defines the syntax it will use in dialect files (more on this later). If you look at the dialect_ansi.py file you will see it has syntax like this:
class SelectClauseSegment(BaseSegment):
"""A group of elements in a select target statement."""
type = "select_clause"
match_grammar = StartsWith(
Sequence("SELECT", Ref("WildcardExpressionSegment", optional=True)),
terminator=OneOf(
"FROM",
"WHERE",
"ORDER",
"LIMIT",
"OVERLAPS",
Ref("SetOperatorSegment"),
),
enforce_whitespace_preceding_terminator=True,
)
parse_grammar = Ref("SelectClauseSegmentGrammar")
This says the SelectClauseSegment starts with SELECT or SELECT * and ends when it encounters a FROM, WHERE, ORDER...etc. line.
The match_grammar is what is used primarily to try to match and parse the statement. It can be relatively simple (as in this case), quickly matching just the start and terminating clauses. If that is the case, then a parse_grammar is needed to actually delve into the statement itself with all the clauses and parts it is made up of. The parse_grammar can be fully defined in the class or, like the example above, reference another class with the definition.
The match_grammar is used to quickly identify the start and end of this block, as parsing can be quite intensive and complicated as the parser tries various combinations of classes and segments to match the SQL (particularly optional ones like the WildcardExpressionSegment above, or when there is a choice of statements that could be used).
For some statements a quick match is not needed, and so we can delve straight into the full grammar definition. In that case the match_grammar will be sufficient and we don't need the optional parse_grammar.
Here's another statement, which only uses the match_grammar and doesn't have (or need!) an optional parse_grammar:
class JoinOnConditionSegment(BaseSegment):
"""The `ON` condition within a `JOIN` clause."""
type = "join_on_condition"
match_grammar = Sequence(
"ON",
Indent,
OptionallyBracketed(Ref("ExpressionSegment")),
Dedent,
)
You may have noticed that a segment can refer to another segment, and that is a good way of splitting up a complex SQL expression into its component parts to manage and handle them separately.
There are a number of options when creating SQL grammar including:
| Grammar | Used For | Example |
| "KEYWORD" | Having a raw SQL keyword | "SELECT" |
| Sequence() | Having a known sequence of Keywords or Segments | Sequence("SELECT", Ref("SelectClauseElementSegment"), "FROM"...) |
| AnyNumberOf() | Choose from a set of options which may be repeated | "SELECT", AnyNumberOf(Ref("WildcardExpressionSegment"), Ref("ColumnReferenceSegment")...)... |
| OneOf() | A more restrictive form of AnyNumberOf, limited to just one option | OneOf("INNER","OUTER","FULL"), "JOIN" |
| Delimited() | Used for lists (e.g. comma-delimited - which is the default) | "SELECT", Delimited("SelectClauseElementSegment"), "FROM"... |
| Bracketed() | Used for bracketed options - like function parameters | Ref("FunctionNameSegment"), Bracketed(Ref("FunctionContentsGrammar")) |
Some of these grammars accept extra parameters; the most commonly used is optional=True. This allows you to further define the makeup of a SQL statement. Here's the DeleteStatementSegment definition:
parse_grammar = Sequence(
"DELETE",
Ref("FromClauseSegment"),
Ref("WhereClauseSegment", optional=True), )
You can see the WHERE clause is optional (many's a head has been shaken because of deletes without WHERE clauses I'm sure, but that's what SQL syntax allows!).
Using these Grammar options, it's possible to build up complex structures to define SQL syntax.
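To make that concrete, here's a small sketch of a hypothetical clause that combines several of these grammars (every name containing "Example" below is invented purely for illustration; the referenced ANSI segments and the imports are the ones the real dialect files use):
from sqlfluff.core.parser import (
    BaseSegment,
    Bracketed,
    Delimited,
    OneOf,
    Ref,
    Sequence,
)


class ExampleOptionsClauseSegment(BaseSegment):
    """A hypothetical `WITH OPTIONS (...)` clause, purely for illustration."""

    type = "example_options_clause"
    match_grammar = Sequence(
        # Note: any bare keyword used here must appear in the dialect's keyword lists.
        "WITH",
        "OPTIONS",
        # A bracketed, comma-delimited list of `key = value` pairs.
        Bracketed(
            Delimited(
                Sequence(
                    Ref("ParameterNameSegment"),
                    Ref("EqualsSegment"),
                    OneOf(
                        Ref("LiteralGrammar"),
                        Ref("ColumnReferenceSegment"),
                    ),
                ),
            ),
        ),
        # An optional trailing keyword.
        Ref.keyword("CASCADE", optional=True),
    )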
A Segment is a piece of the syntax which defines a type (which can be useful to reference later in rules or parse trees). This can be through one of the functions that creates a Segment (e.g. NamedParser, SegmentGenerator...etc.) or through a class.
A Grammar is a section of syntax that can be used in a Segment. Typically these are created to avoid repeating the same code in multiple places. Think of a Grammar as an alias for a piece of syntax to avoid you having to type out the same code again and again and again.
The other good thing about Grammars is it allows other dialects to override a specific part of a Segment without having to redefine the whole thing just to tweak one small part. For example ansi defines this:
NotOperatorGrammar=StringParser("NOT", KeywordSegment, type="keyword")
whereas mysql overrides this to:
NotOperatorGrammar=OneOf(
StringParser("NOT", KeywordSegment, type="keyword"),
StringParser("!", CodeSegment, name="not_operator", type="not_operator"), ),
This allows MySQL to use ! in all the places that NOT was used (providing they use NotOperatorGrammar rather than hardcode the NOT keyword of course). This makes it much easier to customise syntax to a particular dialect without having to copy and paste (and maintain) nearly identical code multiple times just to add the extra ! syntax that MySQL supports to mean NOT.
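In terms of how that override is wired up, grammars are registered against the dialect object in each dialect file. Roughly, as a sketch based on the definitions above (the exact parameters can differ between SQLFluff versions):
# In dialect_ansi.py the grammar is registered on the dialect:
ansi_dialect.add(
    NotOperatorGrammar=StringParser("NOT", KeywordSegment, type="keyword"),
)

# ...and dialect_mysql.py swaps in its own definition under the same name:
mysql_dialect.replace(
    NotOperatorGrammar=OneOf(
        StringParser("NOT", KeywordSegment, type="keyword"),
        StringParser("!", CodeSegment, name="not_operator", type="not_operator"),
    ),
)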
A lot of SQL is the same no matter which particular type of SQL you are using. The basic SELECT.. FROM... WHERE statement is common to them all. However lots of different SQL dialects (Postgres, Snowflake, Oracle... etc.) have sprung up as different companies have implemented SQL, or expanded it, for their own needs.
For this reason, SQLFluff allows creating dialects, which can have different grammars from each other.
SQLFluff has all the dialects in the src/sqlfluff/dialects folder. The main dialect file (that every other dialect ultimately inherits from) is the dialect_ansi.py file.
In SQLFluff, a dialect is basically a file which inherits everything from the original ANSI dialect, and then adds or overrides parsing segments. If a dialect has the exact same SELECT, FROM and WHERE clauses as ANSI but a different ORDER BY syntax, then only the ORDER BY clause needs to be overridden, so the dialect file will be very small. For some of the other dialects where there's lots of differences (Microsoft T-SQL!) you may be overriding a lot more.
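As a rough sketch of what such an override looks like in practice (the dialect and the extra keyword below are imaginary, and exactly how new segment classes get registered against a dialect varies a little between SQLFluff versions):
from sqlfluff.core.parser import Delimited, OneOf, Ref, Sequence
from sqlfluff.dialects import dialect_ansi as ansi


class OrderByClauseSegment(ansi.OrderByClauseSegment):
    """A hypothetical ORDER BY override for an imaginary dialect.

    Anything not redefined here is inherited from the ANSI segment.
    """

    match_grammar = Sequence(
        "ORDER",
        "BY",
        Delimited(
            Sequence(
                Ref("ColumnReferenceSegment"),
                OneOf("ASC", "DESC", optional=True),
                # Imaginary dialect-specific extra keyword, which would also
                # need adding to this dialect's keyword lists (see below).
                Ref.keyword("NULLSAFE", optional=True),
            ),
        ),
    )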
I kind of skipped this part, but before a piece of SQL can be parsed, it is lexed - that is split up into symbols, and logical groupings.
An inline comment, for example, is defined as this:
RegexLexer(
"inline_comment",
r"(--|#)[^\n]*",
CommentSegment,
segment_kwargs={"trim_start": ("--", "#")}, ),
That is, anything after -- or # to the newline. This allows us to deal with that whole comment as one lexed block and so we don't need to define how to parse it (we even give that a parsing segment name here - CommentSegment).
For simple grammar additions, you won't need to touch the lexing definitions as they usually cover most common cases already. But for slightly more complicated ones, you may have to add to this. So if you see lexing errors then you may have to add something here.
Lexing happens in order. So it starts reading the SQL from the start, until it has the longest lexing match, then it chomps that up, files it away as a symbol to deal with later in the parsing, and starts again with the remaining text. So if you have SELECT * FROM table WHERE col1 = 12345 it will not break that up into S, E, L...etc., but instead into SELECT, *, FROM, table...etc.
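If you want to see this in action, you can poke at the lexer directly from Python. As a sketch (assuming the Lexer class is importable from sqlfluff.core, as it is in recent versions):
from sqlfluff.core import Lexer

# Lex a short statement with the ANSI dialect and print the raw token values.
tokens, violations = Lexer(dialect="ansi").lex("SELECT * FROM tbl WHERE col1 = 12345")
print([t.raw for t in tokens])
# Expect whole words and symbols ('SELECT', ' ', '*', ...), not single characters.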
An example of where we had to override lexing, is in BigQuery we have parameterised variables which are of the form @variable_name. The ANSI lexer doesn't recognise the @ sign, so you could add a grammar or segment for that. But a better solution, since you don't need to know two parts (@ and variable_name) is to just tell the lexer to go ahead and parse the whole thing into one big symbol, that we will then use later in the parser:
bigquery_dialect.insert_lexer_matchers(
[
RegexLexer("atsign_literal", r"@[a-zA-Z_][\w]*", CodeSegment),
],
before="equals", )
Note the before="equals" which means we tell the lexer the order of preference to try to match this symbol. For example if we'd defined an at_sign lexing rule for other, standalone @ usage, then we'd want this to be considered first, and only fall back to that if we couldn't match this.
Most dialects have a keywords file, listing all the keywords. Some dialects just inherit the ANSI keywords and then add or remove keywords from that. Not quite as accurate as managing the actual keywords, but a lot quicker and easier to manage usually!
Keywords are separated into RESERVED and UNRESERVED lists. RESERVED keywords have extra restrictions meaning they cannot be used as identifiers. If using a keyword in grammar (e.g. "SELECT"), then it needs to be in one of the keyword lists, so you may have to add it or you might see errors like this (showing "NAN" has not been added as a keyword in this dialect):
RuntimeError: Grammar refers to 'NanKeywordSegment' which was not found in the redshift dialect
Also if editing the main ANSI dialect and adding to the ANSI keyword list, take care to consider whether the keyword also needs to be added to the other dialects that inherit this syntax - usually yes, unless explicitly overridden in those dialects.
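As a sketch of what adding a keyword looks like in a dialect file (using the NAN example above against the redshift dialect; "MYKEYWORD" is purely illustrative, and some dialects maintain their lists in a separate keywords file instead):
# Make NAN usable in grammar like Sequence("NAN") by adding it to the
# dialect's unreserved keyword set (or the reserved set, if appropriate).
redshift_dialect.sets("unreserved_keywords").add("NAN")

# Several keywords can also be added in one go:
redshift_dialect.sets("unreserved_keywords").update(["NAN", "MYKEYWORD"])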
Now that you know about some of the tools SQLFluff provides for lexing and parsing a SQL statement, what changes will you make to it? While devising ad-hoc changes to the grammar to fix particular issues can be better than nothing, the best and most robust contributions will be created by consulting the source of truth for the grammar of your dialect when mapping it to SQLFluff segments and grammars. This will help you exhaustively find all possible statements that would be accepted by the dialect.
Many computer languages are written using venerable tools like Flex and Bison, or similar parser generators, and SQL database engines are no exception. You can refer to the parser specification in the source code of your database engine for the ultimate source of truth of how a SQL statement will be parsed: you might be surprised at what your SQL engine will parse due to gaps in the documentation!
You should also refer to the reference documentation for your SQL dialect to get a concise high-level overview of what the statement grammar looks like, as well as read of any further restrictions and intended use of the grammar that you find. If your SQL engine is closed-source, then you'll likely have only the reference documentation to work with. However, this will always be a less-accurate resource than the bison grammar that's actually used for code generation inside the database engine itself.
It is also extremely helpful to try parsing the queries that you put into the test fixtures to make sure that they are actually parsable by the database engine. They don't have to be valid queries per se (can refer to non-existing table names, etc), but you should confirm that they are parsable. We do not want to require that SQLFluff be able to parse a statement that the actual database engine would reject: overeager matching logic can create parsing issues elsewhere.
Here is a list of grammars and parsing techniques for some of the dialects implemented by SQLFluff:
Unfortunately, the ANSI SQL standard is not free. If you want a licensed copy of the latest standard, it must be purchased: Part 2 is the most useful section for SQLFluff since it contains the grammar. There are, however, other resources you can find on the Internet related to this standard:
Simply Googling for pg <statement> will often bring up the documentation for an older PG version. Please be sure you're referring to the latest version of the documentation, as well as refer to the bison grammar.
So that's a bit of theory but let's go through some actual examples of how to add to the SQLFluff code to address any issues you are seeing. In this I'm not going to explain about how to set up your Python development environment (see the Contributing to SQLFluff and the CONTRIBUTING.md file for that), nor how to manage Git (see our How to use Git guide if new to that, and we use the standard “Fork, and then open a PR” workflow common to GitHub projects).
So assuming you know (or are willing to follow above guides to find out!) how to set up Python environment, and commit via Git, how do you contribute a simple fix to a dialect for syntax you want SQLFluff to support?
If we look at issue #1520 it was raised to say we couldn't parse this:
CREATE OR REPLACE FUNCTION public.postgres_setof_test() RETURNS SETOF text
and instead returned this message:
Found unparsable section: 'CREATE OR REPLACE FUNCTION crw_public.po...'
This was in the postgres dialect, so I had a look at dialect_postgres.py and found the code in CreateFunctionStatementSegment which had the following:
parse_grammar = Sequence(
"CREATE",
Sequence("OR", "REPLACE", optional=True),
Ref("TemporaryGrammar", optional=True),
"FUNCTION",
Sequence("IF", "NOT", "EXISTS", optional=True),
Ref("FunctionNameSegment"),
Ref("FunctionParameterListGrammar"),
Sequence( # Optional function return type
"RETURNS",
OneOf(
Sequence(
"TABLE",
Bracketed(
Delimited(
OneOf(
Ref("DatatypeSegment"),
Sequence(
Ref("ParameterNameSegment"), Ref("DatatypeSegment")
),
),
delimiter=Ref("CommaSegment"),
)
),
optional=True,
),
Ref("DatatypeSegment"),
),
optional=True,
),
Ref("FunctionDefinitionGrammar"), )
So it allowed returning a table, or a datatype.
Fixing the issue was as simple as adding the SETOF structure as another return option:
parse_grammar = Sequence(
"CREATE",
Sequence("OR", "REPLACE", optional=True),
Ref("TemporaryGrammar", optional=True),
"FUNCTION",
Sequence("IF", "NOT", "EXISTS", optional=True),
Ref("FunctionNameSegment"),
Ref("FunctionParameterListGrammar"),
Sequence( # Optional function return type
"RETURNS",
OneOf(
Sequence(
"TABLE",
Bracketed(
Delimited(
OneOf(
Ref("DatatypeSegment"),
Sequence(
Ref("ParameterNameSegment"), Ref("DatatypeSegment")
),
),
delimiter=Ref("CommaSegment"),
)
),
optional=True,
),
Sequence(
"SETOF",
Ref("DatatypeSegment"),
),
Ref("DatatypeSegment"),
),
optional=True,
),
Ref("FunctionDefinitionGrammar"), )
With that code the above item could parse.
I added a test case (covered below) and submitted pull request #1522 to fix this.
If we look at issue #1537 it was raised to say we couldn't parse this:
select 1 from group
And threw this error:
==== parsing violations ====
L:   1 | P:  10 |  PRS | Line 1, Position 10: Found unparsable section: 'from'
L:   1 | P:  14 |  PRS | Line 1, Position 14: Found unparsable section: ' group'
The reporter had also helpfully included the parse tree (produced by sqlfluff parse):
[L: 1, P: 1]  |file:
[L: 1, P: 1]  |    statement:
[L: 1, P: 1]  |        select_statement:
[L: 1, P: 1]  |            select_clause:
[L: 1, P: 1]  |                keyword: 'select'
[L: 1, P: 7]  |                [META] indent:
[L: 1, P: 7]  |                whitespace: ' '
[L: 1, P: 8]  |                select_clause_element:
[L: 1, P: 8]  |                    literal: '1'
[L: 1, P: 9]  |            whitespace: ' '
[L: 1, P: 10] |            [META] dedent:
[L: 1, P: 10] |            from_clause:
[L: 1, P: 10] |                unparsable: !! Expected: 'FromClauseSegment'
[L: 1, P: 10] |                    keyword: 'from'
[L: 1, P: 14] |            unparsable: !! Expected: 'Nothing...'
[L: 1, P: 14] |                whitespace: ' '
[L: 1, P: 15] |                raw: 'group'
[L: 1, P: 20] |    newline: '\n'
So the problem was it couldn't parse the FromClauseSegment. Looking at that definition showed this:
FromClauseTerminatorGrammar=OneOf(
"WHERE",
"LIMIT",
"GROUP",
"ORDER",
"HAVING",
"QUALIFY",
"WINDOW",
Ref("SetOperatorSegment"),
Ref("WithNoSchemaBindingClauseSegment"), ),
So the parser was terminating as soon as it saw the GROUP and saying "hey we must have reached the end of the FROM clause".
This was a little restrictive so changing that to this solved the problem:
FromClauseTerminatorGrammar=OneOf(
"WHERE",
"LIMIT",
Sequence("GROUP", "BY"),
Sequence("ORDER", "BY"),
"HAVING",
"QUALIFY",
"WINDOW",
Ref("SetOperatorSegment"),
Ref("WithNoSchemaBindingClauseSegment"), ),
You can see we simply replaced the "GROUP" by a Sequence("GROUP", "BY") so it would only match if both words were given. Rechecking the example with this changed code, showed it now parsed. We did the same for "ORDER", and also changed a few other places in the code with similar clauses and added a test case (covered below) and submitted pull request #1546 to fix this.
As an example of using the reference grammar to add a brand new statement to SQLFluff, pull request #4744 contributed the CREATE CAST / DROP CAST statements from scratch for both the ANSI and PostgreSQL dialects. The first step when contributing a new statement is to check whether the statement is part of the ANSI standard. If it is, then you very likely should first start by adding a generally vendor-neutral version to the SQLFluff ANSI dialect so that other dialects can inherit from it. Every database engine deviates from the ANSI standard in practice, but by adding a reasonably standard segment to the ANSI dialect, you'll probably do a reasonable thing for most other database dialects.
In this case, CREATE and DROP CAST were indeed defined in the ANSI standard (https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#_11_63_user_defined_cast_definition), as a quick search of the document revealed:
<user-defined cast definition> ::=
CREATE CAST <left paren> <source data type> AS <target data type> <right paren>
WITH <cast function>
[ AS ASSIGNMENT ]
So the first step was to read this ANSI BNF grammar and use it to build a corresponding vendor-neutral CreateCastStatementSegment in dialect_ansi.py.
class CreateCastStatementSegment(BaseSegment):
"""A `CREATE CAST` statement.
https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#_11_63_user_defined_cast_definition
"""
type = "create_cast_statement"
match_grammar: Matchable = Sequence(
"CREATE",
"CAST",
Bracketed(
Ref("DatatypeSegment"),
"AS",
Ref("DatatypeSegment"),
),
"WITH",
Ref.keyword("SPECIFIC", optional=True),
OneOf(
"ROUTINE",
"FUNCTION",
"PROCEDURE",
Sequence(
OneOf("INSTANCE", "STATIC", "CONSTRUCTOR", optional=True),
"METHOD",
),
),
Ref("FunctionNameSegment"),
Ref("FunctionParameterListGrammar", optional=True),
Sequence("FOR", Ref("ObjectReferenceSegment"), optional=True),
Sequence("AS", "ASSIGNMENT", optional=True),
) # Not shown: register the CreateCastStatementSegment in StatementSegment
As you work your way through the grammar, think about whether other parts of the SQL language might contain similar elements. For example, here we noticed that there are already segments we can reuse for data types, function names, and function parameter lists. This helped simplify our new grammar, as well as make it easy to centrally change those particular areas of the grammar in other dialects. Also consider whether there are entire new segments and grammars you should separately define in addition to the root statement segment you're writing. Introducing new and reusing existing segments adds structure to the SQLFluff parse tree that can make it easier for lint rules to analyze the tree. A strong indicator that there should be a shared segment or grammar is when the reference grammar has a symbol that is reused from multiple other symbols/statements.
After writing the ANSI segment (and corresponding tests), it was time to move on to the PostgreSQL grammar. In this case, a quick glance at the documentation shows us that there are some notable differences from ANSI SQL:
However, we should also consult the bison grammar for CREATE CAST. Bison grammars tend to be very lengthy and daunting, but the right techniques can help you quickly and easily find what you're looking for:
Examining the Bison grammar can take a few extra minutes, but it can be rewarding. You'll be surprised what you might learn. I've found entire alternate spellings of keywords in there that were not in the documentation, and which testing showed were indeed valid SQL! The grammar in PG documentation is human-maintained and not auto-generated, so there can be and are gaps between what is parsable and what is documented.
A good approach if you're still learning might be to draft a segment from the high-level documentation, and then systematically go through it with the bison grammar and verify it's correct (and that you're not forgetting anything).
One aspect of bison grammars to be aware of is that they tend to be very recursive, because bison doesn't have the high-level constructs such as AnyNumberOf, Delimited, Bracketed, and so on that SQLFluff provides. On the other hand, SQLFluff doesn't scale well with recursion. Sometimes recursion is unavoidable and reasonable (e.g. a parenthesized expression referring back to the expression segment), but many times the recursion is extremely trivial and should be rewritten using an existing high-level SQLFluff concept. For example, this bison grammar defines a bracketed comma-delimited list which would be better represented using Bracketed and Delimited in SQLFluff:
func_args: '(' func_args_list ')' { $$ = $2; }
| '(' ')' { $$ = NIL; }
;
func_args_list:
func_arg { $$ = list_make1($1); }
| func_args_list ',' func_arg { $$ = lappend($1, $3); }
;
As an example of using the reference grammar to fix an existing SQLFluff grammar, issue #4336 reported that array slices were not being parsed correctly in PostgreSQL. A simple SELECT statement was given that I further simplified to the following test case:
SELECT a[2:2+3];
Obviously, we know that a simple query like SELECT a; would parse, so it's surely related to the array access. I started by looking up the bison grammar for PostgreSQL's SELECT statement and drilling down into it to find an array accessor symbol; searching for SelectStmt: proved to be a lucky guess to start with:
SelectStmt: select_no_parens %prec UMINUS
| select_with_parens %prec UMINUS
;
Drilling down into the grammar via SelectStmt --> select_no_parens --> simple_select --> target_list --> target_el shows that we are dealing with an a_expr, which is the main symbol widely used to represent an expression throughout the grammar. SQLFluff implements that as ExpressionSegment (and more specifically Expression_A_Grammar). Looking further: target_el --> a_expr --> c_expr --> columnref. Which brings us to a key rule:
columnref: <snip>
| ColId indirection
{
$$ = makeColumnRef($1, $2, @1, yyscanner);
}
Digging into indirection, we finally find where the array accessor is happening:
indirection:
indirection_el { $$ = list_make1($1); }
| indirection indirection_el { $$ = lappend($1, $2); }
; indirection_el: <snip>
| '[' a_expr ']'
{
A_Indices *ai = makeNode(A_Indices);
ai->is_slice = false;
ai->lidx = NULL;
ai->uidx = $2;
$$ = (Node *) ai;
}
| '[' opt_slice_bound ':' opt_slice_bound ']'
{
A_Indices *ai = makeNode(A_Indices);
ai->is_slice = true;
ai->lidx = $2;
ai->uidx = $4;
$$ = (Node *) ai;
}
; opt_slice_bound:
a_expr { $$ = $1; }
| /*EMPTY*/ { $$ = NULL; }
;
From this we observe:
Now that we looked up the relevant PG grammar, we can dig into the corresponding SQLFluff grammar in a similar top-down way: postgres.SelectStatementSegment --> we see it's mostly a copy of the ANSI select statement, so --> ansi.SelectStatementSegment --> remember Ref always picks the dialect-specific grammar first --> postgres.SelectClauseSegment --> ansi.SelectClauseSegment.parse_grammar --> postgres.SelectClauseSegmentGrammar --> ansi.SelectClauseElementSegment --> ansi.BaseExpressionElementGrammar --> ansi.ExpressionSegment --> ansi.Expression_A_Grammar --> ansi.Expression_C_Grammar --> ansi.Expression_D_Grammar --> notice this at the end of the sequence --> postgres.Accessor_Grammar --> postgres.ArrayAccessorSegment. As you navigate, always remember to check for dialect-specific grammar before falling back to the inherited grammar (e.g. ANSI). Finally, we have found the part of the grammar that corresponds to the indirection_el in the bison grammar!
class ArrayAccessorSegment(ansi.ArrayAccessorSegment):
"""Overwrites Array Accessor in ANSI to allow n many consecutive brackets.
Postgres can also have array access like python [:2] or [2:] so
numbers on either side of the slice segment are optional.
"""
match_grammar = Sequence(
AnyNumberOf(
Bracketed(
Sequence(
OneOf(
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
),
Sequence(
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
optional=True,
),
Ref("SliceSegment"),
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
),
),
Sequence(
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
),
Ref("SliceSegment"),
OneOf(
Ref("QualifiedNumericLiteralSegment"),
Ref("NumericLiteralSegment"),
optional=True,
),
),
),
),
bracket_type="square",
)
)
)
Studying this, a few things stand out. The most glaring are that:
At this point, it's a simple matter of simplifying & rewriting the grammar to fix these shortcomings and better align it with the bison grammar, which was done in pull request #4748.
So you've made your fix, you've tested it fixed the original problem so just submit that change, and all is good now?
Well, no. You want to do two further things:
To test your changes you'll need to have your environment set up (again see the CONTRIBUTING.md file for how to do that).
Adding a test case is simple. Just add a SQL file to test/fixtures/dialects/ in the appropriate dialect directory. You can either expand an existing SQL file test case (e.g. if adding something similar to what's in there) or create a new one.
I advise adding the original SQL raised in the issue, and if you have examples from the official syntax, then they are always good test cases to add as well. For example, the Snowflake documentation has an example section at the bottom of every syntax definition, so just copy them all into your example file too.
You should also use the reference grammar to exhaustively test various pedantic combinations of syntax. It doesn't have to be runnable. It just needs to parse correctly into the right structure, and be a statement that can get past the parsing stage of the database engine. The documentation often includes more simple examples that might not reflect all the real-world possibilities. While referring to the reference documentation / bison grammar, try to come up with a statement that uses as much of the grammar as it can!
Be sure that you verify that the SQL statements in your test are actually parsable by the database engine! An easy way to do that is often to copy/paste the statement into the console and try running it, or use a CLI parsing tool that uses the same source code as the database engine (e.g. pgsql-parser). An error is ok (e.g. invalid column name), as long as it's not a syntax error from parsing. Check the reference section at the top of this document for dialect-specific resources.
In addition to the SQL files, we have auto-generated YAML counterparts for them. The YAML contains the parsed version of the SQL. Having these in our source code allows us to easily see when parse trees change: if someone redefines a syntax which changes how a SQL statement is parsed, then the SQL test file won't change but its parse tree will, so by checking the regenerated YAML in with any pull request we can spot the difference and make sure we're comfortable the change is expected. For most changes (except adding new test cases, obviously!) you would not expect unrelated YAML files to change, so this is a good check.
To regenerate all the YAML files when you add or edit any test fixture SQL files run the following command:
tox -e generate-fixture-yml
You can also do the following to only generate for a particular dialect, or only for new and changed files, which is often quicker:
tox -e generate-fixture-yml -- --dialect postgres
tox -e generate-fixture-yml -- --new-only
It takes a few mins to run, and regenerates all the YAML files. You can then do a git status to see any differences.
When making changes, make sure to check the post-parse structure from the test output or from the associated YAML file: check that each query element is typed correctly. Typical bugs can be that a standalone keyword (such as INTERVAL) is parsed as a function name, or that an element that should be date_part is parsed as an identifier. Typically there is no need to write assertions by hand, but it's the developer's responsibility to verify the structure from auto-generated YAML. One should not assume that everything is working just because no parsing error is raised.
For the basic setup, see the local testing section of the CONTRIBUTING.md file first.
There's a few ways of running the test suite. You could just run the tox command, but this will run all the test suites, for various python versions, and with and without dbt, and take a long time. Best to leave that to our CI infrastructure. You just want to run what you need to have reasonable confidence before submitting.
The dialects_test is parametrized to automatically pick all files under test/fixtures/dialects/.
For example if you're adding or modifying dialects/hive/select_interval.sql, you can test that with:
tox -e py38 -- -s test/dialects/dialects_test.py -k hive-select_interval.sql
The -s flag for pytest enables printing of post-parse structure, which allows you to quickly check that each query element is typed correctly. Same can be seen in the generated fixture YAML file.
To run it a bit faster, you can invoke pytest directly (requires that you have activated the project venv):
pytest -s test/dialects/dialects_test.py -k hive-select_interval.sql
The following command runs just the dialect tests, for all dialects:
tox -e py38 -- test/dialects/dialects_test.py
The following command runs just the dialect tests, for a specific dialect:
tox -e py38 -- test/dialects/dialects_test.py -k ansi
Or, if making a dialect change to fix a rule that is incorrectly flagging, you can just run the tests for that one rule, for example to run the LT01 tests:
tox -e py38 -- -k LT01 test
For formatting and linting it's usually enough to rely on the pre-commit hook.
Run all tests (but only on one Python version, and without dbt):
tox -e py311
I like to kick that off just before opening a PR, but it does take ~10 minutes to run.
If you want also coverage & linting, run this instead (takes even more time):
tox -e generate-fixture-yml,cov-init,py311,cov-report,linting
Also note that the coverage tests require several environments to run fully (Windows, and dbt), so they can report missing coverage when run locally.
The rest can be left for the CI to check.
Regardless of what testing you do, GitHub will run the full regression suite when the PR is opened or updated. Note first time contributors will need a maintainer to kick off the tests until their first PR is merged.
These tools are run automatically by the pre-commit hook, but can also be run manually for those not using that.
We use ruff to lint our python code (being a linter ourselves we should have high quality code!). Our CI, or the tox commands above will run this and flag any errors.
In most cases running black on the python file(s) will correct any simple errors (e.g. line formatting) but for some you'll need to run ruff to see the issues and manually correct them.
We use the standard GitHub workflow so simply fork the repo, clone it locally, make the change, push it to your fork, then open a pull request back to the original SQLFluff repo. There’s lots more info in our How to use Git guide if you're new to Git.
Once you open the PR, CI tests will run and should complete after 5-10 minutes. If all is green, then a maintainer will pick it up as soon as they can. A good, easy to understand, small PR with all the tests passing is easier to review, and so more likely to be merged quickly.
Feel free to open up any issues on GitHub, or join the SQLFluff Slack for any quick questions to the community/maintainers.
Rules in SQLFluff are implemented as classes inheriting from BaseRule. SQLFluff crawls through the parse tree of a SQL file, calling the rule's _eval() function for each segment in the tree. For many rules, this allows the rule code to be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away.
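As a rough sketch of that shape (the rule below, its code and the behaviour it enforces are all invented for illustration, and the exact imports and crawler configuration can differ between SQLFluff versions):
from sqlfluff.core.rules import BaseRule, LintResult, RuleContext
from sqlfluff.core.rules.crawlers import SegmentSeekerCrawler


class Rule_MyPlugin_L001(BaseRule):
    """Hypothetical rule: flag column references named exactly 'dummy'."""

    groups = ("all",)
    # Only ask SQLFluff to call _eval() for column_reference segments.
    crawl_behaviour = SegmentSeekerCrawler({"column_reference"})

    def _eval(self, context: RuleContext):
        # `context.segment` is the segment currently being visited.
        if context.segment.raw.lower() == "dummy":
            return LintResult(
                anchor=context.segment,
                description="Avoid the placeholder column name 'dummy'.",
            )
        # Returning None means no issue was found for this segment.
        return None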
The majority of the test cases for most bundled rules are "yaml test cases", i.e. test cases defined in yaml files. You can find those yaml fixtures on github. While this provides a very simple way to write tests, it can be occasionally tedious to run specific tests.
Within either a tox environment or virtualenv (as described in the contributing.md file), you can either run all of the rule yaml tests with:
pytest test/rules/yaml_test_cases_test.py -vv
...or to just run tests for a specific rule, there are two options for a syntax to select only those tests:
pytest -vv test/rules/ -k RF01
The -k option simply searches for the content of the argument being in the name of the test, which will match any single or combo tests for that rule. By convention, any test cases for a rule should include the code for that rule.
Some rules are a poor fit for the simple traversal pattern described above. Typical reasons include:
These rules can override BaseRule's recurse_into field, setting it to False. For these rules, _eval() is only called once, with the root segment of the tree. This can be much more efficient, especially on large files. For example, see rules LT13 and LT12, which only look at the beginning or end of the file, respectively.
By default, SQLFluff calls _eval() for all segments, even "unparsable" segments, i.e. segments that didn't match the parsing rules in the dialect. This causes issues for some rules. If so, setting _works_on_unparsable to False tells SQLFluff not to call _eval() for unparsable segments and their descendants.
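In class terms, those overrides are just attributes on the rule. A sketch, following the fields described above (newer versions may expose the traversal behaviour via different crawler settings):
from sqlfluff.core.rules import BaseRule, RuleContext


class Rule_MyPlugin_L002(BaseRule):
    """Hypothetical whole-file rule."""

    groups = ("all",)
    # Call _eval() once with the root segment, rather than for every segment.
    recurse_into = False
    # Don't call _eval() for unparsable segments or their descendants.
    _works_on_unparsable = False

    def _eval(self, context: RuleContext):
        root = context.segment  # the whole parse tree for the file
        # ...inspect `root` and return LintResult objects as needed...
        return None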
These are other fields on BaseRule. Rules can override them.
needs_raw_stack defaults to False. Some rules use the RuleContext.raw_stack property to access earlier segments in the traversal. This can be useful, but it adds significant overhead to the linting process. For this reason, it is disabled by default.
There are two phases of rule running.
1. The main phase is appropriate for most rules. These rules are assumed to interact and potentially cause a cascade of fixes requiring multiple passes. These rules are run up to runaway_limit times (default 10).
2. The post phase is for post-processing rules, not expected to trigger any downstream rules, e.g. capitalization fixes. They are run in a post-processing loop at the end. This loop is identical to the main loop, but is only run 2 times at the end (once to fix, and once again to confirm no remaining issues).
The two phases add complexity, but they also improve performance by allowing SQLFluff to run fewer rules during the main phase, which often runs several times.
NOTE: post rules also run on the first pass of the main phase so that any issues they find will be presented in the list of issues output by sqlfluff fix and sqlfluff lint.
SQLFluff is extensible through "plugins". We use the pluggy library to make linting rules pluggable, which enables users to implement rules that are just too "organization specific" to be shared, or too platform specific to be included in the core library.
NOTE:
Importing rules in the main body of the module was previously our recommendation, so this may still be the case in older versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented, recommending that you update your plugin.
# The root module will need to import `hookimpl`, but
# should not yet import the rule definitions for the plugin.
from sqlfluff.core.plugin import hookimpl
@hookimpl
def get_rules():
# Rules should be imported within the `get_rules` method instead
from my_plugin.rules import MyRule
return [MyRule]
We have an example plugin in sqlfluff/plugins/sqlfluff-plugin-example which you can use as a template for rules, or the sqlfluff/plugins/sqlfluff-templater-dbt which you can use as a template for templater plugins.
Currently, only Rules and Templaters can be added through plugins. Over time we expect more elements of SQLFluff will be extensible with plugins. Each plugin can implement multiple Rules or Templaters.
We recommend that the name of a plugin should start with "sqlfluff-" to be clear on the purpose of your plugin.
A plugin may need to include a default configuration if its rules are configurable: use plugin default configurations only for that reason! We advise against overwriting core configurations by using a default plugin configuration, as there is no mechanism in place to enforce precedence between the core library configs and plugin configs, and multiple plugins could clash.
A plugin Rule class name should have the structure: "Rule_PluginName_L000". The 'L' can be any letter and is meant to categorize rules; you could use the letter 'S' to denote rules that enforce security checks for example.
An important thing to note when running custom implemented rules: run pip install -e . inside the plugin folder so that your custom rules are included when linting.
A plugin Rule code includes the PluginName, so a rule "Rule_L000" in core will have code "L000", while "Rule_PluginName_L000" will have code "PluginName_L000". Codes are used to display errors, they are also used as configuration keys.
We make it easy for plugin developers to test their rules by exposing a testing library in sqlfluff.utils.testing.
Would you like to have other parts of SQLFluff be "pluggable"? Tell us about it in a GitHub issue 😄.
Contributing to the docs is one of the easiest and most helpful ways to help the project. Documentation changes require relatively little specialist knowledge apart from being familiar with how to use SQLFluff and the docs are read by a very wide range of people.
Documentation takes two forms:
The two are somewhat blurred by the use of autodoc (and some other custom integrations), where documentation is generated directly off docstrings within the codebase, for example the Rules Reference, CLI Reference and Dialects Reference. To understand more about the custom integrations we use to generate these docs, see the generate-auto-docs.py file.
Embedded documentation of functions, classes and modules is most useful for developer-focussed documentation as it's most accessible in the places which those developers are working: directly in the codebase. We enforce that docstrings are present and correctly formatted using the pydocstyle rules for ruff, which we have configured to enforce the google style of docstrings.
The main documentation (which you're reading now) is built using sphinx, and written using reStructuredText (files ending with .rst). The sphinx project offers a reStructuredText primer for people who are new to the syntax (and the SQLFluff project uses doc8 in the CI process to try and catch any issues early).
On top of those docs, there are a few areas worth highlighting for new (or returning) users, which are either specific to the SQLFluff project, or not particularly clear in the sphinx docs:
SQLFluff accepts configuration either through the command line or through configuration files. There is rough parity between the two approaches with the exception that templating configuration must be done via a file, because it otherwise gets slightly complicated.
For details of what's available on the command line check out the CLI Reference.
For file based configuration SQLFluff will look for the following files in order. Later files (if found) will be used to overwrite any values read from earlier files.
Within these files, the first four will be read like a cfg file, and SQLFluff will look for sections which start with sqlfluff, and where subsections are delimited by a colon. For example the jinjacontext section will be indicated in the section started with [sqlfluff:jinjacontext].
For example, a snippet from a .sqlfluff file (as well as any of the supported cfg file types):
[sqlfluff]
templater = jinja
sql_file_exts = .sql,.sql.j2,.dml,.ddl

[sqlfluff:indentation]
indented_joins = False
indented_using_on = True
template_blocks_indent = False

[sqlfluff:templater]
unwrap_wrapped_queries = True

[sqlfluff:templater:jinja]
apply_dbt_builtins = True
For the pyproject.toml file, all valid sections start with tool.sqlfluff and subsections are delimited by a dot. For example the jinjacontext section will be indicated in the section started with [tool.sqlfluff.jinjacontext].
For example, a snippet from a pyproject.toml file:
[tool.sqlfluff.core]
templater = "jinja"
sql_file_exts = ".sql,.sql.j2,.dml,.ddl"

[tool.sqlfluff.indentation]
indented_joins = false
indented_using_on = true
template_blocks_indent = false

[tool.sqlfluff.templater]
unwrap_wrapped_queries = true

[tool.sqlfluff.templater.jinja]
apply_dbt_builtins = true

# For rule specific configuration, use dots between the names exactly
# as you would in .sqlfluff. In the background, SQLFluff will unpack the
# configuration paths accordingly.
[tool.sqlfluff.rules.capitalisation.keywords]
capitalisation_policy = "upper"
When setting up a new project with SQLFluff, we recommend keeping your configuration file fairly minimal. The config file should act as a form of documentation for your team, i.e. a record of what decisions you've made which govern how you format your SQL. By having a more concise config file, and only defining config settings where they differ from the defaults - you are more clearly stating to your team what choices you've made.
However, there are also a few places where the default configuration is designed more for existing projects, rather than fresh projects, and so there is an opportunity to be a little stricter than you might otherwise be with an existing codebase.
Here is a simple configuration file which would be suitable for a starter project:
[sqlfluff]
# Supported dialects https://docs.sqlfluff.com/en/stable/perma/dialects.html
# Or run 'sqlfluff dialects'
dialect = snowflake

# One of [raw|jinja|python|placeholder]
templater = jinja

# Comma separated list of rules to exclude, or None
# See https://docs.sqlfluff.com/en/stable/perma/rule_disabling.html
# AM04 (ambiguous.column_count) and ST06 (structure.column_order) are
# two of the more controversial rules included to illustrate usage.
exclude_rules = ambiguous.column_count, structure.column_order

# The standard max_line_length is 80 in line with the convention of
# other tools and several style guides. Many projects however prefer
# something a little longer.
# Set to zero or negative to disable checks.
max_line_length = 120

# CPU processes to use while linting.
# The default is "single threaded" to allow easy debugging, but this
# is often undesirable at scale.
# If positive, just implies number of processes.
# If negative or zero, implies number_of_cpus - specified_number.
# e.g. -1 means use all processors but one. 0 means all cpus.
processes = -1

# If using the dbt templater, we recommend setting the project dir.
[sqlfluff:templater:dbt]
project_dir = ./

[sqlfluff:indentation]
# While implicit indents are not enabled by default. Many of the
# SQLFluff maintainers do use them in their projects.
allow_implicit_indents = True

[sqlfluff:rules:aliasing.length]
min_alias_length = 3

# The default configuration for capitalisation rules is "consistent"
# which will auto-detect the setting from the rest of the file. This
# is less desirable in a new project and you may find this (slightly
# more strict) setting more useful.
# Typically we find users rely on syntax highlighting rather than
# capitalisation to distinguish between keywords and identifiers.
# Clearly, if your organisation has already settled on uppercase
# formatting for any of these syntax elements then set them to "upper".
# See https://stackoverflow.com/questions/608196/why-should-i-capitalize-my-sql-keywords-is-there-a-good-reason
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = lower

# The default configuration for the not equal convention rule is "consistent"
# which will auto-detect the setting from the rest of the file. This
# is less desirable in a new project and you may find this (slightly
# more strict) setting more useful.
[sqlfluff:rules:convention.not_equal]
# Default to preferring the "c_style" (i.e. `!=`)
preferred_not_equal_style = c_style
SQLFluff uses nesting in its configuration files, with files closer to the SQL file being linted overriding (or patching, if you will) values from files further away. That means you'll end up with a final config which will be a patchwork of all the values from the config files loaded up to that path. The exception to this is the value for templater, which cannot be set in config files in subdirectories of the working directory. You don't need any config files to be present to make SQLFluff work. If you do want to override any values though, SQLFluff will use files in the following locations in order, with values from later steps overriding those from earlier:
This whole structure leads to efficient configuration, in particular in projects which utilise a lot of complicated templating.
In addition to the configuration files mentioned above, SQLFluff also supports comment based configuration switching in files. This allows specific SQL files to modify a default configuration if they have specific needs.
When used, these apply to the whole file, and are parsed from the file in an initial step before the rest of the file is properly parsed. This means they can be used for both rule configuration and also for parsing configuration.
To use these, the syntax must start as an inline sql comment beginning with sqlfluff (i.e. -- sqlfluff). The line is then interpreted as a colon-separated address of the configuration value you wish to set. A few common examples are shown below:
-- Set Indented Joins
-- sqlfluff:indentation:indented_joins:True

-- Set a smaller indent for this file
-- sqlfluff:indentation:tab_space_size:2

-- Set keywords to be capitalised
-- sqlfluff:rules:capitalisation.keywords:capitalisation_policy:upper

SELECT *
FROM a
    JOIN b USING(c)
We recommend only using this configuration approach for configuration that applies to one file in isolation. For configuration changes for areas of a project or for whole projects we recommend Nesting of configuration files.
This syntax is very similar to the method for Ignoring individual lines.
Rules can be configured with the .sqlfluff config files.
Common rule configurations can be set in the [sqlfluff:rules] section.
For example:
[sqlfluff:rules]
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all
Rule specific configurations are set in rule specific subsections.
For example, enforce that keywords are upper case by configuring the rule CP01:
[sqlfluff:rules:capitalisation.keywords]
# Keywords
capitalisation_policy = upper
All possible options for rule sections are documented in Rules Reference.
For an overview of the most common rule configurations that you may want to tweak, see Default Configuration (and use Rules Reference to find the available alternatives).
The decision as to which rules are applied to a given file is made on a file by file basis, by the effective configuration for that file. There are two configuration values which you can use to set this:
Each of these two configuration values accept a comma separated list of references. Each of those references can be:
These different references can be mixed within a given expression, which results in a very powerful syntax for selecting exactly which rules are active for a given file.
NOTE:
When considering configuration inheritance, each of rules and exclude_rules will totally overwrite any values in parent config files if they are set in a child file. While the subtraction operation between both of them is calculated "per file", there is no combination operation between two definitions of rules (just one overwrites the other).
The effect of this is that we recommend one of two approaches:
For example, to disable the rules LT08 and RF02:
[sqlfluff]
exclude_rules = LT08, RF02
To enable only individual rules, configure the rules setting instead.
For example, to enable RF02:
[sqlfluff]
rules = RF02
Rules can also be enabled/disabled by their grouping. Right now, the only rule grouping is core. This will enable (or disable) a select group of rules that have been deemed 'core rules'.
[sqlfluff]
rules = core
More information about 'core rules' can be found in the Rules Reference.
Additionally, some rules have a special force_enable configuration option, which allows you to enable the given rule even for dialects where it is disabled by default. The rules that support this can be found in the Rules Reference.
The default values can be seen in Default Configuration.
See Ignoring Errors & Files for more information on how to ignore particular rules for specific lines, sections or files.
To keep displaying violations for specific rules, but not have those issues lead to a failed run, rules can be downgraded to warnings. Rules set as warnings won't cause a file to fail, but will still be shown in the CLI to warn users of their presence.
The configuration of this behaves very like exclude_rules above:
[sqlfluff]
warnings = LT01, LT04
With this configuration, files with no other issues (other than those set to warn) will pass. If there are still other issues, then the file will still fail, but will show both warnings and failures.
== [test.sql] PASS
L:   2 | P:   9 | LT01 | WARNING: Missing whitespace before +
== [test2.sql] FAIL
L:   2 | P:   8 | CP02 | Unquoted identifiers must be consistently upper case.
L:   2 | P:  11 | LT01 | WARNING: Missing whitespace before +
This is particularly useful as a transitional tool when considering the introduction of new rules on a project, where you might want to make users aware of issues without blocking their workflow (yet).
You can use either rule code or rule name for this setting.
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. To understand more about this section, see the section of the docs dedicated to layout: Configuring Layout.
If there is one part of building a linter that is going to be controversial it's going to be whitespace (closely followed by cApiTaLiSaTiOn 😁).
More specifically, whitespace divides into three key themes:
SQLFluff aims to be opinionated on this theme, but also configurable (see Configuring Layout). The tool will have a default viewpoint and will aim to have views on all of the important aspects of SQL layout, but if you (or your organisation) don't like those views then we aim to allow enough configuration that you can lint in line with your views, and still use SQLFluff. For more information on how to configure rules to your own viewpoint see Configuration.
NOTE:
Of the different elements of whitespace, spacing is likely the least controversial. By default, all elements are separated by a single space character. Except for very specific circumstances (see section on Aligned elements), any additional space between elements is usually unwanted and a distraction for the reader. There are however several common cases where no whitespace is more appropriate, which fall into two cases (for more details on where to configure these see Configuring layout and spacing).
SELECT
col_a
, col_b
-- Newline present before column
, col_c
-- When inline, comma should still touch element before.
    , GREATEST(col_d, col_e) as col_f
FROM tbl_a
A special case of spacing is where elements are set to be aligned within some limits. This is not enabled by default, but can be configured to achieve layouts like:
SELECT
a AS first_column,
b AS second_column,
    (a + b) / 2 AS third_column
FROM foo AS bar
In this example, the alias expressions are all aligned with each other. To configure this, SQLFluff needs to know what elements to align and how far to search to find elements which should be aligned with each other. The configuration to achieve this layout is:
[sqlfluff:layout:type:alias_expression]
# We want non-default spacing _before_ the alias expressions.
spacing_before = align
# We want to align them within the next outer select clause.
# This means for example that alias expressions within the FROM
# or JOIN clause would _not_ be aligned with them.
align_within = select_clause
# The point at which to stop searching outward for siblings, which
# in this example would likely be the boundary of a CTE. Stopping
# when we hit brackets is usually a good rule of thumb for this
# configuration.
align_scope = bracketed
Of these configuration values, the align_scope is potentially the least obvious. The following example illustrates the impact it has.
-- With
--   align_scope = bracketed
--   align_within = select_clause
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column
        d + e AS second_column
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar;

-- With
--   align_scope = bracketed
--   align_within = statement
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column
        d + e AS second_column
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar  -- Now the FROM alias is also aligned.

-- With
--   align_scope = file
--   align_within = select_clause
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column       -- Now the aliases here are aligned
        d + e AS second_column  -- with the outer query.
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar

-- With
--   align_scope = file
--   align_within = statement
WITH foo as (
    SELECT
        a,
        b,
        c AS first_column
        d + e AS second_column
)
SELECT
    a AS first_column,
    (a + b) / 2 AS third_column
FROM foo AS bar
When controlling line breaks, we are trying to achieve a few different things:
Lastly, given we have multiple lines of SQL, to what extent should we indent some lines to provide visual cues to the structure of that SQL. It's important to note that SQL is not whitespace sensitive in its interpretation and that means that any principles we apply here are entirely for the benefit of humans. Your database doesn't care.
The indentation therefore should be treated as a hint to the reader of the structure of the code. This explains the common practice within most languages that nested elements (for example the contents of a set of brackets in a function call) should be indented one step from the outer elements. It's also convention that elements with the same level in a nested structure should have the same indentation, at least with regards to their local surroundings. As an example:
SELECT
nested_within_select AS first_column,
some_function(
nested_within_function,
also_nested_within_function
    ) AS indented_the_same_as_opening_bracket
FROM indented_the_same_as_select
NOTE:
Comments are dealt with differently, depending on whether they're block comments (/* like this */), which might optionally include newlines, or inline comments (-- like this) which are necessarily only on one line.
SELECT
/* This is a block comment starting on a new line
which contains a newline (continuing with at least
the same indent.
- potentially containing greater indents
- having no other code following it in the same line
- and aligned with the line of code following it */
    this_column as what_we_align_the_column_to
FROM my_table
SELECT
-- This is fine
this_column as what_we_align_to,
another_column as something_short, -- Is ok
case
-- This is aligned correctly with below
when indented then take_care
else try_harder
    end as the_general_guidance  -- Even here we align with the line below
FROM my_table
NOTE:
One approach to indenting nested elements is a layout called a hanging indent. In this layout, there is no line break before the first nested element, but subsequent elements are indented to match the line position of that first element. Two examples might be:
-- A select statement with two hanging indents:
SELECT no_line_break_before_me,
       indented_to_match_the_first,
       1 + (a
            + b) AS another_more_complex_example
FROM my_table;

-- This TSQL example is also in essence a hanging indent:
DECLARE @prv_qtr_1st_dt DATETIME,
        @last_qtr INT,
        @last_qtr_first_mn INT,
        @last_qtr_yr INT;
In some circumstances this layout can be quite neat (the DECLARE statement is a good example of this), however once indents are nested or indentation styles are mixed it can rapidly become confusing (as partially shown in the first example). Additionally, unless the leading element of the first line is very short, hanging indents use much larger indents than a traditional simple indent where a line break is used before the first element.
Hanging indents have been supported in SQLFluff up to the 1.x versions, however they will no longer be supported from 2.0.0 onwards. This is due to the ambiguity which they bring to fixing poorly formatted SQL. Take the following code:
SELECT this_is, badly_formatted, code_and,
not_obvious,
what_was, intended FROM my_table
Given the lack of line break between SELECT and this_is, it would appear that the user is intending a hanging indent, however it is also plausible that they did not and simply forgot to add a line break between them. This ambiguity is unhelpful, both for SQLFluff as a tool and for people who write SQL, because it leaves two competing ways of indenting the same SQL. Given SQLFluff aims to provide consistency in SQL layout and remove some of the burden of needing to make choices like this - and that it would be very unusual to keep only hanging indents and disable traditional ones - the only route left to consistency is to not allow hanging indents. Starting in 2.0.0, any hanging indents detected will be converted to traditional indents.
A close cousin of the hanging indent is the implicit indent. While it does look a little like a hanging indent, it's much more consistent in its behaviour and is supported from SQLFluff 2.0.0 onwards.
An implicit indent is exactly like a normal indent, but doesn't have to be actually taken to influence the indentation of lines after it - it just needs to be left un-closed before the end of the line. These are normally available in clauses which take the form of KEYWORD <expression>, like WHERE clauses or CASE expressions.
-- This WHERE clause here takes advantage of an implicit indent.
SELECT *
FROM my_table
WHERE condition_a
    AND condition_b;

-- With implicit indents disabled (which is currently the
-- default), the above formulation is not allowed, and instead
-- there should be a newline immediately after `WHERE` (which
-- is the location of the _implicit_ indent).
SELECT *
FROM my_table
WHERE
    condition_a
    AND condition_b;
When addressing both indentation and line-length, implicit indents allow a slightly more compact layout, without significant drawbacks in legibility. They also enable a style much closer to some established style guides.
They are however not recommended by many of the major style guides at time of writing (including the dbt Labs SQL style guide and the Mozilla SQL style guide), and so are disabled by default. To enable them, set the allow_implicit_indents flag in the [sqlfluff:indentation] section of your config to True.
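For example, a minimal configuration snippet to enable them would be:

[sqlfluff:indentation]
allow_implicit_indents = True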
SQLFluff supports templated elements in code, such as those offered by jinja2 (or dbt which relies on it). For simple cases, templated elements are handled as you would expect by introducing additional indents into the layout.
SELECT
a,
{% for n in ['b', 'c', 'd'] %}
-- This section is indented relative to 'a' because
-- it is inside a jinja for loop.
{{ n }},
{% endfor %}
e FROM my_table
This functionality can be turned off if you wish using the template_blocks_indent option in your Configuration.
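For example, to turn off the additional indentation of templated blocks, you could add the following to your configuration (a minimal sketch using the option named above):

[sqlfluff:indentation]
template_blocks_indent = False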
It's important to note here, that SQLFluff lints the code after it has been rendered, and so only has access to code which is still present after that process.
SELECT
a,
{% if False %}
-- This section of the code cannot be linted because
-- it is never rendered due to the `if False` condition.
my + poorly
+ spaced - and/indented AS section_of_code
{% endif %}
e FROM my_table
More complex templated cases are usually characterised by templated tags cutting across the parse tree. This more formally is where the opening and closing tags of a templated section exist at different levels in the parsed structure. Starting in version 2.x, these will be treated differently (Prior to version 2.x, situations like this were sometimes handled inconsistently or incorrectly).
Indentation should act as a visual cue to the structure of the written SQL, and as such, the most important thing is that template tags belonging to the same block structure use the same indentation. In the example below, this is the opening and closing elements of the second if statement. If treated as a simple case, these tags would have different indents, because they are at different levels of the parse tree and so clearly there is a conflict to be resolved.
The view SQLFluff takes on how to resolve this conflict is to pull all of the tags in this section down to the indent of the least indented (in the example below that would be the closing endif tag). This is similar to the treatment of C Preprocessor Directives, which are treated somewhat as being outside the structure of the rest of the file. In these cases, the content is also not further indented as in the simple case because it makes it harder to line up elements within the affected section and outside (in the example below the SELECT and FROM are a good illustration).
SELECT
    a,
    {% if True %}
    -- This is a simple case. The opening and closing tag are
    -- both at the same level within the SELECT clause.
    simple_case AS example,
    {% endif %}
    b,
{% if True %}
    -- This is a complex case. The opening tag is within the SELECT
    -- clause, but the closing tag is outside the statement
    -- entirely.
    complex_case AS example
FROM table_option_one
{% else %}
    complex_case_two AS example
FROM table_option_two
{% endif %}
Configuration for layout is spread across three places:
One of the key areas for this is the indentation of the JOIN expression, which we'll use as an example.
Semantically, a JOIN expression is part of the FROM expression and therefore would be expected to be indented. However according to many of the most common SQL style guides (including the dbt Labs SQL style guide and the Mozilla SQL style guide) the JOIN keyword is expected to be at the same indent as the FROM keyword. By default, SQLFluff sides with the current consensus, which is to not indent the JOIN keyword, however this is one element which is configurable.
By setting values in the sqlfluff:indentation section of your config file you can control how this is parsed.
For example, the default indentation would be as follows:
SELECT
    a,
    b
FROM my_table
JOIN another_table
    ON
        condition1
        AND condition2
By setting your config file to:
[sqlfluff:indentation] indented_joins = True
Then the expected indentation will be:
SELECT
    a,
    b
FROM my_table
    JOIN another_table
        ON
            condition1
            AND condition2
There is a similar indented_using_on config (defaulted to True) which can be set to False to prevent the USING or ON clause from being indented, in which case the original SQL would become:
SELECT
    a,
    b
FROM my_table
JOIN another_table
ON
    condition1
    AND condition2
It's worth noting at this point, that for some users, the additional line break after ON is unexpected, and this is a good example of an implicit indent. By setting your config to:
[sqlfluff:indentation] indented_using_on = False allow_implicit_indents = True
Then the expected indentation will be:
SELECT
    a,
    b
FROM my_table
JOIN another_table
ON condition1
    AND condition2
There is also a similar indented_on_contents config (defaulted to True) which can be set to False to align any AND subsections of an ON block with each other. If set to False (assuming implicit indents are still enabled) the original SQL would become:
SELECT
    a,
    b
FROM my_table
JOIN another_table
    ON condition1
    AND condition2
These can also be combined, so if the indented_using_on config is set to False, indented_on_contents is also set to False, and allow_implicit_indents is set to True, then the SQL would become:
SELECT
    a,
    b
FROM my_table
JOIN another_table
ON condition1 AND condition2
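The combined configuration for this layout would be:

[sqlfluff:indentation]
indented_using_on = False
indented_on_contents = False
allow_implicit_indents = True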
There is also a similar indented_ctes config (defaulted to False) which can be set to True to enforce CTEs to be indented within the WITH clause:
WITH
    some_cte AS (
        SELECT 1 FROM table1
    ),

    some_other_cte AS (
        SELECT 1 FROM table1
    )

SELECT 1 FROM some_cte
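The corresponding configuration snippet would be:

[sqlfluff:indentation]
indented_ctes = True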
There is also a similar indented_then config (defaulted to True) which can be set to False to allow THEN without an indent after WHEN:
SELECT
    a,
    CASE
        WHEN b >= 42 THEN
            1
        ELSE 0
    END AS c
FROM some_table
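...and the configuration to produce the layout above would be:

[sqlfluff:indentation]
indented_then = False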
By default, SQLFluff aims to follow the most common approach to indentation. However, if you have other versions of indentation which are supported by published style guides, then please submit an issue on GitHub to have that variation supported by SQLFluff.
The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. The syntax of this section is very expressive; however in normal use, only very small alterations should be necessary from the Default Configuration.
The section headings here select by type, which corresponds to the type defined in the dialect. For example the following section applies to elements of the type comma, i.e. the , symbol.
[sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing
Within these configurable sections there are a few key elements which are available:
[sqlfluff:layout:type:comma] line_position = leading
-- Setting line_position to just `alone`
-- within [sqlfluff:layout:type:set_operator]
-- would not allow:
SELECT a
UNION SELECT b;
-- ...or...
SELECT a UNION
SELECT b;
-- but *would* allow both of the following:
SELECT a
UNION
SELECT b;

SELECT a UNION SELECT b;
-- However the default is set to `alone:strict`,
-- in which case the *only* acceptable configuration is:
SELECT a
UNION
SELECT b;
[sqlfluff:layout:type:where_clause] keyword_line_position = alone
[sqlfluff:layout:type:join_on_condition] keyword_line_position = trailing
[sqlfluff:layout:type:partitionby_clause] keyword_line_position = leading
[sqlfluff:layout:type:where_clause] keyword_line_position = none [sqlfluff:layout:type:orderby_clause] keyword_line_position = none [sqlfluff:layout:type:groupby_clause] keyword_line_position = none [sqlfluff:layout:type:having_clause] keyword_line_position = none
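As an illustrative sketch (this is an inferred example rather than one taken from the reference material), setting keyword_line_position = alone for the where_clause, with implicit indents disabled, would favour a layout where the WHERE keyword sits on a line of its own:

SELECT *
FROM my_table
WHERE
    condition_a
    AND condition_b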
This section explains how to configure templating for SQL files.
When writing SQL files, users might utilise some kind of templating. The SQL file itself is written with placeholders which get rendered to proper SQL at run time. This can range from very simple placeholder templating to complex Jinja templating.
SQLFluff supports templated sections in SQL, see Stage 1, the templater. This is achieved by the following set of operations:
SQLFluff does not automatically have access to the same environment used in production template setup. This means it is necessary to either provide that environment or provide dummy values to effectively render the template and generate valid SQL. Refer to the templater sections below for details.
SQLFluff natively supports the following templating engines: the jinja, python and placeholder templaters, each described in the sections below.
Also, SQLFluff has an integration to use dbt as a templater.
NOTE:
For example, if the raw SQL uses a {% if condition %} block, the rendered version of the template will only include either the if branch or the {% else %} branch (depending on the provided configuration for the templater), but not both.
In this case, because SQLFluff linting can only operate on the output of the templater, some areas of the raw SQL will never be seen by the linter and will not be covered by lint rules.
This is functionality we hope to support in future.
The Jinja templater uses Jinja2 to render templates.
There are multiple, complementary ways of configuring the Jinja templater.
| Configuration | Variables | Macros | Filters | Documentation |
| Config file | ✅ | ✅ | ❌ | Complex Jinja Variable Templating and Jinja Macro Templating (from config) |
| Macro Path | ❌ | ✅ | ❌ | Jinja Macro Templating (from file) |
| Library | ✅ | ✅ | ✅ | Library Templating |
For example, a snippet from a .sqlfluff file that uses all config options:
[sqlfluff]
templater = jinja
[sqlfluff:templater:jinja]
apply_dbt_builtins = True
load_macros_from_path = my_macros
loader_search_path = included_templates
library_path = sqlfluff_libs
exclude_macros_from_path = my_macros_exclude
[sqlfluff:templater:jinja:context]
my_list = ['a', 'b', 'c']
MY_LIST = ("d", "e", "f")
my_where_dict = {"field_1": 1, "field_2": 2}
[sqlfluff:templater:jinja:macros]
a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}
Apart from the Generic variable templating that is supported for all templaters, two more advanced features of variable templating are available for Jinja: case sensitivity and native python types. Both are illustrated in the following example:
[sqlfluff:templater:jinja:context]
my_list = ['a', 'b', 'c']
MY_LIST = ("d", "e", "f")
my_where_dict = {"field_1": 1, "field_2": 2}
SELECT
{% for elem in MY_LIST %}
'{{elem}}' {% if not loop.last %}||{% endif %}
{% endfor %} as concatenated_list FROM tbl WHERE
{% for field, value in my_where_dict.items() %}
{{field}} = {{value}} {% if not loop.last %}and{% endif %}
{% endfor %}
...will render as...
SELECT
'd' || 'e' || 'f' as concatenated_list FROM tbl WHERE
field_1 = 1 and field_2 = 2
Note that the variable was replaced in a case sensitive way and that the settings in the config file were interpreted as native python types.
Macros (which also look and feel like functions) are available only in the jinja templater. Similar to Generic Variable Templating, these are specified in config files; what's different in this case is how they are named. Similar to the context section above, macros are configured separately in the macros section of the config. Consider the following example.
If passed the following .sql file:
SELECT {{ my_macro(6) }} FROM some_table
...and the following configuration in .sqlfluff in the same directory (note the tight control of whitespace):
[sqlfluff:templater:jinja:macros]
a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}
...then before parsing, the sql will be transformed to:
SELECT 6 + 12 FROM some_table
Note that in the code block above, the variable name in the config is a_macro_def, and this isn't apparently otherwise used anywhere else. Broadly this is accurate, however within the configuration loader this will still be used to overwrite previous values in other config files. As such this introduces the idea of config blocks which could be selectively overwritten by other configuration files downstream as required.
In addition to macros specified in the config file, macros can also be loaded from files or folders. This is specified in the config file:
[sqlfluff:templater:jinja] load_macros_from_path = my_macros,other_macros
load_macros_from_path is a comma-separated list of .sql files or folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for macros in the folders /home/my_project/my_macros/ and /home/my_project/other_macros/, including any of their subfolders. Any macros defined in the config will always take precedence over a macro defined in the path.
exclude_macros_from_path works in the same manner as load_macros_from_path but allows you to have sqlfluff ignore certain macros. This can be useful if you have custom jinja tags.
Macros loaded from these files are available in every .sql file without requiring a Jinja include or import. They are loaded into the Jinja Global Namespace.
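As a sketch (using a hypothetical file and macro name), if load_macros_from_path is set to my_macros and the file my_macros/double.sql contains:

{% macro double(n) %}{{ n }} * 2{% endmacro %}

...then {{ double(3) }} can be used directly in any linted .sql file, without an explicit include or import.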
Note: The load_macros_from_path setting also defines the search path for Jinja include or import. As with loaded macros, subdirectories are also supported. For example, if load_macros_from_path is set to my_macros, and there is a file my_macros/subdir/my_file.sql, you can do:
{% include 'subdir/my_file.sql' %}
If you would like to define the Jinja search path without also loading the macros into the global namespace, use the loader_search_path setting instead.
NOTE:
REMEMBER: The reason SQLFluff supports macros is to enable it to parse templated sql without it being a blocker. It shouldn't be a requirement that the templating is accurate - it only needs to work well enough that parsing and linting are helpful.
One of the main use cases which inspired SQLFluff as a project was dbt. It uses jinja templating extensively and leads to some users maintaining large repositories of sql files which could potentially benefit from some linting.
NOTE:
To use the dbt templater, go to dbt templater.
SQLFluff anticipates this use case and provides some built in macro blocks in the Default Configuration which assist in getting started with dbt projects. In particular it provides mock objects for:
NOTE:
If using SQLFluff with jinja as your templater, you may have library function calls within your sql files that can not be templated via the normal macro templating mechanisms:
SELECT foo, bar FROM baz {{ dbt_utils.group_by(2) }}
To template these libraries, you can use the library_path option in the [sqlfluff:templater:jinja] section of your config:
[sqlfluff:templater:jinja] library_path = sqlfluff_libs
This will pull in any python modules from that directory and allow sqlfluff to use them in templates. In the above example, you might define a file at sqlfluff_libs/dbt_utils.py as:
def group_by(n):
return "GROUP BY 1,2"
If an __init__.py is detected, it will be loaded alongside any modules and submodules found within the library path.
SELECT
{{ custom_sum('foo', 'bar') }},
{{ foo.bar.another_sum('foo', 'bar') }} FROM
baz
sqlfluff_libs/__init__.py:
def custom_sum(a: str, b: str) -> str:
return a + b
sqlfluff_libs/foo/__init__.py:
# empty file
sqlfluff_libs/foo/bar.py:
def another_sum(a: str, b: str) -> str:
return a + b
Additionally, the library can be used to expose Jinja Filters to the Jinja environment used by SQLFluff.
This is achieved by setting a global variable named SQLFLUFF_JINJA_FILTERS. SQLFLUFF_JINJA_FILTERS is a dictionary where the keys are the filter names to expose to Jinja and the values are the callables implementing them.
For example, to make the Airflow filter ds available to SQLFluff, add the following to the __init__.py of the library:
import datetime


# https://github.com/apache/airflow/blob/main/airflow/templates.py#L53
def ds_filter(value: datetime.date | datetime.time | None) -> str | None:
    """Date filter."""
    if value is None:
        return None
    return value.strftime("%Y-%m-%d")


SQLFLUFF_JINJA_FILTERS = {"ds": ds_filter}
Now, ds can be used in SQL
SELECT "{{ "2000-01-01" | ds }}";
The Jinja environment can be configured to search for files included with include or import in a list of folders. This is specified in the config file:
[sqlfluff:templater:jinja] loader_search_path = included_templates,other_templates
loader_search_path is a comma-separated list of folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for included files in the folders /home/my_project/included_templates/ and /home/my_project/other_templates/, including any of their subfolders. For example, this will read from /home/my_project/included_templates/my_template.sql:
{% include 'included_templates/my_template.sql' %}
Any folders specified in the load_macros_from_path setting are automatically appended to the loader_search_path. It is not necessary to specify a given directory in both settings.
Unlike the load_macros_from_path setting, any macros within these folders are not automatically loaded into the global namespace. They must be explicitly imported using the import Jinja directive. If you would like macros to be automatically included in the global Jinja namespace, use the load_macros_from_path setting instead.
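For example, a macro defined in a file on the loader_search_path (a hypothetical included_templates/my_macros.sql here) could be brought into scope explicitly with standard Jinja import syntax:

{% import 'included_templates/my_macros.sql' as my_macros %}

SELECT {{ my_macros.some_macro('col_a') }} FROM my_table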
Ignoring Jinja templating errors provides a way for users to use SQLFluff while reducing or avoiding the need to spend a lot of time adding variables to [sqlfluff:templater:jinja:context].
When --ignore=templating is enabled, the Jinja templater behaves a bit differently. This additional behavior is usually but not always helpful for making the file at least partially parsable and fixable. It definitely doesn’t guarantee that every file can be fixed, but it’s proven useful for some users.
Here's how it works:
For example:
select {{ my_variable }}
from {% include "my_table.sql" %}
is interpreted as:
select my_variable from my_table
The values provided by the Jinja templater act a bit (not exactly) like a mixture of several types:
Because the values behave like Undefined, it's possible to replace them using Jinja's default() filter. For example:
select {{ my_variable | default("col_a") }}
from my_table
is interpreted as:
select col_a from my_table
Libraries such as SQLAlchemy or Psycopg use different parameter placeholder styles to mark where a parameter has to be inserted in the query.
For example a query in SQLAlchemy can look like this:
SELECT * FROM table WHERE id = :myid
At runtime :myid will be replaced by a value provided by the application and escaped as needed, but this is not standard SQL and cannot be parsed as is.
In order to parse these queries, it is then necessary to replace these placeholders with sample values, and this is done with the placeholder templater.
Placeholder templating can be enabled in the config using:
[sqlfluff] templater = placeholder
A few common styles are supported:
-- colon
WHERE bla = :my_name
-- colon_nospaces
-- (use with caution as more prone to false positives)
WHERE bla = table:my_name
-- colon_optional_quotes
SELECT :"column" FROM :table WHERE bla = :'my_name'
-- numeric_colon
WHERE bla = :2
-- pyformat
WHERE bla = %(my_name)s
-- dollar
WHERE bla = $my_name or WHERE bla = ${my_name}
-- question_mark
WHERE bla = ?
-- numeric_dollar
WHERE bla = $3 or WHERE bla = ${3}
-- percent
WHERE bla = %s
-- ampersand
WHERE bla = &s or WHERE bla = &{s} or USE DATABASE MARK_{ENV}
These can be configured by setting param_style to the names above:
[sqlfluff:templater:placeholder] param_style = colon my_name = 'john'
You can then set sample values for each parameter, like my_name above. Notice that the value needs to be escaped as it will be replaced as a string during parsing. When sample values aren't provided, the templater will use the parameter names themselves by default.
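For example, with param_style = colon and no sample value configured for my_name, a query such as WHERE bla = :my_name would (under this default behaviour) render roughly as:

WHERE bla = my_name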
When parameters are positional, like question_mark, then their name is simply the order in which they appear, starting with 1.
[sqlfluff:templater:placeholder] param_style = question_mark 1 = 'john'
In case you need a parameter style different from the ones above, you can pass a custom regex.
[sqlfluff:templater:placeholder] param_regex = __(?P<param_name>[\w_]+)__ my_name = 'john'
N.B. quotes around param_regex in the config are interpreted literally by the templater, e.g. param_regex='__(?P<param_name>[\w_]+)__' matches '__some_param__' not __some_param__.
The named parameter param_name will be used as the key to replace; if it is missing, the parameter is assumed to be positional and numbers are used instead.
Also consider making a pull request to the project to have your style added, it may be useful to other people and simplify your configuration.
The python templater uses native Python f-strings. As described in Generic Variable Templating, an example usage would be configured as follows:
If passed the following .sql file:
SELECT * FROM {tbl_name}
...and the following configuration in .sqlfluff in the same directory:
[sqlfluff] templater = python [sqlfluff:templater:python:context] tbl_name = my_table
...then before parsing, the sql will be transformed to:
SELECT * FROM my_table
Python string formatting supports accessing object attributes via dot notation (e.g. {foo.bar}). However, since we cannot create Python objects within configuration files, we need a workaround in order to provide dummy values to render templates containing these values. The SQLFluff python templater will interpret any variable containing a "." as a dictionary lookup on the magic fixed context key sqlfluff.
-- this SQL
SELECT * FROM {foo.bar}
-- becomes this
SELECT * FROM {sqlfluff["foo.bar"]}
..which can be populated using the following configuration:
[sqlfluff:templater:python:context]
sqlfluff = {"foo.bar": "abc"}
NOTE:
dbt templating is still a relatively new feature added in 0.4.0 and is still in very active development! If you encounter an issue, please let us know in a GitHub issue or on the SQLFluff slack workspace.
dbt is not the default templater for SQLFluff (it is jinja). dbt is a complex tool, so using the default jinja templater will be simpler. You should be aware when using the dbt templater that you will be exposed to some of the complexity of dbt. Users may wish to try both templaters and choose according to how they intend to use SQLFluff.
A simple rule of thumb might be:
Pros:
Cons:
In order to get started using SQLFluff with a dbt project you will first need to install the relevant dbt adapter for your dialect and the sqlfluff-templater-dbt package using your package manager of choice (e.g. pip install dbt-postgres sqlfluff-templater-dbt) and then will need the following configuration:
In .sqlfluff:
[sqlfluff] templater = dbt
In .sqlfluffignore:
target/
# dbt <1.0.0
dbt_modules/
# dbt >=1.0.0
dbt_packages/
macros/
You can set the dbt project directory, profiles directory and profile with:
[sqlfluff:templater:dbt]
project_dir = <relative or absolute path to dbt_project directory>
profiles_dir = <relative or absolute path to the directory that contains the profiles.yml file>
profile = <dbt profile>
target = <dbt target>
NOTE:
To use builtin dbt Jinja functions SQLFluff provides a configuration option that enables usage within templates.
[sqlfluff:templater:jinja] apply_dbt_builtins = True
This will provide dbt macros like ref, var, is_incremental(). If the need arises builtin dbt macros can be customised via Jinja macros in .sqlfluff configuration file.
[sqlfluff:templater:jinja:macros]
# Macros provided as builtins for dbt projects
dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %}
dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %}
dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %}
dbt_var = {% macro var(variable, default='') %}item{% endmacro %}
dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %}
If your project requires that you pass variables to dbt through command line, you can specify them in template:dbt:context section of .sqlfluff. See below configuration and its equivalent dbt command:
[sqlfluff:templater:dbt:context] my_variable = 1
dbt run --vars '{"my_variable": 1}'
Variables are available in all the templaters. By default the templating engine will expect variables for templating to be available in the config, and the templater will look in the section corresponding to the context for that templater. By convention, the config for the jinja templater is found in the sqlfluff:templater:jinja:context section, the config for the python templater is found in the sqlfluff:templater:python:context section, and the one for the placeholder templater is found in the sqlfluff:templater:placeholder:context section.
For example, if passed the following .sql file:
SELECT {{ num_things }} FROM {{ tbl_name }} WHERE id > 10 LIMIT 5
...and the following configuration in .sqlfluff in the same directory:
[sqlfluff:templater:jinja:context] num_things=456 tbl_name=my_table
...then before parsing, the sql will be transformed to:
SELECT 456 FROM my_table WHERE id > 10 LIMIT 5
NOTE:
Similar to flake8's ignore, individual lines can be ignored by adding -- noqa to the end of the line. Additionally, specific rules can be ignored by quoting their code or the category.
-- Ignore all errors
SeLeCt 1 from tBl ; -- noqa

-- Ignore rule CP02 & rule CP03
SeLeCt 1 from tBl ; -- noqa: CP02,CP03

-- Ignore all parsing errors
SeLeCt from tBl ; -- noqa: PRS
NOTE:
Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.
-- Ignore rule AL02 from this line forward
SELECT col_a a FROM foo -- noqa: disable=AL02

-- Ignore all rules from this line forward
SELECT col_a a FROM foo -- noqa: disable=all

-- Enforce all rules from this line forward
SELECT col_a a FROM foo -- noqa: enable=all
Similar to Git's .gitignore and Docker's .dockerignore, SQLFluff supports a .sqlfluffignore file to control which files are and aren't linted. Under the hood we use the python pathspec library which also has a brief tutorial in their documentation.
An example of a potential .sqlfluffignore placed in the root of your project would be:
# Comments start with a hash.

# Ignore anything in the "temp" path
/temp/

# Ignore anything called "testing.sql"
testing.sql

# Ignore any ".tsql" files
*.tsql
Ignore files can also be placed in subdirectories of a path which is being linted and the sub files will also be applied within that subdirectory.
General categories of errors can be ignored using the --ignore command line option or the ignore setting in .sqlfluffignore. The categories of errors that can be ignored are lexing, parsing, templating and linting errors.
The default configuration is as follows, note the Builtin Jinja Macro Blocks in section [sqlfluff:templater:jinja:macros] as referred to above.
NOTE:
This is for two reasons:
If you are starting a fresh project and are looking for a good starter config, check out the New Project Configuration section above.
[sqlfluff] # verbose is an integer (0-2) indicating the level of log output verbose = 0 # Turn off color formatting of output nocolor = False # Supported dialects https://docs.sqlfluff.com/en/stable/perma/dialects.html # Or run 'sqlfluff dialects' dialect = None # One of [raw|jinja|python|placeholder] templater = jinja # Comma separated list of rules to check, default to all rules = all # Comma separated list of rules to exclude, or None exclude_rules = None # Below controls SQLFluff output, see max_line_length for SQL output output_line_length = 80 # Number of passes to run before admitting defeat runaway_limit = 10 # Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating) ignore = None # Warn only for rule codes (one of more rule codes, separated by commas: e.g. LT01,LT02) # Also works for templating and parsing errors by using TMP or PRS warnings = None # Whether to warn about unneeded '-- noqa:' comments. warn_unused_ignores = False # Ignore linting errors found within sections of code coming directly from # templated code (e.g. from within Jinja curly braces. Note that it does not # ignore errors from literal code found within template loops. ignore_templated_areas = True # can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig encoding = autodetect # Ignore inline overrides (e.g. to test if still required) disable_noqa = False # Ignore inline overrides except those listed. # This take priority over `disabled_noqa`. disable_noqa_except = None # Comma separated list of file extensions to lint # NB: This config will only apply in the root folder sql_file_exts = .sql,.sql.j2,.dml,.ddl # Allow fix to run on files, even if they contain parsing errors # Note altering this is NOT RECOMMENDED as can corrupt SQL fix_even_unparsable = False # Very large files can make the parser effectively hang. # The more efficient check is the _byte_ limit check which # is enabled by default. The previous _character_ limit check # is still present for backward compatibility. This will be # removed in a future version. # Set either to 0 to disable. large_file_skip_char_limit = 0 large_file_skip_byte_limit = 20000 # CPU processes to use while linting. # If positive, just implies number of processes. # If negative or zero, implies number_of_cpus - specified_number. # e.g. -1 means use all processors but one. 0 means all cpus. processes = 1 # Max line length is set by default to be in line with the dbt style guide. # https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md # Set to zero or negative to disable checks. max_line_length = 80 # NOTE: Templater Variant rendering should currently be considered EXPERIMENTAL. # Only set `render_variant_limit` to more than 1 if you know what you're doing! # Implementation of this will also depend on your templater. render_variant_limit = 1 [sqlfluff:indentation] # See https://docs.sqlfluff.com/en/stable/perma/indent_locations.html indent_unit = space tab_space_size = 4 indented_joins = False indented_ctes = False indented_using_on = True indented_on_contents = True indented_then = True indented_then_contents = True allow_implicit_indents = False template_blocks_indent = True # This is a comma separated list of elements to skip # indentation edits to. skip_indentation_in = script_content # If comments are found at the end of long lines, we default to moving # them to the line _before_ their current location as the convention is # that a comment precedes the line it describes. 
However if you prefer # comments moved _after_, this configuration setting can be set to "after". trailing_comments = before # To exclude comment lines from indentation entirely set this to "True". ignore_comment_lines = False # Layout configuration # See https://docs.sqlfluff.com/en/stable/perma/layout_spacing.html [sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing [sqlfluff:layout:type:binary_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:column_path_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:statement_terminator] spacing_before = touch line_position = trailing [sqlfluff:layout:type:end_of_file] spacing_before = touch [sqlfluff:layout:type:set_operator] line_position = alone:strict [sqlfluff:layout:type:start_bracket] spacing_after = touch [sqlfluff:layout:type:end_bracket] spacing_before = touch [sqlfluff:layout:type:start_square_bracket] spacing_after = touch [sqlfluff:layout:type:end_square_bracket] spacing_before = touch [sqlfluff:layout:type:start_angle_bracket] spacing_after = touch [sqlfluff:layout:type:end_angle_bracket] spacing_before = touch [sqlfluff:layout:type:casting_operator] spacing_before = touch spacing_after = touch:inline [sqlfluff:layout:type:slice] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:dot] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:comparison_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:assignment_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:object_reference] spacing_within = touch:inline [sqlfluff:layout:type:numeric_literal] spacing_within = touch:inline [sqlfluff:layout:type:sign_indicator] spacing_after = touch:inline [sqlfluff:layout:type:tilde] spacing_after = touch:inline [sqlfluff:layout:type:function_name] spacing_within = touch:inline [sqlfluff:layout:type:function_contents] spacing_before = touch:inline [sqlfluff:layout:type:function_parameter_list] spacing_before = touch:inline [sqlfluff:layout:type:array_type] spacing_within = touch:inline [sqlfluff:layout:type:typed_array_literal] spacing_within = touch [sqlfluff:layout:type:sized_array_type] spacing_within = touch [sqlfluff:layout:type:struct_type] spacing_within = touch:inline [sqlfluff:layout:type:bracketed_arguments] spacing_before = touch:inline [sqlfluff:layout:type:match_condition] spacing_within = touch:inline [sqlfluff:layout:type:typed_struct_literal] spacing_within = touch [sqlfluff:layout:type:semi_structured_expression] spacing_within = touch:inline spacing_before = touch:inline [sqlfluff:layout:type:array_accessor] spacing_before = touch:inline [sqlfluff:layout:type:colon] spacing_before = touch [sqlfluff:layout:type:colon_delimiter] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:path_segment] spacing_within = touch [sqlfluff:layout:type:sql_conf_option] spacing_within = touch [sqlfluff:layout:type:sqlcmd_operator] # NOTE: This is the spacing between the operator and the colon spacing_before = touch [sqlfluff:layout:type:slash] spacing_before = any spacing_after = any [sqlfluff:layout:type:comment] spacing_before = any spacing_after = any [sqlfluff:layout:type:pattern_expression] # Snowflake pattern expressions shouldn't have their spacing changed. spacing_within = any [sqlfluff:layout:type:placeholder] # Placeholders exist "outside" the rendered SQL syntax # so we shouldn't enforce any particular spacing around # them. 
spacing_before = any spacing_after = any [sqlfluff:layout:type:common_table_expression] # The definition part of a CTE should fit on one line where possible. # For users which regularly define column names in their CTEs they # may which to relax this config to just `single`. spacing_within = single:inline # By setting a selection of clauses to "alone", we hint to the reflow # algorithm that in the case of a long single line statement, the # first place to add newlines would be around these clauses. # Setting this to "alone:strict" would always _force_ line breaks # around them even if the line isn't too long. [sqlfluff:layout:type:select_clause] line_position = alone [sqlfluff:layout:type:where_clause] line_position = alone keyword_line_position = leading [sqlfluff:layout:type:from_clause] line_position = alone [sqlfluff:layout:type:join_clause] line_position = alone [sqlfluff:layout:type:groupby_clause] line_position = alone keyword_line_position = leading [sqlfluff:layout:type:orderby_clause] # NOTE: Order by clauses appear in many places other than in a select # clause. To avoid unexpected behaviour we use `leading` in this # case rather than `alone`. line_position = leading keyword_line_position = leading [sqlfluff:layout:type:having_clause] line_position = alone keyword_line_position = leading [sqlfluff:layout:type:limit_clause] line_position = alone # Template loop tokens shouldn't dictate spacing around them. [sqlfluff:layout:type:template_loop] spacing_before = any spacing_after = any [sqlfluff:templater] unwrap_wrapped_queries = True [sqlfluff:templater:jinja] apply_dbt_builtins = True # Some rules can be configured directly from the config common to other rules [sqlfluff:rules] allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all [sqlfluff:rules:capitalisation.keywords] # Keywords capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.identifiers] # Unquoted identifiers extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.functions] # Function names extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.literals] # Null & Boolean Literals capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:capitalisation.types] # Data Types extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:ambiguous.join] # Fully qualify JOIN clause fully_qualify_join_types = inner [sqlfluff:rules:ambiguous.column_references] # GROUP BY/ORDER BY column references group_by_and_order_by_style = consistent [sqlfluff:rules:aliasing.table] # Aliasing preference for tables aliasing = explicit [sqlfluff:rules:aliasing.column] # Aliasing preference for columns aliasing = explicit [sqlfluff:rules:aliasing.unused] alias_case_check = dialect [sqlfluff:rules:aliasing.length] min_alias_length = None max_alias_length = None [sqlfluff:rules:aliasing.forbid] # Avoid table aliases in from clauses and join conditions. # Disabled by default for all dialects unless explicitly enabled. 
# We suggest instead using aliasing.length (AL06) in most cases. force_enable = False [sqlfluff:rules:convention.not_equal] # Consistent usage of preferred "not equal to" comparison preferred_not_equal_style = consistent [sqlfluff:rules:convention.select_trailing_comma] # Trailing commas select_clause_trailing_comma = forbid [sqlfluff:rules:convention.count_rows] # Consistent syntax to count all rows prefer_count_1 = False prefer_count_0 = False [sqlfluff:rules:convention.terminator] # Semi-colon formatting approach multiline_newline = False require_final_semicolon = False [sqlfluff:rules:convention.blocked_words] # Comma separated list of blocked words that should not be used blocked_words = None blocked_regex = None match_source = False [sqlfluff:rules:convention.quoted_literals] # Consistent usage of preferred quotes for quoted literals preferred_quoted_literal_style = consistent # Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres) force_enable = False [sqlfluff:rules:convention.casting_style] # SQL type casting preferred_type_casting_style = consistent [sqlfluff:rules:references.from] # References must be in FROM clause # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:references.qualification] # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.consistent] # References must be consistently used # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:references.keywords] # Keywords should not be used as identifiers. unquoted_identifiers_policy = aliases quoted_identifiers_policy = none # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.special_chars] # Special characters in identifiers unquoted_identifiers_policy = all quoted_identifiers_policy = all allow_space_in_identifier = False additional_allowed_characters = None ignore_words = None ignore_words_regex = None [sqlfluff:rules:references.quoting] # Policy on quoted and unquoted identifiers prefer_quoted_identifiers = False prefer_quoted_keywords = False ignore_words = None ignore_words_regex = None case_sensitive = True [sqlfluff:rules:layout.long_lines] # Line length ignore_comment_lines = False ignore_comment_clauses = False [sqlfluff:rules:layout.select_targets] wildcard_policy = single [sqlfluff:rules:structure.subquery] # By default, allow subqueries in from clauses, but not join clauses forbid_subquery_in = join [sqlfluff:rules:structure.join_condition_order] preferred_first_table_in_join_clause = earlier
SQLFluff is designed to be used both as a utility for developers but also to be part of CI/CD pipelines.
A full list of Security Advisories is available on GitHub.
Given the context of how SQLFluff is designed to be used, there are three different tiers of access which users may have access to manipulate how the tool functions in a secure environment.
To disable this option entirely via the CLI:
$ sqlfluff lint my_path --library-path none
To disable this option entirely via the python API:
"""This is an example of providing config overrides.""" from sqlfluff.core import FluffConfig, Linter sql = "SELECT 1\n" config = FluffConfig(
overrides={
"dialect": "snowflake",
# NOTE: We explicitly set the string "none" here rather
# than a None literal so that it overrides any config
# set by any config files in the path.
"library_path": "none",
} ) linted_file = Linter(config=config).lint_string(sql) assert linted_file.get_violations() == []
The SQLFluff CLI application is a python application, which means it depends on your host python environment (see Installing SQLFluff).
The exit code provided by SQLFluff when run as a command line utility is designed to assist usefulness in deployment pipelines. If no violations are found then the exit code will be 0. If violations are found then a non-zero code will be returned which can be interrogated to find out more.
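For example, in a shell-based CI step you might branch on the exit code directly (a sketch; the specific non-zero codes are described in the CLI Reference):

$ sqlfluff lint test.sql --dialect ansi || echo "SQLFluff reported issues (exit code $?)"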
For details of what commands and options are available in the CLI see the CLI Reference.
For projects with large amounts of (potentially imperfect) SQL code, the full SQLFluff output could be very large, which can be distracting -- perhaps the CI build for a one-line SQL change shouldn't encourage the developer to fix lots of unrelated quality issues.
To support this use case, SQLFluff integrates with a quality checking tool called diff-quality. By running SQLFluff using diff-quality (rather than running it directly), you can limit the output to the new or modified SQL in the branch (aka pull request or PR) containing the proposed changes.
Currently, diff-quality requires that you are using git for version control.
NOTE: Installing SQLFluff automatically installs the diff_cover package that provides the diff-quality tool.
In your CI build script:
$ diff-quality --violations sqlfluff
The output will look something like:
-------------
Diff Quality
Quality Report: sqlfluff
Diff: origin/master...HEAD, staged and unstaged changes
-------------
sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql (0.0%):
    sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql:5: Unquoted Identifiers must be consistently upper case.
-------------
Total:   1 line
Violations: 1 line
% Quality: 0%
-------------
These messages are basically the same as those provided directly by SQLFluff, although the format is a little different. Note that diff-quality only lists the line _numbers_, not the character position. If you need the character position, you will need to run SQLFluff directly.
NOTE:
By aligning the paths of all three, you should be able to achieve a robust setup. If each is rooted in a different path, it can be very difficult to achieve the same result, and the resulting behaviour can be difficult to debug.
To debug any issues relating to this setup, we recommend occasionally running sqlfluff directly using the main cli (i.e. calling sqlfluff lint my/project/path) and check whether that route gives you the results you expect. diff-quality should behave as though it's calling the SQLFluff CLI from the same path that you invoke diff-quality.
For more information on diff-quality and the diff_cover package, see the documentation on their github repository. It covers topics such as:
pre-commit is a framework to manage git "hooks" triggered right before a commit is made.
A git hook is a git feature to "fire off custom scripts" when specific actions occur.
Using pre-commit with SQLFluff is a good way to provide automated linting to SQL developers.
With pre-commit, you also get the benefit of only linting/fixing the files that changed.
SQLFluff comes with two pre-commit hooks: sqlfluff-lint and sqlfluff-fix.
WARNING:
Although it is not advised, you can tell SQLFluff to try and fix these files by overriding the fix_even_unparsable setting in .sqlfluff config file or using the sqlfluff fix --FIX-EVEN-UNPARSABLE command line option.
Overriding this behavior may break your SQL. If you use this override, always be sure to review any fixes applied to files with templating or parse errors to verify they are okay.
You should create a file named .pre-commit-config.yaml at the root of your git project, which should look like this:
repos:
- repo: https://github.com/sqlfluff/sqlfluff
  rev: stable_version
  hooks:
    - id: sqlfluff-lint
      # For dbt projects, this installs the dbt "extras".
      # You will need to select the relevant dbt adapter for your dialect
      # (https://docs.getdbt.com/docs/available-adapters):
      # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
    - id: sqlfluff-fix
      # Arbitrary arguments to show an example
      # args: [--rules, "LT02,CP02"]
      # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
When trying to use the dbt templater, uncomment the additional_dependencies to install the extras. This is equivalent to running pip install <dbt-adapter> sqlfluff-templater-dbt.
You can specify the version of dbt-adapter used in pre-commit, for example:
additional_dependencies : ['dbt-bigquery==1.0.0', 'sqlfluff-templater-dbt']
See the list of available dbt-adapters.
Note that you can pass the same arguments available through the CLI using args:.
Under the hood, pre-commit works by passing specific files to SQLFluff. For example, if the only two files that are modified in your commit are file_a.sql and file_b.sql, then the command which is called in the background is sqlfluff lint file_a.sql file_b.sql. While this is efficient, it does produce some unwanted noise when also using .sqlfluffignore. This is because SQLFluff is designed to allow users to override an ignore configuration by passing the name of the file directly. This makes a lot of sense in a CLI context, but less so in the context of being invoked by pre-commit.
To avoid noisy logs when using both pre-commit and .sqlfluffignore, we recommend also setting the exclude argument in your .pre-commit-config.yaml file (either the top level config or the hook specific config). This will prevent files matching the given pattern being passed to SQLFluff and so silence any warnings about the .sqlfluffignore being overridden.
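For example, to mirror the /temp/ entry from the .sqlfluffignore example earlier, you might add the following top level key to .pre-commit-config.yaml (the path pattern here is illustrative):

exclude: ^temp/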
There are two ways to utilize SQLFluff to annotate GitHub PRs.
WARNING:
There is an open feature request for GitHub Actions which you can track to follow this issue.
For more information and examples on using SQLFluff in GitHub Actions, see the sqlfluff-github-actions repository.
SQLFluff is designed to be flexible in supporting a variety of dialects. Not all potential dialects are supported so far, but several have been implemented by the community. Below is a list of the currently available dialects. Each inherits from another, up to the root ansi dialect.
For a canonical list of supported dialects, run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff.
NOTE:
Consider when adding new features to a dialect:
Label: ansi
Default Casing: UPPERCASE
Quotes: String Literals: '', Identifiers: ""
This is the base dialect which holds most of the definitions of common SQL commands and structures. If the dialect which you're actually using isn't specifically implemented by SQLFluff, using this dialect is a good place to start.
This dialect doesn't intend to be brutal in adhering to (and only to) the ANSI SQL spec (mostly because ANSI charges for access to that spec). It aims to be a representation of vanilla SQL before any other project adds their spin to it, and so may contain a slightly wider set of functions than actually available in true ANSI SQL.
Label: athena
Inherits from: ANSI
Default Casing: lowercase
Quotes: String Literals: '', "" or ``, Identifiers: "" or ``
The dialect for Athena on Amazon Web Services (AWS).
Label: bigquery
Inherits from: ANSI
Default Casing: BigQuery resolves unquoted column identifiers case insensitively, and table/dataset identifiers case sensitively (by default, unless is_case_insensitive is set for the latter). Unless specified, columns are returned in the case which they were defined in, which means columns can be re-cased in the result set without aliasing e.g. if a table is defined with CREATE TEMPORARY TABLE foo (col1 int, COL2 int) then SELECT * FROM foo returns col1 and COL2 in the result, but SELECT COL1, col2 FROM foo returns COL1 and col2 in the result.
Quotes: String Literals: '', "", @ or @@ (with the quoted options, also supporting variants prefixed with r/R (for raw/regex expressions) or b/B (for byte strings)), Identifiers: "" or ``.
The dialect for BigQuery on Google Cloud Platform (GCP).
Label: clickhouse
Inherits from: ANSI
Default Casing: Clickhouse is case sensitive throughout, regardless of quoting. An unquoted reference to an object using the wrong case will raise an UNKNOWN_IDENTIFIER error.
Quotes: String Literals: '', Identifiers: "" or ``. Note as above, that because identifiers are always resolved case sensitively, the only reason for quoting identifiers is when they contain invalid characters or reserved keywords.
The dialect for ClickHouse.
Label: databricks
Inherits from: Apache Spark SQL
The dialect for Databricks.
Label: db2
Inherits from: ANSI
The dialect for IBM Db2.
Label: duckdb
Inherits from: PostgreSQL
Default Casing: DuckDB stores all identifiers in the case they were defined, however all identifier resolution is case-insensitive (when unquoted, and more unusually, also when quoted). See the DuckDB Identifiers Documentation for more details.
Quotes: String Literals: '', Identifiers: "" or ''
The dialect for DuckDB.
Label: exasol
Inherits from: ANSI
The dialect for Exasol.
Label: greenplum
Inherits from: PostgreSQL
The dialect for Greenplum.
Label: hive
Inherits from: ANSI
The dialect for Apache Hive.
Label: impala
Inherits from: Apache Hive
The dialect for Apache Impala.
Label: mariadb
Inherits from: MySQL
Default Casing: lowercase
Quotes: String Literals: '', "" or @, Identifiers: ``.
The dialect for MariaDB.
Label: materialize
Inherits from: PostgreSQL
The dialect for Materialize.
Label: mysql
Inherits from: ANSI
Default Casing: lowercase
Quotes: String Literals: '', "" or @, Identifiers: ``.
The dialect for MySQL.
Label: oracle
Inherits from: ANSI
The dialect for Oracle SQL. Note: this does not include PL/SQL.
Label: postgres
Inherits from: ANSI
Default Casing: lowercase
Quotes: String Literals: '', Identifiers: "".
This is based around the PostgreSQL spec. Many other SQL dialects are often based on the PostgreSQL syntax. If you're running an unsupported dialect, then this is often the dialect to use (until someone makes a specific dialect).
Label: redshift
Inherits from: PostgreSQL
Default Casing: lowercase (unless configured to be case sensitive with all identifiers using the enable_case_sensitive_identifier configuration value, see the Redshift Names & Identifiers Docs).
Quotes: String Literals: '', Identifiers: "".
The dialect for Redshift on Amazon Web Services (AWS).
Label: snowflake
Inherits from: ANSI
Default Casing: UPPERCASE
Quotes: String Literals: '', Identifiers: ""
The dialect for Snowflake, which has much of its syntax inherited from PostgreSQL.
Label: soql
Inherits from: ANSI
The dialect for SOQL (Salesforce Object Query Language).
Label: sparksql
Inherits from: ANSI
Default Casing: SparkSQL is case insensitive with both quoted and unquoted identifiers (_"delimited"_ identifiers in Spark terminology). See the Spark Identifiers docs.
Quotes: String Literals: '' or "", Identifiers: ``.
The dialect for Apache Spark SQL. This includes relevant syntax from Apache Hive for commands that permit Hive Format. Spark SQL extensions provided by the Delta Lake project are also implemented in this dialect.
This implementation focuses on the Ansi Compliant Mode introduced in Spark3, instead of being Hive Compliant. The introduction of ANSI Compliance provides better data quality and easier migration from traditional DBMS.
Versions of Spark prior to 3.x will only support the Hive dialect.
Label: sqlite
Inherits from: ANSI
Default Casing: Not specified in the docs, but through testing it appears that SQLite stores column names in whatever case they were defined, but is always case-insensitive when resolving those names.
Quotes: String Literals: '' (or "" if not otherwise resolved to an identifier), Identifiers: "", [] or ``. See the SQLite Keywords Docs for more details.
The dialect for SQLite.
Label: starrocks
Inherits from: MySQL
Default Casing: lowercase
The dialect for StarRocks.
Label: teradata
Inherits from: ANSI
The dialect for Teradata.
Label: trino
Inherits from: ANSI
Default Casing: lowercase, although the case of a reference is used in the result set column label. If a column is defined using CREATE TEMPORARY TABLE foo (COL1 int), then SELECT * FROM foo returns a column labelled col1, however SELECT COL1 FROM foo returns a column labelled COL1.
Quotes: String Literals: '', Identifiers: ""
The dialect for Trino.
Label: tsql
Inherits from: ANSI
The dialect for T-SQL (aka Transact-SQL).
Label: vertica
Inherits from: ANSI
The dialect for Vertica.
This page is an index of available rules which are bundled with SQLFluff.
Certain rules belong to the core rule group. In order for a rule to be designated as core, it must meet the following criteria:
Core rules can also make it easier to roll out SQLFluff to a team by only needing to follow a 'common sense' subset of rules initially, rather than spending time understanding and configuring all the rules, some of which your team may not necessarily agree with.
We believe teams will eventually want to enforce more than just the core rules, and we encourage everyone to explore all the rules and customize a rule set that best suits their organization.
See the Configuration section for more information on how to enable only core rules by default.
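For example, a minimal .sqlfluff configuration that enables only the core rules might look like the following sketch:
[sqlfluff]
# Only apply rules in the 'core' group by default.
rules = core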
| Bundle | Rule Name | Code | Aliases |
| Aliasing bundle | aliasing.table | AL01 | L011 |
| | aliasing.column | AL02 | L012 |
| | aliasing.expression | AL03 | L013 |
| | aliasing.unique.table | AL04 | L020 |
| | aliasing.unused | AL05 | L025 |
| | aliasing.length | AL06 | L066 |
| | aliasing.forbid | AL07 | L031 |
| | aliasing.unique.column | AL08 | |
| | aliasing.self_alias.column | AL09 | |
| Ambiguous bundle | ambiguous.distinct | AM01 | L021 |
| | ambiguous.union | AM02 | L033 |
| | ambiguous.order_by | AM03 | L037 |
| | ambiguous.column_count | AM04 | L044 |
| | ambiguous.join | AM05 | L051 |
| | ambiguous.column_references | AM06 | L054 |
| | ambiguous.set_columns | AM07 | L068 |
| | ambiguous.join_condition | AM08 | |
| Capitalisation bundle | capitalisation.keywords | CP01 | L010 |
| | capitalisation.identifiers | CP02 | L014 |
| | capitalisation.functions | CP03 | L030 |
| | capitalisation.literals | CP04 | L040 |
| | capitalisation.types | CP05 | L063 |
| Convention bundle | convention.not_equal | CV01 | L061 |
| | convention.coalesce | CV02 | L060 |
| | convention.select_trailing_comma | CV03 | L038 |
| | convention.count_rows | CV04 | L047 |
| | convention.is_null | CV05 | L049 |
| | convention.terminator | CV06 | L052 |
| | convention.statement_brackets | CV07 | L053 |
| | convention.left_join | CV08 | L055 |
| | convention.blocked_words | CV09 | L062 |
| | convention.quoted_literals | CV10 | L064 |
| | convention.casting_style | CV11 | L067 |
| | convention.join_condition | CV12 | |
| Jinja bundle | jinja.padding | JJ01 | L046 |
| Layout bundle | layout.spacing | LT01 | L001, L005, L006, L008, L023, L024, L039, L048, L071 |
| | layout.indent | LT02 | L002, L003, L004 |
| | layout.operators | LT03 | L007 |
| | layout.commas | LT04 | L019 |
| | layout.long_lines | LT05 | L016 |
| | layout.functions | LT06 | L017 |
| | layout.cte_bracket | LT07 | L018 |
| | layout.cte_newline | LT08 | L022 |
| | layout.select_targets | LT09 | L036 |
| | layout.select_modifiers | LT10 | L041 |
| | layout.set_operators | LT11 | L065 |
| | layout.end_of_file | LT12 | L009, layout.end-of-file |
| | layout.start_of_file | LT13 | L050 |
| | layout.keyword_newline | LT14 | |
| References bundle | references.from | RF01 | L026 |
| | references.qualification | RF02 | L027 |
| | references.consistent | RF03 | L028 |
| | references.keywords | RF04 | L029 |
| | references.special_chars | RF05 | L057 |
| | references.quoting | RF06 | L059 |
| Structure bundle | structure.else_null | ST01 | L035 |
| | structure.simple_case | ST02 | L043 |
| | structure.unused_cte | ST03 | L045 |
| | structure.nested_case | ST04 | L058 |
| | structure.subquery | ST05 | L042 |
| | structure.column_order | ST06 | L034 |
| | structure.using | ST07 | L032 |
| | structure.distinct | ST08 | L015 |
| | structure.join_condition_order | ST09 | |
| | structure.constant_expression | ST10 | |
| | structure.unused_join | ST11 | |
| TSQL bundle | tsql.sp_prefix | TQ01 | L056 |
Aliasing of table to follow preference (requiring an explicit AS is the default).
Anti-pattern
In this example, the alias voo is implicit.
SELECT
voo.a FROM foo voo
Best practice
Add AS to make it explicit.
SELECT
voo.a FROM foo AS voo This rule is ``sqlfluff fix`` compatible. **Name**: ``aliasing.table`` **Aliases**: ``L011`` **Groups**: ``all``, ``aliasing`` **Configuration** * ``aliasing``: Should alias have an explicit AS or is implicit aliasing required? Must be one of ``['implicit', 'explicit']``.
Aliasing of columns to follow preference (explicit using an AS clause is default).
Anti-pattern
In this example, the alias for column a is implicit.
SELECT
a alias_col FROM foo
Best practice
Add AS to make it explicit.
SELECT
a AS alias_col FROM foo **Name**: ``aliasing.column`` **Aliases**: ``L012`` **Groups**: ``all``, ``core``, ``aliasing`` **Configuration** * ``aliasing``: Should alias have an explicit AS or is implicit aliasing required? Must be one of ``['implicit', 'explicit']``.
Anti-pattern
In this example, neither sum expression has an alias.
SELECT
sum(a),
sum(b) FROM foo
Best practice
Add aliases.
SELECT
sum(a) AS a_sum,
sum(b) AS b_sum FROM foo **Name**: ``aliasing.expression`` **Aliases**: ``L013`` **Groups**: ``all``, ``core``, ``aliasing`` **Configuration** * ``allow_scalar``: Whether or not to allow a single element in the select clause to be without an alias. Must be one of ``[True, False]``.
Reusing table aliases is very likely a coding error.
Anti-pattern
In this example, the alias t is reused for two different tables:
SELECT
t.a,
t.b FROM foo AS t, bar AS t -- This can also happen when using schemas where the -- implicit alias is the table name: SELECT
a,
b FROM
2020.foo,
2021.foo
Best practice
Make all tables have a unique alias.
SELECT
f.a,
b.b FROM foo AS f, bar AS b -- Also use explicit aliases when referencing two tables -- with the same name from two different schemas. SELECT
f1.a,
f2.b FROM
2020.foo AS f1,
2021.foo AS f2 **Name**: ``aliasing.unique.table`` **Aliases**: ``L020`` **Groups**: ``all``, ``core``, ``aliasing``, ``aliasing.unique``
Anti-pattern
SELECT
a FROM foo AS zoo
Best practice
Use the alias or remove it. An unused alias makes code harder to read without changing any functionality.
SELECT
zoo.a FROM foo AS zoo -- Alternatively... SELECT
a FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``aliasing.unused`` **Aliases**: ``L025`` **Groups**: ``all``, ``core``, ``aliasing`` **Configuration** * ``alias_case_check``: How to handle comparison casefolding in an alias. Must be one of ``['dialect', 'case_insensitive', 'quoted_cs_naked_upper', 'quoted_cs_naked_lower', 'case_sensitive']``.
Anti-pattern
In this example, alias o is used for the orders table.
SELECT
SUM(o.amount) as order_amount, FROM orders as o
Best practice
Avoid aliases. Avoid short aliases when aliases are necessary.
See also: AL07.
SELECT
SUM(orders.amount) as order_amount, FROM orders SELECT
replacement_orders.amount,
previous_orders.amount FROM
orders AS replacement_orders JOIN
orders AS previous_orders
ON replacement_orders.id = previous_orders.replacement_id **Name**: ``aliasing.length`` **Aliases**: ``L066`` **Groups**: ``all``, ``core``, ``aliasing`` **Configuration** * ``max_alias_length``: The maximum length of an alias to allow without raising a violation. Must be one of ``range(0, 1000)``. * ``min_alias_length``: The minimum length of an alias to allow without raising a violation. Must be one of ``range(0, 1000)``.
NOTE:
This rule is controversial and, for many larger databases, avoiding aliases is neither realistic nor desirable. This is particularly true for BigQuery, where the complexity of backtick requirements and of determining whether a name refers to a project or a dataset means that automated fixes can potentially break working SQL code. For most users, AL06 is likely a more appropriate linting rule to drive sensible behaviour around aliasing.
The stricter treatment of aliases in this rule may be useful for more focused projects, or temporarily as a refactoring tool because the fix routine of the rule can remove aliases.
This rule is disabled by default for all dialects. It can be enabled with the force_enable = True flag.
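As a sketch, assuming the [sqlfluff:rules:<rule name>] section naming used by recent SQLFluff releases, a project opting in to this rule might add:
[sqlfluff:rules:aliasing.forbid]
# AL07 is disabled by default; opt in explicitly.
force_enable = True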
Anti-pattern
In this example, alias o is used for the orders table, and c is used for customers table.
SELECT
COUNT(o.customer_id) as order_amount,
c.name FROM orders as o JOIN customers as c on o.id = c.user_id
Best practice
Avoid aliases.
SELECT
COUNT(orders.customer_id) as order_amount,
customers.name FROM orders JOIN customers on orders.id = customers.user_id -- Self-join will not raise issue SELECT
table1.a,
table_alias.b, FROM
table1
LEFT JOIN table1 AS table_alias ON
table1.foreign_key = table_alias.foreign_key This rule is ``sqlfluff fix`` compatible. **Name**: ``aliasing.forbid`` **Aliases**: ``L031`` **Groups**: ``all``, ``aliasing`` **Configuration** * ``force_enable``: Run this rule even for dialects where this rule is disabled by default. Must be one of ``[True, False]``.
Reusing column aliases is very likely a coding error. Note that while in many dialects, quoting an identifier makes it case-sensitive this rule always compares in a case-insensitive way. This is because columns with the same name, but different case, are still confusing and potentially ambiguous to other readers.
In situations where it is necessary to have columns with the same name (whether they differ in case or not) we recommend disabling this rule for either just the line, or the whole file.
Anti-pattern
In this example, the alias foo is reused for two different columns:
SELECT
a as foo,
b as foo FROM tbl; -- This can also happen when referencing the same column -- twice, or aliasing an expression to the same -- name as a column: SELECT
foo,
foo,
a as foo FROM tbl;
Best practice
Make all columns have a unique alias.
SELECT
a as foo,
b as bar FROM tbl; -- Avoid also using the same column twice unless aliased: SELECT
foo as foo1,
foo as foo2,
a as foo3 FROM tbl; **Name**: ``aliasing.unique.column`` **Groups**: ``all``, ``core``, ``aliasing``, ``aliasing.unique``
Renaming the column to itself is a redundant piece of SQL, which doesn't affect its functionality. This rule only applies when aliasing to an exact copy of the column reference (e.g. foo as foo or "BAR" as "BAR", see note below on more complex examples). Aliases which effectively change the casing of an identifier are still allowed.
NOTE:
If the references.quoting (RF06) and capitalisation.identifiers (CP02) rules are enabled, the fixes they apply may result in a situation where this rule can kick in as a secondary effect. For example this Snowflake query:
-- Original Query. AL09 will not trigger because casing and -- quoting are different. RF06 will however fix the unnecessary -- quoting of "COL". SELECT "COL" AS col FROM table; -- After RF06, the query will look like this, at which point -- CP02 will see the inconsistent capitalisation. Depending -- on the configuration it will change one of the identifiers. -- Let's assume the default configuration of "consistent". SELECT COL AS col FROM table; -- After CP02, the alias and the reference will be the same -- and at this point AL09 can take over and remove the alias. SELECT COL AS COL FROM table; -- ..resulting in: SELECT COL FROM table;
This interdependence between the rules, and the configuration options offered by each one means a variety of outcomes can be achieved by enabling and disabling each one. See Enabling and Disabling Rules and Rule Configuration for more details.
Anti-pattern
Aliasing the column to itself, where not necessary for changing the case of an identifier.
SELECT
col AS col,
"Col" AS "Col",
COL AS col FROM table;
Best practice
Do not use an alias to rename the column to its original name. Self-aliasing leads to redundant code without changing any functionality, unless used to effectively change the case of the identifier.
SELECT
col,
"Col"
COL, FROM table; -- Re-casing aliasing is still allowed where necessary, i.e. SELECT
col as "Col",
"col" as "COL" FROM table; This rule is ``sqlfluff fix`` compatible. **Name**: ``aliasing.self_alias.column`` **Groups**: ``all``, ``core``, ``aliasing``
When using GROUP BY, a DISTINCT clause should not be necessary as every non-distinct SELECT clause must be included in the GROUP BY clause.
Anti-pattern
DISTINCT and GROUP BY are conflicting.
SELECT DISTINCT
a FROM foo GROUP BY a
Best practice
Remove DISTINCT or GROUP BY. In our case, removing GROUP BY is better.
SELECT DISTINCT
a FROM foo **Name**: ``ambiguous.distinct`` **Aliases**: ``L021`` **Groups**: ``all``, ``core``, ``ambiguous``
Anti-pattern
In this example, UNION DISTINCT should be preferred over UNION, because explicit is better than implicit.
SELECT a, b FROM table_1 UNION SELECT a, b FROM table_2
Best practice
Specify DISTINCT or ALL after UNION (note that DISTINCT is the default behavior).
SELECT a, b FROM table_1 UNION DISTINCT SELECT a, b FROM table_2 This rule is ``sqlfluff fix`` compatible. **Name**: ``ambiguous.union`` **Aliases**: ``L033`` **Groups**: ``all``, ``core``, ``ambiguous``
Anti-pattern
SELECT
a, b FROM foo ORDER BY a, b DESC
Best practice
If any columns in the ORDER BY clause specify ASC or DESC, they should all do so.
SELECT
a, b FROM foo ORDER BY a ASC, b DESC This rule is ``sqlfluff fix`` compatible. **Name**: ``ambiguous.order_by`` **Aliases**: ``L037`` **Groups**: ``all``, ``ambiguous``
Anti-pattern
Querying all columns using * produces a query result where the number or ordering of columns changes if the upstream table's schema changes. This should generally be avoided because it can cause slow performance, cause important schema changes to go undetected, or break production code. For example:
WITH cte AS (
SELECT * FROM foo ) SELECT * FROM cte UNION SELECT a, b FROM t
Best practice
Somewhere along the "path" to the source data, specify columns explicitly.
WITH cte AS (
SELECT * FROM foo ) SELECT a, b FROM cte UNION SELECT a, b FROM t **Name**: ``ambiguous.column_count`` **Aliases**: ``L044`` **Groups**: ``all``, ``ambiguous``
By default this rule is configured to enforce fully qualified INNER JOIN clauses, but not [LEFT/RIGHT/FULL] OUTER JOIN. If you prefer a stricter lint then this is configurable.
Anti-pattern
A join is used without specifying the kind of join.
SELECT
foo
FROM bar
JOIN baz;
Best practice
Use INNER JOIN rather than JOIN.
SELECT
foo
FROM bar
INNER JOIN baz;
This rule is ``sqlfluff fix`` compatible.
**Name**: ``ambiguous.join``
**Aliases**: ``L051``
**Groups**: ``all``, ``ambiguous``
**Configuration**
* ``fully_qualify_join_types``: Which types of JOIN clauses should be fully qualified? Must be one of ``['inner', 'outer', 'both']``.
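For example, a stricter setup that also fully qualifies OUTER JOIN clauses might be configured along these lines (again assuming the [sqlfluff:rules:...] section naming):
[sqlfluff:rules:ambiguous.join]
# Require both INNER and OUTER joins to be fully qualified.
fully_qualify_join_types = both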
Anti-pattern
A mix of implicit and explicit column references are used in a GROUP BY clause.
SELECT
foo,
bar,
sum(baz) AS sum_value
FROM fake_table
GROUP BY
foo, 2;
-- The same also applies to column
-- references in ORDER BY clauses.
SELECT
foo,
bar
FROM fake_table
ORDER BY
1, bar;
Best practice
Reference all GROUP BY/ORDER BY columns either by name or by position.
-- GROUP BY: Explicit
SELECT
foo,
bar,
sum(baz) AS sum_value
FROM fake_table
GROUP BY
foo, bar;
-- ORDER BY: Explicit
SELECT
foo,
bar
FROM fake_table
ORDER BY
foo, bar;
-- GROUP BY: Implicit
SELECT
foo,
bar,
sum(baz) AS sum_value
FROM fake_table
GROUP BY
1, 2;
-- ORDER BY: Implicit
SELECT
foo,
bar
FROM fake_table
ORDER BY
1, 2;
**Name**: ``ambiguous.column_references``
**Aliases**: ``L054``
**Groups**: ``all``, ``core``, ``ambiguous``
**Configuration**
* ``group_by_and_order_by_style``: The expectation for using explicit column name references or implicit positional references. Must be one of ``['consistent', 'implicit', 'explicit']``.
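As an illustrative sketch (using the same assumed [sqlfluff:rules:...] section naming), always requiring explicit column names in GROUP BY and ORDER BY might look like:
[sqlfluff:rules:ambiguous.column_references]
group_by_and_order_by_style = explicit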
Anti-pattern
When writing set expressions, all queries must return the same number of columns.
WITH cte AS (
SELECT
a,
b
FROM foo ) SELECT * FROM cte UNION SELECT
c,
d,
e
FROM t
Best practice
Always specify columns when writing set queries and ensure that they all select the same number of columns.
WITH cte AS (
SELECT a, b FROM foo ) SELECT
a,
b FROM cte UNION SELECT
c,
d FROM t **Name**: ``ambiguous.set_columns`` **Aliases**: ``L068`` **Groups**: ``all``, ``ambiguous``
Anti-pattern
Cross joins are valid, but rare in the wild - and more often created by mistake than on purpose. This rule catches situations where a cross join has been specified, but not explicitly, and so a mistaken cross join is highly likely.
SELECT
foo
FROM bar
JOIN baz;
Best practice
Use CROSS JOIN.
SELECT
foo
FROM bar
CROSS JOIN baz;
This rule is ``sqlfluff fix`` compatible.
**Name**: ``ambiguous.join_condition``
**Groups**: ``all``, ``ambiguous``
Anti-pattern
In this example, select is in lower-case whereas FROM is in upper-case.
select
a FROM foo
Best practice
Make all keywords either in upper-case or in lower-case.
SELECT
a FROM foo -- Also good select
a from foo This rule is ``sqlfluff fix`` compatible. **Name**: ``capitalisation.keywords`` **Aliases**: ``L010`` **Groups**: ``all``, ``core``, ``capitalisation`` **Configuration** * ``capitalisation_policy``: The capitalisation policy to enforce. Must be one of ``['consistent', 'upper', 'lower', 'capitalise']``. * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``.
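For example, enforcing upper-case keywords rather than mere consistency might be sketched as follows (assuming the [sqlfluff:rules:...] section naming):
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper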
This rule applies to all unquoted identifiers, whether references or aliases, and whether they refer to columns or other objects (such as tables or schemas).
NOTE:
As this feature is only present in a few dialects, and not widely understood by users, we regard it as an antipattern. It is more widely understood that if the case of an identifier matters, then it should be quoted. If you, or your organisation, do wish to rely on this feature, we recommend that you disable this rule (see Enabling and Disabling Rules).
Anti-pattern
In this example, unquoted identifier a is in lower-case but B is in upper-case.
select
a,
B from foo
In this more complicated example, there are a mix of capitalisations in both reference and aliases of columns and tables. That inconsistency is acceptable when those identifiers are quoted, but not when unquoted.
select
col_1 + Col_2 as COL_3,
"COL_4" as Col_5 from Foo as BAR
Best practice
Ensure all unquoted identifiers are either in upper-case or in lower-case.
select
a,
b from foo; -- ...also good... select
A,
B from foo; --- ...or for comparison with our more complex example, this too: select
col_1 + col_2 as col_3,
"COL_4" as col_5 from foo as bar This rule is ``sqlfluff fix`` compatible. **Name**: ``capitalisation.identifiers`` **Aliases**: ``L014`` **Groups**: ``all``, ``core``, ``capitalisation`` **Configuration** * ``extended_capitalisation_policy``: The capitalisation policy to enforce, extended with PascalCase, snake_case, and camelCase. This is separate from ``capitalisation_policy`` as it should not be applied to keywords.Camel, Pascal, and Snake will never be inferred when the policy is set to consistent. This is because snake can cause destructive changes to the identifier, and unlinted code is too easily mistaken for camel and pascal. If, when set to consistent, no consistent case is found, it will default to upper. Must be one of ``['consistent', 'upper', 'lower', 'pascal', 'capitalise', 'snake', 'camel']``. * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``. * ``unquoted_identifiers_policy``: Types of unquoted identifiers to flag violations for. Must be one of ``['all', 'aliases', 'column_aliases', 'table_aliases']``.
Anti-pattern
In this example, the two SUM functions don't have the same capitalisation.
SELECT
sum(a) AS aa,
SUM(b) AS bb FROM foo
Best practice
Make the case consistent.
SELECT
sum(a) AS aa,
sum(b) AS bb FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``capitalisation.functions`` **Aliases**: ``L030`` **Groups**: ``all``, ``core``, ``capitalisation`` **Configuration** * ``extended_capitalisation_policy``: The capitalisation policy to enforce, extended with PascalCase, snake_case, and camelCase. This is separate from ``capitalisation_policy`` as it should not be applied to keywords.Camel, Pascal, and Snake will never be inferred when the policy is set to consistent. This is because snake can cause destructive changes to the identifier, and unlinted code is too easily mistaken for camel and pascal. If, when set to consistent, no consistent case is found, it will default to upper. Must be one of ``['consistent', 'upper', 'lower', 'pascal', 'capitalise', 'snake', 'camel']``. * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``.
Anti-pattern
In this example, null and false are in lower-case whereas TRUE is in upper-case.
select
a,
null,
TRUE,
false from foo
Best practice
Ensure all null/true/false literals are consistently upper or lower case.
select
a,
NULL,
TRUE,
FALSE from foo -- Also good select
a,
null,
true,
false from foo This rule is ``sqlfluff fix`` compatible. **Name**: ``capitalisation.literals`` **Aliases**: ``L040`` **Groups**: ``all``, ``core``, ``capitalisation`` **Configuration** * ``capitalisation_policy``: The capitalisation policy to enforce. Must be one of ``['consistent', 'upper', 'lower', 'capitalise']``. * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``.
Anti-pattern
In this example, int and unsigned are in lower-case whereas VARCHAR is in upper-case.
CREATE TABLE t (
a int unsigned,
b VARCHAR(15) );
Best practice
Ensure all datatypes are consistently upper or lower case
CREATE TABLE t (
a INT UNSIGNED,
b VARCHAR(15) ); This rule is ``sqlfluff fix`` compatible. **Name**: ``capitalisation.types`` **Aliases**: ``L063`` **Groups**: ``all``, ``core``, ``capitalisation`` **Configuration** * ``extended_capitalisation_policy``: The capitalisation policy to enforce, extended with PascalCase, snake_case, and camelCase. This is separate from ``capitalisation_policy`` as it should not be applied to keywords.Camel, Pascal, and Snake will never be inferred when the policy is set to consistent. This is because snake can cause destructive changes to the identifier, and unlinted code is too easily mistaken for camel and pascal. If, when set to consistent, no consistent case is found, it will default to upper. Must be one of ``['consistent', 'upper', 'lower', 'pascal', 'capitalise', 'snake', 'camel']``. * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``.
Anti-pattern
SELECT * FROM X WHERE 1 <> 2 AND 3 != 4;
Best practice
Ensure all "not equal to" comparisons are consistent, not mixing != and <>.
SELECT * FROM X WHERE 1 != 2 AND 3 != 4; This rule is ``sqlfluff fix`` compatible. **Name**: ``convention.not_equal`` **Aliases**: ``L061`` **Groups**: ``all``, ``convention`` **Configuration** * ``preferred_not_equal_style``: The style for using not equal to operator. Defaults to ``consistent``. Must be one of ``['consistent', 'c_style', 'ansi']``.
Anti-pattern
IFNULL or NVL are used to fill NULL values.
SELECT ifnull(foo, 0) AS bar, FROM baz; SELECT nvl(foo, 0) AS bar, FROM baz;
Best practice
Use COALESCE instead. COALESCE is universally supported, whereas Redshift doesn't support IFNULL and BigQuery doesn't support NVL. Additionally, COALESCE is more flexible and accepts an arbitrary number of arguments.
SELECT coalesce(foo, 0) AS bar, FROM baz; This rule is ``sqlfluff fix`` compatible. **Name**: ``convention.coalesce`` **Aliases**: ``L060`` **Groups**: ``all``, ``convention``
Anti-pattern
SELECT
a,
b, FROM foo
Best practice
SELECT
a,
b FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``convention.select_trailing_comma`` **Aliases**: ``L038`` **Groups**: ``all``, ``core``, ``convention`` **Configuration** * ``select_clause_trailing_comma``: Should trailing commas within select clauses be required or forbidden? Must be one of ``['forbid', 'require']``.
COUNT(*), COUNT(1), and even COUNT(0) are equivalent syntaxes in many SQL engines due to optimizers interpreting these instructions as "count number of rows in result".
The ANSI-92 spec mentions the COUNT(*) syntax specifically as having a special meaning:
So by default, SQLFluff enforces the consistent use of COUNT(*).
If the SQL engine you work with, or your team, prefers COUNT(1) or COUNT(0) over COUNT(*), you can configure this rule to consistently enforce your preference.
Anti-pattern
select
count(1) from table_a
Best practice
Use count(*), unless the prefer_count_1 or prefer_count_0 configuration specifies otherwise.
select
count(*) from table_a This rule is ``sqlfluff fix`` compatible. **Name**: ``convention.count_rows`` **Aliases**: ``L047`` **Groups**: ``all``, ``core``, ``convention`` **Configuration** * ``prefer_count_0``: Should count(0) be preferred over count(*) and count(1)? Must be one of ``[True, False]``. * ``prefer_count_1``: Should count(1) be preferred over count(*) and count(0)? Must be one of ``[True, False]``.
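As a sketch (assuming the [sqlfluff:rules:...] section naming), a team preferring count(1) might configure the rule like so:
[sqlfluff:rules:convention.count_rows]
prefer_count_1 = True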
Anti-pattern
In this example, the = operator is used to check for NULL values.
SELECT
a FROM foo WHERE a = NULL
Best practice
Use IS or IS NOT to check for NULL values.
SELECT
a FROM foo WHERE a IS NULL This rule is ``sqlfluff fix`` compatible. **Name**: ``convention.is_null`` **Aliases**: ``L049`` **Groups**: ``all``, ``core``, ``convention``
Anti-pattern
A statement is not immediately terminated with a semi-colon. The • represents space.
SELECT
a
FROM foo
;
SELECT
b
FROM bar••;
Best practice
Immediately terminate the statement with a semi-colon.
SELECT
a
FROM foo;
This rule is ``sqlfluff fix`` compatible.
**Name**: ``convention.terminator``
**Aliases**: ``L052``
**Groups**: ``all``, ``convention``
**Configuration**
* ``multiline_newline``: Should semi-colons be placed on a new line after multi-line statements? Must be one of ``[True, False]``.
* ``require_final_semicolon``: Should final semi-colons be required? (N.B. forcing trailing semi-colons is not recommended for dbt users as it can cause issues when wrapping the query within other SQL queries). Must be one of ``[True, False]``.
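For example, requiring a final semi-colon and placing it on a new line after multi-line statements might look something like this sketch (assuming the [sqlfluff:rules:...] section naming):
[sqlfluff:rules:convention.terminator]
multiline_newline = True
require_final_semicolon = True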
Anti-pattern
A top-level statement is wrapped in brackets.
(SELECT
foo
FROM bar)
-- This also applies to statements containing a sub-query.
(SELECT
foo
FROM (SELECT * FROM bar))
Best practice
Don't wrap top-level statements in brackets.
SELECT
foo
FROM bar
-- Likewise for statements containing a sub-query.
SELECT
foo
FROM (SELECT * FROM bar)
This rule is ``sqlfluff fix`` compatible.
**Name**: ``convention.statement_brackets``
**Aliases**: ``L053``
**Groups**: ``all``, ``convention``
Anti-pattern
RIGHT JOIN is used.
SELECT
foo.col1,
bar.col2
FROM foo
RIGHT JOIN bar
ON foo.bar_id = bar.id;
Best practice
Refactor and use LEFT JOIN instead.
SELECT
foo.col1,
bar.col2
FROM bar
LEFT JOIN foo
ON foo.bar_id = bar.id;
**Name**: ``convention.left_join``
**Aliases**: ``L055``
**Groups**: ``all``, ``convention``
This generic rule can be useful to prevent certain keywords, functions, or objects from being used. Only whole words can be blocked, not phrases, nor parts of words.
This block list is case insensitive.
Anti-pattern
If the blocked_words config is set to deprecated_table,bool then the following will flag:
SELECT * FROM deprecated_table WHERE 1 = 1; CREATE TABLE myschema.t1 (a BOOL);
Best practice
Do not use any blocked words:
SELECT * FROM another_table WHERE 1 = 1; CREATE TABLE myschema.t1 (a BOOLEAN); **Name**: ``convention.blocked_words`` **Aliases**: ``L062`` **Groups**: ``all``, ``convention`` **Configuration** * ``blocked_regex``: Optional, regex of blocked pattern which should not be used in statements. * ``blocked_words``: Optional, comma-separated list of blocked words which should not be used in statements. * ``match_source``: Optional, also match regex of blocked pattern before applying templating.
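As a sketch (assuming the [sqlfluff:rules:...] section naming), the blocked words used in the example above might be configured roughly as follows:
[sqlfluff:rules:convention.blocked_words]
blocked_words = deprecated_table,bool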
Some databases allow quoted literals to use either single or double quotes. Prefer one type of quotes as specified in rule setting, falling back to alternate quotes to reduce the need for escapes.
Dollar-quoted raw strings are excluded from this rule, as they are mostly used for literal UDF Body definitions.
NOTE:
This rule is only enabled for dialects that allow single and double quotes for quoted literals (currently bigquery, databricks, hive, mysql, sparksql). It can be enabled for other dialects with the force_enable = True flag.
Anti-pattern
select
"abc",
'abc',
"\"",
"abc" = 'abc'
from foo
Best practice
Ensure all quoted literals use preferred quotes, unless escaping can be reduced by using alternate quotes.
select
"abc",
"abc",
'"',
"abc" = "abc"
from foo
This rule is ``sqlfluff fix`` compatible.
**Name**: ``convention.quoted_literals``
**Aliases**: ``L064``
**Groups**: ``all``, ``convention``
**Configuration**
* ``force_enable``: Run this rule even for dialects where this rule is disabled by default. Must be one of ``[True, False]``.
* ``preferred_quoted_literal_style``: Preferred quoting style to use for the quoted literals. If set to ``consistent`` quoting style is derived from the first quoted literal in the file. Must be one of ``['consistent', 'single_quotes', 'double_quotes']``.
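For example, enabling this rule for an additional dialect and preferring single quotes might be sketched as (assuming the [sqlfluff:rules:...] section naming):
[sqlfluff:rules:convention.quoted_literals]
force_enable = True
preferred_quoted_literal_style = single_quotes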
Anti-pattern
Using a mixture of CONVERT, :: and CAST when the preferred_type_casting_style config is set to consistent (default).
SELECT
CONVERT(int, 1) AS bar,
100::int::text,
CAST(10 AS text) AS coo FROM foo;
Best practice
Use consistent type casting style.
SELECT
CAST(1 AS int) AS bar,
CAST(CAST(100 AS int) AS text),
CAST(10 AS text) AS coo FROM foo; This rule is ``sqlfluff fix`` compatible. **Name**: ``convention.casting_style`` **Aliases**: ``L067`` **Groups**: ``all``, ``convention`` **Configuration** * ``preferred_type_casting_style``: The expectation for using sql type casting. Must be one of ``['consistent', 'shorthand', 'convert', 'cast']``.
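As an illustrative sketch (assuming the [sqlfluff:rules:...] section naming), standardising on CAST expressions rather than mere consistency might look like:
[sqlfluff:rules:convention.casting_style]
preferred_type_casting_style = cast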
Anti-pattern
Using WHERE clause for join conditions.
SELECT
foo.a
, bar.b FROM foo JOIN bar WHERE foo.x = bar.y;
Best practice
Use JOIN ON clause for join condition.
SELECT
foo.a
, bar.b FROM foo JOIN bar ON foo.x = bar.y; This rule is ``sqlfluff fix`` compatible. **Name**: ``convention.join_condition`` **Groups**: ``all``, ``convention``
This rule is only active if the jinja templater (or one of its subclasses, like the dbt templater) is used for the current file.
Anti-pattern
Jinja tags with either no whitespace or very long whitespace are hard to read.
SELECT {{ a }} from {{ref('foo')}}
Best practice
A single space surrounding Jinja tags is preferred; alternatively, longer gaps containing newlines are acceptable.
SELECT {{ a }} from {{ ref('foo') }};
SELECT {{ a }} from {{
ref('foo')
}};
This rule is ``sqlfluff fix`` compatible.
**Name**: ``jinja.padding``
**Aliases**: ``L046``
**Groups**: ``all``, ``core``, ``jinja``
This rule checks for and enforces the spacing as configured in Configuring Layout. This includes excessive whitespace, trailing whitespace at the end of a line and also the wrong spacing between elements on the line. Because of this wide reach, you may find that you wish to add specific configuration in your project to tweak how specific elements are treated. Rather than configuring this specific rule, use the sqlfluff:layout section of your configuration file to customise how this rule operates.
The • character represents a space in the examples below.
Anti-pattern
SELECT
a, b(c) as d•• FROM foo•••• JOIN bar USING(a)
Best practice
SELECT
a, b(c) as d FROM foo JOIN bar USING (a) This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.spacing`` **Aliases**: ``L001``, ``L005``, ``L006``, ``L008``, ``L023``, ``L024``, ``L039``, ``L048``, ``L071`` **Groups**: ``all``, ``core``, ``layout``
Anti-pattern
The • character represents a space and the → character represents a tab. In this example, the third line contains five spaces instead of four and the second line contains two spaces and one tab.
SELECT
••→a,
•••••b
FROM foo
Best practice
Change the indentation to use a multiple of four spaces. This example also assumes that the indent_unit config value is set to space. If it had instead been set to tab, then the indents would be tabs instead.
SELECT
••••a,
••••b
FROM foo
This rule is ``sqlfluff fix`` compatible.
**Name**: ``layout.indent``
**Aliases**: ``L002``, ``L003``, ``L004``
**Groups**: ``all``, ``core``, ``layout``
The configuration for whether operators should be trailing or leading is part of Configuring Layout. The default configuration is:
[sqlfluff:layout:type:binary_operator] line_position = leading [sqlfluff:layout:type:comparison_operator] line_position = leading
Anti-pattern
In this example, if line_position = leading (or unspecified, as is the default), then the operator + should not be at the end of the second line.
SELECT
a +
b FROM foo
Best practice
If line_position = leading (or unspecified, as this is the default), place the operator after the newline.
SELECT
a
+ b FROM foo
If line_position = trailing, place the operator before the newline.
SELECT
a +
b FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.operators`` **Aliases**: ``L007`` **Groups**: ``all``, ``layout``
The configuration for whether operators should be trailing or leading is part of Configuring Layout. The default configuration is:
[sqlfluff:layout:type:comma] line_position = trailing
Anti-pattern
There is a mixture of leading and trailing commas.
SELECT
a
, b,
c FROM foo
Best practice
By default, SQLFluff prefers trailing commas. However it is configurable for leading commas. The chosen style must be used consistently throughout your SQL.
SELECT
a,
b,
c FROM foo -- Alternatively, set the configuration file to 'leading' -- and then the following would be acceptable: SELECT
a
, b
, c FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.commas`` **Aliases**: ``L019`` **Groups**: ``all``, ``layout``
This rule is ``sqlfluff fix`` compatible.
**Name**: ``layout.long_lines``
**Aliases**: ``L016``
**Groups**: ``all``, ``core``, ``layout``
**Configuration**
Anti-pattern
In this example, there is a space between the function and the parenthesis.
SELECT
sum (a) FROM foo
Best practice
Remove the space between the function and the parenthesis.
SELECT
sum(a) FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.functions`` **Aliases**: ``L017`` **Groups**: ``all``, ``core``, ``layout``
Anti-pattern
In this example, the closing bracket is on the same line as CTE.
WITH zoo AS (
SELECT a FROM foo)
SELECT * FROM zoo
Best practice
Move the closing bracket on a new line.
WITH zoo AS (
SELECT a FROM foo ) SELECT * FROM zoo This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.cte_bracket`` **Aliases**: ``L018`` **Groups**: ``all``, ``core``, ``layout``
Anti-pattern
There is no blank line after the CTE closing bracket. In queries with many CTEs, this hinders readability.
WITH plop AS (
SELECT * FROM foo ) SELECT a FROM plop
Best practice
Add a blank line.
WITH plop AS (
SELECT * FROM foo ) SELECT a FROM plop This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.cte_newline`` **Aliases**: ``L022`` **Groups**: ``all``, ``core``, ``layout``
Anti-pattern
Multiple select targets on the same line.
select a, b from foo; -- Single select target on its own line. SELECT
a FROM foo;
Best practice
Multiple select targets each on their own line.
select
a,
b from foo; -- Single select target on the same line as the ``SELECT`` -- keyword. SELECT a FROM foo; -- When select targets span multiple lines, however they -- can still be on a new line. SELECT
SUM(
1 + SUM(
2 + 3
)
) AS col FROM test_table; This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.select_targets`` **Aliases**: ``L036`` **Groups**: ``all``, ``layout`` **Configuration** * ``wildcard_policy``: Treatment of wildcards. Defaults to ``single``. Must be one of ``['single', 'multiple']``.
Anti-pattern
select
distinct a,
b from x
Best practice
select distinct
a,
b from x This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.select_modifiers`` **Aliases**: ``L041`` **Groups**: ``all``, ``core``, ``layout``
Anti-pattern
In this example, UNION ALL is not on a line by itself.
SELECT 'a' AS col UNION ALL SELECT 'b' AS col
Best practice
SELECT 'a' AS col
UNION ALL
SELECT 'b' AS col This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.set_operators`` **Aliases**: ``L065`` **Groups**: ``all``, ``core``, ``layout``
Anti-pattern
The content in the file does not end with a single trailing newline. The $ represents the end of the file.
SELECT
a
FROM foo$
-- Ending on an indented line means there is no newline
-- at the end of the file, the • represents space.
SELECT
••••a
FROM
••••foo
••••$
-- Ending on a semi-colon means the last line is not a
-- newline.
SELECT
a
FROM foo
;$
-- Ending with multiple newlines.
SELECT
a
FROM foo
$
Best practice
Add trailing newline to the end. The $ character represents end of file.
SELECT
a
FROM foo
$
-- Ensuring the last line is not indented so is just a
-- newline.
SELECT
••••a
FROM
••••foo
$
-- Even when ending on a semi-colon, ensure there is a
-- newline after.
SELECT
a
FROM foo
;
$
This rule is ``sqlfluff fix`` compatible.
**Name**: ``layout.end_of_file``
**Aliases**: ``L009``, ``layout.end-of-file``
**Groups**: ``all``, ``core``, ``layout``
Anti-pattern
The file begins with newlines or whitespace. The ^ represents the beginning of the file.
^
SELECT
a
FROM foo
-- Beginning on an indented line is also forbidden,
-- (the • represents space).
••••SELECT
••••a
FROM
••••foo
Best practice
Start file on either code or comment. (The ^ represents the beginning of the file.)
^SELECT
a
FROM foo
-- Including an initial block comment.
^/*
This is a description of my SQL code.
*/
SELECT
a
FROM
foo
-- Including an initial inline comment.
^--This is a description of my SQL code.
SELECT
a
FROM
foo
This rule is ``sqlfluff fix`` compatible.
**Name**: ``layout.start_of_file``
**Aliases**: ``L050``
**Groups**: ``all``, ``layout``
Anti-pattern
In this example, the keywords are not at the beginning of, or alone on, the line.
SELECT 'a' AS col FROM tab WHERE x = 4 ORDER BY y LIMIT 5
Best practice
SELECT 'a' AS col
FROM tab
WHERE x = 4
ORDER BY y
LIMIT 5
SELECT 'a' AS col FROM
tab WHERE
x = 4 ORDER BY
y LIMIT
5 This rule is ``sqlfluff fix`` compatible. **Name**: ``layout.keyword_newline`` **Groups**: ``all``, ``layout``
Anti-pattern
In this example, the reference vee has not been declared.
SELECT
vee.a FROM foo
Best practice
Remove the reference.
SELECT
a FROM foo **Name**: ``references.from`` **Aliases**: ``L026`` **Groups**: ``all``, ``core``, ``references`` **Configuration** * ``force_enable``: Run this rule even for dialects where this rule is disabled by default. Must be one of ``[True, False]``.
Anti-pattern
In this example, the reference vee has not been declared, and the variables a and b are potentially ambiguous.
SELECT a, b FROM foo LEFT JOIN vee ON vee.a = foo.a
Best practice
Add the references.
SELECT foo.a, vee.b FROM foo LEFT JOIN vee ON vee.a = foo.a **Name**: ``references.qualification`` **Aliases**: ``L027`` **Groups**: ``all``, ``references``
NOTE:
"consistent" will be fixed to "qualified" if inconsistency is found.
Anti-pattern
In this example, only the reference to b is qualified.
SELECT
a,
foo.b FROM foo
Best practice
Either all column references should be qualified, or all unqualified.
SELECT
a,
b FROM foo -- Also good SELECT
foo.a,
foo.b FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``references.consistent`` **Aliases**: ``L028`` **Groups**: ``all``, ``references`` **Configuration** * ``force_enable``: Run this rule even for dialects where this rule is disabled by default. Must be one of ``[True, False]``. * ``single_table_references``: The expectation for references in single-table select. Must be one of ``['consistent', 'qualified', 'unqualified']``.
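For example, requiring qualified references in single-table selects might be configured roughly as follows (assuming the [sqlfluff:rules:...] section naming):
[sqlfluff:rules:references.consistent]
single_table_references = qualified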
Although unreserved keywords can be used as identifiers, and reserved words can be used as quoted identifiers, best practice is to avoid them where possible, to prevent any misunderstanding as to what the alias represents.
Anti-pattern
In this example, SUM (built-in function) is used as an alias.
SELECT
sum.a FROM foo AS sum
Best practice
Avoid keywords as the name of an alias.
SELECT
vee.a FROM foo AS vee **Name**: ``references.keywords`` **Aliases**: ``L029`` **Groups**: ``all``, ``references`` **Configuration** * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``. * ``quoted_identifiers_policy``: Types of quoted identifiers to flag violations for. Must be one of ``['all', 'aliases', 'column_aliases', 'table_aliases', 'none']``. * ``unquoted_identifiers_policy``: Types of unquoted identifiers to flag violations for. Must be one of ``['all', 'aliases', 'column_aliases', 'table_aliases']``.
Anti-pattern
Using special characters within identifiers when creating or aliasing objects.
CREATE TABLE DBO.ColumnNames (
[Internal Space] INT,
[Greater>Than] INT,
[Less<Than] INT,
Number# INT )
Best practice
Identifiers should include only alphanumerics and underscores.
CREATE TABLE DBO.ColumnNames (
[Internal_Space] INT,
[GreaterThan] INT,
[LessThan] INT,
NumberVal INT ) **Name**: ``references.special_chars`` **Aliases**: ``L057`` **Groups**: ``all``, ``references`` **Configuration** * ``additional_allowed_characters``: Optional list of extra allowed characters, in addition to alphanumerics (A-Z, a-z, 0-9) and underscores. * ``allow_space_in_identifier``: Should spaces in identifiers be allowed? Must be one of ``[True, False]``. * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``. * ``quoted_identifiers_policy``: Types of quoted identifiers to flag violations for. Must be one of ``['all', 'aliases', 'column_aliases', 'table_aliases', 'none']``. * ``unquoted_identifiers_policy``: Types of unquoted identifiers to flag violations for. Must be one of ``['all', 'aliases', 'column_aliases', 'table_aliases']``.
This rule will fail if the quotes used to quote an identifier are (un)necessary depending on the force_quote_identifier configuration. This rule applies to both column references and their aliases. The default (safe) behaviour is designed not to unexpectedly corrupt SQL. That means the circumstances in which quotes can be safely removed depend on how the current dialect would resolve the unquoted variant of the identifier (see below for examples).
Additionally this rule may be configured to a more aggressive setting by setting case_sensitive to False, in which case quotes will be removed regardless of the casing of the contained identifier. Any identifiers which contain special characters, spaces or keywords will still be left quoted. This setting is more appropriate for projects or teams where there is more control over the inputs and outputs of queries, and where it's more viable to institute rules such as enforcing that all identifiers are the default casing (and therefore meaning that using quotes to change the case of identifiers is unnecessary).
| Dialect group | ✅ Example where quotes are safe to remove. | ⚠️ Examples where quotes are not safe to remove. |
| Natively UPPERCASE dialects e.g. Snowflake, BigQuery, TSQL & Oracle. | Identifiers which, without quotes, would resolve to the default casing of FOO i.e. "FOO". | Identifiers where the quotes are necessary to preserve case (e.g. "Foo" or "foo"), or where the identifier contains something invalid without the quotes such as keywords or special characters e.g. "SELECT", "With Space" or "Special&Characters". |
| Natively lowercase dialects e.g. Athena, Hive & Postgres | Identifiers which, without quotes, would resolve to the default casing of foo i.e. "foo". | Identifiers where the quotes are necessary to preserve case (e.g. "Foo" or "foo"), or where the identifier contains something invalid without the quotes such as keywords or special characters e.g. "SELECT", "With Space" or "Special&Characters". |
| Case insensitive dialects e.g. DuckDB or Apache Spark SQL | Any identifiers which are valid without quotes: e.g. "FOO", "foo", "Foo", "fOo", FOO and foo would all resolve to the same object. | Identifiers which contain something invalid without the quotes such as keywords or special characters e.g. "SELECT", "With Space" or "Special&Characters". |
This rule is closely associated with (and constrained by the same above factors) as aliasing.self_alias.column (AL09).
When prefer_quoted_identifiers = False (default behaviour), the quotes are unnecessary, except for reserved keywords and special characters in identifiers.
Anti-pattern
In this example, valid unquoted identifiers, that are not also reserved keywords, are needlessly quoted.
SELECT "foo" as "bar"; -- For lowercase dialects like Postgres SELECT "FOO" as "BAR"; -- For uppercase dialects like Snowflake
Best practice
Use unquoted identifiers where possible.
SELECT foo as bar; -- For lowercase dialects like Postgres SELECT FOO as BAR; -- For uppercase dialects like Snowflake -- Note that where the case of the quoted identifier requires -- the quotes to remain, or where the identifier cannot be -- unquoted because it would be invalid to do so, the quotes -- may remain. For example: SELECT
"Case_Sensitive_Identifier" as is_allowed,
"Identifier with spaces or speci@l characters" as this_too,
"SELECT" as also_reserved_words FROM "My Table With Spaces"
When prefer_quoted_identifiers = True, the quotes are always necessary, no matter if the identifier is valid, a reserved keyword, or contains special characters.
Anti-pattern
In this example, a valid unquoted identifier, that is also not a reserved keyword, is required to be quoted.
SELECT 123 as foo
Best practice
Use quoted identifiers.
SELECT 123 as "foo" -- For ANSI, ... -- or SELECT 123 as `foo` -- For BigQuery, MySql, ... This rule is ``sqlfluff fix`` compatible. **Name**: ``references.quoting`` **Aliases**: ``L059`` **Groups**: ``all``, ``references`` **Configuration** * ``case_sensitive``: If ``False``, comparison is done case in-sensitively. Defaults to ``True``. Must be one of ``[True, False]``. * ``ignore_words``: Comma separated list of words to ignore from rule. * ``ignore_words_regex``: Words to ignore from rule if they are a partial match for the regular expression. To ignore only full matches you can use ``^`` (beginning of text) and ``$`` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between ``^`` and ``$``. * ``prefer_quoted_identifiers``: If ``True``, requires every identifier to be quoted. Defaults to ``False``. Must be one of ``[True, False]``. * ``prefer_quoted_keywords``: If ``True``, requires every keyword used as an identifier to be quoted. Defaults to ``False``. Must be one of ``[True, False]``.
Anti-pattern
select
case
when name like '%cat%' then 'meow'
when name like '%dog%' then 'woof'
else null
end from x
Best practice
Omit else null
select
case
when name like '%cat%' then 'meow'
when name like '%dog%' then 'woof'
end from x This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.else_null`` **Aliases**: ``L035`` **Groups**: ``all``, ``structure``
Anti-pattern
CASE statement returns booleans.
select
case
when fab > 0 then true
else false
end as is_fab from fancy_table -- This rule can also simplify CASE statements -- that aim to fill NULL values. select
case
when fab is null then 0
else fab
end as fab_clean from fancy_table -- This also covers where the case statement -- replaces NULL values with NULL values. select
case
when fab is null then null
else fab
end as fab_clean from fancy_table
Best practice
Reduce to WHEN condition within COALESCE function.
select
coalesce(fab > 0, false) as is_fab from fancy_table -- To fill NULL values. select
coalesce(fab, 0) as fab_clean from fancy_table -- NULL filling NULL. select fab as fab_clean from fancy_table This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.simple_case`` **Aliases**: ``L043`` **Groups**: ``all``, ``structure``
Anti-pattern
Defining a CTE that is not used by the query is harmless, but it means the code is unnecessary and could be removed.
WITH cte1 AS (
SELECT a
FROM t ), cte2 AS (
SELECT b
FROM u ) SELECT * FROM cte1
Best practice
Remove unused CTEs.
WITH cte1 AS (
SELECT a
FROM t ) SELECT * FROM cte1 **Name**: ``structure.unused_cte`` **Aliases**: ``L045`` **Groups**: ``all``, ``core``, ``structure``
Anti-pattern
In this example, the outer CASE's ELSE is an unnecessary, nested CASE.
SELECT
CASE
WHEN species = 'Cat' THEN 'Meow'
ELSE
CASE
WHEN species = 'Dog' THEN 'Woof'
END
END as sound FROM mytable
Best practice
Move the body of the inner CASE to the end of the outer one.
SELECT
CASE
WHEN species = 'Cat' THEN 'Meow'
WHEN species = 'Dog' THEN 'Woof'
END AS sound FROM mytable This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.nested_case`` **Aliases**: ``L058`` **Groups**: ``all``, ``structure``
By default this rule is configured to allow subqueries within FROM clauses but not within JOIN clauses. If you prefer a stricter lint then this is configurable.
Anti-pattern
select
a.x, a.y, b.z from a join (
select x, z from b ) using(x)
Best practice
with c as (
select x, z from b ) select
a.x, a.y, c.z from a join c using(x) This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.subquery`` **Aliases**: ``L042`` **Groups**: ``all``, ``structure`` **Configuration** * ``forbid_subquery_in``: Which clauses should be linted for subqueries? Must be one of ``['join', 'from', 'both']``.
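For example, forbidding subqueries in both FROM and JOIN clauses might look something like this sketch (assuming the [sqlfluff:rules:...] section naming):
[sqlfluff:rules:structure.subquery]
forbid_subquery_in = both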
Anti-pattern
select
a,
*,
row_number() over (partition by id order by date) as y,
b from x
Best practice
Order select targets in ascending complexity
select
*,
a,
b,
row_number() over (partition by id order by date) as y from x This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.column_order`` **Aliases**: ``L034`` **Groups**: ``all``, ``structure``
NOTE:
In fact dbt removed it from their style guide in February 2022. However, some like the rule, so for now we will keep it in SQLFluff, but encourage those that do not find value in the rule, to turn it off.
Anti-pattern
SELECT
table_a.field_1,
table_b.field_2 FROM
table_a INNER JOIN table_b USING (id)
Best practice
Specify the keys directly
SELECT
table_a.field_1,
table_b.field_2 FROM
table_a INNER JOIN table_b
ON table_a.id = table_b.id This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.using`` **Aliases**: ``L032`` **Groups**: ``all``, ``structure``
Anti-pattern
In this example, parentheses are not needed and confuse DISTINCT with a function. The parentheses can also be misleading about which columns are affected by the DISTINCT (all the columns!).
SELECT DISTINCT(a), b FROM foo
Best practice
Remove parentheses to be clear that the DISTINCT applies to both columns.
SELECT DISTINCT a, b FROM foo This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.distinct`` **Aliases**: ``L015`` **Groups**: ``all``, ``structure``, ``core``
This rule will break conditions from join clauses down into subconditions using the "and" and "or" binary operators.
Subconditions that are made up of a qualified column reference, a comparison operator and another qualified column reference are then evaluated to check whether they list the table that was referenced earlier - or later, depending on the preferred_first_table_in_join_clause configuration.
Subconditions that do not follow that pattern are ignored by this rule.
Anti-pattern
In this example, the tables that were referenced later are listed first and the preferred_first_table_in_join_clause configuration is set to earlier.
select
foo.a,
foo.b,
bar.c from foo left join bar
-- This subcondition does not list
-- the table referenced earlier first:
on bar.a = foo.a
-- Neither does this subcondition:
and bar.b = foo.b
Best practice
List the tables that were referenced earlier first.
select
foo.a,
foo.b,
bar.c from foo left join bar
on foo.a = bar.a
and foo.b = bar.b This rule is ``sqlfluff fix`` compatible. **Name**: ``structure.join_condition_order`` **Groups**: ``all``, ``structure`` **Configuration** * ``preferred_first_table_in_join_clause``: Which table to list first when joining two tables. Defaults to ``earlier``. Must be one of ``['earlier', 'later']``.
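As an illustrative sketch (assuming the [sqlfluff:rules:...] section naming), explicitly pinning the default preference might look like:
[sqlfluff:rules:structure.join_condition_order]
preferred_first_table_in_join_clause = earlier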
Including an expression that always evaluates to either TRUE or FALSE regardless of the input columns is unnecessary and makes statements harder to read and understand.
Constant conditions are sometimes mistakes (for example, mistyping the intended column name), and sometimes the result of the mistaken belief that they are necessary. In the former case, they can result in a cartesian join if the expression was supposed to be a join condition. Given the ambiguity of intent, this rule does not suggest an automatic fix, and instead invites the user to resolve the problem manually.
Anti-pattern
SELECT * FROM my_table -- This following WHERE clause is redundant. WHERE my_table.col = my_table.col
Best practice
SELECT * FROM my_table -- Replace with a condition that includes meaningful logic, -- or remove the condition entirely. WHERE my_table.col > 3 **Name**: ``structure.constant_expression`` **Groups**: ``all``, ``structure``
This rule will check if there are any tables that are referenced in the FROM or JOIN clause of a SELECT statement, but where no columns from that table are referenced in any of the other clauses. Because some types of join are often used as filters, or to otherwise control granularity without being referenced (e.g. INNER and CROSS), this rule only applies to explicit OUTER joins (i.e. LEFT, RIGHT and FULL joins).
This rule relies on all of the column references in the SELECT statement being qualified with at least the table name, and so is designed to work alongside references.qualification (RF02). This is because without the knowledge of what columns exist in each upstream table, the rule is unable to resolve which table an unqualified column reference is pulled from.
This rule does not propose a fix, because it assumes that an unused table is a mistake, but doesn't know whether the mistake was the join, or the mistake was not using it.
Anti-pattern
In this example, the table bar is included in the JOIN clause, but no columns from it are referenced in any of the other clauses.
SELECT
foo.a,
foo.b FROM foo LEFT JOIN bar ON foo.a = bar.a
Best practice
Remove the join, or use the table.
SELECT foo.a, foo.b FROM foo; SELECT
foo.a,
foo.b,
bar.c FROM foo LEFT JOIN bar ON foo.a = bar.a
In the (very rare) situations that it is logically necessary to include a table in a join clause, but not otherwise refer to it (likely for granularity reasons, or as a stepping stone to another table), we recommend ignoring this rule for that specific line by using -- noqa: ST10 at the end of the line.
Anti-pattern
The SP_ prefix is used to identify system procedures and can adversely affect performance of the user-defined stored procedure. It can also break system procedures if there is a naming conflict.
CREATE PROCEDURE dbo.sp_pull_data
AS
SELECT
ID,
DataDate,
CaseOutput
FROM table1
Best practice
Use a different name for the stored procedure.
CREATE PROCEDURE dbo.pull_data
AS
SELECT
ID,
DataDate,
CaseOutput
FROM table1
-- Alternatively prefix with USP_ to
-- indicate a user-defined stored procedure.
CREATE PROCEDURE dbo.usp_pull_data
AS
SELECT
ID,
DataDate,
CaseOutput
FROM table1
**Name**: ``tsql.sp_prefix``
**Aliases**: ``L056``
**Groups**: ``all``, ``tsql``
SQLFluff is a modular SQL linter for humans.
sqlfluff [OPTIONS] COMMAND [ARGS]...
Options
Examples:
sqlfluff lint --dialect postgres . sqlfluff lint --dialect mysql --rules ST05 my_query.sql sqlfluff fix --dialect sqlite --rules LT10,ST05 src/queries sqlfluff parse --dialect duckdb --templater jinja path/my_query.sql
Show the current dialects available.
sqlfluff dialects [OPTIONS]
Options
Fix SQL files.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
sqlfluff fix [OPTIONS] [PATHS]...
Options
Arguments
Autoformat SQL files.
This effectively force applies sqlfluff fix with a known subset of fairly stable rules. Enabled rules are ignored, but rule exclusions (via CLI) or config are still respected.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
sqlfluff format [OPTIONS] [PATHS]...
Options
Arguments
Lint SQL files via passing a list of files or using stdin.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
Linting SQL files:
Linting a file via stdin (note the lone '-' character):
sqlfluff lint [OPTIONS] [PATHS]...
Options
Arguments
Parse SQL files and just spit out the result.
PATH is the path to a sql file or directory to lint. This can be either a file ('path/to/file.sql'), a path ('directory/of/sql/files'), a single ('-') character to indicate reading from stdin or a dot/blank ('.'/' ') which will be interpreted like passing the current working directory as a path argument.
sqlfluff parse [OPTIONS] PATH
Options
Arguments
Render SQL files and just spit out the result.
PATH is the path to a sql file. This should be either a single file ('path/to/file.sql') or a single ('-') character to indicate reading from stdin.
sqlfluff render [OPTIONS] PATH
Options
Arguments
Show the current rules in use.
sqlfluff rules [OPTIONS]
Options
Show the version of sqlfluff.
sqlfluff version [OPTIONS]
Options
SQLFluff exposes a public API for other python applications to use. A basic example of this usage is given here, with the documentation for each of the methods below.
"""This is an example of how to use the simple sqlfluff api."""
from typing import Any, Dict, Iterator, List, Union
import sqlfluff
# -------- LINTING ----------
my_bad_query = "SeLEct *, 1, blah as fOO from mySchema.myTable"
# Lint the given string and return an array of violations in JSON representation.
lint_result = sqlfluff.lint(my_bad_query, dialect="bigquery")
# lint_result =
# [
# {
# "code": "CP01",
# "line_no": 1,
# "line_pos": 1,
# "description": "Keywords must be consistently upper case.",
# }
# ...
# ]
# -------- FIXING ----------
# Fix the given string and get a string back which has been fixed.
fix_result_1 = sqlfluff.fix(my_bad_query, dialect="bigquery")
# fix_result_1 = 'SELECT *, 1, blah AS foo FROM myschema.mytable\n'
# We can also fix just specific rules.
fix_result_2 = sqlfluff.fix(my_bad_query, rules=["CP01"])
# fix_result_2 = 'SELECT *, 1, blah AS fOO FROM mySchema.myTable'
# Or a subset of rules...
fix_result_3 = sqlfluff.fix(my_bad_query, rules=["CP01", "CP02"])
# fix_result_3 = 'SELECT *, 1, blah AS fOO FROM myschema.mytable'
# -------- PARSING ----------
# Parse the given string and return a JSON representation of the parsed tree.
parse_result = sqlfluff.parse(my_bad_query)
# parse_result = {'file': {'statement': {...}, 'newline': '\n'}}
# This JSON structure can then be parsed as required.
# An example usage is shown below:
def get_json_segment(
parse_result: Dict[str, Any], segment_type: str
) -> Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
"""Recursively search JSON parse result for specified segment type.
Args:
parse_result (Dict[str, Any]): JSON parse result from `sqlfluff.fix`.
segment_type (str): The segment type to search for.
Yields:
Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
Retrieves children of specified segment type as either a string for a raw
segment or as JSON or an array of JSON for non-raw segments.
"""
for k, v in parse_result.items():
if k == segment_type:
yield v
elif isinstance(v, dict):
yield from get_json_segment(v, segment_type)
elif isinstance(v, list):
for s in v:
yield from get_json_segment(s, segment_type)
# e.g. Retrieve array of JSON for table references.
table_references = list(get_json_segment(parse_result, "table_reference"))
print(table_references)
# [[{'identifier': 'mySchema'}, {'dot': '.'}, {'identifier': 'myTable'}]]
# Retrieve raw table name from last identifier in the table reference.
for table_reference in table_references:
table_name = list(get_json_segment(parse_result, "naked_identifier"))[-1]
print(f"table_name: {table_name}")
# table_name: myTable
Sqlfluff is a SQL linter for humans.
NOTE:
The simple API presents only a fraction of the functionality present within the core SQLFluff library. For more advanced use cases, users can import the Linter() and FluffConfig() classes from sqlfluff.core. As of version 0.4.0 this is considered experimental only, as the internals may change without warning in any future release. If you come to rely on the internals of SQLFluff, please post an issue on GitHub to share what you're up to. This will help shape a more reliable, tidy and well documented public API for use.
You can use the FluffConfig() class to configure SQLFluff behaviour.
"""This is an example of providing config overrides.""" from sqlfluff.core import FluffConfig, Linter sql = "SELECT 1\n" config = FluffConfig(
overrides={
"dialect": "snowflake",
# NOTE: We explicitly set the string "none" here rather
# than a None literal so that it overrides any config
# set by any config files in the path.
"library_path": "none",
} ) linted_file = Linter(config=config).lint_string(sql) assert linted_file.get_violations() == []
Instances of FluffConfig() can be created manually, or parsed.
"""An example to show a few ways of configuring the API."""
import sqlfluff
from sqlfluff.core import FluffConfig, Linter
# #######################################
# The simple API can be configured in three ways.
# 1. Limited keyword arguments
sqlfluff.fix("SELECT 1", dialect="bigquery")
# 2. Providing the path to a config file
sqlfluff.fix("SELECT 1", config_path="test/fixtures/.sqlfluff")
# 3. Providing a preconfigured FluffConfig object.
# NOTE: This is the way of configuring SQLFluff which will give the most control.
# 3a. FluffConfig objects can be created directly from a dictionary of values.
config = FluffConfig(configs={"core": {"dialect": "bigquery"}})
# 3b. FluffConfig objects can be created from a config file in a string.
config = FluffConfig.from_string("[sqlfluff]\ndialect=bigquery\n")
# 3c. FluffConfig objects can be created from a config file in multiple strings
# to simulate the effect of multiple nested config strings.
config = FluffConfig.from_strings(
# NOTE: Given these two strings, the resulting dialect would be "mysql"
# as the later files take precedence.
"[sqlfluff]\ndialect=bigquery\n",
"[sqlfluff]\ndialect=mysql\n",
)
# 3d. FluffConfig objects can be created from a path containing a config file.
config = FluffConfig.from_path("test/fixtures/")
# 3e. FluffConfig objects can be from keyword arguments
config = FluffConfig.from_kwargs(dialect="bigquery", rules=["LT01"])
# The FluffConfig is then provided via a config argument.
sqlfluff.fix("SELECT 1", config=config)
# #######################################
# The core API is always configured using a FluffConfig object.
# When instantiating a Linter (or Parser), a FluffConfig must be provided
# on instantiation. See above for details on how to create a FluffConfig.
linter = Linter(config=config)
# The provided config will then be used in any operations.
lint_result = linter.lint_string("SELECT 1", fix=True)
fixed_string = lint_result.fix_string()
# NOTE: The "True" element shows that fixing was a success.
assert fixed_string == ("SELECT 1", True)
Supported dialects and rules are available through list_dialects() and list_rules().
"""This is an example of how get basic options from sqlfluff.""" import sqlfluff # -------- DIALECTS ---------- dialects = sqlfluff.list_dialects() # dialects = [DialectTuple(label='ansi', name='ansi', inherits_from='nothing'), ...] dialect_names = [dialect.label for dialect in dialects] # dialect_names = ["ansi", "snowflake", ...] # -------- RULES ---------- rules = sqlfluff.list_rules() # rules = [ # RuleTuple( # code='Example_LT01', # description='ORDER BY on these columns is forbidden!' # ), # ... # ] rule_codes = [rule.code for rule in rules] # rule_codes = ["LT01", "LT02", ...]
The core elements of sqlfluff.
If we fail to match the whole string, then we must have found something that we cannot lex. If that happens we should package it up as unlexable and keep track of the exceptions.
This adds slices in the templated file to the original lexed elements. We'll need this to work out the position in the source file.
NB: This is a generator which will yield the result of each file within the path iteratively.
Anything within this section should only be necessary for people who are developing plugins or rules to interact with SQLFluff on a deeper level or people who've decided to help the project by contributing to SQLFluff.
As these docs are some of the least commonly used, you may find that not all modules are documented directly here, and so you may find it instructive to read this together with docstrings and comments directly within the SQLFluff codebase on GitHub.
When using the Python API, there are additional options for configuration beyond those specified in the Setting Configuration section of the main docs. Internally, SQLFluff uses a consistent FluffConfig class which is then made accessible to different parts of the tool during linting and fixing.
As described in the Nesting section of the configuration docs, multiple nested config files can be used in a project and the result is a combined config object which contains the resulting union of those files. Under the hood, this is stored in a dict object, and it's possible to get and set individual values using get() & set_value(), and also to get entire portions of that config dict using get_section().
When instantiating a FluffConfig object, there are a few options to set specific config values (such as dialect or rules), but to access the full available set of features it's best to pass in a dict of the values you want to set.
This config dict is a nested object, where the colon (:) characters from the .sqlfluff config files, delimit the keys. For example, take the following config file:
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
This would be represented in the config dict as below. See that the nested structure has been created by splitting the keys on the colon (:) characters:
configs = {
    "rules": {
        "capitalisation.keywords": {
            "capitalisation_policy": "lower"
        }
    }
}
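As an illustration (a minimal sketch, not from the original docs, using the public FluffConfig API; a core dialect is added here because one is required to instantiate a config), values can then be read back from such a dict using get() and get_section():

from sqlfluff.core import FluffConfig

config = FluffConfig(
    configs={
        # A dialect is required to instantiate a config, so one is added here.
        "core": {"dialect": "ansi"},
        "rules": {
            "capitalisation.keywords": {
                "capitalisation_policy": "lower"
            }
        },
    }
)

# Individual values can be fetched by walking the same nested keys...
policy = config.get(
    "capitalisation_policy", section=["rules", "capitalisation.keywords"]
)
# policy = "lower"

# ...or a whole portion of the config can be fetched with get_section().
keyword_config = config.get_section(["rules", "capitalisation.keywords"])
# keyword_config = {"capitalisation_policy": "lower"}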
The following methods are provided to allow conversion of a selection of file formats into a consistent mapping object for instantiating a FluffConfig object.
This is primarily used when loading configuration bundled with a SQLFluff plugin, or to load the default config for SQLFluff itself. By loading config from the package directly we avoid some of the path resolution which otherwise occurs. This is also more compatible with mypyc because it avoids the use of the __file__ attribute to find the default config.
Any paths found in the loaded config are resolved relative to os.getcwd().
For more information about resource loading, see the docs for importlib: https://docs.python.org/3/library/importlib.resources.html
This function will search for all valid config files at the given path, load any found and combine them into a config mapping. If multiple valid files are found, they are resolved in priority order, where pyproject.toml is given the highest precedence, followed by .sqlfluff, pep8.ini, tox.ini and finally setup.cfg.
By accepting only a path string, we enable efficient caching of results, such that configuration can be reused between files without reloading the information from disk.
We layer each of the configs on top of each other, starting with any home or user configs (e.g. in appdir or home (~)), then any local project configuration and then any explicitly specified config paths.
This class is designed to be instantiated once for each file and then be reused by each part of the process. For multiple files in the same path, a parent object will be created for each path and then variants of it are created for each file. The object itself contains the references to any long lived objects which might be used by multiple parts of the codebase such as the dialect and the templater (both of which can be resource intensive to load & instantiate), which allows (for example) multiple files to reuse the same instance of the relevant dialect.
It is also designed to pickle well for use in parallel operations.
NOTE:
Copies created using this method can safely be modified without those changes propagating back up to the object which was originally copied.
This is primarily used in the CLI logs to indicate to the user what values have been changed for each file compared to the root config for the project.
This is a convenience method for the ways that the public classes like Linter(), Parser() and Lexer() allow a subset of attributes to be set directly rather than requiring a pre-made FluffConfig.
Config strings are incorporated from first to last, treating the first element as the "root" config, and then later config strings will take precedence over any earlier values.
The following examples show how to fetch various default values:
>>> FluffConfig(overrides={"dialect": "ansi"}).get("dialect")
'ansi'
>>> config = FluffConfig(overrides={"dialect": "ansi"})
>>> config.get("tab_space_size", section="indentation")
4
>>> FluffConfig(overrides={"dialect": "ansi"}).get(
... "capitalisation_policy",
... section=["rules", "capitalisation.keywords"]
... )
'consistent'
If the element found at the address is a value and not a section, it is still returned and so this can be used as a more advanced form of the basic get method.
NOTE:
This is primarily to enable formatting of config objects in the CLI.
We show values before dicts; the tuple contains an indent value to indicate what level of the dict we're in. Dict labels will be returned as a blank value before their content.
>>> cfg = FluffConfig(overrides={"dialect": "ansi"})
>>> cfg.process_inline_config(
... "-- sqlfluff:dialect:postgres",
... "test.sql"
... )
>>> cfg.get("dialect")
'postgres'
>>> cfg = FluffConfig(overrides={"dialect": "ansi"})
>>> cfg.process_raw_file_for_config(
... "-- sqlfluff:dialect:postgres",
... "test.sql"
... )
>>> cfg.get("dialect")
'postgres'
>>> cfg = FluffConfig(overrides={"dialect": "ansi"})
>>> cfg.set_value(["dialect"], "postgres")
>>> cfg.get("dialect")
'postgres'
>>> cfg = FluffConfig(overrides={"dialect": "ansi"})
>>> cfg.set_value(["indentation", "tab_space_size"], 2)
>>> cfg.get("tab_space_size", section="indentation")
2
Implements the base rule class.
Rules crawl through the trees returned by the parser and evaluate particular rules.
The intent is that it should be possible for the rules to be expressed as simply as possible, with as much of the complexity abstracted away.
The evaluation function should take enough arguments that it can evaluate the position of the given segment in relation to its neighbors. The segment which finally "triggers" the error should be the one that would be corrected, OR, if the rule relates to something that is missing, it should flag on the segment FOLLOWING the place where the desired element is missing.
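As a rough illustration of that shape (a hedged sketch rather than an official example: the rule class, its trigger and its description are hypothetical, and newer rules override _eval() with a RuleContext rather than the older eval(**kwargs) form shown later in this section):

from sqlfluff.core.rules import BaseRule, LintResult, RuleContext
from sqlfluff.core.rules.crawlers import SegmentSeekerCrawler


class Rule_Example_L001(BaseRule):
    """Hypothetical rule which flags any DISTINCT keyword."""

    groups = ("all",)
    # Only visit keyword segments rather than crawling the whole tree.
    crawl_behaviour = SegmentSeekerCrawler({"keyword"})

    def _eval(self, context: RuleContext):
        # Anchor the result on the segment which would be corrected.
        if context.segment.raw_upper == "DISTINCT":
            return LintResult(
                anchor=context.segment,
                description="Hypothetical finding on a DISTINCT keyword.",
            )
        return None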
By removing its fixes, a LintResult will still be reported, but it will be treated as _unfixable_.
Or optionally the opposite if keep_meta is True.
If a name is defined, it's the name - otherwise the code.
The name is a much more understandable reference and so makes config files more readable. For backward compatibility however we also support the rule code for those without names.
NB: This is recursive.
This metaclass provides auto-enrichment of the rule docstring so that examples, groups, aliases and names are added.
The reason we enrich the docstring is so that it can be picked up by autodoc and all be displayed in the sqlfluff docs.
This contains a set of rules, post filtering but also contains the mapping required to interpret any noqa messages found in files.
The reason for this object is that rules are filtered and instantiated into this pack in the main process when running in multi-processing mode so that user defined rules can be used without reference issues.
A rule set is instantiated on module load, but the references to each of its classes are instantiated at runtime. This means that configuration values can be passed to those rules live and be responsive to any changes in configuration from the path that the file is in.
Rules should be fetched using the get_rulelist() command which also handles any filtering (i.e. allowlisting and denylisting).
New rules should be added to the instance of this class using the register() decorator. That decorator registers the class, but also performs basic type and name-convention checks.
The code for the rule will be parsed from the name, and the description from the docstring. The eval function is expected to be overridden by the subclass; the parent class raises an error on this function if it is not overridden.
We use the config both for allowlisting and denylisting, but also for configuring the rules with the given config.
@myruleset.register
class Rule_LT01(BaseRule):
    "Description of rule."

    def eval(self, **kwargs):
        return LintResult()
We expect that rules are defined as classes with the name Rule_XXXX where XXXX is of the form LNNN, where L is a letter (literally L for linting by default) and N is a three digit number.
If this receives classes by any other name, then it will raise a ValueError.
Generate the master reference map. The priority order is: codes > names > groups > aliases (i.e. if there's a collision between a name and an alias - we assume the alias is wrong)
These newer modules provide a higher-level API for rules working with segments and slices. Rules that need to navigate or search the parse tree may benefit from using these. Eventually, the plan is for all rules to use these modules. As of December 30, 2021, 17+ rules use these modules.
The modules listed below are submodules of sqlfluff.utils.functional.
Surrogate class for working with Segment collections.
The segments may or may not be contiguous in a parse tree. Provides useful operations on a sequence of segments to simplify rule creation.
NOTE: Iterates the segments BETWEEN start_seg and stop_seg, i.e. those segments are not included in the loop.
Defines commonly used segment predicates for rule writers.
For consistency, all the predicates in this module are implemented as functions returning functions. This avoids rule writers having to remember the distinction between normal functions and functions returning functions.
This is not necessarily a complete set of predicates covering all possible requirements. Rule authors can define their own predicates as needed, either as regular functions, lambda, etc.
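For example (a minimal sketch, not from the original docs: sp.is_type() and sp.and_() are existing predicates in this module, while is_uppercase() is a hypothetical custom one):

from sqlfluff.utils.functional import sp


def is_uppercase():
    """Return a predicate which checks that a segment's raw text is upper case."""

    def _(segment) -> bool:
        return segment.raw == segment.raw.upper()

    return _


# Predicates compose and are passed to Segments operations, e.g.
# segments.select(select_if=sp.and_(sp.is_type("keyword"), is_uppercase()))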
Surrogate class for working with RawFileSlice collections.
The slices may or may not be contiguous in a file. Provides useful operations on a sequence of slices to simplify rule creation.
NOTE: Iterates the slices BETWEEN start_slice and stop_slice, i.e. those slices are not included in the loop.
Defines commonly used raw file slice predicates for rule writers.
For consistency, all the predicates in this module are implemented as functions returning functions. This avoids rule writers having to remember the distinction between normal functions and functions returning functions.
This is not necessarily a complete set of predicates covering all possible requirements. Rule authors can define their own predicates as needed, either as regular functions, lambda, etc.
Many rules supported by SQLFluff involve the spacing and layout of different elements, either to enforce a particular layout or just to add or remove code elements in a way sensitive to the existing layout configuration. The way this is achieved is through some centralised utilities in the sqlfluff.utils.reflow module.
This module aims to achieve several things:
To support this, the module provides a ReflowSequence class which allows access to all of the relevant operations which can be used to reformat sections of code, or even a whole file. Unless there is a very good reason, all rules should use this same approach to ensure consistent treatment of layout.
This acts as the primary route into using the reflow routines. It acts in a way that plays nicely within a rule context in that it accepts segments and configuration, while allowing access to modified segments and a series of LintFix objects, which can be returned by the calling rule.
Sequences are made up of alternating ReflowBlock and ReflowPoint objects (even if some points have no segments). This is validated on construction.
Most operations also return ReflowSequence objects such that operations can be chained, and then the resultant fixes accessed at the last stage, for example:
fixes = (
    ReflowSequence.from_around_target(
        context.segment,
        root_segment=context.parent_stack[0],
        config=context.config,
    )
    .rebreak()
    .get_fixes()
)
This assumes that reindent() has already been applied.
NOTE: We don't just expand to the first block around the target but to the first code element, which means we may swallow several comment blocks in the process.
To evaluate reflow around a specific target, we need to generate a sequence which goes from the preceding raw to the following raw. i.e. at least: block - point - block - point - block (where the central block is the target).
This is intended as a base constructor, which others can use. In particular, if no depth_map argument is provided, this method will generate one in a potentially inefficient way. If the calling method has access to a better way of inferring a depth map (for example because it has access to a common root segment for all the content), it should do that instead and pass it in.
We're hydrating them here directly from the LintResult objects, so for more accurate results, consider using .get_results(). This method is particularly useful when consolidating multiple results into one.
Insertion is always relative to an existing element. Either before or after it as specified by pos. This generates appropriate creation LintFix objects to direct the linter to insert those elements.
This intentionally does not handle indentation, as the existing indents are assumed to be correct.
NOTE:
This generates appropriate replacement LintFix objects to direct the linter to modify those elements.
NOTE this method relies on the embodied results being correct so that we can build on them.
This generates appropriate deletion LintFix objects to direct the linter to remove those elements.
This class, and its sibling ReflowBlock, should not normally be manipulated directly by rules, but instead should be manipulated using ReflowSequence.
It holds segments which can be changed during a reflow operation such as whitespace and newlines. It may also contain Indent and Dedent elements.
It holds no configuration and is influenced by the blocks on either side, so that any operations on it usually have that configuration passed in as required.
Parallel to BaseSegment.class_types
If the point currently contains no newlines, one will be introduced and any trailing whitespace will be effectively removed.
More specifically, the newline is inserted before the existing whitespace, with the new indent being a replacement for that same whitespace.
For placeholder newlines or indents we generate appropriate source fixes.
Returns True if it contains only whitespace, indents, template loops or placeholders.
Note:
* ReflowBlocks will contain the placeholders and loops.
* ReflowPoints will contain whitespace, indents and newlines.
These newlines are either newline segments or contained within consumed sections of whitespace. This counts both.
NB: This effectively includes trailing whitespace fixes.
Deletion and edit fixes are generated immediately, but creations are paused to the end and done in bulk so as not to generate conflicts.
Note that the strip_newlines functionality exists here as a slight exception to pure respacing, but as a very simple case of positioning line breaks. The default operation of respace does not enable it, however it exists as a convenience for rules which wish to use it.
This class, and its sibling ReflowPoint, should not normally be manipulated directly by rules, but instead should be manipulated using ReflowSequence.
It holds segments to reflow and also exposes configuration regarding how they are expected to reflow around others. Typically it holds only a single element, which is usually code or a templated element. Because reflow operations control spacing, it would be very unusual for this object to be modified; as such it exposes relatively few methods.
The attributes exposed are designed to be "post configuration" i.e. they should reflect configuration appropriately.
Parallel to BaseSegment.class_types
This is the primary route to construct a ReflowBlock, as it allows all of the inference of the spacing and position configuration from the segments it contains and the appropriate config objects.
Returns True if it contains only whitespace, indents, template loops or placeholders.
Note:
* ReflowBlocks will contain the placeholders and loops.
* ReflowPoints will contain whitespace, indents and newlines.
These newlines are either newline segments or contained within consumed sections of whitespace. This counts both.
This page aims to act as a guide for migrating between major SQLFluff releases. Necessarily this means that bugfix releases, or releases requiring no change for the user are not mentioned. For full details of each individual release, see the detailed changelog.
This release makes a couple of potentially breaking changes:
This release include two minor breaking changes which will only affect users engaged in performance optimisation of SQLFluff itself.
This release changes some of the interfaces between SQLFluff core and our plugin ecosystem. The only breaking change is in the interface between SQLFluff and templater plugins (which are not common in the ecosystem, hence why this is only a minor and not a major release).
For all plugins, we also recommend a different structure for their imports (especially for rule plugins which are more common in the ecosystem) - for performance and stability reasons. Some users had been experiencing very long import times with previous releases as a result of the layout of plugin imports. Users with affected plugins will begin to see a warning from this release onward, which can be resolved for their plugin by updating to a new version of that plugin which follows the guidelines.
Templaters before this version would pass a make_template() callable to the slicing methods as part of being able to map the source file. This method would accept a str and return a jinja2.environment.Template object to allow the templater to render multiple variants of the template to do the slicing operation (which allows linting issues found in templated files to be mapped accurately back to their position in the unrendered source file). This approach is not very generalisable, and did not support templating operations with libraries other than jinja2.
As a result, we have amended the interface to instead pass a render_func() callable, which accepts a str and returns a str. This works fine for the jinja templater (and by extension the dbt templater) as they can simply wrap the original callable with a method that calls render() on the original Template object. It also however opens up the door to other templating engines, and in particular to remote templaters which might pass unrendered code over an HTTP connection for rendering.
Specifically:
We recommend that the module in a plugin which defines all of the hook implementations (anything using the @hookimpl decorator) should be able to fully import before any rule implementations are imported. More specifically, SQLFluff must be able to both import and run any implementations of get_configs_info() before any plugin rules (i.e. any derivatives of BaseRule) are imported. Because of this, we recommend that rules are defined in a separate module to the root of the plugin and then only imported within the get_rules() method.
Importing rules in the main body of the module was previously our recommendation, and so may still be the case for older versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented from this version onward, recommending that you update your plugin.
See the Developing Plugins section of the docs for an example.
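As a rough sketch of that layout (the module and rule names here are hypothetical; only the @hookimpl decorator and the get_rules() hook are part of the real plugin interface):

"""Hook module for a hypothetical plugin: module-level imports stay cheap."""

from sqlfluff.core.plugin import hookimpl


@hookimpl
def get_rules():
    """Only import the rule classes when the rules are actually requested."""
    from my_plugin.rules import Rule_MyPlugin_L001  # hypothetical rules module

    return [Rule_MyPlugin_L001]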
Upgrading to 2.0 brings several important breaking changes:
To upgrade smoothly between versions, we recommend the following sequence:
To illustrate the points above, this is an illustrative example config for a 2.0 compatible project. Note that the config is fairly brief and sets only the values which differ from the default config.
[sqlfluff]
dialect = snowflake
templater = dbt
max_line_length = 120
# Exclude some specific rules based on a mixture of codes and names
exclude_rules = RF02, RF03, RF04, ST06, ST07, AM05, AM06, convention.left_join, layout.select_targets

[sqlfluff:indentation]
# Enabling implicit indents for this project.
# See https://docs.sqlfluff.com/en/stable/perma/indent_locations.html
allow_implicit_indents = True

# Add a few specific rule configurations, referenced by the rule names
# and not by the rule codes.
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.identifiers]
capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower

# An example of setting a custom layout specification which
# is more lenient than default config.
[sqlfluff:layout:type:set_operator]
line_position = alone
This release brings several internal changes, and acts as a prelude to 2.0.0. In particular, the following config values have changed:
If any of these values have been set in your config, they will be automatically translated to the new values at runtime, and a warning will be shown. To silence the warning, update your config file to the new values. For more details on configuring layout see Configuring Layout.
This release brings several potentially breaking changes to the underlying parse tree. For users of the cli tool in a linting context you should notice no change. If however your application relies on the structure of the SQLFluff parse tree or the naming of certain elements within the yaml format, then this may not be a drop-in replacement. Specifically:
If using the Python API, the parent type (such as identifier) will still register if you call .is_type("identifier"), as this function checks all inherited types. However the eventual type returned by .get_type() will now be (in most cases) what used to be accessible at .name. The name attribute will be deprecated in a future release.
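To illustrate that distinction (a minimal sketch using the public Linter API; the exact type strings depend on the dialect and SQLFluff version, so treat the values as illustrative only):

from sqlfluff.core import FluffConfig, Linter

linter = Linter(config=FluffConfig(overrides={"dialect": "ansi"}))
parsed = linter.parse_string("SELECT col FROM tbl\n")

for segment in parsed.tree.recursive_crawl("column_reference"):
    # is_type() checks inherited types, so a parent type such as
    # "object_reference" still registers for a column reference...
    print(segment.is_type("object_reference"))  # True
    # ...while get_type() returns the most specific type for the segment.
    print(segment.get_type())  # column_reference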
This release introduces the capability to automatically skip large files, and sets default limits on the maximum file size before a file is skipped. Users should see a performance gain, but may experience warnings associated with these skipped files.
Want to find other people who are using SQLFluff in production use cases? Want to brag about how you're using it? Just want to show solidarity with the project and provide a testimonial for it?
Just add a section below by raising a PR on GitHub by editing this file ✏️.
We have a fast-growing community on Slack, come and join us!
Follow us on Twitter @SQLFluff for announcements and other related posts.
Alan Cruickshank
2024, Alan Cruickshank
| February 9, 2025 |