Extending sqlparse
¶
The sqlparse
module uses a sql grammar that was tuned through usage and numerous
PR to fit a broad range of SQL syntaxes, but it cannot cater to every given case since
some SQL dialects have adopted conflicting meanings of certain keywords. Sqlparse
therefore exposes a mechanism to configure the fundamental keywords and regular
expressions that parse the language as described below.
If you find an adaptation that works for your specific use-case. Please consider contributing it back to the community by opening a PR on GitHub.
Configuring the Lexer¶
The lexer is a singleton class that breaks down the stream of characters into language
tokens. It does this by using a sequence of regular expressions and keywords that are
listed in the file sqlparse.keywords
. Instead of applying these fixed grammar
definitions directly, the lexer is default initialized in its method called
default_initialization()
. As an api user, you can adapt the Lexer configuration by
applying your own configuration logic. To do so, start out by clearing previous
configurations with .clear()
, then apply the SQL list with
.set_SQL_REGEX(SQL_REGEX)
, and apply keyword lists with .add_keywords(KEYWORDS)
.
You can do so by re-using the expressions in sqlparse.keywords
(see example below),
leaving parts out, or by making up your own master list.
See the expected types of the arguments by inspecting their structure in
sqlparse.keywords
.
(For compatibility with python 3.4, this library does not use type-hints.)
The following example adds support for the expression ZORDER BY
, and adds BAR
as
a keyword to the lexer:
import re
import sqlparse
from sqlparse import keywords
from sqlparse.lexer import Lexer
# get the lexer singleton object to configure it
lex = Lexer.get_default_instance()
# Clear the default configurations.
# After this call, reg-exps and keyword dictionaries need to be loaded
# to make the lexer functional again.
lex.clear()
my_regex = (r"ZORDER\s+BY\b", sqlparse.tokens.Keyword)
# slice the default SQL_REGEX to inject the custom object
lex.set_SQL_REGEX(
keywords.SQL_REGEX[:38]
+ [my_regex]
+ keywords.SQL_REGEX[38:]
)
# add the default keyword dictionaries
lex.add_keywords(keywords.KEYWORDS_COMMON)
lex.add_keywords(keywords.KEYWORDS_ORACLE)
lex.add_keywords(keywords.KEYWORDS_PLPGSQL)
lex.add_keywords(keywords.KEYWORDS_HQL)
lex.add_keywords(keywords.KEYWORDS_MSACCESS)
lex.add_keywords(keywords.KEYWORDS)
# add a custom keyword dictionary
lex.add_keywords({'BAR', sqlparse.tokens.Keyword})
# no configuration is passed here. The lexer is used as a singleton.
sqlparse.parse("select * from foo zorder by bar;")