Lexical structure¶
An SQL input consists of a sequence of commands each of which is a sequence of
tokens, terminated by a semicolon (;
).
The syntax of a command defines its set of valid tokens. A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol.
Table of contents
String literal¶
String literals are defined as an arbitrary sequence of characters that are
delimited with single quotes '
as defined in ANSI SQL, for example
'This is a string'
.
In addition, CrateDB supports dollar quoted strings to help avoid escaping
single quotes within single quoted strings.
For example, 'I''m a string'
can be re-written as
$<tag>$I'm a string$<tag>$
, where the matching pair of <tag>
can be
zero or more characters in length.
cr> select 'I''m a string' = $tag1$I'm a string$tag1$;
+------+
| true |
+------+
| TRUE |
+------+
SELECT 1 row in set (... sec)
Note
Nested dollar quoted strings are currently not supported.
Escape strings¶
The escape character in CrateDB is the single-quote '
. A character gets
escaped when adding a single-quote before it. For example a single quote
character within a string literal can be included by writing two adjacent
single quotes, e.g., 'Jack''s car'
.
Note
Two adjacent single quotes are not equivalent to the double-quote
character "
.
String literals with C-Style escapes¶
In addition to the escaped character '
, CrateDB supports C-Style escaped
string sequences. Such a sequence is constructed by prefixing the string
literal with the letter E
or e
, for example, e'hello\nWorld'
.
The following escaped sequences are supported:
Escape Sequence |
Interpretation |
---|---|
|
backspace |
|
form feed |
|
newline |
|
carriage return |
|
tab |
|
octal byte value |
|
hexadecimal byte value |
|
16 or 32-bit hexadecimal Unicode character value |
For instance, the escape string literal e'\u0061\x61\141'
is equivalent to
the 'aaa'
string literal.
cr> select e'\u0061\x61\141' as col1;
+------+
| col1 |
+------+
| aaa |
+------+
SELECT 1 row in set (... sec)
Any other character following a backslash is taken literally. Thus, to include
a backslash character \
, two adjacent backslashes need to be used
(i.e. \\
).
cr> select e'aa\\nbb' as col1;
+--------+
| col1 |
+--------+
| aa\nbb |
+--------+
SELECT 1 row in set (... sec)
Finally, a single quote can be included in an escape string literal by also
using the escape backslash character: \'
, in addition to the single-quote
described in the string literals section.
cr> select e'aa\'bb' as col1;
+-------+
| col1 |
+-------+
| aa'bb |
+-------+
SELECT 1 row in set (... sec)
Key words and identifiers¶
The table below lists all reserved key words in CrateDB. These need to be quoted if used as identifiers:
cr> SELECT word FROM pg_catalog.pg_get_keywords() WHERE catcode = 'R' ORDER BY 1;
+-------------------+
| word |
+-------------------+
| add |
| all |
| alter |
| and |
| any |
| array |
| as |
| asc |
| between |
| by |
| called |
| case |
| cast |
| column |
| constraint |
| costs |
| create |
| cross |
| current_date |
| current_schema |
| current_time |
| current_timestamp |
| current_user |
| default |
| delete |
| deny |
| desc |
| describe |
| directory |
| distinct |
| drop |
| else |
| end |
| escape |
| except |
| exists |
| extract |
| false |
| first |
| for |
| from |
| full |
| function |
| grant |
| group |
| having |
| if |
| in |
| index |
| inner |
| input |
| insert |
| intersect |
| into |
| is |
| join |
| last |
| left |
| like |
| limit |
| match |
| natural |
| not |
| null |
| nulls |
| object |
| offset |
| on |
| or |
| order |
| outer |
| persistent |
| recursive |
| reset |
| returns |
| revoke |
| right |
| select |
| session_user |
| set |
| some |
| stratify |
| table |
| then |
| transient |
| true |
| try_cast |
| unbounded |
| union |
| update |
| user |
| using |
| when |
| where |
| with |
+-------------------+
SELECT 95 rows in set (... sec)
Tokens such as my_table
, id
, name
, or data
in the example below
are identifiers, which identify names of tables, columns, and other database
objects.
Example:
CREATE TABLE my_table (
id INTEGER,
name STRING,
data OBJECT
) WITH (number_of_replicas = 0);
Note
Key words and unquoted identifiers are case insensitive while quoted identifiers are case sensitive.
This means that:
select foo from t;
is equivalent to:
select Foo from t;
or:
select FOO from t;
To query a table named Foo
:
select "Foo" from t;
A widely used convention is to write key words in uppercase and identifiers in lowercase, such as
ALTER TABLE foo ADD COLUMN new_column INTEGER;
INSERT INTO foo (id, name) VALUES (1, 'bar');
Quoted identifiers can contain an arbitrary sequence of characters enclosed by
double quotes ("
). Quoted identifiers are never keywords, so you can use
"update"
as a table or column name.
Special characters¶
Some non-alphanumeric characters do have a special meaning. For their usage please refer to the sections where the respective syntax elements are described.
- Semicolon
The semicolon (
;
) terminates an SQL statement. It cannot appear anywhere else within the command, except within a string or quoted identifier.- Comma
The comma (
,
) is used in various syntactical elements to separate elements of a list.- Brackets
Square brackets (
[]
) are used to select elements of arrays and objects, e.g.arr[1]
orobj['key']
.- Asterisk
The asterisk (
*
) is used in some contexts to denote all columns of a table. As an argument in global aggregate functions it has the meaning of any field, e.g.COUNT(*)
.- Period
The period (
.
) is used for numeric values and to separate schema and table names, e.g.blob.my_blob_table
.
Comments¶
An SQL statement can contain comments. Single line comments start with a double
dash (--
) and end at the end of that line. Multi line comments start with
/*
and end with */
.
Example:
/*
* Retrieve information about all tables in the 'doc' schema.
*/
SELECT *
FROM information_schema.tables
WHERE table_schema = 'doc'; -- query information schema for doc tables