Creating tables¶
Tables are the basic building blocks of a relational database. A table can hold multiple rows (i.e., records), with each row having multiple columns and each column holding a single data element (i.e., value). You can query tables to insert data, select (i.e., retrieve) data, and delete data.
Table of contents
Table definition¶
To create a table, use the CREATE TABLE statement.
At a minimum, you must specify a table name and one or more column definitions. A column definition must specify a column name and a corresponding data type.
Here’s an example statement:
cr> CREATE TABLE my_table (
... first_column integer,
... second_column text
... );
CREATE OK, 1 row affected (... sec)
This statement creates a table named my_table
with two columns named
first_column
and second_column
with types integer and text.
A table can be dropped (i.e., deleted) by using the DROP TABLE statement:
cr> DROP TABLE my_table;
DROP OK, 1 row affected (... sec)
If the my_table
table did not exist, the DROP TABLE
statement above
would return an error message. If you specify the IF EXISTS
clause, the
instruction is conditional on the table’s existence and would not return an
error message:
cr> DROP TABLE IF EXISTS my_table;
DROP OK, 0 rows affected (... sec)
Tip
By default, CrateDB will enforce the column definitions you specified with
the CREATE TABLE
statement (what’s known as a strict column
policy).
However, you can configure the column_policy table parameter so that the INSERT, UPDATE, and COPY FROM statements can arbitrarily create new columns as needed (what’s known as a dynamic column policy).
Schemas¶
Tables can be created in different schemas. These are created implicitly on table creation and cannot be created explicitly. If a schema did not exist yet, it will be created.
You can create a table called my_table
in a schema called my_schema
schema like so:
cr> create table my_schema.my_table (
... pk int primary key,
... label text,
... position geo_point
... );
CREATE OK, 1 row affected (... sec)
We can confirm this by looking up this table in the information_schema.tables table:
cr> select table_schema, table_name from information_schema.tables
... where table_name='my_table';
+--------------+------------+
| table_schema | table_name |
+--------------+------------+
| my_schema | my_table |
+--------------+------------+
SELECT 1 row in set (... sec)
The following schema names are reserved and may not be used:
blob
information_schema
sys
Tip
Schemas are primarily namespaces for tables. You can use privileges to control access to schemas.
A user-created schema exists as long as there are tables with the same schema
name. If the last table with that schema is dropped, the schema is gone (except
for the blob
and doc
schema):
cr> drop table my_schema.my_table ;
DROP OK, 1 row affected (... sec)
Every table that is created without an explicit schema name, will be created in
the doc
schema:
cr> create table my_doc_table (
... a_column int,
... another_one geo_point
... );
CREATE OK, 1 row affected (... sec)
cr> select table_schema, table_name from information_schema.tables
... where table_name='my_doc_table';
+--------------+--------------+
| table_schema | table_name |
+--------------+--------------+
| doc | my_doc_table |
+--------------+--------------+
SELECT 1 row in set (... sec)
Naming restrictions¶
Table, schema and column identifiers cannot have the same names as reserved key words. Please refer to the Lexical structure section for more information about naming.
Additionally, table and schema names are restricted in terms of characters and length. They:
may not contain one of the following characters:
\ / * ? " < > | <whitespace> , # .
should not exceed 255 bytes when encoded with
utf-8
(this limit applies on the optionally schema-qualified table name)
Column names are restricted in terms of patterns:
Columns are not allowed to contain a dot (
.
), since this conflicts with internal path definitions.Columns that conflict with the naming scheme of virtual system columns are restricted.
Character sequences that conform to the subscript notation (e.g.
col['id']
) are not allowed.
Table configuration¶
You can configure tables in many different ways to take advantage of the range of functionality that CrateDB supports. For example:
CrateDB transparently segments the underlying storage of table data into shards (four by default). You can configure the number of shards with the CLUSTERED BY clause. You control how CrateDB routes table rows to shards by specifying a routing column.
You can use cluster settings to configure how shards are balanced across a cluster and allocated to nodes (with attribute-based allocation, disk-based allocation, or both).
You can replicate shards WITH the number_of_replicas table setting. CrateDB will split replicated partitions into primary shards, with each primary shard having one or more replica shards.
When you lose a primary shard (e.g., due to node failure), CrateDB will promote a replica shard to primary. More table replicas mean a smaller chance of permanent data loss (through increased data redundancy) in exchange for more disk space utilization and intra-cluster network traffic.
Replication can also improve read performance because any increase in the number of shards distributed across a cluster also increases the opportunities for CrateDB to parallelize query execution across multiple nodes.
You can partition a table into one or more partitions with the PARTITIONED BY clause. You control how tables are partitioned by specifying one or more partition columns. Each unique combination of partition column values results in a new partition.
By partitioning a table, you can segment some SQL statements (e.g., those used for table optimization, import and export, and backup and restore) by constraining them to one or more partitions.
You can fine-tune table operation by setting table parameters using the WITH clause. Available parameters include those used to configure replication, sharding, refresh interval, read and write operations, soft deletes, durability, column policy, and more.