Blobs
CrateDB supports storing binary large objects (blobs). Because blob storage builds on CrateDB's cluster features, blobs can be replicated and sharded just like regular data.
Creating a table for blobs
Before blobs can be added, a blob table must be created. Blob tables can be sharded, which makes it possible to distribute binaries over multiple nodes.
Let's use the CrateDB shell, crash, to issue the SQL statement:
sh$ crash -c "create blob table myblobs clustered into 3 shards with (number_of_replicas=0)"
CREATE OK, 1 row affected (... sec)
CrateDB is now configured to allow blobs to be managed under the /_blobs/myblobs endpoint.
Custom location for storing blob data
Blob data can be stored in a custom directory that is completely separate from the normal data path. A typical use case is keeping regular data on a fast SSD and blob data on a large, inexpensive spinning disk.
The custom blob data path can be set either globally in the configuration or per table when creating a blob table. The path can be absolute or relative, and the user CrateDB runs as must be able to create and write to it. A relative path is resolved against CRATE_HOME.
Blob data will be stored under this path with the following layout:
/<blobs.path>/nodes/<NODE_NO>/indices/<INDEX_UUID>/<SHARD_ID>/blobs
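The layout above can be sketched as a small path-building helper. This is only an illustration: the function name `blob_shard_dir` and the example values are hypothetical, and CrateDB assigns the node number, index UUID, and shard ID itself.

```python
from pathlib import PurePosixPath


def blob_shard_dir(blobs_path, node_no, index_uuid, shard_id):
    """Compose the blob directory of one shard, following the documented layout.

    Hypothetical helper for illustration; not part of CrateDB.
    """
    return PurePosixPath(
        blobs_path, "nodes", str(node_no), "indices", index_uuid, str(shard_id), "blobs"
    )


# Placeholder values; a real index UUID is generated by CrateDB.
print(blob_shard_dir("/path/to/blob/data", 0, "<INDEX_UUID>", 2))
```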
Global by configuration
To define a custom path globally for all blob tables, uncomment or add the following entry in the CrateDB configuration:
blobs.path: /path/to/blob/data
Also see Configuration.
Per blob table setting
A custom blob data path can also be defined per table instead of globally in the configuration. A per-table setting takes precedence over the configuration setting.
See CREATE BLOB TABLE for details.
Creating a blob table with a custom blob data path:
sh$ crash -c "create blob table myblobs clustered into 3 shards with (blobs_path='/tmp/crate_blob_data')"
CREATE OK, 1 row affected (... sec)
List
To list all blobs inside a blob table, a SELECT statement can be used:
sh$ crash -c "select digest, last_modified from blob.myblobs"
+------------------------------------------+---------------+
| digest                                   | last_modified |
+------------------------------------------+---------------+
| 4a756ca07e9487f482465a99e8286abc86ba4dc7 | ...           |
+------------------------------------------+---------------+
SELECT 1 row in set (... sec)
Note
To query blob tables, the schema name blob must always be specified.
Altering a blob table
The number of replicas of a blob table can be changed using the ALTER BLOB TABLE statement:
sh$ crash -c "alter blob table myblobs set (number_of_replicas=0)"
ALTER OK, -1 rows affected (... sec)
Deleting a blob table
Blob tables can be deleted in much the same way as normal tables:
sh$ crash -c "drop blob table myblobs"
DROP OK, 1 row affected (... sec)
Using blob tables
Blob tables can only be used over the HTTP/HTTPS protocol. This section describes how binaries can be stored, fetched, and deleted.
Note
For internal optimization reasons, any request that would otherwise succeed may instead receive a 307 Temporary Redirect response. Clients should be prepared to follow such redirects.
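A client can prepare for such a redirect roughly as follows. This is a minimal sketch that assumes the redirect response carries a standard HTTP `Location` header; the helper name `next_request_url` is hypothetical. Under HTTP semantics, a 307 means the same method and body must be repeated at the new URL.

```python
from urllib.parse import urljoin


def next_request_url(current_url, status, location):
    """Return the URL to repeat the request against after a 307, else None.

    Hypothetical helper; assumes the redirect target is given in `Location`.
    """
    if status == 307 and location:
        # A Location value may be relative, so resolve it against the current URL.
        return urljoin(current_url, location)
    return None


url = "http://127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7"
# The port 4201 below is a made-up example of another node's address.
print(next_request_url(url, 307, url.replace(":4200", ":4201")))
```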
Uploading
To upload a blob, the SHA-1 hash of its contents must be known up front, because it is used as the ID of the new blob. This example uses a Python one-liner to compute the SHA-1 hash:
sh$ python3 -c 'import hashlib;print(hashlib.sha1("contents".encode("utf-8")).hexdigest())'
4a756ca07e9487f482465a99e8286abc86ba4dc7
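For larger binaries, loading the whole file into memory just to hash it may be undesirable. The sketch below computes the same kind of digest in chunks; the function name `sha1_hex` and the chunk size are illustrative choices, not anything CrateDB prescribes.

```python
import hashlib
import io


def sha1_hex(stream, chunk_size=64 * 1024):
    """Compute the SHA-1 hex digest of a binary stream without reading it all at once."""
    digest = hashlib.sha1()
    # Read fixed-size chunks until the stream returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# An in-memory stream stands in for a file opened with open(path, "rb").
print(sha1_hex(io.BytesIO(b"contents")))
```

The result for the bytes `contents` is the same digest the one-liner produces, since both hash identical input.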
The blob can now be uploaded by issuing a PUT request:
sh$ curl -isSX PUT '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7' -d 'contents'
HTTP/1.1 201 Created
content-length: 0
If a blob with the given hash already exists, a 409 Conflict is returned:
sh$ curl -isSX PUT '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7' -d 'contents'
HTTP/1.1 409 Conflict
content-length: 0
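The same upload can be prepared from Python's standard library. This is a sketch under the assumptions of the curl examples above (host `127.0.0.1:4200`, table `myblobs`); the helper name `build_upload_request` is hypothetical, and the request is only constructed here, not sent.

```python
import hashlib
import urllib.request


def build_upload_request(base_url, table, data):
    """Build a PUT request whose path ends in the SHA-1 digest of `data`."""
    digest = hashlib.sha1(data).hexdigest()
    url = f"{base_url}/_blobs/{table}/{digest}"
    return urllib.request.Request(url, data=data, method="PUT")


req = build_upload_request("http://127.0.0.1:4200", "myblobs", b"contents")
print(req.get_method(), req.full_url)

# Sending requires a running cluster. Note that urlopen raises
# urllib.error.HTTPError for 4xx responses such as 409 Conflict:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```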
Downloading
To download a blob, simply use a GET request:
sh$ curl -sS '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
contents
If the blob doesn’t exist, a 404 Not Found error is returned:
sh$ curl -isS '127.0.0.1:4200/_blobs/myblobs/e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e'
HTTP/1.1 404 Not Found
content-length: 0
To determine if a blob exists without downloading it, a HEAD request can be used:
sh$ curl -sS -I '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
HTTP/1.1 200 OK
content-length: 8
accept-ranges: bytes
expires: Thu, 31 Dec 2037 23:59:59 GMT
cache-control: max-age=315360000
Note
The cache headers for blobs are static and effectively allow clients to cache the response forever, since a blob is immutable.
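To put the `max-age` value from the HEAD response into perspective, the following sketch parses the header and converts the lifetime to days. The helper name `max_age_seconds` is hypothetical and handles only the simple directive form shown above.

```python
def max_age_seconds(cache_control):
    """Extract the max-age value (in seconds) from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


seconds = max_age_seconds("max-age=315360000")
print(seconds / 86400)  # 3650.0 days, i.e. ten 365-day years
```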
Deleting
To delete a blob, simply use a DELETE request:
sh$ curl -isS -XDELETE '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
HTTP/1.1 204 No Content
If the blob doesn’t exist, a 404 Not Found error is returned:
sh$ curl -isS -XDELETE '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
HTTP/1.1 404 Not Found
content-length: 0
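The status codes shown in this section can be collected into a small lookup that a client might use to interpret responses. This is a sketch only: the names `OUTCOMES` and `describe` are hypothetical, and the table covers just the method and status combinations demonstrated above, not every response CrateDB may send.

```python
# (method, status) pairs taken from the examples in this section.
OUTCOMES = {
    ("PUT", 201): "blob created",
    ("PUT", 409): "blob already exists",
    ("GET", 404): "blob not found",
    ("HEAD", 200): "blob exists",
    ("DELETE", 204): "blob deleted",
    ("DELETE", 404): "blob not found",
}


def describe(method, status):
    """Map a blob request's method and status code to a short description."""
    if status == 307:
        # Any request may be redirected for internal optimization.
        return "redirected; repeat the request at the new location"
    return OUTCOMES.get((method.upper(), status), "not covered by the examples here")


print(describe("delete", 204))
print(describe("PUT", 409))
```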