Validation Schemas

A validation schema is a mapping, usually a dict. Schema keys are the keys allowed in the target dictionary. Schema values express the rules that must be matched by the corresponding target values.

schema = {'name': {'type': 'string', 'maxlength': 10}}

In the example above we define a target dictionary with only one key, name, which is expected to be a string not longer than 10 characters. Something like {'name': 'john doe'} would validate, while something like {'name': 'a very long string'} or {'name': 99} would not.

By default all keys in a document are optional unless the required-rule is set True for individual fields or the validator’s :attr:~cerberus.Validator.require_all is set to True in order to expect all schema-defined fields to be present in the document.

Registries

There are two default registries in the cerberus module namespace where you can store definitions for schemas and rules sets which then can be referenced in a validation schema. You can furthermore instantiate more Registry objects and bind them to the rules_set_registry or schema_registry of a validator. You may also set these as keyword-arguments upon intitialization.

Using registries is particularly interesting if

  • schemas shall include references to themselves, vulgo: schema recursion
  • schemas contain a lot of reused parts and are supposed to be serialized
>>> from cerberus import schema_registry
>>> schema_registry.add('non-system user',
...                     {'uid': {'min': 1000, 'max': 0xffff}})
>>> schema = {'sender': {'schema': 'non-system user',
...                      'allow_unknown': True},
...           'receiver': {'schema': 'non-system user',
...                        'allow_unknown': True}}
>>> from cerberus import rules_set_registry
>>> rules_set_registry.extend((('boolean', {'type': 'boolean'}),
...                            ('booleans', {'valuesrules': 'boolean'})))
>>> schema = {'foo': 'booleans'}

Validation

Validation schemas themselves are validated when passed to the validator or a new set of rules is set for a document’s field. A SchemaError is raised when an invalid validation schema is encountered. See Schema Validation Schema for a reference.

However, be aware that no validation can be triggered for all changes below that level or when a used definition in a registry changes. You could therefore trigger a validation and catch the exception:

>>> v = Validator({'foo': {'allowed': []}})
>>> v.schema['foo'] = {'allowed': 1}
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "cerberus/schema.py", line 99, in __setitem__
    self.validate({key: value})
  File "cerberus/schema.py", line 126, in validate
    self._validate(schema)
  File "cerberus/schema.py", line 141, in _validate
    raise SchemaError(self.schema_validator.errors)
SchemaError: {'foo': {'allowed': 'must be of container type'}}
>>> v.schema['foo']['allowed'] = 'strings are no valid constraint for allowed'
>>> v.schema.validate()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "cerberus/schema.py", line 126, in validate
    self._validate(schema)
  File "cerberus/schema.py", line 141, in _validate
    raise SchemaError(self.schema_validator.errors)
SchemaError: {'foo': {'allowed': 'must be of container type'}}

Serialization

Cerberus schemas are built with vanilla Python types: dict, list, string, etc. Even user-defined validation rules are invoked in the schema by name as a string. A useful side effect of this design is that schemas can be defined in a number of ways, for example with PyYAML.

>>> import yaml
>>> schema_text = '''
... name:
...   type: string
... age:
...   type: integer
...   min: 10
... '''
>>> schema = yaml.load(schema_text)
>>> document = {'name': 'Little Joe', 'age': 5}
>>> v.validate(document, schema)
False
>>> v.errors
{'age': ['min value is 10']}

You don’t have to use YAML of course, you can use your favorite serializer. json for example. As long as there is a decoder that can produce a nested dict, you can use it to define a schema.

For populating and dumping one of the registries, use extend() and all().