Core API: Tunneling
- construct.RawCopy(subcon)
Used to obtain the byte representation of a field (alongside the parsed value).
Returns a dict containing the parsed subcon value, the raw bytes that were consumed by subcon, the starting and ending offsets in the stream, and the length in bytes. Builds either from the raw bytes representation or from a value used by subcon. Size is the same as subcon.
Object is a dictionary with either “data” or “value” keys, or both.
When building, if both the “value” and “data” keys are present, then the “data” key is used and the “value” key is ignored. This is undesirable when you parse some data in order to modify it and write it back; in that case, delete the “data” key when modifying the “value” key so that the bytes are rebuilt from the new value.
- Parameters:
subcon – Construct instance
- Raises:
StreamError – stream is not seekable and tellable
RawCopyError – building and neither data nor value was given
StringError – building from non-bytes value, perhaps unicode
Example:
>>> d = RawCopy(Byte)
>>> d.parse(b"\xff")
Container(data=b'\xff', value=255, offset1=0, offset2=1, length=1)
>>> d.build(dict(data=b"\xff"))
b'\xff'
>>> d.build(dict(value=255))
b'\xff'
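The precedence rule described above (“data” wins over “value” when building) can be illustrated with a small library-independent sketch. The helpers `build_byte` and `build_rawcopy` are hypothetical stand-ins written for this illustration; they are not construct's API or its implementation.

```python
def build_byte(value):
    # Hypothetical stand-in for a subcon's build step: one unsigned byte.
    return bytes([value])


def build_rawcopy(obj, build_value=build_byte):
    # "data" takes precedence; "value" is only used when "data" is absent.
    if "data" in obj:
        return obj["data"]
    if "value" in obj:
        return build_value(obj["value"])
    raise ValueError("neither data nor value was given")


parsed = {"data": b"\xff", "value": 255}   # shape of a parsed object
parsed["value"] = 1                        # modify the value only
stale = build_rawcopy(parsed)              # the old "data" is still used
del parsed["data"]                         # drop the stale bytes
fresh = build_rawcopy(parsed)              # now rebuilt from "value"
```

The point of the sketch is the order of the two checks: as long as the stale “data” entry is present, the modified value never reaches the output.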
- construct.ByteSwapped(subcon)
Swaps the byte order within boundaries of given subcon. Requires a fixed sized subcon.
- Parameters:
subcon – Construct instance, subcon on top of byte swapped bytes
- Raises:
SizeofError – ctor or compiler could not compute subcon size
See Transformed and Restreamed for raisable exceptions.
Example:
Int24ul <--> ByteSwapped(Int24ub) <--> BytesInteger(3, swapped=True) <--> ByteSwapped(BytesInteger(3))
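The equivalence above rests on a simple fact: reversing the bytes of a fixed-size field and reading it big-endian gives the same integer as reading the original bytes little-endian. A minimal stdlib check, using made-up sample bytes and no construct calls:

```python
raw = b"\x01\x02\x03"          # three bytes as they appear in the stream

swapped = raw[::-1]            # byte order reversed within the 3-byte field

as_little = int.from_bytes(raw, "little")          # little-endian read
as_big_after_swap = int.from_bytes(swapped, "big")  # big-endian read of swapped bytes
```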
- construct.BitsSwapped(subcon)
Swaps the bit order within each byte within boundaries of given subcon. Does NOT require a fixed sized subcon.
- Parameters:
subcon – Construct instance, subcon on top of bit swapped bytes
- Raises:
SizeofError – compiler could not compute subcon size
See Transformed and Restreamed for raisable exceptions.
Example:
>>> d = Bitwise(Bytes(8))
>>> d.parse(b"\x01")
b'\x00\x00\x00\x00\x00\x00\x00\x01'
>>> BitsSwapped(d).parse(b"\x01")
b'\x01\x00\x00\x00\x00\x00\x00\x00'
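As a rough illustration of what "swapping the bit order within each byte" means, the following stdlib-only sketch reverses the eight bits of every byte independently. The function name `swap_bits` is hypothetical and this is not how construct implements it.

```python
def swap_bits(data):
    # Reverse the 8-bit pattern of each byte; byte positions are unchanged.
    return bytes(int(f"{byte:08b}"[::-1], 2) for byte in data)


once = swap_bits(b"\x01\x0f")   # 00000001 -> 10000000, 00001111 -> 11110000
twice = swap_bits(once)         # swapping twice restores the input
```

Because each byte is handled on its own, the operation works on data of any length, which matches the note that a fixed-sized subcon is not required.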
- construct.Prefixed(lengthfield, subcon, includelength=False)
Prefixes a field with byte count.
Parses the length field. Then reads that amount of bytes, and parses subcon using only those bytes. Constructs that consume the entire remaining stream are constrained to consuming only the specified amount of bytes (a substream). When building, data gets prefixed by its length. Optionally, the length field can include its own size. Size is the sum of both fields' sizes, unless either raises SizeofError.
Analogous to PrefixedArray, which prefixes with an element count instead of a byte count. The semantics are similar but the implementation is different.
VarInt is recommended for new protocols, as it is more compact and never overflows.
- Parameters:
lengthfield – Construct instance, field used for storing the length
subcon – Construct instance, subcon used for storing the value
includelength – optional, bool, whether length field should include its own size, default is False
- Raises:
StreamError – requested reading negative amount, could not read enough bytes, requested writing different amount than actual data, or could not write all bytes
Example:
>>> d = Prefixed(VarInt, GreedyRange(Int32ul))
>>> d.parse(b"\x08abcdefgh")
[1684234849, 1751606885]
>>> d = PrefixedArray(VarInt, Int32ul)
>>> d.parse(b"\x02abcdefgh")
[1684234849, 1751606885]
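The parse and build steps described for Prefixed can be sketched without the library. This is an illustration under simplifying assumptions: the length field is a single unsigned byte (not VarInt), the payload is a run of little-endian 32-bit integers, and the function names are hypothetical.

```python
import struct


def build_prefixed(payload, includelength=False):
    # The prefix counts payload bytes, optionally plus its own 1-byte size.
    length = len(payload) + (1 if includelength else 0)
    return bytes([length]) + payload


def parse_prefixed(data, includelength=False):
    # Read the length, then hand only that many bytes to the inner parser.
    length = data[0] - (1 if includelength else 0)
    body = data[1:1 + length]
    if length < 0 or len(body) != length:
        raise ValueError("could not read enough bytes")
    # The "greedy" inner parser sees only the substream, not what follows it.
    return [value for (value,) in struct.iter_unpack("<I", body)]


values = parse_prefixed(b"\x08abcdefghXXXX")      # trailing XXXX is untouched
with_own_size = build_prefixed(b"abcd", includelength=True)
```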
- construct.PrefixedArray(countfield, subcon)
Prefixes an array with item count (as opposed to prefixed by byte count, see Prefixed).
VarInt is recommended for new protocols, as it is more compact and never overflows.
- Parameters:
countfield – Construct instance, field used for storing the element count
subcon – Construct instance, subcon used for storing each element
- Raises:
StreamError – requested reading negative amount, could not read enough bytes, requested writing different amount than actual data, or could not write all bytes
RangeError – consumed or produced too few elements
Example:
>>> d = Prefixed(VarInt, GreedyRange(Int32ul))
>>> d.parse(b"\x08abcdefgh")
[1684234849, 1751606885]
>>> d = PrefixedArray(VarInt, Int32ul)
>>> d.parse(b"\x02abcdefgh")
[1684234849, 1751606885]
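To make the element-count framing concrete, here is a stdlib-only sketch in which the prefix is the number of items rather than the number of bytes. It assumes a one-byte count field and little-endian 32-bit elements; `build_prefixed_array` and `parse_prefixed_array` are hypothetical names, not construct functions.

```python
import struct


def build_prefixed_array(items):
    # Prefix is the element count, followed by each element in turn.
    return bytes([len(items)]) + struct.pack(f"<{len(items)}I", *items)


def parse_prefixed_array(data):
    # Read the count, then exactly that many 4-byte elements.
    count = data[0]
    return list(struct.unpack_from(f"<{count}I", data, 1))


items = parse_prefixed_array(b"\x02abcdefgh")
rebuilt = build_prefixed_array(items)
```

The same eight payload bytes carry a prefix of 2 here, where a byte-count prefix would be 8.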
- construct.FixedSized(length, subcon)
Restricts parsing to specified amount of bytes.
Parsing reads length bytes, then defers to subcon using new BytesIO with said bytes. Building builds the subcon using new BytesIO, then writes said data and additional null bytes accordingly. Size is same as length, although negative amount raises an error.
- Parameters:
length – integer or context lambda, total amount of bytes (both data and padding)
subcon – Construct instance
- Raises:
StreamError – requested reading negative amount, could not read enough bytes, requested writing different amount than actual data, or could not write all bytes
PaddingError – length is negative
PaddingError – subcon wrote more bytes than the entire length (negative padding)
Can propagate any exception from the lambda, possibly non-ConstructError.
Example:
>>> d = FixedSized(10, Byte)
>>> d.parse(b'\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00')
255
>>> d.build(255)
b'\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> d.sizeof()
10
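The read-then-defer and build-then-pad behaviour can be sketched in plain Python. The inner field is assumed to be a single unsigned byte, and `build_fixed` and `parse_fixed` are hypothetical helpers rather than construct's code.

```python
def build_fixed(length, payload):
    # Write the inner bytes, then null bytes up to the fixed length.
    if length < 0:
        raise ValueError("length is negative")
    if len(payload) > length:
        raise ValueError("inner field wrote more bytes than the entire length")
    return payload + bytes(length - len(payload))


def parse_fixed(length, data):
    # Carve out exactly `length` bytes, then parse only inside that window.
    window = data[:length]
    if len(window) < length:
        raise ValueError("could not read enough bytes")
    return window[0]   # the inner field here is one unsigned byte


built = build_fixed(10, b"\xff")
value = parse_fixed(10, built)
```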
- construct.NullTerminated(subcon, term=b'\x00', include=False, consume=True, require=True)
Restricts parsing to bytes preceding a null byte.
Parsing reads one byte at a time and accumulates it with previous bytes. When the term is found, (by default) it is consumed but discarded. If EOF is reached before the term is found, (by default) StreamError is raised. Then subcon is parsed using a new BytesIO made with said data. Building builds the subcon and then writes the term. Size is undefined.
The term can be multiple bytes, to support string classes with UTF16/32 encodings for example. Be warned however: as reported in Issue 1046, the data read must be a multiple of the term length and the term must start at a unit boundary, otherwise strange things happen when parsing.
- Parameters:
subcon – Construct instance
term – optional, bytes, terminator byte-string, default is b'\x00' (a single null byte)
include – optional, bool, if to include terminator in resulting data, default is False
consume – optional, bool, if to consume terminator or leave it in the stream, default is True
require – optional, bool, if EOF results in failure or not, default is True
- Raises:
StreamError – requested reading negative amount, could not read enough bytes, requested writing different amount than actual data, or could not write all bytes
StreamError – encountered EOF but require is not disabled
PaddingError – terminator is less than 1 byte in length
Example:
>>> d = NullTerminated(Byte)
>>> d.parse(b'\xff\x00')
255
>>> d.build(255)
b'\xff\x00'
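The unit-boundary caveat for multi-byte terminators is easier to see in code. The sketch below assumes the stream is read in chunks the size of the terminator, so the terminator is only recognized when it starts on a chunk boundary. `read_until_term` is a hypothetical helper and only approximates the default behaviour (consume the terminator, fail on EOF).

```python
import io


def read_until_term(stream, term=b"\x00"):
    unit = len(term)
    if unit < 1:
        raise ValueError("terminator is less than 1 byte in length")
    data = b""
    while True:
        chunk = stream.read(unit)          # one terminator-sized unit at a time
        if len(chunk) < unit:
            raise EOFError("EOF reached before the terminator was found")
        if chunk == term:
            return data                    # terminator consumed and discarded
        data += chunk


# "hi" in UTF-16-LE, a two-byte terminator, then unrelated trailing bytes.
stream = io.BytesIO("hi".encode("utf-16-le") + b"\x00\x00" + b"rest")
text_bytes = read_until_term(stream, term=b"\x00\x00")
remaining = stream.read()
```

Note that the bytes `h\x00i\x00` contain single null bytes, yet they do not end the read, because only a whole two-byte unit equal to the terminator counts.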
- construct.NullStripped(subcon, pad=b'\x00')
Restricts parsing to the bytes that precede any trailing padding before EOF.
Parsing reads the entire stream, then strips trailing pad bytes (null bytes by default) from right to left, then parses subcon using a new BytesIO made of said data. Building defers to subcon as-is. Size is undefined, because it reads till EOF.
The pad can be multiple bytes, to support string classes with UTF16/32 encodings.
- Parameters:
subcon – Construct instance
pad – optional, bytes, padding byte-string, default is b'\x00' (a single null byte)
- Raises:
PaddingError – pad is less than 1 byte in length
Example:
>>> d = NullStripped(Byte)
>>> d.parse(b'\xff\x00\x00')
255
>>> d.build(255)
b'\xff'
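Right-to-left stripping with a pad of one or more bytes might look like the following simplified sketch. `strip_padding` is a hypothetical helper; it removes only whole pad units from the end and ignores edge cases (such as a partial unit at the tail) that a real implementation may treat differently.

```python
def strip_padding(data, pad=b"\x00"):
    unit = len(pad)
    if unit < 1:
        raise ValueError("pad is less than 1 byte in length")
    end = len(data)
    # Walk backwards one pad-sized unit at a time while the tail equals pad.
    while end >= unit and data[end - unit:end] == pad:
        end -= unit
    return data[:end]


single = strip_padding(b"\xff\x00\x00")
wide = strip_padding("a".encode("utf-16-le") + b"\x00\x00\x00\x00", pad=b"\x00\x00")
```

With the two-byte pad, the null byte that belongs to the UTF-16 code unit for "a" is kept, because it is not part of a whole pad unit.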
- construct.RestreamData(datafunc, subcon)
Parses a field on external data (but does not build).
Parsing defers to subcon, but provides it a separate BytesIO stream based on data provided by datafunc (a bytes literal, another BytesIO stream, a Construct instance that parses into bytes, or a context lambda). Building does nothing. Size is 0 because, as far as other fields see it, this field does not produce or consume any bytes from the stream.
- Parameters:
datafunc – bytes or BytesIO or Construct instance (that parses into bytes) or context lambda, provides data for subcon to parse from
subcon – Construct instance
Can propagate any exception from the lambdas, possibly non-ConstructError.
Example:
>>> d = RestreamData(b"\x01", Int8ub)
>>> d.parse(b"")
1
>>> d.build(0)
b''
>>> d = RestreamData(NullTerminated(GreedyBytes), Int16ub)
>>> d.parse(b"\x01\x02\x00")
0x0102
>>> d = RestreamData(FixedSized(2, GreedyBytes), Int16ub)
>>> d.parse(b"\x01\x02\x00")
0x0102
- construct.Transformed(subcon, decodefunc, decodeamount, encodefunc, encodeamount)
Transforms bytes between the underlying stream and the (fixed-sized) subcon.
Parsing reads a specified amount (or till EOF), processes the data using a bytes-to-bytes decoding function, then parses subcon using that data. Building builds subcon into separate bytes, processes them using a bytes-to-bytes encoding function, then writes the result into the main stream. Size is reported as decodeamount or encodeamount if those are equal, otherwise SizeofError is raised.
Used internally to implement Bitwise, Bytewise, ByteSwapped, and BitsSwapped.
Possible use-cases include encryption, obfuscation, byte-level encoding.
Warning
Remember that subcon must consume (or produce) an amount of bytes that is same as decodeamount (or encodeamount).
Warning
Do NOT use seeking/telling classes inside Transformed context.
- Parameters:
subcon – Construct instance
decodefunc – bytes-to-bytes function, applied before parsing subcon
decodeamount – integer, amount of bytes to read
encodefunc – bytes-to-bytes function, applied after building subcon
encodeamount – integer, amount of bytes to write
- Raises:
StreamError – requested reading negative amount, could not read enough bytes, requested writing different amount than actual data, or could not write all bytes
StreamError – subcon build and encoder transformed more or less than encodeamount bytes, if amount is specified
StringError – building from non-bytes value, perhaps unicode
Can propagate any exception from the lambdas, possibly non-ConstructError.
Example:
>>> d = Transformed(Bytes(16), bytes2bits, 2, bits2bytes, 2)
>>> d.parse(b"\x00\x00")
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> d = Transformed(GreedyBytes, bytes2bits, None, bits2bytes, None)
>>> d.parse(b"\x00\x00")
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
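The example above reads 2 bytes from the stream while the subcon sees 16, because the decoding function expands every byte into eight one-bit bytes. The stdlib-only functions below, `to_bits` and `from_bits`, are hypothetical stand-ins that mimic this bytes-to-bytes pair for illustration; they are not the library's own bytes2bits and bits2bytes.

```python
def to_bits(data):
    # Each input byte becomes eight output bytes, each b"\x00" or b"\x01".
    return bytes(int(bit) for byte in data for bit in f"{byte:08b}")


def from_bits(bits):
    # Inverse: every group of eight 0/1 bytes is packed back into one byte.
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2)
        for i in range(0, len(bits), 8)
    )


decoded = to_bits(b"\x00\x05")       # what the inner field would see when parsing
encoded = from_bits(decoded)         # what would be written back when building
```

Any pair of functions with this bytes-in, bytes-out shape fits the decodefunc and encodefunc slots, which is why the same mechanism can carry obfuscation or other byte-level encodings.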
- construct.Restreamed(subcon, decoder, decoderunit, encoder, encoderunit, sizecomputer)
Transforms bytes between the underlying stream and the (variable-sized) subcon.
Used internally to implement Bitwise, Bytewise, ByteSwapped, and BitsSwapped.
Warning
Remember that subcon must consume or produce an amount of bytes that is a multiple of encoding or decoding units. For example, in a Bitwise context you should process a multiple of 8 bits or the stream will fail during parsing/building.
Warning
Do NOT use seeking/telling classes inside Restreamed context.
- Parameters:
subcon – Construct instance
decoder – bytes-to-bytes function, used on data chunks when parsing
decoderunit – integer, decoder takes chunks of this size
encoder – bytes-to-bytes function, used on data chunks when building
encoderunit – integer, encoder takes chunks of this size
sizecomputer – function that computes the amount of bytes output
Can propagate any exception from the lambda, possibly non-ConstructError. Can also raise arbitrary exceptions in RestreamedBytesIO implementation.
Example:
Bitwise <--> Restreamed(subcon, bits2bytes, 8, bytes2bits, 1, lambda n: n//8)
Bytewise <--> Restreamed(subcon, bytes2bits, 1, bits2bytes, 8, lambda n: n*8)
- construct.ProcessXor(padfunc, subcon)
Transforms bytes between the underlying stream and the subcon.
Used internally by KaitaiStruct compiler, when translating process: xor tags.
Parsing reads till EOF, xors data with the pad, then feeds that data into subcon. Building first builds the subcon into separate BytesIO stream, xors data with the pad, then writes that data into the main stream. Size is the same as subcon, unless it raises SizeofError.
- Parameters:
padfunc – integer or bytes or context lambda, single or multiple bytes to xor data with
subcon – Construct instance
- Raises:
StringError – pad is not integer or bytes
Can propagate any exception from the lambda, possibly non-ConstructError.
Example:
>>> d = ProcessXor(0xf0 or b'\xf0', Int16ub)
>>> d.parse(b"\x00\xff")
0xf00f
>>> d.sizeof()
2
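The xor step itself is symmetric, so the same operation serves for both parsing and building. A minimal stdlib sketch follows, assuming that a multi-byte pad is repeated cyclically over the data; `xor_with_pad` is a hypothetical helper name.

```python
from itertools import cycle


def xor_with_pad(data, pad):
    # Accept either a single integer or a bytes pad, as the parameter allows.
    if isinstance(pad, int):
        pad = bytes([pad])
    if not isinstance(pad, bytes):
        raise TypeError("pad is not integer or bytes")
    # Repeat the pad as needed and xor byte by byte.
    return bytes(b ^ p for b, p in zip(data, cycle(pad)))


decoded = xor_with_pad(b"\x00\xff", 0xF0)          # parse direction
encoded = xor_with_pad(decoded, b"\xf0")           # build direction, same pad
multi = xor_with_pad(b"\x01\x02\x03", b"\xff\x00")  # pad cycles: ff, 00, ff
```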
- construct.ProcessRotateLeft(amount, group, subcon)
Transforms bytes between the underlying stream and the subcon.
Used internally by KaitaiStruct compiler, when translating process: rol/ror tags.
Parsing reads till EOF, rotates (shifts) the data left by amount in bits, then feeds that data into subcon. Building first builds the subcon into separate BytesIO stream, rotates right by negating amount, then writes that data into the main stream. Size is the same as subcon, unless it raises SizeofError.
- Parameters:
amount – integer or context lambda, shift by this amount in bits, treated modulo (group x 8)
group – integer or context lambda, shifting is applied to chunks of this size in bytes
subcon – Construct instance
- Raises:
RotationError – group is less than 1
RotationError – data length is not a multiple of group size
Can propagate any exception from the lambda, possibly non-ConstructError.
Example:
>>> d = ProcessRotateLeft(4, 1, Int16ub)
>>> d.parse(b'\x0f\xf0')
0xf00f
>>> d = ProcessRotateLeft(4, 2, Int16ub)
>>> d.parse(b'\x0f\xf0')
0xff00
>>> d.sizeof()
2
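A group-wise bit rotation consistent with the two results above can be sketched as follows. It treats each group as one big-endian integer and takes the amount modulo the group width in bits; `rotate_left` is a hypothetical helper, not construct's implementation.

```python
def rotate_left(data, amount, group):
    if group < 1:
        raise ValueError("group is less than 1")
    if len(data) % group:
        raise ValueError("data length is not a multiple of group size")
    width = group * 8
    amount %= width                      # negative amounts rotate right
    mask = (1 << width) - 1
    out = bytearray()
    for i in range(0, len(data), group):
        word = int.from_bytes(data[i:i + group], "big")
        rotated = ((word << amount) | (word >> (width - amount))) & mask
        out += rotated.to_bytes(group, "big")
    return bytes(out)


per_byte = rotate_left(b"\x0f\xf0", 4, 1)   # each byte rotated on its own
per_word = rotate_left(b"\x0f\xf0", 4, 2)   # both bytes rotated as one 16-bit group
restored = rotate_left(rotate_left(b"\x12\x34", 3, 2), -3, 2)
```

Negating the amount undoes the rotation, which is the relationship between the parsing and building directions described above.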
- construct.Checksum(checksumfield, hashfunc, bytesfunc)
Field that is built or validated by a hash of a given byte range. Usually used with RawCopy.
Parsing compares the parsed checksumfield value with a context entry provided by bytesfunc and transformed by hashfunc. Building fetches the context entry, transforms it, then writes it using subcon. Size is the same as subcon.
- Parameters:
checksumfield – a subcon field that reads the checksum, usually Bytes(int)
hashfunc – function that takes bytes and returns whatever checksumfield takes when building, usually from hashlib module
bytesfunc – context lambda that returns bytes (or object) to be hashed, usually like this.rawcopy1.data
- Raises:
ChecksumError – parsing, and the stored checksum does not match the checksum computed from the actual data
Can propagate any exception from the lambdas, possibly non-ConstructError.
Example:
import hashlib
d = Struct(
    "fields" / RawCopy(Struct(
        Padding(1000),
    )),
    "checksum" / Checksum(Bytes(64),
        lambda data: hashlib.sha512(data).digest(),
        this.fields.data),
)
d.build(dict(fields=dict(value={})))

import hashlib
d = Struct(
    "offset" / Tell,
    "checksum" / Padding(64),
    "fields" / RawCopy(Struct(
        Padding(1000),
    )),
    "checksum" / Pointer(this.offset, Checksum(Bytes(64),
        lambda data: hashlib.sha512(data).digest(),
        this.fields.data)),
)
d.build(dict(fields=dict(value={})))
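The first layout above (1000 bytes of fields followed by a 64-byte SHA-512 digest) can be mirrored with hashlib alone to show what is written when building and what is compared when parsing. The helper names and the `DIGEST_SIZE` constant are hypothetical and exist only for this sketch.

```python
import hashlib

DIGEST_SIZE = hashlib.sha512().digest_size   # 64 bytes


def build_with_checksum(payload):
    # Building: hash the raw payload bytes and append the digest.
    return payload + hashlib.sha512(payload).digest()


def checksum_ok(blob):
    # Parsing: recompute the hash over the payload and compare with the stored one.
    payload, stored = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    return hashlib.sha512(payload).digest() == stored


blob = build_with_checksum(bytes(1000))
tampered = b"\x01" + blob[1:]                # flip the first payload byte
```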
- construct.Compressed(subcon, encoding, level=None)
Compresses and decompresses the underlying stream when processing subcon. When parsing, the entire stream is consumed. When building, it puts compressed bytes without marking the end. This construct should be used with Prefixed.
Parsing and building transform all bytes using a specified codec. Since data is processed until EOF, it behaves similarly to GreedyBytes. Size is undefined.
- Parameters:
subcon – Construct instance, subcon used for storing the value
encoding – string, any of module names like zlib/gzip/bzip2/lzma, otherwise any of codecs module bytes<->bytes encodings, each codec usually requires some Python version
level – optional, integer between 0..9, although lzma discards it, some encoders allow different compression levels
- Raises:
ImportError – needed module could not be imported by ctor
StreamError – stream failed when reading until EOF
Example:
>>> d = Prefixed(VarInt, Compressed(GreedyBytes, "zlib"))
>>> d.build(bytes(100))
b'\x0cx\x9cc`\xa0=\x00\x00\x00d\x00\x01'
>>> len(_)
13
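The reason for pairing compression with a length prefix is that the reader otherwise has no way to tell where the compressed bytes stop and the next field begins. A stdlib sketch of that framing follows, assuming a one-byte length prefix (so the compressed payload must stay under 256 bytes) and using hypothetical helper names.

```python
import zlib


def build_frame(payload):
    compressed = zlib.compress(payload)
    if len(compressed) > 255:
        raise ValueError("compressed payload does not fit a one-byte prefix")
    # Prefix the compressed bytes with their length.
    return bytes([len(compressed)]) + compressed


def parse_frame(data):
    length = data[0]
    compressed = data[1:1 + length]
    # Only the prefixed span is decompressed; the rest is left for later fields.
    return zlib.decompress(compressed), data[1 + length:]


stream = build_frame(bytes(100)) + b"next field"
payload, rest = parse_frame(stream)
```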
- construct.CompressedLZ4(subcon)
Compresses and decompresses the underlying stream before processing subcon. When parsing, the entire stream is consumed. When building, it puts compressed bytes without marking the end. This construct should be used with Prefixed.
Parsing and building transform all bytes using the LZ4 library. Since data is processed until EOF, it behaves similarly to GreedyBytes. Size is undefined.
- Parameters:
subcon – Construct instance, subcon used for storing the value
- Raises:
ImportError – needed module could not be imported by ctor
StreamError – stream failed when reading until EOF
Can propagate lz4.frame exceptions.
Example:
>>> d = Prefixed(VarInt, CompressedLZ4(GreedyBytes))
>>> d.build(bytes(100))
b'"\x04"M\x18h@d\x00\x00\x00\x00\x00\x00\x00#\x0b\x00\x00\x00\x1f\x00\x01\x00KP\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> len(_)
35
- construct.EncryptedSym(subcon, cipher)
Perform symmetrical encryption and decryption of the underlying stream before processing subcon. When parsing, entire stream is consumed. When building, it puts encrypted bytes without marking the end.
Parsing and building transforms all bytes using the selected cipher. Since data is processed until EOF, it behaves similar to GreedyBytes. Size is undefined.
The key for encryption and decryption should be passed via contextkw to build and parse methods.
This construct is heavily based on the cryptography library, which supports the following algorithms and modes. For more details please see the documentation of that library.
Algorithms: AES, Camellia, ChaCha20, TripleDES, CAST5, SEED, SM4, Blowfish (weak cipher), ARC4 (weak cipher), IDEA (weak cipher)
Modes: CBC, CTR, OFB, CFB, CFB8, XTS, ECB (insecure)
Note
Keep in mind that some of the algorithms require padding of the data. This can be done e.g. with Aligned.
Note
For GCM mode use EncryptedSymAead.
- Parameters:
subcon – Construct instance, subcon used for storing the value
cipher – Cipher object or context lambda from cryptography.hazmat.primitives.ciphers
- Raises:
ImportError – needed module could not be imported
StreamError – stream failed when reading until EOF
CipherError – no cipher object is provided
CipherError – an AEAD cipher is used
Can propagate cryptography.exceptions exceptions.
Example:
>>> import os
>>> from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
>>> d = Struct(
...     "iv" / Default(Bytes(16), os.urandom(16)),
...     "enc_data" / EncryptedSym(
...         Aligned(16,
...             Struct(
...                 "width" / Int16ul,
...                 "height" / Int16ul,
...             )
...         ),
...         lambda ctx: Cipher(algorithms.AES(ctx._.key), modes.CBC(ctx.iv))
...     )
... )
>>> key128 = b"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
>>> d.build({"enc_data": {"width": 5, "height": 4}}, key=key128)
b"o\x11i\x98~H\xc9\x1c\x17\x83\xf6|U:\x1a\x86+\x00\x89\xf7\x8e\xc3L\x04\t\xca\x8a\xc8\xc2\xfb'\xc8"
>>> d.parse(b"o\x11i\x98~H\xc9\x1c\x17\x83\xf6|U:\x1a\x86+\x00\x89\xf7\x8e\xc3L\x04\t\xca\x8a\xc8\xc2\xfb'\xc8", key=key128)
Container:
    iv = b'o\x11i\x98~H\xc9\x1c\x17\x83\xf6|U:\x1a\x86' (total 16)
    enc_data = Container:
        width = 5
        height = 4
- construct.EncryptedSymAead(subcon, cipher, nonce, associated_data=b'')
Perform symmetrical AEAD encryption and decryption of the underlying stream before processing subcon. When parsing, entire stream is consumed. When building, it puts encrypted bytes and tag without marking the end.
Parsing and building transforms all bytes using the selected cipher and also authenticates the associated_data. Since data is processed until EOF, it behaves similar to GreedyBytes. Size is undefined.
The key for encryption and decryption should be passed via contextkw to build and parse methods.
This construct is heavily based on the cryptography library, which supports the following AEAD ciphers. For more details please see the documentation of that library.
AEAD ciphers: AESGCM, AESCCM, ChaCha20Poly1305
- Parameters:
subcon – Construct instance, subcon used for storing the value
cipher – Cipher object or context lambda from cryptography.hazmat.primitives.ciphers
- Raises:
ImportError – needed module could not be imported
StreamError – stream failed when reading until EOF
CipherError – unsupported cipher object is provided
Can propagate cryptography.exceptions exceptions.
Example:
>>> import os
>>> from cryptography.hazmat.primitives.ciphers import aead
>>> d = Struct(
...     "nonce" / Default(Bytes(16), os.urandom(16)),
...     "associated_data" / Bytes(21),
...     "enc_data" / EncryptedSymAead(
...         GreedyBytes,
...         lambda ctx: aead.AESGCM(ctx._.key),
...         this.nonce,
...         this.associated_data
...     )
... )
>>> key128 = b"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
>>> d.build({"associated_data": b"This is authenticated", "enc_data": b"The secret message"}, key=key128)
b'\xe3\xb0"\xbaQ\x18\xd3|\x14\xb0q\x11\xb5XZ\xeeThis is authenticated\x88~\xe5Vh\x00\x01m\xacn\xad k\x02\x13\xf4\xb4[\xbe\x12$\xa0\x7f\xfb\xbf\x82Ar\xb0\x97C\x0b\xe3\x85'
>>> d.parse(b'\xe3\xb0"\xbaQ\x18\xd3|\x14\xb0q\x11\xb5XZ\xeeThis is authenticated\x88~\xe5Vh\x00\x01m\xacn\xad k\x02\x13\xf4\xb4[\xbe\x12$\xa0\x7f\xfb\xbf\x82Ar\xb0\x97C\x0b\xe3\x85', key=key128)
Container:
    nonce = b'\xe3\xb0"\xbaQ\x18\xd3|\x14\xb0q\x11\xb5XZ\xee' (total 16)
    associated_data = b'This is authenti'... (truncated, total 21)
    enc_data = b'The secret messa'... (truncated, total 18)
- construct.Rebuffered(subcon, tailcutoff=None)
Caches bytes from underlying stream, so it becomes seekable and tellable, and also becomes blocking on reading. Useful for processing non-file streams like pipes, sockets, etc.
Warning
Experimental implementation. May not be mature enough.
- Parameters:
subcon – Construct instance, subcon which will operate on the buffered stream
tailcutoff – optional, integer, amount of bytes kept in buffer, by default buffers everything
Can also raise arbitrary exceptions in its implementation.
Example:
Rebuffered(..., tailcutoff=1024).parse_stream(nonseekable_stream)