Serialization Format (File Format)

This format is called a serialization format rather than a file format because it is also used for network communication.

This format is currently binary for run-time efficiency (speed and size), although an ASCII or XML version is not impossible.

Note that this format is not really dependent on, or specific to, the fe 'data' system. It is simply an extensible aggregate type serialization scheme that happens to fit very well with the 'data' system.

block types

Fundamentally, this format is a block-oriented stream structure. It is an ordered series of blocks. Each block starts with a single-byte type code.
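
As an illustration, the leading type code of each block could be read and classified as sketched below in Python. The code values are the ones listed in this section; the names BlockCode and read_block_code are hypothetical and not part of the format.

```python
import io
from enum import IntEnum

class BlockCode(IntEnum):
    """Block type codes as listed in this document."""
    END = 0
    RESET = 1
    INFO = 2
    ATTRIBUTE = 3
    LAYOUT = 4
    GROUP = 5
    STATE = 6

def read_block_code(stream):
    """Read the single-byte type code that starts a block.

    Returns None at end of stream; raises ValueError for an unknown code.
    """
    raw = stream.read(1)
    if not raw:
        return None
    return BlockCode(raw[0])

# An 'end' block followed by a 'reset' block: each is only its code byte.
stream = io.BytesIO(bytes([BlockCode.END, BlockCode.RESET]))
codes = []
while (code := read_block_code(stream)) is not None:
    codes.append(code)
```

A full reader would dispatch on the returned code to a per-block parser; only 'end' and 'reset' can be consumed without reading further bytes.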

The following block types are supported:

end (code 0)

End of state block and record group id sequencing. Backwiring should be completable when this block is read. Layout id sequencing is not reset. Therefore, subsequent blocks may be dependent upon layout blocks sent before an end block. This block has no size beyond the code byte.

U8: block code

reset (code 1)

End of layout id sequencing. All sent layout information is considered invalid after this block. This block has no size beyond the code byte.

U8: block code

info (code 2)

Not called a header because it may be repeated and may appear anywhere in a stream (not just at the beginning).

U8: block code
U32: version

attribute (code 3)

U8: block code
String: attribute name
U32: typename count
-- foreach typename count:
    String: typename
U32: type size
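
The field order above could be written and read back as in the following Python sketch. This document does not state the byte order or the encoding of String, so the sketch assumes little-endian integers and a String stored as a U32 byte length followed by UTF-8 data. Both are assumptions, as are all function names and the example attribute values.

```python
import io
import struct

ATTRIBUTE = 3  # block code from this document

def write_u32(out, value):
    out.write(struct.pack("<I", value))  # ASSUMPTION: little-endian

def write_string(out, text):
    data = text.encode("utf-8")          # ASSUMPTION: U32 length + UTF-8 bytes
    write_u32(out, len(data))
    out.write(data)

def read_u32(stream):
    return struct.unpack("<I", stream.read(4))[0]

def read_string(stream):
    length = read_u32(stream)
    return stream.read(length).decode("utf-8")

def write_attribute_block(out, name, typenames, type_size):
    """Emit: code, attribute name, typename count, typenames, type size."""
    out.write(bytes([ATTRIBUTE]))
    write_string(out, name)
    write_u32(out, len(typenames))
    for typename in typenames:
        write_string(out, typename)
    write_u32(out, type_size)

def read_attribute_block(stream):
    """Parse the block body; the code byte must already be consumed."""
    name = read_string(stream)
    count = read_u32(stream)
    typenames = [read_string(stream) for _ in range(count)]
    type_size = read_u32(stream)
    return name, typenames, type_size

# Round trip with made-up values.
buf = io.BytesIO()
write_attribute_block(buf, "position", ["vector3", "Vector3f"], 12)
buf.seek(0)
code = buf.read(1)[0]
decoded = read_attribute_block(buf)
```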

layout (code 4)

U8: block code
String: layout name
U32: attribute count
-- foreach attribute count:
    U32: attribute id
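
A layout block could be produced and parsed along the same lines. The Python sketch below again assumes little-endian integers and a length-prefixed UTF-8 String, neither of which is specified here; the layout name and attribute ids are invented for the example.

```python
import io
import struct

LAYOUT = 4  # block code from this document

def write_layout_block(out, name, attribute_ids):
    """Emit: code, layout name, attribute count, attribute ids."""
    data = name.encode("utf-8")                 # ASSUMPTION: UTF-8 String
    out.write(struct.pack("<B", LAYOUT))
    out.write(struct.pack("<I", len(data)))     # ASSUMPTION: U32 length prefix
    out.write(data)
    out.write(struct.pack("<I", len(attribute_ids)))
    for attribute_id in attribute_ids:
        out.write(struct.pack("<I", attribute_id))

def read_layout_block(stream):
    """Parse a whole layout block, including its code byte."""
    (code,) = struct.unpack("<B", stream.read(1))
    if code != LAYOUT:
        raise ValueError(f"expected layout block, got code {code}")
    (name_length,) = struct.unpack("<I", stream.read(4))
    name = stream.read(name_length).decode("utf-8")
    (count,) = struct.unpack("<I", stream.read(4))
    attribute_ids = list(struct.unpack(f"<{count}I", stream.read(4 * count)))
    return name, attribute_ids

buf = io.BytesIO()
write_layout_block(buf, "particle", [0, 1, 2])
encoded = buf.getvalue()
layout = read_layout_block(io.BytesIO(encoded))
```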

group (code 5)

Heterogeneous grouping of references to state data blocks.

U8: block code
U32: group id
U32: state block count
-- foreach state block count:
    U32: state block id

state (code 6)

Homogeneous sequence of state data blocks.

U8: block code
U32: layout id
U32: state block count
-- foreach state block count:
    -- foreach attribute count (from layout):
        <serialized attribute>
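
The nested loop above can be sketched in Python as follows. This is a simplified reader under stated assumptions: integers are little-endian, every attribute in the layout has a fixed size, and payloads are kept as opaque bytes. The tables layouts and attribute_sizes are hypothetical structures that a reader would have built from earlier layout and attribute blocks.

```python
import io
import struct

def read_state_block(stream, layouts, attribute_sizes):
    """Read the body of a 'state' block (code byte already consumed).

    layouts:         dict of layout id -> list of attribute ids
    attribute_sizes: dict of attribute id -> fixed size in bytes
    Returns (layout_id, records); each record is a list of raw byte
    strings, one per attribute of the layout.
    """
    layout_id, count = struct.unpack("<II", stream.read(8))  # ASSUMPTION: little-endian
    attribute_ids = layouts[layout_id]  # KeyError if the layout has not appeared
    records = []
    for _ in range(count):                      # foreach state block
        record = [stream.read(attribute_sizes[a]) for a in attribute_ids]
        records.append(record)                  # foreach attribute (from layout)
    return layout_id, records

# Two state blocks using a layout with a 4-byte and a 2-byte attribute.
layouts = {0: [0, 1]}
attribute_sizes = {0: 4, 1: 2}
body = struct.pack("<II", 0, 2) + b"AAAAbb" + b"CCCCdd"
layout_id, records = read_state_block(io.BytesIO(body), layouts, attribute_sizes)
```

Variable-sized attributes would need the size handling described under "attribute sizes" instead of a fixed read.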

serialized attributes

attribute sizes

Attribute sizes may be provided so that a reader can skip data it cannot interpret, such as when it lacks the plugins needed to read every type. For fixed-size types the size is simply provided in the layout block. For variable-sized types there are two possible sizes set in the layout: implicit (-1) and explicit (-2). Implicit means that the size is not provided in any 'skippable' way, so a reader that does not know the type cannot skip the data. Explicit means the actual size of a chunk of data is itself encoded as a U32 at the start of the chunk.
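
The three cases could be handled as in the Python sketch below. It assumes that the size value is carried in a 32-bit field reinterpreted as signed (so that -1 and -2 are representable), that integers are little-endian, and that an explicit size does not count its own four bytes. None of these details is stated in this document, and the function names are hypothetical.

```python
import io
import struct

IMPLICIT = -1  # variable size, not skippable
EXPLICIT = -2  # variable size, U32 length at the start of the chunk

def decode_type_size(raw_u32):
    """Reinterpret an unsigned 32-bit field as signed (ASSUMPTION)."""
    return struct.unpack("<i", struct.pack("<I", raw_u32))[0]

def skip_attribute(stream, type_size):
    """Skip one serialized attribute whose type the reader does not know."""
    if type_size >= 0:
        stream.seek(type_size, io.SEEK_CUR)          # fixed size
    elif type_size == EXPLICIT:
        (length,) = struct.unpack("<I", stream.read(4))
        stream.seek(length, io.SEEK_CUR)             # ASSUMPTION: length excludes the prefix
    else:
        raise ValueError("implicit-size attribute of unknown type cannot be skipped")

# A 4-byte fixed attribute, then an explicit 3-byte chunk, then one more byte.
stream = io.BytesIO(b"\x01\x02\x03\x04" + struct.pack("<I", 3) + b"xyz" + b"!")
skip_attribute(stream, 4)
skip_attribute(stream, EXPLICIT)
remaining = stream.read()
```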

There are two special implicit variable sized types: a state block reference (Record) and a heterogeneous state block reference collection (RecordGroup). Since these are references to first class serialization format data types, they are handled directly via ids.


Because this format allows circular references, some references must be connected, or 'wired', only after the stream has been read up to an 'end' block.
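
One way to picture this deferred wiring is the Python sketch below. It is not the actual implementation: the LoadedRecord class, the pending list, and the wire function are hypothetical, and ids are taken to be 1-based in order of appearance.

```python
class LoadedRecord:
    """Hypothetical in-memory stand-in for a state block that was read."""
    def __init__(self, name):
        self.name = name
        self.target = None  # reference to another record, wired later

def wire(records, pending):
    """Resolve (source id, target id) pairs once an 'end' block is read."""
    for source_id, target_id in pending:
        records[source_id].target = records[target_id]
    pending.clear()

# While reading, a reference is only noted by id, since its target may
# not have been read yet. Records 1 and 2 refer to each other.
records = {1: LoadedRecord("a"), 2: LoadedRecord("b")}
pending = [(1, 2), (2, 1)]

# On reaching the 'end' block every id is known, so wiring can complete.
wire(records, pending)
```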

state block ids

Due to the possibly large quantity of state blocks, ids are not explicitly written in the format but are implicit. State block ids are assigned to state blocks in the order in which the state blocks appear in the stream. The third state block is id 3.
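
A reader could track the implicit ids with a simple counter, as in this Python sketch. The class name is hypothetical. Restarting the count at 1 after an 'end' block is an interpretation of the statement that 'end' closes state block id sequencing, not something spelled out here.

```python
class StateIdAllocator:
    """Assigns implicit ids to state blocks in order of appearance."""

    def __init__(self):
        self.next_id = 1  # per the document, the third state block is id 3

    def assign(self, count):
        """Return the ids for the `count` state blocks of one 'state' block."""
        ids = list(range(self.next_id, self.next_id + count))
        self.next_id += count
        return ids

    def on_end_block(self):
        self.next_id = 1  # ASSUMPTION: sequencing restarts after 'end'

allocator = StateIdAllocator()
first = allocator.assign(2)   # a 'state' block holding two state blocks
second = allocator.assign(1)  # the third state block overall
allocator.on_end_block()
after_end = allocator.assign(1)
```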

layout dependencies

Any state block must appear after the matching layout has already appeared.

group dependencies

Groups may only refer to state blocks that have already appeared.
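
The layout and group dependency rules lend themselves to a simple ordering check. The Python sketch below works on a simplified, hypothetical description of a stream (tuples instead of real blocks) and assumes 1-based state block ids, that 'end' restarts state block id sequencing, and that 'reset' invalidates all layouts.

```python
def check_dependencies(blocks):
    """Return a list of ordering violations for a sequence of blocks.

    Each block is a tuple: ("layout", layout_id), ("state", layout_id, count),
    ("group", [state block ids]), ("end",) or ("reset",).
    """
    known_layouts = set()
    state_count = 0  # number of state blocks seen since the last 'end'
    errors = []
    for index, block in enumerate(blocks):
        kind = block[0]
        if kind == "layout":
            known_layouts.add(block[1])
        elif kind == "state":
            _, layout_id, count = block
            if layout_id not in known_layouts:
                errors.append(f"block {index}: state uses unseen layout {layout_id}")
            state_count += count
        elif kind == "group":
            for state_id in block[1]:
                if not 1 <= state_id <= state_count:
                    errors.append(f"block {index}: group refers to unseen state block {state_id}")
        elif kind == "end":
            state_count = 0        # ASSUMPTION: id sequencing restarts
        elif kind == "reset":
            known_layouts.clear()  # sent layout information becomes invalid
    return errors

good = check_dependencies([("layout", 0), ("state", 0, 2), ("group", [1, 2])])
bad = check_dependencies([("state", 0, 1), ("layout", 0), ("group", [1, 2])])
```

In the second stream the state block precedes its layout and the group names a second state block that never appeared, so two violations are reported.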