Design Overview

The data system is designed to provide a safe, robust run-time aggregate type system with high performance for common use cases.

In a runtime aggregate typed data system, the "type" of an instance is defined by an aggregation of attributes. This manifests both as a form of implicit typing (akin to Python "duck" typing, where a thing is a thing because it has the attributes of that thing: if it walks like a duck and quacks like a duck...) and as an explicit form (where a thing is constructed as a thing with a strict set of attributes).

This invites analogy with other systems, such as Maya, Houdini, and USD. However, one of the closest patterns to compare is the Entity Component System (ECS) pattern.

Distilled from https://en.wikipedia.org/wiki/Entity_component_system:

Entity: The entity is a general purpose object. Usually, it only consists of a unique id.
Component: The raw data for one aspect of the object, and how it interacts with the world.
System: Each System runs continuously and performs global actions on every Entity that possesses a Component of the same aspect as that System.

src/data overlaps with the above definition. src/data does not directly facilitate the "System" part, but rather defers the functionality aspect to src/plugin. However, src/data extends the concept of Entity to a Typed Entity, which adds to the power of the ECS pattern and allows more optimized implementations without losing usability. A good reason to relate src/data to ECS is that much of what has been written about ECS also applies to src/data, including the value of its run-time nature and the tradeoffs involved.

The Basic Summary

Scopes are namespaces, and everything else is within a Scope.
Attributes are named types. Attribute instances are data.
Layouts define a set of Attributes, so they are "aggregate types."
Records contain instances of Attributes according to a Layout.
RecordGroups contain Records.

The terms used in traditional ECS map roughly to these src/data terms.

Entity Type: Layout
Entity: Record
Component: Attribute
System: src/plugin(1)

(1) It's perhaps more accurate to say that System maps to modules created from capabilities provided by src/plugin. That is, src/plugin isn't itself a System, but rather the means with which to make them.

These definitions will be spelled out in more detail below.

Application Considerations

Before diving into detail, it may be worth confirming that this data system is well suited for your application. The data system is designed to provide a safe, robust run-time aggregate type system with high performance for common use cases.

However, lifecycle management of instances is relatively expensive due to the requirement to be safe and robust. Creation and deletion of Records, even whole arrays at a time, can be time consuming. Therefore, it is recommended to avoid using the data system for data that is high in quantity but short in lifetime.

For example, a collisional particle system with perhaps 10,000 particles could easily have 50,000 different collisions per frame. The collisions themselves probably should not be dealt with as Records, at least not without first culling or pooling them. (ProxMultiGrid is an example of pooling.)
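
To make the pooling idea concrete, here is a minimal sketch of a reusable pool for short-lived collision data. It is only an illustration under stated assumptions: Collision and CollisionPool are hypothetical names invented for this example, and the sketch does not describe how ProxMultiGrid is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transient datum; deliberately NOT a src/data Record.
struct Collision
{
    int    particleA = -1;
    int    particleB = -1;
    double depth     = 0.0;
};

// Storage is kept between frames, so per-frame "creation" is an index
// bump instead of an allocation, and "deletion" is a single reset().
class CollisionPool
{
public:
    // Hand out the next free slot, growing storage only when exhausted.
    // The slot may hold stale data from an earlier frame; the caller
    // overwrites it. The reference is only valid until the next acquire().
    Collision& acquire()
    {
        if(m_used == m_slots.size())
        {
            m_slots.emplace_back();
        }
        assert(m_used < m_slots.size());
        return m_slots[m_used++];
    }

    // Mark every slot free again without releasing memory.
    void reset() { m_used = 0; }

    std::size_t used() const     { return m_used; }
    std::size_t capacity() const { return m_slots.size(); }

private:
    std::vector<Collision> m_slots;
    std::size_t            m_used = 0;
};
```

A simulation loop would call reset() at the start of each frame and acquire() once per detected collision, so the allocation cost is paid only while the pool is still growing.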

Provided that this does not affect your application, let's carry on to the basics.

The Basics

Accessors, Attributes, and Types

Scope is the namespace and starting point for src/data structures. In order to use src/data, one usually has to start with a Scope.

sp<Scope> spScope(new Scope());

Attributes are named and typed. The names are strings, and the namespace for the names is a Scope. The types are FE type system types.

New types can be added to the system as first-class types, even from plugins. Further, the types themselves are also named by strings. This means that it is possible for the same type to have multiple names. For example, it is common for I32 to have the names "I32", "integer", and "int".

// Looking up a type by name, although actually doing this is uncommon
sp<BaseType> spIntegerTypeThisPrimerDoesNotNeed =
    spMaster->typeMaster()->lookupName("integer");
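
The idea that one type can carry several string names can be pictured with a small stand-alone registry. This is a toy sketch, not the FE type system: ToyTypeRegistry, assignName, lookup, and sameType are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Toy stand-in for a type registry; NOT the FE type system.
class ToyTypeRegistry
{
public:
    // Register one more name for C++ type T. Several names may share a type.
    template<typename T>
    void assignName(const std::string& name)
    {
        m_byName.erase(name);
        m_byName.emplace(name, std::type_index(typeid(T)));
    }

    bool known(const std::string& name) const
    {
        return m_byName.count(name) != 0;
    }

    // Look up a type by name; the name must have been registered.
    std::type_index lookup(const std::string& name) const
    {
        std::map<std::string, std::type_index>::const_iterator it =
            m_byName.find(name);
        assert(it != m_byName.end() && "unknown type name");
        return it->second;
    }

    // True if both names are registered and refer to the same type.
    bool sameType(const std::string& a, const std::string& b) const
    {
        return known(a) && known(b) && lookup(a) == lookup(b);
    }

private:
    std::map<std::string, std::type_index> m_byName;
};
```

Registering std::int32_t under "I32", "integer", and "int" makes all three names resolve to the same type, which mirrors the aliasing described above.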

Accessors are the preferred way both to define Attributes and to interact with instantiated Records (more on that next).

Accessor<String> hello_accessor(spScope, "hello");

This creates an accessor called hello_accessor, which can access the Attribute "hello", of (C++) type String, within the scope spScope. This accessor allows read/write access to the Attribute "hello" on all Records within the given Scope. The syntax to perform such access is as follows.

hello_accessor(myRecord) = "world";

Given a Record pointed to by myRecord, we can use hello_accessor to write "world" as the value of the Attribute "hello" within myRecord (for simplicity, we have assumed that myRecord contains this attribute).
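
For readers who want a mental model of what an accessor does, the following stand-alone sketch binds an attribute name to a C++ type and hands back a typed reference into a record. It is a simplified illustration only: ToySlot, ToyRecord, ToyAccessor, and addTo are hypothetical and say nothing about how src/data stores its data internally.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// One stored attribute value together with its C++ type. Toy only.
struct ToySlot
{
    std::type_index       type;
    std::shared_ptr<void> data;
};

// Toy record: attribute name -> typed slot. NOT the src/data Record class.
struct ToyRecord
{
    std::map<std::string, ToySlot> slots;
};

// Toy accessor: binds an attribute name to a C++ type T.
template<typename T>
class ToyAccessor
{
public:
    explicit ToyAccessor(const std::string& name) : m_name(name) {}

    // Give the record a default-constructed attribute of type T
    // (hypothetical helper standing in for a Layout).
    void addTo(ToyRecord& record) const
    {
        record.slots.insert(std::make_pair(
            m_name,
            ToySlot{std::type_index(typeid(T)), std::make_shared<T>()}));
    }

    // True if the record has this attribute with the expected type.
    bool check(const ToyRecord& record) const
    {
        std::map<std::string, ToySlot>::const_iterator it =
            record.slots.find(m_name);
        return it != record.slots.end() &&
               it->second.type == std::type_index(typeid(T));
    }

    // Read/write access; the attribute must be present.
    T& operator()(ToyRecord& record) const
    {
        assert(check(record) && "record lacks this attribute");
        return *static_cast<T*>(record.slots.find(m_name)->second.data.get());
    }

private:
    std::string m_name;
};
```

With this toy, ToyAccessor<std::string> hello("hello"); hello(record) = "world"; reads much like the real call above, and the type check guards against accessing "hello" as a different type.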

Layouts: The Types of src/data

A Layout is the "type" of this aggregate type system. It is made up of an aggregation of Attributes.

By contrast, a C++ type, such as unsigned int or class MyClass, is a "type" that is supported by the compiler and type-checked at compile time.

A Layout, on the other hand, is an aggregate of named "C++ types" that is assembled at runtime. An actual Layout may have Attributes added to it at runtime by any number of chosen plugins, allowing for runtime flexibility of aggregate structures, according to the needs of each plugin.

An important aspect is that a Layout does not necessarily contain only the attributes that a given plugin (or System, presumably) expects. This is consistent with the duck-typed nature of the system. Thus, whatever puts attributes into a Layout can be assured that any Record of that Layout will have those attributes, but it cannot make other assumptions about what the Layout may or may not have.

sp<Layout> spLayout = spScope->declare("layout_name");
// Layouts are named, and may be found by name from their Scope
// The following line is redundant, merely to show the syntax used for lookup
spLayout = spScope->lookupLayout("layout_name");
// Add an attribute to a Layout
spLayout->populate(hello_accessor);

A Layout is mutable until a record of that Layout is instantiated, at which point the Layout is locked and cannot be further adjusted (in terms of attributes).
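
The locking rule can be sketched with a toy layout that refuses new attributes once a record has been created from it. This is an assumption-laden illustration: ToyLayout and ToyRecord are hypothetical, and returning false from populate() is just one way to signal the refusal; how the real Layout reports this is not specified here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy record with real-valued attributes only. NOT the src/data Record.
struct ToyRecord
{
    std::map<std::string, double> values;
};

// Toy layout: a set of attribute names that locks on first instantiation.
class ToyLayout
{
public:
    // Add an attribute name; refused (returns false) once locked.
    bool populate(const std::string& attribute)
    {
        if(m_locked)
        {
            return false;
        }
        m_attributes.insert(attribute);
        return true;
    }

    // Creating a record locks the layout, then fills in every attribute.
    ToyRecord createRecord()
    {
        m_locked = true;
        ToyRecord record;
        for(const std::string& name : m_attributes)
        {
            record.values[name] = 0.0;
        }
        assert(record.values.size() == m_attributes.size());
        return record;
    }

    bool locked() const { return m_locked; }

private:
    std::set<std::string> m_attributes;
    bool                  m_locked = false;
};
```

Because the attribute set is frozen at first instantiation, every record of a given layout is guaranteed to have the same attributes.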

Create a Record

A Record is an instance of an aggregate type as defined by a Layout.

Technically, the C++ class Record is a reference counted reference to a Record instance.

Records are not named at runtime. However, for referencing purposes in the file format, you may see them named in ASCII files.

Record newRecord = spScope->createRecord(spLayout);

If no record instances of spLayout existed prior to the above line's execution, spLayout would now become locked, preventing alteration of the aggregate type.

Create a RecordGroup and add a Record

A RecordGroup is a collection of Records. Because Records are really references to instances, there is no problem with having Records within more than one RecordGroup. We can think of a RecordGroup as a dataset or collection of instances. Records within a RecordGroup need not have the same Layout as one another.

sp<RecordGroup> spRG(new RecordGroup());
// Add a record to a group
spRG->add(myRecord);
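
The point that a Record is a reference to an instance, and can therefore sit in several groups at once, can be illustrated with std::shared_ptr. This is only an analogy under stated assumptions: ToyInstance, ToyRecord, ToyGroup, and makeRecord are hypothetical names, and the real reference counting in src/data may work differently.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// The actual data lives in an instance. Toy only.
struct ToyInstance
{
    std::map<std::string, std::string> values;
};

// A toy "record" is a counted reference to an instance,
// and a toy "group" is simply a collection of such references.
typedef std::shared_ptr<ToyInstance> ToyRecord;
typedef std::vector<ToyRecord>       ToyGroup;

inline ToyRecord makeRecord()
{
    ToyRecord record = std::make_shared<ToyInstance>();
    assert(record.use_count() == 1);
    return record;
}
```

Adding the same ToyRecord to two ToyGroups copies only the reference, so a write made through one group is visible through the other, and the instance is released when the last reference goes away.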

This example is not about reading or writing of src/data files. However, we will cover it briefly. One can print the contents of a RecordGroup, including the Attributes and Layouts present in the Records within, using this function.

print_group(spRG);

At this point, the output would look something like what follows:

INFO 5
ATTRIBUTE hello string
LAYOUT layout_name
        hello
DEFAULTGROUP 1
RECORD layout_name1 layout_name
        hello "world"
RECORDGROUP 1
END

Additionally, one could load the above into a RecordGroup from an ASCII file, producing the same RecordGroup that we made programmatically. The following code would achieve this.

sp<data::StreamI> spStreamDataset;
std::ifstream strm("hello.rg");
spStreamDataset = new data::AsciiStream(spScope);
sp<RecordGroup> rg_file = spStreamDataset->input(strm);
strm.close();

Accessing src/data data

Safe use of Accessors

To reiterate, an Accessor is used to read and write a particular Attribute. A Layout is an aggregate type, consisting of Attributes. Records are instances of Layouts. Let's set up our example.

Accessor<String> hello_accessor(spScope, "hello");
sp<Layout> spLayout = spScope->declare("layout_for_accessor_example");
spLayout->populate(hello_accessor);
Record record = spScope->createRecord(spLayout);
hello_accessor(record) = "WORLD";
String string_value = hello_accessor(record);

In a heterogeneous data environment, it is useful to check whether a Record or Layout has a particular Attribute before we attempt to use an Accessor.

// Does the record have it?
bool record_has_hello = hello_accessor.check(record);
// Does the layout have it?
bool layout_has_hello = hello_accessor.check(spLayout);
// queryAttribute() can be used to combine checking and accessing
// NULL is returned if the Attribute wasn't there
String *pointer_to_data = hello_accessor.queryAttribute(record);
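
The check-then-access idiom behind a pointer-returning query can be shown in isolation. The sketch below is a toy under stated assumptions: ToyRecord, queryReal, and scaleIfPresent are hypothetical and only mimic the "null if absent" contract described in the comments above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy record holding only real-valued attributes; NOT the src/data Record.
typedef std::map<std::string, double> ToyRecord;

// Combined check-and-access: pointer to the value, or nullptr if absent.
inline double* queryReal(ToyRecord& record, const std::string& name)
{
    ToyRecord::iterator it = record.find(name);
    return it == record.end() ? nullptr : &it->second;
}

// Typical use: one lookup serves as both the test and the access.
inline bool scaleIfPresent(ToyRecord& record, const std::string& name,
                           double factor)
{
    if(double* value = queryReal(record, name))
    {
        *value *= factor;
        return true;
    }
    return false;
}
```

Compared with calling a separate check followed by an access, the pointer form performs a single lookup and makes the "attribute missing" case explicit at the call site.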

Using AccessorSets

As the name implies, an AccessorSet is a set of Accessors. AccessorSets can be used as a way to define an implicit or "duck" type. The most common way to define them is by inheriting from the base AccessorSet class, and defining the Attributes in the new set. It is customary to prepend "As" to the class name, indicating that it is an AccessorSet. For a simple example, we will consider a closed 2D shape.

class FE_DL_EXPORT AsShape:
public AccessorSet,
public Initialize<AsShape>
{
public:
    void initialize(void)
    {
        add(perimeter,  FE_USE("perimeter"));
        add(area,       FE_USE("area"));
    }
    Accessor<Real>      perimeter;
    Accessor<Real>      area;
};

Let's create a heterogeneous dataset with Records of three different Layouts: circle, square, and line segment.

// First, we define our Attributes and create the Accessors.
Accessor<Real> perimeterAccessor(spScope, "perimeter");
Accessor<Real> areaAccessor(spScope, "area");
Accessor<Real> radiusAccessor(spScope, "radius");
Accessor<Real> sideLengthAccessor(spScope, "sideLength");
Accessor<Real> lengthAccessor(spScope, "length");
// Next, we define our Layouts
sp<Layout> spCircleLayout = spScope->declare("circle");
spCircleLayout->populate(perimeterAccessor);
spCircleLayout->populate(areaAccessor);
spCircleLayout->populate(radiusAccessor);
sp<Layout> spSquareLayout = spScope->declare("square");
spSquareLayout->populate(perimeterAccessor);
spSquareLayout->populate(areaAccessor);
spSquareLayout->populate(sideLengthAccessor);
sp<Layout> spLineLayout = spScope->declare("line");
spLineLayout->populate(lengthAccessor);
// Then, we create a RecordGroup, and one or more Records
// corresponding to each Layout. We will stick with one each.
Record r_circle = spScope->createRecord(spCircleLayout);
radiusAccessor(r_circle) = 3;
perimeterAccessor(r_circle) = 2 * 3.14 * 3;
areaAccessor(r_circle) = 3.14 * 3 * 3;
Record r_square = spScope->createRecord(spSquareLayout);
sideLengthAccessor(r_square) = 3;
perimeterAccessor(r_square) = 3 * 4;
areaAccessor(r_square) = 3 * 3;
Record r_line = spScope->createRecord(spLineLayout);
lengthAccessor(r_line) = 3;
sp<RecordGroup> spRG(new RecordGroup());
spRG->add(r_circle);
spRG->add(r_square);
spRG->add(r_line);
// Finally, we use AsShape to filter the dataset
// This call will return all Records containing the Attributes in
// the AccessorSet AsShape, i.e., perimeter and area
AsShape asShape;
RecordGroup shapeRecords;
asShape.filter(shapeRecords, spRG);

At this point, shapeRecords will contain two Records: those referred to by r_circle and r_square. The Record referred to by r_line will be filtered out, because it does not match the implied type of AsShape.
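
The filtering rule itself, "keep every record that has all of the attributes in the set", can be sketched without any library support. This is a toy under stated assumptions: ToyRecord, matchesAll, and filterByAttributes are hypothetical names and do not reflect how AccessorSet::filter() is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy record: attribute name -> real value. NOT the src/data Record.
typedef std::map<std::string, double> ToyRecord;

// True if the record carries every attribute named in the required set.
inline bool matchesAll(const ToyRecord& record,
                       const std::vector<std::string>& required)
{
    for(const std::string& name : required)
    {
        if(record.find(name) == record.end())
        {
            return false;
        }
    }
    return true;
}

// Return the indices of all records in the group that match the set.
inline std::vector<std::size_t> filterByAttributes(
    const std::vector<ToyRecord>& group,
    const std::vector<std::string>& required)
{
    std::vector<std::size_t> hits;
    for(std::size_t i = 0; i < group.size(); ++i)
    {
        if(matchesAll(group[i], required))
        {
            hits.push_back(i);
        }
    }
    assert(hits.size() <= group.size());
    return hits;
}
```

Note that extra attributes such as radius or sideLength do not prevent a match; only missing ones do, which is exactly the duck-typing behavior described earlier.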

It is worth pointing out that, in addition to being able to add Attributes to a Layout one at a time, we can also populate a Layout with all the Attributes found in an AccessorSet (or many AccessorSets).

sp<Layout> spLayout = spScope->declare("layout");

asShape.populate(spLayout);

Access Patterns

In the code sample xDataPrimer, you can find a simple use case to compare 4 different patterns for accessing src/data data. The example is a 1D simple particle simulation. The Attributes for Records created above were chosen with this case in mind, with each particle having mass, velocity, force, and location.

Four patterns are covered and summarized very briefly here. The speed tests were run on a Linux laptop on 2020-05-29, using an optimized build.

Pattern        Small dataset   Large dataset   Comment
Basic          1.13            14.7            Easy to use
RecordArray    1.06            4.3             General purpose, most common method
Layout         0.53            2.1             Requires FE_AV_FASTITER_ENABLE
Compile        0.10            1.2             Requires static topology

Since the examples are spelled out in detail in the code sample, we will reproduce only the high-level summary here. Refer to xDataPrimer for more information.

Basic

This is the easiest method, and is essentially what was done above.

Pattern:

AccessorSet::filter() to get matching Records in RecordGroup
iterate through std::vector of Records

RecordArray

This method is usually faster than the Basic pattern, and allows for more flexibility and control while filtering, but is more involved, and can get downright ugly when nesting.

While RecordArrays are not part of what you see in src/data files, the src/data runtime automatically organizes the Records within RecordGroups into RecordArrays. There is a RecordArray for each Layout that has Records in a RecordGroup.

Pattern:

iterate through RecordArrays in a RecordGroup
iterate through Records in a RecordArray for matching Layout types
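
The benefit of this pattern is that the layout test happens once per array rather than once per record. The sketch below illustrates that with toy containers; ToyRecordArray, ToyRecordGroup, and sumAttribute are hypothetical names, and the optional layoutChecks counter exists only to make the number of checks visible.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// All records of one layout, stored together. Toy only.
struct ToyRecordArray
{
    std::set<std::string>                      layoutAttributes;
    std::vector<std::map<std::string, double>> records;
};

// Toy group: one array per layout name. NOT the src/data RecordGroup.
typedef std::map<std::string, ToyRecordArray> ToyRecordGroup;

// Sum one attribute over every record whose layout contains it.
inline double sumAttribute(const ToyRecordGroup& group,
                           const std::string& name,
                           int* layoutChecks = nullptr)
{
    double total = 0.0;
    for(const auto& entry : group)
    {
        const ToyRecordArray& array = entry.second;
        if(layoutChecks)
        {
            ++*layoutChecks;    // one check per array, not per record
        }
        if(array.layoutAttributes.count(name) == 0)
        {
            continue;           // skip the whole array at once
        }
        for(const auto& record : array.records)
        {
            assert(record.count(name) == 1);
            total += record.at(name);
        }
    }
    return total;
}
```

With many records per layout, skipping or accepting a whole array at a time removes most of the per-record matching work that the Basic pattern performs.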

Layout

This pattern only works when FE_AV_FASTITER_ENABLE is enabled at compile time.

The idea of this method is to iterate through the data as directly and as cache-friendly as possible. To do this, not only do we need access to the underlying arrays, but those arrays must be managed so that one can iterate without overhead such as checking for holes. However, FE_AV_FASTITER_ENABLE, which does this, also makes some operations unavailable (for now), in particular adding new Attributes to already instantiated Records.

This pattern also does not start with a RecordGroup, but is rather full-scope oriented, iterating through Layouts.

Pattern:

iterate through Layouts, finding matching ones
access data arrays directly
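
A rough picture of "access data arrays directly" is a struct-of-arrays store with one dense array per attribute. The sketch below is a toy under stated assumptions: ToyLayoutStorage and advanceAll are hypothetical, the attribute names follow the particle example, and nothing here describes the real memory layout behind FE_AV_FASTITER_ENABLE.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy per-layout storage: one dense, hole-free array per attribute.
struct ToyLayoutStorage
{
    std::map<std::string, std::vector<double> > columns;
};

// Advance location by velocity * dt for every matching layout.
// Returns how many layouts matched.
inline std::size_t advanceAll(std::vector<ToyLayoutStorage>& layouts,
                              double dt)
{
    std::size_t matched = 0;
    for(ToyLayoutStorage& layout : layouts)
    {
        auto loc = layout.columns.find("location");
        auto vel = layout.columns.find("velocity");
        if(loc == layout.columns.end() || vel == layout.columns.end())
        {
            continue;   // layout does not have the needed attributes
        }
        assert(loc->second.size() == vel->second.size());

        // Raw contiguous arrays: the inner loop does no lookups.
        double*           x = loc->second.data();
        const double*     v = vel->second.data();
        const std::size_t n = loc->second.size();
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += v[i] * dt;
        }
        ++matched;
    }
    return matched;
}
```

The inner loop touches memory sequentially and performs no per-record checks, which is the property that makes this style of iteration cache friendly.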

Compile

The idea of this pattern is to pay as much of the processing overhead as possible just once, so that repeated access runs faster. The tradeoff is that the dataset topology itself cannot change, so this pattern only applies to static topologies.

Pattern:

iterate through data "compiling" a faster access structure
use fast structure
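
As a rough sketch of the two steps above, the toy below resolves attribute lookups once into a flat list of pointers and then reuses that list on every step. ToyRecord, CompiledEntry, compileParticles, and step are hypothetical names for illustration; see xDataPrimer for the actual implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy record: attribute name -> real value. NOT the src/data Record.
typedef std::map<std::string, double> ToyRecord;

// Pre-resolved pointers for one matching record.
struct CompiledEntry
{
    double*       location;
    const double* velocity;
};

// "Compile": do all name lookups once and keep only raw pointers.
// The pointers stay valid only while the records are neither added,
// removed, nor restructured, hence the static-topology requirement.
inline std::vector<CompiledEntry> compileParticles(
    std::vector<ToyRecord>& records)
{
    std::vector<CompiledEntry> compiled;
    for(ToyRecord& record : records)
    {
        ToyRecord::iterator loc = record.find("location");
        ToyRecord::iterator vel = record.find("velocity");
        if(loc != record.end() && vel != record.end())
        {
            compiled.push_back(CompiledEntry{&loc->second, &vel->second});
        }
    }
    assert(compiled.size() <= records.size());
    return compiled;
}

// "Use": each step is a tight loop with no lookups or matching.
inline void step(const std::vector<CompiledEntry>& compiled, double dt)
{
    for(const CompiledEntry& entry : compiled)
    {
        *entry.location += *entry.velocity * dt;
    }
}
```

The compile cost is paid once, so the more steps are run against the same topology, the better this pattern amortizes.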