Free Electron
The data system is designed to provide a safe, robust run-time aggregate type system with high performance for common use cases.
In a run-time aggregate typed data system, the "type" of an instance is defined by an aggregation of attributes. This manifests both as a form of implicit typing (akin to Python "duck" typing, where a thing is a thing because it has the attributes of that thing: if it walks like a duck and quacks like a duck...) and as an explicit form (where a thing is constructed as a thing with a strict set of attributes).
This invites comparison with other systems, such as Maya, Houdini, and USD. However, one of the closest patterns to compare is the Entity Component System (ECS) pattern.
Distilled from https://en.wikipedia.org/wiki/Entity_component_system:
src/data overlaps with the above definition. src/data does not directly facilitate the "System" part, but rather defers the functionality aspect to src/plugin. However, src/data extends the concept of Entity to a Typed Entity, which adds to the power of the ECS pattern and allows more optimized implementations without losing usability. A good reason to relate src/data to ECS is that much of what has been written about ECS also applies to src/data, including the value of its run-time nature and the tradeoffs involved.
The Basic Summary
The terms used in traditional ECS map roughly to these src/data terms.
(1) It's perhaps more accurate to say that System maps to modules created from capabilities provided by src/plugin. That is, src/plugin isn't itself a System, but rather the means with which to make them.
These definitions will be spelled out in more detail below.
Before diving into detail, it may be worth confirming that this data system is well suited for your application. The data system is designed to provide a safe, robust run-time aggregate type system with high performance for common use cases.
However, lifecycle management of instances is relatively expensive due to the requirement to be safe and robust. Creation and deletion of Records, even arrays at a time, can be time-consuming. Therefore, it is recommended to avoid using the data system for data that is of high quantity but short lifetime.
For example, a collisional particle system with perhaps 10,000 particles could easily have 50,000 different collisions per frame. The collisions themselves probably should not be handled as Records, at least not without first culling or pooling (ProxMultiGrid is an example of pooling).
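As a rough illustration of the pooling idea, the sketch below keeps short-lived collision data in plain structs that are reused every frame instead of being created and destroyed. It is not ProxMultiGrid and uses no FE API; `Collision` and `CollisionPool` are hypothetical names invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain struct for one collision; not an FE Record.
struct Collision
{
    std::size_t a = 0;   // index of the first particle
    std::size_t b = 0;   // index of the second particle
    double depth = 0.0;  // penetration depth
};

// Hypothetical pool: storage grows to the high-water mark and is then reused.
class CollisionPool
{
public:
    // Hand out the next slot, growing only when the pool is exhausted.
    // The returned reference may be invalidated by a later acquire().
    Collision &acquire()
    {
        if (m_used == m_slots.size())
        {
            m_slots.emplace_back();
        }
        return m_slots[m_used++];
    }

    // Start a new frame: mark every slot free without releasing memory.
    void reset() { m_used = 0; }

    std::size_t used() const { return m_used; }
    std::size_t capacity() const { return m_slots.size(); }

private:
    std::vector<Collision> m_slots;
    std::size_t m_used = 0;
};
```

Calling `reset()` once per frame means that, after the first few frames, no allocation happens in steady state.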
Provided that this does not affect your application, let's carry on to the basics.
Scope is the namespace and starting point for src/data structures. In order to use src/data, one usually has to start with a Scope.
Attributes are named and typed. The names are strings, and the namespace for the names is a Scope. The types are FE type system types.
New types can be added to the system as first-class types, even from plugins. Furthermore, the types themselves are also named by strings. This means that it is possible for the same type to have multiple names. For example, it is common for I32 to have the names "I32", "integer", and "int".
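The idea of one type carrying several string names can be sketched with a small name-to-type table. This is only a toy model, not the FE type system; `TypeRegistry`, `addName`, and `sameType` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical registry mapping string names to C++ types.
class TypeRegistry
{
public:
    // Register one more name for type T. Several names may share a type.
    template <typename T>
    void addName(const std::string &name)
    {
        m_names.erase(name);
        m_names.emplace(name, std::type_index(typeid(T)));
    }

    // True if both names are registered and refer to the same type.
    bool sameType(const std::string &lhs, const std::string &rhs) const
    {
        auto l = m_names.find(lhs);
        auto r = m_names.find(rhs);
        return l != m_names.end() && r != m_names.end() &&
               l->second == r->second;
    }

private:
    std::map<std::string, std::type_index> m_names;
};
```

Registering `std::int32_t` under "I32", "integer", and "int" makes `sameType("I32", "int")` return true, mirroring the aliasing described above.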
Accessors are the preferred way both to define Attributes and to interact with instantiated Records (more on that next).
This creates an accessor called hello_accessor, which can access the Attribute "hello", of (C++) type String, within the scope spScope. This accessor allows read/write access to the Attribute "hello" on all Records within the given Scope. The syntax to perform such access is as follows.
Given a Record pointed to by myRecord, we can use hello_accessor to write "world" as the value of the Attribute "hello" within myRecord (for simplicity, we have assumed that myRecord contains this attribute).
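To make the accessor concept concrete without relying on FE headers, here is a self-contained toy model. `ToyRecord` and `ToyAccessor` are hypothetical stand-ins, and the `attach`, `check`, and call-operator members are assumptions made for illustration rather than the FE interface.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for a Record: attribute name -> type-erased value.
struct ToyRecord
{
    std::map<std::string, std::shared_ptr<void>> values;
};

// Toy stand-in for an Accessor bound to one attribute name and C++ type T.
// It assumes the attribute, when present, was stored as a T.
template <typename T>
class ToyAccessor
{
public:
    explicit ToyAccessor(std::string name) : m_name(std::move(name)) {}

    // Add the attribute to a record with a default-constructed value.
    void attach(ToyRecord &record) const
    {
        record.values[m_name] = std::make_shared<T>();
    }

    // True if the record carries this attribute.
    bool check(const ToyRecord &record) const
    {
        return record.values.count(m_name) != 0;
    }

    // Read/write access; the caller must ensure check(record) is true.
    T &operator()(ToyRecord &record) const
    {
        return *std::static_pointer_cast<T>(record.values.at(m_name));
    }

private:
    std::string m_name;
};
```

With a `ToyAccessor<std::string> hello_accessor("hello")` attached to `myRecord`, the statement `hello_accessor(myRecord) = "world";` writes the value, and the same expression reads it back.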
A Layout is the "type" in this aggregate type system. It is made up of an aggregation of Attributes.
By contrast, a C++ type, such as unsigned int or class MyClass, is a compiler-supported "type" that is checked at compile time.
A Layout, on the other hand, is an aggregate of named "C++ types" that is assembled at runtime. An actual Layout may have Attributes added to it at runtime by any number of chosen plugins, allowing for runtime flexibility of aggregate structures, according to the needs of each plugin.
An important aspect is that a Layout does not necessarily contain only the attributes that a given plugin (or System, presumably) expects. This is consistent with the duck-typed nature of the system. Thus, whatever puts attributes into a Layout can be assured that any Record of that Layout will have those attributes, but it cannot make other assumptions about what the Layout may or may not have.
A Layout is mutable until a record of that Layout is instantiated, at which point the Layout is locked and cannot be further adjusted (in terms of attributes).
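The locking rule can be sketched as follows. `ToyLayout` is a hypothetical class, and returning `false` from a rejected `addAttribute` is an assumption of this sketch; the text does not say how FE reports an attempt to change a locked Layout.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Toy stand-in for a Layout: a set of attribute names plus a lock flag.
class ToyLayout
{
public:
    // Returns false, leaving the layout unchanged, once it is locked.
    bool addAttribute(const std::string &name)
    {
        if (m_locked)
        {
            return false;
        }
        m_attributes.insert(name);
        return true;
    }

    // Called when a record is instantiated: freezes the attribute set
    // and returns how many instances have been created so far.
    std::size_t instantiate()
    {
        m_locked = true;
        return ++m_instances;
    }

    bool locked() const { return m_locked; }

    bool has(const std::string &name) const
    {
        return m_attributes.count(name) != 0;
    }

private:
    std::set<std::string> m_attributes;
    std::size_t m_instances = 0;
    bool m_locked = false;
};
```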
A Record is an instance of an aggregate type as defined by a Layout.
Technically, the C++ class Record is a reference counted reference to a Record instance.
Records are not named at run time. However, for referencing purposes in the file format, you may see them named in ASCII files.
If no record instances of spLayout existed prior to the above line's execution, spLayout would now become locked, preventing alteration of the aggregate type.
A RecordGroup is a collection of Records. Because Records are really references to instances, there is no problem with having Records within more than one RecordGroup. We can think of a RecordGroup as a dataset or collection of instances. Records within a RecordGroup need not have the same Layout as one another.
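Because a Record behaves like a reference, membership in several groups can be pictured with shared handles. The sketch below uses `std::shared_ptr` as the reference-counted handle; `ToyInstance`, `ToyRecordRef`, `ToyRecordGroup`, and `countLayout` are hypothetical names, and the layout is reduced to a plain string.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy record instance; the layout is just a name here.
struct ToyInstance
{
    std::string layout;
    double value = 0.0;
};

// A "record" is a reference-counted handle to an instance.
using ToyRecordRef = std::shared_ptr<ToyInstance>;

// A group is a collection of handles; layouts may differ between members.
using ToyRecordGroup = std::vector<ToyRecordRef>;

// Count the members of a group that use the given layout name.
std::size_t countLayout(const ToyRecordGroup &group, const std::string &layout)
{
    std::size_t n = 0;
    for (const ToyRecordRef &record : group)
    {
        if (record->layout == layout)
        {
            ++n;
        }
    }
    return n;
}
```

Pushing the same handle into two groups leaves a single instance, so a write through one group is visible through the other.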
This example is not about reading or writing of src/data files. However, we will cover it briefly. One can print the contents of a RecordGroup, including the Attributes and Layouts present in the Records within, using this function.
At this point, the output would look something like what follows:
Additionally, one could load the above into a RecordGroup from an ASCII file, producing the same RecordGroup that we made programmatically. The following command would achieve this.
To reiterate, an Accessor is used to read and write a particular Attribute. A Layout is an aggregate type, consisting of Attributes. Records are instances of Layouts. Let's set up our example.
In a heterogeneous data environment, it is useful to check whether a Record or Layout has a particular Attribute before we attempt to use an Accessor.
As the name implies, an AccessorSet is a set of Accessors. AccessorSets can be used as a way to define an implicit or "duck" type. The most common way to define them is by inheriting from the base AccessorSet class, and defining the Attributes in the new set. It is customary to prepend "As" to the class name, indicating that it is an AccessorSet. For a simple example, we will consider a closed 2D shape.
Let's create a heterogeneous dataset with Records of three different Layouts: circle, square, and line segment.
At this point, shapeRecords will contain two Records: those referred to by r_circle and r_square. The Record referred to by r_line will be filtered out, because it does not match the implied type of AsShape.
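The filtering behavior can be mimicked with a toy model in which an accessor set is just a list of required attribute names. Nothing here is FE API: `ToyRecord`, `ToyAccessorSet`, and `filter` are hypothetical, and the attribute names used with them ("area", "perimeter", and so on) are placeholders chosen for this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy record: attribute name -> value.
using ToyRecord = std::map<std::string, double>;

// Toy accessor set: the attribute names that define an implicit type.
struct ToyAccessorSet
{
    std::vector<std::string> required;

    // A record matches if it carries every required attribute.
    // Extra attributes are allowed, as in duck typing.
    bool matches(const ToyRecord &record) const
    {
        for (const std::string &name : required)
        {
            if (record.count(name) == 0)
            {
                return false;
            }
        }
        return true;
    }
};

// Keep only the records that match the implied type.
std::vector<ToyRecord> filter(const ToyAccessorSet &set,
                              const std::vector<ToyRecord> &records)
{
    std::vector<ToyRecord> kept;
    for (const ToyRecord &record : records)
    {
        if (set.matches(record))
        {
            kept.push_back(record);
        }
    }
    return kept;
}
```

A circle and a square that both carry "area" and "perimeter" pass the filter despite their extra attributes, while a line segment with only "length" is dropped.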
It is worth pointing out that, in addition to being able to add Attributes to a Layout one at a time, we can also populate a Layout with all the Attributes found in an AccessorSet (or many AccessorSets).
In the code sample xDataPrimer, you can find a simple use case that compares four different patterns for accessing src/data data. The example is a simple 1D particle simulation. The Attributes for Records created above were chosen with this case in mind, with each particle having mass, velocity, force, and location.
Four patterns are covered and briefly summarized here. The speed tests were run on a Linux laptop on 2020-05-29 with an optimized build.
Pattern | Small dataset | Large dataset | Comment |
---|---|---|---|
Basic | 1.13 | 14.7 | Easy to use |
RecordArray | 1.06 | 4.3 | General purpose, most common method |
Layout | 0.53 | 2.1 | Requires FE_AV_FASTITER_ENABLE |
Compile | 0.10 | 1.2 | Requires static topology |
Since the examples are spelled out in detail in the code sample, we will reproduce only the high-level summary here. Refer to xDataPrimer for more information.
This is the easiest method, and is essentially what was done above.
Pattern:
This method is usually faster than the Basic pattern, and allows for more flexibility and control while filtering, but is more involved, and can get downright ugly when nesting.
While RecordArrays are not part of what you see in src/data files, the src/data runtime automatically organizes the Records within RecordGroups into RecordArrays. There is a RecordArray for each Layout with Records in a RecordGroup.
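One reason per-Layout arrays can be faster is that attribute presence only needs to be tested once per array rather than once per record. The following sketch shows that idea with hypothetical types (`ToyRecordArray`, `totalMass`); it is not the RecordArray interface.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy record array: all records share one layout, so attribute presence
// can be tested once per array rather than once per record.
struct ToyRecordArray
{
    std::set<std::string> layout;                        // attribute names
    std::vector<std::map<std::string, double>> records;  // one map per record
};

// Sum the "mass" attribute over every array whose layout provides it.
double totalMass(const std::vector<ToyRecordArray> &arrays)
{
    double total = 0.0;
    for (const ToyRecordArray &array : arrays)
    {
        // One layout check per array; arrays without "mass" are skipped.
        if (array.layout.count("mass") == 0)
        {
            continue;
        }
        for (const std::map<std::string, double> &record : array.records)
        {
            total += record.at("mass");
        }
    }
    return total;
}
```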
Pattern:
This pattern only works when FE_AV_FASTITER_ENABLE is enabled at compile time.
The idea of this method is to iterate through data as directly and as cache-friendly as possible. To do this, we not only need access to the underlying arrays, but those arrays must also be managed so that one can iterate without overhead such as checking for holes. However, FE_AV_FASTITER_ENABLE, which does this, also makes some operations unavailable (for now), in particular adding new Attributes to already instantiated Records.
This pattern also does not start with a RecordGroup, but is instead oriented around the full Scope, iterating through Layouts.
Pattern:
The idea of this pattern is to do as much of the processing overhead as possible just once, so that repeated access runs faster. The tradeoff is that the dataset topology itself cannot change, so this pattern applies only to static topologies.
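The "pay the overhead once" idea can be sketched by resolving attribute lookups into cached pointers ahead of time. This is a toy model under stated assumptions, not the xDataPrimer code: `ToyRecord`, `CompiledParticle`, `compile`, and `step` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy record: attribute name -> value.
using ToyRecord = std::map<std::string, double>;

// Pointers resolved once for one particle.
struct CompiledParticle
{
    double *location;
    double *velocity;
};

// "Compile" step: do the attribute lookups once and cache the pointers.
// The cached pointers are only safe to use while records and attributes
// are neither added nor removed, i.e. while the topology is static.
std::vector<CompiledParticle> compile(std::vector<ToyRecord> &records)
{
    std::vector<CompiledParticle> compiled;
    for (ToyRecord &record : records)
    {
        auto loc = record.find("location");
        auto vel = record.find("velocity");
        if (loc != record.end() && vel != record.end())
        {
            compiled.push_back({&loc->second, &vel->second});
        }
    }
    return compiled;
}

// Repeated access: no string lookups remain in the inner loop.
void step(const std::vector<CompiledParticle> &compiled, double dt)
{
    for (const CompiledParticle &p : compiled)
    {
        *p.location += *p.velocity * dt;
    }
}
```

Calling `compile` once and `step` many times moves the lookup cost out of the simulation loop, at the price of having to recompile whenever the dataset changes shape.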
Pattern: