This post is about generating a file reading/writing library from a specification, in this case an XML file that describes the internal binary structure of the file.
Using this approach, we define the format in an XML file and build code generation tools that write the libraries which read and write that format in a given language, such as C++ or Python. The tools also generate the model classes in that language, which the generated library populates when it reads a file.
The advantage is that when the format changes we just run the tool again with the updated XML file. The code for reading and writing the format is regenerated and all we need to do is recompile it. It also lets people who are not familiar with programming edit the format without writing code.
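To make the idea concrete, here is a minimal sketch of such a generator in Python. It is not niflib's actual tool: the `CPP_TYPES` mapping, the `member_name` and `generate_struct` helpers and the tiny `SPEC` are all assumptions made up for illustration. It only shows the principle that a declaration can be derived mechanically from the XML description.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from spec type names to C++ types (assumption for this sketch).
CPP_TYPES = {
    "int": "int",
    "ushort": "unsigned short",
    "byte": "unsigned char",
    "bool": "bool",
    "float": "float",
}


def member_name(spec_name):
    """Turn a spec name such as 'Num Vertices' into a camelCase member name."""
    words = spec_name.split()
    return words[0].lower() + "".join(word.capitalize() for word in words[1:])


def generate_struct(xml_text):
    """Emit a C++ struct declaration for a single <niobject> element."""
    obj = ET.fromstring(xml_text)
    lines = ["struct %s {" % obj.get("name")]
    for add in obj.findall("add"):
        # Unknown type names are passed through unchanged.
        ctype = CPP_TYPES.get(add.get("type"), add.get("type"))
        lines.append("    %s %s;" % (ctype, member_name(add.get("name"))))
    lines.append("};")
    return "\n".join(lines)


# A made-up, much simplified specification.
SPEC = """<niobject name="Example">
  <add name="Num Vertices" type="ushort">Number of vertices.</add>
  <add name="Has Normals" type="bool">Lighting normals present?</add>
</niobject>"""
```

Calling `generate_struct(SPEC)` returns the text of a small struct with the members `numVertices` and `hasNormals`. Changing the XML and running the function again is all it takes to get an updated declaration.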
I am going to use an open source project called niflib, which reads and writes the 3D model files used by games like The Elder Scrolls V: Skyrim. You can find it on GitHub: https://github.com/Alecu100/niflib.
The format stores all the information in separate blocks inside a file, with links between them. Each block has a type and an index inside the file. The index is used to link one block to another: a block can have a field containing a number, and that number is the index of another block used by the current block. A block can also contain normal fields, which represent the various properties stored in the file. A field can even be an array. The binary layout of the fields is defined in the XML file.
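The index-based linking can be sketched as follows. This is a simplified illustration in Python, not niflib's code: the `Block` class, the `NO_LINK` sentinel and `fix_links` are hypothetical names. After all blocks are read, the stored indices are replaced by references to the actual objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NO_LINK = -1  # assumed sentinel meaning "this link points to nothing"


@dataclass
class Block:
    type_name: str
    # Indices of other blocks, exactly as they were stored in the file.
    link_indices: List[int] = field(default_factory=list)
    # The same links after resolution, pointing at real objects.
    links: List[Optional["Block"]] = field(default_factory=list)


def fix_links(blocks):
    """Resolve every stored index into the block at that position in the file."""
    for block in blocks:
        block.links = [
            None if index == NO_LINK else blocks[index]
            for index in block.link_indices
        ]
    return blocks
```

The resolution has to happen in a second pass because a block may link to another block that appears later in the file.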
Some fields of a block can be grouped into a compound type. In that case the compound becomes a single field of the block. A compound is always contained in a block and cannot be referenced by other blocks.
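As an illustration of a compound and an array field, here is a small Python sketch using the standard `struct` module. The layout is an assumption chosen for the example (a little-endian 16-bit count followed by that many `Vector3` compounds of three 32-bit floats); the function names are hypothetical.

```python
import io
import struct
from typing import NamedTuple


class Vector3(NamedTuple):
    """A compound: three floats that always travel together inside a block."""
    x: float
    y: float
    z: float


def read_vector3(stream):
    # Three little-endian 32-bit floats, 12 bytes in total.
    return Vector3(*struct.unpack("<3f", stream.read(12)))


def read_vertices(stream):
    # The count field comes first and tells us how long the array is.
    (count,) = struct.unpack("<H", stream.read(2))
    return [read_vector3(stream) for _ in range(count)]


def write_vertices(stream, vertices):
    stream.write(struct.pack("<H", len(vertices)))
    for vertex in vertices:
        stream.write(struct.pack("<3f", *vertex))
```

Writing a list of vertices into an `io.BytesIO` buffer and reading it back yields the same list, which is a quick way to check that the reader and the writer agree on the layout.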
The definition of a block with its fields looks like this in the XML file:
<niobject name="NiGeometryData" abstract="1" inherit="NiObject">
    Mesh data: vertices, vertex normals, etc.
    <add name="Unknown Int" type="int" ver1="10.2.0.0">Unknown identifier. Always 0.</add>
    <!-- special case for Bethesda PSysData (or NiParticlesData?) in Fallout 3 and higher -->
    <add name="Num Vertices" type="ushort" cond="!NiPSysData">Number of vertices.</add>
    <add name="Num Vertices" type="ushort" cond="NiPSysData" vercond="(Version &lt; 20.2.0.7) || (User Version &lt; 11)">Number of vertices.</add>
    <add name="BS Max Vertices" type="ushort" cond="NiPSysData" vercond="(Version &gt;= 20.2.0.7) &amp;&amp; (User Version &gt;= 11)">Bethesda uses this for max number of particles in NiPSysData.</add>
    <add name="Keep Flags" type="byte" ver1="10.1.0.0">Used with NiCollision objects when OBB or TRI is set.</add>
    <add name="Compress Flags" type="byte" ver1="10.1.0.0">Unknown.</add>
    <add name="Has Vertices" type="bool" default="1">Is the vertex array present? (Always non-zero.)</add>
    <add name="Vertices" type="Vector3" arr1="Num Vertices" cond="Has Vertices">The mesh vertices.</add>
    <add name="Num UV Sets" type="ushort" vercond="((Version &gt;= 10.0.1.0) &amp;&amp; (!((Version &gt;= 20.2.0.7) &amp;&amp; (User Version &gt;= 11))))" calculated="1">Flag for tangents and bitangents in upper byte. Texture flags in lower byte.</add>
    <add name="BS Num UV Sets" type="ushort" vercond="((Version &gt;= 20.2.0.7) &amp;&amp; (User Version &gt;= 11))" calculated="1">Bethesda's version of this field for nif versions 20.2.0.7 and up. Only a single bit denotes whether uv's are present. For example, see meshes/architecture/megaton/megatonrampturn45sml.nif in Fallout 3.</add>
    <add name="Skyrim Material" type="SkyrimHavokMaterial" ver1="20.2.0.7" userver="12" cond="!NiPSysData">Material</add>
    <add name="Has Normals" type="bool">Do we have lighting normals? These are essential for proper lighting: if not present, the model will only be influenced by ambient light.</add>
    <add name="Normals" type="Vector3" arr1="Num Vertices" cond="Has Normals">The lighting normals.</add>
    <add name="Tangents" type="Vector3" arr1="Num Vertices" cond="(Has Normals) &amp;&amp; ((Num UV Sets &amp; 61440) || (BS Num UV Sets &amp; 61440))" ver1="10.1.0.0">Tangent vectors.</add>
    <add name="Bitangents" type="Vector3" arr1="Num Vertices" cond="(Has Normals) &amp;&amp; ((Num UV Sets &amp; 61440) || (BS Num UV Sets &amp; 61440))" ver1="10.1.0.0">Bitangent vectors.</add>
    <add name="Center" type="Vector3">Center of the bounding box (smallest box that contains all vertices) of the mesh.</add>
    <add name="Radius" type="float">Radius of the mesh: maximal Euclidean distance between the center and all vertices.</add>
    <add name="Unknown 13 shorts" type="short" arr1="13" ver1="20.3.0.9" ver2="20.3.0.9" userver="131072">Unknown, always 0?</add>
    <add name="Has Vertex Colors" type="bool">
        Do we have vertex colors? These are usually used to fine-tune the lighting of the model.
        Note: how vertex colors influence the model can be controlled by having a NiVertexColorProperty object as a property child of the root node. If this property object is not present, the vertex colors fine-tune lighting.
        Note 2: set to either 0 or 0xFFFFFFFF for NifTexture compatibility.
    </add>
    <add name="Vertex Colors" type="Color4" arr1="Num Vertices" cond="Has Vertex Colors">The vertex colors.</add>
    <add name="Num UV Sets" type="ushort" ver2="4.2.2.0">The lower 6 (or less?) bits of this field represent the number of UV texture sets. The other bits are probably flag bits. For versions 10.1.0.0 and up, if bit 12 is set then extra vectors are present after the normals.</add>
    <add name="Has UV" type="bool" ver2="4.0.0.2">
        Do we have UV coordinates?
        Note: for compatibility with NifTexture, set this value to either 0x00000000 or 0xFFFFFFFF.
    </add>
    <add name="UV Sets" type="TexCoord" arr1="(Num UV Sets &amp; 63) | (BS Num UV Sets &amp; 1)" arr2="Num Vertices">The UV texture coordinates. They follow the OpenGL standard: some programs may require you to flip the second coordinate.</add>
    <add name="Consistency Flags" type="ConsistencyType" ver1="10.0.1.0" default="CT_MUTABLE" vercond="User Version &lt; 12">Consistency Flags</add>
    <add name="Consistency Flags" type="ConsistencyType" ver1="10.0.1.0" default="CT_MUTABLE" vercond="User Version &gt;= 12" cond="!NiPSysData">Consistency Flags</add>
    <add name="Additional Data" type="Ref" template="AbstractAdditionalGeometryData" ver1="20.0.0.4" vercond="User Version &lt; 12">Unknown.</add>
    <add name="Additional Data" type="Ref" template="AbstractAdditionalGeometryData" ver1="20.0.0.4" vercond="User Version &gt;= 12" cond="!NiPSysData">Unknown.</add>
</niobject>
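The `ver1` and `ver2` attributes are what let a single definition cover many file versions. The Python sketch below assumes that `ver1` is the first version in which a field exists and `ver2` the last one, and it deliberately ignores `cond`, `vercond` and `userver`. `parse_version`, `fields_for_version` and the shortened `SPEC` are hypothetical names for this example only.

```python
import xml.etree.ElementTree as ET


def parse_version(text):
    """'10.2.0.0' -> (10, 2, 0, 0), so that versions compare element by element."""
    return tuple(int(part) for part in text.split("."))


def fields_for_version(xml_text, version):
    """Return the names of the fields that exist in the given file version."""
    current = parse_version(version)
    names = []
    for add in ET.fromstring(xml_text).findall("add"):
        ver1, ver2 = add.get("ver1"), add.get("ver2")
        if ver1 and current < parse_version(ver1):
            continue  # field was introduced in a later version
        if ver2 and current > parse_version(ver2):
            continue  # field was dropped in an earlier version
        names.append(add.get("name"))
    return names


# A shortened copy of a few fields from the definition above.
SPEC = """<niobject name="NiGeometryData" abstract="1" inherit="NiObject">
  <add name="Unknown Int" type="int" ver1="10.2.0.0">Unknown identifier.</add>
  <add name="Keep Flags" type="byte" ver1="10.1.0.0">Keep flags.</add>
  <add name="Has Vertices" type="bool" default="1">Vertex array present?</add>
  <add name="Has UV" type="bool" ver2="4.0.0.2">UV coordinates present?</add>
</niobject>"""
```

A generator can emit the same comparison as an `if` statement around the read or write call of each field, so the generated library decides at run time which fields to process.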
What's special about this format is that the counted references between blocks are never circular. There are actually two types of links between blocks: references, which are counted for garbage collection, and direct links (plain pointers), which are not. The link types are defined in the format in such a way that counted references cannot form a cycle. Only direct links can be circular, since they are not used by garbage collection.
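Why the distinction matters can be checked with a small graph search. The Python sketch below is an assumption-level illustration, not part of niflib: it looks for a cycle among the counted references only, using a depth-first search, and the block names in the usage note are just examples.

```python
def find_reference_cycle(refs):
    """Search the counted references for a cycle.

    refs maps a block name to the names of the blocks it holds counted
    references to. Returns the cycle as a list of names, or None.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(name, path):
        state[name] = GREY
        path = path + [name]
        for child in refs.get(name, []):
            colour = state.get(child, WHITE)
            if colour == GREY:
                # The child is already on the current path: a cycle.
                return path[path.index(child):] + [child]
            if colour == WHITE:
                cycle = visit(child, path)
                if cycle:
                    return cycle
        state[name] = BLACK
        return None

    for name in refs:
        if state.get(name, WHITE) == WHITE:
            cycle = visit(name, [])
            if cycle:
                return cycle
    return None
```

For a tree such as `{"NiNode": ["NiTriShape"], "NiTriShape": ["NiTriShapeData"]}` the function returns `None`. If a back-link from the child to its parent were stored as a counted reference instead of a direct link, the function would report the cycle, and neither block's counter could ever reach 0.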
In the code we handle these blocks through references (smart pointers). When a reference goes out of scope it is destroyed and the reference counter of the corresponding block is decremented. When the counter reaches 0, the block can be safely deleted.
Initially, when we read a file, we get a single reference to the root block. The root block contains downward references to its child blocks, so the references form a tree-like structure: we hold only one reference to the root node, and through it we reach the rest of the blocks. If we stop referencing the root node, its reference count drops to 0 and it is deleted. When the root node is deleted, its references to the child nodes are deleted as well. Since the child nodes are referenced only through the root node, their reference counts go from 1 to 0 and they are deleted too.
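The cascade described above can be modelled in a few lines. Python already manages memory on its own, so the sketch below keeps an explicit counter purely to imitate what the C++ library does; the class and its methods are a simplified, hypothetical stand-in for the real `RefObject` and `Ref<T>` types.

```python
class RefObject:
    """Toy model of a reference-counted block."""

    live_objects = 0  # how many blocks are currently alive

    def __init__(self, name):
        self.name = name
        self.ref_count = 0
        self.children = []
        RefObject.live_objects += 1

    def add_ref(self):
        self.ref_count += 1

    def subtract_ref(self, deleted):
        self.ref_count -= 1
        if self.ref_count == 0:
            self._destroy(deleted)

    def add_child(self, child):
        # Holding a child counts as one reference to it.
        child.add_ref()
        self.children.append(child)

    def _destroy(self, deleted):
        deleted.append(self.name)
        RefObject.live_objects -= 1
        # Dropping our own references may in turn delete the children.
        for child in self.children:
            child.subtract_ref(deleted)
        self.children = []


def release_root():
    """Build root -> shape -> data, drop the only root reference, report the order."""
    root, shape, data = RefObject("root"), RefObject("shape"), RefObject("data")
    root.add_child(shape)
    shape.add_child(data)
    root.add_ref()          # the single reference handed to the user
    deleted = []
    root.subtract_ref(deleted)
    return deleted
```

`release_root()` returns the blocks in the order they were deleted, starting with the root and ending with the deepest child.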
The tool which generates the code is written in Python. It has 2 main scripts: a script for reading the XML specification of the format, and a helper script that represents a code file, with functions for writing specific elements into that file, such as methods or fields of classes.
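A helper of the second kind might look roughly like this. The sketch is only loosely modelled on the `self.code(...)` calls visible in niflib's generator; the `CodeWriter` class, its brace-based indentation rule and `write_getter` are assumptions for illustration.

```python
import io


class CodeWriter:
    """Writes lines of C++ into a stream and keeps track of the indentation."""

    def __init__(self, stream, indent_text="    "):
        self.stream = stream
        self.indent_text = indent_text
        self.indent = 0

    def code(self, line=""):
        # A closing brace ends the current block before the line is written.
        if line.startswith("}"):
            self.indent = max(self.indent - 1, 0)
        self.stream.write((self.indent_text * self.indent + line).rstrip() + "\n")
        # An opening brace starts a new block after the line is written.
        if line.endswith("{"):
            self.indent += 1


def write_getter(writer, class_name, ctype, member):
    """Emit a trivial getter method for one member of a class."""
    getter = "Get" + member[0].upper() + member[1:]
    writer.code("%s %s::%s() const {" % (ctype, class_name, getter))
    writer.code("return %s;" % member)
    writer.code("}")
```

With a writer like this, the generating code never deals with whitespace itself; it only states which lines it wants and the helper takes care of the layout.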
This is just a small part of the Python method that generates the methods for reading, writing, etc. of a block:
def stream(self, block, action, localprefix = "", prefix = "", arg_prefix = "", arg_member = None):
    lastver1 = None
    lastver2 = None
    lastuserver = None
    lastcond = None
    lastvercond = None
    # stream name
    if action == ACTION_READ:
        stream = "in"
    else:
        stream = "out"
    # preparation
    if isinstance(block, Block) or block.name in ["Footer", "Header"]:
        if action == ACTION_READ:
            if block.has_links or block.has_crossrefs:
                self.code("unsigned int block_num;")
        if action == ACTION_OUT:
            self.code("stringstream out;")
            # declare array_output_count, only if it will actually be used
            for y in block.members:
                if y.arr1.lhs or (y.ctype in ["BoundingVolume", "ByteArray", "KeyGroup"]):
                    self.code("unsigned int array_output_count = 0;")
                    break
        if action == ACTION_GETREFS:
            self.code("list<Ref<NiObject> > refs;")
        if action == ACTION_GETPTRS:
            self.code("list<NiObject *> ptrs;")
    # stream the ancestor
    if isinstance(block, Block):
        if block.inherit:
            if action == ACTION_READ:
                self.code("%s::Read( %s, link_stack, info );"%(block.inherit.cname, stream))
            elif action == ACTION_WRITE:
                self.code("%s::Write( %s, link_map, missing_link_stack, info );"%(block.inherit.cname, stream))
            elif action == ACTION_OUT:
                self.code("%s << %s::asString();"%(stream, block.inherit.cname))
            elif action == ACTION_FIXLINKS:
                self.code("%s::FixLinks( objects, link_stack, missing_link_stack, info );"%block.inherit.cname)
            elif action == ACTION_GETREFS:
                self.code("refs = %s::GetRefs();"%block.inherit.cname)
            elif action == ACTION_GETPTRS:
                self.code("ptrs = %s::GetPtrs();"%block.inherit.cname)
    # declare and calculate local variables (TODO: GET RID OF THIS; PREFERABLY NO LOCAL VARIABLES AT ALL)
    if action in [ACTION_READ, ACTION_WRITE, ACTION_OUT]:
        block.members.reverse() # calculated data depends on data further down the structure
        for y in block.members:
            # is manual update, Bit 1=Read, Bit 2=Write, Bit 3=Out
            if y.is_manual_update:
                if action == ACTION_OUT: continue
                if action == ACTION_WRITE and (int(y.is_manual_update) & 1 != 0): continue
                if action == ACTION_READ and (int(y.is_manual_update) & 2 != 0): continue
            if not y.is_duplicate and action in [ACTION_WRITE, ACTION_OUT]:
                if y.func:
                    self.code('%s%s = %s%s();'%(prefix, y.cname, prefix, y.func))
                elif y.is_calculated:
                    if action in [ACTION_READ, ACTION_WRITE]:
                        self.code('%s%s = %s%sCalc(info);'%(prefix, y.cname, prefix, y.cname))
                    # ACTION_OUT is in asString(), which doesn't take version info
                    # so let's simply not print the field in this case
                elif y.arr1_ref:
                    if not y.arr1 or not y.arr1.lhs: # Simple Scalar
                        cref = block.find_member(y.arr1_ref[0], True)
                        # if not cref.is_duplicate and not cref.next_dup and (not cref.cond.lhs or cref.cond.lhs == y.name):
                        #     self.code('assert(%s%s == (%s)(%s%s.size()));'%(prefix, y.cname, y.ctype, prefix, cref.cname))
                        self.code('%s%s = (%s)(%s%s.size());'%(prefix, y.cname, y.ctype, prefix, cref.cname))
                elif y.arr2_ref: # 1-dimensional dynamic array
                    cref = block.find_member(y.arr2_ref[0], True)
                    if not y.arr1 or not y.arr1.lhs: # Second dimension
                        # if not cref.is_duplicate and not cref.next_dup (not cref.cond.lhs or cref.cond.lhs == y.name):
                        #     self.code('assert(%s%s == (%s)((%s%s.size() > 0) ? %s%s[0].size() : 0));'%(prefix, y.cname, y.ctype, prefix, cref.cname, prefix, cref.cname))
                        self.code('%s%s = (%s)((%s%s.size() > 0) ? %s%s[0].size() : 0);'%(prefix, y.cname, y.ctype, prefix, cref.cname, prefix, cref.cname))
                    else:
                        # index of dynamically sized array
                        self.code('for (unsigned int i%i = 0; i%i < %s%s.size(); i%i++)'%(self.indent, self.indent, prefix, cref.cname, self.indent))
                        self.code('\t%s%s[i%i] = (%s)(%s%s[i%i].size());'%(prefix, y.cname, self.indent, y.ctype, prefix, cref.cname, self.indent))
                # else: #has duplicates needs to be selective based on version
                #     self.code('assert(!"%s");'%(y.name))
        block.members.reverse() # undo reverse
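One detail of the excerpt above is worth isolating: before a block is written, every count field is recomputed from the array it describes, so the two can never disagree in the file. The following stand-alone Python sketch reproduces just that idea. The dictionary-based member description and `count_assignments` are simplifications invented for this example and are not the data model used by the real generator.

```python
def count_assignments(members):
    """Emit C++ statements that recompute count fields from their arrays.

    Each member is a dict with 'cname' and 'ctype'; an array member also has
    'arr1', the cname of the member that stores its length.
    """
    by_name = {member["cname"]: member for member in members}
    seen = set()
    lines = []
    for member in members:
        count_name = member.get("arr1")
        # Only the first array that uses a count field defines its value.
        if count_name in by_name and count_name not in seen:
            seen.add(count_name)
            count = by_name[count_name]
            lines.append("%s = (%s)(%s.size());"
                         % (count["cname"], count["ctype"], member["cname"]))
    return lines
```

For a block with a `numVertices` count shared by the `vertices` and `normals` arrays, the function emits a single assignment that takes the size of `vertices`.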
Below you can see the generated code for the base type used for all blocks, which provides reference counting:
class RefObject {
public:
    /*! Constructor */
    NIFLIB_API RefObject();
    /*! Copy Constructor */
    NIFLIB_API RefObject(const RefObject& src);
    /*! Destructor */
    NIFLIB_API virtual ~RefObject();
    /*!
     * A constant value which uniquely identifies objects of this type.
     */
    NIFLIB_API static const Type TYPE;
    /*!
     * Summarizes the information contained in this object in English.
     * \param[in] verbose Determines whether or not detailed information about large areas of data will be printed out.
     * \return A string containing a summary of the information within the object in English. This is the function that Niflyze calls to generate its analysis, so the output is the same.
     */
    NIFLIB_API virtual string asString( bool verbose = false ) const = 0;
    /*!
     * Used to determine the type of a particular instance of this object.
     * \return The type constant for the actual type of the object.
     */
    NIFLIB_API virtual const Type & GetType() const;
    /*!
     * Used to determine whether this object is exactly the same type as the given type constant.
     * \return True if this object is exactly the same type as that represented by the given type constant. False otherwise.
     */
    NIFLIB_API bool IsSameType( const Type & compare_to ) const;
    /*!
     * Used to determine whether this object is exactly the same type as another object.
     * \return True if this object is exactly the same type as the given object. False otherwise.
     */
    NIFLIB_API bool IsSameType( const RefObject * object ) const;
    /*!
     * Used to determine whether this object is a derived type of the given type constant. For example, all NIF objects are derived types of NiObject, and a NiNode is also a derived type of NiObjectNET and NiAVObject.
     * \return True if this object is derived from the type represented by the given type constant. False otherwise.
     */
    NIFLIB_API bool IsDerivedType( const Type & compare_to ) const;
    /*!
     * Used to determine whether this object is a derived type of another object. For example, all NIF objects are derived types of NiObject, and a NiNode is also a derived type of NiObjectNET and NiAVObject.
     * \return True if this object is derived from the type of the given object. False otherwise.
     */
    NIFLIB_API bool IsDerivedType( const RefObject * objct ) const;
    /*!
     * Formats a human readable string that includes the type of the object, and its name, if it has one.
     * \return A string in the form: address(type), or address(type) {name}
     */
    NIFLIB_API virtual string GetIDString() const;
    /*!
     * Returns the total number of reference-counted objects of any kind that have been allocated by Niflib for any reason. This is for debugging or informational purposes. Mostly useful for tracking down memory leaks.
     * \return The total number of reference-counted objects that have been allocated.
     */
    NIFLIB_API static unsigned int NumObjectsInMemory();
    /*!
     * Increments the reference count on this object. This should be taken care of automatically as long as you use Ref<T> smart pointers. However, if you use bare pointers you may call this function yourself, though it is not recommended.
     */
    NIFLIB_API void AddRef() const;
    /*!
     * Decrements the reference count on this object. This should be taken care of automatically as long as you use Ref<T> smart pointers. However, if you use bare pointers you may call this function yourself, though it is not recommended.
     */
    NIFLIB_API void SubtractRef() const;
    /*!
     * Returns the number of references that currently exist for this object.
     * \return The number of references to this object that are in use.
     */
    NIFLIB_API unsigned int GetNumRefs();
};
And below is the partial definition of a block. You can see the special comments "//--BEGIN MISC CUSTOM CODE--//" and "//--END CUSTOM CODE--//" that delimit hand-written user code, which is not modified when the code is regenerated after a format change:
class NiGeometryData : public NiObject {
public:
    /*! Constructor */
    NIFLIB_API NiGeometryData();
    /*! Destructor */
    NIFLIB_API virtual ~NiGeometryData();
    /*!
     * A constant value which uniquely identifies objects of this type.
     */
    NIFLIB_API static const Type TYPE;
    /*!
     * A factory function used during file reading to create an instance of this type of object.
     * \return A pointer to a newly allocated instance of this type of object.
     */
    NIFLIB_API static NiObject * Create();
    /*!
     * Summarizes the information contained in this object in English.
     * \param[in] verbose Determines whether or not detailed information about large areas of data will be printed out.
     * \return A string containing a summary of the information within the object in English. This is the function that Niflyze calls to generate its analysis, so the output is the same.
     */
    NIFLIB_API virtual string asString( bool verbose = false ) const;
    /*!
     * Used to determine the type of a particular instance of this object.
     * \return The type constant for the actual type of the object.
     */
    NIFLIB_API virtual const Type & GetType() const;
    //--BEGIN MISC CUSTOM CODE--//
protected:
    /*! The mesh vertex indices. */
    vector<int > vertexIndices;
    /*! The mapping between Nif & Max UV sets. */
    std::map<int, int> uvSetMap; // first = Max index, second = Nif index
public:
    //--Counts--//
    /*!
     * Returns the number of vertices that make up this mesh. This is also the number of normals, colors, and UV coordinates if these are used.
     * \return The number of vertices that make up this mesh.
     * \sa IShapeData::SetVertexCount
     */
    NIFLIB_API int GetVertexCount() const;
    /*!
     * Returns the number of texture coordinate sets used by this mesh. For each UV set, there is a pair of texture coordinates for every vertex in the mesh. Each set corresponds to a texture entry in the NiTexturingProperty object.
     * \return The number of texture coordinate sets used by this mesh. Can be zero.
     * \sa IShapeData::SetUVSetCount, ITexturingProperty
     */
    NIFLIB_API short GetUVSetCount() const;
    /*!
     * Changes the number of UV sets used by this mesh. If the new size is smaller, data at the end of the array will be lost. Otherwise it will be retained. The number of UV sets must correspond with the number of textures defined in the corresponding NiTexturingProperty object.
     * \param n The new size of the uv set array.
     * \sa IShapeData::GetUVSetCount, ITexturingProperty
     */
    NIFLIB_API void SetUVSetCount(int n);
    /*!
     * Returns the number of vertex indices that make up this mesh.
     * \return The number of vertex indices that make up this mesh.
     * \sa IShapeData::SetVertexIndexCount
     */
    NIFLIB_API int GetVertexIndexCount() const;
    //--END CUSTOM CODE--//
protected:
    /*! Unknown identifier. Always 0. */
    int unknownInt;
    /*! Number of vertices. */
    mutable unsigned short numVertices;
    /*! Bethesda uses this for max number of particles in NiPSysData. */
    unsigned short bsMaxVertices;
    /*! Used with NiCollision objects when OBB or TRI is set. */
    byte keepFlags;
    /*! Unknown. */
    byte compressFlags;
    /*! Is the vertex array present? (Always non-zero.) */
    bool hasVertices;
    /*! The mesh vertices. */
    vector<Vector3 > vertices;
    /*! Flag for tangents and bitangents in upper byte. Texture flags in lower byte. */
    mutable unsigned short numUvSets;
    /*!
     * Bethesda's version of this field for nif versions 20.2.0.7 and up. Only a single
     * bit denotes whether uv's are present. For example, see
     * meshes/architecture/megaton/megatonrampturn45sml.nif in Fallout 3.
     */
    mutable unsigned short bsNumUvSets;
};
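How the hand-written sections survive a regeneration is not shown in this post, so the following Python sketch is only one plausible way to do it, under the assumption that the generator reads the previous version of a file before overwriting it. The marker format is taken from the listing above; `extract_custom_sections` and `merge_custom_sections` are hypothetical names.

```python
import re

# Markers as they appear in the generated headers; the section name is optional.
BEGIN_RE = re.compile(r"^\s*//--BEGIN (.*?)CUSTOM CODE--//\s*$")
END_RE = re.compile(r"^\s*//--END (.*?)CUSTOM CODE--//\s*$")


def extract_custom_sections(old_source):
    """Collect the lines between each BEGIN/END marker pair, keyed by section name."""
    sections, name, body = {}, None, []
    for line in old_source.splitlines():
        begin = BEGIN_RE.match(line)
        if begin:
            name, body = begin.group(1).strip(), []
        elif END_RE.match(line) and name is not None:
            sections[name] = body
            name = None
        elif name is not None:
            body.append(line)
    return sections


def merge_custom_sections(new_source, sections):
    """Copy the saved sections into freshly generated source text."""
    out, skipping = [], False
    for line in new_source.splitlines():
        begin = BEGIN_RE.match(line)
        if begin:
            out.append(line)
            out.extend(sections.get(begin.group(1).strip(), []))
            skipping = True      # drop whatever the template put in the section
        elif END_RE.match(line):
            out.append(line)
            skipping = False
        elif not skipping:
            out.append(line)
    return "\n".join(out)
```

Everything outside the markers comes from the new generated text, and everything inside them comes from the old file, so a format change never overwrites user code.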
In this post I used an example that reads from and writes to a file. But you can also extend the approach to read from a network source instead of a file. This would work really well for big, complicated distributed systems with many components that communicate with each other but are written in different programming languages.
Google's "Protocol Buffers" works on the same idea: the message format is described in a schema file and a compiler generates the reading and writing code for several programming languages.
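Moving from files to the network is mostly a matter of reading from a different kind of stream. The Python sketch below uses a made-up wire layout (a 32-bit length, an ASCII type name and one 32-bit float, all little-endian) and hypothetical function names; the point is only that the same `read_block` works on an in-memory buffer and on a socket.

```python
import io
import socket
import struct


def read_exact(stream, size):
    """Read exactly size bytes or fail, whatever kind of stream this is."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended after %d of %d bytes" % (len(data), size))
    return data


def encode_block(type_name, value):
    """Serialize one toy block: name length, name, then a single float field."""
    raw_name = type_name.encode("ascii")
    return struct.pack("<I", len(raw_name)) + raw_name + struct.pack("<f", value)


def read_block(stream):
    """Read one toy block from any binary file-like object."""
    (name_length,) = struct.unpack("<I", read_exact(stream, 4))
    type_name = read_exact(stream, name_length).decode("ascii")
    (value,) = struct.unpack("<f", read_exact(stream, 4))
    return type_name, value


def read_block_from_socket(sock):
    # makefile() wraps the socket in a buffered reader with the usual read().
    with sock.makefile("rb") as stream:
        return read_block(stream)
```

Because the reader only depends on a `read` method, a generated library written this way does not need to know whether the bytes come from a disk or from another process.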
That's about it. Sorry if the code examples are really long, but I could not find anything shorter and I wanted to show real-world examples.