An XML DOM with just 8 bytes per node

I present a small XML library that parses or creates XML documents with an extremely light-weight DOM that uses just 8 bytes per node. This post shows some of the interesting algorithms that make this possible.

Another XML Library?

I know what you are thinking: not another XML library! And you would be right, especially considering that XML is becoming more obsolete each year in favor of arguably better formats like JSON and YAML. But there is still a lot of XML out there, both old and new, and it will remain relevant for years to come. Besides, this post is not about presenting another XML library, but about some of the algorithms it uses to create a very lightweight DOM. You may find (parts of) those algorithms useful in other situations as well. But if you don’t care about that and just want to see how this library holds up against other libraries, then feel free to skip to the end for a comparison.

DOM Representations

As you probably know, an XML DOM is a tree of nodes, where nodes are usually either elements or text. Each node needs to hold the following information:

  • The node type. We support 4 node types: Element, Text, Comment and CData. XML has some other node types such as processing instructions, declarations and document type declarations, but we don’t support those.
  • The parent node.
  • For Element nodes:
    • A list of child nodes.
    • A list of attributes.
    • The element name.
  • For Text, Comment and CData nodes:
    • The string value.

So how are you going to fit all this information into just 8 bytes? Well, obviously you can’t: a single pointer to a parent node for example is already 8 bytes on a 64-bit system. But we can try to fit most of this information into 8 bytes.

The key here is to store this information in a slightly different way. For example, storing a list of child nodes and a list of attributes is inefficient. The reference to the list object itself is already 8 bytes on 64-bit systems. Furthermore, these lists allocate additional dynamic memory to store their contents. Another way to store this information is to use a linked list instead. This is nothing new and many other XML libraries use this approach. In this case, each XML node holds the following information:

  • Node type.
  • Parent node.
  • For Element nodes:
    • Pointer to the first child node.
    • Pointer to the next sibling node.
    • Optional pointer to the previous sibling node (to create a doubly linked list).
    • Pointer to the first attribute.
    • Element name.
  • For Text, Comment and CData nodes:
    • The string value.
  • And for each attribute:
    • Pointer to the next attribute.
    • Attribute name.
    • Attribute value.

At first sight, this only seems to make things worse, since you are storing a lot of pointers now for each node.
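To put a number on that, here is a rough sketch of what such a conventional pointer-based node could look like in Delphi. This is just an illustration, not the actual declaration of this or any other library:

type
  PClassicNode = ^TClassicNode;
  PClassicAttribute = ^TClassicAttribute;

  TClassicAttribute = record
    Next: PClassicAttribute;  // 8 bytes on a 64-bit system
    Name: String;             // 8 bytes (a pointer to the string data)
    Value: String;            // 8 bytes
  end;

  TClassicNode = record
    NodeType: Byte;                    // 1 byte, padded to 8 for alignment
    Parent: PClassicNode;              // 8 bytes
    FirstChild: PClassicNode;          // 8 bytes
    NextSibling: PClassicNode;         // 8 bytes
    PrevSibling: PClassicNode;         // 8 bytes
    FirstAttribute: PClassicAttribute; // 8 bytes
    Value: String;                     // element name or text value, 8 bytes
  end;

That is roughly 56 bytes per node on a 64-bit system, before counting any of the actual string data. So the linked-list layout by itself doesn’t buy us anything yet.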

Storing a pointer to the previous sibling node is not required, but it improves performance for certain node operations (like deleting nodes). By adding a previous-sibling link you create a doubly linked list, which avoids the linear search that would otherwise be needed to find the node that precedes the one you are removing.

But this organization is key to squishing everything down into 8 bytes. We’ll see how a bit later.

Why 8 Bytes?

But why this seemingly arbitrary choice to try to store all this information in 8 bytes? Well, I got interested in this issue when looking at the pugixml library. This is a popular XML library for C++ that supports a “compact mode”. In compact mode, each node is stored in just 12 bytes. Basically, I wanted to see if I could do better. And also, 12 bytes didn’t seem like an ideal number to me, not being a power of 2 and all.

Another motivation is that I wanted to reduce memory fragmentation by allocating complete memory pages at a time to store the nodes. Then, instead of freeing individual nodes, I could just free a complete memory page, freeing a whole bunch of nodes at once. This model avoids small memory allocations and reduces memory fragmentation.

On all operating systems that Delphi supports, memory pages are at least 4096 bytes (4 KB) in size. On some operating systems, the size of a memory page may be larger, but always a multiple of 4096. In my testing, memory pages are 4 KB on macOS and Android and 16 KB on 64-bit iOS. On Windows, you actually reserve larger blocks that are 64 KB in size (called the ‘allocation granularity’), and then allocate (commit) smaller 4 KB pages in these larger blocks.

The XML library doesn’t assume these page sizes. It uses GetSystemInfo on Windows to check the allocation granularity and page size, and sysconf(_SC_PAGESIZE) on all other platforms to get the page size. To allocate pages, it uses VirtualAlloc on Windows and mmap on all other platforms.
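As a rough sketch (this is not the library’s actual allocation code), requesting such a page-aligned block could look something like this. Note that the exact name of the anonymous-mapping flag differs between the POSIX headers on the various platforms (MAP_ANON vs MAP_ANONYMOUS):

uses
  {$IFDEF MSWINDOWS}
  Winapi.Windows;
  {$ELSE}
  Posix.SysMman;
  {$ENDIF}

function AllocBlock(const ASize: NativeUInt): Pointer;
begin
  {$IFDEF MSWINDOWS}
  // VirtualAlloc returns memory that is aligned to the page size.
  Result := VirtualAlloc(nil, ASize, MEM_RESERVE or MEM_COMMIT, PAGE_READWRITE);
  {$ELSE}
  // mmap also returns page-aligned memory. MAP_ANON may be named
  // MAP_ANONYMOUS depending on the platform headers.
  Result := mmap(nil, ASize, PROT_READ or PROT_WRITE,
    MAP_PRIVATE or MAP_ANON, -1, 0);
  if (Result = MAP_FAILED) then
    Result := nil;
  {$ENDIF}
end;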

So the library allocates 4096 bytes at a time. Using a node size of 8 bytes, this fits 512 nodes.

From Pointers to Indices

So let’s assume for now that our XML documents contain at most 512 nodes. This makes the following discussion a bit easier to follow. I will show later how this limitation is removed.

In this case, we can store all these nodes in a single memory page. This has the following important implication: instead of each node storing pointers to other nodes, we can store indices to other nodes in the same memory page. Since a page can hold up to 512 nodes, an index to another node is in the range 0..511, and it takes just 9 bits to store a value between 0 and 511. So instead of using 32 or 64 bits (depending on platform) to store a pointer, we use just 9 bits, regardless of whether you are running on a 32-bit or 64-bit OS.

A Compact Element

So let’s see how we can partition the 64 bits of a node to store as much information as possible. For Element nodes, we store the following:

Field                             #Bits
Node type                             2
Index of parent node                  9
Index of first child node             9
Index of next sibling node            9
Index of previous sibling node        9
Index of first attribute              9
Index of element name                17

Since we support 4 node types, 2 bits suffice to store the type. We store attributes in the same way as nodes; that is, each attribute is also stored in an 8-byte segment of a memory page. So we also just need 9 bits to store the index of the first attribute.

Index 0 is reserved to represent a nil-node or nil-attribute. So if an element has no attributes, the “Index of first attribute” field will be 0.

This amounts to 2 + 5 × 9 = 47 bits to store the node type and indices. This leaves 17 bits to store the element name. Of course, we cannot store a string in these 17 bits, but we can use these 17 bits as an index into a list of strings. This has the additional advantage that elements with the same name all use the same index into the string list, and the actual string value is only stored once in total, instead of once for each element with that name. Since many database-like XML files use the same element names over and over again, this can lead to considerable savings.

This way of storing strings only once is similar to “string interning”, and is also used in some other libraries.

With 17 bits, we can store up to 131,072 strings in the string list (where index 0 is reserved for an empty string). This doesn’t mean that your XML document can’t have more than 131,072 nodes though: it means that your XML document can’t have more than 131,072 distinct element names, which would be very unlikely.

This is not an entirely accurate statement, since the library uses the same string list to store attribute names (for the same reason). But let’s not overcomplicate things for this discussion.
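A minimal sketch of such an interned name table, using a dictionary for the reverse lookup (again, this illustrates the idea and is not the library’s actual implementation):

uses
  System.Generics.Collections;

type
  TNameTable = class
  private
    FNames: TList<String>;                 // index -> name
    FLookup: TDictionary<String, Integer>; // name -> index
  public
    constructor Create;
    destructor Destroy; override;

    // Returns the index of AName, adding it to the table if needed.
    function Intern(const AName: String): Integer;

    // Returns the name for a previously returned index.
    function Get(const AIndex: Integer): String;
  end;

constructor TNameTable.Create;
begin
  inherited;
  FNames := TList<String>.Create;
  FLookup := TDictionary<String, Integer>.Create;
  FNames.Add(''); // index 0 is reserved for the empty string
  FLookup.Add('', 0);
end;

destructor TNameTable.Destroy;
begin
  FLookup.Free;
  FNames.Free;
  inherited;
end;

function TNameTable.Intern(const AName: String): Integer;
begin
  if (not FLookup.TryGetValue(AName, Result)) then
  begin
    Result := FNames.Add(AName); // TList<T>.Add returns the new index
    FLookup.Add(AName, Result);
  end;
end;

function TNameTable.Get(const AIndex: Integer): String;
begin
  Result := FNames[AIndex];
end;

An element then only needs to store the (at most 17-bit) index returned by Intern.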

A Compact Text/Comment/CData Node

For these types of nodes, we don’t need to store a name and list of child nodes and attributes. So the layout could be as follows:

Field                             #Bits
Node type                             2
Index of parent node                  9
Index of next sibling node            9
Index of previous sibling node        9
String value                         35

After the required bits for the node type and indices, 35 bits remain to store the string value somehow. We could use an index into a string list for this, like we did for element names, but in many cases this is not needed. And since the contents of text nodes are much more varied than element names, there isn’t much to be gained from string interning.

However, a string itself is basically just a pointer to some reference counted piece of memory. So on 32-bit systems, these 35 bits are more than enough to store the string pointer, and that is what we do.

We store the string pointer in such a way that it increases the reference count of the string, so the string will remain alive until we no longer need the node. At that point, we decrease the reference count.
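A common Delphi idiom for this is to typecast the raw pointer to a string variable, so the compiler takes care of the reference counting. This is a sketch of the idea, not necessarily how the library does it internally:

// ARaw must start out as nil (or hold a previously stored string pointer).
procedure StoreString(var ARaw: Pointer; const AValue: String);
begin
  String(ARaw) := AValue; // increments the reference count of AValue
end;

procedure ReleaseString(var ARaw: Pointer);
begin
  String(ARaw) := ''; // decrements the reference count and sets ARaw to nil
end;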

On 64-bit systems though, the value of a string pointer can theoretically take any 64-bit value, and thus wouldn’t fit into 35 bits. You also cannot assume that these values fit into 32 bits even if your app uses far less than 4 GB of memory. For example, I noticed that on some Android devices, the pointer values of strings often exceed the 4 GB virtual address space.

But it is safe to assume that the addresses of almost all strings lie within a 4 GB range of some other arbitrary string. So the library calculates the difference between the actual string pointer and some other arbitrary dynamically allocated memory address. If the result fits in 32 bits, then it is stored in 32 of the 35 available bits. In the (very unlikely) case that the result doesn’t fit, we set one of the 35 bits to 1 to indicate that the string doesn’t fit, and use 32 of the other 34 bits to store an index into a special string list that is used just for this scenario. This all complicates things a bit, but is still safe and efficient.
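Here is a sketch of what this encoding could look like on a 64-bit system. The GBaseAddress and GFarStrings globals below are hypothetical and only serve to illustrate the idea:

uses
  System.Generics.Collections;

var
  GBaseAddress: UIntPtr;       // address of some arbitrary heap allocation
  GFarStrings: TList<Pointer>; // fallback list for out-of-range string pointers

const
  FLAG_FAR_STRING = UInt64(1) shl 34; // one of the 35 bits marks the fallback

function EncodeStringBits(const AStringPtr: Pointer): UInt64;
begin
  var Offset := Int64(UIntPtr(AStringPtr)) - Int64(GBaseAddress);
  if (Offset >= Low(Int32)) and (Offset <= High(Int32)) then
    // The common case: the difference fits in 32 bits, so store it directly
    // (decoding sign-extends it and adds the base address back).
    Result := UInt64(UInt32(Offset))
  else
  begin
    // The rare case: set the flag bit and store an index into a special
    // string list instead.
    GFarStrings.Add(AStringPtr);
    Result := FLAG_FAR_STRING or UInt64(UInt32(GFarStrings.Count - 1));
  end;
end;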

A Compact Attribute

Finally, we need storage space for attributes. An attribute contains a name/value pair and an index to the next attribute in the list:

Field                             #Bits
Index of next sibling (attribute)     9
Index of attribute name              17
Attribute value                      38

Attributes are stored alongside nodes in a memory page, so each attribute uses 8 bytes as well. Again, we use 9 bits for the index of the next attribute. And as we did for element names, we use 17 bits for an index into the (shared) list of attribute and element names.

That leaves 38 bits for the attribute value. This value is stored in the same way as strings are stored for text, comment and CData nodes as described above.
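As an illustration (the bit positions are my assumption; the post only specifies the field widths), packing an attribute into its 64-bit segment could look like this:

const
  ATTR_MASK_INDEX = $1FF;        // 9 bits
  ATTR_MASK_NAME  = $1FFFF;      // 17 bits
  ATTR_MASK_VALUE = $3FFFFFFFFF; // 38 bits

// Assumes the next-attribute index sits in the lowest 9 bits, followed by
// the 17-bit name index and the 38-bit encoded value.
function PackAttribute(const ANextIndex, ANameIndex, AValueBits: UInt64): UInt64;
begin
  Result := (ANextIndex and ATTR_MASK_INDEX)
         or ((ANameIndex and ATTR_MASK_NAME) shl 9)
         or ((AValueBits and ATTR_MASK_VALUE) shl 26);
end;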

A Simple Example

So, what would this look like in practice? Consider the following small XML document:

<foo>
  <bar>baz</bar>
  <bar a1="val1" a2="val2" />
</foo>

This results in the following memory page layout:

This may look a bit complicated but it isn’t too bad if you look at it one piece at a time. The 4096 byte memory page is divided into 512 segments of 8 bytes each (numbered from 0 to 511 in the image). The first segment (segment 0) is not used because index 0 represents a nil-pointer. The image shows the 3 XML elements in light blue, the text node in light green, and the 2 attributes in light red. The arrows indicate how these nodes and attributes are connected to each other.

Elements and attributes have a name index (N) that refers to element and attribute names in a string list (shown in light orange). Text nodes and attributes also have a string value (S) which points directly to a Delphi string as discussed earlier.

TXmlNode API

Let’s take a quick look at how all this can be used to create an XML node API. Since all information is stored in a 64-bit segment of a memory page, an XML node is nothing more than a pointer to a 64-bit integer (PUInt64). The library wraps this inside a record called TXmlNode:

type
  TXmlNodeType = (Element, Text, Comment, CData);

type
  TXmlNode = record
  private
    FNode: PUInt64;
    function GetNodeType: TXmlNodeType; inline;
    function GetParent: TXmlNode;
    ...
  public
    property NodeType: TXmlNodeType read GetNodeType;
    property Parent: TXmlNode read GetParent;
    ...
  end;

As mentioned, the NodeType can be stored using just 2 bits, and “pointers” to other nodes (like Parent) are stored as 9-bit integer indices. By convention, we store the NodeType in the lowest 2 bits of the 64-bit integer, so the property getter looks like this:

function TXmlNode.GetNodeType: TXmlNodeType;
begin
  if (FNode = nil) then
    Result := TXmlNodeType.Element
  else
    Result := TXmlNodeType(FNode^ and 3);
end;

We first perform a sanity check to see if the node is nil. In this library, it is perfectly legal for the underlying pointer to be nil; this means that the actual node is a nil-node. It is also perfectly legal to call methods and use properties of a nil-node. This way, you don’t have to check whether a node is nil every time before you use it, which can simplify your code considerably.
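For example, the following is safe even if Node (some TXmlNode) or its parent happens to be a nil-node:

// The Parent of a nil-node is just another nil-node, and NodeType of a
// nil-node returns the default value (Element), so no nil checks are needed.
var GrandParentType := Node.Parent.Parent.NodeType;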

We’ll take a look at the implementation of GetParent later.

Large Documents

So far we have assumed that all nodes and attributes fit into a single 4 KB memory page. Of course, that is not always the case. In fact, this compact memory layout is especially useful for (very) large documents, otherwise there would be little point in coming up with this design.

Before looking at a solution for this problem, we need to look at how a node locates another (child or parent) node in the same memory page. The only information we have is the index of another node in the same memory page. How can we use this index to locate the node? If there is just one memory page, then we can take the address of this memory page and add 8 × Index to it. But if there are multiple memory pages, then this doesn’t work.

From Node to Memory Page

The node needs to know which memory page it belongs to, so it can use the address of that memory page to locate other nodes. We cannot store the memory page with each node, since we already used up all 64 bits to store other information. However, since we allocate complete memory pages, we don’t actually need to store this information with each node. The reason is that complete pages are always aligned to an address that is a multiple of the page size; in this case, a memory page is always aligned to a 4 KB boundary. We can use this fact to find the memory page that a node belongs to by simply rounding the address of the node down to a multiple of 4 KB. For example, when a node points to an 8-byte segment at address $1FB53248, then it belongs to a memory page that starts at address $1FB53000. Since 4 KB is $1000 in hexadecimal, we can just set the lowest 3 hexadecimal digits to 0 to find the memory page. In code, this can quickly be done with a bitwise AND operator:

const
  BLOCK_SIZE = 4096;

function TXmlNode.GetBlock: PByte;
begin
  Result := PByte(UIntPtr(FNode) and not (BLOCK_SIZE - 1));
end;

The library uses the term “block” instead of “page” since it is possible that a physical memory page contains multiple 4 KB blocks (for example, memory pages on iOS are 16 KB in size, and thus fit 4 blocks).

Linking to Nodes in another Memory Page

Now that we know how to find the memory page for a node, we can tackle the problem of needing multiple pages. The solution I used is to reserve another index value for the case that a referenced node is in another page. Remember, indices take on values between 0 and 511, and we already reserved 0 to represent a nil-node. The library uses the value 511 to indicate that the referenced node is in another page. This is a somewhat arbitrary choice, but using the highest index as the second sentinel (mirroring 0 at the other end) makes some logical sense.

Whenever index 511 is encountered, the library uses a hash map (a kind of dictionary) to look up the referenced node instead. The key of the hash map is a combination of the address of the source node and a number that indicates what kind of node we are looking up (for example, the parent, first child or next sibling). The value of the hash map is the referenced node (that is, a pointer to a 64-bit integer in another memory page). Using a hash map for this purpose adds some memory and CPU overhead. But the idea is that the majority of node references will be in the same memory page, so the hash map is only used in the relatively few situations where this is not the case.
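One possible shape for such a map is sketched below. This is an illustration, not the library’s actual FPointerMap, and the relation ID values are made up. Because nodes and attributes live in 8-byte segments, their addresses are always 8-byte aligned, so the lowest 3 bits of the source address are free to hold the relation ID:

uses
  System.Generics.Collections;

const
  // Relation IDs; the values used by the actual library may differ.
  ID_PARENT      = 0;
  ID_FIRST_CHILD = 1;
  ID_NEXT        = 2;
  ID_PREV        = 3;
  ID_FIRST_ATTR  = 4;

var
  PointerMap: TDictionary<UIntPtr, PUInt64>; // created by the document

// Node addresses are 8-byte aligned, so the lowest 3 bits can hold the ID.
function MakeKey(const ANode: PUInt64; const AId: Integer): UIntPtr;
begin
  Result := UIntPtr(ANode) or UIntPtr(AId);
end;

procedure PutLink(const ANode: PUInt64; const AId: Integer;
  const ATarget: PUInt64);
begin
  PointerMap.AddOrSetValue(MakeKey(ANode, AId), ATarget);
end;

function GetLink(const ANode: PUInt64; const AId: Integer): PUInt64;
begin
  if (not PointerMap.TryGetValue(MakeKey(ANode, AId), Result)) then
    Result := nil;
end;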

But how does the node know where to find the hash map? Again, all 64 bits for the node storage are already used. Here we can use another trick: Since indices 0 and 511 are not used, this means that the first 8 bytes and the last 8 bytes of each memory page aren’t used either. So we use the first 8 bytes to store a pointer to the XML document object that owns the memory page. The last 8 bytes are currently not used.

We haven’t talked about an XML document class yet. This is the only public class in the library. It is exposed through the IXmlDocument interface and owns all memory pages (and thus all nodes). It also owns the hash map that is used to locate out-of-page nodes.
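With that in place, a GetDocument helper on TXmlNode (it is used in the GetParent example below) could be as simple as this sketch; the actual implementation may differ:

function TXmlNode.GetDocument: TXmlDocument;
begin
  // The first 8-byte segment of a block never holds a node (index 0 means
  // nil), so it stores a pointer to the document that owns the block.
  Result := TXmlDocument(PPointer(GetBlock)^);
end;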

TXmlNode.GetParent Example

With this information, let’s take a look at how the getter for the Parent property is implemented. As a recap, the lowest 2 bits of a node represent the node type (see the TXmlNode.GetNodeType example earlier). By convention, the next 9 bits are the index of the parent node:

const
  NIL_BITS  = 0;    // index 0: the referenced node is a nil-node
  HASH_BITS = $1FF; // index 511: the referenced node lives in another block

function TXmlNode.GetParent: TXmlNode;
begin
  if (FNode = nil) then
    Result.FNode := nil
  else
  begin
    var Bits: UIntPtr := (FNode^ shr 2) and $1FF;
    if (Bits = NIL_BITS) then
      Result.FNode := nil
    else if (Bits = HASH_BITS) then
    begin
      var Doc := GetDocument;
      Result.FNode := Doc.FPointerMap.Get(FNode, ID_PARENT);
    end
    else
    begin
      var Block := GetBlock;
      Result.FNode := Pointer(Block + (Bits shl 3));
    end;
  end;
end;

Let’s look at this step by step:

  • Again, first we check if the node itself is nil, and return the default value if that is the case. Remember that this enables the user to safely access properties and methods of a nil-node.
  • Otherwise, we retrieve the 9 index bits. This can easily and quickly be done with some bit shifting and a bitwise AND operation. Since the lowest 2 bits are used for the node type, we need to shift the value 2 bits to the right, and then AND the result with $1FF (511) to extract the lowest 9 bits.
  • If the index has the predefined value 0 (NIL_BITS), then it references a nil-node.
  • Otherwise, if the index has the other predefined value 511 (HASH_BITS) then the hash map is used to locate the parent (ID_PARENT). To get to the hash map, we first need to get to the document that owns this node. As mentioned earlier, a reference to the document is stored in the first 8 bytes of the memory page.
  • For all other indices, the parent node is in the same memory page (block). So we retrieve the block (as shown earlier) and add 8 times the index (using a left shift by 3) to locate the parent node.

These are the main algorithms used to create a very light-weight DOM. For more details you can check the source code.

Performance Comparison

So how much memory does this library save compared to other XML libraries? Before I get into that, first some notes about this library.

The library is a personal project of mine called Neslib.Xml, available on GitHub. It is not only light-weight in memory (and CPU) usage, but also in features. It has fewer features than many other XML libraries and focuses only on the core features that should satisfy most use cases:

  • It can read and write XML from and to files, streams or strings.
  • It supports source files in UTF-8, UTF-16 (BE and LE) and UTF-32 (BE and LE) formats, but only through a BOM marker. It ignores any “encoding” attributes in the XML declaration. If there is no BOM marker, it assumes UTF-8. File output is always in UTF-8 format.
  • It supports comments and CData sections.
  • It has the usual APIs to traverse a DOM, create a DOM and search for immediate children by element or attribute name.
  • It can be compiled in either Unicode (default) or UTF-8 mode for even lower memory usage.

But it doesn’t support the following features:

  • There is no explicit support for namespaces. You can still use namespaces, but they are just considered part of a name.
  • There is no support for DTDs (<!DOCTYPE ...>) or validating against a DTD.
  • There is no support for custom entity references. Only the standard character references (&amp;, &lt; etc.) are supported.
  • There is no support for processing instructions (<? ... ?>) and declarations (<?xml ...?>). They may still be present in an XML file, but are ignored. When writing an XML file, a standard XML declaration is used.
  • There is no support for XPath or XSLT.

If any of those features are important to you, then check out the other libraries used for the comparisons.

Candidates

The following other XML libraries are used for the comparisons below:

You can reproduce these performance tests yourself using the XmlPerfTests app in the Neslib.Xml repository. Please let me know if you think I implemented one or more tests incorrectly. I want to be fair in my comparisons.

Memory Usage Tests

These tests load the 24 MB nasa.xml file from the XML Data Repository into a DOM. For the Neslib.Xml library, this is done with the following code:

var Document := TXmlDocument.Create;
Document.Load('nasa.xml');

Note that TXmlDocument.Create is a class function that returns an IXmlDocument interface, so memory management is automatic.

As I hoped, Neslib.Xml has the lowest memory usage. In UTF-8 mode, it uses only slightly more memory than the size of the XML document itself. Also, the difference between the 32-bit and 64-bit versions is pretty small. This is mostly because Neslib.Xml uses 9-bit indices to reference other nodes, regardless of platform, whereas other libraries mostly use pointers, which are twice as large on 64-bit platforms.

I only used the 32-bit version of OXml for testing because the demo version is not available for 64-bit Windows.

Load Time Tests

I expected that loading a large XML file would take longer in my library, because of the additional code needed to manage 9-bit indices:

But the opposite is true. It looks like the CPU time needed for node management is more than compensated for by decreased memory usage and fragmentation. I assume this is mostly because my approach is more cache-friendly and results in fewer (expensive) cache misses. Only DIXml (which is a wrapper around libxml2) performs comparably or slightly better.

This chart is truncated at 2,500 ms because SimpleXML takes over 90,000 ms (or 90 seconds) to load, which would make the chart unreadable otherwise.

Traversal Tests

The traversal tests just iterate over all nodes and attributes in the document. For the Neslib.Xml library, this is done using the following recursive code:

procedure Traverse(const ANode: TXmlNode);
begin
  var Attr := ANode.FirstAttribute;
  while (Attr <> nil) do
  begin
    MarkAttribute(Attr.Name, Attr.Value);
    Attr := Attr.Next;
  end;

  var Child := ANode.FirstChild;
  while (Child <> nil) do
  begin
    case Child.NodeType of
      TXmlNodeType.Element:
        begin
          MarkElement(Child.Value);
          Traverse(Child);
        end;

      TXmlNodeType.Text:
        MarkText(Child.Value);
    end;
    Child := Child.NextSibling;
  end;
end;

The Mark* methods are used to count the number of elements, attributes and text nodes in the document. These counts are used to check if the document loaded correctly.

The library also supports enumerators, which you may find easier to use to enumerate nodes and attributes.
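The exact enumerator API isn’t shown in this post, but assuming TXmlNode exposes its child nodes and attributes as enumerables (check the source for the actual names), the loops above might be written roughly like this:

// A sketch only: 'ANode.Attributes' and enumerating ANode itself for its
// child nodes are assumed API shapes, not confirmed by this post.
for var Attr in ANode.Attributes do
  MarkAttribute(Attr.Name, Attr.Value);

for var Child in ANode do
  if (Child.NodeType = TXmlNodeType.Element) then
    Traverse(Child);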

Traversal is also very fast, on par with DIXml and OXml.

Again, this chart is truncated at 250 ms since the Delphi and MSXML implementations take over 2400 ms and 1200 ms respectively.

Query Tests

The query tests walk through all the datasets in the document, and for each dataset it searches for the first element named ‘tableHead’, then searches that element for the first element named ‘fields’, and so on, a few levels deep. Finally, it checks whether a ‘units’ element has a text node with value ‘arcsec’. If so, it increments a counter.

I have written these tests without using XPath, since not all libraries (including mine) support XPath. For Neslib.Xml, the query is done using this code:

function QueryArcsecFields: Integer;
begin
  Result := 0;
  var Datasets := FDocument.DocumentElement;
  var Dataset := Datasets.FirstChild;
  while (Dataset <> nil) do
  begin
    var Units := Dataset.ElementByName('tableHead')
                        .ElementByName('fields')
                        .ElementByName('field')
                        .ElementByName('units');

    var Text := Units.FirstChild;
    if (Text.NodeType = TXmlNodeType.Text) 
      and (Text.Value = 'arcsec')
    then
      Inc(Result);

    Dataset := Dataset.NextSibling;
  end;
end;

Note that any of the ElementByName methods can return a nil-node. It is perfectly legal to use methods or properties on a nil-node, so you don’t have to check for this at every step. This can simplify your code considerably.

Again, Neslib.Xml is very fast for this use case.

And again, the chart is truncated because MSXML takes over 75 ms for this test.

Cleanup Tests

Finally, I measured how long it takes to clean up (or destroy) a document. Since an XML document is an interface in Neslib.Xml (as it is in many other libraries), this is just a matter of setting this interface reference to nil.
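In code, the entire cleanup is:

// Releases the document; all of its memory pages (and thus all nodes and
// attributes) are freed in one go.
Document := nil;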

Here, the record-based OXml implementation beats Neslib.Xml and is about twice as fast. But still, Neslib.Xml is very fast compared to most other libraries.

I had to truncate the chart once more because the default Delphi implementation takes over 900 ms here.

In Closing

If you regularly work with large XML files and need to load them into a DOM, then try out Neslib.Xml. It uses some interesting algorithms to consume as little memory as possible. The library is not as feature-rich as many other ones, but it may be sufficient for your needs.

31 thoughts on “An XML DOM with just 8 bytes per node”

  1. Good job on that XML library and very nice idea with the reduced “pointer” system. And looks like I have to take a look why VerySimpleXml doesn’t load the file correctly.

    1. Seems the nasa.xml is “broken” in some way. In line 656394 you’ll find a text containing this:
      “for the non-emission population and T_X_>4keV at 90% confidence).”
      An unescaped >. So it’s possible your lib has the same issue as mine (as I just noticed) and gets confused on this. It seems they escape
      Is this even ok?!

    1. Glad you like the library. Sorry about putting you on the spot like this. Your library does load the XML file, but I come up with different counts for the number of text nodes compared to my and the other libraries. Maybe because your library handles whitespace differently. I haven’t looked into this.

      You can check for yourself by running the XmlPerfTests app in the Neslib.Xml repo. Please let me know if I am using your library incorrectly, or if I should configure it differently to match the expected output. I would be happy to update the results accordingly.

      1. I’ll take a look. Aside from that, you don’t need to use such a complicated QueryArcsecFields in the VerySimpleXml test (just a while-loop like yours should work, too). But I’ll first see why my lib has a lot more nodes than expected 😉

  3. I also made an XML parser: https://github.com/benibela/internettools

    But for FreePascal and primarily to load HTML.

    But my nodes are 96 bytes. They used to be smaller when I did not have support for namespaces and processing instructions (which are useless for HTML), but then I had to add them to have fully standard-conformant XPath support.

    I need to make them smaller like you.

    I wonder if mmap is really portable to all platforms and would work in FreePascal

    1. As far as Delphi is concerned, mmap works on all supported operating systems except Windows. I don’t know about the additional platforms that FreePascal supports, but as long as they are based on a Linux/POSIX kernel, mmap should be available. Otherwise, you would need to check the documentation for that OS.

      1. Besides the OS there might be conflicts with FreePascal’s memory manager. Perhaps the manager assumes it can allocate some pages and crashes if it can’t, when they are already allocated outside the memory manager? But it does not look like it, FreePascal also just calls mmap with a nil address (there the mmap function is called fpmmap as wrapper around the syscall) . And on Windows they call HeapAlloc.

        I have been thinking about using this for my XPath interpreter. But there are three things missing for conformant XPath. Namespaces, processing instructions and perhaps most importantly a document order comparison.

        XPath 2+ uses lists of nodes, not sets of nodes, and the lists need to be sorted in document order. How could you efficiently test whether one node comes before another node in the document? Enumerate all their parents and then check for duplicates? That is slow.
        Keeping the nodes in each block sorted would work well. Then nodes within the same block just need a pointer comparison. And each block could have an index (in the unused last 8 bytes of it?). The nodes are probably sorted after loading the XML file? But inserting nodes would mess the order up. It would need to move all following nodes on an insert. Or just create a new block after the block with the insertion and move the nodes there.

        Sorted nodes can be used for other things. You would not need the next attribute index anymore. The next attribute would always be the next node in the block or the first node of the next block. That could be used to allow more names or something.

        Then processing instructions. They could be added rather straightforwardly by removing CDATA nodes. There is no need for CDATA nodes, they are just text nodes. Nevertheless text nodes also have two bits to spare, they could be used to mark the text node as CDATA text node.

        How would it affect the performance when the offset between the actual string and the base address is calculated with XOR? XOR is simpler than addition, but if x86 uses lea for the addition, the addition might be faster? Unless Delphi adds an overflow check for the addition. And on ARM simpler operations should be faster.

        You also could write an academic paper about it

      2. I don’t know about the internals of the FreePascal memory manager. The library requires that memory pages are aligned to a 4KB boundary. That’s why I use VirtualAlloc on Windows and mmap on other platforms, since these APIs guarantee that memory is aligned. I don’t know why using mmap directly would interfere with FreePascal’s memory manager.

        I intentionally want to keep this library small, and had to drop some features (like explicit namespaces and processing instructions) to fit things in 64 bits.

        When you parse an XML file, the nodes are in the order in which they appear in the XML file (both logically and in memory). When you manually add nodes to an existing XML DOM, you can decide where you want to place them (to keep them sorted according to your requirements). That way they will still be in “logical” order, but the “physical” order in memory may be different after a mutation (especially if you also delete nodes).

        The attribute index is needed for the same reason: if you delete or add attributes, they may not be next to each other in memory anymore (for example, if you add a node, and then add an attribute to another, earlier node, then the memory for the attributes will not be contiguous anymore).

        I haven’t tried using XOR for calculating the string offset. This is an interesting idea. It may improve performance a bit, although often XOR and ADD use the same amount of CPU cycles. Also, this calculation is only needed on 64-bit systems. Would be an interesting experiment though…

      1. Yes, I definitely will keep using inline variables. Was a bit reluctant with Delphi 10.3, when IDE support was not that great. But in Delphi 10.4 with the new Language Server Protocol, inline variables work very nicely. Especially when used with type inference, which can save code and time.

        We do not try to make our code work with FreePascal or old Delphi versions. We are not component vendors that keep a variety of Delphi versions installed. We just share some of the code that we use for our day jobs. And in our day jobs, we use the latest Delphi version.

        But you can always fork this repository and make it compatible with FreePascal or older Delphi versions. The license is very liberal…

  4. >I don’t know why using mmap directly would interfere with FreePascal’s memory manager.

    Then it probably does not interfere.

    Do you call munmap anywhere?

    >I intentionally want to keep this library small, and had to drop some features (like explicit namespaces and processing instructions) to fit things in 64 bits.

    But there are still unused bits, so more things could be added

    >When you parse an XML file, the nodes are in the order as they appear in the XML file (both logically and in memory). When you manually add nodes to an existing XML DOM, you can decide where you want to place it (to keep it sorted using your requirements). That way it will still be in “logical” order, but the “physical” order in memory may be different after a mutation (especially if you also delete nodes).

    I do not want to sort the nodes in the DOM.

    When I have an unsorted array of nodes (that are already in the DOM), I need to sort the array, so that the nodes in the array have the same order as the nodes in the DOM.

    So that needs a quick way to compare nodes according to that order

    >But you can always fork this repository and make it compatible with FreePascal or older Delphi versions. The license is very liberal…

    Perhaps I will reimplement it from this blog post. Eventually, in a few years

    I am only supposed to implement things that I can publish an academic paper about

  5. Interesting read – thanks for documenting it. I’ll apply some of these techniques to my own C++ text tree class to reduce memory footprint, as I’ve been dismayed at the overhead (I just need to steal a few more bits for the node type info since mine represents JSON/XML/INI, but all the ideas are applicable).

    Have you considered using signed relative indices (int8) for next and previous? Since siblings are typically only a few nodes apart, this should reduce the average number of hash lookups (index 511) near the block boundaries. It would be worse for single block documents (if everything fits into 512 nodes, and siblings are >256 nodes apart), but it should be fewer lookups for large documents, because if you think about the largest accessible distances from any node in a block to its siblings, relative indices have less variance than bit-truncated absolute indices.

    1. Thanks. It may actually be a bit easier to implement in C++ because of the bitfield feature in the C language.

      Using a signed integer can be a good idea. I haven’t considered that. It is probably only beneficial for the Previous and Parent fields, since the other indices should “point” to nodes further down the block. You could try it out to see the effects on different types of documents. I would be interested in hearing your results.

  6. Thank you for this great article.

    Neslib.Hash has a MurmurHash2 function. I am somewhat perplexed that AData is passed as a const integer. Could you give me a heads-up on why that is (and how it works for both 32-bit and 64-bit programs), rather than directly passing a PByte? What is the trick here?

    function MurmurHash2(const AData; ALen: Integer): Integer;

    1. AData is passed as an untyped const parameter. This is an old Delphi/Pascal language feature which basically means that AData is passed as a pointer behind the scenes. You will see the same pattern in TStream.Read/Write for example. The Object Pascal language guide may be better at explaining this…

      1. Ah! Thank you. Sorry, I missed that it is an untyped const. My eyes cheated me – I thought it was const AData: Integer, and got confused. Now it makes sense. Thanks again.

  7. Thank you for this great article.
    I’ve tried to use Neslib.Xml to write an XML file which contains some identical node text, but it failed. Is there a way to write an XML file like the one below:

    131232

    12234

  8. Thank you, I really appreciate your work.
    I program Delphi as a hobby.
    I used your library with a little change and addition, and look what great results I got: I was able to completely eliminate the use of record and class/object type declarations for structured data, because of their complexity, rigidity, and memory consumption. I have used TXmlNode for structured variable storage. I can even store a 32-bit number in the 64 bits of a TXmlAttribute itself without spending extra space on it.
    I also don’t need to care about how the data will be saved, because I just call ToXML.
    I have used your xml library even for tasks that had nothing to do with xml.
