Leaves as unique type, vs single node type.
The leaf elements of the tree data structure don't need as much information as their parent nodes. They don't have dividers. So half of the nodes of the data structure have a field that stores no information. It can be tempting to give the leaf nodes a separate type to improve memory usage, but the divider size is small, and the number of nodes in general is relatively small compared to the number of aabbs. The leaves would also have to live in their own container (or you'd have to deal with alignment problems), which would lead to more complexity while iterating down the tree.
Tree structure data separate from elements in memory, vs intertwined
There is a certain appeal to storing the tree elements and tree data intertwined in the same piece of contiguous memory. Cache coherency might be improved since there is very little chance that the tree data, and the elements that belong to that node are not in the same cache block. But there is added complexity. Because the elements that are inserted into the tree are generic, the alignment of the objects may not match that of the tree data object. This would lead to wasted padded space that must be inserted into the tree to accommodate this. Additionally, while this data structure could be faster at querying, it would be slower if the user just wants to iterate through all the elements in the tree. The cache misses don't happen that often since the chance of a cache miss only happens on a per node basis, instead of per element. Moreover, I think the tree data structure itself is very small and has a good chance of being completely in a cache block.
Dfs in-order vs Dfs pre-order vs Bfs order.
The main primitive that they use is accessing the two children nodes from a parent node. Fundamentally, if I have a node, I would hope that the two children nodes are somewhat nearby to where the node I am currently at is. More importantly, the further down the tree I go, I would hope this is more and more so the case (as I have to do it for more and more nodes). For example, the closeness of the children nodes to the root isn't that important since there is only one of those. On the other hand, the closeness of the children nodes for the nodes at the 5th level of the tree are more important since that are 32 of them.
So how can we layout the nodes in memory to achieve this? Well, putting them in memory in breadth first order doesn't cut it. This achieves the exact opposite. For example, the children of the root are literally right next to it. On the other hand the children of the most left node on the 5th level only show up after you look over all the other nodes at the 5th level. Depth first search gives us the properties that we want. With this ordering, all the parents of leaf nodes are literally right next to them in memory. The children of the root node are potentially extremely far apart, but that is okay since there is only one of them.
Dfs in order uses more memory during construction from during recursion, but it gives you the nice property of the following: If a node is to the left of another node in space, then that node is to the left of that node in memory. Not sure when this property would every be useful, however.
Dfs pre-order and post-order might be more cache friendly. With these layouts, there is just one memory segment that shrinks in size as you recurse. This gives you a nice property you don't have with in-order. Given a node, you can represent all the aabbs underneath it as one contiguous slice. This property is relied on in the nbody impl. broccoli uses dfs pre-order.