The broccoli crate's goal is to provide broad-phase queries such as "find all elements that intersect". Its data structure is basically a kdtree with the added feature that elements that belong to a node are sorted along the divider axis so that we can use sweep and prune during the query phase. Lets compare the complexity of finding all intersecting pairs using four different methods: kdtree, sweep and prune, naive, and broccoli.

broccolinaivenosortsweepComplexity of space partitioning algs with abspiral(x,1.5)Number of ElementsNumber of Comparisons01e62.0e63.0e64.0e602000400060008000

As you can see, broccoli is a clear winner in terms of minimizing the number of comparisons. The jumps that you see in the kdtree line are the points at which the trees height grows. It is a complete binary tree so a slight increase in the height causes a doubling of nodes so it is a drastic change. As the number of aabbs increases it is inevitable that sometimes the tree will be too tall or too short. Like kdtree, broccoli is also a complete binary tree so it also have jumps, but they are less pronounced. I think this is because the second strategy of sweep and prune can "absorb" the jumps.

Lets make sure that broccoli is still the winner if the aabbs are more clumped up.

broccolinaivenosortsweepComplexity of space partitioning algs with abspiral(x,0.6)Number of ElementsNumber of Comparisons02e64e66e68e61e702000400060008000

Okay good broccoli is still the winner against its building blocks in terms of comparisons.

So thats great that we've found a strategy that minimizes the comparisons, but that doesn't really mean anything unless the real world performance is also just as fast. Lets look at how it benches against the same set of strategies.

broccbrocc_parnaivenosortnosort_parsweepsweep_parBench of space partitioning algs with abspiral(x,1.5)Number of ElementsTime in Seconds0e00.0050.01002000400060008000

Here we have also plotted the parallel versions that uses the rayon work-stealing crate under the hood. This is benched on my quad-core laptop. It is interesting to note that the real world bench times follow the same trend as the theoretical number of comparisons. Again, we can see that broccoli wins. Parallel broccoli beats parallel kdtree, and ditto for sequential versions. Lets make sure things don't change when elements are more clumped up.

broccbrocc_parnaivenosortnosort_parsweepsweep_parBench of space partitioning algs with abspiral(x,0.6)Number of ElementsTime in Seconds0e00.0020.0040.0060.0080.01002000400060008000

Okay good its still the same results. In fact you can see that the parallel kdtree algorithm is suffering a bit with the extra clumpiness. Earlier it was on par with sequential broccoli, but now it is definitively worse.

It's worth noting that the difference between sweep and prune/kdtree and naive is much bigger than the difference between sweep and prune/kdtree and broccoli. So using these simpler algorithms gets you big gains as it is. The gains you get from using broccoli are not as pronounced, but are noticeable with more elements.

In the same vein, you can see that there aren't many gains to use broccoli_par over broccoli. It can double/quadruple your performance, but as you can see those gains pale in comparison to the gains from simply using the a better sequential algorithm. Thats not to say multiplying your performance by the number of cores you have isn't great, it's just that it isn't as big a factor. This to me means that typically, allocating effort on investigating if your algorithm is optimal sequentially may be better than spending effort in parallelizing what you have.

Up until now, we have been looking at trends of how the algorithms preform as we increase the number of aabbs that it has to find intersections for. Lets instead look at trends of what happens when we change how clumped the aabbs are instead.

broccolinaivenosortsweepComplexity of space partitioning algs with abspiral(3000,grow)GrowNumber of Comparisons02e64e66e60.200.250.300.350.400.450.500.55

Okay broccoli is still the best. We didn't include naive because it dwarfed the other algorithms so that you couldn't see the differences between the non naive algorithms. Lets look at bench times:

broccbrocc_parnaivenosortnosort_parsweepsweep_parBench of space partitioning algs with abspiral(30_000,grow)GrowTime in Seconds0e00.050.100.150.200.250.200.250.300.350.400.450.500.55

Same trends. Again naive was excluded since it just dwarfed everything.

Extremely clumped up case

So far we've looked at "reasonable" levels of clumpiness. What happens in extremely clumped up cases?

broccolinaivenosortsweepComplexity of space partitioning algs with abspiral(3000,grow)GrowNumber of Comparisons05e61e71.5e70.020.040.060.080.100.120.140.160.18

The naive algorithm I used is not 100% naive. While it does check every possible pair, it first checks if a pair of aabb's collides in one dimension. If it does not collide in that dimension, it does not even check the next dimension. So because of this "short circuiting", there is a slight increase in comparisons when the aabbs are clumped up. If there were no short-circuiting, it would be flat all across.

Broccoli reduces to sweep and prune if it can't take advantage of its tree property, which it can't in extremely clumped up cases. However, it seemingly still does better than sweep and prune. I think this is related to the fact that sweep and prune sweeps across x, but still checks that the rects intersect along the y all the time. Broccoli however, doesn't check the y axis for bots if it knows they lie on the divider line. Bots that lie on an divider line are already guarenteed to intersect on the y axis.

Now lets look at the benches.

broccbrocc_parnaivenosortnosort_parsweepsweep_parBench of space partitioning algs with abspiral(10_000,grow)GrowTime in Seconds0e00.050.100.150.200.020.040.060.080.100.120.140.160.18

Above we benched for a smaller number of elements since it simply takes too long at this density to bench 30_000 elements like we did in the non-extremely-clumped-up case earlier. This graph looks extremely weird!

broccoli par reduces to broccoli sequential when the tree has size one, which is does in extremely clumped up cases.

Fairness

It's important to note that these comparisons aren't really fair. With broccoli, we are focused on optimising the finding colliding pairs portion, but these comparisons are comparing construct + one call to finding colliding pairs. However, we can't really show a graph of just the query portion, because other algorithms can't be easily split up into a construction and query part. Perhaps a better test would be to compare multiple query calls. So for each algorithm with one set of elements, find all the colliding pairs, then also find all the elements in a rectangle, then also find knearest, etc.

Sweep vs Kd-Tree vs Both

Sweep and prune is a nice and simple AABB collision finding system, but it degenerates as there are more and more "false-positives" (objects that intersect on one axis, but not both). Kd Trees have a great recursive way of pruning out elements, but non-point objects that can't be inserted into children are left in the parent node and those objects must be collision checked with everybody else naively. A better solution is to use both.

The basic idea is that you use a tree up until a specific tree height, and then switch to sweep and prune, and then additionally use sweep and prune for elements stuck in higher up tree nodes. The sweep and prune algorithm is a good candidate to use since it uses very little memory (just a stack that can be reused as you handle descendant nodes). But the real reason why it is good is the fact that the aabbs that belong to a non-leaf node in a kd tree are likely to be strewn across the divider in a line. Sweep and prune degenerates when the active list that it must maintain has many aabbs that end up not intersecting. This isn't likely to happen for the aabbs that belong to a node since the aabbs that belong to a node are guaranteed to touch the divider. If the divider partitions aabbs based off their x value, then the aabbs that belong to that node will all have x values that are roughly close together (they must intersect divider), but they y values can be vastly different (all the aabbs will be scattered up and down the dividing line). So when we do sweep and prune, it is important that we sweep and prune along axis that is different from the axis along which the divider is partitioning, otherwise it will degenerate to practically the naive algorithm.

KD tree vs Quad Tree

I think the main benefit of a quad tree is that tree construction is fast since we don't need to find the median at each level. They also have a interesting relationship with z order curves.

But that comes at a cost of a potentially not great partitioning of the physical elements. Our goal is to make the querying as fast as possible as this is the part that can vary and dominate very easily in dense/clumped up situations. The slow construction time of the kdtree is not ideal, but it is a very consistent load (doesn't vary from how clumped the elements are).

KD trees are also great in a multi-threaded setting. With a kd tree, you are guaranteed that for any parent, there are an equal number of objects if you recurse the left side and the right side since you specifically chose the divider to be the median.

This means that during the query phase, the work-load will be fairly equal on both sides. It might not be truly equal because even though for a given node, you can expect both the left and right sub-trees to have an equal number of elements, they most likely will be distributed differently within each sub-tree. For example the left sub-tree might have all of its elements stuck in just one node, while the right sub-tree has all of its elements in its leaf nodes. However, the size of each sub-tree is a somewhat good estimate of the size of the problem of querying it. So this is a good property for a divide and conquer multi-threaded style algorithm. With a quad tree, the load is not as likely to be even between the four children nodes.

Tree space partitioning vs grid

I wanted to make a collision system that could be used in the general case and did not need to be fine-tuned. Grid based collision systems suffer from the teapot-in-a-stadium problem. They also degenerate more rapidly as objects get more clumped up. If, however, you have a system where you have strict rules with how evenly distributed objects will be among the entire space you're checking collisions against, then I think a grid system can be better. But I think these systems are few and far in between. I think in most systems, for example, its perfectly possible for all the objects to exist entirely on one half of the space being collision checked leaving the other half empty. In such a case, half of the data structure of the grid system is not being used to partition anything. There are also difficulties in how to represent the data structure since every grid cell could have a variable number of aabbs in side of it. Having a Vec in each cell, for example, would hardly be efficient.

The way liquid fun does collisions by using grids in one dimension and sweep and prune in the other.

broccoli vs BVT

I'm not sure how broccoli stacks up against a bounding volume tree. This is something I would like to investigate in the future. It would be interesting to compare against bottom up and top down constructions of BVT seeing as KD Tree are top down by nature.