I can think about two methods to code with C++ and MPI to simulate the N-Body problem using Barnes-Hut, but not sure which one is more efficient or if there is any other better methods.
1. First, one processor holds all the information of bodies for the entire system, builds a quadtree using that information, and then distributes subtrees to all other processors. Then each processor calculates the forces on bodies using the subtree it received. How would you distribute the quadtrees while keeping in mind load balancing issues? If the subtree that processor A receives is a subtree of processor B's subtree, what do you do so that forces between bodies aren't considered more than once? Is this an efficient method of parallelizing the algorithm?
2. Each processor receives a certain number of bodies, and builds its own quadtree using those bodies. Then, one after the other, each processor will send out its own bodies to all other processors, and have the processors calculate the forces on its own bodies. Afterwards, the information will be gathered back at the original processor that sent out the bodies. One main problem is that collisions are not detected if two bodies are on different processors. Is there any way to detect collisions without checking every body against every processor's bodies? Also, would this method yield correct results (forces/positions of the bodies)? This is my current implementation.
3. Any other better methods?