I am currently using RVOControllers, which let me control everything easily and automatically. But I noticed that there is also LightWeightRVO, which uses neither RVOControllers nor GameObjects. As I am working on handling a large number of units, I am curious what the most important reason is for LightWeightRVO being so many times faster. Is it due to object pooling, or because it does not use Transform? P.S. I noticed that Transform sometimes behaves in weird ways: e.g. if there are GOs without children and without components, I can use more than 10k of them on every single update. If there are mesh renderers, and especially skinned mesh renderers, these numbers go down to the classical several hundred, at which point the FPS becomes unplayable.
I noticed that LightWeightRVO uses random streams without pathfinding (objects move along straight lines between points A and B). Could this be another reason for the performance difference?
Finally, let’s say I choose to use the LightWeightRVO approach and set up all my agents like in the example. Do I need to worry about the AIPath components, which would still be attached to each GO if I want to move my agents along paths rather than in random streams?
Hi
Primarily it is the graphics cost and the cost of simply calling all the Update methods.
A draw call is not free, if you use many of them, things will be slow: http://docs.unity3d.com/Manual/DrawCallBatching.html
Just calling Update on a bunch of scripts can be slow due to the transition from C++ to C#: http://blogs.unity3d.com/2015/12/23/1k-update-calls/
Pathfinding is not used since it would take some time to calculate the initial 10000 paths.
Skinned meshes are relatively slow since they need to update the geometry every frame.
Hi,
Thanks for the reply. In my case, I don’t use skinned meshes anymore, as I started to use DrawMesh and empty gameObjects. DrawMesh just draws the mesh at the positions where the empty gameObjects are placed.
So the main performance eaters, which I am still confused about, are AIPath and RVOController. I looked at deep profiling and this is what it looks like:
So at the moment it seems that AIPath.CalculateVelocity() and the RVOController self-update take up the majority of the time. In CalculateVelocity() there is a while loop (I think over the waypoints). Do you think this one can be very significant? Can performance also be drained by accessing the Transform component’s vectors and quaternions many times?
Hi
Yeah, AIPath is not the fastest script. It does use a bunch of semi-fancy calculations to make the movement look nicer. If you want a faster script I would suggest using the AILerp script, possibly combined with e.g. the SimpleSmoothModifier.
Ok, so I managed to get quite good results by using an approach similar to the one in the LightweightRVO example. To do that, I just used the direction to the next waypoint of the agent’s path to calculate the desired velocity for RVO. For smooth rotations I used Vector3 calculations with manually calculated angles rather than quaternions, which gives a nice and smooth agent rotation towards the movement direction. As a result, I can easily handle up to 1000-2000 units.
I was curious to investigate the RVOSimulator parameters for a large number of agents (from 1000 to 10 000). I noticed that there is a very sharp drop in game FPS from 100 down to around 20, starting at 2k agents. However, it strongly depends on desiredSimulationFPS, i.e. I am getting this sharp drop with desiredSimulationFPS~30 for 2k agents, but with desiredSimulationFPS~15 the drop appears only at 3k agents. Here are more detailed graphs of what it looks like:
I also got a very clear relation for the desiredSimulationFPS at which this sharp drop occurs as a function of the number of agents.
So I was wondering where this sharp FPS drop could be coming from (maybe some neighbour tree building), and whether it is possible to do something about it.
P.S. I can also include my testing scripts if needed, just let me know.
Hi
Wow, you have done some nice analysis here.
If you are using multithreading together with double buffering, the drop may occur when it takes longer than a whole time step to calculate all velocities in another thread. Before that point, the rvo calculations should essentially not affect the game performance at all, but after it, I would expect the ms/frame to increase linearly with the number of agents.
You may want to profile the code inside the rvo simulator.
Also, for future analysis I would suggest that you measure the number of ms a frame took (deltaTime) instead of the fps as the fps has a not so nice inverse relationship to the work done per frame.
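To illustrate the inverse relationship, here is a minimal Python sketch (not code from the project; the helper names fps_from_ms and ms_from_fps are made up for this example). It shows that equal steps in frame time produce very unequal steps in fps:

```python
def fps_from_ms(frame_ms):
    """Frames per second for a given frame time in milliseconds."""
    return 1000.0 / frame_ms


def ms_from_fps(fps):
    """Frame time in milliseconds for a given frames-per-second value."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Each step adds the same 5 ms of work per frame,
    # but the fps value falls by a different amount every time.
    for frame_ms in (5.0, 10.0, 15.0, 20.0):
        print(f"{frame_ms:5.1f} ms/frame -> {fps_from_ms(frame_ms):6.1f} fps")
```

Read the other way around, the drop from 100 fps to 20 fps mentioned earlier in the thread is a change from 10 ms to 50 ms per frame, which is the number that actually scales with the work done.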
Thanks.
So I looked a bit more into the details to compare the cases where double buffering is enabled and disabled. When double buffering is enabled, I am able to run more agents with the same deltaTime (dt). It seems that the sharp rise of dt also occurs for larger desiredSimulationFPS and a small number of agents; it becomes less sharp for lower desiredSimulationFPS.
Hi
What is the distribution of the agents?
Are they uniformly spread out?
I am a bit perplexed at the second graph. For a low desired simulation fps, the growth is far from linear (I would expect it to be at least linear, probably slightly higher). However it seems to be something like dt = n^0.16 if I have read the graph correctly… which seems odd. How are you measuring the delta time? Are you taking the minimum of a number of frames or are you averaging a number of frames?
You can adjust how far away it will look for agents by changing the agent.NeighbourDist and agent.MaxNeighbours fields. A lower NeighbourDist will make it ignore agents beyond a certain distance.
Building the lookup tree that is used should take O(n log n) time assuming that all agents are not standing on the same point in which case it will degenerate into a list which may have unpredictable performance.
It does indeed rise very sharply in the first graph. That is a bit suspicious…
Regardless, if you want to improve performance, I think it is better if you profile inside the RVO simulator to check what is actually taking up a lot of time.
Hi, I am distributing them uniformly, but I am also forcing all agents to move towards a single point. That ensures that all agents are at maximum stress and none of them are moving in empty space. Here are 2 scripts, one for distribution and movement and another for FPS counts and writing analysis data to files:
https://dl.dropboxusercontent.com/u/248943005/RVOPerformance.zip
In these plots I just took my inverted FPS. But it’s kind of an average over several updates, computed from Time.time and the number of updates. This averaging gives me smaller fluctuations.
Thanks for the tips about NeighbourDist, MaxNeighbours and profiling inside the RVO simulator, I will take a look there at some point.
Ah, so you are essentially taking the harmonic mean of the delta times (i.e. inverting an average of fps values).
Unfortunately that is a pretty bad mean to use. For example, given 5 frames with delta times of 2ms, 2ms, 2ms, 2ms and 200ms, the harmonic mean will be:
5/(1/2ms + 1/2ms + 1/2ms + 1/2ms + 1/200ms) ≈ 2.5 ms
The harmonic mean will be highly biased toward the smaller values.
In contrast, the arithmetic mean will give you
(2ms + 2ms + 2ms + 2ms + 200ms)/5 ≈ 42 ms
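The same comparison as a small Python sketch, in case you want to check it against your own samples. The function names are just illustrative and the list holds the five example delta times from above:

```python
def arithmetic_mean(values):
    """Ordinary average: sum of the values divided by their count."""
    return sum(values) / len(values)


def harmonic_mean(values):
    """Count divided by the sum of reciprocals.

    This is what you get when you average fps values (1/dt)
    and then invert the result.
    """
    return len(values) / sum(1.0 / v for v in values)


# Example delta times in milliseconds: four fast frames and one spike.
delta_times_ms = [2.0, 2.0, 2.0, 2.0, 200.0]

if __name__ == "__main__":
    print(f"harmonic mean:   {harmonic_mean(delta_times_ms):.1f} ms")
    print(f"arithmetic mean: {arithmetic_mean(delta_times_ms):.1f} ms")
```

The single 200ms spike almost disappears in the harmonic mean, while the arithmetic mean keeps it visible.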
So I have updated the last graphs to use the arithmetic mean rather than the harmonic mean, based directly on time differences (dt). I also updated “RVOPerformance.zip” in my dropbox with the recent scripts.
I haven’t checked the RVO simulator’s internal behaviour yet. On the other hand, I made a “desired FPS” (DFPS) adjuster script which automatically changes the desiredFPS variable in RVOSimulator every x seconds based on how many agents there are. The script is named “DFPSAdjuster” and is available in RVOPerformance.zip as well. It takes a logarithmic slope and intercept as input parameters to drive the DFPS changes when the number of agents in the scene changes over time. I also used minimum and maximum allowed values for DFPS to make sure that DFPS doesn’t go beyond the wanted limits.
To conclude, I made DFPSAdjusterTest, which slowly spawns new agents, lets the adjuster change DFPS and measures dt values for the given number of agents. The test graph is here:
For this test I used -1.7 for the slope and 6.9 for the intercept. This gives very nice results over a very wide range of agent counts: when the number of agents is low, they are simulated with a higher DFPS for better quality, while when the numbers are large, DFPS drops in order to keep the game running at a similar dt. There is still some slope visible, as dt rose from 9 ms to 14 ms in the interval between 1k and 10k agents. It might be that playing with other slope and intercept values could give completely flat behaviour. On the other hand, I think that other things, which do not depend on the simulator, may also play a role.
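For anyone who wants the idea without downloading the zip, here is a rough Python sketch of such an adjuster. It is not the actual DFPSAdjuster script. It assumes the slope and intercept describe a straight line in log10-log10 space, i.e. log10(DFPS) = slope * log10(agents) + intercept, which is only one possible reading of "logarithmic slope and intercept". The function name and the limits 1 and 60 are made up for the example:

```python
import math


def adjusted_dfps(num_agents, slope, intercept, min_dfps, max_dfps):
    """Pick a desired simulation FPS for the current number of agents.

    Assumption: log10(DFPS) = slope * log10(num_agents) + intercept.
    The result is clamped to [min_dfps, max_dfps] and rounded, since
    desiredFPS is described as an integer in this thread.
    """
    if num_agents < 1:
        # No agents to simulate, so use the highest allowed rate.
        return int(max_dfps)
    log_dfps = slope * math.log10(num_agents) + intercept
    dfps = 10.0 ** log_dfps
    dfps = max(min_dfps, min(max_dfps, dfps))
    return int(round(dfps))


if __name__ == "__main__":
    # Slope and intercept are the values quoted above; the limits are hypothetical.
    for agents in (500, 1000, 2000, 3000, 5000, 10000):
        print(agents, adjusted_dfps(agents, -1.7, 6.9, 1, 60))
```

Under this assumption the rounding to whole numbers becomes coarse at high agent counts, where the computed DFPS is only a few frames per second.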
The most wanted feature would be to make desiredFPS publicly available not as an integer but as a float, which would allow setting its value not just to whole numbers but to any floating-point value.