Hi Aron,
Many thanks for the algorithm, which we have been using for some years. Ours is a strategy game (not real time yet) like CoC. There are attacks, and we have many units to attack an opponent on their battlefield. In an attack, the attacker drops units one by one, and AI logic drives them to opponent targets according to each unit's behavior. We use AstarPathfinding to find various targets, attack obstacles (for some units), avoid obstacles, etc.
We had been on AstarPathfinding plugin v4.x since 2020. Recently, after updating to Unity 2022 LTS, we decided to upgrade AstarPathfinding via UPM to the latest version, and we are currently using 5.1.6. The problem we are facing now is that units are getting frozen because the Seeker never gives the PathComplete callback. At the start we were facing many Seeker-related argument exceptions in Claim and Release, which we suppressed; if something like that happened, we would trigger StartPath again. That used to work, but in live the Seekers are becoming unresponsive.
We use a Seeker pool and reuse the Seekers, but sometimes there can be more units than Seeker instances. In that case we queue the StartPath request until a Seeker is freed from the pool. That used to work, but for some reason it is now causing an issue where all Seekers in the pool become unresponsive at some point. Our core logic is bound to this algorithm, which used to work flawlessly. Is there anything we are missing? Does using Seeker pooling, rather than attaching a Seeker to each unit, cause any problem? Sorry for the long story. Please advise.
Thank you very much.
I’d recommend not suppressing those. They most likely indicate that your code is doing something wrong.
One important change in version 5 is that the seeker.pathCallback event is deprecated. You should instead pass a callback every time you call seeker.StartPath. Read more about it in the upgrade guide: Upgrade Guide - A* Pathfinding Project
Thank you for the quick reply.
We are in fact using the callback in StartPath only. After debugging, I just found that I am getting the following exception, which affects all Seekers:
InvalidOperationException: More receivers are blocked than specified in constructor (5 > 4)
This is not reproducible in the editor but is reproducible on a device (mine is an Android device with an octa-core processor, but only 4 processors are detected). Subsequently all receivers are terminated and no more paths are calculated. Any idea why this is happening? Also, we reverted the suppression of exceptions in Claim and Release, and there now seems to be no problem there either: no exceptions are thrown from Claim or Release. This is the stack trace of the invalid operation when we invoke Pathfinding.Seeker:StartPath(Vector3, Vector3, OnPathDelegate, GraphMask):
2024-09-09 15:18:57.084 11666 13845 Error Unity InvalidOperationException: More receivers are blocked than specified in constructor (5 > 4)
2024-09-09 15:18:57.084 11666 13845 Error Unity at Pathfinding.ThreadControlQueue.Pop () [0x00000] in <00000000000000000000000000000000>:0
2024-09-09 15:18:57.084 11666 13845 Error Unity at Pathfinding.PathProcessor.CalculatePathsThreaded (Pathfinding.PathHandler pathHandler) [0x00000] in <00000000000000000000000000000000>:0
2024-09-09 15:18:57.084 11666 13845 Error Unity at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <00000000000000000000000000000000>:0
2024-09-09 15:18:57.084 11666 13845 Error Unity Pathfinding.PathProcessor:CalculatePathsThreaded(PathHandler)
2024-09-09 15:18:57.084 11666 13845 Error Unity System.Threading.ExecutionContext:RunInternal(ExecutionContext, ContextCallback, Object, Boolean)
Oh. That looks like a really serious bug. I have not seen anyone report that before.
Can you reliably replicate it?
Yes. Our customers are very angry about the unit freeze, but I couldn’t reproduce it with 5.1.6, so I reverted to our previous version, 4.2.15, where the same issue can be reproduced frequently on my test device. For some reason, I haven’t been able to reproduce it with 5.1.6 on my test device yet. The above stack trace is from 4.2.15 only, but the same behavior is reported by our live customers, who are on the latest v5.1.6. We are using Unity 2022.3.22f1.
Also Aron, is there any issue with calling StartPath? The Upgrade Guide says to replace Seeker.StartPath with ai.SetPath.
Sorry, my bad. There is a note above that part of the documentation saying ai.SetPath is not needed if we are using custom movement, which is true in our case. We only fetch the path. Movement is done by our own logic once the PathComplete callback is received from Seeker.StartPath.
Are you sure? That exception literally doesn’t exist in 5.1.6, so I would be surprised if people are getting it on that version…
Yes, you are right. Pop has been replaced in 5.x with Receive. Since I can’t reproduce the issue with 5.1.6, I can’t argue that it originates from the same place, but from the gameplay both look very similar, and I thought this could be the cause. I am attaching 2 screenshots showing a side-by-side comparison of v4.2.15 and v5.1.6. In 4.2.15 my Seekers get blocked in exactly the same way, though; it’s ThreadControlQueue.Pop() vs BlockableChannel.Receive().
Do you have the stack trace or error message for 5.1.6?
Note that it is expected for pathfinding threads to be blocked in the Receive/Pop methods. Those methods will block until there are paths to process.
Unfortunately, I don’t have a v5.1.6 stack trace since it has not been reproduced on my side yet. I will share it if I manage to reproduce it. But once the blockedReceivers counter has reached the limit, all Seeker path requests are ignored until the game is reloaded, which is why our pathfinding logic stops working. Is there any recommended way to flush and reset the counter? We only have one scene, and PathProcessor is instantiated in AstarPath, which is a Component.
Our system basically has a Seeker prefab referenced in the scene, and a Controller creates some instances on scene load. Right now we create only 30 instances even though there can be around 150 pathfinding characters in a scene. We find paths sequentially using the available instances.
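For readers following along, the pooling-with-queue setup described in this post can be sketched roughly as below. This is only an illustration under assumptions, not the poster's code or anything from the A* Pathfinding Project: `WorkerPool`, `Submit` and `done` are hypothetical names, `TWorker` stands in for a pooled Seeker, and everything is assumed to run on one thread (the Unity main thread). The point it shows is that each worker goes back to the pool exactly once, even if a job throws or signals completion twice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a fixed set of workers plus a queue of pending jobs.
// Not thread-safe by design; intended to be used from a single thread.
public sealed class WorkerPool<TWorker> {
    private readonly Stack<TWorker> free = new Stack<TWorker>();
    private readonly Queue<Action<TWorker, Action>> pending =
        new Queue<Action<TWorker, Action>>();

    public WorkerPool(IEnumerable<TWorker> workers) {
        foreach (var w in workers) free.Push(w);
    }

    public int FreeCount => free.Count;
    public int PendingCount => pending.Count;

    // The job receives a worker and a `done` action that it must call
    // when it has finished with the worker.
    public void Submit(Action<TWorker, Action> job) {
        if (free.Count > 0) Run(free.Pop(), job);
        else pending.Enqueue(job); // no worker available, wait in line
    }

    private void Run(TWorker worker, Action<TWorker, Action> job) {
        bool released = false;
        Action done = () => {
            if (released) return; // ignore a second completion signal
            released = true;
            Release(worker);
        };
        try {
            job(worker, done);
        } catch (Exception) {
            done(); // do not leak the worker if starting the job throws
            throw;
        }
    }

    private void Release(TWorker worker) {
        if (pending.Count > 0) Run(worker, pending.Dequeue());
        else free.Push(worker);
    }
}
```

With Seekers as workers, a request might look like `pool.Submit((seeker, done) => seeker.StartPath(start, end, p => { try { onPath(p); } finally { done(); } }));`, where `start`, `end` and `onPath` are placeholders. Calling `done` in a `finally` block means a throwing path callback cannot leave a Seeker permanently checked out.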
There’s no safe way, except to destroy the AstarPath component and re-create it.
That this even happens indicates some pretty serious multi-threading race condition in 4.x.
I would recommend using 5.x since it should be more robust. And then see if you can manage to reproduce it. Maybe you can grab the log from your users?
OK Aron. We already have some users among our internal testers who were able to consistently reproduce the issue on 5.1.6. We will try to debug with them. Thanks for the support, Aron. I will update you if I find a lead.
Alright. Let me know what you find!
Hi Aron,
Thank you once again for the support so far.
We’ve sent a build to our internal testers who were facing the stuck-unit issue and found that it actually originates from the PathProcessor.CalculatePathsThreaded(PathHandler pathHandler, BlockableChannel.Receiver receiver) method. Since the exception was handled and logged in the catch block there, we can see the log “Unhandled exception during pathfinding. Terminating.” followed by the log “Error : This part should never be reached.”. Unfortunately, the stack trace was not logged. We suspect it has something to do with a threading race condition. Once this exception is triggered, subsequent AstarPath.StartPath(Path path, bool pushToFront = false, bool assumeInPlayMode = false) calls return early, because the path processor queue gets closed thereafter. The calls return at the following condition in AstarPath.StartPath()…
if (astar.pathProcessor.queue.isClosed) {
    path.FailWithError("No new paths are accepted");
    return;
}
For the time being we are trying to switch to single-threaded pathfinding, as this issue is seriously damaging our reputation. But we know multithreading is implemented for performance reasons, and we want it to work for us as well. Any advice?
Thank you and Regards
Right before that, it should have logged the actual exception. Did you happen to catch that?
Yes, I saw that code, but unfortunately our server logging only records thrown exceptions. We have now added a system that logs any issues and sends them to the server on demand. That is how we found the exact spot of the issue. We are rolling out a build that will also log the stack trace and the source of the exception, which should reveal more information.
Thank you
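As a rough illustration of the on-demand logging described in this post, the sketch below collects error and exception log entries raised on any thread, including pathfinding worker threads. It relies on Unity's `Application.logMessageReceivedThreaded` event; `ThreadedLogCollector` and `Drain` are hypothetical names, and how the drained entries get uploaded is left out. Whether `stackTrace` is populated in a player build is assumed to depend on the project's stack trace logging settings, so treat this as a starting point rather than a verified fix.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using UnityEngine;

// Hypothetical component: buffers error/exception logs from all threads
// so they can be sent to a server later.
public class ThreadedLogCollector : MonoBehaviour {
    private readonly ConcurrentQueue<string> entries = new ConcurrentQueue<string>();

    void OnEnable() {
        // This event can fire on background threads, unlike logMessageReceived.
        Application.logMessageReceivedThreaded += OnLog;
    }

    void OnDisable() {
        Application.logMessageReceivedThreaded -= OnLog;
    }

    private void OnLog(string condition, string stackTrace, LogType type) {
        if (type != LogType.Error && type != LogType.Exception) return;
        // Keep the handler thread-safe: only enqueue, no Unity API calls here.
        entries.Enqueue(type + ": " + condition + "\n" + stackTrace);
    }

    // Call from the main thread when the logs should be uploaded.
    public string[] Drain() {
        var result = new List<string>();
        while (entries.TryDequeue(out var entry)) result.Add(entry);
        return result.ToArray();
    }
}
```

Because the handler only enqueues strings, it should be safe to invoke from the pathfinding threads; the upload itself can then happen from the main thread whenever it is requested.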
Hi Aron, we managed to fetch the stack trace, which is as follows:
Call: Pathfinding.PathProcessor.CalculatePathsThreaded
Log: Debug.LogError("Unhandled exception during pathfinding. Terminating.");
StackTrace:
at Pathfinding.Util.CircularBuffer`1[T].PopStart () [0x00000] in <00000000000000000000000000000000>:0
at Pathfinding.BlockableChannel`1+Receiver[T].Receive (T& item) [0x00000] in <00000000000000000000000000000000>:0
at Pathfinding.PathProcessor.CalculatePathsThreaded (Pathfinding.PathHandler pathHandler, Pathfinding.BlockableChannel`1+Receiver[T] receiver) [0x00000] in <00000000000000000000000000000000>:0
at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <00000000000000000000000000000000>:0
I assume this was an InvalidOperationException?
If so, I really do not understand how this could possibly happen.
All accesses to that queue happen inside a locked block, and it checks whether the queue is empty…
What build backend do you use? IL2CPP? Mono?
What .NET version?
I assume this is only happening on Android devices?
Would it be possible for you to upgrade to a more recent version of Unity, just in case this is a Unity bug on Android devices?
I found this in the 2023.2 release notes, which looks HIGHLY relevant, assuming you are using IL2CPP: