Stuck in JobDependencyTracker (v4.3.13)

Hi,

just had another nasty bug. Our game sometimes (not very often) stalls after generating the map with the new jobified version of A*. It seems to get stuck at JobDependencyTracker.cs, line 536 (doneEvent.WaitOne();).

It is really rare, but it is a game-breaking issue. Any idea what it could be?

Best
Bennet

Hi

Hmmm… That’s annoying.
Do you happen to know what machine this happens on? I’m particularly interested in the core count.
Can you see in the log if there have been any exceptions thrown earlier?

Hi, it happened on an i7 with 8 cores and on an older AMD processor with 6 cores. No exceptions before.

Hmm… I’m not sure what could be causing that.
I’ve analyzed that particular part of the code and I cannot see how it would block there unless either

  1. A Burst job on another thread crashed or got stuck in an infinite loop, or
  2. There is a bug in Unity’s job scheduling system that causes the wrong job to be scheduled.
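
One generic way to investigate a hang like this is to swap an unbounded wait for a timed one that logs while it is still blocked, so the log shows how long the wait has been pending instead of the process freezing silently. The sketch below is plain .NET and is not code from the package: `WaitDiagnostics` and `WaitWithLogging` are hypothetical names, and the slice length and slice count are arbitrary assumptions.

```csharp
using System;
using System.Threading;

// Hypothetical helper for diagnosing a blocked wait; not part of the A* Pathfinding Project.
public static class WaitDiagnostics {
    // Waits on the handle in slices of sliceMs milliseconds and calls log each time a slice expires.
    // Returns true if the handle was signaled within maxSlices slices, false otherwise.
    public static bool WaitWithLogging (WaitHandle handle, int sliceMs, int maxSlices, Action<string> log) {
        for (int i = 1; i <= maxSlices; i++) {
            // WaitOne(int) returns true when the handle is signaled and false when the timeout elapses
            if (handle.WaitOne(sliceMs)) return true;
            log("Still waiting after " + (i * sliceMs) + " ms");
        }
        return false;
    }
}
```

In Unity the `log` argument could be `UnityEngine.Debug.LogWarning`. This only makes the stall visible; it does not say which job failed to signal the event.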

I have to bump this issue, because unfortunately we still seem to have it (though even more rarely than in February). It is not reliably reproducible: if you just play the game 1000 times, it will eventually freeze, and pausing in the debugger shows that it is stuck in the JobDependencyTracker on WaitOne().

Further hints: Most recent 4.3.26, Unity 2019.3.14f. Happens on different PCs (and in Editor) and on Nintendo Switch.

Any further tips on how to investigate this would be appreciated.

I just found the following code (which always worked before the jobified A*). Could it somehow conflict now, since it is not done via a work item?

        var g = AstarPath.active.graphs[0];
        foreach (var n in (g as GridGraph).nodes)
        {
            if (...) n.Walkable = false;

            if (stopwatch.ElapsedMilliseconds > maxDurationPerFrame)
            {
                yield return null;
                stopwatch.Restart();
            }
        }

We are still having this issue, even after wrapping the walkability modification code (see post above) in a work item.

Is it maybe because we use AstarPath.UpdateGraphs(bounds) at several places?

Hi

Hmm… I really don’t know why this is still happening.
It seems I might have to redesign the grid graph scanning code to not use that code path…

I just got this as well. Stuck on JobDependencyTracker doneEvent.WaitOne();

astarpathfindingproject_master_pro_dev_4_3_34_f1547300

Unity 2019.4.12f1

AMD Ryzen Threadripper 2970WX 24-Core Processor, 3000 MHz 24

Edit: I got it 3x in a row, so possibly a consistent repro case here. The problem is that my game is > 100 GB, so I don’t have an easy way to send it to you, or I would. If you message me I would be willing to screen share if you want to debug it remotely.

Can confirm it’s 100% repro and seems to be the same every run: after I continue 4 times, it freezes the editor. These screenshots took me a while to put together, so I hope they help. If you want to remote debug, it will need to be within the next day or two, as the code moves fast.

Hey Adam, usually you respond quickly, so I wanted to ask what your plans are for this bug. A hard freeze with 100% repro on one of our missions is a show-stopper for us. We can’t enter early access and have no choice but to roll back to the non-beta version. But we like the performance improvements, especially RVO, and rolling back is a significant time investment. If you intend to fix this we can wait a while, but if it’s not in your plans we’ll just roll back now.

Hmmm… It seems like I have to do a larger refactoring of the grid generator code. It’s sad because it makes the code more complex, but it seems it’s impossible to coerce the Unity job system to do what I want.

If you need this fix very quickly I think you’ll have to downgrade. This is definitely something I should fix relatively soon though.

Thanks for the response. Do you have an idea of what relatively soon would be? We’re planning early access in 2 to 6 weeks, depending on how favorable our mock review turns out.

Hey

I tried tackling this and it turned out to be slightly easier to refactor than I originally thought.
I think I have fixed the issue now. I’ll probably upload a new version within a few days.

I have uploaded 4.3.36 now. Let me know if it works.

Great! I appreciate you getting on this, as I wasn’t looking forward to the effort of reverting and also losing performance while doing so.

NavmeshCut is missing two changes I made that are in my opinion necessary for the system to work at all.

    // KevinJ:
    [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.SubsystemRegistration)]
    public static void SubsystemRegistrationInit()
    {
        all = new List<NavmeshClipper>();
    }

    public static void AddEnableCallback (System.Action<NavmeshClipper> onEnable, System.Action<NavmeshClipper> onDisable) {
        // KevinJ: Prevent double add HACK
        // https://forum.arongranberg.com/t/argumentexception-an-item-with-the-same-key-has-already-been-added/8976/8
        OnEnableCallback = null;
        OnDisableCallback = null;

        OnEnableCallback += onEnable;
        OnDisableCallback += onDisable;
    }

Unfortunately the editor still freezes up. It seems to be later in the process than before, and something different.

Relevant log:
EditorFreeze.txt (882.2 KB)

Threads while frozen

Since it’s later in the process I might be able to send you a repro case if you don’t know what the issue is.

Hi

I cannot replicate that :(
It seems to happen because the GridGraph.nodeSurfaceNormals array is not allocated. But I cannot see any cases where it shouldn’t be (assuming the graph has been scanned/loaded from a file).

I’m not sure why you would need the first change, care to elaborate?

Regarding the second change: as I also said in the other thread, for that to happen you need to have multiple AstarPath components at the same time, which is very much not supported as it breaks the AstarPath.active singleton. I’m not sure how you manage to have multiple ones working at all.