Hi Aron,
I’m posting a consolidated report because we hit severe Android IL2CPP instability after upgrading from A*PP 4.1.x to 5.4.6. After investigating, we found a concrete package-level issue in SlabAllocator that appears to explain our crashes. I would be grateful if you could review it, confirm the diagnosis, and consider applying the fix upstream.
Related forum threads (can’t paste links)
- 16115: IndexOutOfRangeException (most related)
- 15442: AssertionException / Null Reference in HierarchicalGraph (most related)
- 16466: Index out of bounds on removing HierarchialNode (most related)
- 15908: Native crash on Pathfinding.Util.SlabAllocator (most related)
- 15330: Jobs Errors after update to 4.3.90 (most related)
- 17582: IndexOutOfRangeException in Span.cs and NullReferenceException in HierarchicalGraph.cs
- 18335: GUO causes Index Out Of Range Exception
Context
- Upgrade: A* Pathfinding Project 4.1.x → 5.4.6
- Unity: 2022.3.62f2
- Platform: Android IL2CPP
- Pattern: graph updated frequently during gameplay
The issue appeared only after the upgrade to the new hierarchical graph / SlabAllocator implementation.
Devices / platform pattern
The failures were concentrated mainly on OPPO devices in production.
Our main repro device was OPPO CPH2145, but the pattern was broader than a single device model.
We also saw reports from some OnePlus and Motorola devices.
On iOS, we have not seen matching crashes so far.
We do not have proof that this is architecture-specific, but based on the platform split it looks possible that the bug is more easily exposed on certain Android IL2CPP + ARM64 / vendor runtime / allocator combinations, while remaining hidden on iOS or other devices.
Symptoms
From Crashlytics and on-device repro, we saw combinations of:
- IndexOutOfRangeException in HierarchicalGraph.JobRecalculateComponents.RemoveHierarchicalNode
- IndexOutOfRangeException in HierarchicalGraph.JobRecalculateComponents.FloodFill
- NullReferenceException in NavmeshEdges.JobCalculateObstacles.CalculateObstacles
- Exception: SlabAllocator cannot allocate more than MaxAllocationSize elements
- native SIGSEGV in SlabAllocator.Allocate
- native SIGSEGV in JobRecalculateComponents.FloodFill
We reproduced this through both:
- AstarPath.Update() → PerformBlockingActions() → WorkItemProcessor.ProcessWorkItems(…)
- AstarPath.Scan() during startup / level generation
Investigation summary
We investigated:
- multithreading / races
- stale UnsafeSpan after slab reallocation
- lifecycle / shutdown ordering
- cascade behavior after job exceptions
We did find one valid stale-span fix in RemoveHierarchicalNode.
Part 1 — refresh conns after allocator mutation in RemoveHierarchicalNode
When removing the current node from a neighbor’s connection list:
    otherConnections.Remove(hierarchicalNode);
    connectionAllocations[adjacentHierarchicalNode] = otherConnections.allocationIndex;
the underlying slab allocation may be reallocated, so conns for the current node must be refreshed afterward:
    conns = hGraph.connectionAllocator.GetSpan(connAllocation);
This was a correct fix for that specific stale-span path, but it did not stop the broader Android crash/exception pattern.
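To make the stale-span hazard concrete, here is a minimal managed sketch. ToyAllocator and StaleSpanDemo are hypothetical names written only for this post; they are not the package's SlabAllocator, and the sketch assumes an allocator whose backing storage can be replaced when it grows. In managed code the old view merely goes out of date; with unmanaged memory, as in UnsafeSpan, the same pattern would read or write freed memory.

```csharp
using System;

// Hypothetical toy allocator for illustration only; NOT the package's SlabAllocator.
sealed class ToyAllocator {
    int[] backing = new int[4];
    int used;

    // Reserve `count` slots and return the offset of the new allocation.
    public int Allocate (int count) {
        if (used + count > backing.Length) {
            // Growing replaces the backing array, so every view handed out
            // earlier still refers to the old array.
            Array.Resize(ref backing, Math.Max(backing.Length * 2, used + count));
        }
        int offset = used;
        used += count;
        return offset;
    }

    // Return a view over an allocation in the current backing array.
    public ArraySegment<int> GetSpan (int offset, int count) {
        return new ArraySegment<int>(backing, offset, count);
    }
}

static class StaleSpanDemo {
    // Returns true if a view taken before growth no longer sees writes
    // made through a view taken after growth.
    public static bool SpanIsStaleAfterGrowth () {
        var allocator = new ToyAllocator();
        int a = allocator.Allocate(3);
        ArraySegment<int> conns = allocator.GetSpan(a, 3);

        allocator.Allocate(8); // exceeds capacity, so the backing array is replaced

        ArraySegment<int> fresh = allocator.GetSpan(a, 3);
        fresh.Array[fresh.Offset] = 42;

        // The old view still points at the old array and misses the write.
        return conns.Array[conns.Offset] != 42;
    }
}
```

Re-fetching the view after any call that can mutate the allocator, as in the refresh line above, avoids this.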
Likely root cause we found
The strongest concrete bug we found is in:
Packages/com.arongranberg.astar/Core/Collections/SlabAllocator.cs
Specifically in SlabAllocator.List.RemoveAt.
Call path:
RemoveHierarchicalNode → otherConnections.Remove(hierarchicalNode) → SlabListExtensions.Remove → List.RemoveAt
Before:
    public void RemoveAt (int index) {
        span.Slice(index + 1).CopyTo(span.Slice(index, span.Length - index - 1));
        allocator.Realloc(ref allocationIndex, span.Length - 1);
        span = allocator.GetSpan(allocationIndex);
    }
UnsafeSpan.CopyTo explicitly assumes source and destination do not alias, and internally uses UnsafeUtility.MemCpy.
But here, source and destination are overlapping parts of the same span.
After:
    public void RemoveAt (int index) {
        span.Move(index + 1, index, span.Length - index - 1);
        allocator.Realloc(ref allocationIndex, span.Length - 1);
        span = allocator.GetSpan(allocationIndex);
    }
span.Move uses overlap-safe MemMove.
There is also already an existing helper in Span.cs with the same idea:
UnsafeSpan<T>.RemoveAt(ref span, index);
We used span.Move(…) directly to keep the patch minimal.
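One possible reason the overlapping copy stays hidden on some platforms: in RemoveAt the destination starts below the source, so a copy routine that happens to move elements in ascending address order still produces the right result, while one that copies in descending order (or in blocks) need not. The sketch below simulates both orders on plain managed arrays. CopyAscending, CopyDescending, CopyOverlapSafe and RemoveAtWith are hypothetical helpers written for this post, not package or Unity APIs, and this only illustrates the failure mode; it is not a claim about what MemCpy actually does on any specific device.

```csharp
using System;

static class OverlapDemo {
    // Stand-in for a copy routine that moves elements in ascending order.
    public static void CopyAscending (int[] buf, int src, int dst, int count) {
        for (int i = 0; i < count; i++) buf[dst + i] = buf[src + i];
    }

    // Stand-in for a copy routine that moves elements in descending order.
    // An implementation is free to do this when overlapping ranges are
    // outside its contract.
    public static void CopyDescending (int[] buf, int src, int dst, int count) {
        for (int i = count - 1; i >= 0; i--) buf[dst + i] = buf[src + i];
    }

    // Array.Copy is documented to handle overlapping ranges within the same
    // array correctly, which makes it a managed analogue of a memmove-style copy.
    public static void CopyOverlapSafe (int[] buf, int src, int dst, int count) {
        Array.Copy(buf, src, buf, dst, count);
    }

    // Mirrors the shape of RemoveAt: shift the tail down by one element,
    // then shrink the list by one.
    public static int[] RemoveAtWith (Action<int[], int, int, int> copy, int[] items, int index) {
        var buf = (int[])items.Clone();
        copy(buf, index + 1, index, buf.Length - index - 1);
        var result = new int[buf.Length - 1];
        Array.Copy(buf, result, result.Length);
        return result;
    }
}
```

Removing index 1 from { 10, 20, 30, 40, 50 } gives { 10, 30, 40, 50 } with CopyAscending and CopyOverlapSafe, but { 10, 50, 50, 50 } with CopyDescending: the removed id is gone, yet two valid neighbors are lost and one is duplicated.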
Why this matches the observed failures
If RemoveAt corrupts a hierarchical node’s connection list while removing back-edges, that can lead to:
- connections not actually being removed
- duplicate connections when the graph is rebuilt
- garbage or out-of-range values being treated as hierarchical node ids
- invalid lengths / broken spans
- later failures in RemoveHierarchicalNode, FloodFill, and NavmeshEdges
- eventually native crashes when allocator/span state becomes invalid
That matches the pattern we were seeing on affected Android devices.
Status
With:
- the stale-span refresh fix in RemoveHierarchicalNode
- the overlap-safe RemoveAt fix in SlabAllocator
our testing on the main repro device (OPPO CPH2145) no longer shows the previous crash/exception pattern.
The RemoveAt change is the main candidate for an upstream fix.
This has not been shipped to production yet, so I can’t claim final confirmation, but it looks promising enough that I wanted to report it here.
Happy to follow up if you want more detail.
Thanks.