"Tiles are not adjacent" exception

Hi there!

I’m getting some exceptions during navmesh generation: "Tiles are not adjacent (neither x or z coordinates match)" and "Tiles are not adjacent (tile coordinates must differ by exactly 1. Got '" + t1coord + "' and '" + t2coord + "')" in NavmeshBase -> ConnectTiles(...) method.

The coordinates in t1coord and t2coord always seem to be off by one. For example, TileA (TA) has x1 = 30, z1 = 70 and TileB (TB) has x2 = 30, z2 = 70, where x1 should probably be 29, because the tile before TA has x = 28, z = 70.

It happens only on PlayStation 4 and PlayStation 5. The error isn’t deterministic: the values reported in the error differ between runs for the same scene, and it doesn’t reproduce every time. But when the error occurs, it might crash navmesh generation.

I’m using Unity 2020.3.27f1, A* version 4.3.47.

The recast navmesh is generated with the astarPath.ScanAsync(...) method, which then goes to ScanAllTilesBurst(...). I believe that’s important, because when I went back to scanning without the Unity Jobs system, the error disappeared (but loading time increased 10-20 times). The Unity Jobs package used in the project is 0.7.0-preview.17 (required by the A* version we use). I tried a build with the newest version of Jobs (0.50.0-preview.8) and the same error happened.

I’ve tried debugging the issue, but because of the parallel computations it’s quite hard to find any useful information. I believe the issue surfaces as a failure in the ConnectTiles(...) method, but the root cause is somewhere in the creation of the tiles. The code seems to be correct though.

Does anyone have any clues about where I should look for a solution?

Hmm, that’s very weird. I don’t understand how that would be possible. Possibly due to some Burst compilation issue on PS4/5?
It might be interesting to log the coordinates of all tiles after the CreateTilesJob.
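As a rough sketch of what such a check could look like: the helper below walks a tile array and reports every entry whose stored coordinates disagree with the ones implied by its index. It assumes tiles are stored row by row (index = x + z * tileXCount). TileStub and TileLayoutCheck are hypothetical stand-ins made up for this example, not types from the A* Pathfinding Project, so the real tile type and array would need to be substituted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real tile type; only the coordinates matter here.
public sealed class TileStub
{
    public int x;
    public int z;
}

public static class TileLayoutCheck
{
    // Returns one message per tile whose stored coordinates disagree with the
    // coordinates implied by its array index.
    // Assumption: tiles are stored row by row, i.e. index = x + z * tileXCount.
    public static List<string> FindMisplacedTiles(TileStub[] tiles, int tileXCount)
    {
        var problems = new List<string>();
        for (int i = 0; i < tiles.Length; i++)
        {
            int expectedX = i % tileXCount;
            int expectedZ = i / tileXCount;
            TileStub tile = tiles[i];
            if (tile == null)
            {
                problems.Add($"index {i}: tile is null");
            }
            else if (tile.x != expectedX || tile.z != expectedZ)
            {
                problems.Add($"index {i}: expected ({expectedX}, {expectedZ}), got ({tile.x}, {tile.z})");
            }
        }
        return problems;
    }
}
```

Running something like this right after the tiles are created, and logging the returned messages, should show whether the coordinates are already wrong before ConnectTiles(...) is reached.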

Hi Aron, thanks for your reply.

In the meantime, I’ve done more investigation and it seems the issue is probably connected with Burst compilation, like you said, though I’m not 100% sure yet.

I’ve created more logs:

  1. first, before instantiating NavmeshTile (line: var tile = new NavmeshTile {) in Pathfinding.RecastGraph.CreateTilesJob -> Execute()
  2. second at the end of the loop (after line: tiles[tileIndex] = tile;) in the same method

The logs report the x and z values, tileIndex, graphTileIndex, tileRect min and max and, in the second log, the tile itself once it has been created.

The results are interesting. Bug case below (without spamming call stacks):

06:13:03	Creating navmesh tile part 111... x: 120, z: 74, tileIndex: 12626, graphTileIndex: 12626, tileRect.xMin: 0, tileRect.yMin: 0
06:13:03	Creating navmesh tile part 222... tileX: 120, tileZ: 74, tileIndex: 12626, graphTileIndex: 12626, tile: Pathfinding.NavmeshTile, tileHashCode: -1916856300
06:13:03	Creating navmesh tile part 111... x: 121, z: 74, tileIndex: 12627, graphTileIndex: 12627, tileRect.xMin: 0, tileRect.yMin: 0
06:13:03	Creating navmesh tile part 222... tileX: 121, tileZ: 74, tileIndex: 12627, graphTileIndex: 12627, tile: Pathfinding.NavmeshTile, tileHashCode: -1916856300

The tile indexes are correct and consecutive. The x and z values are also calculated correctly. The cause of the bug seems to show up in tileHashCode, which is simply the result of System.Object.GetHashCode(). It is the same for both tiles: -1916856300.

A hash code is not a reliable way to tell whether two objects are different, so I took a memory snapshot (using HeapExplorer) to check whether both tiles are the same object. The result is below (for a different bug case though, I hope it’s clear enough):
[Image: bugcase]

The two tiles have the same address in memory. So it seems that, somehow, tileA is created with some values, then tileB is created at the same memory address and assigned its own values, so both tiles are now tileB, which causes the bug.
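For anyone who wants to check for this without a memory snapshot, here is a minimal sketch that scans an array for entries that are literally the same object. It compares by identity (Object.ReferenceEquals plus RuntimeHelpers.GetHashCode), so it isn’t affected by hash collisions or by Equals overrides. DuplicateReferenceCheck is a hypothetical helper name made up for this example; it is not part of A* or Unity.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class DuplicateReferenceCheck
{
    // Compares objects by identity only, ignoring Equals/GetHashCode overrides.
    private sealed class IdentityComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object a, object b)
        {
            return ReferenceEquals(a, b);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }

    // Returns (first, repeat) index pairs where items[repeat] is the very same
    // object instance as items[first]. Null entries are skipped.
    public static List<(int first, int repeat)> Find(object[] items)
    {
        var seen = new Dictionary<object, int>(new IdentityComparer());
        var duplicates = new List<(int first, int repeat)>();
        for (int i = 0; i < items.Length; i++)
        {
            object item = items[i];
            if (item == null) continue;

            if (seen.TryGetValue(item, out int firstIndex))
            {
                duplicates.Add((firstIndex, i));
            }
            else
            {
                seen.Add(item, i);
            }
        }
        return duplicates;
    }
}
```

An array of any reference type can be passed in directly (array covariance), so calling Find on the tile array after creation and logging the returned index pairs would show which tiles ended up sharing one instance.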

I added a log right after the tile creation, with the same result, so the bug happens when the NavmeshTile is created, not in some other code between the creation and the end of the loop.

Here’s the craziest part. I used a lock on the NavmeshTile constructor and the bug still occurs. Mind-blowing.

So I went a step further and moved the construction of the navmesh tiles out of the CreateTilesJob, like this:

for (int i = 0; i < tileCount; i++)
{
	tiles[i] = new NavmeshTile();
}

and I left the CreateTilesJob only assigning fields on the already created navmesh tiles.

We are testing this right now and the bug hasn’t happened yet. Before, it reproduced 25-50% of the time on our first map. I hope it’s the correct solution, but I’d love to understand what is really happening inside.

I don’t have any specific performance data for this solution, but it seems fine, without any visible loss.

That’s some great investigation! Crazy stuff!
Yeah, this really seems like a compilation bug in Burst. I think Unity would be very interested if you sent them a bug report with your project.

Unfortunately, sending a project is not an option, but I’ll think about posting on siedev. Thanks :)