Alistair Popple writes:

> I don't know the AMD driver well enough to comment definitively but
> chances are this warning is spurious. I have been meaning to put
> together a fix for it. The problem is that migrate_vma_setup()
> etc. allow for migration of anonymous folios, which is subtly
> different from only allowing migration of anonymous VMA's.
>
> Specifically migrate_vma checks for folio_test_anon() which returns
> true for private file-backed VMAs while the warning is based on
> vma_is_anonymous() which is false for such mappings. So it is possible
> for the driver to migrate a private filebacked mapping to GPU memory
> which will trigger this warning during teardown if the page wasn't
> migrated back.

Ah, if it is spurious, that is quite unfortunate. We were hoping it was
the same issue as the one the rest of the email was describing (those
hangs, unkillable processes, and bad page states), since that would mean
we have a good reproducer for it.

FWIW, that sounds like a plausible explanation; the program is using
dynamic_cast, so typeinfo will need to be accessed. The typeinfo is
mmap-ped from the executable, so it's file-backed. I don't see any
reason for this page to be thrown out of the GPU later, so it stays
mapped until exit, and causes the warning.

The trigger for those other issues is significantly harder to
reproduce, and far less self-contained. So, I suppose we're left with a
bug for which the reproducer is "run more than nproc of parallel
AMDGPU&HMM-utilizing processes in a loop and cross fingers". :/

Thank you very much for fixing the WARN_ON!

Have a lovely day.
--
Arsen Arsenović