public inbox for stable@vger.kernel.org
* Please backport 3c863ff920b4 ("drm/amdgpu: replace PASID IDR with XArray")
@ 2026-04-23  3:53 Thomas Sowell
  2026-04-23  4:59 ` Greg KH
From: Thomas Sowell @ 2026-04-23  3:53 UTC (permalink / raw)
  To: stable

Hello,

Please consider backporting mainline commit 3c863ff920b4 ("drm/amdgpu: replace
PASID IDR with XArray") to 6.18.y and 7.0.y. It fixes a regression introduced
in 14b81abe7bdc ("drm/amdgpu: prevent immediate PASID reuse case").

Using the reproduction steps below I've confirmed that both 6.18 and 7.0 are
affected by the regression and that 3c863ff920b4 resolves it in both.

On my system I frequently see symptoms with sway and physlock. Locking the
screen with physlock and then unlocking it sometimes leaves sway unable to
display any output, recoverable only by killing sway. Sometimes unlocking
instead hangs the entire machine, requiring a hard reboot. In normal usage
these problems occur intermittently, but I also have a procedure (outlined
below) that reliably triggers lockups.

I've observed this hard lockup:

  watchdog: CPU8: Watchdog detected hard LOCKUP on cpu 8
  CPU: 8 UID: 0 PID: 24349 Comm: drmdevice
  Call Trace:
   <IRQ>
   _raw_spin_lock+0x29/0x30
   amdgpu_pasid_free+0x1a/0x80 [amdgpu]
   amdgpu_pasid_free_cb+0x19/0x60 [amdgpu]
   dma_fence_signal_timestamp_locked+0x8e/0x110
   dma_fence_signal+0x30/0x60
   drm_sched_job_done.isra.0+0x58/0x160 [gpu_sched]
   dma_fence_signal_timestamp_locked+0x8e/0x110
   dma_fence_signal+0x30/0x60
   amdgpu_fence_process+0xe1/0x160 [amdgpu]
   sdma_v5_2_process_trap_irq+0x8d/0x130 [amdgpu]
   amdgpu_irq_dispatch+0x176/0x240 [amdgpu]
   amdgpu_ih_process+0x66/0x190 [amdgpu]
   amdgpu_irq_handler+0x23/0x60 [amdgpu]
   __handle_irq_event_percpu+0x58/0x210
   handle_irq_event+0x3e/0x90
   handle_edge_irq+0xe3/0x1e0
   __common_interrupt+0x47/0xe0
   common_interrupt+0x82/0xa0
   </IRQ>
   <TASK>
   asm_common_interrupt+0x26/0x40
   idr_alloc_u32+0xb9/0x100
   idr_alloc_cyclic+0x55/0xc0
   amdgpu_pasid_alloc+0x44/0xb0 [amdgpu]
   amdgpu_driver_open_kms+0xc5/0x300 [amdgpu]
   drm_file_alloc+0x238/0x370
   drm_open_helper+0x8d/0x160
   drm_open+0x72/0x100

The following steps reproduce it reliably:

1. Start sway
2. Find out which core amdgpu IRQs fire on:
   awk 'NR==1 || /amdgpu/' /proc/interrupts
3. Run libdrm's drmdevice test program
   (https://cgit.freedesktop.org/drm/libdrm/tree/tests/drmdevice.c) in a tight
   loop on the same core:
   while true; do taskset -c $AMDGPU_CORE drmdevice; done
4. If that hasn't triggered it yet, lock and unlock the screen with physlock
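
For convenience, steps 2 and 3 can be scripted. The following is only a rough
sketch: the helper name busiest_amdgpu_cpu is made up for illustration, and
picking the CPU with the highest amdgpu interrupt count is an assumption about
which core to pin to (IRQ affinity can change at runtime, for example under
irqbalance, so checking /proc/interrupts by hand is still worthwhile).

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): read text in
# /proc/interrupts format on stdin and print the number of the CPU that
# has handled the most amdgpu interrupts. Returns non-zero if no amdgpu
# line is present.
busiest_amdgpu_cpu() {
    awk '
        NR == 1 {
            # Header row: one "CPUn" label per online CPU.
            ncpu = NF
            for (i = 1; i <= NF; i++)
                id[i] = substr($i, 4)      # strip the "CPU" prefix
            next
        }
        /amdgpu/ {
            # Field 1 is the "NN:" IRQ label; counts start at field 2.
            for (i = 1; i <= ncpu; i++)
                count[i] += $(i + 1)
            seen = 1
        }
        END {
            if (!seen)
                exit 1
            best = 1
            for (i = 2; i <= ncpu; i++)
                if (count[i] > count[best])
                    best = i
            print id[best]
        }
    '
}
```

With that function defined, the loop from step 3 could be started as:

   AMDGPU_CORE=$(busiest_amdgpu_cpu < /proc/interrupts) || exit 1
   while true; do taskset -c "$AMDGPU_CORE" drmdevice; done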

Thanks!
-- 
Thomas Sowell
