[Bug 221338] New: AMDGPU: RDNA2 (RX 6600) Vulkan workload causes gfx ring timeout, MODE1 reset, and VRAM loss (SMU driver/firmware mismatch)

public inbox for dri-devel@lists.freedesktop.org

* [Bug 221338] New: AMDGPU: RDNA2 (RX 6600) Vulkan workload causes gfx ring timeout, MODE1 reset, and VRAM loss (SMU driver/firmware mismatch)
From: bugzilla-daemon @ 2026-04-09 16:02 UTC
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=221338

            Bug ID: 221338
           Summary: AMDGPU: RDNA2 (RX 6600) Vulkan workload causes gfx
                    ring timeout, MODE1 reset, and VRAM loss (SMU
                    driver/firmware mismatch)
           Product: Drivers
           Version: 2.5
          Hardware: AMD
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: tona_kosmicznego_smiecia@interia.pl
        Regression: No

Disclaimer: AI helped me investigate the issue and put together the bug report
below. I'm no hardware prodigy, but I hope the information is accurate.

The OS I'm using is Guix System with the nonguix channel that provides
proprietary firmware and drivers. I reported the issue there, but they told me
it is a kernel issue so here I am.

### Problem

On an AMD Radeon RX 6600 PowerColor Fighter, Vulkan workloads consistently
trigger a GPU hang followed by a MODE1 reset. This occurs reliably in Blender’s
Vulkan backend (installed through flatpak) and in Vulkan‑based games on Steam
(DXVK/VKD3D). After the reset, VRAM is lost and the session becomes unstable.

The kernel logs show a persistent **SMU driver/firmware interface mismatch**,
even on newer kernels. Under heavy Vulkan load, this mismatch leads to a **gfx
ring timeout** and a full GPU reset.

**Hardware:**

* AMD Radeon RX 6600 (PCI ID 1002:73FF)
* VBIOS: `113-D5340100_100`
* VRAM: 8 GB

**Kernel versions tested:**

* 6.18.20 (nonguix build)
* 6.12.79 (nonguix LTS)

Both exhibit identical failures.

**Firmware:**

* `linux-firmware` version **20260309** (SMU firmware version 59.50.0, as
  reported in dmesg)

**Mesa:**

* Mesa 26.0.2 (RADV)

**Key dmesg excerpts:**

```
amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013
amdgpu: SMU driver if version not matched

amdgpu: ring gfx_0.0.0 timeout, signaled seq=..., emitted seq=...
amdgpu: GPU reset begin!
amdgpu: MODE1 reset
[drm] VRAM is lost due to GPU reset!
```
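The two interface versions in the first excerpt line can be pulled out and compared mechanically. The following is a minimal Python sketch, assuming the message format shown above; the function name `parse_if_versions` is hypothetical and not part of any amdgpu tooling.

```python
import re

# Pattern for the "smu driver if version ..." message as it appears in the excerpt.
IF_VERSION_RE = re.compile(
    r"smu driver if version = (0x[0-9a-fA-F]+), smu fw if version = (0x[0-9a-fA-F]+)"
)


def parse_if_versions(line):
    """Return (driver_if, fw_if) as integers, or None if the line does not match."""
    match = IF_VERSION_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = "amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013"
driver_if, fw_if = parse_if_versions(line)
print(driver_if, fw_if, driver_if == fw_if)  # 15 19 False
```

Here the driver expects interface version 15 (0x0f) while the firmware reports 19 (0x13), which is what the "not matched" message refers to.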

**Reproduction:**

1. Start Blender 5.1 with the Vulkan backend enabled.
2. Interact with the viewport (for example, add a cube) until it crashes,
   usually within 1-2 minutes.
3. The GPU hangs → gfx ring timeout → MODE1 reset → VRAM lost.

Same behavior occurs in Vulkan games via DXVK/VKD3D.
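To confirm that a given run hit the same failure sequence, a saved log can be scanned for the ring timeout and the reset markers that follow it. This is a rough Python sketch, not an official tool: `summarize_hang` is a hypothetical helper, it assumes each kernel message is on a single line as dmesg prints it, and it treats `emitted - signaled` as the number of fences still outstanding at the time of the timeout, which is an interpretation of the sequence numbers rather than something the log states.

```python
import re

# Matches the ring timeout message and captures the ring name and fence sequence numbers.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def summarize_hang(lines):
    """Describe the first ring timeout and which reset markers appear after it."""
    summary = None
    for line in lines:
        if summary is None:
            match = TIMEOUT_RE.search(line)
            if match:
                signaled = int(match["signaled"])
                emitted = int(match["emitted"])
                summary = {
                    "ring": match["ring"],
                    "pending": emitted - signaled,  # assumed: fences not yet signaled
                    "mode1_reset": False,
                    "vram_lost": False,
                }
            continue
        # Only markers that come after the timeout are counted.
        if "MODE1 reset" in line:
            summary["mode1_reset"] = True
        elif "VRAM is lost" in line:
            summary["vram_lost"] = True
    return summary


# Lines taken from the log in this report, with mail-wrapped lines joined.
sample = [
    "[ 6230.393001] amdgpu 0000:28:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=556035, emitted seq=556037",
    "[ 6230.654347] amdgpu 0000:28:00.0: amdgpu: GPU reset begin!",
    "[ 6230.876312] amdgpu 0000:28:00.0: amdgpu: MODE1 reset",
    "[ 6231.402230] [drm] VRAM is lost due to GPU reset!",
]
result = summarize_hang(sample)
print(result["ring"], result["pending"])  # gfx_0.0.0 2
```

A result of `None` would mean the log contains no ring timeout at all, which helps separate this failure from unrelated crashes.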

**Proposed cause:**\
The SMU mismatch appears to be the root cause: SMU firmware 59.50.0 reports
interface version 0x13, while the amdgpu driver in 6.12/6.18 expects 0x0f.
Vulkan workloads reliably trigger the fault path.

TL;DR: the SMU firmware interface version does not match the version the
kernel's amdgpu driver expects.

Full log:

```
$ sudo dmesg -w | grep -iE 'amdgpu|gpu|ring|fault'
Password: 
[    0.032578] pid_max: default: 32768 minimum: 301
[    0.155176] smp: Bringing up secondary CPUs ...
[    0.200260] ACPI: PM: Registering ACPI NVS region [mem 0x0a200000-0x0a20afff] (45056 bytes)
[    0.200260] ACPI: PM: Registering ACPI NVS region [mem 0xdbe77000-0xdc3cefff] (5603328 bytes)
[    0.288424] iommu: Default domain type: Translated
[    0.288424] NetLabel:  unlabeled traffic allowed by default
[    0.323823] PCI: CLS 64 bytes, default 64
[    0.326715] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    0.330993] Initialise system trusted keyrings
[    0.356756] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    0.509760] nvme nvme0: 16/0/0 default/read/poll queues
[    0.553029] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.553921] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.554932] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.555721] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.761212] init[1]: segfault at 3fff00 ip 00000000004d5593 sp 00007ffd93ceeeb0 error 4 in guile[d5593,401000+202000] likely on CPU 1 (core 1, socket 0)
[    0.945861] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    1.121123] usb 1-9: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   27.899089] shepherd[1]: Registering new logger for udev.
[   30.815854] amdgpu: Virtual CRAT table created for CPU
[   30.815878] amdgpu: Topology: Add CPU node
[   30.820511] amdgpu 0000:28:00.0: No more image in the PCI ROM
[   30.820529] amdgpu 0000:28:00.0: amdgpu: Fetched VBIOS from ROM BAR
[   30.820532] amdgpu: ATOM BIOS: 113-D5340100_100
[   30.853724] amdgpu 0000:28:00.0: vgaarb: deactivate vga console
[   30.853728] amdgpu 0000:28:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[   30.853787] amdgpu 0000:28:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[   30.853790] amdgpu 0000:28:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[   30.854013] [drm] amdgpu: 8176M of VRAM memory ready
[   30.854016] [drm] amdgpu: 11990M of GTT memory ready.
[   30.854037] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   34.489983] amdgpu 0000:28:00.0: amdgpu: STB initialized to 2048 entries
[   34.559736] amdgpu 0000:28:00.0: amdgpu: reserve 0xa00000 from 0x81fd000000 for PSP TMR
[   34.662938] amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
[   34.680515] amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[   34.680542] amdgpu 0000:28:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b3200 (59.50.0)
[   34.680552] amdgpu 0000:28:00.0: amdgpu: SMU driver if version not matched
[   34.680588] amdgpu 0000:28:00.0: amdgpu: use vbios provided pptable
[   34.731186] amdgpu 0000:28:00.0: amdgpu: SMU is initialized successfully!
[   34.738403] snd_hda_intel 0000:28:00.1: bound 0000:28:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[   34.786561] [drm] kiq ring mec 2 pipe 1 q 0
[   34.796035] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[   34.796057] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[   34.796288] amdgpu: Virtual CRAT table created for GPU
[   34.796834] amdgpu: Topology: Add dGPU node [0x73ff:0x1002]
[   34.796836] kfd kfd: amdgpu: added device 1002:73ff
[   34.796858] amdgpu 0000:28:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 28
[   34.796863] amdgpu 0000:28:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[   34.796866] amdgpu 0000:28:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
[   34.796869] amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
[   34.796871] amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
[   34.796874] amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[   34.796876] amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[   34.796878] amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[   34.796880] amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[   34.796883] amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[   34.796885] amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[   34.796887] amdgpu 0000:28:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
[   34.796890] amdgpu 0000:28:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
[   34.796892] amdgpu 0000:28:00.0: amdgpu: ring sdma1 uses VM inv eng 14 on hub 0
[   34.796894] amdgpu 0000:28:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[   34.796897] amdgpu 0000:28:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[   34.796899] amdgpu 0000:28:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[   34.796901] amdgpu 0000:28:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[   34.798313] amdgpu 0000:28:00.0: amdgpu: Using BACO for runtime pm
[   34.799078] [drm] Initialized amdgpu 3.61.0 for 0000:28:00.0 on minor 0
[   34.805500] fbcon: amdgpudrmfb (fb0) is primary device
[   34.860088] amdgpu 0000:28:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   45.871018] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 6230.380965] amdgpu 0000:28:00.0: amdgpu: Dumping IP State
[ 6230.382903] amdgpu 0000:28:00.0: amdgpu: Dumping IP State Completed
[ 6230.393001] amdgpu 0000:28:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=556035, emitted seq=556037
[ 6230.393014] amdgpu 0000:28:00.0: amdgpu: Process information: process blender pid 4958 thread blender pid 4985
[ 6230.654347] amdgpu 0000:28:00.0: amdgpu: GPU reset begin!
[ 6230.876312] amdgpu 0000:28:00.0: amdgpu: MODE1 reset
[ 6230.876317] amdgpu 0000:28:00.0: amdgpu: GPU mode1 reset
[ 6230.876381] amdgpu 0000:28:00.0: amdgpu: GPU smu mode1 reset
[ 6231.401925] amdgpu 0000:28:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 6231.402230] [drm] VRAM is lost due to GPU reset!
[ 6231.402235] amdgpu 0000:28:00.0: amdgpu: PSP is resuming...
[ 6231.485428] amdgpu 0000:28:00.0: amdgpu: reserve 0xa00000 from 0x81fd000000 for PSP TMR
[ 6231.589963] amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 6231.607384] amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 6231.607393] amdgpu 0000:28:00.0: amdgpu: SMU is resuming...
[ 6231.607401] amdgpu 0000:28:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b3200 (59.50.0)
[ 6231.607411] amdgpu 0000:28:00.0: amdgpu: SMU driver if version not matched
[ 6231.607448] amdgpu 0000:28:00.0: amdgpu: use vbios provided pptable
[ 6231.660858] amdgpu 0000:28:00.0: amdgpu: SMU is resumed successfully!
[ 6231.661599] [drm] kiq ring mec 2 pipe 1 q 0
[ 6231.726662] amdgpu 0000:28:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 6231.726667] amdgpu 0000:28:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
[ 6231.726669] amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
[ 6231.726672] amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
[ 6231.726674] amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 6231.726677] amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 6231.726679] amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 6231.726682] amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 6231.726684] amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 6231.726686] amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 6231.726688] amdgpu 0000:28:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
[ 6231.726691] amdgpu 0000:28:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
[ 6231.726693] amdgpu 0000:28:00.0: amdgpu: ring sdma1 uses VM inv eng 14 on hub 0
[ 6231.726695] amdgpu 0000:28:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 6231.726698] amdgpu 0000:28:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 6231.726700] amdgpu 0000:28:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 6231.726702] amdgpu 0000:28:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 6231.730096] amdgpu 0000:28:00.0: amdgpu: GPU reset(1) succeeded!
[ 6231.770210] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
```

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 221338] AMDGPU: RDNA2 (RX 6600) Vulkan workload causes gfx ring timeout, MODE1 reset, and VRAM loss (SMU driver/firmware mismatch)
From: bugzilla-daemon @ 2026-04-10  7:27 UTC
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=221338

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |ANSWERED

--- Comment #1 from Artem S. Tashkinov (aros@gmx.com) ---
Report here instead:

https://gitlab.freedesktop.org/drm/amd/-/issues


* [Bug 221338] AMDGPU: RDNA2 (RX 6600) Vulkan workload causes gfx ring timeout, MODE1 reset, and VRAM loss (SMU driver/firmware mismatch)
From: bugzilla-daemon @ 2026-04-10 12:45 UTC
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=221338

--- Comment #2 from tona_kosmicznego_smiecia@interia.pl ---
Will do, thanks.
