* kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
@ 2026-01-13 1:01 Marek Marczykowski-Górecki
2026-01-13 2:47 ` Mario Limonciello
2026-01-30 17:01 ` Yazen Ghannam
0 siblings, 2 replies; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-01-13 1:01 UTC (permalink / raw)
To: Mario Limonciello, Yazen Ghannam
Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions
[-- Attachment #1: Type: text/plain, Size: 6300 bytes --]
Hi,
I've got a report that kernel 6.17.9 crashes when running a Xen HVM domU
with AMD Raphael/Granite Ridge USB controller passed through.
It worked correctly in 6.12.59. Between those versions, I don't see any
relevant change to quirk_clear_strap_no_soft_reset_dev2_f0() function,
but the AMD node driver did get some changes, so my guess is one of them
is to blame. I know the good-bad range is huge, but there aren't that
many changes to the AMD node driver in this range.
It's running on Qubes OS 4.3, which uses Xen 4.19, and does PCI
passthrough of USB controllers to a dedicated VM (HVM).
The full crash message is:
[ 0.302571] pci 0000:00:08.0: quirk_usb_early_handoff+0x0/0x180 took 16590 usecs
[ 0.303172] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 0.303189] #PF: supervisor read access in kernel mode
[ 0.303202] #PF: error_code(0x0000) - not-present page
[ 0.303216] PGD 0 P4D 0
[ 0.303225] Oops: Oops: 0000 [#1] SMP NOPTI
[ 0.303236] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.9-1.qubes.fc41.x86_64 #1 PREEMPT(full)
[ 0.303258] Hardware name: Xen HVM domU, BIOS 4.19.3 08/26/2025
[ 0.303273] RIP: 0010:__amd_smn_rw+0x30/0x100
[ 0.303288] Code: 05 bd 44 b8 01 66 0f af 05 2d 44 b8 01 41 57 41 56 41 55 41 54 55 53 66 39 c2 0f 83 c0 00 00 00 48 8b 05 c3 61 d7 02 0f b7 d2 <4c> 8b 34 d0 4d 85 f6 0f 84 a9 00 00 00 80 3d a4 61 d7 02 00 0f 84
[ 0.303327] RSP: 0018:ffffcdd30001fd68 EFLAGS: 00010297
[ 0.303341] RAX: 0000000000000000 RBX: ffffcdd30001fdb4 RCX: 0000000010136008
[ 0.303359] RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000060
[ 0.303377] RBP: ffffffffa684bb80 R08: ffffcdd30001fdb4 R09: 0000000000000000
[ 0.303395] R10: ffffffffa7567420 R11: 0000000000000020 R12: ffff8dd081dff000
[ 0.303413] R13: ffffffffa736ab60 R14: 00000000055ee14a R15: ffff8dd081dff000
[ 0.303434] FS: 0000000000000000(0000) GS:ffff8dd0e87c1000(0000) knlGS:0000000000000000
[ 0.303452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.303468] CR2: 0000000000000000 CR3: 000000000c62c000 CR4: 0000000000750ef0
[ 0.303487] PKRU: 55555554
[ 0.303495] Call Trace:
[ 0.303504] <TASK>
[ 0.303513] ? __pfx_quirk_clear_strap_no_soft_reset_dev2_f0+0x10/0x10
[ 0.304112] amd_smn_read+0x27/0x50
[ 0.304112] quirk_clear_strap_no_soft_reset_dev2_f0+0x37/0x80
[ 0.304112] pci_fixup_device+0xf6/0x1b0
[ 0.304112] pci_apply_final_quirks+0xe9/0x280
[ 0.304112] ? __pfx_pci_apply_final_quirks+0x10/0x10
[ 0.304112] do_one_initcall+0x57/0x310
[ 0.304112] do_initcalls+0x1ef/0x240
[ 0.304112] kernel_init_freeable+0x187/0x210
[ 0.304112] ? __pfx_kernel_init+0x10/0x10
[ 0.304112] kernel_init+0x1a/0x140
[ 0.304112] ret_from_fork+0xf2/0x110
[ 0.304112] ? __pfx_kernel_init+0x10/0x10
[ 0.304112] ret_from_fork_asm+0x1a/0x30
[ 0.304112] </TASK>
[ 0.304112] Modules linked in:
[ 0.304112] CR2: 0000000000000000
[ 0.304112] ---[ end trace 0000000000000000 ]---
[ 0.304112] RIP: 0010:__amd_smn_rw+0x30/0x100
[ 0.304112] Code: 05 bd 44 b8 01 66 0f af 05 2d 44 b8 01 41 57 41 56 41 55 41 54 55 53 66 39 c2 0f 83 c0 00 00 00 48 8b 05 c3 61 d7 02 0f b7 d2 <4c> 8b 34 d0 4d 85 f6 0f 84 a9 00 00 00 80 3d a4 61 d7 02 00 0f 84
[ 0.304112] RSP: 0018:ffffcdd30001fd68 EFLAGS: 00010297
[ 0.304112] RAX: 0000000000000000 RBX: ffffcdd30001fdb4 RCX: 0000000010136008
[ 0.304112] RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000060
[ 0.304112] RBP: ffffffffa684bb80 R08: ffffcdd30001fdb4 R09: 0000000000000000
[ 0.304112] R10: ffffffffa7567420 R11: 0000000000000020 R12: ffff8dd081dff000
[ 0.304112] R13: ffffffffa736ab60 R14: 00000000055ee14a R15: ffff8dd081dff000
[ 0.304112] FS: 0000000000000000(0000) GS:ffff8dd0e87c1000(0000) knlGS:0000000000000000
[ 0.304112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.304112] CR2: 0000000000000000 CR3: 000000000c62c000 CR4: 0000000000750ef0
[ 0.304112] PKRU: 55555554
[ 0.304112] Kernel panic - not syncing: Fatal exception
The device, as seen from within the VM:
00:09.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8] (prog-if 30 [XHCI])
Subsystem: ASUSTeK Computer Inc. Device [1043:8877]
Physical Slot: 9
Flags: bus master, fast devsel, latency 0, IRQ 21
Memory at f2200000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, IntMsgNum 0
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=8 Masked-
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00: 22 10 b8 15 07 04 10 00 00 30 03 0c 10 00 00 00
10: 04 00 20 f2 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 77 88
30: 00 00 00 00 48 00 00 00 00 00 00 00 2e 01 00 00
40: 00 00 00 00 00 00 00 00 09 50 08 00 43 10 77 88
50: 01 64 03 00 08 00 00 00 00 00 00 00 00 00 00 00
60: 31 60 00 00 10 a0 02 00 a1 8f 00 00 30 29 00 00
70: 04 0d 40 00 00 00 04 11 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 1f 00 01 00 00 00 00 00
90: 1e 00 80 01 04 00 1f 00 00 00 00 00 00 00 00 00
a0: 05 c0 80 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 11 00 07 80 00 e0 0f 00 00 f0 0f 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Any ideas?
Original report at (with full kernel log etc): https://forum.qubes-os.org/t/yet-another-usb-keyboard-thread/38355/8
#regzbot introduced: v6.12.59..v6.17.9
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
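For anyone cross-checking the config space dump in the report: below is a minimal sketch, assuming the standard PCI type 0 header layout, that decodes the first 64 bytes of the dump and compares them with the lspci summary ([1022:15b8], class 0c03, prog-if 30, subsystem [1043:8877], first capability at 0x48, 64-bit BAR at f2200000). The names cfg_read16(), cfg_read32(), decode_ids() and struct pci_ids are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_LEN 64

/* First 64 bytes of the config space dump quoted in the report. */
static const uint8_t cfg[CFG_LEN] = {
	0x22, 0x10, 0xb8, 0x15, 0x07, 0x04, 0x10, 0x00,
	0x00, 0x30, 0x03, 0x0c, 0x10, 0x00, 0x00, 0x00,
	0x04, 0x00, 0x20, 0xf2, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x43, 0x10, 0x77, 0x88,
	0x00, 0x00, 0x00, 0x00, 0x48, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x2e, 0x01, 0x00, 0x00,
};

/* Hypothetical container for the decoded fields (illustration only). */
struct pci_ids {
	uint16_t vendor;
	uint16_t device;
	uint16_t class_code;	/* base class in the high byte, subclass in the low byte */
	uint8_t prog_if;
	uint16_t subsys_vendor;
	uint16_t subsys_device;
	uint8_t cap_ptr;
	uint32_t bar0;
};

/* Config space is little-endian: the low byte comes first. */
static uint16_t cfg_read16(const uint8_t *c, size_t off)
{
	assert(off + 1 < CFG_LEN);
	return (uint16_t)(c[off] | (c[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *c, size_t off)
{
	assert(off + 3 < CFG_LEN);
	return (uint32_t)c[off] | ((uint32_t)c[off + 1] << 8) |
	       ((uint32_t)c[off + 2] << 16) | ((uint32_t)c[off + 3] << 24);
}

/* Offsets follow the standard type 0 header. */
static struct pci_ids decode_ids(const uint8_t *c)
{
	struct pci_ids ids;

	ids.vendor = cfg_read16(c, 0x00);
	ids.device = cfg_read16(c, 0x02);
	ids.prog_if = c[0x09];
	ids.class_code = cfg_read16(c, 0x0a);
	ids.bar0 = cfg_read32(c, 0x10);
	ids.subsys_vendor = cfg_read16(c, 0x2c);
	ids.subsys_device = cfg_read16(c, 0x2e);
	ids.cap_ptr = c[0x34];
	return ids;
}
```

Calling decode_ids(cfg) should give vendor 0x1022 and device 0x15b8, which is the ID pair a PCI fixup would be matched against. Whether the quirk in the trace is keyed on exactly this pair is not stated in the thread.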
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-01-13 1:01 kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read Marek Marczykowski-Górecki
@ 2026-01-13 2:47 ` Mario Limonciello
2026-01-13 16:04 ` Borislav Petkov
2026-06-05 17:34 ` Marek Marczykowski-Górecki
2026-01-30 17:01 ` Yazen Ghannam
1 sibling, 2 replies; 17+ messages in thread
From: Mario Limonciello @ 2026-01-13 2:47 UTC (permalink / raw)
To: Marek Marczykowski-Górecki, Yazen Ghannam
Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On 1/12/2026 7:01 PM, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I've got a report that kernel 6.17.9 crashes when running a Xen HVM domU
> with AMD Raphael/Granite Ridge USB controller passed through.
> It worked correctly in 6.12.59. Between those versions, I don't see any
> relevant change to quirk_clear_strap_no_soft_reset_dev2_f0() function,
> but the AMD node driver did get some changes, so my guess is one of them
> is to blame. I know the good-bad range is huge, but there aren't that
> many changes to the AMD node driver in this range.

Is this perhaps a case that only the USB controller was passed through but
that the root controller wasn't? That would lead to a case that
amd_smn_init() was never called and thus amd_roots was not initialized
properly.

So it would be a NULL pointer deref. If that's correct, something like this
should work to avoid it.

diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
index 3d0a4768d603c..894823b444d47 100644
--- a/arch/x86/kernel/amd_node.c
+++ b/arch/x86/kernel/amd_node.c
@@ -91,6 +91,11 @@ static int __amd_smn_rw(u8 i_off, u8 d_off, u16 node, u32 address, u32 *value, b
 	if (node >= amd_num_nodes())
 		return err;
 
+	if (!amd_roots) {
+		pr_warn("AMD SMN roots not initialized.\n");
+		return err;
+	}
+
 	root = amd_roots[node];
 	if (!root)
 		return err;

> It's running on Qubes OS 4.3, which uses Xen 4.19, and does PCI
> passthrough of USB controllers to a dedicated VM (HVM).
>
> The full crash message is:
>
> [...]
>
> Any ideas?
>
> Original report at (with full kernel log etc): https://forum.qubes-os.org/t/yet-another-usb-keyboard-thread/38355/8
>
> #regzbot introduced: v6.12.59..v6.17.9

^ permalink raw reply related [flat|nested] 17+ messages in thread
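To illustrate the guard proposed in the patch above, here is a minimal user-space sketch of the same pattern: a lookup table pointer that stays NULL until an init step runs, and an accessor that bails out before indexing it. Everything here is a stand-in (struct fake_root, fake_roots, fake_num_nodes, fake_smn_read()), and the use of -ENODEV as the early-exit error is an assumption. It is not the kernel code from arch/x86/kernel/amd_node.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-node root device. */
struct fake_root {
	uint32_t regs[4];
};

/* Models amd_roots: remains NULL if the init step never ran. */
static struct fake_root **fake_roots;

/* Models amd_num_nodes(); fixed at 1 for this sketch. */
static uint16_t fake_num_nodes = 1;

/*
 * Guarded lookup. Every early exit returns an error instead of
 * dereferencing a table that was never allocated.
 */
static int fake_smn_read(uint16_t node, uint32_t index, uint32_t *value)
{
	struct fake_root *root;

	assert(value != NULL);

	if (node >= fake_num_nodes)
		return -ENODEV;

	/* The check added by the proposed patch: the table itself may be NULL. */
	if (!fake_roots)
		return -ENODEV;

	root = fake_roots[node];
	if (!root)
		return -ENODEV;

	if (index >= sizeof(root->regs) / sizeof(root->regs[0]))
		return -EINVAL;

	*value = root->regs[index];
	return 0;
}
```

Without the `!fake_roots` check, the first call on an uninitialized table would index through a NULL pointer, which is the situation the oops shows. With it, the caller gets an error and can skip the quirk.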
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-01-13 2:47 ` Mario Limonciello
@ 2026-01-13 16:04 ` Borislav Petkov
2026-06-05 17:34 ` Marek Marczykowski-Górecki
1 sibling, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2026-01-13 16:04 UTC (permalink / raw)
To: Mario Limonciello
Cc: Marek Marczykowski-Górecki, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Mon, Jan 12, 2026 at 08:47:50PM -0600, Mario Limonciello wrote:
> Is this perhaps a case that only the USB controller was passed through but
> that the root controller wasn't? That would lead to a case that
> amd_smn_init() was never called and thus amd_roots was not initialized
> properly.
>
> So it would be a NULL pointer deref.

Yah, looks like a NULL ptr:

   0:   05 bd 44 b8 01          add    $0x1b844bd,%eax
   5:   66 0f af 05 2d 44 b8    imul   0x1b8442d(%rip),%ax        # 0x1b8443a
   c:   01
   d:   41 57                   push   %r15
   f:   41 56                   push   %r14
  11:   41 55                   push   %r13
  13:   41 54                   push   %r12
  15:   55                      push   %rbp
  16:   53                      push   %rbx
  17:   66 39 c2                cmp    %ax,%dx
  1a:   0f 83 c0 00 00 00       jae    0xe0

That's

	if (node >= amd_num_nodes())
		return err;

  20:   48 8b 05 c3 61 d7 02    mov    0x2d761c3(%rip),%rax        # 0x2d761ea

That's fetching amd_roots. Which is 0, see RAX below.

  27:   0f b7 d2                movzwl %dx,%edx

Zero-extending the "node" var.

  2a:*  4c 8b 34 d0             mov    (%rax,%rdx,8),%r14          <-- trapping instruction

Get the root ptr. Boom.

> > [ 0.302571] pci 0000:00:08.0: quirk_usb_early_handoff+0x0/0x180 took 16590 usecs
> > [ 0.303172] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 0.303189] #PF: supervisor read access in kernel mode
> > [ 0.303202] #PF: error_code(0x0000) - not-present page
> > [ 0.303216] PGD 0 P4D 0
> > [ 0.303225] Oops: Oops: 0000 [#1] SMP NOPTI
> > [ 0.303236] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.9-1.qubes.fc41.x86_64 #1 PREEMPT(full)
> > [ 0.303258] Hardware name: Xen HVM domU, BIOS 4.19.3 08/26/2025
> > [ 0.303273] RIP: 0010:__amd_smn_rw+0x30/0x100
> > [ 0.303288] Code: 05 bd 44 b8 01 66 0f af 05 2d 44 b8 01 41 57 41 56 41 55 41 54 55 53 66 39 c2 0f 83 c0 00 00 00 48 8b 05 c3 61 d7 02 0f b7 d2 <4c> 8b 34 d0 4d 85 f6 0f 84 a9 00 00 00 80 3d a4 61 d7 02 00 0f 84
> > [ 0.303327] RSP: 0018:ffffcdd30001fd68 EFLAGS: 00010297
> > [ 0.303341] RAX: 0000000000000000 RBX: ffffcdd30001fdb4 RCX: 0000000010136008
> > [ 0.303359] RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000060

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply [flat|nested] 17+ messages in thread
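The arithmetic behind the trapping instruction in the disassembly above can be checked with a short sketch. The faulting load is `mov (%rax,%rdx,8),%r14`, so the address is base + index * 8 with no displacement; with RAX = 0 (amd_roots) and RDX = 0 (node), that gives 0, which matches CR2 in the oops. The helper names below are made up for illustration; only the register values come from the log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an x86 memory operand of the form
 * (%base,%index,scale) with no displacement: base + index * scale.
 */
static uint64_t effective_address(uint64_t base, uint64_t index, uint8_t scale)
{
	/* x86 SIB encoding only allows these scale factors. */
	assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
	return base + index * scale;
}

/* Register values copied from the oops quoted in this thread. */
#define OOPS_RAX 0x0000000000000000ULL	/* amd_roots, loaded at offset 0x20 */
#define OOPS_RDX 0x0000000000000000ULL	/* zero-extended node number */
#define OOPS_CR2 0x0000000000000000ULL	/* faulting address reported by the CPU */

/* Returns nonzero when the computed address equals the reported fault address. */
static int fault_matches_null_table(void)
{
	return effective_address(OOPS_RAX, OOPS_RDX, 8) == OOPS_CR2;
}
```

With a non-NULL table at, say, 0x1000 and node 3, the same load would read from 0x1000 + 3 * 8 = 0x1018. The scale of 8 is the size of one pointer entry in the table.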
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-01-13 2:47 ` Mario Limonciello
2026-01-13 16:04 ` Borislav Petkov
@ 2026-06-05 17:34 ` Marek Marczykowski-Górecki
2026-06-05 17:36 ` Mario Limonciello
1 sibling, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-06-05 17:34 UTC (permalink / raw)
To: Mario Limonciello
Cc: Yazen Ghannam, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions
[-- Attachment #1: Type: text/plain, Size: 2018 bytes --]

On Mon, Jan 12, 2026 at 08:47:50PM -0600, Mario Limonciello wrote:
> On 1/12/2026 7:01 PM, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > I've got a report that kernel 6.17.9 crashes when running a Xen HVM domU
> > with AMD Raphael/Granite Ridge USB controller passed through.
> > It worked correctly in 6.12.59. Between those versions, I don't see any
> > relevant change to quirk_clear_strap_no_soft_reset_dev2_f0() function,
> > but the AMD node driver did get some changes, so my guess is one of them
> > is to blame. I know the good-bad range is huge, but there aren't that
> > many changes to the AMD node driver in this range.
>
> Is this perhaps a case that only the USB controller was passed through but
> that the root controller wasn't? That would lead to a case that
> amd_smn_init() was never called and thus amd_roots was not initialized
> properly.
>
> So it would be a NULL pointer deref. If that's correct, something like this
> should work to avoid it.
>
> diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
> index 3d0a4768d603c..894823b444d47 100644
> --- a/arch/x86/kernel/amd_node.c
> +++ b/arch/x86/kernel/amd_node.c
> @@ -91,6 +91,11 @@ static int __amd_smn_rw(u8 i_off, u8 d_off, u16 node, u32
> address, u32 *value, b
>  	if (node >= amd_num_nodes())
>  		return err;
>
> +	if (!amd_roots) {
> +		pr_warn("AMD SMN roots not initialized.\n");
> +		return err;
> +	}
> +
>  	root = amd_roots[node];
>  	if (!root)
>  		return err;

Thanks, I finally got confirmation from affected user that this patch
fixes the issue. From what I understand, adbf61cc47cb ("x86/acpi/boot: Correct
acpi_is_processor_usable() check again") was not enough.

> > Original report at (with full kernel log etc): https://forum.qubes-os.org/t/yet-another-usb-keyboard-thread/38355/8

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 17:34 ` Marek Marczykowski-Górecki
@ 2026-06-05 17:36 ` Mario Limonciello
2026-06-05 17:45 ` Marek Marczykowski-Górecki
0 siblings, 1 reply; 17+ messages in thread
From: Mario Limonciello @ 2026-06-05 17:36 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Yazen Ghannam, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On 6/5/26 12:34, Marek Marczykowski-Górecki wrote:
> On Mon, Jan 12, 2026 at 08:47:50PM -0600, Mario Limonciello wrote:
>> [...]
>
> Thanks, I finally got confirmation from affected user that this patch
> fixes the issue. From what I understand, adbf61cc47cb ("x86/acpi/boot: Correct
> acpi_is_processor_usable() check again") was not enough.

There's another patch being discussed. Could this help?

https://lore.kernel.org/all/20260602184823.GKah8ld2QJLm28xoa9@fat_crate.local/

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 17:36 ` Mario Limonciello
@ 2026-06-05 17:45 ` Marek Marczykowski-Górecki
2026-06-05 18:54 ` Mario Limonciello
0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-06-05 17:45 UTC (permalink / raw)
To: Mario Limonciello
Cc: Yazen Ghannam, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions
[-- Attachment #1: Type: text/plain, Size: 2561 bytes --]

On Fri, Jun 05, 2026 at 12:36:29PM -0500, Mario Limonciello wrote:
> On 6/5/26 12:34, Marek Marczykowski-Górecki wrote:
> > [...]
> > Thanks, I finally got confirmation from affected user that this patch
> > fixes the issue. From what I understand, adbf61cc47cb ("x86/acpi/boot: Correct
> > acpi_is_processor_usable() check again") was not enough.
>
> There's another patch being discussed. Could this help?
>
> https://lore.kernel.org/all/20260602184823.GKah8ld2QJLm28xoa9@fat_crate.local/

Especially with 2/2 patch there, yes, looks like it would help too.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 17:45 ` Marek Marczykowski-Górecki
@ 2026-06-05 18:54 ` Mario Limonciello
2026-06-05 20:23 ` Marek Marczykowski-Górecki
0 siblings, 1 reply; 17+ messages in thread
From: Mario Limonciello @ 2026-06-05 18:54 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Yazen Ghannam, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On 6/5/26 12:45, Marek Marczykowski-Górecki wrote:
> On Fri, Jun 05, 2026 at 12:36:29PM -0500, Mario Limonciello wrote:
>> [...]
>> There's another patch being discussed. Could this help?
>>
>> https://lore.kernel.org/all/20260602184823.GKah8ld2QJLm28xoa9@fat_crate.local/
>
> Especially with 2/2 patch there, yes, looks like it would help too.

Can you try Boris' inline proposal specifically?

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 18:54 ` Mario Limonciello
@ 2026-06-05 20:23 ` Marek Marczykowski-Górecki
2026-06-05 21:15 ` Borislav Petkov
0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-06-05 20:23 UTC (permalink / raw)
To: Mario Limonciello
Cc: Yazen Ghannam, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions
[-- Attachment #1: Type: text/plain, Size: 3340 bytes --]

On Fri, Jun 05, 2026 at 01:54:10PM -0500, Mario Limonciello wrote:
> On 6/5/26 12:45, Marek Marczykowski-Górecki wrote:
> > On Fri, Jun 05, 2026 at 12:36:29PM -0500, Mario Limonciello wrote:
> > > [...]
> > > There's another patch being discussed. Could this help?
> > >
> > > https://lore.kernel.org/all/20260602184823.GKah8ld2QJLm28xoa9@fat_crate.local/
> >
> > Especially with 2/2 patch there, yes, looks like it would help too.
>
> Can you try Boris' inline proposal specifically?

Instead of the series? No, that's not enough. amd_smn_read() is called
from quirk_clear_strap_no_soft_reset_dev2_f0, so it would still hit NULL
at amd_roots in __amd_smn_rw(). But if you mean instead of the first
patch (but apply the second as is), it should work. I don't have
affected hardware, but I'll ask the affected user to test this version.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 20:23 ` Marek Marczykowski-Górecki
@ 2026-06-05 21:15 ` Borislav Petkov
2026-06-05 21:55 ` Marek Marczykowski-Górecki
0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2026-06-05 21:15 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Mario Limonciello, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Fri, Jun 05, 2026 at 10:23:18PM +0200, Marek Marczykowski-Górecki wrote:
> Instead of the series? No, that's not enough. amd_smn_read() is called
> from quirk_clear_strap_no_soft_reset_dev2_f0, so it would still hit NULL
> at amd_roots in __amd_smn_rw(). But if you mean instead of the first
> patch (but apply the second as is), it should work. I don't have
> affected hardware, but I'll ask the affected user to test this version.

amd_smn_read() should not happen in guests. It is that simple.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 21:15 ` Borislav Petkov
@ 2026-06-05 21:55 ` Marek Marczykowski-Górecki
2026-06-05 22:26 ` Borislav Petkov
0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-06-05 21:55 UTC (permalink / raw)
To: Borislav Petkov
Cc: Mario Limonciello, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Fri, Jun 05, 2026 at 02:15:43PM -0700, Borislav Petkov wrote:
> On Fri, Jun 05, 2026 at 10:23:18PM +0200, Marek Marczykowski-Górecki wrote:
> > Instead of the series? No, that's not enough. amd_smn_read() is called
> > from quirk_clear_strap_no_soft_reset_dev2_f0, so it would still hit NULL
> > at amd_roots in __amd_smn_rw(). But if you mean instead of the first
> > patch (but apply the second as is), it should work. I don't have
> > affected hardware, but I'll ask the affected user to test this version.
>
> amd_smn_read() should not happen in guests. It is that simple.

Well, it clearly happens, see the call trace in the first message of the
thread...
Do you suggest the fix should change
quirk_clear_strap_no_soft_reset_dev2_f0()?

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 21:55 ` Marek Marczykowski-Górecki
@ 2026-06-05 22:26 ` Borislav Petkov
2026-06-05 22:40 ` Marek Marczykowski-Górecki
0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2026-06-05 22:26 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Mario Limonciello, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Fri, Jun 05, 2026 at 11:55:47PM +0200, Marek Marczykowski-Górecki wrote:
> Well, it clearly happens, see the call trace in the first message of the
> thread...
> Do you suggest the fix should change
> quirk_clear_strap_no_soft_reset_dev2_f0()?

https://lore.kernel.org/all/20260602184823.GKah8ld2QJLm28xoa9@fat_crate.local/

That thing should return an error so that amd_smn_read() is not even available
on guests.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 22:26 ` Borislav Petkov
@ 2026-06-05 22:40 ` Marek Marczykowski-Górecki
2026-06-05 23:09 ` Borislav Petkov
0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-06-05 22:40 UTC (permalink / raw)
To: Borislav Petkov
Cc: Mario Limonciello, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Fri, Jun 05, 2026 at 03:26:48PM -0700, Borislav Petkov wrote:
> On Fri, Jun 05, 2026 at 11:55:47PM +0200, Marek Marczykowski-Górecki wrote:
> > Well, it clearly happens, see the call trace in the first message of the
> > thread...
> > Do you suggest the fix should change
> > quirk_clear_strap_no_soft_reset_dev2_f0()?
>
> https://lore.kernel.org/all/20260602184823.GKah8ld2QJLm28xoa9@fat_crate.local/
>
> That thing should return an error so that amd_smn_read() is not even available
> on guests.

What do you mean by "not even available on guests"? I'm talking about
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/pci/fixup.c#n876

static void quirk_clear_strap_no_soft_reset_dev2_f0(struct pci_dev *dev)
{
	u32 data;

	if (!amd_smn_read(0, AMD_15B8_RCC_DEV2_EPF0_STRAP2, &data)) {
		data &= ~AMD_15B8_RCC_DEV2_EPF0_STRAP2_NO_SOFT_RESET_DEV2_F0_MASK;
		if (amd_smn_write(0, AMD_15B8_RCC_DEV2_EPF0_STRAP2, data))
			pci_err(dev, "Failed to write data 0x%x\n", data);
	} else {
		pci_err(dev, "Failed to read data\n");
	}
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15b8, quirk_clear_strap_no_soft_reset_dev2_f0);

There is nothing here that would prevent amd_smn_read() being called
inside a guest...

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 22:40 ` Marek Marczykowski-Górecki
@ 2026-06-05 23:09 ` Borislav Petkov
2026-06-05 23:37 ` Marek Marczykowski-Górecki
0 siblings, 1 reply; 17+ messages in thread
From: Borislav Petkov @ 2026-06-05 23:09 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Mario Limonciello, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Sat, Jun 06, 2026 at 12:40:20AM +0200, Marek Marczykowski-Górecki wrote:
> There is nothing here that would prevent amd_smn_read() being called
> inside a guest...

Yah, there should've been...

Anyway, something like the untested below, pls give it a run.

Thx.

---
diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
index 0be01725a2a4..52eff7fac667 100644
--- a/arch/x86/kernel/amd_node.c
+++ b/arch/x86/kernel/amd_node.c
@@ -39,6 +39,7 @@ static struct pci_dev **amd_roots;
 /* Protect the PCI config register pairs used for SMN. */
 static DEFINE_MUTEX(smn_mutex);
 static bool smn_exclusive;
+static bool amd_node_off;
 
 #define SMN_INDEX_OFFSET	0x60
 #define SMN_DATA_OFFSET		0x64
@@ -88,6 +89,9 @@ static int __amd_smn_rw(u8 i_off, u8 d_off, u16 node, u32 address, u32 *value, b
 	struct pci_dev *root;
 	int err = -ENODEV;
 
+	if (amd_node_off)
+		return -EINVAL;
+
 	if (node >= amd_num_nodes())
 		return err;
 
@@ -248,9 +252,13 @@ static int __init amd_smn_init(void)
 {
 	u16 count, num_roots, roots_per_node, node, num_nodes;
 	struct pci_dev *root;
+	int err = -EINVAL;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		goto err_out;
 
 	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
-		return 0;
+		goto err_out;
 
 	guard(mutex)(&smn_mutex);
 
@@ -270,7 +278,8 @@ static int __init amd_smn_init(void)
 		 */
 		if (!pci_request_config_region_exclusive(root, 0, PCI_CFG_SPACE_SIZE, NULL)) {
 			pci_err(root, "Failed to reserve config space\n");
-			return -EEXIST;
+			err = -EEXIST;
+			goto err_out;
 		}
 
 		num_roots++;
@@ -278,13 +287,17 @@ static int __init amd_smn_init(void)
 
 	pr_debug("Found %d AMD root devices\n", num_roots);
 
-	if (!num_roots)
-		return -ENODEV;
+	if (!num_roots) {
+		err = -ENODEV;
+		goto err_out;
+	}
 
 	num_nodes = amd_num_nodes();
 	amd_roots = kzalloc_objs(*amd_roots, num_nodes);
-	if (!amd_roots)
-		return -ENOMEM;
+	if (!amd_roots) {
+		err = -ENOMEM;
+		goto err_out;
+	}
 
 	roots_per_node = num_roots / num_nodes;
 
@@ -311,6 +324,10 @@ static int __init amd_smn_init(void)
 	smn_exclusive = true;
 
 	return 0;
+
+err_out:
+	amd_node_off = true;
+	return err;
 }
 
 fs_initcall(amd_smn_init);

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

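For readers following the patch, a minimal user-space sketch of its control flow may help. This is not the kernel code: model_init(), model_rw(), model_roots, model_num_nodes and model_node_off are hypothetical stand-ins, and the in_guest parameter is an assumed substitute for the X86_FEATURE_HYPERVISOR check. It only illustrates the idea that every failure path in the init function funnels through one label that sets an "off" flag, and that the accessor tests the flag before touching the table.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver state; not the kernel's definitions. */
static unsigned int *model_roots;    /* plays the role of amd_roots */
static unsigned int model_num_nodes; /* plays the role of amd_num_nodes() */
static bool model_node_off;          /* plays the role of amd_node_off */

/* Every failure path funnels through err_out, so the flag is always set. */
static int model_init(bool in_guest, unsigned int num_roots)
{
	int err = -EINVAL;

	if (in_guest)
		goto err_out;

	if (!num_roots) {
		err = -ENODEV;
		goto err_out;
	}

	model_roots = calloc(num_roots, sizeof(*model_roots));
	if (!model_roots) {
		err = -ENOMEM;
		goto err_out;
	}

	model_num_nodes = num_roots;
	return 0;

err_out:
	model_node_off = true;
	return err;
}

/* The flag is tested first, so a failed init never indexes a NULL table. */
static int model_rw(unsigned int node, unsigned int *value)
{
	if (model_node_off)
		return -EINVAL;

	if (node >= model_num_nodes)
		return -ENODEV;

	*value = model_roots[node];
	return 0;
}
```

In this model, model_init(true, 4) returns -EINVAL and sets the flag, so a later model_rw(0, &v) returns -EINVAL and leaves v untouched.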
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 23:09 ` Borislav Petkov
@ 2026-06-05 23:37 ` Marek Marczykowski-Górecki
2026-06-06 1:59 ` Borislav Petkov
0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-06-05 23:37 UTC (permalink / raw)
To: Borislav Petkov
Cc: Mario Limonciello, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Fri, Jun 05, 2026 at 04:09:49PM -0700, Borislav Petkov wrote:
> On Sat, Jun 06, 2026 at 12:40:20AM +0200, Marek Marczykowski-Górecki wrote:
> > There is nothing here that would prevent amd_smn_read() being called
> > inside a guest...
>
> Yah, there should've been...
>
> Anyway, something like the untested below, pls give it a run.
>
> Thx.
>
> ---
> diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
> index 0be01725a2a4..52eff7fac667 100644
> --- a/arch/x86/kernel/amd_node.c
> +++ b/arch/x86/kernel/amd_node.c

...

> @@ -311,6 +324,10 @@ static int __init amd_smn_init(void)
>  	smn_exclusive = true;
>
>  	return 0;
> +
> +err_out:
> +	amd_node_off = true;
> +	return err;
>  }
>
>  fs_initcall(amd_smn_init);

Is it actually guaranteed to run before PCI fixups? They are done via
fs_initcall_sync.

IMO it would be safer to guard __amd_smn_rw() with something that would also
detect calls before amd_smn_init() is called. Like using smn_exclusive in
the Penny's patch, or amd_roots in the Mario's patch.

That said, amd_smn_read() called before amd_smn_init() would (should?)
fail anyway, even in non-virtualized case. So, maybe this approach
(still crash on NULL ptr when called before amd_smn_init()) is
acceptable?

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-06-05 23:37 ` Marek Marczykowski-Górecki
@ 2026-06-06 1:59 ` Borislav Petkov
0 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2026-06-06 1:59 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Mario Limonciello, Yazen Ghannam,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Sat, Jun 06, 2026 at 01:37:03AM +0200, Marek Marczykowski-Górecki wrote:
> Is it actually guaranteed to run before PCI fixups? They are done via
> fs_initcall_sync.

Yap, the sync initcalls run after the respective level initcalls.

> IMO it would be safer to guard __amd_smn_rw() with something that would also
> detect calls before amd_smn_init() is called. Like using smn_exclusive in
> the Penny's patch, or amd_roots in the Mario's patch.

I can do this if absolutely necessary:

static bool amd_node_off = true;

and then set it accordingly in the init function but I don't think that's
needed.

> That said, amd_smn_read() called before amd_smn_init() would (should?)
> fail anyway, even in non-virtualized case. So, maybe this approach
> (still crash on NULL ptr when called before amd_smn_init()) is
> acceptable?

Right, then we can hear about it and see who's doing what shenanigans.

Thx.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

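The two points in this exchange (a flag that defaults to "off", and the fs level finishing before the fs_sync level starts) can be sketched in plain user-space C. Everything here is an assumption made for illustration: model_off, model_read(), model_smn_init(), model_quirk(), model_run_level() and the two level tables are hypothetical names, and the arrays only imitate the ordering described above, not the kernel's actual initcall machinery.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*model_initcall_t)(void);

/* Default-off variant: the accessor refuses until init has succeeded. */
static bool model_off = true;
static const unsigned int model_regs[2] = { 0x10, 0x20 };
static int model_quirk_result = 1; /* 1 means the quirk has not run yet */

static int model_read(unsigned int node, unsigned int *value)
{
	if (model_off)
		return -EINVAL;

	if (node >= 2)
		return -ENODEV;

	*value = model_regs[node];
	return 0;
}

/* Stands in for a successful init: clears the flag. */
static int model_smn_init(void)
{
	model_off = false;
	return 0;
}

/* Stands in for a PCI fixup that reads through the accessor. */
static int model_quirk(void)
{
	unsigned int data;

	model_quirk_result = model_read(0, &data);
	return 0;
}

/* Run one level to completion; the caller then moves to the next level. */
static void model_run_level(const model_initcall_t *calls, size_t n)
{
	for (size_t i = 0; i < n; i++)
		calls[i]();
}

static const model_initcall_t model_fs_level[] = { model_smn_init };
static const model_initcall_t model_fs_sync_level[] = { model_quirk };
```

With the flag defaulting to true, a model_read() issued before any level has run returns -EINVAL instead of reaching the table. Running model_fs_level and then model_fs_sync_level leaves model_quirk_result at 0, because the init has already cleared the flag by the time the quirk reads.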
* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-01-13 1:01 kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read Marek Marczykowski-Górecki
2026-01-13 2:47 ` Mario Limonciello
@ 2026-01-30 17:01 ` Yazen Ghannam
2026-02-07 1:57 ` Marek Marczykowski-Górecki
1 sibling, 1 reply; 17+ messages in thread
From: Yazen Ghannam @ 2026-01-30 17:01 UTC (permalink / raw)
To: Marek Marczykowski-Górecki
Cc: Mario Limonciello,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Tue, Jan 13, 2026 at 02:01:34AM +0100, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I've got a report that kernel 6.17.9 crashes when running a Xen HVM domU
> with AMD Raphael/Granite Ridge USB controller passed through.
> It worked correctly in 6.12.59. Between those versions, I don't see any
> relevant change to quirk_clear_strap_no_soft_reset_dev2_f0() function,
> but the AMD node driver did got some changes, so my guess is one of them
> is to blame. I know the good-bad range is huge, but there aren't that
> many changes to the AMD node driver in this range.
>
> It's running on Qubes OS 4.3, which uses Xen 4.19, and does PCI
> passthrough of USB controllers to a dedicated VM (HVM).
>

Hi Marek,

Can you please test with the following patch?
adbf61cc47cb ("x86/acpi/boot: Correct acpi_is_processor_usable() check again")

This is in the TIP maintainers repo and linux-next.

Thanks,
Yazen

* Re: kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read
2026-01-30 17:01 ` Yazen Ghannam
@ 2026-02-07 1:57 ` Marek Marczykowski-Górecki
0 siblings, 0 replies; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-02-07 1:57 UTC (permalink / raw)
To: Yazen Ghannam
Cc: Mario Limonciello,
maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
open list:AMD NODE DRIVER, regressions

On Fri, Jan 30, 2026 at 12:01:06PM -0500, Yazen Ghannam wrote:
> On Tue, Jan 13, 2026 at 02:01:34AM +0100, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > I've got a report that kernel 6.17.9 crashes when running a Xen HVM domU
> > with AMD Raphael/Granite Ridge USB controller passed through.
> > It worked correctly in 6.12.59. Between those versions, I don't see any
> > relevant change to quirk_clear_strap_no_soft_reset_dev2_f0() function,
> > but the AMD node driver did got some changes, so my guess is one of them
> > is to blame. I know the good-bad range is huge, but there aren't that
> > many changes to the AMD node driver in this range.
> >
> > It's running on Qubes OS 4.3, which uses Xen 4.19, and does PCI
> > passthrough of USB controllers to a dedicated VM (HVM).
> >
>
> Hi Marek,
>
> Can you please test with the following patch?
> adbf61cc47cb ("x86/acpi/boot: Correct acpi_is_processor_usable() check again")
>
> This is in the TIP maintainers repo and linux-next.

FWIW I asked[1] a week ago the original reporter to test this, but
unfortunately no feedback so far.

[1] https://forum.qubes-os.org/t/yet-another-usb-keyboard-thread/38355/20

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

end of thread, other threads:[~2026-06-06 1:59 UTC | newest]

Thread overview: 17+ messages
2026-01-13 1:01 kernel NULL pointer dereference in quirk_clear_strap_no_soft_reset_dev2_f0 -> amd_smn_read Marek Marczykowski-Górecki
2026-01-13 2:47 ` Mario Limonciello
2026-01-13 16:04 ` Borislav Petkov
2026-06-05 17:34 ` Marek Marczykowski-Górecki
2026-06-05 17:36 ` Mario Limonciello
2026-06-05 17:45 ` Marek Marczykowski-Górecki
2026-06-05 18:54 ` Mario Limonciello
2026-06-05 20:23 ` Marek Marczykowski-Górecki
2026-06-05 21:15 ` Borislav Petkov
2026-06-05 21:55 ` Marek Marczykowski-Górecki
2026-06-05 22:26 ` Borislav Petkov
2026-06-05 22:40 ` Marek Marczykowski-Górecki
2026-06-05 23:09 ` Borislav Petkov
2026-06-05 23:37 ` Marek Marczykowski-Górecki
2026-06-06 1:59 ` Borislav Petkov
2026-01-30 17:01 ` Yazen Ghannam
2026-02-07 1:57 ` Marek Marczykowski-Górecki