* [PATCH v1 1/2] x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
2026-05-06 5:55 [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology Penny Zheng
@ 2026-05-06 5:55 ` Penny Zheng
2026-05-06 5:55 ` [PATCH v1 2/2] x86/amd_node: reject SMN access when amd_smn_init() did not complete Penny Zheng
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Penny Zheng @ 2026-05-06 5:55 UTC (permalink / raw)
To: x86
Cc: ray.huang, Jason.Andryuk, stefano.stabellini, Penny Zheng,
Mario Limonciello, Yazen Ghannam, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
linux-kernel
To prevent each dom0 vCPU from looking like an SMT sibling of another
vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2.
Spacing every vCPU's APIC ID by 2 can, however, push the IDs past the
package-field boundary, making Linux see more packages than the
platform actually has. amd_num_nodes() inherits that inflated count,
so num_nodes can exceed num_roots (the number of AMD root complexes
discovered on the PCI bus). In that case the integer division in
	roots_per_node = num_roots / num_nodes;
	... count % roots_per_node ...
truncates roots_per_node to 0, and the subsequent modulo divides by
zero in amd_smn_init().
Reject num_roots < num_nodes explicitly and bail out with -ENODEV.
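The chain of reasoning above can be sketched in plain user-space C. This is an illustration only, not the kernel code: synth_apic_id(), count_packages(), roots_per_node_checked() and the pkg_shift parameter are hypothetical names, the package ID is assumed to be the APIC ID bits above pkg_shift, and one AMD node per package is assumed. The extra num_nodes == 0 check is an addition for the sketch and is not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: synthetic APIC ID spaced by 2, as described above. */
static uint32_t synth_apic_id(uint32_t vcpu_index)
{
	return vcpu_index * 2;
}

/*
 * Hypothetical topology decode: treat the bits above pkg_shift as the
 * package ID and return the highest package ID seen, plus one.
 */
static unsigned int count_packages(unsigned int nr_vcpus, unsigned int pkg_shift)
{
	unsigned int max_pkg = 0;

	assert(pkg_shift < 32);
	for (unsigned int i = 0; i < nr_vcpus; i++) {
		unsigned int pkg = synth_apic_id(i) >> pkg_shift;

		if (pkg > max_pkg)
			max_pkg = pkg;
	}
	return nr_vcpus ? max_pkg + 1 : 0;
}

/*
 * Guarded version of the arithmetic: return roots_per_node, or -ENODEV
 * when the integer division would truncate to 0 (num_roots < num_nodes)
 * and a later "count % roots_per_node" would divide by zero.
 */
static int roots_per_node_checked(unsigned int num_roots, unsigned int num_nodes)
{
	if (!num_nodes || num_roots < num_nodes)
		return -ENODEV;
	return (int)(num_roots / num_nodes);
}
```

Under these assumptions, 16 vCPUs with pkg_shift = 4 get IDs 0..30, so count_packages(16, 4) reports two packages where the unscaled IDs 0..15 would fit in one. With a single root complex, roots_per_node_checked(1, 2) then returns -ENODEV instead of yielding a zero divisor.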
Signed-off-by: Penny Zheng <penny.zheng@amd.com>
---
arch/x86/kernel/amd_node.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
index 0be01725a2a4..c896060fe0df 100644
--- a/arch/x86/kernel/amd_node.c
+++ b/arch/x86/kernel/amd_node.c
@@ -282,6 +282,18 @@ static int __init amd_smn_init(void)
 		return -ENODEV;
 	num_nodes = amd_num_nodes();
+
+	/*
+	 * Xen dom0's synthetic APIC IDs may imply more nodes than host
+	 * bridges visible in PCI config space. Bail out to avoid a
+	 * divide-by-zero when later computing roots_per_node.
+	 */
+	if (num_roots < num_nodes) {
+		pr_debug("AMD root count (%u) < node count (%u); skipping SMN init\n",
+			 num_roots, num_nodes);
+		return -ENODEV;
+	}
+
 	amd_roots = kzalloc_objs(*amd_roots, num_nodes);
 	if (!amd_roots)
 		return -ENOMEM;
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v1 2/2] x86/amd_node: reject SMN access when amd_smn_init() did not complete
2026-05-06 5:55 [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology Penny Zheng
2026-05-06 5:55 ` [PATCH v1 1/2] x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0 Penny Zheng
@ 2026-05-06 5:55 ` Penny Zheng
2026-05-06 17:17 ` [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology Mario Limonciello
2026-05-07 8:37 ` Jiaqing Zhao
3 siblings, 0 replies; 5+ messages in thread
From: Penny Zheng @ 2026-05-06 5:55 UTC (permalink / raw)
To: x86
Cc: ray.huang, Jason.Andryuk, stefano.stabellini, Penny Zheng,
Mario Limonciello, Yazen Ghannam, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
linux-kernel
amd_smn_init() can fail early (e.g. -ENODEV when num_roots < num_nodes,
-ENOMEM when allocating amd_roots) without setting smn_exclusive. In that
case amd_roots stays NULL, but __amd_smn_rw() dereferenced amd_roots[node]
before checking smn_exclusive. The first SMN consumer (e.g.
amd_pmc_probe -> amd_smn_read) then hit a NULL pointer dereference
instead of getting -ENODEV.
Move the smn_exclusive check to the very beginning of __amd_smn_rw()
so a failed init is rejected before any dereference. Also zero *value in
amd_smn_read() on the error path and return early, so the subsequent
PCI_POSSIBLE_ERROR() check never evaluates an uninitialized *value and
callers never see stale data.
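The ordering issue can be illustrated with a small user-space C sketch. It is not the kernel implementation: fake_root, roots, nr_nodes, ready, fake_rw() and fake_read() are hypothetical stand-ins for the PCI root devices, amd_roots, amd_num_nodes(), smn_exclusive, __amd_smn_rw() and amd_smn_read(). The node count is assumed to be known from topology even when init never ran, which is why the bounds check alone does not protect the table access.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fake_root {
	uint32_t reg;			/* stand-in for a config-space register */
};

static struct fake_root *roots;		/* stays NULL when init fails */
static unsigned int nr_nodes = 2;	/* assumed known from topology regardless */
static bool ready;			/* stand-in for smn_exclusive */

static int fake_rw(unsigned int node, uint32_t *value)
{
	/* Check the "init completed" flag before touching the table. */
	if (!ready)
		return -ENODEV;

	if (node >= nr_nodes)
		return -ENODEV;

	/* ready is only set after the table was allocated */
	assert(roots != NULL);
	*value = roots[node].reg;
	return 0;
}

static int fake_read(unsigned int node, uint32_t *value)
{
	int err = fake_rw(node, value);

	/* On failure, hand back a defined value and skip further checks. */
	if (err) {
		*value = 0;
		return err;
	}
	return 0;
}
```

With ready still false, fake_read(0, &v) returns -ENODEV and leaves v at 0 without ever indexing the NULL table. If the flag check came after the table access, the same call would dereference roots while it is still NULL.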
Signed-off-by: Penny Zheng <penny.zheng@amd.com>
---
arch/x86/kernel/amd_node.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/amd_node.c b/arch/x86/kernel/amd_node.c
index c896060fe0df..cb9ed022c53c 100644
--- a/arch/x86/kernel/amd_node.c
+++ b/arch/x86/kernel/amd_node.c
@@ -88,6 +88,9 @@ static int __amd_smn_rw(u8 i_off, u8 d_off, u16 node, u32 address, u32 *value, b
 	struct pci_dev *root;
 	int err = -ENODEV;
+	if (!smn_exclusive)
+		return err;
+
 	if (node >= amd_num_nodes())
 		return err;
@@ -95,9 +98,6 @@ static int __amd_smn_rw(u8 i_off, u8 d_off, u16 node, u32 address, u32 *value, b
 	if (!root)
 		return err;
-	if (!smn_exclusive)
-		return err;
-
 	guard(mutex)(&smn_mutex);
 	err = pci_write_config_dword(root, i_off, address);
@@ -116,6 +116,11 @@ int __must_check amd_smn_read(u16 node, u32 address, u32 *value)
 {
 	int err = __amd_smn_rw(SMN_INDEX_OFFSET, SMN_DATA_OFFSET, node, address, value, false);
+	if (err) {
+		*value = 0;
+		return err;
+	}
+
 	if (PCI_POSSIBLE_ERROR(*value)) {
 		err = -ENODEV;
 		*value = 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
2026-05-06 5:55 [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology Penny Zheng
2026-05-06 5:55 ` [PATCH v1 1/2] x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0 Penny Zheng
2026-05-06 5:55 ` [PATCH v1 2/2] x86/amd_node: reject SMN access when amd_smn_init() did not complete Penny Zheng
@ 2026-05-06 17:17 ` Mario Limonciello
2026-05-07 8:37 ` Jiaqing Zhao
3 siblings, 0 replies; 5+ messages in thread
From: Mario Limonciello @ 2026-05-06 17:17 UTC (permalink / raw)
To: Penny Zheng, x86
Cc: ray.huang, Jason.Andryuk, stefano.stabellini, Yazen Ghannam,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, xen-devel, linux-kernel
On 5/6/26 00:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
>
> Oops: divide error: 0000 [#1] SMP NOPTI
> RIP: 0010:amd_smn_init+0x188/0x2e0
>
> Then, on a kernel that survives the divide, the first SMN consumer
> (amd_pmc_probe -> amd_smn_read) hits a NULL pointer dereference.
So to confirm - does amd_pmc_probe work properly with this series now?
>
> Root cause
> ==========
>
> To prevent each dom0 vCPU from looking like an SMT sibling of another
> vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every
> vCPU's APIC ID by 2 can push the synthesized IDs past the package-field
> boundary. Linux then infers more "packages" and therefore more AMD
> nodes via amd_num_nodes() than the platform actually has, while the
> PCI-side host-bridge scan correctly reports the number of root complexes.
>
> The fixes are tested on Xen 4.20 PVH dom0 on AMD Zen (16 vCPUs) on top of
> linux-next/master (next-20260505).
>
> Penny Zheng (2):
> x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
> x86/amd_node: reject SMN access when amd_smn_init() did not complete
>
> arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology
2026-05-06 5:55 [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology Penny Zheng
` (2 preceding siblings ...)
2026-05-06 17:17 ` [PATCH v1 0/2] x86/amd_node: harden amd_smn_init() against Xen dom0 topology Mario Limonciello
@ 2026-05-07 8:37 ` Jiaqing Zhao
3 siblings, 0 replies; 5+ messages in thread
From: Jiaqing Zhao @ 2026-05-07 8:37 UTC (permalink / raw)
To: Penny Zheng, x86
Cc: ray.huang, Jason.Andryuk, stefano.stabellini, Mario Limonciello,
Yazen Ghannam, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, xen-devel, linux-kernel
The amd_smn_init() divide-by-zero oops is also observed on 6.19.14+deb14
(Debian testing) and 6.18.27. Given that 6.18 is an LTS release, I suggest
adding `Cc: stable@vger.kernel.org` so the fix gets backported.
Thanks,
Jiaqing
On 2026-05-06 13:55, Penny Zheng wrote:
> While booting a recent linux-next kernel as a Xen PVH dom0 on x86, the kernel
> oopses very early during fs_initcall:
>
> Oops: divide error: 0000 [#1] SMP NOPTI
> RIP: 0010:amd_smn_init+0x188/0x2e0
>
> Then, on a kernel that survives the divide, the first SMN consumer
> (amd_pmc_probe -> amd_smn_read) hits a NULL pointer dereference.
>
> Root cause
> ==========
>
> To prevent each dom0 vCPU from looking like an SMT sibling of another
> vCPU, Xen synthesizes guest x2APIC IDs as vcpu_index * 2. Spacing every
> vCPU's APIC ID by 2 can push the synthesized IDs past the package-field
> boundary. Linux then infers more "packages" and therefore more AMD
> nodes via amd_num_nodes() than the platform actually has, while the
> PCI-side host-bridge scan correctly reports the number of root complexes.
>
> The fixes are tested on Xen 4.20 PVH dom0 on AMD Zen (16 vCPUs) on top of
> linux-next/master (next-20260505).
>
> Penny Zheng (2):
> x86/amd_node: avoid divide-by-zero in amd_smn_init() under Xen dom0
> x86/amd_node: reject SMN access when amd_smn_init() did not complete
>
> arch/x86/kernel/amd_node.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
^ permalink raw reply [flat|nested] 5+ messages in thread