* [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable()
@ 2026-05-20 2:20 Jinjie Ruan
2026-06-02 11:09 ` Will Deacon
0 siblings, 1 reply; 7+ messages in thread
From: Jinjie Ruan @ 2026-05-20 2:20 UTC (permalink / raw)
To: catalin.marinas, will, corbet, skhan, punit.agrawal, jic23,
osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz,
lpieralisi, timothy.hayes, sascha.bischoff, arnd,
mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong,
sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel
Cc: ruanjinjie
On arm64, when booting with `maxcpus` greater than the number of present
CPUs (e.g., QEMU -smp cpus=4,maxcpus=8), some CPUs are marked as 'present'
but have not yet been registered via register_cpu(). Consequently,
the per-cpu device objects for these CPUs are not yet initialized.
In cpuhp_smt_enable(), the code iterates over all present CPUs. Calling
_cpu_up() for these unregistered CPUs eventually leads to
sysfs_create_group() being called with a NULL kobject (or a kobject
without a directory), triggering the following warning in
fs/sysfs/group.c:
if (WARN_ON(!kobj || (!update && !kobj->sd)))
return -EINVAL;
When booting with ACPI, arm64 smp_prepare_cpus() currently sets all
enumerated CPUs as "present" regardless of their status in the MADT. This
causes issues with SMT hotplug control. For instance, with QEMU's
"-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as
follows: the first four CPUs are marked Enabled while the remaining four
are marked Online Capable to support potential hot-plugging.
Fix this by:
1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC
entry before calling set_cpu_present() during SMP initialization.
2. Properly managing the present mask in acpi_map_cpu() and
acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with
other architectures like x86 and LoongArch.
3. Update the arm64 CPU hotplug documentation to no longer state that all
online-capable vCPUs are marked as present by the kernel at boot time.
This ensures that only physically available or explicitly enabled CPUs
are in the present mask, keeping the SMT control logic consistent with
the actual hardware state.
How to reproduce:
1. echo off > /sys/devices/system/cpu/smt/control
psci: CPU1 killed (polled 0 ms)
psci: CPU3 killed (polled 0 ms)
2. echo 2 > /sys/devices/system/cpu/smt/control
Detected PIPT I-cache on CPU1
GICv3: CPU1: found redistributor 1 region 0:0x00000000080c0000
CPU1: Booted secondary processor 0x0000000001 [0x410fd082]
Detected PIPT I-cache on CPU3
GICv3: CPU3: found redistributor 3 region 0:0x0000000008100000
CPU3: Booted secondary processor 0x0000000003 [0x410fd082]
------------[ cut here ]------------
WARNING: fs/sysfs/group.c:137 at internal_create_group+0x41c/0x4bc, CPU#2: sh/181
Modules linked in:
CPU: 2 UID: 0 PID: 181 Comm: sh Not tainted 7.0.0-rc1-00010-g8d13386c7624 #142 PREEMPT
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : internal_create_group+0x41c/0x4bc
lr : sysfs_create_group+0x18/0x24
sp : ffff80008078ba40
x29: ffff80008078ba40 x28: ffff296c980ad000 x27: ffff00007fb94128
x26: 0000000000000054 x25: ffffd693e845f3f0 x24: 0000000000000001
x23: 0000000000000001 x22: 0000000000000004 x21: 0000000000000000
x20: ffffd693e845fc10 x19: 0000000000000004 x18: 00000000ffffffff
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000358 x13: 0000000000000007 x12: 0000000000000350
x11: 0000000000000008 x10: 0000000000000407 x9 : 0000000000000400
x8 : ffff00007fbf3b60 x7 : 0000000000000000 x6 : ffffd693e845f3f0
x5 : ffff00007fb94128 x4 : 0000000000000000 x3 : ffff000000f4eac0
x2 : ffffd693e7095a08 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
internal_create_group+0x41c/0x4bc (P)
sysfs_create_group+0x18/0x24
topology_add_dev+0x1c/0x28
cpuhp_invoke_callback+0x104/0x20c
__cpuhp_invoke_callback_range+0x94/0x11c
_cpu_up+0x200/0x37c
cpuhp_smt_enable+0xbc/0x114
control_store+0xe8/0x1d4
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x7c/0x94
kernfs_fop_write_iter+0x128/0x1b8
vfs_write+0x2b0/0x354
ksys_write+0x68/0xfc
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x10c
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc+0x20/0x2c
el0_svc+0x34/0x124
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
---[ end trace 0000000000000000 ]---
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: stable@vger.kernel.org
Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#gic-cpu-interface-gicc-structure
Fixes: eed4583bcf9a6 ("arm64: Kconfig: Enable HOTPLUG_SMT")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
v3:
- Update the arm64 cpu-hotplug documentation as Catalin suggested.
- Update the commit message.
v2:
- Update the fix way.
---
Documentation/arch/arm64/cpu-hotplug.rst | 11 +++++++----
arch/arm64/kernel/acpi.c | 2 ++
arch/arm64/kernel/smp.c | 12 +++++++++++-
3 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
index 8fb438bf7781..60f7f51d7b96 100644
--- a/Documentation/arch/arm64/cpu-hotplug.rst
+++ b/Documentation/arch/arm64/cpu-hotplug.rst
@@ -47,8 +47,9 @@ ever have can be described at boot. There are no power-domain considerations
as such devices are emulated.
CPU Hotplug on virtual systems is supported. It is distinct from physical
-CPU Hotplug as all resources are described as ``present``, but CPUs may be
-marked as disabled by firmware. Only the CPU's online/offline behaviour is
+CPU Hotplug as all resources are described in the static configuration tables,
+but vCPUs that are not enabled at boot are not marked as ``present`` by the
+kernel until they are hotplugged. Only the CPU's online/offline behaviour is
influenced by firmware. An example is where a virtual machine boots with a
single CPU, and additional CPUs are added once a cloud orchestrator deploys
the workload.
@@ -68,8 +69,10 @@ redistributors.
CPUs described as ``online capable`` but not ``enabled`` can be set to enabled
by the DSDT's Processor object's _STA method. On virtual systems the _STA method
-must always report the CPU as ``present``. Changes to the firmware policy can
-be notified to the OS via device-check or eject-request.
+must report the CPU as ``present`` when it is activated by the firmware.
+The kernel will then set the vCPU as ``present`` dynamically during the hotplug
+configuration process. Changes can be notified to the OS via device-check or
+eject-request.
CPUs described as ``enabled`` in the static table, should not have their _STA
modified dynamically by firmware. Soft-restart features such as kexec will
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 5891f92c2035..681aa2bbc399 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -448,12 +448,14 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 apci_id,
return *pcpu;
}
+ set_cpu_present(*pcpu, true);
return 0;
}
EXPORT_SYMBOL(acpi_map_cpu);
int acpi_unmap_cpu(int cpu)
{
+ set_cpu_present(cpu, false);
return 0;
}
EXPORT_SYMBOL(acpi_unmap_cpu);
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 1aa324104afb..5932e5b30b71 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -566,6 +566,11 @@ struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu)
}
EXPORT_SYMBOL_GPL(acpi_cpu_get_madt_gicc);
+static bool acpi_cpu_is_present(int cpu)
+{
+ return acpi_cpu_get_madt_gicc(cpu)->flags & ACPI_MADT_ENABLED;
+}
+
/*
* acpi_map_gic_cpu_interface - parse processor MADT entry
*
@@ -670,6 +675,10 @@ static void __init acpi_parse_and_init_cpus(void)
early_map_cpu_to_node(i, acpi_numa_get_nid(i));
}
#else
+static bool acpi_cpu_is_present(int cpu)
+{
+ return false;
+}
#define acpi_parse_and_init_cpus(...) do { } while (0)
#endif
@@ -808,7 +817,8 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
if (err)
continue;
- set_cpu_present(cpu, true);
+ if (acpi_disabled || acpi_cpu_is_present(cpu))
+ set_cpu_present(cpu, true);
numa_store_cpu_info(cpu);
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() 2026-05-20 2:20 [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() Jinjie Ruan @ 2026-06-02 11:09 ` Will Deacon 2026-06-02 12:14 ` Jinjie Ruan 2026-06-03 6:38 ` Jinjie Ruan 0 siblings, 2 replies; 7+ messages in thread From: Will Deacon @ 2026-06-02 11:09 UTC (permalink / raw) To: Jinjie Ruan Cc: catalin.marinas, corbet, skhan, punit.agrawal, jic23, osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz, lpieralisi, timothy.hayes, sascha.bischoff, arnd, mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong, sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel On Wed, May 20, 2026 at 10:20:23AM +0800, Jinjie Ruan wrote: > On arm64, when booting with `maxcpus` greater than the number of present > CPUs (e.g., QEMU -smp cpus=4,maxcpus=8), some CPUs are marked as 'present' > but have not yet been registered via register_cpu(). Consequently, > the per-cpu device objects for these CPUs are not yet initialized. > > In cpuhp_smt_enable(), the code iterates over all present CPUs. Calling > _cpu_up() for these unregistered CPUs eventually leads to > sysfs_create_group() being called with a NULL kobject (or a kobject > without a directory), triggering the following warning in > fs/sysfs/group.c: > > if (WARN_ON(!kobj || (!update && !kobj->sd))) > return -EINVAL; > > When booting with ACPI, arm64 smp_prepare_cpus() currently sets all > enumerated CPUs as "present" regardless of their status in the MADT. This > causes issues with SMT hotplug control. For instance, with QEMU's > "-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as > follows: the first four CPUs are marked Enabled while the remaining four > are marked Online Capable to support potential hot-plugging. > > Fix this by: > > 1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC > entry before calling set_cpu_present() during SMP initialization. > > 2. Properly managing the present mask in acpi_map_cpu() and > acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with > other architectures like x86 and LoongArch. > > 3. Update the arm64 CPU hotplug documentation to no longer state that all > online-capable vCPUs are marked as present by the kernel at boot time. > > This ensures that only physically available or explicitly enabled CPUs > are in the present mask, keeping the SMT control logic consistent with > the actual hardware state. Please can you check the Sashiko review comment? https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com Cheers, Will ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() 2026-06-02 11:09 ` Will Deacon @ 2026-06-02 12:14 ` Jinjie Ruan 2026-06-02 12:47 ` Will Deacon 2026-06-03 6:38 ` Jinjie Ruan 1 sibling, 1 reply; 7+ messages in thread From: Jinjie Ruan @ 2026-06-02 12:14 UTC (permalink / raw) To: Will Deacon Cc: catalin.marinas, corbet, skhan, punit.agrawal, jic23, osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz, lpieralisi, timothy.hayes, sascha.bischoff, arnd, mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong, sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel On 6/2/2026 7:09 PM, Will Deacon wrote: > On Wed, May 20, 2026 at 10:20:23AM +0800, Jinjie Ruan wrote: >> On arm64, when booting with `maxcpus` greater than the number of present >> CPUs (e.g., QEMU -smp cpus=4,maxcpus=8), some CPUs are marked as 'present' >> but have not yet been registered via register_cpu(). Consequently, >> the per-cpu device objects for these CPUs are not yet initialized. >> >> In cpuhp_smt_enable(), the code iterates over all present CPUs. Calling >> _cpu_up() for these unregistered CPUs eventually leads to >> sysfs_create_group() being called with a NULL kobject (or a kobject >> without a directory), triggering the following warning in >> fs/sysfs/group.c: >> >> if (WARN_ON(!kobj || (!update && !kobj->sd))) >> return -EINVAL; >> >> When booting with ACPI, arm64 smp_prepare_cpus() currently sets all >> enumerated CPUs as "present" regardless of their status in the MADT. This >> causes issues with SMT hotplug control. For instance, with QEMU's >> "-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as >> follows: the first four CPUs are marked Enabled while the remaining four >> are marked Online Capable to support potential hot-plugging. >> >> Fix this by: >> >> 1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC >> entry before calling set_cpu_present() during SMP initialization. >> >> 2. Properly managing the present mask in acpi_map_cpu() and >> acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with >> other architectures like x86 and LoongArch. >> >> 3. Update the arm64 CPU hotplug documentation to no longer state that all >> online-capable vCPUs are marked as present by the kernel at boot time. >> >> This ensures that only physically available or explicitly enabled CPUs >> are in the present mask, keeping the SMT control logic consistent with >> the actual hardware state. > > Please can you check the Sashiko review comment? > > https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com On arm64, arch_unregister_cpu() enforces a safety block against physical CPU hot-removal by aborting early if cpu_present() is true but the device is no longer physically present, thereby skipping unregister_cpu() and leaking the sysfs device node. If acpi_unmap_cpu() blindly clears the present bit while the sysfs device remains registered, a subsequent hot-add attempt will see the valid leaked device pointer, skip acpi_processor_hotadd_init() (and thus skip acpi_map_cpu()), leaving the hot-added CPU permanently absent and deadlocked. Hi, Will, what do you think about fix it like this? --- a/arch/arm64/kernel/acpi.c +++ b/arch/arm64/kernel/acpi.c @@ -455,7 +455,6 @@ EXPORT_SYMBOL(acpi_map_cpu); int acpi_unmap_cpu(int cpu) { - set_cpu_present(cpu, false); return 0; } EXPORT_SYMBOL(acpi_unmap_cpu); diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 5932e5b30b71..507c6d761434 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -554,6 +554,7 @@ void arch_unregister_cpu(int cpu) } unregister_cpu(c); + set_cpu_present(cpu, false); } #endif /* CONFIG_ACPI_HOTPLUG_CPU */ > > Cheers, > > Will > ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() 2026-06-02 12:14 ` Jinjie Ruan @ 2026-06-02 12:47 ` Will Deacon 0 siblings, 0 replies; 7+ messages in thread From: Will Deacon @ 2026-06-02 12:47 UTC (permalink / raw) To: Jinjie Ruan Cc: catalin.marinas, corbet, skhan, punit.agrawal, jic23, osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz, lpieralisi, timothy.hayes, sascha.bischoff, arnd, mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong, sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel On Tue, Jun 02, 2026 at 08:14:26PM +0800, Jinjie Ruan wrote: > On 6/2/2026 7:09 PM, Will Deacon wrote: > > Please can you check the Sashiko review comment? > > > > https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com > > On arm64, arch_unregister_cpu() enforces a safety block against physical > CPU hot-removal by aborting early if cpu_present() is true but the > device is no longer physically present, thereby skipping > unregister_cpu() and leaking the sysfs device node. > If acpi_unmap_cpu() blindly clears the present bit while the sysfs > device remains registered, a subsequent hot-add attempt will see the > valid leaked device pointer, skip acpi_processor_hotadd_init() (and thus > skip acpi_map_cpu()), leaving the hot-added CPU permanently absent and > deadlocked. > > Hi, Will, > > what do you think about fix it like this? > > --- a/arch/arm64/kernel/acpi.c > +++ b/arch/arm64/kernel/acpi.c > @@ -455,7 +455,6 @@ EXPORT_SYMBOL(acpi_map_cpu); > > int acpi_unmap_cpu(int cpu) > { > - set_cpu_present(cpu, false); > return 0; > } > EXPORT_SYMBOL(acpi_unmap_cpu); > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 5932e5b30b71..507c6d761434 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -554,6 +554,7 @@ void arch_unregister_cpu(int cpu) > } > > unregister_cpu(c); > + set_cpu_present(cpu, false); > } > #endif /* CONFIG_ACPI_HOTPLUG_CPU */ Hmm, not sure. Doesn't that break error handling cleanup paths that expect acpi_unmap_cpu() to undo acpi_map_cpu()? See acpi_processor_hotadd_init(), for example. Will ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() 2026-06-02 11:09 ` Will Deacon 2026-06-02 12:14 ` Jinjie Ruan @ 2026-06-03 6:38 ` Jinjie Ruan 2026-06-09 17:58 ` Catalin Marinas 1 sibling, 1 reply; 7+ messages in thread From: Jinjie Ruan @ 2026-06-03 6:38 UTC (permalink / raw) To: Will Deacon Cc: catalin.marinas, corbet, skhan, punit.agrawal, jic23, osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz, lpieralisi, timothy.hayes, sascha.bischoff, arnd, mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong, sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel On 6/2/2026 7:09 PM, Will Deacon wrote: > On Wed, May 20, 2026 at 10:20:23AM +0800, Jinjie Ruan wrote: >> On arm64, when booting with `maxcpus` greater than the number of present >> CPUs (e.g., QEMU -smp cpus=4,maxcpus=8), some CPUs are marked as 'present' >> but have not yet been registered via register_cpu(). Consequently, >> the per-cpu device objects for these CPUs are not yet initialized. >> >> In cpuhp_smt_enable(), the code iterates over all present CPUs. Calling >> _cpu_up() for these unregistered CPUs eventually leads to >> sysfs_create_group() being called with a NULL kobject (or a kobject >> without a directory), triggering the following warning in >> fs/sysfs/group.c: >> >> if (WARN_ON(!kobj || (!update && !kobj->sd))) >> return -EINVAL; >> >> When booting with ACPI, arm64 smp_prepare_cpus() currently sets all >> enumerated CPUs as "present" regardless of their status in the MADT. This >> causes issues with SMT hotplug control. For instance, with QEMU's >> "-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as >> follows: the first four CPUs are marked Enabled while the remaining four >> are marked Online Capable to support potential hot-plugging. >> >> Fix this by: >> >> 1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC >> entry before calling set_cpu_present() during SMP initialization. >> >> 2. Properly managing the present mask in acpi_map_cpu() and >> acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with >> other architectures like x86 and LoongArch. >> >> 3. Update the arm64 CPU hotplug documentation to no longer state that all >> online-capable vCPUs are marked as present by the kernel at boot time. >> >> This ensures that only physically available or explicitly enabled CPUs >> are in the present mask, keeping the SMT control logic consistent with >> the actual hardware state. > > Please can you check the Sashiko review comment? > > https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com Hi, all, I think commit eba4675008a6 ("arm64: arch_register_cpu() variant to check if an ACPI handle is now available.") introduced this bug. It introduced an architectural safety block inside arch_unregister_cpu(). If a hot-unplug operation is determined to be a physical hardware removal (where _STA evaluates to !ACPI_STA_DEVICE_PRESENT), it aborts the unregistration transaction early to protect unreadied arm64 infrastructure, thereby skipping unregister_cpu(). However, the generic ACPI processor driver path in acpi_processor_post_eject() currently treats arch_unregister_cpu() as an unconditional void operation. When arch_unregister_cpu() bails out early, the subsequent cleanup flow blindly proceeds to call acpi_unmap_cpu(), clears global per-cpu processor arrays, and unconditionally free the 'struct acpi_processor' object. I think we can fix this by: 1. Refactoring arch_unregister_cpu() to return an integer transaction status. It returns -EOPNOTSUPP when aborting due to physical hot-remove blocking, -EINVAL/-EIO on firmware failures, and 0 only upon successful unregistration. 2. Guarding the downstream execution flow in acpi_processor_post_eject(). If arch_unregister_cpu() returns a error code, the hot-unplug transaction is considered aborted. What do you think about this fix? diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 1aa324104afb..f451c9c82212 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -531,29 +531,30 @@ int arch_register_cpu(int cpu) } #ifdef CONFIG_ACPI_HOTPLUG_CPU -void arch_unregister_cpu(int cpu) +int arch_unregister_cpu(int cpu) { acpi_handle acpi_handle = acpi_get_processor_handle(cpu); struct cpu *c = &per_cpu(cpu_devices, cpu); - acpi_status status; unsigned long long sta; + acpi_status status; if (!acpi_handle) { pr_err_once("Removing a CPU without associated ACPI handle\n"); - return; + return -EINVAL; } status = acpi_evaluate_integer(acpi_handle, "_STA", NULL, &sta); if (ACPI_FAILURE(status)) - return; + return -EIO; /* For now do not allow anything that looks like physical CPU HP */ if (cpu_present(cpu) && !(sta & ACPI_STA_DEVICE_PRESENT)) { pr_err_once("Changing CPU present bit is not supported\n"); - return; + return -EOPNOTSUPP; } unregister_cpu(c); + return 0; } #endif /* CONFIG_ACPI_HOTPLUG_CPU */ diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 00775b91bd41..4361eed26d83 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -499,7 +499,15 @@ static void acpi_processor_post_eject(struct acpi_device *device) cpus_write_lock(); /* Remove the CPU. */ - arch_unregister_cpu(pr->id); + if (arch_unregister_cpu(pr->id)) { + cpus_write_unlock(); + cpu_maps_update_done(); + acpi_bind_one(pr->dev, device); + if (device_attach(pr->dev) < 0) + dev_err(pr->dev, "Processor driver could not be attached\n"); + return; + } + acpi_unmap_cpu(pr->id); /* Clean up. */ diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 875abdc9942e..57980d1c2931 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -570,9 +570,10 @@ int __weak arch_register_cpu(int cpu) } #ifdef CONFIG_HOTPLUG_CPU -void __weak arch_unregister_cpu(int num) +int __weak arch_unregister_cpu(int num) { unregister_cpu(&per_cpu(cpu_devices, num)); + return 0; } #endif /* CONFIG_HOTPLUG_CPU */ #endif /* CONFIG_GENERIC_CPU_DEVICES */ diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 9b6b0d87fdb0..a7c191dea1fc 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -91,7 +91,7 @@ struct device *cpu_device_create(struct device *parent, void *drvdata, const char *fmt, ...); extern bool arch_cpu_is_hotpluggable(int cpu); extern int arch_register_cpu(int cpu); -extern void arch_unregister_cpu(int cpu); +extern int arch_unregister_cpu(int cpu); #ifdef CONFIG_HOTPLUG_CPU extern void unregister_cpu(struct cpu *cpu); extern ssize_t arch_cpu_probe(const char *, size_t); > > Cheers, > > Will > ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() 2026-06-03 6:38 ` Jinjie Ruan @ 2026-06-09 17:58 ` Catalin Marinas 2026-06-10 2:44 ` Jinjie Ruan 0 siblings, 1 reply; 7+ messages in thread From: Catalin Marinas @ 2026-06-09 17:58 UTC (permalink / raw) To: Jinjie Ruan Cc: Will Deacon, corbet, skhan, punit.agrawal, jic23, osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz, lpieralisi, timothy.hayes, sascha.bischoff, arnd, mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong, sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel Hi Jinjie, On Wed, Jun 03, 2026 at 02:38:11PM +0800, Jinjie Ruan wrote: > On 6/2/2026 7:09 PM, Will Deacon wrote: > > On Wed, May 20, 2026 at 10:20:23AM +0800, Jinjie Ruan wrote: > >> When booting with ACPI, arm64 smp_prepare_cpus() currently sets all > >> enumerated CPUs as "present" regardless of their status in the MADT. This > >> causes issues with SMT hotplug control. For instance, with QEMU's > >> "-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as > >> follows: the first four CPUs are marked Enabled while the remaining four > >> are marked Online Capable to support potential hot-plugging. > >> > >> Fix this by: > >> > >> 1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC > >> entry before calling set_cpu_present() during SMP initialization. > >> > >> 2. Properly managing the present mask in acpi_map_cpu() and > >> acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with > >> other architectures like x86 and LoongArch. > >> > >> 3. Update the arm64 CPU hotplug documentation to no longer state that all > >> online-capable vCPUs are marked as present by the kernel at boot time. > >> > >> This ensures that only physically available or explicitly enabled CPUs > >> are in the present mask, keeping the SMT control logic consistent with > >> the actual hardware state. > > > > Please can you check the Sashiko review comment? > > > > https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com > > I think commit eba4675008a6 ("arm64: arch_register_cpu() variant to > check if an ACPI handle is now available.") introduced this bug. > > It introduced an architectural safety block inside > arch_unregister_cpu(). If a hot-unplug operation is determined to be a > physical hardware removal (where _STA evaluates to > !ACPI_STA_DEVICE_PRESENT), it aborts the unregistration transaction > early to protect unreadied arm64 infrastructure, thereby skipping > unregister_cpu(). > > However, the generic ACPI processor driver path in > acpi_processor_post_eject() currently treats arch_unregister_cpu() as > an unconditional void operation. When arch_unregister_cpu() bails out > early, the subsequent cleanup flow blindly proceeds to call > acpi_unmap_cpu(), clears global per-cpu processor arrays, and > unconditionally free the 'struct acpi_processor' object. > > I think we can fix this by: > > 1. Refactoring arch_unregister_cpu() to return an integer > transaction status. It returns -EOPNOTSUPP when aborting due to physical > hot-remove blocking, -EINVAL/-EIO on firmware failures, and 0 only upon > successful unregistration. > > 2. Guarding the downstream execution flow in > acpi_processor_post_eject(). If arch_unregister_cpu() returns a error > code, the hot-unplug transaction is considered aborted. I wonder whether we need all this guarding. In the worst case, we could rewrite the function, something like below, to always unregister and only warn: void arch_unregister_cpu(int cpu) { acpi_handle acpi_handle = acpi_get_processor_handle(cpu); struct cpu *c = &per_cpu(cpu_devices, cpu); acpi_status status; unsigned long long sta; if (!acpi_handle) { pr_err_once("Removing a CPU without associated ACPI handle\n"); } else { status = acpi_evaluate_integer(acpi_handle, "_STA", NULL, &sta); if (!ACPI_FAILURE(status) && cpu_present(cpu) && !(sta & ACPI_STA_DEVICE_PRESENT)) pr_err_once("Changing CPU present bit is not supported\n"); } unregister_cpu(c); } However, on the first condition, can we actually trigger !acpi_handle? If not, we could just drop it. I tried to look up the paths and I don't think we'd ever end up in this function with !acpi_handle. So this leaves us with the next checks. On the second/third conditions, it's more about preventing physical CPU hotplug as we haven't properly defined it for arm yet but we could just add a WARN_ONCE() to make it more visible and still proceed with the unregistering. I think with your proposal, we don't fully unroll the state anyway just by returning an error in arch_unregister_cpu(), so I'd rather continue here. What does firmware do for virtual CPU hotplug w.r.t. _STA? I noticed a slight change in wording in the cpu-hotplug.rst doc with your patch from On virtual systems the _STA method must always report the CPU as ``present`` to On virtual systems the _STA method must report the CPU as ``present`` when it is activated by the firmware Was your intention that _STA.PRESENT can become 0 when hot-unplugging virtual CPUs? -- Catalin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() 2026-06-09 17:58 ` Catalin Marinas @ 2026-06-10 2:44 ` Jinjie Ruan 0 siblings, 0 replies; 7+ messages in thread From: Jinjie Ruan @ 2026-06-10 2:44 UTC (permalink / raw) To: Catalin Marinas Cc: Will Deacon, corbet, skhan, punit.agrawal, jic23, osama.abdelkader, chenl311, fengchengwen, suzuki.poulose, maz, lpieralisi, timothy.hayes, sascha.bischoff, arnd, mrigendra.chaubey, pierre.gondois, dietmar.eggemann, yangyicong, sudeep.holla, linux-arm-kernel, linux-doc, linux-kernel On 6/10/2026 1:58 AM, Catalin Marinas wrote: > Hi Jinjie, > > On Wed, Jun 03, 2026 at 02:38:11PM +0800, Jinjie Ruan wrote: >> On 6/2/2026 7:09 PM, Will Deacon wrote: >>> On Wed, May 20, 2026 at 10:20:23AM +0800, Jinjie Ruan wrote: >>>> When booting with ACPI, arm64 smp_prepare_cpus() currently sets all >>>> enumerated CPUs as "present" regardless of their status in the MADT. This >>>> causes issues with SMT hotplug control. For instance, with QEMU's >>>> "-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as >>>> follows: the first four CPUs are marked Enabled while the remaining four >>>> are marked Online Capable to support potential hot-plugging. >>>> >>>> Fix this by: >>>> >>>> 1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC >>>> entry before calling set_cpu_present() during SMP initialization. >>>> >>>> 2. Properly managing the present mask in acpi_map_cpu() and >>>> acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with >>>> other architectures like x86 and LoongArch. >>>> >>>> 3. Update the arm64 CPU hotplug documentation to no longer state that all >>>> online-capable vCPUs are marked as present by the kernel at boot time. >>>> >>>> This ensures that only physically available or explicitly enabled CPUs >>>> are in the present mask, keeping the SMT control logic consistent with >>>> the actual hardware state. >>> >>> Please can you check the Sashiko review comment? >>> >>> https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com >> >> I think commit eba4675008a6 ("arm64: arch_register_cpu() variant to >> check if an ACPI handle is now available.") introduced this bug. >> >> It introduced an architectural safety block inside >> arch_unregister_cpu(). If a hot-unplug operation is determined to be a >> physical hardware removal (where _STA evaluates to >> !ACPI_STA_DEVICE_PRESENT), it aborts the unregistration transaction >> early to protect unreadied arm64 infrastructure, thereby skipping >> unregister_cpu(). >> >> However, the generic ACPI processor driver path in >> acpi_processor_post_eject() currently treats arch_unregister_cpu() as >> an unconditional void operation. When arch_unregister_cpu() bails out >> early, the subsequent cleanup flow blindly proceeds to call >> acpi_unmap_cpu(), clears global per-cpu processor arrays, and >> unconditionally free the 'struct acpi_processor' object. >> >> I think we can fix this by: >> >> 1. Refactoring arch_unregister_cpu() to return an integer >> transaction status. It returns -EOPNOTSUPP when aborting due to physical >> hot-remove blocking, -EINVAL/-EIO on firmware failures, and 0 only upon >> successful unregistration. >> >> 2. Guarding the downstream execution flow in >> acpi_processor_post_eject(). If arch_unregister_cpu() returns a error >> code, the hot-unplug transaction is considered aborted. > > I wonder whether we need all this guarding. In the worst case, we could > rewrite the function, something like below, to always unregister and > only warn: > > void arch_unregister_cpu(int cpu) > { > acpi_handle acpi_handle = acpi_get_processor_handle(cpu); > struct cpu *c = &per_cpu(cpu_devices, cpu); > acpi_status status; > unsigned long long sta; > > if (!acpi_handle) { > pr_err_once("Removing a CPU without associated ACPI handle\n"); > } else { > status = acpi_evaluate_integer(acpi_handle, "_STA", NULL, &sta); > if (!ACPI_FAILURE(status) && > cpu_present(cpu) && !(sta & ACPI_STA_DEVICE_PRESENT)) > pr_err_once("Changing CPU present bit is not supported\n"); > } > > unregister_cpu(c); > } > > However, on the first condition, can we actually trigger !acpi_handle? > If not, we could just drop it. I tried to look up the paths and I don't > think we'd ever end up in this function with !acpi_handle. So this > leaves us with the next checks. You are absolutely right: Source Binding: During the CPU hot-add phase, acpi_add_single_object() directly binds a valid firmware handle to device->handle, which is then stored into per_cpu(processors, cpu) via acpi_processor_add(). Identical Lifecycle: When the hot-unplug path later invokes acpi_get_processor_handle(), it retrieves the exact same active pr->handle managed by the ACPI device framework, guaranteeing that the returned handle is never NULL as long as the device exists. 648 static struct acpi_scan_handler processor_handler = { 649 >-------.ids = processor_device_ids, 650 >-------.attach = acpi_processor_add, 651 #ifdef CONFIG_ACPI_HOTPLUG_CPU 652 >-------.post_eject = acpi_processor_post_eject, 653 #endif 654 >-------.hotplug = { 655 >------->-------.enabled = true, 656 >-------}, 657 }; acpi_bus_scan() -> acpi_bus_check_add() -> acpi_add_single_object(&device, handle, type, !first_pass) -> acpi_init_device_object() -> device->handle = handle acpi_processor_hotadd_init() -> acpi_processor_set_per_cpu(pr, device) -> per_cpu(processors, pr->id) = pr acpi_processor_add() -> pr->handle = device->handle acpi_get_processor_handle() -> pr = per_cpu(processors, cpu) -> return pr->handle > > On the second/third conditions, it's more about preventing physical CPU > hotplug as we haven't properly defined it for arm yet but we could just > add a WARN_ONCE() to make it more visible and still proceed with the > unregistering. I think with your proposal, we don't fully unroll the Agreed. Unregistering the CPU is absolutely necessary at this stage since we cannot fully roll back the state anyway, and adding a WARN_ONCE() is more than sufficient to flag unsupported physical CPU hot-unplug on ARM64 for now. > state anyway just by returning an error in arch_unregister_cpu(), so I'd > rather continue here. Exactly. Achieving a perfect rollback at this stage is extremely difficult and clean-up is rarely complete. It is much simpler and more robust to just force the unregistration and carry on, which is also consistent with how other architectures handle this by basically blindly unregistering the CPU anyway. > > What does firmware do for virtual CPU hotplug w.r.t. _STA? I noticed a > slight change in wording in the cpu-hotplug.rst doc with your patch from > > On virtual systems the _STA method must always report the CPU as > ``present`` > > to > > On virtual systems the _STA method must report the CPU as ``present`` > when it is activated by the firmware > > Was your intention that _STA.PRESENT can become 0 when hot-unplugging > virtual CPUs? Sorry, that was not the intention but a mistake. On ARM64 virtual systems, _STA.PRESENT will always remain 1 even when a vCPU is hot-unplugged.Due to ARM64 architectural constraints (such as the GICv3 Redistributor and KVM vGIC configuration which must be statically sized at boot), virtual CPU hotplug is emulated by keeping all possible vCPUs present in the system, while toggling their availability via the _STA.ENABLED bit. Expose below ACPI Status to Guest kernel: a. Always _STA.Present=1 (all possible vCPUs) b. _STA.Enabled=1 (plugged vCPUs) c. _STA.Enabled=0 (unplugged vCPUs) Link: https://lists.gnu.org/archive/html/qemu-devel/2025-05/msg05076.html > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-10 2:44 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-20 2:20 [PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable() Jinjie Ruan 2026-06-02 11:09 ` Will Deacon 2026-06-02 12:14 ` Jinjie Ruan 2026-06-02 12:47 ` Will Deacon 2026-06-03 6:38 ` Jinjie Ruan 2026-06-09 17:58 ` Catalin Marinas 2026-06-10 2:44 ` Jinjie Ruan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox