* [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
@ 2025-08-29 20:01 Rafael J. Wysocki
2025-09-02 15:05 ` Rafael J. Wysocki
2025-09-05 7:39 ` Thomas Gleixner
0 siblings, 2 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-08-29 20:01 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
"dead" SMT siblings during initialization") broke the use case in
which both nosmt and maxcpus were added to the kernel command line
because it caused CPUs that were not SMT siblings to be brought
online during the intel_idle driver initialization in violation of the
maxcpus limit.
The underlying reason for this is a missing topology_is_primary_thread()
check in cpuhp_smt_enable() which causes that function to put online
more CPUs than just the SMT siblings that it is supposed to handle.
Add the missing check to address the issue.
Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
Reported-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
kernel/cpu.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2710,6 +2710,12 @@
cpu_maps_update_begin();
cpu_smt_control = CPU_SMT_ENABLED;
for_each_present_cpu(cpu) {
+ /*
+ * Avoid accidentally onlining primary thread CPUs that have
+ * been taken offline.
+ */
+ if (topology_is_primary_thread(cpu))
+ continue;
/* Skip online CPUs and CPUs on offline nodes */
if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
continue;
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki @ 2025-09-02 15:05 ` Rafael J. Wysocki 2025-09-02 16:56 ` Dave Hansen 2025-09-05 7:39 ` Thomas Gleixner 1 sibling, 1 reply; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-02 15:05 UTC (permalink / raw) To: Linux PM Cc: Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle, Dave Hansen, the arch/x86 maintainers On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Christian has reported that commit a430c11f4015 ("intel_idle: Rescan > "dead" SMT siblings during initialization") broke the use case in > which both nosmt and maxcpus were added to the kernel command line > because it caused CPUs that were not SMT siblings to be brought > online during the intel_idle driver initialization in violation of the > maxcpus limit. > > The underlying reason for this is a missing topology_is_primary_thread() > check in cpuhp_smt_enable() which causes that function to put online > more CPUs than just the SMT siblings that it is supposed to handle. > > Add the missing check to address the issue. > > Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization") > Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization") > Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/ > Reported-by: Christian Loehle <christian.loehle@arm.com> > Tested-by: Christian Loehle <christian.loehle@arm.com> > Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> I'm wondering if there are any objections or concerns. Since this is a regression in 6.16, it would be good to get it fixed in 6.17-rc. > --- > kernel/cpu.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2710,6 +2710,12 @@ > cpu_maps_update_begin(); > cpu_smt_control = CPU_SMT_ENABLED; > for_each_present_cpu(cpu) { > + /* > + * Avoid accidentally onlining primary thread CPUs that have > + * been taken offline. > + */ > + if (topology_is_primary_thread(cpu)) > + continue; > /* Skip online CPUs and CPUs on offline nodes */ > if (cpu_online(cpu) || !node_online(cpu_to_node(cpu))) > continue; > > > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-02 15:05 ` Rafael J. Wysocki @ 2025-09-02 16:56 ` Dave Hansen 2025-09-02 17:18 ` Rafael J. Wysocki 0 siblings, 1 reply; 16+ messages in thread From: Dave Hansen @ 2025-09-02 16:56 UTC (permalink / raw) To: Rafael J. Wysocki, Linux PM Cc: Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle, Dave Hansen, the arch/x86 maintainers On 9/2/25 08:05, Rafael J. Wysocki wrote: > On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan >> "dead" SMT siblings during initialization") broke the use case in Does "dead" here mean present but offline? >> which both nosmt and maxcpus were added to the kernel command line >> because it caused CPUs that were not SMT siblings to be brought >> online during the intel_idle driver initialization in violation of the >> maxcpus limit. How does intel_idle fit in here? I don't immediately see it calling cpuhp_smt_enable(). >> The underlying reason for this is a missing topology_is_primary_thread() >> check in cpuhp_smt_enable() which causes that function to put online >> more CPUs than just the SMT siblings that it is supposed to handle. >> >> Add the missing check to address the issue. We should probably add a bit more checking in cpuhp_smt_enable() to make sure that it's being called under expected conditions like a: WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED); and probably also some comments about how it is expected to work. cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it from DISABLED=>ENABLED. Thinking about it in that context, enabling _just_ the secondary (disabled) threads makes a lot of sense. But that can come later, after the bug fix. >> --- a/kernel/cpu.c >> +++ b/kernel/cpu.c >> @@ -2710,6 +2710,12 @@ No 'diff -p', eh? >> cpu_maps_update_begin(); >> cpu_smt_control = CPU_SMT_ENABLED; >> for_each_present_cpu(cpu) { >> + /* >> + * Avoid accidentally onlining primary thread CPUs that have >> + * been taken offline. >> + */ >> + if (topology_is_primary_thread(cpu)) >> + continue; >> /* Skip online CPUs and CPUs on offline nodes */ >> if (cpu_online(cpu) || !node_online(cpu_to_node(cpu))) >> continue; Is there a more generic problem with this not respecting 'maxcpus'? If maxcpus had forced a primary thread offline, this would still online the secondary thread, even with the fix. Taking _that_ online might still bring you over the maxcpus limit. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-02 16:56 ` Dave Hansen @ 2025-09-02 17:18 ` Rafael J. Wysocki 2025-09-03 18:00 ` Dave Hansen 0 siblings, 1 reply; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-02 17:18 UTC (permalink / raw) To: Dave Hansen Cc: Rafael J. Wysocki, Linux PM, Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle, Dave Hansen, the arch/x86 maintainers On Tue, Sep 2, 2025 at 6:56 PM Dave Hansen <dave.hansen@intel.com> wrote: > > On 9/2/25 08:05, Rafael J. Wysocki wrote: > > On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan > >> "dead" SMT siblings during initialization") broke the use case in > > Does "dead" here mean present but offline? Yes. > >> which both nosmt and maxcpus were added to the kernel command line > >> because it caused CPUs that were not SMT siblings to be brought > >> online during the intel_idle driver initialization in violation of the > >> maxcpus limit. > > How does intel_idle fit in here? I don't immediately see it calling > cpuhp_smt_enable(). It calls arch_cpu_rescan_dead_smt_siblings() which calls cpuhp_smt_enable(). > >> The underlying reason for this is a missing topology_is_primary_thread() > >> check in cpuhp_smt_enable() which causes that function to put online > >> more CPUs than just the SMT siblings that it is supposed to handle. > >> > >> Add the missing check to address the issue. > > We should probably add a bit more checking in cpuhp_smt_enable() to make > sure that it's being called under expected conditions like a: > > WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED); > > and probably also some comments about how it is expected to work. Well, see the callers. Two out of three take care of this already and the third one doesn't care. > cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it > from DISABLED=>ENABLED. Thinking about it in that context, enabling > _just_ the secondary (disabled) threads makes a lot of sense. > > But that can come later, after the bug fix. > > >> --- a/kernel/cpu.c > >> +++ b/kernel/cpu.c > >> @@ -2710,6 +2710,12 @@ > > No 'diff -p', eh? Ah, sorry about this. > >> cpu_maps_update_begin(); > >> cpu_smt_control = CPU_SMT_ENABLED; > >> for_each_present_cpu(cpu) { > >> + /* > >> + * Avoid accidentally onlining primary thread CPUs that have > >> + * been taken offline. > >> + */ > >> + if (topology_is_primary_thread(cpu)) > >> + continue; > >> /* Skip online CPUs and CPUs on offline nodes */ > >> if (cpu_online(cpu) || !node_online(cpu_to_node(cpu))) > >> continue; > > Is there a more generic problem with this not respecting 'maxcpus'? If > maxcpus had forced a primary thread offline, this would still online the > secondary thread, even with the fix. Taking _that_ online might still > bring you over the maxcpus limit. Yes, there is AFAICS, but it has been there for some time and it doesn't affect the rescan during boot. For the rescan, it is actually useful to not respect maxcpus because it allows all of the SMT siblings to be kicked, but for run-time enabling of SMT it may be sort of a problem. This change simply addresses the regression, which is that cores without SMT become online quite surprisingly after intel_idle (and ACPI idle too for that matter) initialization, due to arch_cpu_rescan_dead_smt_siblings(). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-02 17:18 ` Rafael J. Wysocki @ 2025-09-03 18:00 ` Dave Hansen 2025-09-03 18:35 ` Rafael J. Wysocki 0 siblings, 1 reply; 16+ messages in thread From: Dave Hansen @ 2025-09-03 18:00 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux PM, Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle, Dave Hansen, the arch/x86 maintainers I munged the changelog up a bit: > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5 I hope that looks better to everyone. I also went looking at this assuming that it was x86-specific. I'd appreciate a quick head nod from Thomas or Peter on this before I push it anywhere, though. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-03 18:00 ` Dave Hansen @ 2025-09-03 18:35 ` Rafael J. Wysocki 0 siblings, 0 replies; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-03 18:35 UTC (permalink / raw) To: Dave Hansen Cc: Rafael J. Wysocki, Linux PM, Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle, Dave Hansen, the arch/x86 maintainers On Wed, Sep 3, 2025 at 8:00 PM Dave Hansen <dave.hansen@intel.com> wrote: > > I munged the changelog up a bit: > > > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5 > > I hope that looks better to everyone. It does to me, thanks! > I also went looking at this assuming that it was x86-specific. I'd > appreciate a quick head nod from Thomas or Peter on this before I push > it anywhere, though. Sure. Thank you! ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki 2025-09-02 15:05 ` Rafael J. Wysocki @ 2025-09-05 7:39 ` Thomas Gleixner 2025-09-05 13:13 ` Rafael J. Wysocki 1 sibling, 1 reply; 16+ messages in thread From: Thomas Gleixner @ 2025-09-05 7:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote: > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2710,6 +2710,12 @@ > cpu_maps_update_begin(); > cpu_smt_control = CPU_SMT_ENABLED; > for_each_present_cpu(cpu) { > + /* > + * Avoid accidentally onlining primary thread CPUs that have > + * been taken offline. > + */ > + if (topology_is_primary_thread(cpu)) > + continue; Hmm. That's kinda solving the problem, but I think the proper solution would be to implement topology_is_core_online() for x86. The above is preventing the primary thread to be onlined, but then onlines the corresponding hyperthread, which does not really make sense. Something like the completely untested below. Thanks, tglx --- diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index 6c79ee7c0957..21041898157a 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -231,6 +231,16 @@ static inline bool topology_is_primary_thread(unsigned int cpu) } #define topology_is_primary_thread topology_is_primary_thread +int topology_get_primary_thread(unsigned int cpu); + +static inline bool topology_is_core_online(unsigned int cpu) +{ + int pcpu = topology_get_primary_thread(cpu); + + return pcpu >= 0 ? cpu_online(pcpu) : false; +} +#define topology_is_core_online topology_is_core_online + #else /* CONFIG_SMP */ static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; } static inline int topology_max_smt_threads(void) { return 1; } diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c index e35ccdc84910..6073a16628f9 100644 --- a/arch/x86/kernel/cpu/topology.c +++ b/arch/x86/kernel/cpu/topology.c @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_uni return topo_unit_count(lvlid, at_level, apic_maps[which_units].map); } +#ifdef CONFIG_SMP +int topology_get_primary_thread(unsigned int cpu) +{ + u32 apic_id = cpuid_to_apicid[cpu]; + + /* + * Get the core domain level APIC id, which is the primary thread + * and return the CPU number assigned to it. + */ + return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN)); +} +#endif + #ifdef CONFIG_ACPI_HOTPLUG_CPU /** * topology_hotplug_apic - Handle a physical hotplugged APIC after boot ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-05 7:39 ` Thomas Gleixner @ 2025-09-05 13:13 ` Rafael J. Wysocki 2025-09-05 13:27 ` Rafael J. Wysocki 0 siblings, 1 reply; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-05 13:13 UTC (permalink / raw) To: Thomas Gleixner Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Fri, Sep 5, 2025 at 9:39 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote: > > --- a/kernel/cpu.c > > +++ b/kernel/cpu.c > > @@ -2710,6 +2710,12 @@ > > cpu_maps_update_begin(); > > cpu_smt_control = CPU_SMT_ENABLED; > > for_each_present_cpu(cpu) { > > + /* > > + * Avoid accidentally onlining primary thread CPUs that have > > + * been taken offline. > > + */ > > + if (topology_is_primary_thread(cpu)) > > + continue; > > Hmm. That's kinda solving the problem, but I think the proper solution > would be to implement topology_is_core_online() for x86. > > The above is preventing the primary thread to be onlined, but then > onlines the corresponding hyperthread, which does not really make sense. Well, manual online can be used for onlining the secondary thread of a core where the primary thread is offline, so this is technically possible already. > Something like the completely untested below. So given the above, shouldn't topology_is_core_online() check if any thread in the given core is online? > --- > diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h > index 6c79ee7c0957..21041898157a 100644 > --- a/arch/x86/include/asm/topology.h > +++ b/arch/x86/include/asm/topology.h > @@ -231,6 +231,16 @@ static inline bool topology_is_primary_thread(unsigned int cpu) > } > #define topology_is_primary_thread topology_is_primary_thread > > +int topology_get_primary_thread(unsigned int cpu); > + > +static inline bool topology_is_core_online(unsigned int cpu) > +{ > + int pcpu = topology_get_primary_thread(cpu); > + > + return pcpu >= 0 ? cpu_online(pcpu) : false; > +} > +#define topology_is_core_online topology_is_core_online > + > #else /* CONFIG_SMP */ > static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; } > static inline int topology_max_smt_threads(void) { return 1; } > diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c > index e35ccdc84910..6073a16628f9 100644 > --- a/arch/x86/kernel/cpu/topology.c > +++ b/arch/x86/kernel/cpu/topology.c > @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_uni > return topo_unit_count(lvlid, at_level, apic_maps[which_units].map); > } > > +#ifdef CONFIG_SMP > +int topology_get_primary_thread(unsigned int cpu) > +{ > + u32 apic_id = cpuid_to_apicid[cpu]; > + > + /* > + * Get the core domain level APIC id, which is the primary thread > + * and return the CPU number assigned to it. > + */ > + return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN)); > +} > +#endif > + > #ifdef CONFIG_ACPI_HOTPLUG_CPU > /** > * topology_hotplug_apic - Handle a physical hotplugged APIC after boot > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-05 13:13 ` Rafael J. Wysocki @ 2025-09-05 13:27 ` Rafael J. Wysocki 2025-09-05 20:47 ` Thomas Gleixner 0 siblings, 1 reply; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-05 13:27 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Fri, Sep 5, 2025 at 9:39 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote: > > > --- a/kernel/cpu.c > > > +++ b/kernel/cpu.c > > > @@ -2710,6 +2710,12 @@ > > > cpu_maps_update_begin(); > > > cpu_smt_control = CPU_SMT_ENABLED; > > > for_each_present_cpu(cpu) { > > > + /* > > > + * Avoid accidentally onlining primary thread CPUs that have > > > + * been taken offline. > > > + */ > > > + if (topology_is_primary_thread(cpu)) > > > + continue; > > > > Hmm. That's kinda solving the problem, but I think the proper solution > > would be to implement topology_is_core_online() for x86. > > > > The above is preventing the primary thread to be onlined, but then > > onlines the corresponding hyperthread, which does not really make sense. > > Well, manual online can be used for onlining the secondary thread of a > core where the primary thread is offline, so this is technically > possible already. > > > Something like the completely untested below. > > So given the above, shouldn't topology_is_core_online() check if any > thread in the given core is online? Besides, this would cause the siblings of offline SMT threads to be skipped while enabling SMT via sysfs (using /sys/devices/system/cpu/smt/control), but I'm not sure if this is the expectation in the field today. The current behavior is to online all secondary SMT threads (and more, but that part is quite arguably broken). That's why I decided to use a simpler patch which doesn't go that far. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-05 13:27 ` Rafael J. Wysocki @ 2025-09-05 20:47 ` Thomas Gleixner 2025-09-05 20:56 ` Rafael J. Wysocki 0 siblings, 1 reply; 16+ messages in thread From: Thomas Gleixner @ 2025-09-05 20:47 UTC (permalink / raw) To: Rafael J. Wysocki Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote: > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote: >> Well, manual online can be used for onlining the secondary thread of a >> core where the primary thread is offline, so this is technically >> possible already. >> >> > Something like the completely untested below. >> >> So given the above, shouldn't topology_is_core_online() check if any >> thread in the given core is online? > > Besides, this would cause the siblings of offline SMT threads to be > skipped while enabling SMT via sysfs (using > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the > expectation in the field today. The current behavior is to online all > secondary SMT threads (and more, but that part is quite arguably > broken). It is broken, because the initial logic is to bring up primary threads unconditionally and then refuse to bring up sibling threads. With "maxcpus=xxx" this obviously limits the amount of primary threads, so there is arguably no point to online any of the related secondary threads of them. The initial implementation was naively making that assumption, but the core check which was added due to PPC made this actually correct. It just did not snap with me back then, but it's actually the correct thing to do, no? Thanks, tglx ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-05 20:47 ` Thomas Gleixner @ 2025-09-05 20:56 ` Rafael J. Wysocki 2025-09-07 13:14 ` Thomas Gleixner 0 siblings, 1 reply; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-05 20:56 UTC (permalink / raw) To: Thomas Gleixner Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote: > > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > >> Well, manual online can be used for onlining the secondary thread of a > >> core where the primary thread is offline, so this is technically > >> possible already. > >> > >> > Something like the completely untested below. > >> > >> So given the above, shouldn't topology_is_core_online() check if any > >> thread in the given core is online? > > > > Besides, this would cause the siblings of offline SMT threads to be > > skipped while enabling SMT via sysfs (using > > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the > > expectation in the field today. The current behavior is to online all > > secondary SMT threads (and more, but that part is quite arguably > > broken). > > It is broken, because the initial logic is to bring up primary threads > unconditionally and then refuse to bring up sibling threads. > > With "maxcpus=xxx" this obviously limits the amount of primary threads, > so there is arguably no point to online any of the related secondary > threads of them. > > The initial implementation was naively making that assumption, but the > core check which was added due to PPC made this actually correct. > > It just did not snap with me back then, but it's actually the correct > thing to do, no? It would at least be consistent with the existing PPC behavior. :-) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-05 20:56 ` Rafael J. Wysocki @ 2025-09-07 13:14 ` Thomas Gleixner 2025-09-11 10:31 ` Rafael J. Wysocki 0 siblings, 1 reply; 16+ messages in thread From: Thomas Gleixner @ 2025-09-07 13:14 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Fri, Sep 05 2025 at 22:56, Rafael J. Wysocki wrote: > On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote: >> >> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote: >> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote: >> >> Well, manual online can be used for onlining the secondary thread of a >> >> core where the primary thread is offline, so this is technically >> >> possible already. >> >> >> >> > Something like the completely untested below. >> >> >> >> So given the above, shouldn't topology_is_core_online() check if any >> >> thread in the given core is online? >> > >> > Besides, this would cause the siblings of offline SMT threads to be >> > skipped while enabling SMT via sysfs (using >> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the >> > expectation in the field today. The current behavior is to online all >> > secondary SMT threads (and more, but that part is quite arguably >> > broken). >> >> It is broken, because the initial logic is to bring up primary threads >> unconditionally and then refuse to bring up sibling threads. >> >> With "maxcpus=xxx" this obviously limits the amount of primary threads, >> so there is arguably no point to online any of the related secondary >> threads of them. >> >> The initial implementation was naively making that assumption, but the >> core check which was added due to PPC made this actually correct. >> >> It just did not snap with me back then, but it's actually the correct >> thing to do, no? > > It would at least be consistent with the existing PPC behavior. :-) Correct. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() 2025-09-07 13:14 ` Thomas Gleixner @ 2025-09-11 10:31 ` Rafael J. Wysocki 2025-09-21 8:56 ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner 0 siblings, 1 reply; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-11 10:31 UTC (permalink / raw) To: Thomas Gleixner Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Sun, Sep 7, 2025 at 3:14 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Fri, Sep 05 2025 at 22:56, Rafael J. Wysocki wrote: > > On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote: > >> > >> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote: > >> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > >> >> Well, manual online can be used for onlining the secondary thread of a > >> >> core where the primary thread is offline, so this is technically > >> >> possible already. > >> >> > >> >> > Something like the completely untested below. > >> >> > >> >> So given the above, shouldn't topology_is_core_online() check if any > >> >> thread in the given core is online? > >> > > >> > Besides, this would cause the siblings of offline SMT threads to be > >> > skipped while enabling SMT via sysfs (using > >> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the > >> > expectation in the field today. The current behavior is to online all > >> > secondary SMT threads (and more, but that part is quite arguably > >> > broken). > >> > >> It is broken, because the initial logic is to bring up primary threads > >> unconditionally and then refuse to bring up sibling threads. > >> > >> With "maxcpus=xxx" this obviously limits the amount of primary threads, > >> so there is arguably no point to online any of the related secondary > >> threads of them. > >> > >> The initial implementation was naively making that assumption, but the > >> core check which was added due to PPC made this actually correct. > >> > >> It just did not snap with me back then, but it's actually the correct > >> thing to do, no? > > > > It would at least be consistent with the existing PPC behavior. :-) > > Correct. So are you going to send a patch or do you want me to do something? From a user standpoint, this issue is a regression in 6.16, so it would be good to address it before final 6.17. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression 2025-09-11 10:31 ` Rafael J. Wysocki @ 2025-09-21 8:56 ` Thomas Gleixner 2025-09-21 11:00 ` Christian Loehle 2025-09-22 15:42 ` Rafael J. Wysocki 0 siblings, 2 replies; 16+ messages in thread From: Thomas Gleixner @ 2025-09-21 8:56 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization") broke the use case in which both 'nosmt' and 'maxcpus' are on the kernel command line because it onlines primary threads, which were offline due to the maxcpus limit. The initially proposed fix to skip primary threads in the loop is inconsistent. While it prevents the primary thread to be onlined, it then onlines the corresponding hyperthread(s), which does not really make sense. The CPU iterator in cpuhp_smt_enable() contains a check which excludes all threads of a core, when the primary thread is offline. The default implementation is a NOOP and therefore not effective on x86. Implement topology_is_core_online() on x86 to address this issue. This makes the behaviour consistent between x86 and PowerPC. Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization") Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization") Reported-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/ --- arch/x86/include/asm/topology.h | 10 ++++++++++ arch/x86/kernel/cpu/topology.c | 13 +++++++++++++ 2 files changed, 23 insertions(+) --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -231,6 +231,16 @@ static inline bool topology_is_primary_t } #define topology_is_primary_thread topology_is_primary_thread +int topology_get_primary_thread(unsigned int cpu); + +static inline bool topology_is_core_online(unsigned int cpu) +{ + int pcpu = topology_get_primary_thread(cpu); + + return pcpu >= 0 ? cpu_online(pcpu) : false; +} +#define topology_is_core_online topology_is_core_online + #else /* CONFIG_SMP */ static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; } static inline int topology_max_smt_threads(void) { return 1; } --- a/arch/x86/kernel/cpu/topology.c +++ b/arch/x86/kernel/cpu/topology.c @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 api return topo_unit_count(lvlid, at_level, apic_maps[which_units].map); } +#ifdef CONFIG_SMP +int topology_get_primary_thread(unsigned int cpu) +{ + u32 apic_id = cpuid_to_apicid[cpu]; + + /* + * Get the core domain level APIC id, which is the primary thread + * and return the CPU number assigned to it. + */ + return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN)); +} +#endif + #ifdef CONFIG_ACPI_HOTPLUG_CPU /** * topology_hotplug_apic - Handle a physical hotplugged APIC after boot ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression 2025-09-21 8:56 ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner @ 2025-09-21 11:00 ` Christian Loehle 2025-09-22 15:42 ` Rafael J. Wysocki 1 sibling, 0 replies; 16+ messages in thread From: Christian Loehle @ 2025-09-21 11:00 UTC (permalink / raw) To: Thomas Gleixner, Rafael J. Wysocki Cc: LKML, Linux PM, Peter Zijlstra, Dave Hansen On 9/21/25 09:56, Thomas Gleixner wrote: > Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT > siblings during initialization") broke the use case in which both 'nosmt' > and 'maxcpus' are on the kernel command line because it onlines primary > threads, which were offline due to the maxcpus limit. > > The initially proposed fix to skip primary threads in the loop is > inconsistent. While it prevents the primary thread to be onlined, it then > onlines the corresponding hyperthread(s), which does not really make sense. > > The CPU iterator in cpuhp_smt_enable() contains a check which excludes all > threads of a core, when the primary thread is offline. The default > implementation is a NOOP and therefore not effective on x86. > > Implement topology_is_core_online() on x86 to address this issue. This > makes the behaviour consistent between x86 and PowerPC. > > Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization") > Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization") > Reported-by: Christian Loehle <christian.loehle@arm.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Christian Loehle <christian.loehle@arm.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression 2025-09-21 8:56 ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner 2025-09-21 11:00 ` Christian Loehle @ 2025-09-22 15:42 ` Rafael J. Wysocki 1 sibling, 0 replies; 16+ messages in thread From: Rafael J. Wysocki @ 2025-09-22 15:42 UTC (permalink / raw) To: Thomas Gleixner Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen On Sun, Sep 21, 2025 at 10:56 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT > siblings during initialization") broke the use case in which both 'nosmt' > and 'maxcpus' are on the kernel command line because it onlines primary > threads, which were offline due to the maxcpus limit. > > The initially proposed fix to skip primary threads in the loop is > inconsistent. While it prevents the primary thread to be onlined, it then > onlines the corresponding hyperthread(s), which does not really make sense. > > The CPU iterator in cpuhp_smt_enable() contains a check which excludes all > threads of a core, when the primary thread is offline. The default > implementation is a NOOP and therefore not effective on x86. > > Implement topology_is_core_online() on x86 to address this issue. This > makes the behaviour consistent between x86 and PowerPC. > > Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization") > Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization") > Reported-by: Christian Loehle <christian.loehle@arm.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Cc: stable@vger.kernel.org > Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki > Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/ Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> > --- > arch/x86/include/asm/topology.h | 10 ++++++++++ > arch/x86/kernel/cpu/topology.c | 13 +++++++++++++ > 2 files changed, 23 insertions(+) > > --- a/arch/x86/include/asm/topology.h > +++ b/arch/x86/include/asm/topology.h > @@ -231,6 +231,16 @@ static inline bool topology_is_primary_t > } > #define topology_is_primary_thread topology_is_primary_thread > > +int topology_get_primary_thread(unsigned int cpu); > + > +static inline bool topology_is_core_online(unsigned int cpu) > +{ > + int pcpu = topology_get_primary_thread(cpu); > + > + return pcpu >= 0 ? cpu_online(pcpu) : false; > +} > +#define topology_is_core_online topology_is_core_online > + > #else /* CONFIG_SMP */ > static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; } > static inline int topology_max_smt_threads(void) { return 1; } > --- a/arch/x86/kernel/cpu/topology.c > +++ b/arch/x86/kernel/cpu/topology.c > @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 api > return topo_unit_count(lvlid, at_level, apic_maps[which_units].map); > } > > +#ifdef CONFIG_SMP > +int topology_get_primary_thread(unsigned int cpu) > +{ > + u32 apic_id = cpuid_to_apicid[cpu]; > + > + /* > + * Get the core domain level APIC id, which is the primary thread > + * and return the CPU number assigned to it. > + */ > + return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN)); > +} > +#endif > + > #ifdef CONFIG_ACPI_HOTPLUG_CPU > /** > * topology_hotplug_apic - Handle a physical hotplugged APIC after boot ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-09-22 15:42 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki 2025-09-02 15:05 ` Rafael J. Wysocki 2025-09-02 16:56 ` Dave Hansen 2025-09-02 17:18 ` Rafael J. Wysocki 2025-09-03 18:00 ` Dave Hansen 2025-09-03 18:35 ` Rafael J. Wysocki 2025-09-05 7:39 ` Thomas Gleixner 2025-09-05 13:13 ` Rafael J. Wysocki 2025-09-05 13:27 ` Rafael J. Wysocki 2025-09-05 20:47 ` Thomas Gleixner 2025-09-05 20:56 ` Rafael J. Wysocki 2025-09-07 13:14 ` Thomas Gleixner 2025-09-11 10:31 ` Rafael J. Wysocki 2025-09-21 8:56 ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner 2025-09-21 11:00 ` Christian Loehle 2025-09-22 15:42 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox