public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
@ 2025-08-29 20:01 Rafael J. Wysocki
  2025-09-02 15:05 ` Rafael J. Wysocki
  2025-09-05  7:39 ` Thomas Gleixner
  0 siblings, 2 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-08-29 20:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
"dead" SMT siblings during initialization") broke the use case in
which both nosmt and maxcpus were added to the kernel command line
because it caused CPUs that were not SMT siblings to be brought
online during the intel_idle driver initialization in violation of the
maxcpus limit.

The underlying reason for this is a missing topology_is_primary_thread()
check in cpuhp_smt_enable() which causes that function to put online
more CPUs than just the SMT siblings that it is supposed to handle.

Add the missing check to address the issue.

Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
Reported-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/cpu.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2710,6 +2710,12 @@
 	cpu_maps_update_begin();
 	cpu_smt_control = CPU_SMT_ENABLED;
 	for_each_present_cpu(cpu) {
+		/*
+		 * Avoid accidentally onlining primary thread CPUs that have
+		 * been taken offline.
+		 */
+		if (topology_is_primary_thread(cpu))
+			continue;
 		/* Skip online CPUs and CPUs on offline nodes */
 		if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
 			continue;




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki
@ 2025-09-02 15:05 ` Rafael J. Wysocki
  2025-09-02 16:56   ` Dave Hansen
  2025-09-05  7:39 ` Thomas Gleixner
  1 sibling, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-02 15:05 UTC (permalink / raw)
  To: Linux PM
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle,
	Dave Hansen, the arch/x86 maintainers

On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
> "dead" SMT siblings during initialization") broke the use case in
> which both nosmt and maxcpus were added to the kernel command line
> because it caused CPUs that were not SMT siblings to be brought
> online during the intel_idle driver initialization in violation of the
> maxcpus limit.
>
> The underlying reason for this is a missing topology_is_primary_thread()
> check in cpuhp_smt_enable() which causes that function to put online
> more CPUs than just the SMT siblings that it is supposed to handle.
>
> Add the missing check to address the issue.
>
> Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
> Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
> Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Tested-by: Christian Loehle <christian.loehle@arm.com>
> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

I'm wondering if there are any objections or concerns.

Since this is a regression in 6.16, it would be good to get it fixed in 6.17-rc.

> ---
>  kernel/cpu.c |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2710,6 +2710,12 @@
>         cpu_maps_update_begin();
>         cpu_smt_control = CPU_SMT_ENABLED;
>         for_each_present_cpu(cpu) {
> +               /*
> +                * Avoid accidentally onlining primary thread CPUs that have
> +                * been taken offline.
> +                */
> +               if (topology_is_primary_thread(cpu))
> +                       continue;
>                 /* Skip online CPUs and CPUs on offline nodes */
>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
>                         continue;
>
>
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-02 15:05 ` Rafael J. Wysocki
@ 2025-09-02 16:56   ` Dave Hansen
  2025-09-02 17:18     ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2025-09-02 16:56 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle,
	Dave Hansen, the arch/x86 maintainers

On 9/2/25 08:05, Rafael J. Wysocki wrote:
> On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
>> "dead" SMT siblings during initialization") broke the use case in

Does "dead" here mean present but offline?

>> which both nosmt and maxcpus were added to the kernel command line
>> because it caused CPUs that were not SMT siblings to be brought
>> online during the intel_idle driver initialization in violation of the
>> maxcpus limit.

How does intel_idle fit in here? I don't immediately see it calling
cpuhp_smt_enable().

>> The underlying reason for this is a missing topology_is_primary_thread()
>> check in cpuhp_smt_enable() which causes that function to put online
>> more CPUs than just the SMT siblings that it is supposed to handle.
>>
>> Add the missing check to address the issue.

We should probably add a bit more checking in cpuhp_smt_enable() to make
sure that it's being called under expected conditions like a:

	WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED);

and probably also some comments about how it is expected to work.

cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it
from DISABLED=>ENABLED. Thinking about it in that context, enabling
_just_ the secondary (disabled) threads makes a lot of sense.

But that can come later, after the bug fix.

>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -2710,6 +2710,12 @@

No 'diff -p', eh?

>>         cpu_maps_update_begin();
>>         cpu_smt_control = CPU_SMT_ENABLED;
>>         for_each_present_cpu(cpu) {
>> +               /*
>> +                * Avoid accidentally onlining primary thread CPUs that have
>> +                * been taken offline.
>> +                */
>> +               if (topology_is_primary_thread(cpu))
>> +                       continue;
>>                 /* Skip online CPUs and CPUs on offline nodes */
>>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
>>                         continue;
Is there a more generic problem with this not respecting 'maxcpus'? If
maxcpus had forced a primary thread offline, this would still online the
secondary thread, even with the fix. Taking _that_ online might still
bring you over the maxcpus limit.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-02 16:56   ` Dave Hansen
@ 2025-09-02 17:18     ` Rafael J. Wysocki
  2025-09-03 18:00       ` Dave Hansen
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-02 17:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Rafael J. Wysocki, Linux PM, Thomas Gleixner, LKML,
	Peter Zijlstra, Christian Loehle, Dave Hansen,
	the arch/x86 maintainers

On Tue, Sep 2, 2025 at 6:56 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 9/2/25 08:05, Rafael J. Wysocki wrote:
> > On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
> >> "dead" SMT siblings during initialization") broke the use case in
>
> Does "dead" here mean present but offline?

Yes.

> >> which both nosmt and maxcpus were added to the kernel command line
> >> because it caused CPUs that were not SMT siblings to be brought
> >> online during the intel_idle driver initialization in violation of the
> >> maxcpus limit.
>
> How does intel_idle fit in here? I don't immediately see it calling
> cpuhp_smt_enable().

It calls arch_cpu_rescan_dead_smt_siblings() which calls cpuhp_smt_enable().

> >> The underlying reason for this is a missing topology_is_primary_thread()
> >> check in cpuhp_smt_enable() which causes that function to put online
> >> more CPUs than just the SMT siblings that it is supposed to handle.
> >>
> >> Add the missing check to address the issue.
>
> We should probably add a bit more checking in cpuhp_smt_enable() to make
> sure that it's being called under expected conditions like a:
>
>         WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED);
>
> and probably also some comments about how it is expected to work.

Well, see the callers.  Two out of three take care of this already and
the third one doesn't care.

> cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it
> from DISABLED=>ENABLED. Thinking about it in that context, enabling
> _just_ the secondary (disabled) threads makes a lot of sense.
>
> But that can come later, after the bug fix.
>
> >> --- a/kernel/cpu.c
> >> +++ b/kernel/cpu.c
> >> @@ -2710,6 +2710,12 @@
>
> No 'diff -p', eh?

Ah, sorry about this.

> >>         cpu_maps_update_begin();
> >>         cpu_smt_control = CPU_SMT_ENABLED;
> >>         for_each_present_cpu(cpu) {
> >> +               /*
> >> +                * Avoid accidentally onlining primary thread CPUs that have
> >> +                * been taken offline.
> >> +                */
> >> +               if (topology_is_primary_thread(cpu))
> >> +                       continue;
> >>                 /* Skip online CPUs and CPUs on offline nodes */
> >>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
> >>                         continue;
>
> Is there a more generic problem with this not respecting 'maxcpus'? If
> maxcpus had forced a primary thread offline, this would still online the
> secondary thread, even with the fix. Taking _that_ online might still
> bring you over the maxcpus limit.

Yes, there is AFAICS, but it has been there for some time and it
doesn't affect the rescan during boot.

For the rescan, it is actually useful to not respect maxcpus because
it allows all of the SMT siblings to be kicked, but for run-time
enabling of SMT it may be sort of a problem.

This change simply addresses the regression, which is that cores
without SMT become online quite surprisingly after intel_idle (and
ACPI idle too for that matter) initialization, due to
arch_cpu_rescan_dead_smt_siblings().

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-02 17:18     ` Rafael J. Wysocki
@ 2025-09-03 18:00       ` Dave Hansen
  2025-09-03 18:35         ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2025-09-03 18:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle,
	Dave Hansen, the arch/x86 maintainers

I munged the changelog up a bit:

> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5

I hope that looks better to everyone.

I also went looking at this assuming that it was x86-specific. I'd
appreciate a quick head nod from Thomas or Peter on this before I push
it anywhere, though.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-03 18:00       ` Dave Hansen
@ 2025-09-03 18:35         ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-03 18:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Rafael J. Wysocki, Linux PM, Thomas Gleixner, LKML,
	Peter Zijlstra, Christian Loehle, Dave Hansen,
	the arch/x86 maintainers

On Wed, Sep 3, 2025 at 8:00 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> I munged the changelog up a bit:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5
>
> I hope that looks better to everyone.

It does to me, thanks!

> I also went looking at this assuming that it was x86-specific. I'd
> appreciate a quick head nod from Thomas or Peter on this before I push
> it anywhere, though.

Sure.

Thank you!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki
  2025-09-02 15:05 ` Rafael J. Wysocki
@ 2025-09-05  7:39 ` Thomas Gleixner
  2025-09-05 13:13   ` Rafael J. Wysocki
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2025-09-05  7:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen

On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote:
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2710,6 +2710,12 @@
>  	cpu_maps_update_begin();
>  	cpu_smt_control = CPU_SMT_ENABLED;
>  	for_each_present_cpu(cpu) {
> +		/*
> +		 * Avoid accidentally onlining primary thread CPUs that have
> +		 * been taken offline.
> +		 */
> +		if (topology_is_primary_thread(cpu))
> +			continue;

Hmm. That's kinda solving the problem, but I think the proper solution
would be to implement topology_is_core_online() for x86.

The above is preventing the primary thread to be onlined, but then
onlines the corresponding hyperthread, which does not really make sense.

Something like the completely untested below.

Thanks,

        tglx
---
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7c0957..21041898157a 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -231,6 +231,16 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
 }
 #define topology_is_primary_thread topology_is_primary_thread
 
+int topology_get_primary_thread(unsigned int cpu);
+
+static inline bool topology_is_core_online(unsigned int cpu)
+{
+	int pcpu = topology_get_primary_thread(cpu);
+
+	return pcpu >= 0 ? cpu_online(pcpu) : false;
+}
+#define topology_is_core_online topology_is_core_online
+
 #else /* CONFIG_SMP */
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index e35ccdc84910..6073a16628f9 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_uni
 	return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
 }
 
+#ifdef CONFIG_SMP
+int topology_get_primary_thread(unsigned int cpu)
+{
+	u32 apic_id = cpuid_to_apicid[cpu];
+
+	/*
+	 * Get the core domain level APIC id, which is the primary thread
+	 * and return the CPU number assigned to it.
+	 */
+	return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
+}
+#endif
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /**
  * topology_hotplug_apic - Handle a physical hotplugged APIC after boot



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-05  7:39 ` Thomas Gleixner
@ 2025-09-05 13:13   ` Rafael J. Wysocki
  2025-09-05 13:27     ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-05 13:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra,
	Christian Loehle, Dave Hansen

On Fri, Sep 5, 2025 at 9:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote:
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2710,6 +2710,12 @@
> >       cpu_maps_update_begin();
> >       cpu_smt_control = CPU_SMT_ENABLED;
> >       for_each_present_cpu(cpu) {
> > +             /*
> > +              * Avoid accidentally onlining primary thread CPUs that have
> > +              * been taken offline.
> > +              */
> > +             if (topology_is_primary_thread(cpu))
> > +                     continue;
>
> Hmm. That's kinda solving the problem, but I think the proper solution
> would be to implement topology_is_core_online() for x86.
>
> The above is preventing the primary thread to be onlined, but then
> onlines the corresponding hyperthread, which does not really make sense.

Well, manual online can be used for onlining the secondary thread of a
core where the primary thread is offline, so this is technically
possible already.

> Something like the completely untested below.

So given the above, shouldn't topology_is_core_online() check if any
thread in the given core is online?

> ---
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 6c79ee7c0957..21041898157a 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -231,6 +231,16 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
>  }
>  #define topology_is_primary_thread topology_is_primary_thread
>
> +int topology_get_primary_thread(unsigned int cpu);
> +
> +static inline bool topology_is_core_online(unsigned int cpu)
> +{
> +       int pcpu = topology_get_primary_thread(cpu);
> +
> +       return pcpu >= 0 ? cpu_online(pcpu) : false;
> +}
> +#define topology_is_core_online topology_is_core_online
> +
>  #else /* CONFIG_SMP */
>  static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
>  static inline int topology_max_smt_threads(void) { return 1; }
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index e35ccdc84910..6073a16628f9 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_uni
>         return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
>  }
>
> +#ifdef CONFIG_SMP
> +int topology_get_primary_thread(unsigned int cpu)
> +{
> +       u32 apic_id = cpuid_to_apicid[cpu];
> +
> +       /*
> +        * Get the core domain level APIC id, which is the primary thread
> +        * and return the CPU number assigned to it.
> +        */
> +       return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
> +}
> +#endif
> +
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  /**
>   * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-05 13:13   ` Rafael J. Wysocki
@ 2025-09-05 13:27     ` Rafael J. Wysocki
  2025-09-05 20:47       ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-05 13:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen

On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Sep 5, 2025 at 9:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Fri, Aug 29 2025 at 22:01, Rafael J. Wysocki wrote:
> > > --- a/kernel/cpu.c
> > > +++ b/kernel/cpu.c
> > > @@ -2710,6 +2710,12 @@
> > >       cpu_maps_update_begin();
> > >       cpu_smt_control = CPU_SMT_ENABLED;
> > >       for_each_present_cpu(cpu) {
> > > +             /*
> > > +              * Avoid accidentally onlining primary thread CPUs that have
> > > +              * been taken offline.
> > > +              */
> > > +             if (topology_is_primary_thread(cpu))
> > > +                     continue;
> >
> > Hmm. That's kinda solving the problem, but I think the proper solution
> > would be to implement topology_is_core_online() for x86.
> >
> > The above is preventing the primary thread to be onlined, but then
> > onlines the corresponding hyperthread, which does not really make sense.
>
> Well, manual online can be used for onlining the secondary thread of a
> core where the primary thread is offline, so this is technically
> possible already.
>
> > Something like the completely untested below.
>
> So given the above, shouldn't topology_is_core_online() check if any
> thread in the given core is online?

Besides, this would cause the siblings of offline SMT threads to be
skipped while enabling SMT via sysfs (using
/sys/devices/system/cpu/smt/control), but I'm not sure if this is the
expectation in the field today.  The current behavior is to online all
secondary SMT threads (and more, but that part is quite arguably
broken).

That's why I decided to use a simpler patch which doesn't go that far.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-05 13:27     ` Rafael J. Wysocki
@ 2025-09-05 20:47       ` Thomas Gleixner
  2025-09-05 20:56         ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2025-09-05 20:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen

On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
> On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> Well, manual online can be used for onlining the secondary thread of a
>> core where the primary thread is offline, so this is technically
>> possible already.
>>
>> > Something like the completely untested below.
>>
>> So given the above, shouldn't topology_is_core_online() check if any
>> thread in the given core is online?
>
> Besides, this would cause the siblings of offline SMT threads to be
> skipped while enabling SMT via sysfs (using
> /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
> expectation in the field today.  The current behavior is to online all
> secondary SMT threads (and more, but that part is quite arguably
> broken).

It is broken, because the initial logic is to bring up primary threads
unconditionally and then refuse to bring up sibling threads.

With "maxcpus=xxx" this obviously limits the amount of primary threads,
so there is arguably no point to online any of the related secondary
threads of them.

The initial implementation was naively making that assumption, but the
core check which was added due to PPC made this actually correct.

It just did not snap with me back then, but it's actually the correct
thing to do, no?

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-05 20:47       ` Thomas Gleixner
@ 2025-09-05 20:56         ` Rafael J. Wysocki
  2025-09-07 13:14           ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-05 20:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra,
	Christian Loehle, Dave Hansen

On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> Well, manual online can be used for onlining the secondary thread of a
> >> core where the primary thread is offline, so this is technically
> >> possible already.
> >>
> >> > Something like the completely untested below.
> >>
> >> So given the above, shouldn't topology_is_core_online() check if any
> >> thread in the given core is online?
> >
> > Besides, this would cause the siblings of offline SMT threads to be
> > skipped while enabling SMT via sysfs (using
> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
> > expectation in the field today.  The current behavior is to online all
> > secondary SMT threads (and more, but that part is quite arguably
> > broken).
>
> It is broken, because the initial logic is to bring up primary threads
> unconditionally and then refuse to bring up sibling threads.
>
> With "maxcpus=xxx" this obviously limits the amount of primary threads,
> so there is arguably no point to online any of the related secondary
> threads of them.
>
> The initial implementation was naively making that assumption, but the
> core check which was added due to PPC made this actually correct.
>
> It just did not snap with me back then, but it's actually the correct
> thing to do, no?

It would at least be consistent with the existing PPC behavior. :-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-05 20:56         ` Rafael J. Wysocki
@ 2025-09-07 13:14           ` Thomas Gleixner
  2025-09-11 10:31             ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2025-09-07 13:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra,
	Christian Loehle, Dave Hansen

On Fri, Sep 05 2025 at 22:56, Rafael J. Wysocki wrote:
> On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
>> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> >> Well, manual online can be used for onlining the secondary thread of a
>> >> core where the primary thread is offline, so this is technically
>> >> possible already.
>> >>
>> >> > Something like the completely untested below.
>> >>
>> >> So given the above, shouldn't topology_is_core_online() check if any
>> >> thread in the given core is online?
>> >
>> > Besides, this would cause the siblings of offline SMT threads to be
>> > skipped while enabling SMT via sysfs (using
>> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
>> > expectation in the field today.  The current behavior is to online all
>> > secondary SMT threads (and more, but that part is quite arguably
>> > broken).
>>
>> It is broken, because the initial logic is to bring up primary threads
>> unconditionally and then refuse to bring up sibling threads.
>>
>> With "maxcpus=xxx" this obviously limits the amount of primary threads,
>> so there is arguably no point to online any of the related secondary
>> threads of them.
>>
>> The initial implementation was naively making that assumption, but the
>> core check which was added due to PPC made this actually correct.
>>
>> It just did not snap with me back then, but it's actually the correct
>> thing to do, no?
>
> It would at least be consistent with the existing PPC behavior. :-)

Correct.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-07 13:14           ` Thomas Gleixner
@ 2025-09-11 10:31             ` Rafael J. Wysocki
  2025-09-21  8:56               ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-11 10:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra,
	Christian Loehle, Dave Hansen

On Sun, Sep 7, 2025 at 3:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Sep 05 2025 at 22:56, Rafael J. Wysocki wrote:
> > On Fri, Sep 5, 2025 at 10:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >> On Fri, Sep 05 2025 at 15:27, Rafael J. Wysocki wrote:
> >> > On Fri, Sep 5, 2025 at 3:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> >> Well, manual online can be used for onlining the secondary thread of a
> >> >> core where the primary thread is offline, so this is technically
> >> >> possible already.
> >> >>
> >> >> > Something like the completely untested below.
> >> >>
> >> >> So given the above, shouldn't topology_is_core_online() check if any
> >> >> thread in the given core is online?
> >> >
> >> > Besides, this would cause the siblings of offline SMT threads to be
> >> > skipped while enabling SMT via sysfs (using
> >> > /sys/devices/system/cpu/smt/control), but I'm not sure if this is the
> >> > expectation in the field today.  The current behavior is to online all
> >> > secondary SMT threads (and more, but that part is quite arguably
> >> > broken).
> >>
> >> It is broken, because the initial logic is to bring up primary threads
> >> unconditionally and then refuse to bring up sibling threads.
> >>
> >> With "maxcpus=xxx" this obviously limits the amount of primary threads,
> >> so there is arguably no point to online any of the related secondary
> >> threads of them.
> >>
> >> The initial implementation was naively making that assumption, but the
> >> core check which was added due to PPC made this actually correct.
> >>
> >> It just did not snap with me back then, but it's actually the correct
> >> thing to do, no?
> >
> > It would at least be consistent with the existing PPC behavior. :-)
>
> Correct.

So are you going to send a patch or do you want me to do something?

From a user standpoint, this issue is a regression in 6.16, so it
would be good to address it before final 6.17.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression
  2025-09-11 10:31             ` Rafael J. Wysocki
@ 2025-09-21  8:56               ` Thomas Gleixner
  2025-09-21 11:00                 ` Christian Loehle
  2025-09-22 15:42                 ` Rafael J. Wysocki
  0 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2025-09-21  8:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra,
	Christian Loehle, Dave Hansen

Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT
siblings during initialization") broke the use case in which both 'nosmt'
and 'maxcpus' are on the kernel command line because it onlines primary
threads, which were offline due to the maxcpus limit.

The initially proposed fix to skip primary threads in the loop is
inconsistent. While it prevents the primary thread to be onlined, it then
onlines the corresponding hyperthread(s), which does not really make sense.

The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
threads of a core, when the primary thread is offline. The default
implementation is a NOOP and therefore not effective on x86.

Implement topology_is_core_online() on x86 to address this issue. This
makes the behaviour consistent between x86 and PowerPC.

Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Reported-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
---
 arch/x86/include/asm/topology.h |   10 ++++++++++
 arch/x86/kernel/cpu/topology.c  |   13 +++++++++++++
 2 files changed, 23 insertions(+)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -231,6 +231,16 @@ static inline bool topology_is_primary_t
 }
 #define topology_is_primary_thread topology_is_primary_thread
 
+int topology_get_primary_thread(unsigned int cpu);
+
+static inline bool topology_is_core_online(unsigned int cpu)
+{
+	int pcpu = topology_get_primary_thread(cpu);
+
+	return pcpu >= 0 ? cpu_online(pcpu) : false;
+}
+#define topology_is_core_online topology_is_core_online
+
 #else /* CONFIG_SMP */
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 api
 	return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
 }
 
+#ifdef CONFIG_SMP
+int topology_get_primary_thread(unsigned int cpu)
+{
+	u32 apic_id = cpuid_to_apicid[cpu];
+
+	/*
+	 * Get the core domain level APIC id, which is the primary thread
+	 * and return the CPU number assigned to it.
+	 */
+	return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
+}
+#endif
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /**
  * topology_hotplug_apic - Handle a physical hotplugged APIC after boot

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression
  2025-09-21  8:56               ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner
@ 2025-09-21 11:00                 ` Christian Loehle
  2025-09-22 15:42                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Loehle @ 2025-09-21 11:00 UTC (permalink / raw)
  To: Thomas Gleixner, Rafael J. Wysocki
  Cc: LKML, Linux PM, Peter Zijlstra, Dave Hansen

On 9/21/25 09:56, Thomas Gleixner wrote:
> Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT
> siblings during initialization") broke the use case in which both 'nosmt'
> and 'maxcpus' are on the kernel command line because it onlines primary
> threads, which were offline due to the maxcpus limit.
> 
> The initially proposed fix to skip primary threads in the loop is
> inconsistent. While it prevents the primary thread to be onlined, it then
> onlines the corresponding hyperthread(s), which does not really make sense.
> 
> The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
> threads of a core, when the primary thread is offline. The default
> implementation is a NOOP and therefore not effective on x86.
> 
> Implement topology_is_core_online() on x86 to address this issue. This
> makes the behaviour consistent between x86 and PowerPC.
> 
> Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
> Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Tested-by: Christian Loehle <christian.loehle@arm.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression
  2025-09-21  8:56               ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner
  2025-09-21 11:00                 ` Christian Loehle
@ 2025-09-22 15:42                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-09-22 15:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Linux PM, Peter Zijlstra,
	Christian Loehle, Dave Hansen

On Sun, Sep 21, 2025 at 10:56 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Christian reported that commit a430c11f4015 ("intel_idle: Rescan "dead" SMT
> siblings during initialization") broke the use case in which both 'nosmt'
> and 'maxcpus' are on the kernel command line because it onlines primary
> threads, which were offline due to the maxcpus limit.
>
> The initially proposed fix to skip primary threads in the loop is
> inconsistent. While it prevents the primary thread to be onlined, it then
> onlines the corresponding hyperthread(s), which does not really make sense.
>
> The CPU iterator in cpuhp_smt_enable() contains a check which excludes all
> threads of a core, when the primary thread is offline. The default
> implementation is a NOOP and therefore not effective on x86.
>
> Implement topology_is_core_online() on x86 to address this issue. This
> makes the behaviour consistent between x86 and PowerPC.
>
> Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
> Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/12740505.O9o76ZdvQC@rafael.j.wysocki
> Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/

Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

> ---
>  arch/x86/include/asm/topology.h |   10 ++++++++++
>  arch/x86/kernel/cpu/topology.c  |   13 +++++++++++++
>  2 files changed, 23 insertions(+)
>
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -231,6 +231,16 @@ static inline bool topology_is_primary_t
>  }
>  #define topology_is_primary_thread topology_is_primary_thread
>
> +int topology_get_primary_thread(unsigned int cpu);
> +
> +static inline bool topology_is_core_online(unsigned int cpu)
> +{
> +       int pcpu = topology_get_primary_thread(cpu);
> +
> +       return pcpu >= 0 ? cpu_online(pcpu) : false;
> +}
> +#define topology_is_core_online topology_is_core_online
> +
>  #else /* CONFIG_SMP */
>  static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
>  static inline int topology_max_smt_threads(void) { return 1; }
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -372,6 +372,19 @@ unsigned int topology_unit_count(u32 api
>         return topo_unit_count(lvlid, at_level, apic_maps[which_units].map);
>  }
>
> +#ifdef CONFIG_SMP
> +int topology_get_primary_thread(unsigned int cpu)
> +{
> +       u32 apic_id = cpuid_to_apicid[cpu];
> +
> +       /*
> +        * Get the core domain level APIC id, which is the primary thread
> +        * and return the CPU number assigned to it.
> +        */
> +       return topo_lookup_cpuid(topo_apicid(apic_id, TOPO_CORE_DOMAIN));
> +}
> +#endif
> +
>  #ifdef CONFIG_ACPI_HOTPLUG_CPU
>  /**
>   * topology_hotplug_apic - Handle a physical hotplugged APIC after boot

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-09-22 15:42 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki
2025-09-02 15:05 ` Rafael J. Wysocki
2025-09-02 16:56   ` Dave Hansen
2025-09-02 17:18     ` Rafael J. Wysocki
2025-09-03 18:00       ` Dave Hansen
2025-09-03 18:35         ` Rafael J. Wysocki
2025-09-05  7:39 ` Thomas Gleixner
2025-09-05 13:13   ` Rafael J. Wysocki
2025-09-05 13:27     ` Rafael J. Wysocki
2025-09-05 20:47       ` Thomas Gleixner
2025-09-05 20:56         ` Rafael J. Wysocki
2025-09-07 13:14           ` Thomas Gleixner
2025-09-11 10:31             ` Rafael J. Wysocki
2025-09-21  8:56               ` [PATCH] x86/topology: Implement topology_is_core_online() to address SMT regression Thomas Gleixner
2025-09-21 11:00                 ` Christian Loehle
2025-09-22 15:42                 ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox