linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
@ 2025-08-29 20:01 Rafael J. Wysocki
  2025-09-02 15:05 ` Rafael J. Wysocki
  0 siblings, 1 reply; 6+ messages in thread
From: Rafael J. Wysocki @ 2025-08-29 20:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linux PM, Peter Zijlstra, Christian Loehle, Dave Hansen

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
"dead" SMT siblings during initialization") broke the use case in
which both nosmt and maxcpus were added to the kernel command line
because it caused CPUs that were not SMT siblings to be brought
online during the intel_idle driver initialization in violation of the
maxcpus limit.

The underlying reason for this is a missing topology_is_primary_thread()
check in cpuhp_smt_enable() which causes that function to put online
more CPUs than just the SMT siblings that it is supposed to handle.

Add the missing check to address the issue.

Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
Reported-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/cpu.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2710,6 +2710,12 @@
 	cpu_maps_update_begin();
 	cpu_smt_control = CPU_SMT_ENABLED;
 	for_each_present_cpu(cpu) {
+		/*
+		 * Avoid accidentally onlining primary thread CPUs that have
+		 * been taken offline.
+		 */
+		if (topology_is_primary_thread(cpu))
+			continue;
 		/* Skip online CPUs and CPUs on offline nodes */
 		if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
 			continue;




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki
@ 2025-09-02 15:05 ` Rafael J. Wysocki
  2025-09-02 16:56   ` Dave Hansen
  0 siblings, 1 reply; 6+ messages in thread
From: Rafael J. Wysocki @ 2025-09-02 15:05 UTC (permalink / raw)
  To: Linux PM
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle,
	Dave Hansen, the arch/x86 maintainers

On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
> "dead" SMT siblings during initialization") broke the use case in
> which both nosmt and maxcpus were added to the kernel command line
> because it caused CPUs that were not SMT siblings to be brought
> online during the intel_idle driver initialization in violation of the
> maxcpus limit.
>
> The underlying reason for this is a missing topology_is_primary_thread()
> check in cpuhp_smt_enable() which causes that function to put online
> more CPUs than just the SMT siblings that it is supposed to handle.
>
> Add the missing check to address the issue.
>
> Fixes: a430c11f4015 ("intel_idle: Rescan "dead" SMT siblings during initialization")
> Fixes: f694481b1d31 ("ACPI: processor: Rescan "dead" SMT siblings during initialization")
> Closes: https://lore.kernel.org/linux-pm/724616a2-6374-4ba3-8ce3-ea9c45e2ae3b@arm.com/
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Tested-by: Christian Loehle <christian.loehle@arm.com>
> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

I'm wondering if there are any objections or concerns.

Since this is a regression in 6.16, it would be good to get it fixed in 6.17-rc.

> ---
>  kernel/cpu.c |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2710,6 +2710,12 @@
>         cpu_maps_update_begin();
>         cpu_smt_control = CPU_SMT_ENABLED;
>         for_each_present_cpu(cpu) {
> +               /*
> +                * Avoid accidentally onlining primary thread CPUs that have
> +                * been taken offline.
> +                */
> +               if (topology_is_primary_thread(cpu))
> +                       continue;
>                 /* Skip online CPUs and CPUs on offline nodes */
>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
>                         continue;
>
>
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-02 15:05 ` Rafael J. Wysocki
@ 2025-09-02 16:56   ` Dave Hansen
  2025-09-02 17:18     ` Rafael J. Wysocki
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2025-09-02 16:56 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle,
	Dave Hansen, the arch/x86 maintainers

On 9/2/25 08:05, Rafael J. Wysocki wrote:
> On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
>> "dead" SMT siblings during initialization") broke the use case in

Does "dead" here mean present but offline?

>> which both nosmt and maxcpus were added to the kernel command line
>> because it caused CPUs that were not SMT siblings to be brought
>> online during the intel_idle driver initialization in violation of the
>> maxcpus limit.

How does intel_idle fit in here? I don't immediately see it calling
cpuhp_smt_enable().

>> The underlying reason for this is a missing topology_is_primary_thread()
>> check in cpuhp_smt_enable() which causes that function to put online
>> more CPUs than just the SMT siblings that it is supposed to handle.
>>
>> Add the missing check to address the issue.

We should probably add a bit more checking in cpuhp_smt_enable() to make
sure that it's being called under expected conditions like a:

	WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED);

and probably also some comments about how it is expected to work.

cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it
from DISABLED=>ENABLED. Thinking about it in that context, enabling
_just_ the secondary (disabled) threads makes a lot of sense.

But that can come later, after the bug fix.

>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -2710,6 +2710,12 @@

No 'diff -p', eh?

>>         cpu_maps_update_begin();
>>         cpu_smt_control = CPU_SMT_ENABLED;
>>         for_each_present_cpu(cpu) {
>> +               /*
>> +                * Avoid accidentally onlining primary thread CPUs that have
>> +                * been taken offline.
>> +                */
>> +               if (topology_is_primary_thread(cpu))
>> +                       continue;
>>                 /* Skip online CPUs and CPUs on offline nodes */
>>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
>>                         continue;
Is there a more generic problem with this not respecting 'maxcpus'? If
maxcpus had forced a primary thread offline, this would still online the
secondary thread, even with the fix. Taking _that_ online might still
bring you over the maxcpus limit.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-02 16:56   ` Dave Hansen
@ 2025-09-02 17:18     ` Rafael J. Wysocki
  2025-09-03 18:00       ` Dave Hansen
  0 siblings, 1 reply; 6+ messages in thread
From: Rafael J. Wysocki @ 2025-09-02 17:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Rafael J. Wysocki, Linux PM, Thomas Gleixner, LKML,
	Peter Zijlstra, Christian Loehle, Dave Hansen,
	the arch/x86 maintainers

On Tue, Sep 2, 2025 at 6:56 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 9/2/25 08:05, Rafael J. Wysocki wrote:
> > On Fri, Aug 29, 2025 at 10:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Christian has reported that commit a430c11f4015 ("intel_idle: Rescan
> >> "dead" SMT siblings during initialization") broke the use case in
>
> Does "dead" here mean present but offline?

Yes.

> >> which both nosmt and maxcpus were added to the kernel command line
> >> because it caused CPUs that were not SMT siblings to be brought
> >> online during the intel_idle driver initialization in violation of the
> >> maxcpus limit.
>
> How does intel_idle fit in here? I don't immediately see it calling
> cpuhp_smt_enable().

It calls arch_cpu_rescan_dead_smt_siblings() which calls cpuhp_smt_enable().

> >> The underlying reason for this is a missing topology_is_primary_thread()
> >> check in cpuhp_smt_enable() which causes that function to put online
> >> more CPUs than just the SMT siblings that it is supposed to handle.
> >>
> >> Add the missing check to address the issue.
>
> We should probably add a bit more checking in cpuhp_smt_enable() to make
> sure that it's being called under expected conditions like a:
>
>         WARN_ON_ONCE(cpu_smt_control != CPU_SMT_DISABLED);
>
> and probably also some comments about how it is expected to work.

Well, see the callers.  Two out of three take care of this already and
the third one doesn't care.

> cpuhp_smt_enable() doesn't _really_ enable SMT. It specifically takes it
> from DISABLED=>ENABLED. Thinking about it in that context, enabling
> _just_ the secondary (disabled) threads makes a lot of sense.
>
> But that can come later, after the bug fix.
>
> >> --- a/kernel/cpu.c
> >> +++ b/kernel/cpu.c
> >> @@ -2710,6 +2710,12 @@
>
> No 'diff -p', eh?

Ah, sorry about this.

> >>         cpu_maps_update_begin();
> >>         cpu_smt_control = CPU_SMT_ENABLED;
> >>         for_each_present_cpu(cpu) {
> >> +               /*
> >> +                * Avoid accidentally onlining primary thread CPUs that have
> >> +                * been taken offline.
> >> +                */
> >> +               if (topology_is_primary_thread(cpu))
> >> +                       continue;
> >>                 /* Skip online CPUs and CPUs on offline nodes */
> >>                 if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
> >>                         continue;
>
> Is there a more generic problem with this not respecting 'maxcpus'? If
> maxcpus had forced a primary thread offline, this would still online the
> secondary thread, even with the fix. Taking _that_ online might still
> bring you over the maxcpus limit.

Yes, there is AFAICS, but it has been there for some time and it
doesn't affect the rescan during boot.

For the rescan, it is actually useful to not respect maxcpus because
it allows all of the SMT siblings to be kicked, but for run-time
enabling of SMT it may be sort of a problem.

This change simply addresses the regression, which is that cores
without SMT become online quite surprisingly after intel_idle (and
ACPI idle too for that matter) initialization, due to
arch_cpu_rescan_dead_smt_siblings().

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-02 17:18     ` Rafael J. Wysocki
@ 2025-09-03 18:00       ` Dave Hansen
  2025-09-03 18:35         ` Rafael J. Wysocki
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2025-09-03 18:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Thomas Gleixner, LKML, Peter Zijlstra, Christian Loehle,
	Dave Hansen, the arch/x86 maintainers

I munged the changelog up a bit:

> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5

I hope that looks better to everyone.

I also went looking at this assuming that it was x86-specific. I'd
appreciate a quick head nod from Thomas or Peter on this before I push
it anywhere, though.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] cpu: Add missing check to cpuhp_smt_enable()
  2025-09-03 18:00       ` Dave Hansen
@ 2025-09-03 18:35         ` Rafael J. Wysocki
  0 siblings, 0 replies; 6+ messages in thread
From: Rafael J. Wysocki @ 2025-09-03 18:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Rafael J. Wysocki, Linux PM, Thomas Gleixner, LKML,
	Peter Zijlstra, Christian Loehle, Dave Hansen,
	the arch/x86 maintainers

On Wed, Sep 3, 2025 at 8:00 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> I munged the changelog up a bit:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=13f107f882bb5
>
> I hope that looks better to everyone.

It does to me, thanks!

> I also went looking at this assuming that it was x86-specific. I'd
> appreciate a quick head nod from Thomas or Peter on this before I push
> it anywhere, though.

Sure.

Thank you!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-09-03 18:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-29 20:01 [PATCH v1] cpu: Add missing check to cpuhp_smt_enable() Rafael J. Wysocki
2025-09-02 15:05 ` Rafael J. Wysocki
2025-09-02 16:56   ` Dave Hansen
2025-09-02 17:18     ` Rafael J. Wysocki
2025-09-03 18:00       ` Dave Hansen
2025-09-03 18:35         ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).