public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry
@ 2024-11-25 13:20 Patryk Wlazlyn
  2024-11-25 13:20 ` [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures Patryk Wlazlyn
                   ` (7 more replies)
  0 siblings, 8 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

Sending an interim RFC patch. Rebased on top of Rafael's patches and
applied a modified patch/suggestion from Gautham.

Please tell me what you think. Is this approach acceptable in
general?

When playing dead, the preference order is now:
 1. Idle driver.
      - acpi_idle can now do FFH and calls mwait_play_dead_with_hint,
        but we disable the handlers on AMD, because AMD doesn't want to
        use mwait for play_dead.
      - intel_idle has a handler for SRF. Other Intel platforms fall
        back to cpuid leaf 0x5.
 2. Old mwait_play_dead() based on cpuid leaf 0x5.
 3. hlt_play_dead().


Changes since v3:
 * Reworded the commit messages authored by me as advised by Dave.
   Expanded the context about the mwait_play_dead() code, the hint
   computation, and the problem that we observe and the patches address.
 * The different play_dead methods are now called one after the other,
   because they return only on failure (on success they never return).

Patryk Wlazlyn (5):
  x86/smp: Allow calling mwait_play_dead with arbitrary hint
  x86/smp native_play_dead: Prefer cpuidle_play_dead() over
    mwait_play_dead()
  intel_idle: Provide enter_dead() handler for SRF
  acpi_idle: Add FFH cstate handling
  acpi_idle: Disallow play_dead with FFH cstate on AMD platforms

Rafael J. Wysocki (3):
  cpuidle: Do not return from cpuidle_play_dead() on callback failures
  cpuidle: Change :enter_dead() driver callback return type to void
  ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/smp.h         |  3 +
 arch/x86/kernel/acpi/cstate.c      |  9 +++
 arch/x86/kernel/smpboot.c          | 91 ++++++++++++++++--------------
 drivers/acpi/processor_idle.c      | 23 ++++----
 drivers/cpuidle/cpuidle.c          | 10 +++-
 drivers/idle/intel_idle.c          | 15 +++++
 include/acpi/processor.h           |  5 ++
 include/linux/cpuidle.h            |  2 +-
 9 files changed, 103 insertions(+), 56 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 13:26   ` Rafael J. Wysocki
  2024-11-25 13:20 ` [RFC PATCH v4 2/8] cpuidle: Change :enter_dead() driver callback return type to void Patryk Wlazlyn
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

If the :enter_dead() idle state callback fails for a certain state,
there may still be a shallower state for which it will work.

Because the only caller of cpuidle_play_dead(), native_play_dead(),
falls back to hlt_play_dead() if it returns an error, it should
better try all of the idle states for which :enter_dead() is present
before failing, so change it accordingly.

Also notice that the :enter_dead() state callback is not expected
to return on success (the CPU should be "dead" then), so make
cpuidle_play_dead() ignore its return value.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # 6.12-rc7
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
---
 drivers/cpuidle/cpuidle.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 06ace16f9e71..0835da449db8 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -69,11 +69,15 @@ int cpuidle_play_dead(void)
 	if (!drv)
 		return -ENODEV;
 
-	/* Find lowest-power state that supports long-term idle */
-	for (i = drv->state_count - 1; i >= 0; i--)
+	for (i = drv->state_count - 1; i >= 0; i--) {
 		if (drv->states[i].enter_dead)
-			return drv->states[i].enter_dead(dev, i);
+			drv->states[i].enter_dead(dev, i);
+	}
 
+	/*
+	 * If :enter_dead() is successful, it will never return, so reaching
+	 * here means that all of them failed above or were not present.
+	 */
 	return -ENODEV;
 }
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 2/8] cpuidle: Change :enter_dead() driver callback return type to void
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
  2024-11-25 13:20 ` [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 13:20 ` [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states Patryk Wlazlyn
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

After a previous change, cpuidle_play_dead(), which is the only caller
of idle state :enter_dead() callbacks, ignores their return values, so
they may as well be void.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/acpi/processor_idle.c | 7 ++-----
 include/linux/cpuidle.h       | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 831fa4a12159..ce728cf7e301 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -578,7 +578,7 @@ static void __cpuidle acpi_idle_do_entry(struct acpi_processor_cx *cx)
  * @dev: the target CPU
  * @index: the index of suggested state
  */
-static int acpi_idle_play_dead(struct cpuidle_device *dev, int index)
+static void acpi_idle_play_dead(struct cpuidle_device *dev, int index)
 {
 	struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
 
@@ -591,11 +591,8 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index)
 		else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
 			io_idle(cx->address);
 		} else
-			return -ENODEV;
+			return;
 	}
-
-	/* Never reached */
-	return 0;
 }
 
 static __always_inline bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3183aeb7f5b4..a9ee4fe55dcf 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -61,7 +61,7 @@ struct cpuidle_state {
 			struct cpuidle_driver *drv,
 			int index);
 
-	int (*enter_dead) (struct cpuidle_device *dev, int index);
+	void (*enter_dead) (struct cpuidle_device *dev, int index);
 
 	/*
 	 * CPUs execute ->enter_s2idle with the local tick or entire timekeeping
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
  2024-11-25 13:20 ` [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures Patryk Wlazlyn
  2024-11-25 13:20 ` [RFC PATCH v4 2/8] cpuidle: Change :enter_dead() driver callback return type to void Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 13:24   ` Rafael J. Wysocki
  2024-11-25 14:36   ` Gautham R. Shenoy
  2024-11-25 13:20 ` [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint Patryk Wlazlyn
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Notice that acpi_processor_setup_cstates() can set state->enter_dead
to acpi_idle_play_dead() for all C-states unconditionally and remove
the confusing C-state type check done before setting it.

No intentional functional impact.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # 6.12-rc7
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/acpi/processor_idle.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index ce728cf7e301..698897b29de2 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -800,12 +800,12 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
 		state->enter = acpi_idle_enter;
 
 		state->flags = 0;
-		if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2 ||
-		    cx->type == ACPI_STATE_C3) {
-			state->enter_dead = acpi_idle_play_dead;
-			if (cx->type != ACPI_STATE_C3)
-				drv->safe_state_index = count;
-		}
+
+		state->enter_dead = acpi_idle_play_dead;
+
+		if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
+			drv->safe_state_index = count;
+
 		/*
 		 * Halt-induced C1 is not good for ->enter_s2idle, because it
 		 * re-enables interrupts on exit.  Moreover, C1 is generally not
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
                   ` (2 preceding siblings ...)
  2024-11-25 13:20 ` [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 14:39   ` Rafael J. Wysocki
  2024-11-25 13:20 ` [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead() Patryk Wlazlyn
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

The MWAIT instruction needs different hints on different CPUs to reach
the deepest idle states. The current hint calculation* in
mwait_play_dead() works in practice on current hardware, but it fails
on a recent platform, Intel's Sierra Forest, and possibly on some
future ones. Those newer CPUs' power efficiency suffers when a CPU is
put offline.

 * The current calculation is the for loop inspecting edx in
   mwait_play_dead()

The current implementation for looking up the mwait hint for the
deepest cstate in mwait_play_dead() works by inspecting CPUID leaf 0x5
and calculating the mwait hint based on the number of reported
substates. This approach depends on the hints associated with those
substates being contiguous in the range [0, NUM_SUBSTATES-1]. This
continuity is not documented and is not met on recent Intel platforms.

For example, Intel's Sierra Forest reports two cstates with two
substates each in cpuid leaf 0x5:

  Name*   target cstate    target subcstate (mwait hint)
  ===========================================================
  C1      0x00             0x00
  C1E     0x00             0x01

  --      0x10             ----

  C6S     0x20             0x22
  C6P     0x20             0x23

  --      0x30             ----

  /* No more (sub)states all the way down to the end. */
  ===========================================================

   * Names of the cstates are not included in the CPUID leaf 0x5, they are
     taken from the product specific documentation.

Notice that hints 0x20 and 0x21 are skipped entirely for target
cstate 0x20 (C6), which is the cause of the problem for the current
cpuid leaf 0x5 algorithm.

Allow the cpuidle code to provide the mwait play_dead loop with a
known mwait hint for the deepest idle state on a given platform,
skipping the cpuid-based calculation.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
---
 arch/x86/include/asm/smp.h |  3 ++
 arch/x86/kernel/smpboot.c  | 86 ++++++++++++++++++++------------------
 2 files changed, 49 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ca073f40698f..d12fab4a83c5 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -114,6 +114,7 @@ void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
 
 void smp_kick_mwait_play_dead(void);
+void mwait_play_dead_with_hint(unsigned long hint);
 
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
@@ -164,6 +165,8 @@ static inline struct cpumask *cpu_llc_shared_mask(int cpu)
 {
 	return (struct cpumask *)cpumask_of(0);
 }
+
+static inline void mwait_play_dead_with_hint(unsigned long eax_hint) { }
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_DEBUG_NMI_SELFTEST
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b5a8f0891135..d0464c7a0af5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1272,13 +1272,57 @@ void play_dead_common(void)
 	local_irq_disable();
 }
 
+void __noreturn mwait_play_dead_with_hint(unsigned long eax_hint)
+{
+	struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
+
+	/* Set up state for the kexec() hack below */
+	md->status = CPUDEAD_MWAIT_WAIT;
+	md->control = CPUDEAD_MWAIT_WAIT;
+
+	wbinvd();
+
+	while (1) {
+		/*
+		 * The CLFLUSH is a workaround for erratum AAI65 for
+		 * the Xeon 7400 series.  It's not clear it is actually
+		 * needed, but it should be harmless in either case.
+		 * The WBINVD is insufficient due to the spurious-wakeup
+		 * case where we return around the loop.
+		 */
+		mb();
+		clflush(md);
+		mb();
+		__monitor(md, 0, 0);
+		mb();
+		__mwait(eax_hint, 0);
+
+		if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
+			/*
+			 * Kexec is about to happen. Don't go back into mwait() as
+			 * the kexec kernel might overwrite text and data including
+			 * page tables and stack. So mwait() would resume when the
+			 * monitor cache line is written to and then the CPU goes
+			 * south due to overwritten text, page tables and stack.
+			 *
+			 * Note: This does _NOT_ protect against a stray MCE, NMI,
+			 * SMI. They will resume execution at the instruction
+			 * following the HLT instruction and run into the problem
+			 * which this is trying to prevent.
+			 */
+			WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
+			while(1)
+				native_halt();
+		}
+	}
+}
+
 /*
  * We need to flush the caches before going to sleep, lest we have
  * dirty data in our caches when we come back up.
  */
 static inline void mwait_play_dead(void)
 {
-	struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
 	unsigned int eax, ebx, ecx, edx;
 	unsigned int highest_cstate = 0;
 	unsigned int highest_subcstate = 0;
@@ -1316,45 +1360,7 @@ static inline void mwait_play_dead(void)
 			(highest_subcstate - 1);
 	}
 
-	/* Set up state for the kexec() hack below */
-	md->status = CPUDEAD_MWAIT_WAIT;
-	md->control = CPUDEAD_MWAIT_WAIT;
-
-	wbinvd();
-
-	while (1) {
-		/*
-		 * The CLFLUSH is a workaround for erratum AAI65 for
-		 * the Xeon 7400 series.  It's not clear it is actually
-		 * needed, but it should be harmless in either case.
-		 * The WBINVD is insufficient due to the spurious-wakeup
-		 * case where we return around the loop.
-		 */
-		mb();
-		clflush(md);
-		mb();
-		__monitor(md, 0, 0);
-		mb();
-		__mwait(eax, 0);
-
-		if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
-			/*
-			 * Kexec is about to happen. Don't go back into mwait() as
-			 * the kexec kernel might overwrite text and data including
-			 * page tables and stack. So mwait() would resume when the
-			 * monitor cache line is written to and then the CPU goes
-			 * south due to overwritten text, page tables and stack.
-			 *
-			 * Note: This does _NOT_ protect against a stray MCE, NMI,
-			 * SMI. They will resume execution at the instruction
-			 * following the HLT instruction and run into the problem
-			 * which this is trying to prevent.
-			 */
-			WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
-			while(1)
-				native_halt();
-		}
-	}
+	mwait_play_dead_with_hint(eax);
 }
 
 /*
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
                   ` (3 preceding siblings ...)
  2024-11-25 13:20 ` [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 13:45   ` Peter Zijlstra
  2024-11-25 13:56   ` Rafael J. Wysocki
  2024-11-25 13:20 ` [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF Patryk Wlazlyn
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

The algorithm in mwait_play_dead(), based on cpuid leaf 0x5, for
looking up the mwait hint for the deepest cstate may calculate the
wrong hint on recent Intel x86 platforms.

Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
only fall back to the latter in case of a missing appropriate
enter_dead() handler.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d0464c7a0af5..2627b56fb9bc 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1413,9 +1413,10 @@ void native_play_dead(void)
 	play_dead_common();
 	tboot_shutdown(TB_SHUTDOWN_WFS);
 
+	/* The first successful play_dead() will not return */
+	cpuidle_play_dead();
 	mwait_play_dead();
-	if (cpuidle_play_dead())
-		hlt_play_dead();
+	hlt_play_dead();
 }
 
 #else /* ... !CONFIG_HOTPLUG_CPU */
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
                   ` (4 preceding siblings ...)
  2024-11-25 13:20 ` [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead() Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 13:44   ` Rafael J. Wysocki
  2024-11-25 13:53   ` Peter Zijlstra
  2024-11-25 13:20 ` [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling Patryk Wlazlyn
  2024-11-25 13:20 ` [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms Patryk Wlazlyn
  7 siblings, 2 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

Intel's Sierra Forest reports two C6 substates in cpuid leaf 0x5:
    C6S  (hint 0x22)
    C6SP (hint 0x23)

Hints 0x20 and 0x21 are skipped entirely, causing the generic
implementation in mwait_play_dead() to compute the wrong hint when
looking for the deepest cstate. As a result, a package with an
offlined CPU can never reach PC6.

Define the enter_dead() handler for SRF.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
---
 drivers/idle/intel_idle.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index ac4d8faa3886..c2ca01e74add 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -56,6 +56,7 @@
 #include <asm/mwait.h>
 #include <asm/spec-ctrl.h>
 #include <asm/fpu/api.h>
+#include <asm/smp.h>
 
 #define INTEL_IDLE_VERSION "0.5.1"
 
@@ -227,6 +228,16 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
 	return 0;
 }
 
+static __cpuidle void intel_idle_enter_dead(struct cpuidle_device *dev,
+					    int index)
+{
+	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
+	struct cpuidle_state *state = &drv->states[index];
+	unsigned long eax = flg2MWAIT(state->flags);
+
+	mwait_play_dead_with_hint(eax);
+}
+
 /*
  * States are indexed by the cstate number,
  * which is also the index into the MWAIT hint array.
@@ -1391,6 +1402,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
+		.enter_dead = &intel_idle_enter_dead,
 		.enter_s2idle = intel_idle_s2idle, },
 	{
 		.name = "C1E",
@@ -1399,6 +1411,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 10,
 		.enter = &intel_idle,
+		.enter_dead = &intel_idle_enter_dead,
 		.enter_s2idle = intel_idle_s2idle, },
 	{
 		.name = "C6S",
@@ -1408,6 +1421,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 270,
 		.target_residency = 700,
 		.enter = &intel_idle,
+		.enter_dead = &intel_idle_enter_dead,
 		.enter_s2idle = intel_idle_s2idle, },
 	{
 		.name = "C6SP",
@@ -1417,6 +1431,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 310,
 		.target_residency = 900,
 		.enter = &intel_idle,
+		.enter_dead = &intel_idle_enter_dead,
 		.enter_s2idle = intel_idle_s2idle, },
 	{
 		.enter = NULL }
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
                   ` (5 preceding siblings ...)
  2024-11-25 13:20 ` [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 13:41   ` Rafael J. Wysocki
  2024-11-25 15:14   ` Gautham R. Shenoy
  2024-11-25 13:20 ` [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms Patryk Wlazlyn
  7 siblings, 2 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/acpi/cstate.c      | 9 +++++++++
 drivers/acpi/processor_idle.c      | 2 ++
 include/acpi/processor.h           | 5 +++++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ea33439a5d00..1da5e08de257 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -236,6 +236,7 @@
 #define X86_FEATURE_PVUNLOCK		( 8*32+20) /* PV unlock function */
 #define X86_FEATURE_VCPUPREEMPT		( 8*32+21) /* PV vcpu_is_preempted function */
 #define X86_FEATURE_TDX_GUEST		( 8*32+22) /* "tdx_guest" Intel Trust Domain Extensions Guest */
+#define X86_FEATURE_NO_MWAIT_OFFLINE    ( 8*32+23) /* Don't use MWAIT states for offlined CPUs */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* "fsgsbase" RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index f3ffd0a3a012..c80a3e6dba5f 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -204,6 +204,15 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
 
+void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
+{
+	unsigned int cpu = smp_processor_id();
+	struct cstate_entry *percpu_entry;
+
+	percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+	mwait_play_dead_with_hint(percpu_entry->states[cx->index].eax);
+}
+
 void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
 {
 	unsigned int cpu = smp_processor_id();
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 698897b29de2..586cc7d1d8aa 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -590,6 +590,8 @@ static void acpi_idle_play_dead(struct cpuidle_device *dev, int index)
 			raw_safe_halt();
 		else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
 			io_idle(cx->address);
+		} else if (cx->entry_method == ACPI_CSTATE_FFH) {
+			acpi_processor_ffh_play_dead(cx);
 		} else
 			return;
 	}
diff --git a/include/acpi/processor.h b/include/acpi/processor.h
index a17e97e634a6..63a37e72b721 100644
--- a/include/acpi/processor.h
+++ b/include/acpi/processor.h
@@ -280,6 +280,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
 				    struct acpi_processor_cx *cx,
 				    struct acpi_power_register *reg);
 void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cstate);
+void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx);
 #else
 static inline void acpi_processor_power_init_bm_check(struct
 						      acpi_processor_flags
@@ -300,6 +301,10 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
 {
 	return;
 }
+static inline void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
+{
+	return;
+}
 #endif
 
 static inline int call_on_cpu(int cpu, long (*fn)(void *), void *arg,
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms
  2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
                   ` (6 preceding siblings ...)
  2024-11-25 13:20 ` [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling Patryk Wlazlyn
@ 2024-11-25 13:20 ` Patryk Wlazlyn
  2024-11-25 13:46   ` Rafael J. Wysocki
                     ` (2 more replies)
  7 siblings, 3 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 13:20 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, patryk.wlazlyn, peterz, tglx,
	gautham.shenoy

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
---
 drivers/acpi/processor_idle.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 586cc7d1d8aa..4b4ac8d55b55 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -803,7 +803,11 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
 
 		state->flags = 0;
 
-		state->enter_dead = acpi_idle_play_dead;
+		/* AMD doesn't want to use mwait for play dead. */
+		bool amd_or_hygon = boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+				    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON;
+		if (!(cx->entry_method == ACPI_CSTATE_FFH && amd_or_hygon))
+			state->enter_dead = acpi_idle_play_dead;
 
 		if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
 			drv->safe_state_index = count;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states
  2024-11-25 13:20 ` [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states Patryk Wlazlyn
@ 2024-11-25 13:24   ` Rafael J. Wysocki
  2024-11-25 14:39     ` Patryk Wlazlyn
  2024-11-25 14:36   ` Gautham R. Shenoy
  1 sibling, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 13:24 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:20 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
>
> Notice that acpi_processor_setup_cstates() can set state->enter_dead
> to acpi_idle_play_dead() for all C-states unconditionally and remove
> the confusing C-state type check done before setting it.
>
> No intentional functional impact.
>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # 6.12-rc7
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

This already is in the queue of patches to be merged during the 6.13
merge window.

I gather that it has been included here for completeness.

> ---
>  drivers/acpi/processor_idle.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index ce728cf7e301..698897b29de2 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -800,12 +800,12 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
>                 state->enter = acpi_idle_enter;
>
>                 state->flags = 0;
> -               if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2 ||
> -                   cx->type == ACPI_STATE_C3) {
> -                       state->enter_dead = acpi_idle_play_dead;
> -                       if (cx->type != ACPI_STATE_C3)
> -                               drv->safe_state_index = count;
> -               }
> +
> +               state->enter_dead = acpi_idle_play_dead;
> +
> +               if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
> +                       drv->safe_state_index = count;
> +
>                 /*
>                  * Halt-induced C1 is not good for ->enter_s2idle, because it
>                  * re-enables interrupts on exit.  Moreover, C1 is generally not
> --
> 2.47.0
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures
  2024-11-25 13:20 ` [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures Patryk Wlazlyn
@ 2024-11-25 13:26   ` Rafael J. Wysocki
  2024-11-25 14:36     ` Patryk Wlazlyn
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 13:26 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:20 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
>
> If the :enter_dead() idle state callback fails for a certain state,
> there may still be a shallower state for which it will work.
>
> Because the only caller of cpuidle_play_dead(), native_play_dead(),
> falls back to hlt_play_dead() if it returns an error, it is better
> to try all of the idle states for which :enter_dead() is present
> before failing, so change it accordingly.
>
> Also notice that the :enter_dead() state callback is not expected
> to return on success (the CPU should be "dead" then), so make
> cpuidle_play_dead() ignore its return value.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # 6.12-rc7
> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>

This one and the [2/2] are in the mainline as of today, no need to resend them.

> ---
>  drivers/cpuidle/cpuidle.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 06ace16f9e71..0835da449db8 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -69,11 +69,15 @@ int cpuidle_play_dead(void)
>         if (!drv)
>                 return -ENODEV;
>
> -       /* Find lowest-power state that supports long-term idle */
> -       for (i = drv->state_count - 1; i >= 0; i--)
> +       for (i = drv->state_count - 1; i >= 0; i--) {
>                 if (drv->states[i].enter_dead)
> -                       return drv->states[i].enter_dead(dev, i);
> +                       drv->states[i].enter_dead(dev, i);
> +       }
>
> +       /*
> +        * If :enter_dead() is successful, it will never return, so reaching
> +        * here means that all of them failed above or were not present.
> +        */
>         return -ENODEV;
>  }
>
> --
> 2.47.0
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling
  2024-11-25 13:20 ` [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling Patryk Wlazlyn
@ 2024-11-25 13:41   ` Rafael J. Wysocki
  2024-11-25 14:00     ` Rafael J. Wysocki
  2024-11-25 15:14   ` Gautham R. Shenoy
  1 sibling, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 13:41 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:21 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

The changes below look good to me, but the patch needs a changelog
which should explain why it is needed or useful.

In this particular case, you want to change the code ordering in
native_play_dead() so that it calls cpuidle_play_dead() first and only
falls back to anything else if that fails.

In particular, this needs to work when intel_idle is not in use and
the entry method for at least one idle state in the _CST return
package for at least one CPU is ACPI_CSTATE_FFH, so this case needs to
be added to acpi_idle_play_dead().

You may also make a note in the changelog that had there been a
non-Intel x86 platform with a _CST returning idle states where
ACPI_CSTATE_FFH was the entry method, they wouldn't have been
handled correctly today (but this is academic because no such
platforms exist).

Also, this can be the first patch in your series if [1-3/7] are dropped.

> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/acpi/cstate.c      | 9 +++++++++
>  drivers/acpi/processor_idle.c      | 2 ++
>  include/acpi/processor.h           | 5 +++++
>  4 files changed, 17 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index ea33439a5d00..1da5e08de257 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -236,6 +236,7 @@
>  #define X86_FEATURE_PVUNLOCK           ( 8*32+20) /* PV unlock function */
>  #define X86_FEATURE_VCPUPREEMPT                ( 8*32+21) /* PV vcpu_is_preempted function */
>  #define X86_FEATURE_TDX_GUEST          ( 8*32+22) /* "tdx_guest" Intel Trust Domain Extensions Guest */
> +#define X86_FEATURE_NO_MWAIT_OFFLINE    ( 8*32+23) /* Don't use MWAIT states for offlined CPUs */
>
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
>  #define X86_FEATURE_FSGSBASE           ( 9*32+ 0) /* "fsgsbase" RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> index f3ffd0a3a012..c80a3e6dba5f 100644
> --- a/arch/x86/kernel/acpi/cstate.c
> +++ b/arch/x86/kernel/acpi/cstate.c
> @@ -204,6 +204,15 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
>  }
>  EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
>
> +void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
> +{
> +       unsigned int cpu = smp_processor_id();
> +       struct cstate_entry *percpu_entry;
> +
> +       percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
> +       mwait_play_dead_with_hint(percpu_entry->states[cx->index].eax);
> +}
> +
>  void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
>  {
>         unsigned int cpu = smp_processor_id();
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 698897b29de2..586cc7d1d8aa 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -590,6 +590,8 @@ static void acpi_idle_play_dead(struct cpuidle_device *dev, int index)
>                         raw_safe_halt();
>                 else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
>                         io_idle(cx->address);
> +               } else if (cx->entry_method == ACPI_CSTATE_FFH) {
> +                       acpi_processor_ffh_play_dead(cx);
>                 } else
>                         return;
>         }
> diff --git a/include/acpi/processor.h b/include/acpi/processor.h
> index a17e97e634a6..63a37e72b721 100644
> --- a/include/acpi/processor.h
> +++ b/include/acpi/processor.h
> @@ -280,6 +280,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
>                                     struct acpi_processor_cx *cx,
>                                     struct acpi_power_register *reg);
>  void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cstate);
> +void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx);
>  #else
>  static inline void acpi_processor_power_init_bm_check(struct
>                                                       acpi_processor_flags
> @@ -300,6 +301,10 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
>  {
>         return;
>  }
> +static inline void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
> +{
> +       return;
> +}
>  #endif
>
>  static inline int call_on_cpu(int cpu, long (*fn)(void *), void *arg,
> --
> 2.47.0
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 13:20 ` [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF Patryk Wlazlyn
@ 2024-11-25 13:44   ` Rafael J. Wysocki
  2024-11-25 14:48     ` Patryk Wlazlyn
  2024-11-25 13:53   ` Peter Zijlstra
  1 sibling, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 13:44 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:21 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> Intel's Sierra Forest reports two C6 substates in cpuid leaf 5:
>     C6S  (hint 0x22)
>     C6SP (hint 0x23)
>
> Hints 0x20 and 0x21 are skipped entirely, causing the generic
> implementation in mwait_play_dead() to compute the wrong hint when
> looking for the deepest cstate. As a result, a package with an
> offlined CPU can never reach PC6.
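[For readers unfamiliar with the leaf-0x5 scan the commit message refers to: the following is a simplified, illustrative Python model of the hint computation, not the kernel code itself. The EDX value used below is hypothetical; it only mirrors the per-C-state sub-state counts that CPUID.05H:EDX advertises.]

```python
MWAIT_SUBSTATE_SIZE = 4   # bits per C-state count field in CPUID.05H:EDX
MWAIT_SUBSTATE_MASK = 0xF

def deepest_hint_from_leaf5(edx: int) -> int:
    """Model of the scan in mwait_play_dead(): find the highest
    C-state family with a nonzero sub-state count and assume its
    deepest valid hint is (count - 1) within that family."""
    highest_cstate = 0
    highest_subcstate = 0
    edx >>= MWAIT_SUBSTATE_SIZE           # skip the C0 field
    for i in range(7):
        if edx == 0:
            break
        if edx & MWAIT_SUBSTATE_MASK:
            highest_cstate = i
            highest_subcstate = edx & MWAIT_SUBSTATE_MASK
        edx >>= MWAIT_SUBSTATE_SIZE
    return (highest_cstate << MWAIT_SUBSTATE_SIZE) | (highest_subcstate - 1)

# SRF-like case: the C6 family advertises a count of 2, but the two
# implemented hints are 0x22 and 0x23, not 0x20 and 0x21.  The scan
# nevertheless computes 0x21, one of the skipped hints.
edx = 0x2 << (3 * MWAIT_SUBSTATE_SIZE)    # hypothetical EDX contents
print(hex(deepest_hint_from_leaf5(edx)))  # -> 0x21, not the valid 0x23
```

This is why the series prefers an enter_dead() handler that carries the exact hint from the idle-state table over the generic leaf-0x5 guess.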

Is this still the case with the latest firmware?

If so, this could be the second patch in the series if [1-3/7] are dropped.

Otherwise, I don't think it is needed any more.

> Define the enter_dead() handler for SRF.
>
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> ---
>  drivers/idle/intel_idle.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index ac4d8faa3886..c2ca01e74add 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -56,6 +56,7 @@
>  #include <asm/mwait.h>
>  #include <asm/spec-ctrl.h>
>  #include <asm/fpu/api.h>
> +#include <asm/smp.h>
>
>  #define INTEL_IDLE_VERSION "0.5.1"
>
> @@ -227,6 +228,16 @@ static __cpuidle int intel_idle_s2idle(struct cpuidle_device *dev,
>         return 0;
>  }
>
> +static __cpuidle void intel_idle_enter_dead(struct cpuidle_device *dev,
> +                                           int index)
> +{
> +       struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
> +       struct cpuidle_state *state = &drv->states[index];
> +       unsigned long eax = flg2MWAIT(state->flags);
> +
> +       mwait_play_dead_with_hint(eax);
> +}
> +
>  /*
>   * States are indexed by the cstate number,
>   * which is also the index into the MWAIT hint array.
> @@ -1391,6 +1402,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> +               .enter_dead = &intel_idle_enter_dead,
>                 .enter_s2idle = intel_idle_s2idle, },
>         {
>                 .name = "C1E",
> @@ -1399,6 +1411,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 10,
>                 .enter = &intel_idle,
> +               .enter_dead = &intel_idle_enter_dead,
>                 .enter_s2idle = intel_idle_s2idle, },
>         {
>                 .name = "C6S",
> @@ -1408,6 +1421,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 270,
>                 .target_residency = 700,
>                 .enter = &intel_idle,
> +               .enter_dead = &intel_idle_enter_dead,
>                 .enter_s2idle = intel_idle_s2idle, },
>         {
>                 .name = "C6SP",
> @@ -1417,6 +1431,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 310,
>                 .target_residency = 900,
>                 .enter = &intel_idle,
> +               .enter_dead = &intel_idle_enter_dead,
>                 .enter_s2idle = intel_idle_s2idle, },
>         {
>                 .enter = NULL }
> --
> 2.47.0
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-25 13:20 ` [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead() Patryk Wlazlyn
@ 2024-11-25 13:45   ` Peter Zijlstra
  2024-11-25 13:56   ` Rafael J. Wysocki
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2024-11-25 13:45 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 02:20:25PM +0100, Patryk Wlazlyn wrote:
> The algorithm based on cpuid leaf 0x5 in the mwait_play_dead(),
> for looking up the mwait hint for the deepest cstate may calculate the
> wrong hint on recent Intel x86 platforms.
> 
> Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
> only fall back to the latter when an appropriate enter_dead()
> handler is missing.

So 7/8 should go first, such that picking cpuidle doesn't end up in some
weird state.

> 
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> ---
>  arch/x86/kernel/smpboot.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d0464c7a0af5..2627b56fb9bc 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1413,9 +1413,10 @@ void native_play_dead(void)
>  	play_dead_common();
>  	tboot_shutdown(TB_SHUTDOWN_WFS);
>  
> +	/* The first successful play_dead() will not return */
> +	cpuidle_play_dead();
>  	mwait_play_dead();
> -	if (cpuidle_play_dead())
> -		hlt_play_dead();
> +	hlt_play_dead();
>  }
>  
>  #else /* ... !CONFIG_HOTPLUG_CPU */
> -- 
> 2.47.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms
  2024-11-25 13:20 ` [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms Patryk Wlazlyn
@ 2024-11-25 13:46   ` Rafael J. Wysocki
  2024-11-25 13:54   ` Peter Zijlstra
  2024-11-25 15:15   ` Gautham R. Shenoy
  2 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 13:46 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:21 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
> ---
>  drivers/acpi/processor_idle.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 586cc7d1d8aa..4b4ac8d55b55 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -803,7 +803,11 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
>
>                 state->flags = 0;
>
> -               state->enter_dead = acpi_idle_play_dead;
> +               /* AMD doesn't want to use mwait for play dead. */
> +               bool amd_or_hygon = boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> +                                   boot_cpu_data.x86_vendor == X86_VENDOR_HYGON;
> +               if (!(cx->entry_method == ACPI_CSTATE_FFH && amd_or_hygon))
> +                       state->enter_dead = acpi_idle_play_dead;
>
>                 if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
>                         drv->safe_state_index = count;
> --

I don't think this is needed.

There is a vendor check in mwait_play_dead() already, no need to
duplicate it in the otherwise arch-agnostic ACPI code.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 13:20 ` [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF Patryk Wlazlyn
  2024-11-25 13:44   ` Rafael J. Wysocki
@ 2024-11-25 13:53   ` Peter Zijlstra
  2024-11-25 13:58     ` Rafael J. Wysocki
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2024-11-25 13:53 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 02:20:26PM +0100, Patryk Wlazlyn wrote:
> Intel's Sierra Forest reports two C6 substates in cpuid leaf 5:
>     C6S  (hint 0x22)
>     C6SP (hint 0x23)
> 
> Hints 0x20 and 0x21 are skipped entirely, causing the generic
> implementation in mwait_play_dead() to compute the wrong hint when
> looking for the deepest cstate. As a result, a package with an
> offlined CPU can never reach PC6.
> 
> Define the enter_dead() handler for SRF.

How about you do something like so instead?

---
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index ac4d8faa3886..c49ca89ee17c 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -240,7 +240,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
 		.exit_latency = 3,
 		.target_residency = 6,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -248,7 +248,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -256,7 +256,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
 		.exit_latency = 20,
 		.target_residency = 80,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -264,7 +264,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
 		.exit_latency = 200,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -277,7 +277,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -285,7 +285,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -293,7 +293,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
 		.exit_latency = 80,
 		.target_residency = 211,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -301,7 +301,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
 		.exit_latency = 104,
 		.target_residency = 345,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7",
 		.desc = "MWAIT 0x30",
@@ -309,7 +309,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
 		.exit_latency = 109,
 		.target_residency = 345,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -322,7 +322,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6N",
 		.desc = "MWAIT 0x58",
@@ -330,7 +330,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
 		.exit_latency = 300,
 		.target_residency = 275,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6S",
 		.desc = "MWAIT 0x52",
@@ -338,7 +338,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
 		.exit_latency = 500,
 		.target_residency = 560,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7",
 		.desc = "MWAIT 0x60",
@@ -346,7 +346,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
 		.exit_latency = 1200,
 		.target_residency = 4000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7S",
 		.desc = "MWAIT 0x64",
@@ -354,7 +354,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
 		.exit_latency = 10000,
 		.target_residency = 20000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -367,7 +367,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6N",
 		.desc = "MWAIT 0x58",
@@ -375,7 +375,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
 		.exit_latency = 80,
 		.target_residency = 275,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6S",
 		.desc = "MWAIT 0x52",
@@ -383,7 +383,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
 		.exit_latency = 200,
 		.target_residency = 560,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7",
 		.desc = "MWAIT 0x60",
@@ -391,7 +391,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
 		.exit_latency = 1200,
 		.target_residency = 4000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7S",
 		.desc = "MWAIT 0x64",
@@ -399,7 +399,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
 		.exit_latency = 10000,
 		.target_residency = 20000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -412,7 +412,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -420,7 +420,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -428,7 +428,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
 		.exit_latency = 59,
 		.target_residency = 156,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -436,7 +436,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
 		.exit_latency = 80,
 		.target_residency = 300,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7",
 		.desc = "MWAIT 0x30",
@@ -444,7 +444,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
 		.exit_latency = 87,
 		.target_residency = 300,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -457,7 +457,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -465,7 +465,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 80,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -473,7 +473,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
 		.exit_latency = 59,
 		.target_residency = 156,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -481,7 +481,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
 		.exit_latency = 82,
 		.target_residency = 300,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -494,7 +494,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -502,7 +502,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 250,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -510,7 +510,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
 		.exit_latency = 59,
 		.target_residency = 300,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -518,7 +518,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
 		.exit_latency = 84,
 		.target_residency = 400,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -531,7 +531,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -539,7 +539,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 500,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -547,7 +547,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
 		.exit_latency = 59,
 		.target_residency = 600,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -555,7 +555,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
 		.exit_latency = 88,
 		.target_residency = 700,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -568,7 +568,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -576,7 +576,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -584,7 +584,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 33,
 		.target_residency = 100,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -592,7 +592,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 133,
 		.target_residency = 400,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7s",
 		.desc = "MWAIT 0x32",
@@ -600,7 +600,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 166,
 		.target_residency = 500,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
@@ -608,7 +608,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 300,
 		.target_residency = 900,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C9",
 		.desc = "MWAIT 0x50",
@@ -616,7 +616,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 600,
 		.target_residency = 1800,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -624,7 +624,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
 		.exit_latency = 2600,
 		.target_residency = 7700,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -636,7 +636,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -644,7 +644,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -652,7 +652,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 40,
 		.target_residency = 100,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -660,7 +660,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 133,
 		.target_residency = 400,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7s",
 		.desc = "MWAIT 0x32",
@@ -668,7 +668,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 166,
 		.target_residency = 500,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
@@ -676,7 +676,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 300,
 		.target_residency = 900,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C9",
 		.desc = "MWAIT 0x50",
@@ -684,7 +684,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 600,
 		.target_residency = 1800,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -692,7 +692,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
 		.exit_latency = 2600,
 		.target_residency = 7700,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -705,7 +705,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -713,7 +713,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C3",
 		.desc = "MWAIT 0x10",
@@ -721,7 +721,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 70,
 		.target_residency = 100,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -729,7 +729,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 85,
 		.target_residency = 200,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7s",
 		.desc = "MWAIT 0x33",
@@ -737,7 +737,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 124,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
@@ -745,7 +745,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 200,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C9",
 		.desc = "MWAIT 0x50",
@@ -753,7 +753,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 480,
 		.target_residency = 5000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -761,7 +761,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
 		.exit_latency = 890,
 		.target_residency = 5000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -774,7 +774,7 @@ static struct cpuidle_state skx_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -782,7 +782,7 @@ static struct cpuidle_state skx_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -790,7 +790,7 @@ static struct cpuidle_state skx_cstates[] __initdata = {
 		.exit_latency = 133,
 		.target_residency = 600,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -803,7 +803,7 @@ static struct cpuidle_state icx_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -811,7 +811,7 @@ static struct cpuidle_state icx_cstates[] __initdata = {
 		.exit_latency = 4,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -819,7 +819,7 @@ static struct cpuidle_state icx_cstates[] __initdata = {
 		.exit_latency = 170,
 		.target_residency = 600,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -842,7 +842,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -850,7 +850,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -858,7 +858,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
 		.exit_latency = 220,
 		.target_residency = 600,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
@@ -866,7 +866,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
 		.exit_latency = 280,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -874,7 +874,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
 		.exit_latency = 680,
 		.target_residency = 2000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -887,7 +887,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -895,7 +895,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -903,7 +903,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
 		.exit_latency = 170,
 		.target_residency = 500,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
@@ -911,7 +911,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
 		.exit_latency = 200,
 		.target_residency = 600,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -919,7 +919,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
 		.exit_latency = 230,
 		.target_residency = 700,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -932,7 +932,7 @@ static struct cpuidle_state mtl_l_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -940,7 +940,7 @@ static struct cpuidle_state mtl_l_cstates[] __initdata = {
 		.exit_latency = 140,
 		.target_residency = 420,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -948,7 +948,7 @@ static struct cpuidle_state mtl_l_cstates[] __initdata = {
 		.exit_latency = 310,
 		.target_residency = 930,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -961,7 +961,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -969,7 +969,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -977,7 +977,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
 		.exit_latency = 195,
 		.target_residency = 585,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
@@ -985,7 +985,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
 		.exit_latency = 260,
 		.target_residency = 1040,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -993,7 +993,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
 		.exit_latency = 660,
 		.target_residency = 1980,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1006,7 +1006,7 @@ static struct cpuidle_state spr_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1014,7 +1014,7 @@ static struct cpuidle_state spr_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -1023,7 +1023,7 @@ static struct cpuidle_state spr_cstates[] __initdata = {
 		.exit_latency = 290,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1036,7 +1036,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1044,7 +1044,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
 		.exit_latency = 4,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -1054,7 +1054,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
 		.exit_latency = 170,
 		.target_residency = 650,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6P",
 		.desc = "MWAIT 0x21",
@@ -1064,7 +1064,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
 		.exit_latency = 210,
 		.target_residency = 1000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1077,7 +1077,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1085,7 +1085,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
 		.exit_latency = 4,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -1095,7 +1095,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
 		.exit_latency = 220,
 		.target_residency = 650,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6P",
 		.desc = "MWAIT 0x21",
@@ -1105,7 +1105,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
 		.exit_latency = 240,
 		.target_residency = 750,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1118,7 +1118,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C2",
 		.desc = "MWAIT 0x10",
@@ -1126,7 +1126,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
 		.exit_latency = 20,
 		.target_residency = 80,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C4",
 		.desc = "MWAIT 0x30",
@@ -1134,7 +1134,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
 		.exit_latency = 100,
 		.target_residency = 400,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x52",
@@ -1142,7 +1142,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
 		.exit_latency = 140,
 		.target_residency = 560,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1154,7 +1154,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 4,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C4",
 		.desc = "MWAIT 0x30",
@@ -1162,7 +1162,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
 		.exit_latency = 100,
 		.target_residency = 400,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x52",
@@ -1170,7 +1170,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
 		.exit_latency = 140,
 		.target_residency = 560,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7",
 		.desc = "MWAIT 0x60",
@@ -1178,7 +1178,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
 		.exit_latency = 1200,
 		.target_residency = 4000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C9",
 		.desc = "MWAIT 0x64",
@@ -1186,7 +1186,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
 		.exit_latency = 10000,
 		.target_residency = 20000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1198,7 +1198,7 @@ static struct cpuidle_state avn_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x51",
@@ -1206,7 +1206,7 @@ static struct cpuidle_state avn_cstates[] __initdata = {
 		.exit_latency = 15,
 		.target_residency = 45,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1239,7 +1239,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1247,7 +1247,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -1255,7 +1255,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
 		.exit_latency = 133,
 		.target_residency = 133,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C7s",
 		.desc = "MWAIT 0x31",
@@ -1263,7 +1263,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
 		.exit_latency = 155,
 		.target_residency = 155,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
@@ -1271,7 +1271,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
 		.exit_latency = 1000,
 		.target_residency = 1000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C9",
 		.desc = "MWAIT 0x50",
@@ -1279,7 +1279,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
 		.exit_latency = 2000,
 		.target_residency = 2000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
@@ -1287,7 +1287,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
 		.exit_latency = 10000,
 		.target_residency = 10000,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1300,7 +1300,7 @@ static struct cpuidle_state dnv_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1308,7 +1308,7 @@ static struct cpuidle_state dnv_cstates[] __initdata = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -1316,7 +1316,7 @@ static struct cpuidle_state dnv_cstates[] __initdata = {
 		.exit_latency = 50,
 		.target_residency = 500,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1333,7 +1333,7 @@ static struct cpuidle_state snr_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1341,7 +1341,7 @@ static struct cpuidle_state snr_cstates[] __initdata = {
 		.exit_latency = 15,
 		.target_residency = 25,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
@@ -1349,7 +1349,7 @@ static struct cpuidle_state snr_cstates[] __initdata = {
 		.exit_latency = 130,
 		.target_residency = 500,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1362,7 +1362,7 @@ static struct cpuidle_state grr_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1370,7 +1370,7 @@ static struct cpuidle_state grr_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 10,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6S",
 		.desc = "MWAIT 0x22",
@@ -1378,7 +1378,7 @@ static struct cpuidle_state grr_cstates[] __initdata = {
 		.exit_latency = 140,
 		.target_residency = 500,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -1391,7 +1391,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 1,
 		.target_residency = 1,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C1E",
 		.desc = "MWAIT 0x01",
@@ -1399,7 +1399,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 2,
 		.target_residency = 10,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6S",
 		.desc = "MWAIT 0x22",
@@ -1408,7 +1408,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 270,
 		.target_residency = 700,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.name = "C6SP",
 		.desc = "MWAIT 0x23",
@@ -1417,7 +1417,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
 		.exit_latency = 310,
 		.target_residency = 900,
 		.enter = &intel_idle,
-		.enter_s2idle = intel_idle_s2idle, },
+	},
 	{
 		.enter = NULL }
 };
@@ -2143,10 +2143,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 		if (intel_idle_max_cstate_reached(cstate))
 			break;
 
-		if (!cpuidle_state_table[cstate].enter &&
-		    !cpuidle_state_table[cstate].enter_s2idle)
+		if (!cpuidle_state_table[cstate].enter)
 			break;
 
+		if (!cpuidle_state_table[cstate].enter_dead)
+			cpuidle_state_table[cstate].enter_dead = intel_idle_enter_dead;
+
+		if (!cpuidle_state_table[cstate].enter_s2idle)
+			cpuidle_state_table[cstate].enter_s2idle = intel_idle_s2idle;
+
 		/* If marked as unusable, skip this state. */
 		if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_UNUSABLE) {
 			pr_debug("state %s is disabled\n",

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms
  2024-11-25 13:20 ` [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms Patryk Wlazlyn
  2024-11-25 13:46   ` Rafael J. Wysocki
@ 2024-11-25 13:54   ` Peter Zijlstra
  2024-11-25 14:56     ` Patryk Wlazlyn
  2024-11-25 15:15   ` Gautham R. Shenoy
  2 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2024-11-25 13:54 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 02:20:28PM +0100, Patryk Wlazlyn wrote:
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
> ---
>  drivers/acpi/processor_idle.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 586cc7d1d8aa..4b4ac8d55b55 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -803,7 +803,11 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
>  
>  		state->flags = 0;
>  
> -		state->enter_dead = acpi_idle_play_dead;
> +		/* AMD doesn't want to use mwait for play dead. */
> +		bool amd_or_hygon = boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> +				    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON;
> +		if (!(cx->entry_method == ACPI_CSTATE_FFH && amd_or_hygon))
> +			state->enter_dead = acpi_idle_play_dead;

So I don't like this. Fewer exceptions is better.

This *SHOULD* never trigger on AMD anyway, because they recommend IO
port C[23]. But if their partner BIOS engineer does a wobbly and they
end up in MWAIT anyway, it *should* all work regardless.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-25 13:20 ` [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead() Patryk Wlazlyn
  2024-11-25 13:45   ` Peter Zijlstra
@ 2024-11-25 13:56   ` Rafael J. Wysocki
  2024-11-25 14:43     ` Patryk Wlazlyn
  1 sibling, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 13:56 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:21 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> The algorithm in mwait_play_dead(), which uses cpuid leaf 0x5 to look
> up the mwait hint for the deepest cstate, may calculate the wrong hint
> on recent Intel x86 platforms.
>
> Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
> only fall back to the latter when an appropriate enter_dead() handler
> is missing.
>
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> ---
>  arch/x86/kernel/smpboot.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d0464c7a0af5..2627b56fb9bc 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1413,9 +1413,10 @@ void native_play_dead(void)
>         play_dead_common();
>         tboot_shutdown(TB_SHUTDOWN_WFS);
>
> +       /* The first successful play_dead() will not return */
> +       cpuidle_play_dead();
>         mwait_play_dead();
> -       if (cpuidle_play_dead())
> -               hlt_play_dead();
> +       hlt_play_dead();
>  }
>
>  #else /* ... !CONFIG_HOTPLUG_CPU */
> --

If you first make intel_idle provide :enter_dead() for all CPUs on all
platforms and implement it by calling mwait_play_dead_with_hint(), you
won't need mwait_play_dead() any more.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 13:53   ` Peter Zijlstra
@ 2024-11-25 13:58     ` Rafael J. Wysocki
  2024-11-25 14:12       ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patryk Wlazlyn, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:53 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Nov 25, 2024 at 02:20:26PM +0100, Patryk Wlazlyn wrote:
> > Intel's Sierra Forest reports two C6 substates in cpuid leaf 5:
> >     C6S  (hint 0x22)
> >     C6SP (hint 0x23)
> >
> > Hints 0x20 and 0x21 are skipped entirely, causing the generic
> > implementation in mwait_play_dead() to compute the wrong hint when
> > looking for the deepest cstate. As a result, a package with an offlined
> > CPU can never reach PC6.
> >
> > Define the enter_dead() handler for SRF.
>
> How about you do something like so instead?
>
> ---
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index ac4d8faa3886..c49ca89ee17c 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -240,7 +240,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
>                 .exit_latency = 3,
>                 .target_residency = 6,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -248,7 +248,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -256,7 +256,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
>                 .exit_latency = 20,
>                 .target_residency = 80,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -264,7 +264,7 @@ static struct cpuidle_state nehalem_cstates[] __initdata = {
>                 .exit_latency = 200,
>                 .target_residency = 800,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -277,7 +277,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -285,7 +285,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -293,7 +293,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
>                 .exit_latency = 80,
>                 .target_residency = 211,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -301,7 +301,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
>                 .exit_latency = 104,
>                 .target_residency = 345,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7",
>                 .desc = "MWAIT 0x30",
> @@ -309,7 +309,7 @@ static struct cpuidle_state snb_cstates[] __initdata = {
>                 .exit_latency = 109,
>                 .target_residency = 345,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -322,7 +322,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6N",
>                 .desc = "MWAIT 0x58",
> @@ -330,7 +330,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
>                 .exit_latency = 300,
>                 .target_residency = 275,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6S",
>                 .desc = "MWAIT 0x52",
> @@ -338,7 +338,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
>                 .exit_latency = 500,
>                 .target_residency = 560,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7",
>                 .desc = "MWAIT 0x60",
> @@ -346,7 +346,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
>                 .exit_latency = 1200,
>                 .target_residency = 4000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7S",
>                 .desc = "MWAIT 0x64",
> @@ -354,7 +354,7 @@ static struct cpuidle_state byt_cstates[] __initdata = {
>                 .exit_latency = 10000,
>                 .target_residency = 20000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -367,7 +367,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6N",
>                 .desc = "MWAIT 0x58",
> @@ -375,7 +375,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
>                 .exit_latency = 80,
>                 .target_residency = 275,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6S",
>                 .desc = "MWAIT 0x52",
> @@ -383,7 +383,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
>                 .exit_latency = 200,
>                 .target_residency = 560,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7",
>                 .desc = "MWAIT 0x60",
> @@ -391,7 +391,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
>                 .exit_latency = 1200,
>                 .target_residency = 4000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7S",
>                 .desc = "MWAIT 0x64",
> @@ -399,7 +399,7 @@ static struct cpuidle_state cht_cstates[] __initdata = {
>                 .exit_latency = 10000,
>                 .target_residency = 20000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -412,7 +412,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -420,7 +420,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -428,7 +428,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
>                 .exit_latency = 59,
>                 .target_residency = 156,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -436,7 +436,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
>                 .exit_latency = 80,
>                 .target_residency = 300,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7",
>                 .desc = "MWAIT 0x30",
> @@ -444,7 +444,7 @@ static struct cpuidle_state ivb_cstates[] __initdata = {
>                 .exit_latency = 87,
>                 .target_residency = 300,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -457,7 +457,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -465,7 +465,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 80,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -473,7 +473,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
>                 .exit_latency = 59,
>                 .target_residency = 156,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -481,7 +481,7 @@ static struct cpuidle_state ivt_cstates[] __initdata = {
>                 .exit_latency = 82,
>                 .target_residency = 300,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -494,7 +494,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -502,7 +502,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 250,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -510,7 +510,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
>                 .exit_latency = 59,
>                 .target_residency = 300,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -518,7 +518,7 @@ static struct cpuidle_state ivt_cstates_4s[] __initdata = {
>                 .exit_latency = 84,
>                 .target_residency = 400,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -531,7 +531,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -539,7 +539,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 500,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -547,7 +547,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
>                 .exit_latency = 59,
>                 .target_residency = 600,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -555,7 +555,7 @@ static struct cpuidle_state ivt_cstates_8s[] __initdata = {
>                 .exit_latency = 88,
>                 .target_residency = 700,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -568,7 +568,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -576,7 +576,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -584,7 +584,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 33,
>                 .target_residency = 100,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -592,7 +592,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 133,
>                 .target_residency = 400,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7s",
>                 .desc = "MWAIT 0x32",
> @@ -600,7 +600,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 166,
>                 .target_residency = 500,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C8",
>                 .desc = "MWAIT 0x40",
> @@ -608,7 +608,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 300,
>                 .target_residency = 900,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C9",
>                 .desc = "MWAIT 0x50",
> @@ -616,7 +616,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 600,
>                 .target_residency = 1800,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -624,7 +624,7 @@ static struct cpuidle_state hsw_cstates[] __initdata = {
>                 .exit_latency = 2600,
>                 .target_residency = 7700,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -636,7 +636,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -644,7 +644,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -652,7 +652,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 40,
>                 .target_residency = 100,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -660,7 +660,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 133,
>                 .target_residency = 400,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7s",
>                 .desc = "MWAIT 0x32",
> @@ -668,7 +668,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 166,
>                 .target_residency = 500,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C8",
>                 .desc = "MWAIT 0x40",
> @@ -676,7 +676,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 300,
>                 .target_residency = 900,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C9",
>                 .desc = "MWAIT 0x50",
> @@ -684,7 +684,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 600,
>                 .target_residency = 1800,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -692,7 +692,7 @@ static struct cpuidle_state bdw_cstates[] __initdata = {
>                 .exit_latency = 2600,
>                 .target_residency = 7700,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -705,7 +705,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -713,7 +713,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C3",
>                 .desc = "MWAIT 0x10",
> @@ -721,7 +721,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 70,
>                 .target_residency = 100,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -729,7 +729,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 85,
>                 .target_residency = 200,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7s",
>                 .desc = "MWAIT 0x33",
> @@ -737,7 +737,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 124,
>                 .target_residency = 800,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C8",
>                 .desc = "MWAIT 0x40",
> @@ -745,7 +745,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 200,
>                 .target_residency = 800,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C9",
>                 .desc = "MWAIT 0x50",
> @@ -753,7 +753,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 480,
>                 .target_residency = 5000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -761,7 +761,7 @@ static struct cpuidle_state skl_cstates[] __initdata = {
>                 .exit_latency = 890,
>                 .target_residency = 5000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -774,7 +774,7 @@ static struct cpuidle_state skx_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -782,7 +782,7 @@ static struct cpuidle_state skx_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -790,7 +790,7 @@ static struct cpuidle_state skx_cstates[] __initdata = {
>                 .exit_latency = 133,
>                 .target_residency = 600,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -803,7 +803,7 @@ static struct cpuidle_state icx_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -811,7 +811,7 @@ static struct cpuidle_state icx_cstates[] __initdata = {
>                 .exit_latency = 4,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -819,7 +819,7 @@ static struct cpuidle_state icx_cstates[] __initdata = {
>                 .exit_latency = 170,
>                 .target_residency = 600,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -842,7 +842,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -850,7 +850,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -858,7 +858,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
>                 .exit_latency = 220,
>                 .target_residency = 600,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C8",
>                 .desc = "MWAIT 0x40",
> @@ -866,7 +866,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
>                 .exit_latency = 280,
>                 .target_residency = 800,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -874,7 +874,7 @@ static struct cpuidle_state adl_cstates[] __initdata = {
>                 .exit_latency = 680,
>                 .target_residency = 2000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -887,7 +887,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -895,7 +895,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -903,7 +903,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
>                 .exit_latency = 170,
>                 .target_residency = 500,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C8",
>                 .desc = "MWAIT 0x40",
> @@ -911,7 +911,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
>                 .exit_latency = 200,
>                 .target_residency = 600,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -919,7 +919,7 @@ static struct cpuidle_state adl_l_cstates[] __initdata = {
>                 .exit_latency = 230,
>                 .target_residency = 700,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -932,7 +932,7 @@ static struct cpuidle_state mtl_l_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -940,7 +940,7 @@ static struct cpuidle_state mtl_l_cstates[] __initdata = {
>                 .exit_latency = 140,
>                 .target_residency = 420,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -948,7 +948,7 @@ static struct cpuidle_state mtl_l_cstates[] __initdata = {
>                 .exit_latency = 310,
>                 .target_residency = 930,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -961,7 +961,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -969,7 +969,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -977,7 +977,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
>                 .exit_latency = 195,
>                 .target_residency = 585,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C8",
>                 .desc = "MWAIT 0x40",
> @@ -985,7 +985,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
>                 .exit_latency = 260,
>                 .target_residency = 1040,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -993,7 +993,7 @@ static struct cpuidle_state gmt_cstates[] __initdata = {
>                 .exit_latency = 660,
>                 .target_residency = 1980,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1006,7 +1006,7 @@ static struct cpuidle_state spr_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1014,7 +1014,7 @@ static struct cpuidle_state spr_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -1023,7 +1023,7 @@ static struct cpuidle_state spr_cstates[] __initdata = {
>                 .exit_latency = 290,
>                 .target_residency = 800,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1036,7 +1036,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1044,7 +1044,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
>                 .exit_latency = 4,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -1054,7 +1054,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
>                 .exit_latency = 170,
>                 .target_residency = 650,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6P",
>                 .desc = "MWAIT 0x21",
> @@ -1064,7 +1064,7 @@ static struct cpuidle_state gnr_cstates[] __initdata = {
>                 .exit_latency = 210,
>                 .target_residency = 1000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1077,7 +1077,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1085,7 +1085,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
>                 .exit_latency = 4,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -1095,7 +1095,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
>                 .exit_latency = 220,
>                 .target_residency = 650,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6P",
>                 .desc = "MWAIT 0x21",
> @@ -1105,7 +1105,7 @@ static struct cpuidle_state gnrd_cstates[] __initdata = {
>                 .exit_latency = 240,
>                 .target_residency = 750,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1118,7 +1118,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C2",
>                 .desc = "MWAIT 0x10",
> @@ -1126,7 +1126,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
>                 .exit_latency = 20,
>                 .target_residency = 80,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C4",
>                 .desc = "MWAIT 0x30",
> @@ -1134,7 +1134,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
>                 .exit_latency = 100,
>                 .target_residency = 400,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x52",
> @@ -1142,7 +1142,7 @@ static struct cpuidle_state atom_cstates[] __initdata = {
>                 .exit_latency = 140,
>                 .target_residency = 560,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1154,7 +1154,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 4,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C4",
>                 .desc = "MWAIT 0x30",
> @@ -1162,7 +1162,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
>                 .exit_latency = 100,
>                 .target_residency = 400,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x52",
> @@ -1170,7 +1170,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
>                 .exit_latency = 140,
>                 .target_residency = 560,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7",
>                 .desc = "MWAIT 0x60",
> @@ -1178,7 +1178,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
>                 .exit_latency = 1200,
>                 .target_residency = 4000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C9",
>                 .desc = "MWAIT 0x64",
> @@ -1186,7 +1186,7 @@ static struct cpuidle_state tangier_cstates[] __initdata = {
>                 .exit_latency = 10000,
>                 .target_residency = 20000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1198,7 +1198,7 @@ static struct cpuidle_state avn_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x51",
> @@ -1206,7 +1206,7 @@ static struct cpuidle_state avn_cstates[] __initdata = {
>                 .exit_latency = 15,
>                 .target_residency = 45,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1239,7 +1239,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1247,7 +1247,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -1255,7 +1255,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
>                 .exit_latency = 133,
>                 .target_residency = 133,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C7s",
>                 .desc = "MWAIT 0x31",
> @@ -1263,7 +1263,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
>                 .exit_latency = 155,
>                 .target_residency = 155,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C8",
>                 .desc = "MWAIT 0x40",
> @@ -1271,7 +1271,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
>                 .exit_latency = 1000,
>                 .target_residency = 1000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C9",
>                 .desc = "MWAIT 0x50",
> @@ -1279,7 +1279,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
>                 .exit_latency = 2000,
>                 .target_residency = 2000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C10",
>                 .desc = "MWAIT 0x60",
> @@ -1287,7 +1287,7 @@ static struct cpuidle_state bxt_cstates[] __initdata = {
>                 .exit_latency = 10000,
>                 .target_residency = 10000,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1300,7 +1300,7 @@ static struct cpuidle_state dnv_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1308,7 +1308,7 @@ static struct cpuidle_state dnv_cstates[] __initdata = {
>                 .exit_latency = 10,
>                 .target_residency = 20,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -1316,7 +1316,7 @@ static struct cpuidle_state dnv_cstates[] __initdata = {
>                 .exit_latency = 50,
>                 .target_residency = 500,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1333,7 +1333,7 @@ static struct cpuidle_state snr_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 2,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1341,7 +1341,7 @@ static struct cpuidle_state snr_cstates[] __initdata = {
>                 .exit_latency = 15,
>                 .target_residency = 25,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6",
>                 .desc = "MWAIT 0x20",
> @@ -1349,7 +1349,7 @@ static struct cpuidle_state snr_cstates[] __initdata = {
>                 .exit_latency = 130,
>                 .target_residency = 500,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1362,7 +1362,7 @@ static struct cpuidle_state grr_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1370,7 +1370,7 @@ static struct cpuidle_state grr_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 10,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6S",
>                 .desc = "MWAIT 0x22",
> @@ -1378,7 +1378,7 @@ static struct cpuidle_state grr_cstates[] __initdata = {
>                 .exit_latency = 140,
>                 .target_residency = 500,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -1391,7 +1391,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 1,
>                 .target_residency = 1,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C1E",
>                 .desc = "MWAIT 0x01",
> @@ -1399,7 +1399,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 2,
>                 .target_residency = 10,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6S",
>                 .desc = "MWAIT 0x22",
> @@ -1408,7 +1408,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 270,
>                 .target_residency = 700,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .name = "C6SP",
>                 .desc = "MWAIT 0x23",
> @@ -1417,7 +1417,7 @@ static struct cpuidle_state srf_cstates[] __initdata = {
>                 .exit_latency = 310,
>                 .target_residency = 900,
>                 .enter = &intel_idle,
> -               .enter_s2idle = intel_idle_s2idle, },
> +       },
>         {
>                 .enter = NULL }
>  };
> @@ -2143,10 +2143,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
>                 if (intel_idle_max_cstate_reached(cstate))
>                         break;
>
> -               if (!cpuidle_state_table[cstate].enter &&
> -                   !cpuidle_state_table[cstate].enter_s2idle)
> +               if (!cpuidle_state_table[cstate].enter)
>                         break;
>
> +               if (!cpuidle_state_table[cstate].enter_dead)
> +                       cpuidle_state_table[cstate].enter_dead = intel_idle_enter_dead;
> +
> +               if (!cpuidle_state_table[cstate].enter_s2idle)
> +                       cpuidle_state_table[cstate].enter_s2idle = intel_idle_s2idle;
> +
>                 /* If marked as unusable, skip this state. */
>                 if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_UNUSABLE) {
>                         pr_debug("state %s is disabled\n",

You can do the same thing with :enter() I think?  And it doesn't need
to be in one go.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling
  2024-11-25 13:41   ` Rafael J. Wysocki
@ 2024-11-25 14:00     ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 14:00 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:41 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Nov 25, 2024 at 2:21 PM Patryk Wlazlyn
> <patryk.wlazlyn@linux.intel.com> wrote:
> >
> > Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
>
> The changes below look good to me, but the patch needs a changelog
> which should explain why it is needed or useful.
>
> In this particular case, you want to change the code ordering in
> native_play_dead() so that it calls cpuidle_play_dead() first and only
> falls back to anything else if that fails.
>
> In particular, this needs to work when intel_idle is not in use and
> the entry method for at least one idle state in the _CST return
> package for at least one CPU is ACPI_CSTATE_FFH, so this case needs to
> be added to acpi_idle_play_dead().
>
> You may also make a note in the changelog that had there been a
> non-Intel x86 platform with a _CST returning idle states using
> ACPI_CSTATE_FFH as the entry method, it wouldn't have been handled
> correctly today (but this is academic because of the lack of such
> platforms).
>
> Also, this can be the first patch in your series if [1-3/7] are dropped.

Ah sorry, not quite.

You need to introduce mwait_play_dead_with_hint() first of course, so
it could be the second patch in your series and the intel_idle one
would be the third.

> > ---
> >  arch/x86/include/asm/cpufeatures.h | 1 +
> >  arch/x86/kernel/acpi/cstate.c      | 9 +++++++++
> >  drivers/acpi/processor_idle.c      | 2 ++
> >  include/acpi/processor.h           | 5 +++++
> >  4 files changed, 17 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index ea33439a5d00..1da5e08de257 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -236,6 +236,7 @@
> >  #define X86_FEATURE_PVUNLOCK           ( 8*32+20) /* PV unlock function */
> >  #define X86_FEATURE_VCPUPREEMPT                ( 8*32+21) /* PV vcpu_is_preempted function */
> >  #define X86_FEATURE_TDX_GUEST          ( 8*32+22) /* "tdx_guest" Intel Trust Domain Extensions Guest */
> > +#define X86_FEATURE_NO_MWAIT_OFFLINE    ( 8*32+23) /* Don't use MWAIT states for offlined CPUs */
> >
> >  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
> >  #define X86_FEATURE_FSGSBASE           ( 9*32+ 0) /* "fsgsbase" RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
> > diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> > index f3ffd0a3a012..c80a3e6dba5f 100644
> > --- a/arch/x86/kernel/acpi/cstate.c
> > +++ b/arch/x86/kernel/acpi/cstate.c
> > @@ -204,6 +204,15 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
> >  }
> >  EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
> >
> > +void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
> > +{
> > +       unsigned int cpu = smp_processor_id();
> > +       struct cstate_entry *percpu_entry;
> > +
> > +       percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
> > +       mwait_play_dead_with_hint(percpu_entry->states[cx->index].eax);
> > +}
> > +
> >  void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
> >  {
> >         unsigned int cpu = smp_processor_id();
> > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> > index 698897b29de2..586cc7d1d8aa 100644
> > --- a/drivers/acpi/processor_idle.c
> > +++ b/drivers/acpi/processor_idle.c
> > @@ -590,6 +590,8 @@ static void acpi_idle_play_dead(struct cpuidle_device *dev, int index)
> >                         raw_safe_halt();
> >                 else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
> >                         io_idle(cx->address);
> > +               } else if (cx->entry_method == ACPI_CSTATE_FFH) {
> > +                       acpi_processor_ffh_play_dead(cx);
> >                 } else
> >                         return;
> >         }
> > diff --git a/include/acpi/processor.h b/include/acpi/processor.h
> > index a17e97e634a6..63a37e72b721 100644
> > --- a/include/acpi/processor.h
> > +++ b/include/acpi/processor.h
> > @@ -280,6 +280,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
> >                                     struct acpi_processor_cx *cx,
> >                                     struct acpi_power_register *reg);
> >  void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cstate);
> > +void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx);
> >  #else
> >  static inline void acpi_processor_power_init_bm_check(struct
> >                                                       acpi_processor_flags
> > @@ -300,6 +301,10 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
> >  {
> >         return;
> >  }
> > +static inline void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
> > +{
> > +       return;
> > +}
> >  #endif
> >
> >  static inline int call_on_cpu(int cpu, long (*fn)(void *), void *arg,
> > --
> > 2.47.0
> >
> >

* Re: [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 13:58     ` Rafael J. Wysocki
@ 2024-11-25 14:12       ` Peter Zijlstra
  2024-11-25 14:19         ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2024-11-25 14:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Patryk Wlazlyn, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 02:58:50PM +0100, Rafael J. Wysocki wrote:

> > @@ -2143,10 +2143,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
> >                 if (intel_idle_max_cstate_reached(cstate))
> >                         break;
> >
> > -               if (!cpuidle_state_table[cstate].enter &&
> > -                   !cpuidle_state_table[cstate].enter_s2idle)
> > +               if (!cpuidle_state_table[cstate].enter)
> >                         break;
> >
> > +               if (!cpuidle_state_table[cstate].enter_dead)
> > +                       cpuidle_state_table[cstate].enter_dead = intel_idle_enter_dead;
> > +
> > +               if (!cpuidle_state_table[cstate].enter_s2idle)
> > +                       cpuidle_state_table[cstate].enter_s2idle = intel_idle_s2idle;
> > +
> >                 /* If marked as unusable, skip this state. */
> >                 if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_UNUSABLE) {
> >                         pr_debug("state %s is disabled\n",
> 
> You can do the same thing with :enter() I think?  And it doesn't need
> to be in one go.

!enter is used as a terminator.

And it's harder to not do everything in one go; regex ftw :-)

* Re: [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 14:12       ` Peter Zijlstra
@ 2024-11-25 14:19         ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 14:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Patryk Wlazlyn, x86, linux-kernel, linux-pm,
	rafael.j.wysocki, len.brown, artem.bityutskiy, dave.hansen, tglx,
	gautham.shenoy

On Mon, Nov 25, 2024 at 3:12 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Nov 25, 2024 at 02:58:50PM +0100, Rafael J. Wysocki wrote:
>
> > > @@ -2143,10 +2143,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
> > >                 if (intel_idle_max_cstate_reached(cstate))
> > >                         break;
> > >
> > > -               if (!cpuidle_state_table[cstate].enter &&
> > > -                   !cpuidle_state_table[cstate].enter_s2idle)
> > > +               if (!cpuidle_state_table[cstate].enter)
> > >                         break;
> > >
> > > +               if (!cpuidle_state_table[cstate].enter_dead)
> > > +                       cpuidle_state_table[cstate].enter_dead = intel_idle_enter_dead;
> > > +
> > > +               if (!cpuidle_state_table[cstate].enter_s2idle)
> > > +                       cpuidle_state_table[cstate].enter_s2idle = intel_idle_s2idle;
> > > +
> > >                 /* If marked as unusable, skip this state. */
> > >                 if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_UNUSABLE) {
> > >                         pr_debug("state %s is disabled\n",
> >
> > You can do the same thing with :enter() I think?  And it doesn't need
> > to be in one go.
>
> !enter is used as a terminator.

Ah, OK

Something like (!name && !desc) might be used for this purpose though.

> And it's harder to not do everything in one go; regex ftw :-)

Yeah, whatever.

* Re: [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states
  2024-11-25 13:20 ` [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states Patryk Wlazlyn
  2024-11-25 13:24   ` Rafael J. Wysocki
@ 2024-11-25 14:36   ` Gautham R. Shenoy
  1 sibling, 0 replies; 42+ messages in thread
From: Gautham R. Shenoy @ 2024-11-25 14:36 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx

On Mon, Nov 25, 2024 at 02:20:23PM +0100, Patryk Wlazlyn wrote:
> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> 
> Notice that acpi_processor_setup_cstates() can set state->enter_dead to acpi_idle_play_dead() for all C-states unconditionally and remove the
> confusing C-state type check done before setting it.
> 
> No intentional functional impact.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # 6.12-rc7
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

I am fine with this change.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>

> ---
>  drivers/acpi/processor_idle.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index ce728cf7e301..698897b29de2 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -800,12 +800,12 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
>  		state->enter = acpi_idle_enter;
>  
>  		state->flags = 0;
> -		if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2 ||
> -		    cx->type == ACPI_STATE_C3) {
> -			state->enter_dead = acpi_idle_play_dead;
> -			if (cx->type != ACPI_STATE_C3)
> -				drv->safe_state_index = count;
> -		}
> +
> +		state->enter_dead = acpi_idle_play_dead;
> +
> +		if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
> +			drv->safe_state_index = count;
> +
>  		/*
>  		 * Halt-induced C1 is not good for ->enter_s2idle, because it
>  		 * re-enables interrupts on exit.  Moreover, C1 is generally not
> -- 
> 2.47.0
> 
--
Thanks and Regards
gautham.


* Re: [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures
  2024-11-25 13:26   ` Rafael J. Wysocki
@ 2024-11-25 14:36     ` Patryk Wlazlyn
  0 siblings, 0 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 14:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

> This one and the [2/2] are in the mainline as of today, no need to resend them.
ACK. Purely RFC. I just want to make sure we are all on the same page.
I'll rebase when we all agree on how to solve the problem.

* Re: [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states
  2024-11-25 13:24   ` Rafael J. Wysocki
@ 2024-11-25 14:39     ` Patryk Wlazlyn
  0 siblings, 0 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 14:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

> This already is in the queue of patches to be merged during the 6.13
> merge window.
>
> I gather that it has been included here for completeness.
As in the previous email.
Just an update to the series. Purely RFC. I'll rebase when we all agree.

* Re: [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint
  2024-11-25 13:20 ` [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint Patryk Wlazlyn
@ 2024-11-25 14:39   ` Rafael J. Wysocki
  2024-11-26 11:36     ` Patryk Wlazlyn
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 14:39 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

On Mon, Nov 25, 2024 at 2:21 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> The MWAIT instruction needs different hints on different CPUs to reach
> the deepest idle states. The current hint calculation* in the
> mwait_play_dead() code works in practice on current hardware, but it
> fails on a recent platform, Intel's Sierra Forest, and possibly on some
> future ones. On those newer CPUs, power efficiency suffers when a CPU
> is put offline.
>
>  * The current calculation is the for loop inspecting edx in
>    mwait_play_dead()
>
> The current implementation for looking up the mwait hint for the deepest
> cstate in mwait_play_dead() works by inspecting CPUID leaf 0x5 and
> calculating the mwait hint based on the number of reported substates.
> This approach depends on the hints associated with those substates being
> contiguous in the range [0, NUM_SUBSTATES-1]. This contiguity is not
> documented and is not met on recent Intel platforms.
>
> For example, Intel's Sierra Forest reports two cstates with two substates
> each in CPUID leaf 0x5:
>
>   Name*   target cstate    target subcstate (mwait hint)
>   ===========================================================
>   C1      0x00             0x00
>   C1E     0x00             0x01
>
>   --      0x10             ----
>
>   C6S     0x20             0x22
>   C6P     0x20             0x23
>
>   --      0x30             ----
>
>   /* No more (sub)states all the way down to the end. */
>   ===========================================================
>
>    * Names of the cstates are not included in the CPUID leaf 0x5, they are
>      taken from the product specific documentation.
>
> Notice that hints 0x20 and 0x21 are skipped entirely for the target
> cstate 0x20 (C6), which is the cause of the problem for the current
> CPUID leaf 0x5 algorithm.
>
> Allow cpuidle code to provide the mwait play dead loop with a known
> mwait hint for the deepest idle state on a given platform, skipping the
> CPUID-based calculation.
>
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

I'm going to risk saying that the changelog doesn't match the code
changes in the patch any more.

The code changes are actually relatively straightforward: The bottom
half of mwait_play_dead() is split off, so it can be called from
multiple places.

The other places from which to call it are cpuidle drivers
implementing :enter_dead() callbacks that may want to use MWAIT as the
idle state entry method.  The ACPI processor_idle driver and
intel_idle will be updated by subsequent patches to do so.

The reason for it is mostly consistency: If the cpuidle driver uses a
specific idle state for things like suspend-to-idle, it is better to
let it decide what idle state to use for "play_dead" because it may
know better.

Another reason is what mwait_play_dead() does to determine the MWAIT
argument (referred to as a "hint"), but IMO it belongs in a changelog
of a later patch because this one doesn't actually do anything about
it.  In fact, it is not expected to change the behavior of the code.

> ---
>  arch/x86/include/asm/smp.h |  3 ++
>  arch/x86/kernel/smpboot.c  | 86 ++++++++++++++++++++------------------
>  2 files changed, 49 insertions(+), 40 deletions(-)
>
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index ca073f40698f..d12fab4a83c5 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -114,6 +114,7 @@ void wbinvd_on_cpu(int cpu);
>  int wbinvd_on_all_cpus(void);
>
>  void smp_kick_mwait_play_dead(void);
> +void mwait_play_dead_with_hint(unsigned long hint);
>
>  void native_smp_send_reschedule(int cpu);
>  void native_send_call_func_ipi(const struct cpumask *mask);
> @@ -164,6 +165,8 @@ static inline struct cpumask *cpu_llc_shared_mask(int cpu)
>  {
>         return (struct cpumask *)cpumask_of(0);
>  }
> +
> +static inline void mwait_play_dead_with_hint(unsigned long eax_hint) { }
>  #endif /* CONFIG_SMP */
>
>  #ifdef CONFIG_DEBUG_NMI_SELFTEST
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index b5a8f0891135..d0464c7a0af5 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1272,13 +1272,57 @@ void play_dead_common(void)
>         local_irq_disable();
>  }
>
> +void __noreturn mwait_play_dead_with_hint(unsigned long eax_hint)
> +{
> +       struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
> +
> +       /* Set up state for the kexec() hack below */
> +       md->status = CPUDEAD_MWAIT_WAIT;
> +       md->control = CPUDEAD_MWAIT_WAIT;
> +
> +       wbinvd();
> +
> +       while (1) {
> +               /*
> +                * The CLFLUSH is a workaround for erratum AAI65 for
> +                * the Xeon 7400 series.  It's not clear it is actually
> +                * needed, but it should be harmless in either case.
> +                * The WBINVD is insufficient due to the spurious-wakeup
> +                * case where we return around the loop.
> +                */
> +               mb();
> +               clflush(md);
> +               mb();
> +               __monitor(md, 0, 0);
> +               mb();
> +               __mwait(eax_hint, 0);
> +
> +               if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
> +                       /*
> +                        * Kexec is about to happen. Don't go back into mwait() as
> +                        * the kexec kernel might overwrite text and data including
> +                        * page tables and stack. So mwait() would resume when the
> +                        * monitor cache line is written to and then the CPU goes
> +                        * south due to overwritten text, page tables and stack.
> +                        *
> +                        * Note: This does _NOT_ protect against a stray MCE, NMI,
> +                        * SMI. They will resume execution at the instruction
> +                        * following the HLT instruction and run into the problem
> +                        * which this is trying to prevent.
> +                        */
> +                       WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
> +                       while(1)
> +                               native_halt();
> +               }
> +       }
> +}
> +
>  /*
>   * We need to flush the caches before going to sleep, lest we have
>   * dirty data in our caches when we come back up.
>   */
>  static inline void mwait_play_dead(void)
>  {
> -       struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
>         unsigned int eax, ebx, ecx, edx;
>         unsigned int highest_cstate = 0;
>         unsigned int highest_subcstate = 0;
> @@ -1316,45 +1360,7 @@ static inline void mwait_play_dead(void)
>                         (highest_subcstate - 1);
>         }
>
> -       /* Set up state for the kexec() hack below */
> -       md->status = CPUDEAD_MWAIT_WAIT;
> -       md->control = CPUDEAD_MWAIT_WAIT;
> -
> -       wbinvd();
> -
> -       while (1) {
> -               /*
> -                * The CLFLUSH is a workaround for erratum AAI65 for
> -                * the Xeon 7400 series.  It's not clear it is actually
> -                * needed, but it should be harmless in either case.
> -                * The WBINVD is insufficient due to the spurious-wakeup
> -                * case where we return around the loop.
> -                */
> -               mb();
> -               clflush(md);
> -               mb();
> -               __monitor(md, 0, 0);
> -               mb();
> -               __mwait(eax, 0);
> -
> -               if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
> -                       /*
> -                        * Kexec is about to happen. Don't go back into mwait() as
> -                        * the kexec kernel might overwrite text and data including
> -                        * page tables and stack. So mwait() would resume when the
> -                        * monitor cache line is written to and then the CPU goes
> -                        * south due to overwritten text, page tables and stack.
> -                        *
> -                        * Note: This does _NOT_ protect against a stray MCE, NMI,
> -                        * SMI. They will resume execution at the instruction
> -                        * following the HLT instruction and run into the problem
> -                        * which this is trying to prevent.
> -                        */
> -                       WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
> -                       while(1)
> -                               native_halt();
> -               }
> -       }
> +       mwait_play_dead_with_hint(eax);
>  }
>
>  /*
> --
> 2.47.0
>
>

* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-25 13:56   ` Rafael J. Wysocki
@ 2024-11-25 14:43     ` Patryk Wlazlyn
  2024-11-25 14:48       ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 14:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

> If you first make intel_idle provide :enter_dead() for all CPUs on all
> platforms and implement it by calling mwait_play_dead_with_hint(), you
> won't need mwait_play_dead() any more.
Crossed my mind, but because mwait_play_dead doesn't filter on Intel
vendor specifically, I thought that I might break some edge case.


I am all for simplifying.

* Re: [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 13:44   ` Rafael J. Wysocki
@ 2024-11-25 14:48     ` Patryk Wlazlyn
  2024-11-25 14:50       ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 14:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

> Is this still the case with the latest firmware?
>
> If so, this could be the second patch in the series if [1-3/7] are dropped.
>
> Otherwise, I don't think it is needed any more.
I discussed this case with Artem or Len off-list before.
The idea is to add it for SRF too, to make sure you get PC6 even on old firmware.

Just as a reminder - the whole patch series is here to guard future platforms too.
The SRF is just the one the problem was observed on.


* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-25 14:43     ` Patryk Wlazlyn
@ 2024-11-25 14:48       ` Rafael J. Wysocki
  2024-11-26 11:56         ` Patryk Wlazlyn
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 14:48 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: Rafael J. Wysocki, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, peterz, tglx,
	gautham.shenoy

On Mon, Nov 25, 2024 at 3:43 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> > If you first make intel_idle provide :enter_dead() for all CPUs on all
> > platforms and implement it by calling mwait_play_dead_with_hint(), you
> > won't need mwait_play_dead() any more.
> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
> vendor specifically,

In practice, it does.

The vendor check in it is equivalent to "if Intel".

> I thought that I might break some edge case.
>
>
> I am all for simplifying.

* Re: [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF
  2024-11-25 14:48     ` Patryk Wlazlyn
@ 2024-11-25 14:50       ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-25 14:50 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: Rafael J. Wysocki, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, peterz, tglx,
	gautham.shenoy

On Mon, Nov 25, 2024 at 3:48 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> > Is this still the case with the latest firmware?
> >
> > If so, this could be the second patch in the series if [1-3/7] are dropped.
> >
> > Otherwise, I don't think it is needed any more.
> I discussed this case with Artem or Len off-list before.
> The idea is to add it for SRF too, to make sure you get PC6 even on old firmware.
>
> Just as a reminder - the whole patch series is here to guard future platforms too.

This actually is its main purpose.

> The SRF is just the one the problem was observed on.

So if you change intel_idle to provide :enter_dead() for all CPUs on
all platforms, the SRF-specific patch won't be necessary any more.

* Re: [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms
  2024-11-25 13:54   ` Peter Zijlstra
@ 2024-11-25 14:56     ` Patryk Wlazlyn
  2024-11-25 15:17       ` Gautham R. Shenoy
  0 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-25 14:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, tglx, gautham.shenoy

> So I don't like this. Less exceptions is better.
>
> This *SHOULD* never trigger on AMD anyway, because they recommend IO
> port C[23]. But if their partner BIOS engineer does a wobbly and they
> end up in MWAIT anyway, it *should* all work regardless.
Agreed.
I thought relying on the BIOS not to put FFH states there was a concern.
I believe Gautham confirmed that AMD would be fine executing that;
it's just that they prefer ioidle (or hlt?).

* Re: [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling
  2024-11-25 13:20 ` [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling Patryk Wlazlyn
  2024-11-25 13:41   ` Rafael J. Wysocki
@ 2024-11-25 15:14   ` Gautham R. Shenoy
  2024-11-26 12:03     ` Patryk Wlazlyn
  1 sibling, 1 reply; 42+ messages in thread
From: Gautham R. Shenoy @ 2024-11-25 15:14 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx

Hello Patryk,

On Mon, Nov 25, 2024 at 02:20:27PM +0100, Patryk Wlazlyn wrote:
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>

The patch looks good to me. But this needs a more detailed changelog.

--
Thanks and Regards
gautham.

> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/acpi/cstate.c      | 9 +++++++++
>  drivers/acpi/processor_idle.c      | 2 ++
>  include/acpi/processor.h           | 5 +++++
>  4 files changed, 17 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index ea33439a5d00..1da5e08de257 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -236,6 +236,7 @@
>  #define X86_FEATURE_PVUNLOCK		( 8*32+20) /* PV unlock function */
>  #define X86_FEATURE_VCPUPREEMPT		( 8*32+21) /* PV vcpu_is_preempted function */
>  #define X86_FEATURE_TDX_GUEST		( 8*32+22) /* "tdx_guest" Intel Trust Domain Extensions Guest */
> +#define X86_FEATURE_NO_MWAIT_OFFLINE    ( 8*32+23) /* Don't use MWAIT states for offlined CPUs */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
>  #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* "fsgsbase" RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> index f3ffd0a3a012..c80a3e6dba5f 100644
> --- a/arch/x86/kernel/acpi/cstate.c
> +++ b/arch/x86/kernel/acpi/cstate.c
> @@ -204,6 +204,15 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
>  }
>  EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
>  
> +void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
> +{
> +	unsigned int cpu = smp_processor_id();
> +	struct cstate_entry *percpu_entry;
> +
> +	percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
> +	mwait_play_dead_with_hint(percpu_entry->states[cx->index].eax);
> +}
> +
>  void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
>  {
>  	unsigned int cpu = smp_processor_id();
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 698897b29de2..586cc7d1d8aa 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -590,6 +590,8 @@ static void acpi_idle_play_dead(struct cpuidle_device *dev, int index)
>  			raw_safe_halt();
>  		else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
>  			io_idle(cx->address);
> +		} else if (cx->entry_method == ACPI_CSTATE_FFH) {
> +			acpi_processor_ffh_play_dead(cx);
>  		} else
>  			return;
>  	}
> diff --git a/include/acpi/processor.h b/include/acpi/processor.h
> index a17e97e634a6..63a37e72b721 100644
> --- a/include/acpi/processor.h
> +++ b/include/acpi/processor.h
> @@ -280,6 +280,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
>  				    struct acpi_processor_cx *cx,
>  				    struct acpi_power_register *reg);
>  void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cstate);
> +void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx);
>  #else
>  static inline void acpi_processor_power_init_bm_check(struct
>  						      acpi_processor_flags
> @@ -300,6 +301,10 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
>  {
>  	return;
>  }
> +static inline void acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
> +{
> +	return;
> +}
>  #endif
>  
>  static inline int call_on_cpu(int cpu, long (*fn)(void *), void *arg,
> -- 
> 2.47.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms
  2024-11-25 13:20 ` [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms Patryk Wlazlyn
  2024-11-25 13:46   ` Rafael J. Wysocki
  2024-11-25 13:54   ` Peter Zijlstra
@ 2024-11-25 15:15   ` Gautham R. Shenoy
  2024-11-26 12:05     ` Patryk Wlazlyn
  2 siblings, 1 reply; 42+ messages in thread
From: Gautham R. Shenoy @ 2024-11-25 15:15 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx

On Mon, Nov 25, 2024 at 02:20:28PM +0100, Patryk Wlazlyn wrote:
> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com>

You can drop this patch. It is very rare to find a platform that
supports FFH based C1 and doesn't have an IOPORT based C2.

--
Thanks and Regards
gautham.


> ---
>  drivers/acpi/processor_idle.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 586cc7d1d8aa..4b4ac8d55b55 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -803,7 +803,11 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
>  
>  		state->flags = 0;
>  
> -		state->enter_dead = acpi_idle_play_dead;
> +		/* AMD doesn't want to use mwait for play dead. */
> +		bool amd_or_hygon = boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> +				    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON;
> +		if (!(cx->entry_method == ACPI_CSTATE_FFH && amd_or_hygon))
> +			state->enter_dead = acpi_idle_play_dead;
>  
>  		if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
>  			drv->safe_state_index = count;
> -- 
> 2.47.0
> 


* Re: [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms
  2024-11-25 14:56     ` Patryk Wlazlyn
@ 2024-11-25 15:17       ` Gautham R. Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R. Shenoy @ 2024-11-25 15:17 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: Peter Zijlstra, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, tglx

On Mon, Nov 25, 2024 at 03:56:25PM +0100, Patryk Wlazlyn wrote:
> > So I don't like this. Less exceptions is better.
> >
> > This *SHOULD* never trigger on AMD anyway, because they recommend IO
> > port C[23]. But if their partner BIOS engineer does a wobbly and they
> > end up in MWAIT anyway, it *should* all work regardless.
> Agreed.
> I thought relying on BIOS to not put FFH states there was a concern.
> I believe Gautham confirmed that AMD would be fine executing that,
> it's just that they prefer ioidle (or hlt?).

Yes, HLT or IOPORT based idle states for CPU Offline are
preferable. But FFH based idle states work just fine.

--
Thanks and Regards
gautham.


* Re: [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint
  2024-11-25 14:39   ` Rafael J. Wysocki
@ 2024-11-26 11:36     ` Patryk Wlazlyn
  2024-11-26 11:45       ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-26 11:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

>> The MWAIT instruction needs different hints on different CPUs to reach
>> the deepest idle states. The current hint calculation* in
>> mwait_play_dead() works in practice on current hardware, but it fails
>> on recent hardware such as Intel's Sierra Forest, and possibly on some
>> future platforms.
>> Those newer CPUs' power efficiency suffers when the CPU is put offline.
>>
>>  * The current calculation is the for loop inspecting edx in
>>    mwait_play_dead()
>>
>> The current implementation looks up the mwait hint for the deepest
>> cstate in mwait_play_dead() by inspecting CPUID leaf 0x5 and
>> calculating the hint from the number of reported substates. This
>> approach depends on the hints associated with those substates being
>> contiguous in the range [0, NUM_SUBSTATES-1]. This contiguity is
>> neither documented nor met on recent Intel platforms.
>>
>> For example, Intel's Sierra Forest reports two cstates with two substates
>> each in cpuid leaf 0x5:
>>
>>   Name*   target cstate    target subcstate (mwait hint)
>>   ===========================================================
>>   C1      0x00             0x00
>>   C1E     0x00             0x01
>>
>>   --      0x10             ----
>>
>>   C6S     0x20             0x22
>>   C6P     0x20             0x23
>>
>>   --      0x30             ----
>>
>>   /* No more (sub)states all the way down to the end. */
>>   ===========================================================
>>
>>    * Names of the cstates are not included in the CPUID leaf 0x5, they are
>>      taken from the product specific documentation.
>>
>> Notice that hints 0x20 and 0x21 are skipped entirely for the target
>> cstate 0x20 (C6), which is the cause of the problem for the current
>> cpuid leaf 0x5 algorithm.
>>
>> Allow cpuidle code to provide the mwait play_dead loop with a known
>> mwait hint for the deepest idle state on a given platform, skipping the
>> cpuid-based calculation.
>>
>> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
>
> I'm going to risk saying that the changelog doesn't match the code
> changes in the patch any more.
>
> The code changes are actually relatively straightforward: The bottom
> half of mwait_play_dead() is split off, so it can be called from
> multiple places.
>
> The other places from which to call it are cpuidle drivers
> implementing :enter_dead() callbacks that may want to use MWAIT as the
> idle state entry method.  The ACPI processor_idle driver and
> intel_idle will be updated by subsequent patches to do so.
>
> The reason for it is mostly consistency: If the cpuidle driver uses a
> specific idle state for things like suspend-to-idle, it is better to
> let it decide what idle state to use for "play_dead" because it may
> know better.
>
> Another reason is what mwait_play_dead() does to determine the MWAIT
> argument (referred to as a "hint"), but IMO it belongs in a changelog
> of a later patch because this one doesn't actually do anything about
> it.  In fact, it is not expected to change the behavior of the code.

The commit message here is meant to justify the change. I thought that
giving some context was important. Do you suggest moving this to a
different commit, or not mentioning the SRF and C6 states at all?



* Re: [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint
  2024-11-26 11:36     ` Patryk Wlazlyn
@ 2024-11-26 11:45       ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-26 11:45 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: Rafael J. Wysocki, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, peterz, tglx,
	gautham.shenoy

On Tue, Nov 26, 2024 at 12:37 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> >> The MWAIT instruction needs different hints on different CPUs to reach
> >> the deepest idle states. The current hint calculation* in
> >> mwait_play_dead() works in practice on current hardware, but it fails
> >> on recent hardware such as Intel's Sierra Forest, and possibly on some
> >> future platforms.
> >> Those newer CPUs' power efficiency suffers when the CPU is put offline.
> >>
> >>  * The current calculation is the for loop inspecting edx in
> >>    mwait_play_dead()
> >>
> >> The current implementation looks up the mwait hint for the deepest
> >> cstate in mwait_play_dead() by inspecting CPUID leaf 0x5 and
> >> calculating the hint from the number of reported substates. This
> >> approach depends on the hints associated with those substates being
> >> contiguous in the range [0, NUM_SUBSTATES-1]. This contiguity is
> >> neither documented nor met on recent Intel platforms.
> >>
> >> For example, Intel's Sierra Forest reports two cstates with two substates
> >> each in cpuid leaf 0x5:
> >>
> >>   Name*   target cstate    target subcstate (mwait hint)
> >>   ===========================================================
> >>   C1      0x00             0x00
> >>   C1E     0x00             0x01
> >>
> >>   --      0x10             ----
> >>
> >>   C6S     0x20             0x22
> >>   C6P     0x20             0x23
> >>
> >>   --      0x30             ----
> >>
> >>   /* No more (sub)states all the way down to the end. */
> >>   ===========================================================
> >>
> >>    * Names of the cstates are not included in the CPUID leaf 0x5, they are
> >>      taken from the product specific documentation.
> >>
> >> Notice that hints 0x20 and 0x21 are skipped entirely for the target
> >> cstate 0x20 (C6), which is the cause of the problem for the current
> >> cpuid leaf 0x5 algorithm.
> >>
> >> Allow cpuidle code to provide the mwait play_dead loop with a known
> >> mwait hint for the deepest idle state on a given platform, skipping the
> >> cpuid-based calculation.
> >>
> >> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> >
> > I'm going to risk saying that the changelog doesn't match the code
> > changes in the patch any more.
> >
> > The code changes are actually relatively straightforward: The bottom
> > half of mwait_play_dead() is split off, so it can be called from
> > multiple places.
> >
> > The other places from which to call it are cpuidle drivers
> > implementing :enter_dead() callbacks that may want to use MWAIT as the
> > idle state entry method.  The ACPI processor_idle driver and
> > intel_idle will be updated by subsequent patches to do so.
> >
> > The reason for it is mostly consistency: If the cpuidle driver uses a
> > specific idle state for things like suspend-to-idle, it is better to
> > let it decide what idle state to use for "play_dead" because it may
> > know better.
> >
> > Another reason is what mwait_play_dead() does to determine the MWAIT
> > argument (referred to as a "hint"), but IMO it belongs in a changelog
> > of a later patch because this one doesn't actually do anything about
> > it.  In fact, it is not expected to change the behavior of the code.
>
> The commit message here is to justify the change. I thought that giving
> some context is important. Do you suggest moving this under a
> different commit or don't mention the SRF and C6 states at all?

I would move it to the patch that actually makes a difference for SRF.
This one doesn't do that.


* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-25 14:48       ` Rafael J. Wysocki
@ 2024-11-26 11:56         ` Patryk Wlazlyn
  2024-11-26 12:04           ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-26 11:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

>>> If you first make intel_idle provide :enter_dead() for all CPUs on all
>>> platforms and implement it by calling mwait_play_dead_with_hint(), you
>>> won't need mwait_play_dead() any more.
>> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
>> vendor specifically,
>
> In practice, it does.
>
> The vendor check in it is equivalent to "if Intel".

Actually, what about INTEL_IDLE=n?
We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
if we don't, wouldn't we want to try mwait_play_dead() before hlt, or is that
scenario too unrealistic?




* Re: [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling
  2024-11-25 15:14   ` Gautham R. Shenoy
@ 2024-11-26 12:03     ` Patryk Wlazlyn
  0 siblings, 0 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-26 12:03 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx

> The patch looks good to me. But this needs a more detailed changelog.

Yup, sure. Thanks!



* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-26 11:56         ` Patryk Wlazlyn
@ 2024-11-26 12:04           ` Rafael J. Wysocki
  2024-11-26 13:10             ` Patryk Wlazlyn
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-26 12:04 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: Rafael J. Wysocki, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, peterz, tglx,
	gautham.shenoy

On Tue, Nov 26, 2024 at 12:56 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> >>> If you first make intel_idle provide :enter_dead() for all CPUs on all
> >>> platforms and implement it by calling mwait_play_dead_with_hint(), you
> >>> won't need mwait_play_dead() any more.
> >> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
> >> vendor specifically,
> >
> > In practice, it does.
> >
> > The vendor check in it is equivalent to "if Intel".
>
> Actually, what about INTEL_IDLE=n?
> We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
> if we don't, don't we want to try mwait_play_dead before hlt or is it too
> unrealistic to happen?

In that case the hint to use would not be known anyway, so
hlt_play_dead() is the right choice IMV.


* Re: [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms
  2024-11-25 15:15   ` Gautham R. Shenoy
@ 2024-11-26 12:05     ` Patryk Wlazlyn
  0 siblings, 0 replies; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-26 12:05 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx

On 11/25/24 4:15 PM, Gautham R. Shenoy wrote:
> On Mon, Nov 25, 2024 at 02:20:28PM +0100, Patryk Wlazlyn wrote:
>> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
>> Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
>
> You can drop this patch. It is very rare to find a platform that
> supports FFH based C1 and doesn't have a IOPORT based C2.
>
> --
> Thanks and Regards
> gautham.
>
>
>> ---
>>  drivers/acpi/processor_idle.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
>> index 586cc7d1d8aa..4b4ac8d55b55 100644
>> --- a/drivers/acpi/processor_idle.c
>> +++ b/drivers/acpi/processor_idle.c
>> @@ -803,7 +803,11 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
>>  
>>          state->flags = 0;
>>  
>> -        state->enter_dead = acpi_idle_play_dead;
>> +        /* AMD doesn't want to use mwait for play dead. */
>> +        bool amd_or_hygon = boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
>> +                    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON;
>> +        if (!(cx->entry_method == ACPI_CSTATE_FFH && amd_or_hygon))
>> +            state->enter_dead = acpi_idle_play_dead;
>>  
>>          if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
>>              drv->safe_state_index = count;
>> --
>> 2.47.0
>>

ACK, thanks!


* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-26 12:04           ` Rafael J. Wysocki
@ 2024-11-26 13:10             ` Patryk Wlazlyn
  2024-11-26 13:38               ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Patryk Wlazlyn @ 2024-11-26 13:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: x86, linux-kernel, linux-pm, rafael.j.wysocki, len.brown,
	artem.bityutskiy, dave.hansen, peterz, tglx, gautham.shenoy

>>>>> If you first make intel_idle provide :enter_dead() for all CPUs on all
>>>>> platforms and implement it by calling mwait_play_dead_with_hint(), you
>>>>> won't need mwait_play_dead() any more.
>>>> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
>>>> vendor specifically,
>>>
>>> In practice, it does.
>>>
>>> The vendor check in it is equivalent to "if Intel".
>>
>> Actually, what about INTEL_IDLE=n?
>> We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
>> if we don't, don't we want to try mwait_play_dead before hlt or is it too
>> unrealistic to happen?
>
> In that case the hint to use would not be known anyway, so
> hlt_play_dead() is the right choice IMV.

Fair, it's not known, but we could fall back to the old CPUID leaf 0x5 based
algorithm, which works on a lot of hardware.

That being said, I think it's cleaner to get rid of the old algorithm entirely
and rely on idle drivers to do the right thing from this point forward.

If we could bring the CPU out of the mwait loop into the hlt loop somehow (via
interrupt for example) we could remove the old kexec hack altogether.




* Re: [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
  2024-11-26 13:10             ` Patryk Wlazlyn
@ 2024-11-26 13:38               ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-11-26 13:38 UTC (permalink / raw)
  To: Patryk Wlazlyn
  Cc: Rafael J. Wysocki, x86, linux-kernel, linux-pm, rafael.j.wysocki,
	len.brown, artem.bityutskiy, dave.hansen, peterz, tglx,
	gautham.shenoy

On Tue, Nov 26, 2024 at 2:10 PM Patryk Wlazlyn
<patryk.wlazlyn@linux.intel.com> wrote:
>
> >>>>> If you first make intel_idle provide :enter_dead() for all CPUs on all
> >>>>> platforms and implement it by calling mwait_play_dead_with_hint(), you
> >>>>> won't need mwait_play_dead() any more.
> >>>> Crossed my mind, but because mwait_play_dead doesn't filter on Intel
> >>>> vendor specifically,
> >>>
> >>> In practice, it does.
> >>>
> >>> The vendor check in it is equivalent to "if Intel".
> >>
> >> Actually, what about INTEL_IDLE=n?
> >> We might hit acpi_idle, which would call mwait_play_dead_with_hint() now, but
> >> if we don't, don't we want to try mwait_play_dead before hlt or is it too
> >> unrealistic to happen?
> >
> > In that case the hint to use would not be known anyway, so
> > hlt_play_dead() is the right choice IMV.
>
> Fair, it's not known, but we could fallback to the old, cpuid leaf 0x5 based
> algorithm, which works on a lot of hardware.
>
> That being said, I think it's cleaner to get rid of the old algorithm entirely
> and rely on idle drivers to do the right thing from this point forward.
>
> If we could bring the CPU out of the mwait loop into the hlt loop somehow (via
> interrupt for example) we could remove the old kexec hack altogether.

Well, the problem is that CPUs may be woken up from MWAIT waits for
various reasons and the reason for a given wakeup is not known a
priori.


end of thread, other threads:[~2024-11-26 13:38 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-25 13:20 [RFC PATCH v4 0/8] SRF: Fix offline CPU preventing pc6 entry Patryk Wlazlyn
2024-11-25 13:20 ` [RFC PATCH v4 1/8] cpuidle: Do not return from cpuidle_play_dead() on callback failures Patryk Wlazlyn
2024-11-25 13:26   ` Rafael J. Wysocki
2024-11-25 14:36     ` Patryk Wlazlyn
2024-11-25 13:20 ` [RFC PATCH v4 2/8] cpuidle: Change :enter_dead() driver callback return type to void Patryk Wlazlyn
2024-11-25 13:20 ` [RFC PATCH v4 3/8] ACPI: processor_idle: Use acpi_idle_play_dead() for all C-states Patryk Wlazlyn
2024-11-25 13:24   ` Rafael J. Wysocki
2024-11-25 14:39     ` Patryk Wlazlyn
2024-11-25 14:36   ` Gautham R. Shenoy
2024-11-25 13:20 ` [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint Patryk Wlazlyn
2024-11-25 14:39   ` Rafael J. Wysocki
2024-11-26 11:36     ` Patryk Wlazlyn
2024-11-26 11:45       ` Rafael J. Wysocki
2024-11-25 13:20 ` [RFC PATCH v4 5/8] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead() Patryk Wlazlyn
2024-11-25 13:45   ` Peter Zijlstra
2024-11-25 13:56   ` Rafael J. Wysocki
2024-11-25 14:43     ` Patryk Wlazlyn
2024-11-25 14:48       ` Rafael J. Wysocki
2024-11-26 11:56         ` Patryk Wlazlyn
2024-11-26 12:04           ` Rafael J. Wysocki
2024-11-26 13:10             ` Patryk Wlazlyn
2024-11-26 13:38               ` Rafael J. Wysocki
2024-11-25 13:20 ` [RFC PATCH v4 6/8] intel_idle: Provide enter_dead() handler for SRF Patryk Wlazlyn
2024-11-25 13:44   ` Rafael J. Wysocki
2024-11-25 14:48     ` Patryk Wlazlyn
2024-11-25 14:50       ` Rafael J. Wysocki
2024-11-25 13:53   ` Peter Zijlstra
2024-11-25 13:58     ` Rafael J. Wysocki
2024-11-25 14:12       ` Peter Zijlstra
2024-11-25 14:19         ` Rafael J. Wysocki
2024-11-25 13:20 ` [RFC PATCH v4 7/8] acpi_idle: Add FFH cstate handling Patryk Wlazlyn
2024-11-25 13:41   ` Rafael J. Wysocki
2024-11-25 14:00     ` Rafael J. Wysocki
2024-11-25 15:14   ` Gautham R. Shenoy
2024-11-26 12:03     ` Patryk Wlazlyn
2024-11-25 13:20 ` [RFC PATCH v4 8/8] acpi_idle: Disallow play_dead with FFH cstate on AMD platforms Patryk Wlazlyn
2024-11-25 13:46   ` Rafael J. Wysocki
2024-11-25 13:54   ` Peter Zijlstra
2024-11-25 14:56     ` Patryk Wlazlyn
2024-11-25 15:17       ` Gautham R. Shenoy
2024-11-25 15:15   ` Gautham R. Shenoy
2024-11-26 12:05     ` Patryk Wlazlyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox