* [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states
@ 2025-01-02 15:01 Frederic Weisbecker
2025-01-02 15:01 ` [PATCH 1/6] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2025-01-02 15:01 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Peter Zijlstra
The TIF_NR_POLLING handling against TIF_NEED_RESCHED polling/monitoring
idle states (mwait and also software polling) is a bit messy, with quite
some wasted cycles spent on useless atomic operations. This tries to
consolidate this state handling from the cpuidle core.
Changes since v2:
_ Handle buggy mwait implementations (thanks Rafael)
Frederic Weisbecker (4):
cpuidle: Remove unnecessary current_clr_polling_and_test() from
haltpoll
x86/cpuidle: Move buggy mwait implementations away from
CPUIDLE_FLAG_MWAIT
cpuidle: Remove call_cpuidle_s2idle()
cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle
states
Peter Zijlstra (2):
cpuidle: Introduce CPUIDLE_FLAG_MWAIT
cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states
arch/arm/include/asm/cpuidle.h | 2 ++
arch/arm64/include/asm/cpuidle.h | 3 ++
arch/powerpc/include/asm/cpuidle.h | 4 +++
arch/riscv/include/asm/cpuidle.h | 2 ++
arch/x86/include/asm/cpuidle.h | 12 +++++++
arch/x86/include/asm/mwait.h | 27 +++++++--------
drivers/acpi/processor_idle.c | 5 +++
drivers/cpuidle/cpuidle-haltpoll.c | 3 --
drivers/cpuidle/cpuidle-powernv.c | 10 ------
drivers/cpuidle/cpuidle-pseries.c | 11 -------
drivers/cpuidle/cpuidle.c | 22 ++++++++++++-
drivers/cpuidle/poll_state.c | 30 +++++++----------
drivers/idle/intel_idle.c | 8 +++++
include/asm-generic/Kbuild | 1 +
include/asm-generic/cpuidle.h | 10 ++++++
include/linux/cpuidle.h | 1 +
include/linux/sched/idle.h | 7 +++-
kernel/sched/idle.c | 53 +++++++++---------------------
18 files changed, 114 insertions(+), 97 deletions(-)
create mode 100644 arch/x86/include/asm/cpuidle.h
create mode 100644 include/asm-generic/cpuidle.h
--
2.46.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/6] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
@ 2025-01-02 15:01 ` Frederic Weisbecker
2025-01-02 15:01 ` [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT Frederic Weisbecker
` (5 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2025-01-02 15:01 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Marcelo Tosatti
When cpuidle drivers ->enter() callback are called, the TIF_NR_POLLING
flag is cleared already and TIF_NEED_RESCHED checked by call_cpuidle().
Therefore calling current_clr_polling_and_test() is redundant here and
further setting of TIF_NEED_RESCHED will result in an IPI and thus an
idle loop exit. This call can be safely removed.
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
drivers/cpuidle/cpuidle-haltpoll.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
index bcd03e893a0a..07dd9f000273 100644
--- a/drivers/cpuidle/cpuidle-haltpoll.c
+++ b/drivers/cpuidle/cpuidle-haltpoll.c
@@ -28,9 +28,6 @@ static enum cpuhp_state haltpoll_hp_state;
static __cpuidle int default_enter_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- if (current_clr_polling_and_test())
- return index;
-
arch_cpu_idle();
return index;
}
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
2025-01-02 15:01 ` [PATCH 1/6] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
@ 2025-01-02 15:01 ` Frederic Weisbecker
2025-01-14 14:01 ` Rafael J. Wysocki
2025-01-02 15:01 ` [PATCH 3/6] x86/cpuidle: Move buggy mwait implementations away from CPUIDLE_FLAG_MWAIT Frederic Weisbecker
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2025-01-02 15:01 UTC (permalink / raw)
To: LKML
Cc: Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Jacob Pan, Frederic Weisbecker
From: Peter Zijlstra <peterz@infradead.org>
Provide a way to tell the cpuidle core about states monitoring
TIF_NEED_RESCHED on the hardware level, monitor/mwait users being the
only examples in use.
This will allow cpuidle core to manage TIF_NR_POLLING on behalf of all
kinds of TIF_NEED_RESCHED watching states while keeping a necessary
distinction for the governors between software loops polling on
TIF_NEED_RESCHED and hardware monitored writes to thread flags.
[fweisbec: _ Initialize flag from acpi_processor_setup_cstates() instead
of acpi_processor_setup_lpi_states(), as the latter seem to
be about arm64...
_ Rename CPUIDLE_FLAG_NO_IPI to CPUIDLE_FLAG_MWAIT]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
drivers/acpi/processor_idle.c | 3 +++
drivers/idle/intel_idle.c | 5 ++++-
include/linux/cpuidle.h | 1 +
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 698897b29de2..66cb5536d91e 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -806,6 +806,9 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
drv->safe_state_index = count;
+ if (cx->entry_method == ACPI_CSTATE_FFH)
+ state->flags |= CPUIDLE_FLAG_MWAIT;
+
/*
* Halt-induced C1 is not good for ->enter_s2idle, because it
* re-enables interrupts on exit. Moreover, C1 is generally not
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index ac4d8faa3886..d52723fbeb04 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1787,7 +1787,8 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
if (cx->type > ACPI_STATE_C1)
state->target_residency *= 3;
- state->flags = MWAIT2flg(cx->address);
+ state->flags = MWAIT2flg(cx->address) | CPUIDLE_FLAG_MWAIT;
+
if (cx->type > ACPI_STATE_C2)
state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
@@ -2072,6 +2073,8 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
static void state_update_enter_method(struct cpuidle_state *state, int cstate)
{
+ state->flags |= CPUIDLE_FLAG_MWAIT;
+
if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
/*
* Combining with XSTATE with IBRS or IRQ_ENABLE flags
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index a9ee4fe55dcf..b8084617aa27 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -85,6 +85,7 @@ struct cpuidle_state {
#define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */
#define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */
#define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* idle-state takes care of RCU */
+#define CPUIDLE_FLAG_MWAIT BIT(7) /* hardware need_resched() monitoring */
struct cpuidle_device_kobj;
struct cpuidle_state_kobj;
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/6] x86/cpuidle: Move buggy mwait implementations away from CPUIDLE_FLAG_MWAIT
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
2025-01-02 15:01 ` [PATCH 1/6] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
2025-01-02 15:01 ` [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT Frederic Weisbecker
@ 2025-01-02 15:01 ` Frederic Weisbecker
2025-01-14 14:35 ` Rafael J. Wysocki
2025-01-02 15:01 ` [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states Frederic Weisbecker
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2025-01-02 15:01 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Jacob Pan, Peter Zijlstra,
Rafael J . Wysocki, Daniel Lezcano, linux-pm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Russell King,
Catalin Marinas, Will Deacon, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Madhavan Srinivasan, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Len Brown, Arnd Bergmann, linux-arch
Buggy MWAIT implementations can't carry the CPUIDLE_FLAG_MWAIT flag
because they require IPIs to wake up. Therefore they shouldn't be
called with TIF_NR_POLLING.
Reported-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/arm/include/asm/cpuidle.h | 2 ++
arch/arm64/include/asm/cpuidle.h | 3 +++
arch/powerpc/include/asm/cpuidle.h | 4 ++++
arch/riscv/include/asm/cpuidle.h | 2 ++
arch/x86/include/asm/cpuidle.h | 12 ++++++++++++
drivers/acpi/processor_idle.c | 4 +++-
drivers/idle/intel_idle.c | 9 +++++++--
include/asm-generic/Kbuild | 1 +
include/asm-generic/cpuidle.h | 10 ++++++++++
9 files changed, 44 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/include/asm/cpuidle.h
create mode 100644 include/asm-generic/cpuidle.h
diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
index 397be5ed30e7..0ea1d2ec837d 100644
--- a/arch/arm/include/asm/cpuidle.h
+++ b/arch/arm/include/asm/cpuidle.h
@@ -55,4 +55,6 @@ struct arm_cpuidle_irq_context { };
#define arm_cpuidle_save_irq_context(c) (void)c
#define arm_cpuidle_restore_irq_context(c) (void)c
+#include <asm-generic/cpuidle.h>
+
#endif
diff --git a/arch/arm64/include/asm/cpuidle.h b/arch/arm64/include/asm/cpuidle.h
index 2047713e097d..ef49124135a7 100644
--- a/arch/arm64/include/asm/cpuidle.h
+++ b/arch/arm64/include/asm/cpuidle.h
@@ -38,4 +38,7 @@ struct arm_cpuidle_irq_context { };
#define arm_cpuidle_save_irq_context(c) (void)c
#define arm_cpuidle_restore_irq_context(c) (void)c
#endif
+
+#include <asm-generic/cpuidle.h>
+
#endif
diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 0cce5dc7fb1c..788706bc04ec 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -102,4 +102,8 @@ static inline void report_invalid_psscr_val(u64 psscr_val, int err)
#endif
+#ifndef __ASSEMBLY__
+#include <asm-generic/cpuidle.h>
+#endif
+
#endif
diff --git a/arch/riscv/include/asm/cpuidle.h b/arch/riscv/include/asm/cpuidle.h
index 71fdc607d4bc..1f1b24901d86 100644
--- a/arch/riscv/include/asm/cpuidle.h
+++ b/arch/riscv/include/asm/cpuidle.h
@@ -21,4 +21,6 @@ static inline void cpu_do_idle(void)
wait_for_interrupt();
}
+#include <asm-generic/cpuidle.h>
+
#endif
diff --git a/arch/x86/include/asm/cpuidle.h b/arch/x86/include/asm/cpuidle.h
new file mode 100644
index 000000000000..a59db1a3314a
--- /dev/null
+++ b/arch/x86/include/asm/cpuidle.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CPUIDLE_H
+#define _ASM_X86_CPUIDLE_H
+
+#include <asm/cpufeature.h>
+
+static inline bool arch_cpuidle_mwait_needs_ipi(void)
+{
+ return boot_cpu_has_bug(X86_BUG_MONITOR);
+}
+
+#endif /* _ASM_X86_CPUIDLE_H */
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 66cb5536d91e..0f29dd92b346 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -23,6 +23,7 @@
#include <linux/perf_event.h>
#include <acpi/processor.h>
#include <linux/context_tracking.h>
+#include <asm/cpuidle.h>
/*
* Include the apic definitions for x86 to have the APIC timer related defines
@@ -806,7 +807,8 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
drv->safe_state_index = count;
- if (cx->entry_method == ACPI_CSTATE_FFH)
+ if (cx->entry_method == ACPI_CSTATE_FFH &&
+ !arch_cpuidle_mwait_needs_ipi())
state->flags |= CPUIDLE_FLAG_MWAIT;
/*
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index d52723fbeb04..b2f494effd4a 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -56,6 +56,7 @@
#include <asm/mwait.h>
#include <asm/spec-ctrl.h>
#include <asm/fpu/api.h>
+#include <asm/cpuidle.h>
#define INTEL_IDLE_VERSION "0.5.1"
@@ -1787,7 +1788,10 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
if (cx->type > ACPI_STATE_C1)
state->target_residency *= 3;
- state->flags = MWAIT2flg(cx->address) | CPUIDLE_FLAG_MWAIT;
+ state->flags = MWAIT2flg(cx->address);
+
+ if (!arch_cpuidle_mwait_needs_ipi())
+ state->flags |= CPUIDLE_FLAG_MWAIT;
if (cx->type > ACPI_STATE_C2)
state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
@@ -2073,7 +2077,8 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
static void state_update_enter_method(struct cpuidle_state *state, int cstate)
{
- state->flags |= CPUIDLE_FLAG_MWAIT;
+ if (!arch_cpuidle_mwait_needs_ipi())
+ state->flags |= CPUIDLE_FLAG_MWAIT;
if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
/*
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 1b43c3a77012..7754da499d16 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -13,6 +13,7 @@ mandatory-y += cacheflush.h
mandatory-y += cfi.h
mandatory-y += checksum.h
mandatory-y += compat.h
+mandatory-y += cpuidle.h
mandatory-y += current.h
mandatory-y += delay.h
mandatory-y += device.h
diff --git a/include/asm-generic/cpuidle.h b/include/asm-generic/cpuidle.h
new file mode 100644
index 000000000000..748a2022ed2a
--- /dev/null
+++ b/include/asm-generic/cpuidle.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_CPUIDLE_H
+#define __ASM_CPUIDLE_H
+
+static inline bool arch_cpuidle_mwait_needs_ipi(void)
+{
+ return true;
+}
+
+#endif /* __ASM_CPUIDLE_H */
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
` (2 preceding siblings ...)
2025-01-02 15:01 ` [PATCH 3/6] x86/cpuidle: Move buggy mwait implementations away from CPUIDLE_FLAG_MWAIT Frederic Weisbecker
@ 2025-01-02 15:01 ` Frederic Weisbecker
2025-01-14 14:50 ` Rafael J. Wysocki
2025-01-17 18:09 ` K Prateek Nayak
2025-01-02 15:01 ` [PATCH 5/6] cpuidle: Remove call_cpuidle_s2idle() Frederic Weisbecker
` (2 subsequent siblings)
6 siblings, 2 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2025-01-02 15:01 UTC (permalink / raw)
To: LKML
Cc: Peter Zijlstra, Jacob Pan, Rafael J . Wysocki, Daniel Lezcano,
linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Frederic Weisbecker
From: Peter Zijlstra <peterz@infradead.org>
The current handling of TIF_NR_POLLING is a bit of a maze:
1) TIF_NR_POLLING is set on idle entry (one atomic set)
2) Once cpuidle has selected an appropriate state and the tick is
evaluated and then possibly stopped, TIF_NR_POLLING is cleared
(one RmW operation)
2) The cpuidle state is then called with TIF_NR_POLLING cleared but if
the state polls on (or monitors) need_resched() it sets again
TIF_NR_POLLING before sleeping and clears it on wake-up. Summary:
another pair of set/clear
3) Set back TIF_NR_POLLING (one atomic set)
4) goto 2) if need_resched() is not set
All those costly atomic operations, fully ordered RmW for some of
them, could be avoided if the cpuidle core knew in advance if the target
state polls on (or monitors) need_resched(). If so, TIF_NR_POLLING could
simply be set once upon entering the idle loop and cleared once after
idle loop exit.
Start dealing with that with handling TIF_NR_POLLING on behalf of
mwait based states.
[fweisbec: _ Handle broadcast properly
_ Ignore mwait_idle() as it can be used by default_idle_call()]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/x86/include/asm/mwait.h | 27 +++++++++++-------------
drivers/cpuidle/cpuidle.c | 22 +++++++++++++++++++-
include/linux/sched/idle.h | 7 ++++++-
kernel/sched/idle.c | 40 +++++++++++++-----------------------
4 files changed, 53 insertions(+), 43 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 920426d691ce..3e06a7f3bf5a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -116,25 +116,22 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
*/
static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
- if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
- if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
- mb();
- clflush((void *)¤t_thread_info()->flags);
- mb();
- }
+ if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
+ mb();
+ clflush((void *)¤t_thread_info()->flags);
+ mb();
+ }
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
+ __monitor((void *)¤t_thread_info()->flags, 0, 0);
- if (!need_resched()) {
- if (ecx & 1) {
- __mwait(eax, ecx);
- } else {
- __sti_mwait(eax, ecx);
- raw_local_irq_disable();
- }
+ if (!need_resched()) {
+ if (ecx & 1) {
+ __mwait(eax, ecx);
+ } else {
+ __sti_mwait(eax, ecx);
+ raw_local_irq_disable();
}
}
- current_clr_polling();
}
/*
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 0835da449db8..46c0a2726f67 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -217,10 +217,10 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
int index)
{
int entered_state;
-
struct cpuidle_state *target_state = &drv->states[index];
bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
ktime_t time_start, time_end;
+ bool polling;
instrumentation_begin();
@@ -237,6 +237,23 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
broadcast = false;
}
+ polling = target_state->flags & CPUIDLE_FLAG_MWAIT;
+
+ /*
+ * If the target state doesn't poll on need_resched(), this is
+ * the last check after which further TIF_NEED_RESCHED remote setting
+ * will involve an IPI.
+ */
+ if (!polling && current_clr_polling_and_test()) {
+ if (broadcast)
+ tick_broadcast_exit();
+ dev->last_residency_ns = 0;
+ local_irq_enable();
+ instrumentation_end();
+ return -EBUSY;
+ }
+
+
if (target_state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
leave_mm();
@@ -336,6 +353,9 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
dev->states_usage[index].rejected++;
}
+ if (!polling)
+ __current_set_polling();
+
instrumentation_end();
return entered_state;
diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index e670ac282333..3e3482bfb028 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -68,6 +68,8 @@ static __always_inline bool __must_check current_set_polling_and_test(void)
static __always_inline bool __must_check current_clr_polling_and_test(void)
{
+ bool ret;
+
__current_clr_polling();
/*
@@ -76,7 +78,10 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
*/
smp_mb__after_atomic();
- return unlikely(tif_need_resched());
+ ret = unlikely(tif_need_resched());
+ if (ret)
+ __current_set_polling();
+ return ret;
}
#else
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 621696269584..9eece3df1080 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -114,12 +114,13 @@ void __cpuidle default_idle_call(void)
stop_critical_timings();
ct_cpuidle_enter();
- arch_cpu_idle();
+ arch_cpu_idle(); // XXX assumes !polling
ct_cpuidle_exit();
start_critical_timings();
trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
cond_tick_broadcast_exit();
+ __current_set_polling();
}
local_irq_enable();
instrumentation_end();
@@ -128,31 +129,14 @@ void __cpuidle default_idle_call(void)
static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
struct cpuidle_device *dev)
{
+ int ret;
+
if (current_clr_polling_and_test())
return -EBUSY;
- return cpuidle_enter_s2idle(drv, dev);
-}
-
-static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
- int next_state)
-{
- /*
- * The idle task must be scheduled, it is pointless to go to idle, just
- * update no idle residency and return.
- */
- if (current_clr_polling_and_test()) {
- dev->last_residency_ns = 0;
- local_irq_enable();
- return -EBUSY;
- }
-
- /*
- * Enter the idle state previously returned by the governor decision.
- * This function will block until an interrupt occurs and will take
- * care of re-enabling the local interrupts
- */
- return cpuidle_enter(drv, dev, next_state);
+ ret = cpuidle_enter_s2idle(drv, dev);
+ __current_set_polling();
+ return ret;
}
/**
@@ -213,7 +197,7 @@ static void cpuidle_idle_call(void)
tick_nohz_idle_stop_tick();
next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
- call_cpuidle(drv, dev, next_state);
+ cpuidle_enter(drv, dev, next_state);
} else {
bool stop_tick = true;
@@ -227,7 +211,12 @@ static void cpuidle_idle_call(void)
else
tick_nohz_idle_retain_tick();
- entered_state = call_cpuidle(drv, dev, next_state);
+ /*
+ * Enter the idle state previously returned by the governor decision.
+ * This function will block until an interrupt occurs and will take
+ * care of re-enabling the local interrupts.
+ */
+ entered_state = cpuidle_enter(drv, dev, next_state);
/*
* Give the governor an opportunity to reflect on the outcome
*/
@@ -235,7 +224,6 @@ static void cpuidle_idle_call(void)
}
exit_idle:
- __current_set_polling();
/*
* It is up to the idle functions to re-enable local interrupts
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/6] cpuidle: Remove call_cpuidle_s2idle()
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
` (3 preceding siblings ...)
2025-01-02 15:01 ` [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states Frederic Weisbecker
@ 2025-01-02 15:01 ` Frederic Weisbecker
2025-01-14 14:53 ` Rafael J. Wysocki
2025-01-02 15:02 ` [PATCH 6/6] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states Frederic Weisbecker
2025-01-17 17:51 ` [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of " K Prateek Nayak
6 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2025-01-02 15:01 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, linux-pm, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, Rafael J. Wysocki
This middle call is unecessary, especially now that its counterpart
call_cpuidle() has been removed.
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/sched/idle.c | 23 ++++++-----------------
1 file changed, 6 insertions(+), 17 deletions(-)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 9eece3df1080..86b902eb24fe 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -126,19 +126,6 @@ void __cpuidle default_idle_call(void)
instrumentation_end();
}
-static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
- struct cpuidle_device *dev)
-{
- int ret;
-
- if (current_clr_polling_and_test())
- return -EBUSY;
-
- ret = cpuidle_enter_s2idle(drv, dev);
- __current_set_polling();
- return ret;
-}
-
/**
* cpuidle_idle_call - the main idle function
*
@@ -184,10 +171,12 @@ static void cpuidle_idle_call(void)
u64 max_latency_ns;
if (idle_should_enter_s2idle()) {
-
- entered_state = call_cpuidle_s2idle(drv, dev);
- if (entered_state > 0)
- goto exit_idle;
+ if (!current_clr_polling_and_test()) {
+ entered_state = cpuidle_enter_s2idle(drv, dev);
+ __current_set_polling();
+ if (entered_state > 0)
+ goto exit_idle;
+ }
max_latency_ns = U64_MAX;
} else {
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/6] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
` (4 preceding siblings ...)
2025-01-02 15:01 ` [PATCH 5/6] cpuidle: Remove call_cpuidle_s2idle() Frederic Weisbecker
@ 2025-01-02 15:02 ` Frederic Weisbecker
2025-01-14 14:56 ` Rafael J. Wysocki
2025-01-17 17:51 ` [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of " K Prateek Nayak
6 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2025-01-02 15:02 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, linux-pm, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen
Software polling idle states set again TIF_NR_POLLING and clear it upon
exit. This involves error prone duplicated code and wasted cycles
performing atomic operations, sometimes RmW fully ordered.
To avoid this, benefit instead from the same generic TIF_NR_POLLING
handling that is currently in use for hardware monitoring states.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
drivers/cpuidle/cpuidle-powernv.c | 10 ----------
drivers/cpuidle/cpuidle-pseries.c | 11 -----------
drivers/cpuidle/cpuidle.c | 2 +-
drivers/cpuidle/poll_state.c | 30 ++++++++++++------------------
4 files changed, 13 insertions(+), 40 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 9ebedd972df0..1bf0d2234016 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -71,8 +71,6 @@ static int snooze_loop(struct cpuidle_device *dev,
{
u64 snooze_exit_time;
- set_thread_flag(TIF_POLLING_NRFLAG);
-
local_irq_enable();
snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index);
@@ -81,21 +79,13 @@ static int snooze_loop(struct cpuidle_device *dev,
HMT_very_low();
while (!need_resched()) {
if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
- /*
- * Task has not woken up but we are exiting the polling
- * loop anyway. Require a barrier after polling is
- * cleared to order subsequent test of need_resched().
- */
- clear_thread_flag(TIF_POLLING_NRFLAG);
dev->poll_time_limit = true;
- smp_mb();
break;
}
}
HMT_medium();
ppc64_runlatch_on();
- clear_thread_flag(TIF_POLLING_NRFLAG);
local_irq_disable();
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index f68c65f1d023..704bb01d9e9e 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -40,8 +40,6 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
{
u64 snooze_exit_time;
- set_thread_flag(TIF_POLLING_NRFLAG);
-
pseries_idle_prolog();
raw_local_irq_enable();
snooze_exit_time = get_tb() + snooze_timeout;
@@ -51,21 +49,12 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
HMT_low();
HMT_very_low();
if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
- /*
- * Task has not woken up but we are exiting the polling
- * loop anyway. Require a barrier after polling is
- * cleared to order subsequent test of need_resched().
- */
dev->poll_time_limit = true;
- clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
break;
}
}
HMT_medium();
- clear_thread_flag(TIF_POLLING_NRFLAG);
-
raw_local_irq_disable();
pseries_idle_epilog();
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 46c0a2726f67..fecc50c2860e 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -237,7 +237,7 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
broadcast = false;
}
- polling = target_state->flags & CPUIDLE_FLAG_MWAIT;
+ polling = target_state->flags & (CPUIDLE_FLAG_MWAIT | CPUIDLE_FLAG_POLLING);
/*
* If the target state doesn't poll on need_resched(), this is
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..d69936e2517e 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -13,35 +13,29 @@
static int __cpuidle poll_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- u64 time_start;
-
- time_start = local_clock_noinstr();
+ u64 time_start = local_clock_noinstr();
+ unsigned int loop_count = 0;
+ u64 limit;
dev->poll_time_limit = false;
raw_local_irq_enable();
- if (!current_set_polling_and_test()) {
- unsigned int loop_count = 0;
- u64 limit;
- limit = cpuidle_poll_time(drv, dev);
+ limit = cpuidle_poll_time(drv, dev);
- while (!need_resched()) {
- cpu_relax();
- if (loop_count++ < POLL_IDLE_RELAX_COUNT)
- continue;
+ while (!need_resched()) {
+ cpu_relax();
+ if (loop_count++ < POLL_IDLE_RELAX_COUNT)
+ continue;
- loop_count = 0;
- if (local_clock_noinstr() - time_start > limit) {
- dev->poll_time_limit = true;
- break;
- }
+ loop_count = 0;
+ if (local_clock_noinstr() - time_start > limit) {
+ dev->poll_time_limit = true;
+ break;
}
}
raw_local_irq_disable();
- current_clr_polling();
-
return index;
}
--
2.46.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT
2025-01-02 15:01 ` [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT Frederic Weisbecker
@ 2025-01-14 14:01 ` Rafael J. Wysocki
2025-01-14 14:34 ` Sudeep Holla
0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-01-14 14:01 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Jacob Pan
On Thu, Jan 2, 2025 at 4:02 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> Provide a way to tell the cpuidle core about states monitoring
> TIF_NEED_RESCHED on the hardware level, monitor/mwait users being the
> only examples in use.
>
> This will allow cpuidle core to manage TIF_NR_POLLING on behalf of all
> kinds of TIF_NEED_RESCHED watching states while keeping a necessary
> distinction for the governors between software loops polling on
> TIF_NEED_RESCHED and hardware monitored writes to thread flags.
>
> [fweisbec: _ Initialize flag from acpi_processor_setup_cstates() instead
> of acpi_processor_setup_lpi_states(), as the latter seem to
> be about arm64...
> _ Rename CPUIDLE_FLAG_NO_IPI to CPUIDLE_FLAG_MWAIT]
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> drivers/acpi/processor_idle.c | 3 +++
> drivers/idle/intel_idle.c | 5 ++++-
> include/linux/cpuidle.h | 1 +
> 3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 698897b29de2..66cb5536d91e 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -806,6 +806,9 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
> if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
> drv->safe_state_index = count;
>
> + if (cx->entry_method == ACPI_CSTATE_FFH)
> + state->flags |= CPUIDLE_FLAG_MWAIT;
FFH need not mean MWAIT in principle.
FFH in _CST means MWAIT in practice because _CST is used on x86 which
implements FFH through MWAIT, but it would be good at least to put a
comment here to explain that this code is only expected to run on x86.
Or better still, add something like acpi_arch_idle_state_flags(u8
entry_method) that will return CPUIDLE_FLAG_MWAIT for ACPI_CSTATE_FFH
and 0 otherwise and then do
state->flags |= acpi_arch_idle_state_flags(cx->entry_method);
> +
> /*
> * Halt-induced C1 is not good for ->enter_s2idle, because it
> * re-enables interrupts on exit. Moreover, C1 is generally not
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index ac4d8faa3886..d52723fbeb04 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1787,7 +1787,8 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
> if (cx->type > ACPI_STATE_C1)
> state->target_residency *= 3;
>
> - state->flags = MWAIT2flg(cx->address);
> + state->flags = MWAIT2flg(cx->address) | CPUIDLE_FLAG_MWAIT;
> +
> if (cx->type > ACPI_STATE_C2)
> state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
>
> @@ -2072,6 +2073,8 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>
> static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> {
> + state->flags |= CPUIDLE_FLAG_MWAIT;
> +
> if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
> /*
> * Combining with XSTATE with IBRS or IRQ_ENABLE flags
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index a9ee4fe55dcf..b8084617aa27 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -85,6 +85,7 @@ struct cpuidle_state {
> #define CPUIDLE_FLAG_OFF BIT(4) /* disable this state by default */
> #define CPUIDLE_FLAG_TLB_FLUSHED BIT(5) /* idle-state flushes TLBs */
> #define CPUIDLE_FLAG_RCU_IDLE BIT(6) /* idle-state takes care of RCU */
> +#define CPUIDLE_FLAG_MWAIT BIT(7) /* hardware need_resched() monitoring */
>
> struct cpuidle_device_kobj;
> struct cpuidle_state_kobj;
> --
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT
2025-01-14 14:01 ` Rafael J. Wysocki
@ 2025-01-14 14:34 ` Sudeep Holla
2025-01-14 14:37 ` Rafael J. Wysocki
0 siblings, 1 reply; 16+ messages in thread
From: Sudeep Holla @ 2025-01-14 14:34 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Frederic Weisbecker, Sudeep Holla, LKML, Peter Zijlstra,
Daniel Lezcano, linux-pm, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, Jacob Pan
On Tue, Jan 14, 2025 at 03:01:26PM +0100, Rafael J. Wysocki wrote:
> On Thu, Jan 2, 2025 at 4:02 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > From: Peter Zijlstra <peterz@infradead.org>
> >
> > Provide a way to tell the cpuidle core about states monitoring
> > TIF_NEED_RESCHED on the hardware level, monitor/mwait users being the
> > only examples in use.
> >
> > This will allow cpuidle core to manage TIF_NR_POLLING on behalf of all
> > kinds of TIF_NEED_RESCHED watching states while keeping a necessary
> > distinction for the governors between software loops polling on
> > TIF_NEED_RESCHED and hardware monitored writes to thread flags.
> >
> > [fweisbec: _ Initialize flag from acpi_processor_setup_cstates() instead
> > of acpi_processor_setup_lpi_states(), as the latter seem to
> > be about arm64...
> > _ Rename CPUIDLE_FLAG_NO_IPI to CPUIDLE_FLAG_MWAIT]
> >
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> > drivers/acpi/processor_idle.c | 3 +++
> > drivers/idle/intel_idle.c | 5 ++++-
> > include/linux/cpuidle.h | 1 +
> > 3 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> > index 698897b29de2..66cb5536d91e 100644
> > --- a/drivers/acpi/processor_idle.c
> > +++ b/drivers/acpi/processor_idle.c
> > @@ -806,6 +806,9 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
> > if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
> > drv->safe_state_index = count;
> >
> > + if (cx->entry_method == ACPI_CSTATE_FFH)
> > + state->flags |= CPUIDLE_FLAG_MWAIT;
>
> FFH need not mean MWAIT in principle.
>
> FFH in _CST means MWAIT in practice because _CST is used on x86 which
> implements FFH through MWAIT, but it would be good at least to put a
> comment here to explain that this code is only expected to run on x86.
>
> Or better still, add something like acpi_arch_idle_state_flags(u8
> entry_method) that will return CPUIDLE_FLAG_MWAIT for ACPI_CSTATE_FFH
> and 0 otherwise and then do
>
> state->flags |= acpi_arch_idle_state_flags(cx->entry_method);
>
+1, was about to suggest the same. Though I am not aware of any Arm platforms
using C-States(LPI was added to suit Arm requirements), it is better to keep
the FFH definition arch specific.
--
Regards,
Sudeep
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/6] x86/cpuidle: Move buggy mwait implementations away from CPUIDLE_FLAG_MWAIT
2025-01-02 15:01 ` [PATCH 3/6] x86/cpuidle: Move buggy mwait implementations away from CPUIDLE_FLAG_MWAIT Frederic Weisbecker
@ 2025-01-14 14:35 ` Rafael J. Wysocki
0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-01-14 14:35 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Daniel Lezcano, linux-pm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Russell King,
Catalin Marinas, Will Deacon, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Madhavan Srinivasan, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Len Brown, Arnd Bergmann, linux-arch
On Thu, Jan 2, 2025 at 4:02 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Buggy MWAIT implementations can't carry the CPUIDLE_FLAG_MWAIT flag
> because they require IPIs to wake up. Therefore they shouldn't be
> called with TIF_NR_POLLING.
>
> Reported-by: Rafael J. Wysocki <rafael@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
I would do a patch introducing arch_cpuidle_mwait_needs_ipi() or
equivalent before patch [2/6] and I would use it in patch [2/6] right
away.
> ---
> arch/arm/include/asm/cpuidle.h | 2 ++
> arch/arm64/include/asm/cpuidle.h | 3 +++
> arch/powerpc/include/asm/cpuidle.h | 4 ++++
> arch/riscv/include/asm/cpuidle.h | 2 ++
> arch/x86/include/asm/cpuidle.h | 12 ++++++++++++
> drivers/acpi/processor_idle.c | 4 +++-
> drivers/idle/intel_idle.c | 9 +++++++--
> include/asm-generic/Kbuild | 1 +
> include/asm-generic/cpuidle.h | 10 ++++++++++
> 9 files changed, 44 insertions(+), 3 deletions(-)
> create mode 100644 arch/x86/include/asm/cpuidle.h
> create mode 100644 include/asm-generic/cpuidle.h
>
> diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
> index 397be5ed30e7..0ea1d2ec837d 100644
> --- a/arch/arm/include/asm/cpuidle.h
> +++ b/arch/arm/include/asm/cpuidle.h
> @@ -55,4 +55,6 @@ struct arm_cpuidle_irq_context { };
> #define arm_cpuidle_save_irq_context(c) (void)c
> #define arm_cpuidle_restore_irq_context(c) (void)c
>
> +#include <asm-generic/cpuidle.h>
> +
> #endif
> diff --git a/arch/arm64/include/asm/cpuidle.h b/arch/arm64/include/asm/cpuidle.h
> index 2047713e097d..ef49124135a7 100644
> --- a/arch/arm64/include/asm/cpuidle.h
> +++ b/arch/arm64/include/asm/cpuidle.h
> @@ -38,4 +38,7 @@ struct arm_cpuidle_irq_context { };
> #define arm_cpuidle_save_irq_context(c) (void)c
> #define arm_cpuidle_restore_irq_context(c) (void)c
> #endif
> +
> +#include <asm-generic/cpuidle.h>
> +
> #endif
> diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
> index 0cce5dc7fb1c..788706bc04ec 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -102,4 +102,8 @@ static inline void report_invalid_psscr_val(u64 psscr_val, int err)
>
> #endif
>
> +#ifndef __ASSEMBLY__
> +#include <asm-generic/cpuidle.h>
> +#endif
> +
> #endif
> diff --git a/arch/riscv/include/asm/cpuidle.h b/arch/riscv/include/asm/cpuidle.h
> index 71fdc607d4bc..1f1b24901d86 100644
> --- a/arch/riscv/include/asm/cpuidle.h
> +++ b/arch/riscv/include/asm/cpuidle.h
> @@ -21,4 +21,6 @@ static inline void cpu_do_idle(void)
> wait_for_interrupt();
> }
>
> +#include <asm-generic/cpuidle.h>
> +
> #endif
> diff --git a/arch/x86/include/asm/cpuidle.h b/arch/x86/include/asm/cpuidle.h
> new file mode 100644
> index 000000000000..a59db1a3314a
> --- /dev/null
> +++ b/arch/x86/include/asm/cpuidle.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_CPUIDLE_H
> +#define _ASM_X86_CPUIDLE_H
> +
> +#include <asm/cpufeature.h>
> +
> +static inline bool arch_cpuidle_mwait_needs_ipi(void)
> +{
> + return boot_cpu_has_bug(X86_BUG_MONITOR);
> +}
> +
> +#endif /* _ASM_X86_CPUIDLE_H */
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 66cb5536d91e..0f29dd92b346 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -23,6 +23,7 @@
> #include <linux/perf_event.h>
> #include <acpi/processor.h>
> #include <linux/context_tracking.h>
> +#include <asm/cpuidle.h>
>
> /*
> * Include the apic definitions for x86 to have the APIC timer related defines
> @@ -806,7 +807,8 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
> if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
> drv->safe_state_index = count;
>
> - if (cx->entry_method == ACPI_CSTATE_FFH)
> + if (cx->entry_method == ACPI_CSTATE_FFH &&
> + !arch_cpuidle_mwait_needs_ipi())
> state->flags |= CPUIDLE_FLAG_MWAIT;
>
> /*
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index d52723fbeb04..b2f494effd4a 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -56,6 +56,7 @@
> #include <asm/mwait.h>
> #include <asm/spec-ctrl.h>
> #include <asm/fpu/api.h>
> +#include <asm/cpuidle.h>
>
> #define INTEL_IDLE_VERSION "0.5.1"
>
> @@ -1787,7 +1788,10 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
> if (cx->type > ACPI_STATE_C1)
> state->target_residency *= 3;
>
> - state->flags = MWAIT2flg(cx->address) | CPUIDLE_FLAG_MWAIT;
> + state->flags = MWAIT2flg(cx->address);
> +
> + if (!arch_cpuidle_mwait_needs_ipi())
> + state->flags |= CPUIDLE_FLAG_MWAIT;
>
> if (cx->type > ACPI_STATE_C2)
> state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
> @@ -2073,7 +2077,8 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>
> static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> {
> - state->flags |= CPUIDLE_FLAG_MWAIT;
> + if (!arch_cpuidle_mwait_needs_ipi())
> + state->flags |= CPUIDLE_FLAG_MWAIT;
Also, some code duplication could be avoided by having something like
arch_x86_mwait_state() returning the flag if the condition is met:
state->flags |= arch_x86_mwait_state();
>
> if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
> /*
> diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
> index 1b43c3a77012..7754da499d16 100644
> --- a/include/asm-generic/Kbuild
> +++ b/include/asm-generic/Kbuild
> @@ -13,6 +13,7 @@ mandatory-y += cacheflush.h
> mandatory-y += cfi.h
> mandatory-y += checksum.h
> mandatory-y += compat.h
> +mandatory-y += cpuidle.h
> mandatory-y += current.h
> mandatory-y += delay.h
> mandatory-y += device.h
> diff --git a/include/asm-generic/cpuidle.h b/include/asm-generic/cpuidle.h
> new file mode 100644
> index 000000000000..748a2022ed2a
> --- /dev/null
> +++ b/include/asm-generic/cpuidle.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_CPUIDLE_H
> +#define __ASM_CPUIDLE_H
> +
> +static inline bool arch_cpuidle_mwait_needs_ipi(void)
> +{
> + return true;
> +}
> +
> +#endif /* __ASM_CPUIDLE_H */
> --
> 2.46.0
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT
2025-01-14 14:34 ` Sudeep Holla
@ 2025-01-14 14:37 ` Rafael J. Wysocki
0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-01-14 14:37 UTC (permalink / raw)
To: Sudeep Holla, Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Daniel Lezcano, linux-pm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen
On Tue, Jan 14, 2025 at 3:34 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Tue, Jan 14, 2025 at 03:01:26PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Jan 2, 2025 at 4:02 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> > >
> > > From: Peter Zijlstra <peterz@infradead.org>
> > >
> > > Provide a way to tell the cpuidle core about states monitoring
> > > TIF_NEED_RESCHED on the hardware level, monitor/mwait users being the
> > > only examples in use.
> > >
> > > This will allow cpuidle core to manage TIF_NR_POLLING on behalf of all
> > > kinds of TIF_NEED_RESCHED watching states while keeping a necessary
> > > distinction for the governors between software loops polling on
> > > TIF_NEED_RESCHED and hardware monitored writes to thread flags.
> > >
> > > [fweisbec: _ Initialize flag from acpi_processor_setup_cstates() instead
> > > of acpi_processor_setup_lpi_states(), as the latter seem to
> > > be about arm64...
> > > _ Rename CPUIDLE_FLAG_NO_IPI to CPUIDLE_FLAG_MWAIT]
> > >
> > > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > > drivers/acpi/processor_idle.c | 3 +++
> > > drivers/idle/intel_idle.c | 5 ++++-
> > > include/linux/cpuidle.h | 1 +
> > > 3 files changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> > > index 698897b29de2..66cb5536d91e 100644
> > > --- a/drivers/acpi/processor_idle.c
> > > +++ b/drivers/acpi/processor_idle.c
> > > @@ -806,6 +806,9 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
> > > if (cx->type == ACPI_STATE_C1 || cx->type == ACPI_STATE_C2)
> > > drv->safe_state_index = count;
> > >
> > > + if (cx->entry_method == ACPI_CSTATE_FFH)
> > > + state->flags |= CPUIDLE_FLAG_MWAIT;
> >
> > FFH need not mean MWAIT in principle.
> >
> > FFH in _CST means MWAIT in practice because _CST is used on x86 which
> > implements FFH through MWAIT, but it would be good at least to put a
> > comment here to explain that this code is only expected to run on x86.
> >
> > Or better still, add something like acpi_arch_idle_state_flags(u8
> > entry_method) that will return CPUIDLE_FLAG_MWAIT for ACPI_CSTATE_FFH
> > and 0 otherwise and then do
> >
> > state->flags |= acpi_arch_idle_state_flags(cx->entry_method);
> >
>
> +1, was about to suggest the same. Though I am not aware of any Arm platforms
> using C-States(LPI was added to suit Arm requirements), it is better to keep
> the FFH definition arch specific.
Which will be consistent with this patch among other things:
https://lore.kernel.org/linux-pm/20250110115953.6058-3-patryk.wlazlyn@linux.intel.com/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states
2025-01-02 15:01 ` [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states Frederic Weisbecker
@ 2025-01-14 14:50 ` Rafael J. Wysocki
2025-01-17 18:09 ` K Prateek Nayak
1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-01-14 14:50 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Daniel Lezcano, linux-pm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen
On Thu, Jan 2, 2025 at 4:02 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> The current handling of TIF_NR_POLLING is a bit of a maze:
>
> 1) TIF_NR_POLLING is set on idle entry (one atomic set)
>
> 2) Once cpuidle has selected an appropriate state and the tick is
> evaluated and then possibly stopped, TIF_NR_POLLING is cleared
> (one RmW operation)
>
> 2) The cpuidle state is then called with TIF_NR_POLLING cleared but if
> the state polls on (or monitors) need_resched() it sets again
> TIF_NR_POLLING before sleeping and clears it on wake-up. Summary:
> another pair of set/clear
>
> 3) Set back TIF_NR_POLLING (one atomic set)
>
> 4) goto 2) if need_resched() is not set
>
> All those costly atomic operations, fully ordered RmW for some of
> them, could be avoided if the cpuidle core knew in advance if the target
> state polls on (or monitors) need_resched(). If so, TIF_NR_POLLING could
> simply be set once upon entering the idle loop and cleared once after
> idle loop exit.
>
> Start dealing with that with handling TIF_NR_POLLING on behalf of
> mwait based states.
>
> [fweisbec: _ Handle broadcast properly
> _ Ignore mwait_idle() as it can be used by default_idle_call()]
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> arch/x86/include/asm/mwait.h | 27 +++++++++++-------------
> drivers/cpuidle/cpuidle.c | 22 +++++++++++++++++++-
> include/linux/sched/idle.h | 7 ++++++-
> kernel/sched/idle.c | 40 +++++++++++++-----------------------
> 4 files changed, 53 insertions(+), 43 deletions(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 920426d691ce..3e06a7f3bf5a 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -116,25 +116,22 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> */
> static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> {
> - if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
> - if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
> - mb();
> - clflush((void *)¤t_thread_info()->flags);
> - mb();
> - }
> + if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
> + mb();
> + clflush((void *)¤t_thread_info()->flags);
> + mb();
> + }
>
> - __monitor((void *)¤t_thread_info()->flags, 0, 0);
> + __monitor((void *)¤t_thread_info()->flags, 0, 0);
>
> - if (!need_resched()) {
> - if (ecx & 1) {
> - __mwait(eax, ecx);
> - } else {
> - __sti_mwait(eax, ecx);
> - raw_local_irq_disable();
> - }
> + if (!need_resched()) {
> + if (ecx & 1) {
> + __mwait(eax, ecx);
> + } else {
> + __sti_mwait(eax, ecx);
> + raw_local_irq_disable();
> }
> }
> - current_clr_polling();
> }
>
> /*
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 0835da449db8..46c0a2726f67 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -217,10 +217,10 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
> int index)
> {
> int entered_state;
> -
> struct cpuidle_state *target_state = &drv->states[index];
> bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
> ktime_t time_start, time_end;
> + bool polling;
>
> instrumentation_begin();
>
> @@ -237,6 +237,23 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
> broadcast = false;
> }
>
> + polling = target_state->flags & CPUIDLE_FLAG_MWAIT;
Hmmm. What about "polling" states, like state 0 on all x86?
They also monitor need_resched() -see poll_idle().
> +
> + /*
> + * If the target state doesn't poll on need_resched(), this is
> + * the last check after which further TIF_NEED_RESCHED remote setting
> + * will involve an IPI.
> + */
> + if (!polling && current_clr_polling_and_test()) {
> + if (broadcast)
> + tick_broadcast_exit();
> + dev->last_residency_ns = 0;
> + local_irq_enable();
> + instrumentation_end();
> + return -EBUSY;
> + }
> +
> +
> if (target_state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
> leave_mm();
>
> @@ -336,6 +353,9 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
> dev->states_usage[index].rejected++;
> }
>
> + if (!polling)
> + __current_set_polling();
> +
> instrumentation_end();
>
> return entered_state;
> diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
> index e670ac282333..3e3482bfb028 100644
> --- a/include/linux/sched/idle.h
> +++ b/include/linux/sched/idle.h
> @@ -68,6 +68,8 @@ static __always_inline bool __must_check current_set_polling_and_test(void)
>
> static __always_inline bool __must_check current_clr_polling_and_test(void)
> {
> + bool ret;
> +
> __current_clr_polling();
>
> /*
> @@ -76,7 +78,10 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
> */
> smp_mb__after_atomic();
>
> - return unlikely(tif_need_resched());
> + ret = unlikely(tif_need_resched());
> + if (ret)
> + __current_set_polling();
> + return ret;
> }
>
> #else
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 621696269584..9eece3df1080 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -114,12 +114,13 @@ void __cpuidle default_idle_call(void)
> stop_critical_timings();
>
> ct_cpuidle_enter();
> - arch_cpu_idle();
> + arch_cpu_idle(); // XXX assumes !polling
Well, what if x86_idle is default_idle()? Say, somebody boots with
IDLE_NOMWAIT?
> ct_cpuidle_exit();
>
> start_critical_timings();
> trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> cond_tick_broadcast_exit();
> + __current_set_polling();
> }
> local_irq_enable();
> instrumentation_end();
> @@ -128,31 +129,14 @@ void __cpuidle default_idle_call(void)
> static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
> struct cpuidle_device *dev)
> {
> + int ret;
> +
> if (current_clr_polling_and_test())
> return -EBUSY;
>
> - return cpuidle_enter_s2idle(drv, dev);
> -}
> -
> -static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> - int next_state)
> -{
> - /*
> - * The idle task must be scheduled, it is pointless to go to idle, just
> - * update no idle residency and return.
> - */
> - if (current_clr_polling_and_test()) {
> - dev->last_residency_ns = 0;
> - local_irq_enable();
> - return -EBUSY;
> - }
> -
> - /*
> - * Enter the idle state previously returned by the governor decision.
> - * This function will block until an interrupt occurs and will take
> - * care of re-enabling the local interrupts
> - */
> - return cpuidle_enter(drv, dev, next_state);
> + ret = cpuidle_enter_s2idle(drv, dev);
> + __current_set_polling();
> + return ret;
> }
>
> /**
> @@ -213,7 +197,7 @@ static void cpuidle_idle_call(void)
> tick_nohz_idle_stop_tick();
>
> next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
> - call_cpuidle(drv, dev, next_state);
> + cpuidle_enter(drv, dev, next_state);
> } else {
> bool stop_tick = true;
>
> @@ -227,7 +211,12 @@ static void cpuidle_idle_call(void)
> else
> tick_nohz_idle_retain_tick();
>
> - entered_state = call_cpuidle(drv, dev, next_state);
> + /*
> + * Enter the idle state previously returned by the governor decision.
> + * This function will block until an interrupt occurs and will take
> + * care of re-enabling the local interrupts.
> + */
> + entered_state = cpuidle_enter(drv, dev, next_state);
> /*
> * Give the governor an opportunity to reflect on the outcome
> */
> @@ -235,7 +224,6 @@ static void cpuidle_idle_call(void)
> }
>
> exit_idle:
> - __current_set_polling();
>
> /*
> * It is up to the idle functions to re-enable local interrupts
> --
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/6] cpuidle: Remove call_cpuidle_s2idle()
2025-01-02 15:01 ` [PATCH 5/6] cpuidle: Remove call_cpuidle_s2idle() Frederic Weisbecker
@ 2025-01-14 14:53 ` Rafael J. Wysocki
0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-01-14 14:53 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Daniel Lezcano, linux-pm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Rafael J. Wysocki
On Thu, Jan 2, 2025 at 4:02 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> This middle call is unecessary, especially now that its counterpart
unnecessary (one 'n' is missing)
> call_cpuidle() has been removed.
>
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
For the code change
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> kernel/sched/idle.c | 23 ++++++-----------------
> 1 file changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 9eece3df1080..86b902eb24fe 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -126,19 +126,6 @@ void __cpuidle default_idle_call(void)
> instrumentation_end();
> }
>
> -static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
> - struct cpuidle_device *dev)
> -{
> - int ret;
> -
> - if (current_clr_polling_and_test())
> - return -EBUSY;
> -
> - ret = cpuidle_enter_s2idle(drv, dev);
> - __current_set_polling();
> - return ret;
> -}
> -
> /**
> * cpuidle_idle_call - the main idle function
> *
> @@ -184,10 +171,12 @@ static void cpuidle_idle_call(void)
> u64 max_latency_ns;
>
> if (idle_should_enter_s2idle()) {
> -
> - entered_state = call_cpuidle_s2idle(drv, dev);
> - if (entered_state > 0)
> - goto exit_idle;
> + if (!current_clr_polling_and_test()) {
> + entered_state = cpuidle_enter_s2idle(drv, dev);
> + __current_set_polling();
> + if (entered_state > 0)
> + goto exit_idle;
> + }
>
> max_latency_ns = U64_MAX;
> } else {
> --
> 2.46.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 6/6] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states
2025-01-02 15:02 ` [PATCH 6/6] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states Frederic Weisbecker
@ 2025-01-14 14:56 ` Rafael J. Wysocki
0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2025-01-14 14:56 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Daniel Lezcano, linux-pm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen
On Thu, Jan 2, 2025 at 4:02 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Software polling idle states set again TIF_NR_POLLING and clear it upon
> exit. This involves error prone duplicated code and wasted cycles
> performing atomic operations, sometimes RmW fully ordered.
>
> To avoid this, benefit instead from the same generic TIF_NR_POLLING
> handling that is currently in use for hardware monitoring states.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
So this is the part I was wondering about when I was looking at patch [4/6].
The code changes below look good to me, so feel free to add
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
to this patch when ready.
> ---
> drivers/cpuidle/cpuidle-powernv.c | 10 ----------
> drivers/cpuidle/cpuidle-pseries.c | 11 -----------
> drivers/cpuidle/cpuidle.c | 2 +-
> drivers/cpuidle/poll_state.c | 30 ++++++++++++------------------
> 4 files changed, 13 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 9ebedd972df0..1bf0d2234016 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -71,8 +71,6 @@ static int snooze_loop(struct cpuidle_device *dev,
> {
> u64 snooze_exit_time;
>
> - set_thread_flag(TIF_POLLING_NRFLAG);
> -
> local_irq_enable();
>
> snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index);
> @@ -81,21 +79,13 @@ static int snooze_loop(struct cpuidle_device *dev,
> HMT_very_low();
> while (!need_resched()) {
> if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
> - /*
> - * Task has not woken up but we are exiting the polling
> - * loop anyway. Require a barrier after polling is
> - * cleared to order subsequent test of need_resched().
> - */
> - clear_thread_flag(TIF_POLLING_NRFLAG);
> dev->poll_time_limit = true;
> - smp_mb();
> break;
> }
> }
>
> HMT_medium();
> ppc64_runlatch_on();
> - clear_thread_flag(TIF_POLLING_NRFLAG);
>
> local_irq_disable();
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index f68c65f1d023..704bb01d9e9e 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -40,8 +40,6 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> {
> u64 snooze_exit_time;
>
> - set_thread_flag(TIF_POLLING_NRFLAG);
> -
> pseries_idle_prolog();
> raw_local_irq_enable();
> snooze_exit_time = get_tb() + snooze_timeout;
> @@ -51,21 +49,12 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> HMT_low();
> HMT_very_low();
> if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
> - /*
> - * Task has not woken up but we are exiting the polling
> - * loop anyway. Require a barrier after polling is
> - * cleared to order subsequent test of need_resched().
> - */
> dev->poll_time_limit = true;
> - clear_thread_flag(TIF_POLLING_NRFLAG);
> - smp_mb();
> break;
> }
> }
>
> HMT_medium();
> - clear_thread_flag(TIF_POLLING_NRFLAG);
> -
> raw_local_irq_disable();
>
> pseries_idle_epilog();
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 46c0a2726f67..fecc50c2860e 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -237,7 +237,7 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
> broadcast = false;
> }
>
> - polling = target_state->flags & CPUIDLE_FLAG_MWAIT;
> + polling = target_state->flags & (CPUIDLE_FLAG_MWAIT | CPUIDLE_FLAG_POLLING);
>
> /*
> * If the target state doesn't poll on need_resched(), this is
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 9b6d90a72601..d69936e2517e 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -13,35 +13,29 @@
> static int __cpuidle poll_idle(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> - u64 time_start;
> -
> - time_start = local_clock_noinstr();
> + u64 time_start = local_clock_noinstr();
> + unsigned int loop_count = 0;
> + u64 limit;
>
> dev->poll_time_limit = false;
>
> raw_local_irq_enable();
> - if (!current_set_polling_and_test()) {
> - unsigned int loop_count = 0;
> - u64 limit;
>
> - limit = cpuidle_poll_time(drv, dev);
> + limit = cpuidle_poll_time(drv, dev);
>
> - while (!need_resched()) {
> - cpu_relax();
> - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> - continue;
> + while (!need_resched()) {
> + cpu_relax();
> + if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> + continue;
>
> - loop_count = 0;
> - if (local_clock_noinstr() - time_start > limit) {
> - dev->poll_time_limit = true;
> - break;
> - }
> + loop_count = 0;
> + if (local_clock_noinstr() - time_start > limit) {
> + dev->poll_time_limit = true;
> + break;
> }
> }
> raw_local_irq_disable();
>
> - current_clr_polling();
> -
> return index;
> }
>
> --
> 2.46.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
` (5 preceding siblings ...)
2025-01-02 15:02 ` [PATCH 6/6] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states Frederic Weisbecker
@ 2025-01-17 17:51 ` K Prateek Nayak
6 siblings, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2025-01-17 17:51 UTC (permalink / raw)
To: Frederic Weisbecker, LKML
Cc: Rafael J . Wysocki, Daniel Lezcano, linux-pm, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Peter Zijlstra
Hello Frederic,
On 1/2/2025 8:31 PM, Frederic Weisbecker wrote:
> The TIF_NR_POLLING handling against TIF_NEED_RESCHED polling/monitoring
> idle states (mwait and also software polling) is a bit messy, with quite
> some wasted cycles spent on useless atomic operations. This tries to
> consolidate this state handling from the cpuidle core.
I tested this series on my 3rd Generation EPYC System with ipistorm and
I've not observed any regressions in IPI throughput to idle CPUs in
TIF_POLLING mode. Additionally tested few microbenchmarks and there were
no surprises there either. Feel free to add:
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
--
Thanks and Regards,
Prateek
>
> Changes since v2:
>
> _ Handle buggy mwait implementations (thanks Rafael)
>
> Frederic Weisbecker (4):
> cpuidle: Remove unnecessary current_clr_polling_and_test() from
> haltpoll
> x86/cpuidle: Move buggy mwait implementations away from
> CPUIDLE_FLAG_MWAIT
> cpuidle: Remove call_cpuidle_s2idle()
> cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle
> states
>
> Peter Zijlstra (2):
> cpuidle: Introduce CPUIDLE_FLAG_MWAIT
> cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states
>
> arch/arm/include/asm/cpuidle.h | 2 ++
> arch/arm64/include/asm/cpuidle.h | 3 ++
> arch/powerpc/include/asm/cpuidle.h | 4 +++
> arch/riscv/include/asm/cpuidle.h | 2 ++
> arch/x86/include/asm/cpuidle.h | 12 +++++++
> arch/x86/include/asm/mwait.h | 27 +++++++--------
> drivers/acpi/processor_idle.c | 5 +++
> drivers/cpuidle/cpuidle-haltpoll.c | 3 --
> drivers/cpuidle/cpuidle-powernv.c | 10 ------
> drivers/cpuidle/cpuidle-pseries.c | 11 -------
> drivers/cpuidle/cpuidle.c | 22 ++++++++++++-
> drivers/cpuidle/poll_state.c | 30 +++++++----------
> drivers/idle/intel_idle.c | 8 +++++
> include/asm-generic/Kbuild | 1 +
> include/asm-generic/cpuidle.h | 10 ++++++
> include/linux/cpuidle.h | 1 +
> include/linux/sched/idle.h | 7 +++-
> kernel/sched/idle.c | 53 +++++++++---------------------
> 18 files changed, 114 insertions(+), 97 deletions(-)
> create mode 100644 arch/x86/include/asm/cpuidle.h
> create mode 100644 include/asm-generic/cpuidle.h
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states
2025-01-02 15:01 ` [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states Frederic Weisbecker
2025-01-14 14:50 ` Rafael J. Wysocki
@ 2025-01-17 18:09 ` K Prateek Nayak
1 sibling, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2025-01-17 18:09 UTC (permalink / raw)
To: Frederic Weisbecker, LKML
Cc: Peter Zijlstra, Jacob Pan, Rafael J . Wysocki, Daniel Lezcano,
linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen
Hello Frederic,
On 1/2/2025 8:31 PM, Frederic Weisbecker wrote:
> [..snip..]
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 621696269584..9eece3df1080 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -114,12 +114,13 @@ void __cpuidle default_idle_call(void)
> stop_critical_timings();
>
> ct_cpuidle_enter();
> - arch_cpu_idle();
> + arch_cpu_idle(); // XXX assumes !polling
> ct_cpuidle_exit();
>
> start_critical_timings();
> trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> cond_tick_broadcast_exit();
> + __current_set_polling();
> }
> local_irq_enable();
> instrumentation_end();
> @@ -128,31 +129,14 @@ void __cpuidle default_idle_call(void)
> static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
> struct cpuidle_device *dev)
> {
> + int ret;
> +
> if (current_clr_polling_and_test())
> return -EBUSY;
>
> - return cpuidle_enter_s2idle(drv, dev);
> -}
> -
> -static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> - int next_state)
> -{
> - /*
> - * The idle task must be scheduled, it is pointless to go to idle, just
> - * update no idle residency and return.
> - */
> - if (current_clr_polling_and_test()) {
nit.
Previously, if TIF_NEED_RESCHED was set by this point, the CPU would
bail out early before entering the idle state but with this change, I
believe only at need_resched() in mwait_idle_with_hints() do we realize
we have a pending IPI / task wakeup. Is this a concern?
In my testing with ipistorm I could not see any difference in IPI
throughput to polling idle CPUs but a bailout before the entry method
for need_resched() can save a few cycles.
--
Thanks and Regards,
Prateek
> - dev->last_residency_ns = 0;
> - local_irq_enable();
> - return -EBUSY;
> - }
> -
> - /*
> - * Enter the idle state previously returned by the governor decision.
> - * This function will block until an interrupt occurs and will take
> - * care of re-enabling the local interrupts
> - */
> - return cpuidle_enter(drv, dev, next_state);
> + ret = cpuidle_enter_s2idle(drv, dev);
> + __current_set_polling();
> + return ret;
> }
>
> /**
> @@ -213,7 +197,7 @@ static void cpuidle_idle_call(void)
> tick_nohz_idle_stop_tick();
>
> next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
> - call_cpuidle(drv, dev, next_state);
> + cpuidle_enter(drv, dev, next_state);
> } else {
> bool stop_tick = true;
>
> @@ -227,7 +211,12 @@ static void cpuidle_idle_call (void)
> else
> tick_nohz_idle_retain_tick();
>
> - entered_state = call_cpuidle(drv, dev, next_state);
> + /*
> + * Enter the idle state previously returned by the governor decision.
> + * This function will block until an interrupt occurs and will take
> + * care of re-enabling the local interrupts.
> + */
> + entered_state = cpuidle_enter(drv, dev, next_state);
> /*
> * Give the governor an opportunity to reflect on the outcome
> */
> @@ -235,7 +224,6 @@ static void cpuidle_idle_call(void)
> }
>
> exit_idle:
> - __current_set_polling();
>
> /*
> * It is up to the idle functions to re-enable local interrupts
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-01-17 18:09 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-02 15:01 [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
2025-01-02 15:01 ` [PATCH 1/6] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
2025-01-02 15:01 ` [PATCH 2/6] cpuidle: Introduce CPUIDLE_FLAG_MWAIT Frederic Weisbecker
2025-01-14 14:01 ` Rafael J. Wysocki
2025-01-14 14:34 ` Sudeep Holla
2025-01-14 14:37 ` Rafael J. Wysocki
2025-01-02 15:01 ` [PATCH 3/6] x86/cpuidle: Move buggy mwait implementations away from CPUIDLE_FLAG_MWAIT Frederic Weisbecker
2025-01-14 14:35 ` Rafael J. Wysocki
2025-01-02 15:01 ` [PATCH 4/6] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_MWAIT states Frederic Weisbecker
2025-01-14 14:50 ` Rafael J. Wysocki
2025-01-17 18:09 ` K Prateek Nayak
2025-01-02 15:01 ` [PATCH 5/6] cpuidle: Remove call_cpuidle_s2idle() Frederic Weisbecker
2025-01-14 14:53 ` Rafael J. Wysocki
2025-01-02 15:02 ` [PATCH 6/6] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states Frederic Weisbecker
2025-01-14 14:56 ` Rafael J. Wysocki
2025-01-17 17:51 ` [PATCH 0/6 v3] cpuidle: Handle TIF_NR_POLLING on behalf of " K Prateek Nayak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).