* [PATCH v10 00/11] arm64: support poll_idle()
@ 2025-02-18 21:33 Ankur Arora
2025-02-18 21:33 ` [PATCH v10 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait() Ankur Arora
` (11 more replies)
0 siblings, 12 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
Hi,
This patchset adds support for polling in idle on arm64 via poll_idle()
and adds the requisite support to acpi-idle and cpuidle-haltpoll.
v10 is a respin of v9 with the timed wait barrier logic
(smp_cond_load_relaxed_timewait()) moved out into a separate
series [0]. (The barrier patches could also do with some eyes.)
Why poll in idle?
==
The benefit of polling in idle is reducing the cost (and latency)
of remote wakeups. With polling enabled, these can be done just by setting the
need-resched bit, eliding both the IPI and the cost of handling the
interrupt on the receiver.
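As an illustration, the wakeup-side decision can be sketched in user-space C.
This is a minimal model, not kernel code; the flag names and bit positions are
stand-ins for the real TIF_* bits:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for the real thread_info flag bits. */
#define POLLING_NRFLAG	(1u << 16)	/* target is polling in idle */
#define NEED_RESCHED	(1u << 1)	/* wakeup pending */

/*
 * Sketch of the remote-wakeup decision: set the need-resched bit and,
 * if the target had advertised that it is polling, skip the IPI.
 * Returns true when an IPI would still be required.
 */
static bool remote_wake_needs_ipi(atomic_uint *ti_flags)
{
	unsigned int old = atomic_fetch_or(ti_flags, NEED_RESCHED);

	return !(old & POLLING_NRFLAG);
}
```

A polling idle CPU then notices NEED_RESCHED on its next poll iteration,
without ever taking an interrupt.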
Comparing sched-pipe performance on a guest VM:
# perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions \
-e sched:sched_wake_idle_without_ipi perf bench sched pipe -l 1000000 --cpu 4
# without polling in idle
Performance counter stats for 'CPU(s) 4,5' (5 runs):
25,229.57 msec task-clock # 2.000 CPUs utilized ( +- 7.75% )
45,821,250,284 cycles # 1.816 GHz ( +- 10.07% )
26,557,496,665 instructions # 0.58 insn per cycle ( +- 0.21% )
0 sched:sched_wake_idle_without_ipi # 0.000 /sec
12.615 +- 0.977 seconds time elapsed ( +- 7.75% )
# polling in idle (with haltpoll):
Performance counter stats for 'CPU(s) 4,5' (5 runs):
15,131.58 msec task-clock # 2.000 CPUs utilized ( +- 10.00% )
34,158,188,839 cycles # 2.257 GHz ( +- 6.91% )
20,824,950,916 instructions # 0.61 insn per cycle ( +- 0.09% )
1,983,822 sched:sched_wake_idle_without_ipi # 131.105 K/sec ( +- 0.78% )
7.566 +- 0.756 seconds time elapsed ( +- 10.00% )
Comparing the two cases, there's a significant drop in both cycles and
instructions executed, and a significant drop in the wakeup latency.
Tomohiro Misono and Haris Okanovic also report similar latency
improvements on Grace and Graviton systems (for v7) [1] [2].
Haris also tested a modified v9 on top of the split out barrier
primitives.
Lifeng also reports improved context switch latency on a bare-metal
machine with acpi-idle [3].
Series layout
==
- patches 1-3,
"cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait()"
"cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL"
"Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig"
switch poll_idle() to using the new barrier interface. Also, do some
munging of related kconfig options.
- patches 4-5,
"arm64: define TIF_POLLING_NRFLAG"
"arm64: add support for poll_idle()"
add arm64 support for the polling flag and enable poll_idle()
support.
- patches 6, 7-11,
"ACPI: processor_idle: Support polling state for LPI"
"cpuidle-haltpoll: define arch_haltpoll_want()"
"governors/haltpoll: drop kvm_para_available() check"
"cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL"
"arm64: idle: export arch_cpu_idle()"
"arm64: support cpuidle-haltpoll"
add support for polling via acpi-idle, and cpuidle-haltpoll.
Changelog
==
v10: respin of v9
- sent out smp_cond_load_relaxed_timeout() separately [0]
- Dropped from this series:
"asm-generic: add barrier smp_cond_load_relaxed_timeout()"
"arm64: barrier: add support for smp_cond_relaxed_timeout()"
"arm64/delay: move some constants out to a separate header"
"arm64: support WFET in smp_cond_relaxed_timeout()"
- reworded some commit messages
v9:
- reworked the series to address a comment from Catalin Marinas
about how v8 was abusing semantics of smp_cond_load_relaxed().
- add poll_idle() support in acpi-idle (Lifeng Zheng)
- dropped some earlier "Tested-by", "Reviewed-by" due to the
above rework.
v8: No logic changes. Largely respin of v7, with changes
noted below:
- move selection of ARCH_HAS_OPTIMIZED_POLL on arm64 to its
own patch.
(patch-9 "arm64: select ARCH_HAS_OPTIMIZED_POLL")
- address comments simplifying arm64 support (Will Deacon)
(patch-11 "arm64: support cpuidle-haltpoll")
v7: No significant logic changes. Mostly a respin of v6.
- minor cleanup in poll_idle() (Christoph Lameter)
- fixes conflicts due to code movement in arch/arm64/kernel/cpuidle.c
(Tomohiro Misono)
v6:
- reordered the patches to keep poll_idle() and ARCH_HAS_OPTIMIZED_POLL
changes together (comment from Christoph Lameter)
- fleshes out the commit messages a bit more (comments from Christoph
Lameter, Sudeep Holla)
- also rework selection of cpuidle-haltpoll. Now selected based
on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
- moved back to arch_haltpoll_want() (comment from Joao Martins)
Also, arch_haltpoll_want() now takes the force parameter and is
now responsible for the complete selection (or not) of haltpoll.
- fixes the build breakage on i386
- fixes the cpuidle-haltpoll module breakage on arm64 (comment from
Tomohiro Misono, Haris Okanovic)
v5:
- rework the poll_idle() loop around smp_cond_load_relaxed() (review
comment from Tomohiro Misono.)
- also rework selection of cpuidle-haltpoll. Now selected based
on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
- arch_haltpoll_supported() (renamed from arch_haltpoll_want()) on
arm64 now depends on the event-stream being enabled.
- limit POLL_IDLE_RELAX_COUNT on arm64 (review comment from Haris Okanovic)
- ARCH_HAS_CPU_RELAX is now renamed to ARCH_HAS_OPTIMIZED_POLL.
v4 changes from v3:
- change 7/8 per Rafael input: drop the parens and use ret for the final check
- add 8/8 which renames the guard for building poll_state
v3 changes from v2:
- fix 1/7 per Petr Mladek - remove ARCH_HAS_CPU_RELAX from arch/x86/Kconfig
- add Ack-by from Rafael Wysocki on 2/7
v2 changes from v1:
- added patch 7 where we replace cpu_relax() with smp_cond_load_relaxed() per PeterZ
(this improves the CPU cycles consumed in the tests above by at least 50%:
10,716,881,137 now vs 14,503,014,257 before)
- removed the ifdef from patch 1 per RafaelW
Would appreciate any review comments.
Ankur
[0] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
[1] https://lore.kernel.org/lkml/TY3PR01MB111481E9B0AF263ACC8EA5D4AE5BA2@TY3PR01MB11148.jpnprd01.prod.outlook.com/
[2] https://lore.kernel.org/lkml/104d0ec31cb45477e27273e089402d4205ee4042.camel@amazon.com/
[3] https://lore.kernel.org/lkml/f8a1f85b-c4bf-4c38-81bf-728f72a4f2fe@huawei.com/
Ankur Arora (6):
cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait()
cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
arm64: add support for poll_idle()
cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL
arm64: idle: export arch_cpu_idle()
arm64: support cpuidle-haltpoll
Joao Martins (4):
Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig
arm64: define TIF_POLLING_NRFLAG
cpuidle-haltpoll: define arch_haltpoll_want()
governors/haltpoll: drop kvm_para_available() check
Lifeng Zheng (1):
ACPI: processor_idle: Support polling state for LPI
arch/Kconfig | 3 ++
arch/arm64/Kconfig | 7 ++++
arch/arm64/include/asm/cpuidle_haltpoll.h | 20 +++++++++++
arch/arm64/include/asm/thread_info.h | 2 ++
arch/arm64/kernel/idle.c | 1 +
arch/x86/Kconfig | 5 ++-
arch/x86/include/asm/cpuidle_haltpoll.h | 1 +
arch/x86/kernel/kvm.c | 13 +++++++
drivers/acpi/processor_idle.c | 43 +++++++++++++++++++----
drivers/cpuidle/Kconfig | 5 ++-
drivers/cpuidle/Makefile | 2 +-
drivers/cpuidle/cpuidle-haltpoll.c | 12 +------
drivers/cpuidle/governors/haltpoll.c | 6 +---
drivers/cpuidle/poll_state.c | 27 +++++---------
drivers/idle/Kconfig | 1 +
include/linux/cpuidle.h | 2 +-
include/linux/cpuidle_haltpoll.h | 5 +++
17 files changed, 105 insertions(+), 50 deletions(-)
create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h
--
2.43.5
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v10 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait()
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-05-13 5:29 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 02/11] cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL Ankur Arora
` (10 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
The inner loop in poll_idle() polls to see if the thread's
TIF_NEED_RESCHED bit is set. The loop exits once the condition is met,
or if the poll time limit has been exceeded.
To minimize the number of instructions executed in each iteration, the
time check is rate-limited. In addition, each loop iteration executes
cpu_relax() which on certain platforms provides a hint to the pipeline
that the loop is busy-waiting, which allows the processor to reduce
power consumption.
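The rate-limited loop described above can be modeled in user-space C. This is
a sketch of the pattern only, with a fake monotonic clock standing in for
local_clock_noinstr() and an atomic flag standing in for need_resched():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POLL_IDLE_RELAX_COUNT	200

/* Stand-in for local_clock_noinstr(): a counter that ticks per read. */
static uint64_t fake_clock;
static uint64_t clock_read(void) { return ++fake_clock; }

/*
 * Model of the poll_idle() inner loop: spin on the flag, but only
 * consult the (comparatively expensive) clock once every
 * POLL_IDLE_RELAX_COUNT iterations. Returns true if the time limit
 * was hit before the flag was set.
 */
static bool poll_loop(atomic_bool *need_resched, uint64_t limit)
{
	uint64_t start = clock_read();
	unsigned int loop_count = 0;

	while (!atomic_load_explicit(need_resched, memory_order_relaxed)) {
		/* cpu_relax() would go here on real hardware */
		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
			continue;

		loop_count = 0;
		if (clock_read() - start > limit)
			return true;	/* timed out */
	}
	return false;			/* woken by the flag */
}
```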
However, cpu_relax() is defined optimally only on x86. On arm64, for
instance, it is implemented as a YIELD, which only serves as a hint
to the CPU that it should prioritize a different hardware thread if one
is available. arm64 does, however, expose a more optimal polling
mechanism via smp_cond_load_relaxed_timewait(), which uses LDXR/WFE to
wait until a store to a specified region, or until a timeout.
These semantics are essentially identical to what we want
from poll_idle(). So, restructure the loop to use
smp_cond_load_relaxed_timewait() instead.
The generated code remains close to the original version.
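The contract the patch relies on can be approximated in portable C: spin
until the loaded value satisfies the condition or the deadline passes, and
return the last value loaded. (On arm64 the real primitive would wait with
LDXR/WFE rather than spin; the clock and flag names below are stand-ins.)

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NEED_RESCHED	(1u << 1)

/* Stand-in for local_clock_noinstr(): a counter that ticks per read. */
static uint64_t fake_clock;
static uint64_t clock_read(void) { return ++fake_clock; }

/*
 * Approximation of the smp_cond_load_relaxed_timewait() contract as
 * used here: return the last value loaded, whether the exit was due
 * to the condition becoming true or the deadline passing.
 */
static unsigned int cond_load_timewait(atomic_uint *p, uint64_t deadline)
{
	unsigned int val;

	do {
		val = atomic_load_explicit(p, memory_order_relaxed);
		if (val & NEED_RESCHED)
			break;
	} while (clock_read() < deadline);

	return val;
}
```

The caller can then derive the timeout status from the returned value, just
as the patch does with `dev->poll_time_limit = !(flags & _TIF_NEED_RESCHED)`.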
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
drivers/cpuidle/poll_state.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..5117d3d37036 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -8,35 +8,24 @@
#include <linux/sched/clock.h>
#include <linux/sched/idle.h>
-#define POLL_IDLE_RELAX_COUNT 200
-
static int __cpuidle poll_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- u64 time_start;
-
- time_start = local_clock_noinstr();
dev->poll_time_limit = false;
raw_local_irq_enable();
if (!current_set_polling_and_test()) {
- unsigned int loop_count = 0;
- u64 limit;
+ unsigned long flags;
+ u64 time_start = local_clock_noinstr();
+ u64 limit = cpuidle_poll_time(drv, dev);
- limit = cpuidle_poll_time(drv, dev);
+ flags = smp_cond_load_relaxed_timewait(¤t_thread_info()->flags,
+ VAL & _TIF_NEED_RESCHED,
+ local_clock_noinstr(),
+ time_start + limit);
- while (!need_resched()) {
- cpu_relax();
- if (loop_count++ < POLL_IDLE_RELAX_COUNT)
- continue;
-
- loop_count = 0;
- if (local_clock_noinstr() - time_start > limit) {
- dev->poll_time_limit = true;
- break;
- }
- }
+ dev->poll_time_limit = !(flags & _TIF_NEED_RESCHED);
}
raw_local_irq_disable();
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 02/11] cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
2025-02-18 21:33 ` [PATCH v10 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait() Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 03/11] Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig Ankur Arora
` (9 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
ARCH_HAS_CPU_RELAX is defined on architectures that provide a
primitive (via cpu_relax()) that can be used as part of a polling
mechanism -- one that would be cheaper than spinning in a tight
loop.
However, recent changes in poll_idle() mean that a higher-level
primitive, smp_cond_load_relaxed_timewait(), is used for polling.
This, in turn, uses cpu_relax() or an architecture-specific
implementation. On arm64 in particular this turns into a WFE, which
waits for a store to the cacheline instead of busy-polling.
Accordingly condition the polling drivers on ARCH_HAS_OPTIMIZED_POLL
instead of ARCH_HAS_CPU_RELAX. While at it, make both intel-idle
and cpuidle-haltpoll, which depend on poll_idle() being available,
explicitly depend on ARCH_HAS_OPTIMIZED_POLL.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/x86/Kconfig | 2 +-
drivers/acpi/processor_idle.c | 4 ++--
drivers/cpuidle/Kconfig | 2 +-
drivers/cpuidle/Makefile | 2 +-
drivers/idle/Kconfig | 1 +
include/linux/cpuidle.h | 2 +-
6 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9d7bd0ae48c4..d5f483957d45 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -381,7 +381,7 @@ config ARCH_MAY_HAVE_PC_FDC
config GENERIC_CALIBRATE_DELAY
def_bool y
-config ARCH_HAS_CPU_RELAX
+config ARCH_HAS_OPTIMIZED_POLL
def_bool y
config ARCH_HIBERNATION_POSSIBLE
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 698897b29de2..778f0e053988 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -35,7 +35,7 @@
#include <asm/cpu.h>
#endif
-#define ACPI_IDLE_STATE_START (IS_ENABLED(CONFIG_ARCH_HAS_CPU_RELAX) ? 1 : 0)
+#define ACPI_IDLE_STATE_START (IS_ENABLED(CONFIG_ARCH_HAS_OPTIMIZED_POLL) ? 1 : 0)
static unsigned int max_cstate __read_mostly = ACPI_PROCESSOR_MAX_POWER;
module_param(max_cstate, uint, 0400);
@@ -779,7 +779,7 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
if (max_cstate == 0)
max_cstate = 1;
- if (IS_ENABLED(CONFIG_ARCH_HAS_CPU_RELAX)) {
+ if (IS_ENABLED(CONFIG_ARCH_HAS_OPTIMIZED_POLL)) {
cpuidle_poll_state_init(drv);
count = 1;
} else {
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index cac5997dca50..75f6e176bbc8 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -73,7 +73,7 @@ endmenu
config HALTPOLL_CPUIDLE
tristate "Halt poll cpuidle driver"
- depends on X86 && KVM_GUEST
+ depends on X86 && KVM_GUEST && ARCH_HAS_OPTIMIZED_POLL
select CPU_IDLE_GOV_HALTPOLL
default y
help
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index d103342b7cfc..f29dfd1525b0 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -7,7 +7,7 @@ obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
obj-$(CONFIG_DT_IDLE_GENPD) += dt_idle_genpd.o
-obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
+obj-$(CONFIG_ARCH_HAS_OPTIMIZED_POLL) += poll_state.o
obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o
##################################################################################
diff --git a/drivers/idle/Kconfig b/drivers/idle/Kconfig
index 6707d2539fc4..6f9b1d48fede 100644
--- a/drivers/idle/Kconfig
+++ b/drivers/idle/Kconfig
@@ -4,6 +4,7 @@ config INTEL_IDLE
depends on CPU_IDLE
depends on X86
depends on CPU_SUP_INTEL
+ depends on ARCH_HAS_OPTIMIZED_POLL
help
Enable intel_idle, a cpuidle driver that includes knowledge of
native Intel hardware idle features. The acpi_idle driver
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index a9ee4fe55dcf..2ecc0907c467 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -275,7 +275,7 @@ static inline void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev,
}
#endif
-#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX)
+#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_OPTIMIZED_POLL)
void cpuidle_poll_state_init(struct cpuidle_driver *drv);
#else
static inline void cpuidle_poll_state_init(struct cpuidle_driver *drv) {}
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 03/11] Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
2025-02-18 21:33 ` [PATCH v10 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait() Ankur Arora
2025-02-18 21:33 ` [PATCH v10 02/11] cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 04/11] arm64: define TIF_POLLING_NRFLAG Ankur Arora
` (8 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
From: Joao Martins <joao.m.martins@oracle.com>
ARCH_HAS_OPTIMIZED_POLL gates selection of polling while idle in
poll_idle(). Move the configuration option to arch/Kconfig to allow
non-x86 architectures to select it.
Note that ARCH_HAS_OPTIMIZED_POLL should probably be exclusive with
GENERIC_IDLE_POLL_SETUP (which controls the generic polling logic in
cpu_idle_poll()). However, that would remove boot options
(hlt=, nohlt=). So, leave it untouched for now.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/Kconfig | 3 +++
arch/x86/Kconfig | 4 +---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 6682b2a53e34..fe3ecbf2d578 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -274,6 +274,9 @@ config HAVE_ARCH_TRACEHOOK
config HAVE_DMA_CONTIGUOUS
bool
+config ARCH_HAS_OPTIMIZED_POLL
+ bool
+
config GENERIC_SMP_IDLE_THREAD
bool
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d5f483957d45..e826b990fe50 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -142,6 +142,7 @@ config X86
select ARCH_WANTS_NO_INSTR
select ARCH_WANT_GENERAL_HUGETLB
select ARCH_WANT_HUGE_PMD_SHARE
+ select ARCH_HAS_OPTIMIZED_POLL
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if X86_64
select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
@@ -381,9 +382,6 @@ config ARCH_MAY_HAVE_PC_FDC
config GENERIC_CALIBRATE_DELAY
def_bool y
-config ARCH_HAS_OPTIMIZED_POLL
- def_bool y
-
config ARCH_HIBERNATION_POSSIBLE
def_bool y
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 04/11] arm64: define TIF_POLLING_NRFLAG
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (2 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 03/11] Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 05/11] arm64: add support for poll_idle() Ankur Arora
` (7 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
From: Joao Martins <joao.m.martins@oracle.com>
Commit 842514849a61 ("arm64: Remove TIF_POLLING_NRFLAG") had removed
TIF_POLLING_NRFLAG because arm64 only supported non-polled idling via
cpu_do_idle().
To support polling in idle via poll_idle(), define TIF_POLLING_NRFLAG,
which is set while polling.
The definition reuses the same bit as before.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/arm64/include/asm/thread_info.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 1114c1c3300a..5326cd583b01 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -69,6 +69,7 @@ void arch_setup_new_exec(void);
#define TIF_SYSCALL_TRACEPOINT 10 /* syscall tracepoint for ftrace */
#define TIF_SECCOMP 11 /* syscall secure computing */
#define TIF_SYSCALL_EMU 12 /* syscall emulation active */
+#define TIF_POLLING_NRFLAG 16 /* set while polling in poll_idle() */
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
#define TIF_FREEZE 19
#define TIF_RESTORE_SIGMASK 20
@@ -92,6 +93,7 @@ void arch_setup_new_exec(void);
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)
#define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU)
+#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_32BIT (1 << TIF_32BIT)
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 05/11] arm64: add support for poll_idle()
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (3 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 04/11] arm64: define TIF_POLLING_NRFLAG Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 06/11] ACPI: processor_idle: Support polling state for LPI Ankur Arora
` (6 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
Polling in idle helps reduce the cost of remote wakeups: if the target
sets TIF_POLLING_NRFLAG (as it does while polling in idle), the scheduler
can do remote wakeups just by setting TIF_NEED_RESCHED.
This contrasts with sending an IPI and incurring the cost of handling
the interrupt on the receiver.
Enabling poll_idle() needs a cheap mechanism to do the actual polling
(via smp_cond_load_relaxed_timewait()) and TIF_POLLING_NRFLAG support.
arm64 has both of these. So, select ARCH_HAS_OPTIMIZED_POLL.
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 100570a048c5..d96a6c6d8894 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -39,6 +39,7 @@ config ARM64
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
+ select ARCH_HAS_OPTIMIZED_POLL
select ARCH_HAS_PTE_DEVMAP
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_HW_PTE_YOUNG
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 06/11] ACPI: processor_idle: Support polling state for LPI
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (4 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 05/11] arm64: add support for poll_idle() Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 07/11] cpuidle-haltpoll: define arch_haltpoll_want() Ankur Arora
` (5 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
From: Lifeng Zheng <zhenglifeng1@huawei.com>
Initialize an optional polling state besides the LPI states.
Wrap acpi_idle_lpi_enter() in a new enter method that correctly reflects
the actual entered state when the polling state is enabled.
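The index translation done by the wrapper can be sketched as follows. This is
a simplified model with a stub in place of the real enter callback: with the
poll state occupying index 0, cpuidle index i corresponds to LPI state i - 1,
and the entered LPI state is mapped back into cpuidle index space:

```c
#include <assert.h>

/* Stub for the real LPI enter callback: pretend to enter the state
 * and report the state actually entered. */
static int lpi_enter(int lpi_index)
{
	return lpi_index;
}

/*
 * Model of the wrapper: index 0 is the poll state and is invalid
 * here; LPI indices are shifted down by one on entry and the result
 * is shifted back up so callers see cpuidle indices.
 */
static int lpi_enter_with_poll_state(int index)
{
	int entered;

	if (index < 1)
		return -1;

	entered = lpi_enter(index - 1);
	if (entered < 0)
		return entered;

	return entered + 1;
}
```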
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
---
drivers/acpi/processor_idle.c | 39 ++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 778f0e053988..1a9228f55355 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1191,20 +1191,46 @@ static int acpi_idle_lpi_enter(struct cpuidle_device *dev,
return -EINVAL;
}
+/* To correctly reflect the entered state if the poll state is enabled. */
+static int acpi_idle_lpi_enter_with_poll_state(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
+{
+ int entered_state;
+
+ if (unlikely(index < 1))
+ return -EINVAL;
+
+ entered_state = acpi_idle_lpi_enter(dev, drv, index - 1);
+ if (entered_state < 0)
+ return entered_state;
+
+ return entered_state + 1;
+}
+
static int acpi_processor_setup_lpi_states(struct acpi_processor *pr)
{
- int i;
+ int i, count;
struct acpi_lpi_state *lpi;
struct cpuidle_state *state;
struct cpuidle_driver *drv = &acpi_idle_driver;
+ typeof(state->enter) enter_method;
if (!pr->flags.has_lpi)
return -EOPNOTSUPP;
+ if (IS_ENABLED(CONFIG_ARCH_HAS_OPTIMIZED_POLL)) {
+ cpuidle_poll_state_init(drv);
+ count = 1;
+ enter_method = acpi_idle_lpi_enter_with_poll_state;
+ } else {
+ count = 0;
+ enter_method = acpi_idle_lpi_enter;
+ }
+
for (i = 0; i < pr->power.count && i < CPUIDLE_STATE_MAX; i++) {
lpi = &pr->power.lpi_states[i];
- state = &drv->states[i];
+ state = &drv->states[count];
snprintf(state->name, CPUIDLE_NAME_LEN, "LPI-%d", i);
strscpy(state->desc, lpi->desc, CPUIDLE_DESC_LEN);
state->exit_latency = lpi->wake_latency;
@@ -1212,11 +1238,14 @@ static int acpi_processor_setup_lpi_states(struct acpi_processor *pr)
state->flags |= arch_get_idle_state_flags(lpi->arch_flags);
if (i != 0 && lpi->entry_method == ACPI_CSTATE_FFH)
state->flags |= CPUIDLE_FLAG_RCU_IDLE;
- state->enter = acpi_idle_lpi_enter;
- drv->safe_state_index = i;
+ state->enter = enter_method;
+ drv->safe_state_index = count;
+ count++;
+ if (count == CPUIDLE_STATE_MAX)
+ break;
}
- drv->state_count = i;
+ drv->state_count = count;
return 0;
}
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 07/11] cpuidle-haltpoll: define arch_haltpoll_want()
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (5 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 06/11] ACPI: processor_idle: Support polling state for LPI Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 08/11] governors/haltpoll: drop kvm_para_available() check Ankur Arora
` (4 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
From: Joao Martins <joao.m.martins@oracle.com>
While initializing haltpoll we check if KVM supports the
realtime hint and if idle is overridden at boot.
Both of these checks are x86-specific. So, in pursuit of
making cpuidle-haltpoll architecture-independent, move these
checks out of the common code.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/x86/include/asm/cpuidle_haltpoll.h | 1 +
arch/x86/kernel/kvm.c | 13 +++++++++++++
drivers/cpuidle/cpuidle-haltpoll.c | 12 +-----------
include/linux/cpuidle_haltpoll.h | 5 +++++
4 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/cpuidle_haltpoll.h b/arch/x86/include/asm/cpuidle_haltpoll.h
index c8b39c6716ff..8a0a12769c2e 100644
--- a/arch/x86/include/asm/cpuidle_haltpoll.h
+++ b/arch/x86/include/asm/cpuidle_haltpoll.h
@@ -4,5 +4,6 @@
void arch_haltpoll_enable(unsigned int cpu);
void arch_haltpoll_disable(unsigned int cpu);
+bool arch_haltpoll_want(bool force);
#endif
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 21e9e4845354..6d717819eb4e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1155,4 +1155,17 @@ void arch_haltpoll_disable(unsigned int cpu)
smp_call_function_single(cpu, kvm_enable_host_haltpoll, NULL, 1);
}
EXPORT_SYMBOL_GPL(arch_haltpoll_disable);
+
+bool arch_haltpoll_want(bool force)
+{
+ /* Do not load haltpoll if idle= is passed */
+ if (boot_option_idle_override != IDLE_NO_OVERRIDE)
+ return false;
+
+ if (!kvm_para_available())
+ return false;
+
+ return kvm_para_has_hint(KVM_HINTS_REALTIME) || force;
+}
+EXPORT_SYMBOL_GPL(arch_haltpoll_want);
#endif
diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
index bcd03e893a0a..e532aa2bf608 100644
--- a/drivers/cpuidle/cpuidle-haltpoll.c
+++ b/drivers/cpuidle/cpuidle-haltpoll.c
@@ -15,7 +15,6 @@
#include <linux/cpuidle.h>
#include <linux/module.h>
#include <linux/sched/idle.h>
-#include <linux/kvm_para.h>
#include <linux/cpuidle_haltpoll.h>
static bool force __read_mostly;
@@ -93,21 +92,12 @@ static void haltpoll_uninit(void)
haltpoll_cpuidle_devices = NULL;
}
-static bool haltpoll_want(void)
-{
- return kvm_para_has_hint(KVM_HINTS_REALTIME) || force;
-}
-
static int __init haltpoll_init(void)
{
int ret;
struct cpuidle_driver *drv = &haltpoll_driver;
- /* Do not load haltpoll if idle= is passed */
- if (boot_option_idle_override != IDLE_NO_OVERRIDE)
- return -ENODEV;
-
- if (!kvm_para_available() || !haltpoll_want())
+ if (!arch_haltpoll_want(force))
return -ENODEV;
cpuidle_poll_state_init(drv);
diff --git a/include/linux/cpuidle_haltpoll.h b/include/linux/cpuidle_haltpoll.h
index d50c1e0411a2..68eb7a757120 100644
--- a/include/linux/cpuidle_haltpoll.h
+++ b/include/linux/cpuidle_haltpoll.h
@@ -12,5 +12,10 @@ static inline void arch_haltpoll_enable(unsigned int cpu)
static inline void arch_haltpoll_disable(unsigned int cpu)
{
}
+
+static inline bool arch_haltpoll_want(bool force)
+{
+ return false;
+}
#endif
#endif
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 08/11] governors/haltpoll: drop kvm_para_available() check
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (6 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 07/11] cpuidle-haltpoll: define arch_haltpoll_want() Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-24 16:57 ` Christoph Lameter (Ampere)
2025-02-18 21:33 ` [PATCH v10 09/11] cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL Ankur Arora
` (3 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
From: Joao Martins <joao.m.martins@oracle.com>
The haltpoll governor is selected either by the cpuidle-haltpoll
driver, or explicitly by the user.
In particular, it is never selected by default since it has the lowest
rating of all governors (menu=20, teo=19, ladder=10/25, haltpoll=9).
So, we can safely forgo the kvm_para_available() check. This also
allows cpuidle-haltpoll to be tested on baremetal.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
drivers/cpuidle/governors/haltpoll.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/cpuidle/governors/haltpoll.c b/drivers/cpuidle/governors/haltpoll.c
index 663b7f164d20..c8752f793e61 100644
--- a/drivers/cpuidle/governors/haltpoll.c
+++ b/drivers/cpuidle/governors/haltpoll.c
@@ -18,7 +18,6 @@
#include <linux/tick.h>
#include <linux/sched.h>
#include <linux/module.h>
-#include <linux/kvm_para.h>
#include <trace/events/power.h>
static unsigned int guest_halt_poll_ns __read_mostly = 200000;
@@ -148,10 +147,7 @@ static struct cpuidle_governor haltpoll_governor = {
static int __init init_haltpoll(void)
{
- if (kvm_para_available())
- return cpuidle_register_governor(&haltpoll_governor);
-
- return 0;
+ return cpuidle_register_governor(&haltpoll_governor);
}
postcore_initcall(init_haltpoll);
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 09/11] cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (7 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 08/11] governors/haltpoll: drop kvm_para_available() check Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 10/11] arm64: idle: export arch_cpu_idle() Ankur Arora
` (2 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
The cpuidle-haltpoll driver and its namesake governor are selected
under KVM_GUEST on X86. KVM_GUEST in turn selects ARCH_CPUIDLE_HALTPOLL
and defines the requisite arch_haltpoll_{enable,disable}() functions.
So remove the explicit dependence of HALTPOLL_CPUIDLE on KVM_GUEST,
and instead use ARCH_CPUIDLE_HALTPOLL as proxy for architectural
support for haltpoll.
Also change "halt poll" to "haltpoll" in one of the summary clauses,
since the second form is used everywhere else.
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/x86/Kconfig | 1 +
drivers/cpuidle/Kconfig | 5 ++---
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e826b990fe50..d7f538f28daa 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -847,6 +847,7 @@ config KVM_GUEST
config ARCH_CPUIDLE_HALTPOLL
def_bool n
+ depends on KVM_GUEST
prompt "Disable host haltpoll when loading haltpoll driver"
help
If virtualized under KVM, disable host haltpoll.
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index 75f6e176bbc8..c1bebadf22bc 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -35,7 +35,6 @@ config CPU_IDLE_GOV_TEO
config CPU_IDLE_GOV_HALTPOLL
bool "Haltpoll governor (for virtualized systems)"
- depends on KVM_GUEST
help
This governor implements haltpoll idle state selection, to be
used in conjunction with the haltpoll cpuidle driver, allowing
@@ -72,8 +71,8 @@ source "drivers/cpuidle/Kconfig.riscv"
endmenu
config HALTPOLL_CPUIDLE
- tristate "Halt poll cpuidle driver"
- depends on X86 && KVM_GUEST && ARCH_HAS_OPTIMIZED_POLL
+ tristate "Haltpoll cpuidle driver"
+ depends on ARCH_CPUIDLE_HALTPOLL && ARCH_HAS_OPTIMIZED_POLL
select CPU_IDLE_GOV_HALTPOLL
default y
help
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (8 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 09/11] cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-04-11 3:32 ` Shuai Xue
2025-02-18 21:33 ` [PATCH v10 11/11] arm64: support cpuidle-haltpoll Ankur Arora
2025-05-13 5:23 ` [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
11 siblings, 1 reply; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
Needed for cpuidle-haltpoll.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/arm64/kernel/idle.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
index 05cfb347ec26..b85ba0df9b02 100644
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
*/
cpu_do_idle();
}
+EXPORT_SYMBOL_GPL(arch_cpu_idle);
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v10 11/11] arm64: support cpuidle-haltpoll
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (9 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 10/11] arm64: idle: export arch_cpu_idle() Ankur Arora
@ 2025-02-18 21:33 ` Ankur Arora
2025-05-13 5:23 ` [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-18 21:33 UTC (permalink / raw)
To: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
Add architectural support for the cpuidle-haltpoll driver by defining
arch_haltpoll_*(). Also define ARCH_CPUIDLE_HALTPOLL to allow
cpuidle-haltpoll to be selected.
Tested-by: Haris Okanovic <harisokn@amazon.com>
Tested-by: Misono Tomohiro <misono.tomohiro@fujitsu.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/arm64/Kconfig | 6 ++++++
arch/arm64/include/asm/cpuidle_haltpoll.h | 20 ++++++++++++++++++++
2 files changed, 26 insertions(+)
create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d96a6c6d8894..eef50fd9a190 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2485,6 +2485,12 @@ config ARCH_HIBERNATION_HEADER
config ARCH_SUSPEND_POSSIBLE
def_bool y
+config ARCH_CPUIDLE_HALTPOLL
+ bool "Enable selection of the cpuidle-haltpoll driver"
+ help
+ cpuidle-haltpoll allows for adaptive polling based on
+ current load before entering the idle state.
+
endmenu # "Power management options"
menu "CPU Power Management"
diff --git a/arch/arm64/include/asm/cpuidle_haltpoll.h b/arch/arm64/include/asm/cpuidle_haltpoll.h
new file mode 100644
index 000000000000..aa01ae9ad5dd
--- /dev/null
+++ b/arch/arm64/include/asm/cpuidle_haltpoll.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ARCH_HALTPOLL_H
+#define _ARCH_HALTPOLL_H
+
+static inline void arch_haltpoll_enable(unsigned int cpu) { }
+static inline void arch_haltpoll_disable(unsigned int cpu) { }
+
+static inline bool arch_haltpoll_want(bool force)
+{
+ /*
+ * Enabling haltpoll requires KVM support for arch_haltpoll_enable(),
+ * arch_haltpoll_disable().
+ *
+ * Given that that's missing right now, only allow force loading for
+ * haltpoll.
+ */
+ return force;
+}
+#endif
--
2.43.5
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v10 08/11] governors/haltpoll: drop kvm_para_available() check
2025-02-18 21:33 ` [PATCH v10 08/11] governors/haltpoll: drop kvm_para_available() check Ankur Arora
@ 2025-02-24 16:57 ` Christoph Lameter (Ampere)
2025-02-25 19:06 ` Ankur Arora
0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter (Ampere) @ 2025-02-24 16:57 UTC (permalink / raw)
To: Ankur Arora
Cc: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi,
catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
On Tue, 18 Feb 2025, Ankur Arora wrote:
> So, we can safely forgo the kvm_para_available() check. This also
> allows cpuidle-haltpoll to be tested on baremetal.
I would hope that we will have this functionality as the default on
baremetal after testing in the future.
Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 08/11] governors/haltpoll: drop kvm_para_available() check
2025-02-24 16:57 ` Christoph Lameter (Ampere)
@ 2025-02-25 19:06 ` Ankur Arora
0 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-02-25 19:06 UTC (permalink / raw)
To: Christoph Lameter (Ampere)
Cc: Ankur Arora, linux-pm, kvm, linux-arm-kernel, linux-kernel,
linux-acpi, catalin.marinas, will, x86, pbonzini, vkuznets,
rafael, daniel.lezcano, peterz, arnd, lenb, mark.rutland,
harisokn, mtosatti, sudeep.holla, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
Christoph Lameter (Ampere) <cl@gentwo.org> writes:
> On Tue, 18 Feb 2025, Ankur Arora wrote:
>
>> So, we can safely forgo the kvm_para_available() check. This also
>> allows cpuidle-haltpoll to be tested on baremetal.
>
> I would hope that we will have this functionality as the default on
> baremetal after testing in the future.
Yeah, supporting haltpoll-style adaptive polling on baremetal has some
way to go.
But with Lifeng's patch 6, "ACPI: processor_idle: Support polling state
for LPI", we do get polling support in acpi-idle.
> Reviewed-by; Christoph Lameter (Ampere) <cl@linux.com>
Thanks Christoph!
--
ankur
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-02-18 21:33 ` [PATCH v10 10/11] arm64: idle: export arch_cpu_idle() Ankur Arora
@ 2025-04-11 3:32 ` Shuai Xue
2025-04-11 17:42 ` Okanovic, Haris
2025-04-11 20:57 ` Ankur Arora
0 siblings, 2 replies; 23+ messages in thread
From: Shuai Xue @ 2025-04-11 3:32 UTC (permalink / raw)
To: Ankur Arora, linux-pm, kvm, linux-arm-kernel, linux-kernel,
linux-acpi
Cc: catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
On 2025/2/19 05:33, Ankur Arora wrote:
> Needed for cpuidle-haltpoll.
>
> Acked-by: Will Deacon <will@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
> arch/arm64/kernel/idle.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
> index 05cfb347ec26..b85ba0df9b02 100644
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
> */
> cpu_do_idle();
Hi, Ankur,
With haltpoll_driver registered, arch_cpu_idle() on x86 can select
mwait_idle() in idle threads.
It uses MONITOR to set up an effective address range that is monitored
for write-to-memory activity; MWAIT places the processor in
an optimized state (which may vary between implementations)
until a write to the monitored address range occurs.
Should arch_cpu_idle() on arm64 also use LDXR/WFE
to avoid the wakeup IPI, like x86 MONITOR/MWAIT?
Thanks.
Shuai
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-04-11 3:32 ` Shuai Xue
@ 2025-04-11 17:42 ` Okanovic, Haris
2025-04-11 20:57 ` Ankur Arora
1 sibling, 0 replies; 23+ messages in thread
From: Okanovic, Haris @ 2025-04-11 17:42 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-acpi@vger.kernel.org, xueshuai@linux.alibaba.com,
ankur.a.arora@oracle.com
Cc: joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
maz@kernel.org, zhenglifeng1@huawei.com, konrad.wilk@oracle.com,
cl@gentwo.org, catalin.marinas@arm.com, maobibo@loongson.cn,
pbonzini@redhat.com, misono.tomohiro@fujitsu.com,
daniel.lezcano@linaro.org, arnd@arndb.de, mtosatti@redhat.com,
will@kernel.org, lenb@kernel.org, peterz@infradead.org,
vkuznets@redhat.com, sudeep.holla@arm.com, Okanovic, Haris,
rafael@kernel.org, x86@kernel.org, mark.rutland@arm.com
On Fri, 2025-04-11 at 11:32 +0800, Shuai Xue wrote:
> > On 2025/2/19 05:33, Ankur Arora wrote:
> > > > Needed for cpuidle-haltpoll.
> > > >
> > > > Acked-by: Will Deacon <will@kernel.org>
> > > > Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> > > > ---
> > > > arch/arm64/kernel/idle.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
> > > > index 05cfb347ec26..b85ba0df9b02 100644
> > > > --- a/arch/arm64/kernel/idle.c
> > > > +++ b/arch/arm64/kernel/idle.c
> > > > @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
> > > > */
> > > > cpu_do_idle();
> >
> > Hi, Ankur,
> >
> > With haltpoll_driver registered, arch_cpu_idle() on x86 can select
> > mwait_idle() in idle threads.
> >
> > It uses MONITOR to set up an effective address range that is monitored
> > for write-to-memory activity; MWAIT places the processor in
> > an optimized state (which may vary between implementations)
> > until a write to the monitored address range occurs.
> >
> > Should arch_cpu_idle() on arm64 also use the LDXR/WFE
> > to avoid wakeup IPI like x86 monitor/mwait?
WFE will wake from the event stream, which can have short sub-ms
periods on many systems. May be something to consider when WFET is more
widely available.
> >
> > Thanks.
> > Shuai
> >
> >
Regards,
Haris Okanovic
AWS Graviton Software
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-04-11 3:32 ` Shuai Xue
2025-04-11 17:42 ` Okanovic, Haris
@ 2025-04-11 20:57 ` Ankur Arora
2025-04-14 2:01 ` Shuai Xue
1 sibling, 1 reply; 23+ messages in thread
From: Ankur Arora @ 2025-04-11 20:57 UTC (permalink / raw)
To: Shuai Xue
Cc: Ankur Arora, linux-pm, kvm, linux-arm-kernel, linux-kernel,
linux-acpi, catalin.marinas, will, x86, pbonzini, vkuznets,
rafael, daniel.lezcano, peterz, arnd, lenb, mark.rutland,
harisokn, mtosatti, sudeep.holla, cl, maz, misono.tomohiro,
maobibo, zhenglifeng1, joao.m.martins, boris.ostrovsky,
konrad.wilk
Shuai Xue <xueshuai@linux.alibaba.com> writes:
> On 2025/2/19 05:33, Ankur Arora wrote:
>> Needed for cpuidle-haltpoll.
>> Acked-by: Will Deacon <will@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>> arch/arm64/kernel/idle.c | 1 +
>> 1 file changed, 1 insertion(+)
>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>> index 05cfb347ec26..b85ba0df9b02 100644
>> --- a/arch/arm64/kernel/idle.c
>> +++ b/arch/arm64/kernel/idle.c
>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>> */
>> cpu_do_idle();
>
> Hi, Ankur,
>
> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
> mwait_idle() in idle threads.
>
> It uses MONITOR to set up an effective address range that is monitored
> for write-to-memory activity; MWAIT places the processor in
> an optimized state (which may vary between implementations)
> until a write to the monitored address range occurs.
MWAIT is more capable than WFE -- it allows selection of deeper idle
states (IIRC C2/C3).
> Should arch_cpu_idle() on arm64 also use the LDXR/WFE
> to avoid wakeup IPI like x86 monitor/mwait?
Avoiding the wakeup IPI needs TIF_POLLING_NRFLAG and the polling-in-idle
support that this series adds.
As Haris notes, the downside of only using WFE is that it allows only
a single idle state, one that is fairly shallow because the event stream
causes a wakeup every 100us.
--
ankur
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-04-11 20:57 ` Ankur Arora
@ 2025-04-14 2:01 ` Shuai Xue
2025-04-14 3:46 ` Ankur Arora
0 siblings, 1 reply; 23+ messages in thread
From: Shuai Xue @ 2025-04-14 2:01 UTC (permalink / raw)
To: Ankur Arora, harisokn
Cc: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi,
catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
On 2025/4/12 04:57, Ankur Arora wrote:
>
> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>
>> On 2025/2/19 05:33, Ankur Arora wrote:
>>> Needed for cpuidle-haltpoll.
>>> Acked-by: Will Deacon <will@kernel.org>
>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>> ---
>>> arch/arm64/kernel/idle.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>> index 05cfb347ec26..b85ba0df9b02 100644
>>> --- a/arch/arm64/kernel/idle.c
>>> +++ b/arch/arm64/kernel/idle.c
>>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>>> */
>>> cpu_do_idle();
>>
>> Hi, Ankur,
>>
>> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
>> mwait_idle() in idle threads.
>>
>> It uses MONITOR to set up an effective address range that is monitored
>> for write-to-memory activity; MWAIT places the processor in
>> an optimized state (which may vary between implementations)
>> until a write to the monitored address range occurs.
>
> MWAIT is more capable than WFE -- it allows selection of deeper idle
> state. IIRC C2/C3.
>
>> Should arch_cpu_idle() on arm64 also use the LDXR/WFE
>> to avoid wakeup IPI like x86 monitor/mwait?
>
> Avoiding the wakeup IPI needs TIF_NR_POLLING and polling in idle support
> that this series adds.
>
> As Haris notes, the negative with only using WFE is that it only allows
> a single idle state, one that is fairly shallow because the event-stream
> causes a wakeup every 100us.
>
> --
> ankur
Hi, Ankur and Haris
Got it, thanks for the explanation :)
Comparing sched-pipe performance on Rund with Yitian 710, *IPC improved 35%*:
w/o haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):
32521.53 msec task-clock # 2.000 CPUs utilized ( +- 1.16% )
38081402726 cycles # 1.171 GHz ( +- 1.70% )
27324614561 instructions # 0.72 insn per cycle ( +- 0.12% )
181 sched:sched_wake_idle_without_ipi # 0.006 K/sec
w/ haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):
9477.15 msec task-clock # 2.000 CPUs utilized ( +- 0.89% )
21486828269 cycles # 2.267 GHz ( +- 0.35% )
23867109747 instructions # 1.11 insn per cycle ( +- 0.11% )
1925207 sched:sched_wake_idle_without_ipi # 0.203 M/sec
Comparing sched-pipe performance on QEMU with Kunpeng 920, *IPC improved 10%*:
w/o haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):
34,007.89 msec task-clock # 2.000 CPUs utilized ( +- 8.86% )
4,407,859,620 cycles # 0.130 GHz ( +- 84.92% )
2,482,046,461 instructions # 0.56 insn per cycle ( +- 88.27% )
16 sched:sched_wake_idle_without_ipi # 0.470 /sec ( +- 98.77% )
17.00 +- 1.51 seconds time elapsed ( +- 8.86% )
w/ haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):
16,894.37 msec task-clock # 2.000 CPUs utilized ( +- 3.80% )
8,703,158,826 cycles # 0.515 GHz ( +- 31.31% )
5,379,257,839 instructions # 0.62 insn per cycle ( +- 30.03% )
549,434 sched:sched_wake_idle_without_ipi # 32.522 K/sec ( +- 30.05% )
8.447 +- 0.321 seconds time elapsed ( +- 3.80% )
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Thanks.
Shuai
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-04-14 2:01 ` Shuai Xue
@ 2025-04-14 3:46 ` Ankur Arora
2025-04-14 7:43 ` Shuai Xue
0 siblings, 1 reply; 23+ messages in thread
From: Ankur Arora @ 2025-04-14 3:46 UTC (permalink / raw)
To: Shuai Xue
Cc: Ankur Arora, linux-pm, kvm, linux-arm-kernel, linux-kernel,
linux-acpi, catalin.marinas, will, x86, pbonzini, vkuznets,
rafael, daniel.lezcano, peterz, arnd, lenb, mark.rutland,
harisokn, mtosatti, sudeep.holla, cl, maz, misono.tomohiro,
maobibo, zhenglifeng1, joao.m.martins, boris.ostrovsky,
konrad.wilk
Shuai Xue <xueshuai@linux.alibaba.com> writes:
> On 2025/4/12 04:57, Ankur Arora wrote:
>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>
>>> On 2025/2/19 05:33, Ankur Arora wrote:
>>>> Needed for cpuidle-haltpoll.
>>>> Acked-by: Will Deacon <will@kernel.org>
>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>> ---
>>>> arch/arm64/kernel/idle.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>>> index 05cfb347ec26..b85ba0df9b02 100644
>>>> --- a/arch/arm64/kernel/idle.c
>>>> +++ b/arch/arm64/kernel/idle.c
>>>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>>>> */
>>>> cpu_do_idle();
>>>
>>> Hi, Ankur,
>>>
>>> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
>>> mwait_idle() in idle threads.
>>>
>>> It uses MONITOR to set up an effective address range that is monitored
>>> for write-to-memory activity; MWAIT places the processor in
>>> an optimized state (which may vary between implementations)
>>> until a write to the monitored address range occurs.
>> MWAIT is more capable than WFE -- it allows selection of deeper idle
>> state. IIRC C2/C3.
>>
>>> Should arch_cpu_idle() on arm64 also use the LDXR/WFE
>>> to avoid wakeup IPI like x86 monitor/mwait?
>> Avoiding the wakeup IPI needs TIF_NR_POLLING and polling in idle support
>> that this series adds.
>> As Haris notes, the negative with only using WFE is that it only allows
>> a single idle state, one that is fairly shallow because the event-stream
>> causes a wakeup every 100us.
>> --
>> ankur
>
> Hi, Ankur and Haris
>
> Got it, thanks for the explanation :)
>
> Comparing sched-pipe performance on Rund with Yitian 710, *IPC improved 35%*:
Thanks for testing, Shuai. I wasn't expecting the IPC to improve by quite
that much :). The reduced instructions make sense since we don't have to
handle the IRQ anymore but we would spend some of the saved cycles
waiting in WFE instead.
I'm not familiar with the Yitian 710. Can you check if you are running
with WFE? That's the __smp_cond_load_relaxed_timewait() path vs the
__smp_cond_load_relaxed_spinwait() path in [0]. Same question for the
Kunpeng 920.
Also, I'm working on a new version of the series in [1]. Would you be
okay trying that out?
Thanks
Ankur
[0] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
[1] https://lore.kernel.org/lkml/20250203214911.898276-4-ankur.a.arora@oracle.com/
> w/o haltpoll
> Performance counter stats for 'CPU(s) 0,1' (5 runs):
>
> 32521.53 msec task-clock # 2.000 CPUs utilized ( +- 1.16% )
> 38081402726 cycles # 1.171 GHz ( +- 1.70% )
> 27324614561 instructions # 0.72 insn per cycle ( +- 0.12% )
> 181 sched:sched_wake_idle_without_ipi # 0.006 K/sec
>
> w/ haltpoll
> Performance counter stats for 'CPU(s) 0,1' (5 runs):
>
> 9477.15 msec task-clock # 2.000 CPUs utilized ( +- 0.89% )
> 21486828269 cycles # 2.267 GHz ( +- 0.35% )
> 23867109747 instructions # 1.11 insn per cycle ( +- 0.11% )
> 1925207 sched:sched_wake_idle_without_ipi # 0.203 M/sec
>
> Comparing sched-pipe performance on QEMU with Kunpeng 920, *IPC improved 10%*:
>
> w/o haltpoll
> Performance counter stats for 'CPU(s) 0,1' (5 runs):
>
> 34,007.89 msec task-clock # 2.000 CPUs utilized ( +- 8.86% )
> 4,407,859,620 cycles # 0.130 GHz ( +- 84.92% )
> 2,482,046,461 instructions # 0.56 insn per cycle ( +- 88.27% )
> 16 sched:sched_wake_idle_without_ipi # 0.470 /sec ( +- 98.77% )
>
> 17.00 +- 1.51 seconds time elapsed ( +- 8.86% )
>
> w/ haltpoll
> Performance counter stats for 'CPU(s) 0,1' (5 runs):
>
> 16,894.37 msec task-clock # 2.000 CPUs utilized ( +- 3.80% )
> 8,703,158,826 cycles # 0.515 GHz ( +- 31.31% )
> 5,379,257,839 instructions # 0.62 insn per cycle ( +- 30.03% )
> 549,434 sched:sched_wake_idle_without_ipi # 32.522 K/sec ( +- 30.05% )
>
> 8.447 +- 0.321 seconds time elapsed ( +- 3.80% )
>
> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
>
> Thanks.
> Shuai
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-04-14 3:46 ` Ankur Arora
@ 2025-04-14 7:43 ` Shuai Xue
2025-04-15 6:24 ` Ankur Arora
0 siblings, 1 reply; 23+ messages in thread
From: Shuai Xue @ 2025-04-14 7:43 UTC (permalink / raw)
To: Ankur Arora
Cc: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi,
catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
On 2025/4/14 11:46, Ankur Arora wrote:
>
> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>
>> On 2025/4/12 04:57, Ankur Arora wrote:
>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>
>>>> On 2025/2/19 05:33, Ankur Arora wrote:
>>>>> Needed for cpuidle-haltpoll.
>>>>> Acked-by: Will Deacon <will@kernel.org>
>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>>> ---
>>>>> arch/arm64/kernel/idle.c | 1 +
>>>>> 1 file changed, 1 insertion(+)
>>>>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>>>> index 05cfb347ec26..b85ba0df9b02 100644
>>>>> --- a/arch/arm64/kernel/idle.c
>>>>> +++ b/arch/arm64/kernel/idle.c
>>>>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>>>>> */
>>>>> cpu_do_idle();
>>>>
>>>> Hi, Ankur,
>>>>
>>>> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
>>>> mwait_idle() in idle threads.
>>>>
>>>> It uses MONITOR to set up an effective address range that is monitored
>>>> for write-to-memory activity; MWAIT places the processor in
>>>> an optimized state (which may vary between implementations)
>>>> until a write to the monitored address range occurs.
>>> MWAIT is more capable than WFE -- it allows selection of deeper idle
>>> state. IIRC C2/C3.
>>>
>>>> Should arch_cpu_idle() on arm64 also use the LDXR/WFE
>>>> to avoid wakeup IPI like x86 monitor/mwait?
>>> Avoiding the wakeup IPI needs TIF_NR_POLLING and polling in idle support
>>> that this series adds.
>>> As Haris notes, the negative with only using WFE is that it only allows
>>> a single idle state, one that is fairly shallow because the event-stream
>>> causes a wakeup every 100us.
>>> --
>>> ankur
>>
>> Hi, Ankur and Haris
>>
>> Got it, thanks for the explanation :)
>>
>> Comparing sched-pipe performance on Rund with Yitian 710, *IPC improved 35%*:
>
> Thanks for testing Shuai. I wasn't expecting the IPC to improve by quite
> that much :). The reduced instructions make sense since we don't have to
> handle the IRQ anymore but we would spend some of the saved cycles
> waiting in WFE instead.
>
> I'm not familiar with the Yitian 710. Can you check if you are running
> with WFE? That's the __smp_cond_load_relaxed_timewait() path vs the
> __smp_cond_load_relaxed_spinwait() path in [0]. Same question for the
> Kunpeng 920.
Yes, it is running with __smp_cond_load_relaxed_timewait().
I use perf-probe to check if WFE is available in Guest:
perf probe 'arch_timer_evtstrm_available%return r=$retval'
perf record -e probe:arch_timer_evtstrm_available__return -aR sleep 1
perf script
swapper 0 [000] 1360.063049: probe:arch_timer_evtstrm_available__return: (ffff800080a5c640 <- ffff800080d42764) r=0x1
arch_timer_evtstrm_available returns true, so
__smp_cond_load_relaxed_timewait() is used.
>
> Also, I'm working on a new version of the series in [1]. Would you be
> okay trying that out?
Sure. Please cc me when you send out a new version.
>
> Thanks
> Ankur
>
> [0] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
> [1] https://lore.kernel.org/lkml/20250203214911.898276-4-ankur.a.arora@oracle.com/
>
Thanks.
Shuai
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 10/11] arm64: idle: export arch_cpu_idle()
2025-04-14 7:43 ` Shuai Xue
@ 2025-04-15 6:24 ` Ankur Arora
0 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-04-15 6:24 UTC (permalink / raw)
To: Shuai Xue
Cc: Ankur Arora, linux-pm, kvm, linux-arm-kernel, linux-kernel,
linux-acpi, catalin.marinas, will, x86, pbonzini, vkuznets,
rafael, daniel.lezcano, peterz, arnd, lenb, mark.rutland,
harisokn, mtosatti, sudeep.holla, cl, maz, misono.tomohiro,
maobibo, zhenglifeng1, joao.m.martins, boris.ostrovsky,
konrad.wilk
Shuai Xue <xueshuai@linux.alibaba.com> writes:
> On 2025/4/14 11:46, Ankur Arora wrote:
>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>
>>> On 2025/4/12 04:57, Ankur Arora wrote:
>>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>>
>>>>> On 2025/2/19 05:33, Ankur Arora wrote:
>>>>>> Needed for cpuidle-haltpoll.
>>>>>> Acked-by: Will Deacon <will@kernel.org>
>>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>>>> ---
>>>>>> arch/arm64/kernel/idle.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>>>>> index 05cfb347ec26..b85ba0df9b02 100644
>>>>>> --- a/arch/arm64/kernel/idle.c
>>>>>> +++ b/arch/arm64/kernel/idle.c
>>>>>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>>>>>> */
>>>>>> cpu_do_idle();
>>>>>
>>>>> Hi, Ankur,
>>>>>
>>>>> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
>>>>> mwait_idle() in idle threads.
>>>>>
>>>>> It uses MONITOR to set up an effective address range that is monitored
>>>>> for write-to-memory activity; MWAIT places the processor in
>>>>> an optimized state (which may vary between implementations)
>>>>> until a write to the monitored address range occurs.
>>>> MWAIT is more capable than WFE -- it allows selection of deeper idle
>>>> state. IIRC C2/C3.
>>>>
>>>>> Should arch_cpu_idle() on arm64 also use the LDXR/WFE
>>>>> to avoid wakeup IPI like x86 monitor/mwait?
>>>> Avoiding the wakeup IPI needs TIF_NR_POLLING and polling in idle support
>>>> that this series adds.
>>>> As Haris notes, the negative with only using WFE is that it only allows
>>>> a single idle state, one that is fairly shallow because the event-stream
>>>> causes a wakeup every 100us.
>>>> --
>>>> ankur
>>>
>>> Hi, Ankur and Haris
>>>
>>> Got it, thanks for the explanation :)
>>>
>>> Comparing sched-pipe performance on Rund with Yitian 710, *IPC improved 35%*:
>> Thanks for testing Shuai. I wasn't expecting the IPC to improve by quite
>> that much :). The reduced instructions make sense since we don't have to
>> handle the IRQ anymore but we would spend some of the saved cycles
>> waiting in WFE instead.
>> I'm not familiar with the Yitian 710. Can you check if you are running
>> with WFE? That's the __smp_cond_load_relaxed_timewait() path vs the
>> __smp_cond_load_relaxed_spinwait() path in [0]. Same question for the
>> Kunpeng 920.
>
> Yes, it is running with __smp_cond_load_relaxed_timewait().
>
> I use perf-probe to check if WFE is available in Guest:
>
> perf probe 'arch_timer_evtstrm_available%return r=$retval'
> perf record -e probe:arch_timer_evtstrm_available__return -aR sleep 1
> perf script
> swapper 0 [000] 1360.063049: probe:arch_timer_evtstrm_available__return: (ffff800080a5c640 <- ffff800080d42764) r=0x1
>
> arch_timer_evtstrm_available returns true, so
> __smp_cond_load_relaxed_timewait() is used.
Great. Thanks for checking.
>> Also, I'm working on a new version of the series in [1]. Would you be
>> okay trying that out?
>
> Sure. Please cc me when you send out a new version.
Will do. Thanks!
--
ankur
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 00/11] arm64: support poll_idle()
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
` (10 preceding siblings ...)
2025-02-18 21:33 ` [PATCH v10 11/11] arm64: support cpuidle-haltpoll Ankur Arora
@ 2025-05-13 5:23 ` Ankur Arora
11 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-05-13 5:23 UTC (permalink / raw)
To: Ankur Arora
Cc: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi,
catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
Ankur Arora <ankur.a.arora@oracle.com> writes:
> Hi,
>
> This patchset adds support for polling in idle on arm64 via poll_idle()
> and adds the requisite support to acpi-idle and cpuidle-haltpoll.
>
> v10 is a respin of v9 with the timed wait barrier logic
> (smp_cond_load_relaxed_timewait()) moved out into a separate
> series [0]. (The barrier patches could also do with some eyes.)
Sent a v2 for the barrier series:
https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
Ankur
> Why poll in idle?
> ==
>
> The benefit of polling in idle is to reduce the cost (and latency)
> of remote wakeups. When enabled, these can be done just by setting the
> need-resched bit, eliding the IPI, and the cost of handling the
> interrupt on the receiver.
>
> Comparing sched-pipe performance on a guest VM:
>
> # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions \
> -e sched:sched_wake_idle_without_ipi perf bench sched pipe -l 1000000 --cpu 4
>
> # without polling in idle
>
> Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
> 25,229.57 msec task-clock # 2.000 CPUs utilized ( +- 7.75% )
> 45,821,250,284 cycles # 1.816 GHz ( +- 10.07% )
> 26,557,496,665 instructions # 0.58 insn per cycle ( +- 0.21% )
> 0 sched:sched_wake_idle_without_ipi # 0.000 /sec
>
> 12.615 +- 0.977 seconds time elapsed ( +- 7.75% )
>
>
> # polling in idle (with haltpoll):
>
> Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
> 15,131.58 msec task-clock # 2.000 CPUs utilized ( +- 10.00% )
> 34,158,188,839 cycles # 2.257 GHz ( +- 6.91% )
> 20,824,950,916 instructions # 0.61 insn per cycle ( +- 0.09% )
> 1,983,822 sched:sched_wake_idle_without_ipi # 131.105 K/sec ( +- 0.78% )
>
> 7.566 +- 0.756 seconds time elapsed ( +- 10.00% )
>
> Comparing the two cases, there's a significant drop in both cycles and
> instructions executed, and a significant drop in the wakeup latency.
>
> Tomohiro Misono and Haris Okanovic also report similar latency
> improvements on Grace and Graviton systems (for v7) [1] [2].
> Haris also tested a modified v9 on top of the split out barrier
> primitives.
>
> Lifeng also reports improved context switch latency on a bare-metal
> machine with acpi-idle [3].
>
>
> Series layout
> ==
>
> - patches 1-3,
>
> "cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait()"
> "cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL"
> "Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig"
>
> switch poll_idle() to using the new barrier interface. Also, do some
> munging of related kconfig options.
>
> - patches 4-5,
>
> "arm64: define TIF_POLLING_NRFLAG"
> "arm64: add support for poll_idle()"
>
> add arm64 support for the polling flag and enable poll_idle()
> support.
>
> - patches 6, 7-11,
>
> "ACPI: processor_idle: Support polling state for LPI"
>
> "cpuidle-haltpoll: define arch_haltpoll_want()"
> "governors/haltpoll: drop kvm_para_available() check"
> "cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL"
>
> "arm64: idle: export arch_cpu_idle()"
> "arm64: support cpuidle-haltpoll"
>
> add support for polling via acpi-idle, and cpuidle-haltpoll.
>
>
> Changelog
> ==
>
> v10: respin of v9
> - sent out smp_cond_load_relaxed_timeout() separately [0]
> - Dropped from this series:
> "asm-generic: add barrier smp_cond_load_relaxed_timeout()"
> "arm64: barrier: add support for smp_cond_relaxed_timeout()"
> "arm64/delay: move some constants out to a separate header"
> "arm64: support WFET in smp_cond_relaxed_timeout()"
>
> - reworded some commit messages
>
> v9:
> - reworked the series to address a comment from Catalin Marinas
> about how v8 was abusing semantics of smp_cond_load_relaxed().
> - add poll_idle() support in acpi-idle (Lifeng Zheng)
> - dropped some earlier "Tested-by", "Reviewed-by" due to the
> above rework.
>
> v8: No logic changes. Largely respin of v7, with changes
> noted below:
>
> - move selection of ARCH_HAS_OPTIMIZED_POLL on arm64 to its
> own patch.
> (patch-9 "arm64: select ARCH_HAS_OPTIMIZED_POLL")
>
> - address comments simplifying arm64 support (Will Deacon)
> (patch-11 "arm64: support cpuidle-haltpoll")
>
> v7: No significant logic changes. Mostly a respin of v6.
>
> - minor cleanup in poll_idle() (Christoph Lameter)
> - fixes conflicts due to code movement in arch/arm64/kernel/cpuidle.c
> (Tomohiro Misono)
>
> v6:
>
> - reordered the patches to keep poll_idle() and ARCH_HAS_OPTIMIZED_POLL
> changes together (comment from Christoph Lameter)
> - fleshes out the commit messages a bit more (comments from Christoph
> Lameter, Sudeep Holla)
> - also rework selection of cpuidle-haltpoll. Now selected based
> on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
> - moved back to arch_haltpoll_want() (comment from Joao Martins)
> Also, arch_haltpoll_want() now takes the force parameter and is
> now responsible for the complete selection (or not) of haltpoll.
> - fixes the build breakage on i386
> - fixes the cpuidle-haltpoll module breakage on arm64 (comment from
> Tomohiro Misono, Haris Okanovic)
>
> v5:
> - rework the poll_idle() loop around smp_cond_load_relaxed() (review
> comment from Tomohiro Misono.)
> - also rework selection of cpuidle-haltpoll. Now selected based
> on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
> - arch_haltpoll_supported() (renamed from arch_haltpoll_want()) on
> arm64 now depends on the event-stream being enabled.
> - limit POLL_IDLE_RELAX_COUNT on arm64 (review comment from Haris Okanovic)
> - ARCH_HAS_CPU_RELAX is now renamed to ARCH_HAS_OPTIMIZED_POLL.
>
> v4 changes from v3:
> - change 7/8 per Rafael's input: drop the parens and use ret for the final check
> - add 8/8 which renames the guard for building poll_state
>
> v3 changes from v2:
> - fix 1/7 per Petr Mladek - remove ARCH_HAS_CPU_RELAX from arch/x86/Kconfig
> - add Ack-by from Rafael Wysocki on 2/7
>
> v2 changes from v1:
> - added patch 7 where we replace cpu_relax() with smp_cond_load_relaxed() per PeterZ
> (this reduces the CPU cycles consumed in the tests above by at least 50%:
> 10,716,881,137 now vs 14,503,014,257 before)
> - removed the ifdef from patch 1 per RafaelW
>
>
> Would appreciate any review comments.
>
> Ankur
>
>
> [0] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
> [1] https://lore.kernel.org/lkml/TY3PR01MB111481E9B0AF263ACC8EA5D4AE5BA2@TY3PR01MB11148.jpnprd01.prod.outlook.com/
> [2] https://lore.kernel.org/lkml/104d0ec31cb45477e27273e089402d4205ee4042.camel@amazon.com/
> [3] https://lore.kernel.org/lkml/f8a1f85b-c4bf-4c38-81bf-728f72a4f2fe@huawei.com/
>
> Ankur Arora (6):
> cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait()
> cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
> arm64: add support for poll_idle()
> cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL
> arm64: idle: export arch_cpu_idle()
> arm64: support cpuidle-haltpoll
>
> Joao Martins (4):
> Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig
> arm64: define TIF_POLLING_NRFLAG
> cpuidle-haltpoll: define arch_haltpoll_want()
> governors/haltpoll: drop kvm_para_available() check
>
> Lifeng Zheng (1):
> ACPI: processor_idle: Support polling state for LPI
>
> arch/Kconfig | 3 ++
> arch/arm64/Kconfig | 7 ++++
> arch/arm64/include/asm/cpuidle_haltpoll.h | 20 +++++++++++
> arch/arm64/include/asm/thread_info.h | 2 ++
> arch/arm64/kernel/idle.c | 1 +
> arch/x86/Kconfig | 5 ++-
> arch/x86/include/asm/cpuidle_haltpoll.h | 1 +
> arch/x86/kernel/kvm.c | 13 +++++++
> drivers/acpi/processor_idle.c | 43 +++++++++++++++++++----
> drivers/cpuidle/Kconfig | 5 ++-
> drivers/cpuidle/Makefile | 2 +-
> drivers/cpuidle/cpuidle-haltpoll.c | 12 +------
> drivers/cpuidle/governors/haltpoll.c | 6 +---
> drivers/cpuidle/poll_state.c | 27 +++++---------
> drivers/idle/Kconfig | 1 +
> include/linux/cpuidle.h | 2 +-
> include/linux/cpuidle_haltpoll.h | 5 +++
> 17 files changed, 105 insertions(+), 50 deletions(-)
> create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h
--
ankur
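The wakeup mechanism the quoted cover letter relies on (set the need-resched
bit, elide the IPI when the target is polling) can be sketched as a userspace
analog. This is an illustration only: the flag names mirror the kernel's, but
the bit values and the wake_needs_ipi() helper are hypothetical, not kernel
APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flag bits mirror the kernel's thread_info flags; the values are arbitrary. */
#define TIF_NEED_RESCHED   (1u << 0)
#define TIF_POLLING_NRFLAG (1u << 1)

/*
 * Hypothetical waker-side helper: set TIF_NEED_RESCHED on the target's
 * flags word and report whether an IPI would still be needed. If the
 * target advertised TIF_POLLING_NRFLAG, its poll loop will observe the
 * bit on its own, so the IPI (elided in this sketch) can be skipped.
 */
static bool wake_needs_ipi(atomic_uint *tif)
{
	unsigned int old = atomic_fetch_or(tif, TIF_NEED_RESCHED);

	return !(old & TIF_POLLING_NRFLAG);
}
```

A polling idle loop pairs with this by setting TIF_POLLING_NRFLAG before
spinning on the flags word and clearing it on exit, which is what the
TIF_POLLING_NRFLAG patches in this series enable on arm64.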
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v10 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait()
2025-02-18 21:33 ` [PATCH v10 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait() Ankur Arora
@ 2025-05-13 5:29 ` Ankur Arora
0 siblings, 0 replies; 23+ messages in thread
From: Ankur Arora @ 2025-05-13 5:29 UTC (permalink / raw)
To: Ankur Arora
Cc: linux-pm, kvm, linux-arm-kernel, linux-kernel, linux-acpi,
catalin.marinas, will, x86, pbonzini, vkuznets, rafael,
daniel.lezcano, peterz, arnd, lenb, mark.rutland, harisokn,
mtosatti, sudeep.holla, cl, maz, misono.tomohiro, maobibo,
zhenglifeng1, joao.m.martins, boris.ostrovsky, konrad.wilk
Ankur Arora <ankur.a.arora@oracle.com> writes:
> The inner loop in poll_idle() polls to see if the thread's
> TIF_NEED_RESCHED bit is set. The loop exits once the condition is met,
> or if the poll time limit has been exceeded.
>
> To minimize the number of instructions executed in each iteration, the
> time check is rate-limited. In addition, each loop iteration executes
> cpu_relax() which on certain platforms provides a hint to the pipeline
> that the loop is busy-waiting, which allows the processor to reduce
> power consumption.
>
> However, cpu_relax() is defined optimally only on x86. On arm64, for
> instance, it is implemented as a YIELD, which only serves as a hint
> to the CPU that it should prioritize a different hardware thread if
> one is available. arm64 does, however, expose a more optimal polling
> mechanism via smp_cond_load_relaxed_timewait(), which uses LDXR/WFE
> to wait until a store to a specified region, or until a timeout.
>
> These semantics are essentially identical to what we want
> from poll_idle(). So, restructure the loop to use
> smp_cond_load_relaxed_timewait() instead.
>
> The generated code remains close to the original version.
>
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
> drivers/cpuidle/poll_state.c | 27 ++++++++-------------------
> 1 file changed, 8 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 9b6d90a72601..5117d3d37036 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -8,35 +8,24 @@
> #include <linux/sched/clock.h>
> #include <linux/sched/idle.h>
>
> -#define POLL_IDLE_RELAX_COUNT 200
> -
> static int __cpuidle poll_idle(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> - u64 time_start;
> -
> - time_start = local_clock_noinstr();
>
> dev->poll_time_limit = false;
>
> raw_local_irq_enable();
> if (!current_set_polling_and_test()) {
> - unsigned int loop_count = 0;
> - u64 limit;
> + unsigned long flags;
> + u64 time_start = local_clock_noinstr();
> + u64 limit = cpuidle_poll_time(drv, dev);
>
> - limit = cpuidle_poll_time(drv, dev);
> + flags = smp_cond_load_relaxed_timewait(¤t_thread_info()->flags,
> + VAL & _TIF_NEED_RESCHED,
> + local_clock_noinstr(),
> + time_start + limit);
>
> - while (!need_resched()) {
> - cpu_relax();
> - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> - continue;
> -
> - loop_count = 0;
> - if (local_clock_noinstr() - time_start > limit) {
> - dev->poll_time_limit = true;
> - break;
> - }
> - }
> + dev->poll_time_limit = !(flags & _TIF_NEED_RESCHED);
> }
> raw_local_irq_disable();
The barrier-v2 [1] interface is slightly different from the one proposed
in v1 (on which this series is based).
[1] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
For testing please use the following patch. It adds a new parameter
(__smp_cond_timewait_coarse) explicitly specifying the waiting policy.
--
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..2970368663c7 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -8,35 +8,25 @@
#include <linux/sched/clock.h>
#include <linux/sched/idle.h>
-#define POLL_IDLE_RELAX_COUNT 200
-
static int __cpuidle poll_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- u64 time_start;
-
- time_start = local_clock_noinstr();
dev->poll_time_limit = false;
raw_local_irq_enable();
if (!current_set_polling_and_test()) {
- unsigned int loop_count = 0;
- u64 limit;
+ unsigned long flags;
+ u64 time_start = local_clock_noinstr();
+ u64 limit = cpuidle_poll_time(drv, dev);
- limit = cpuidle_poll_time(drv, dev);
+ flags = smp_cond_load_relaxed_timewait(¤t_thread_info()->flags,
+ VAL & _TIF_NEED_RESCHED,
+ __smp_cond_timewait_coarse,
+ local_clock_noinstr(),
+ time_start + limit);
- while (!need_resched()) {
- cpu_relax();
- if (loop_count++ < POLL_IDLE_RELAX_COUNT)
- continue;
-
- loop_count = 0;
- if (local_clock_noinstr() - time_start > limit) {
- dev->poll_time_limit = true;
- break;
- }
- }
+ dev->poll_time_limit = !(flags & _TIF_NEED_RESCHED);
}
raw_local_irq_disable();
--
ankur
^ permalink raw reply related [flat|nested] 23+ messages in thread
end of thread, other threads:[~2025-05-13 5:29 UTC | newest]
Thread overview: 23+ messages
2025-02-18 21:33 [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora
2025-02-18 21:33 ` [PATCH v10 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait() Ankur Arora
2025-05-13 5:29 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 02/11] cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL Ankur Arora
2025-02-18 21:33 ` [PATCH v10 03/11] Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig Ankur Arora
2025-02-18 21:33 ` [PATCH v10 04/11] arm64: define TIF_POLLING_NRFLAG Ankur Arora
2025-02-18 21:33 ` [PATCH v10 05/11] arm64: add support for poll_idle() Ankur Arora
2025-02-18 21:33 ` [PATCH v10 06/11] ACPI: processor_idle: Support polling state for LPI Ankur Arora
2025-02-18 21:33 ` [PATCH v10 07/11] cpuidle-haltpoll: define arch_haltpoll_want() Ankur Arora
2025-02-18 21:33 ` [PATCH v10 08/11] governors/haltpoll: drop kvm_para_available() check Ankur Arora
2025-02-24 16:57 ` Christoph Lameter (Ampere)
2025-02-25 19:06 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 09/11] cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL Ankur Arora
2025-02-18 21:33 ` [PATCH v10 10/11] arm64: idle: export arch_cpu_idle() Ankur Arora
2025-04-11 3:32 ` Shuai Xue
2025-04-11 17:42 ` Okanovic, Haris
2025-04-11 20:57 ` Ankur Arora
2025-04-14 2:01 ` Shuai Xue
2025-04-14 3:46 ` Ankur Arora
2025-04-14 7:43 ` Shuai Xue
2025-04-15 6:24 ` Ankur Arora
2025-02-18 21:33 ` [PATCH v10 11/11] arm64: support cpuidle-haltpoll Ankur Arora
2025-05-13 5:23 ` [PATCH v10 00/11] arm64: support poll_idle() Ankur Arora