* [PATCH 01/10] x86: Add a comment about the "magic" behind shadow sti before mwait
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:44 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 02/10] cpuidle: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
` (9 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
Add a note to make sure we never miss or break the requirements behind
it.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/x86/include/asm/mwait.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..341ee4f1d91e 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -87,6 +87,15 @@ static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
:: "a" (eax), "b" (ebx), "c" (ecx));
}
+/*
+ * Re-enable interrupts right upon calling mwait in such a way that
+ * no interrupt can fire _before_ the execution of mwait, ie: no
+ * instruction must be placed between "sti" and "mwait".
+ *
+ * This is necessary because if an interrupt queues a timer before
+ * executing mwait, it would otherwise go unnoticed and the next tick
+ * would not be reprogrammed accordingly before mwait ever wakes up.
+ */
static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
{
mds_idle_clear_cpu_buffers();
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 01/10] x86: Add a comment about the "magic" behind shadow sti before mwait
2023-08-11 17:00 ` [PATCH 01/10] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
@ 2023-08-11 17:44 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:44 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:00 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Add a note to make sure we never miss and break the requirements behind
> it.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> arch/x86/include/asm/mwait.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 778df05f8539..341ee4f1d91e 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -87,6 +87,15 @@ static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
> :: "a" (eax), "b" (ebx), "c" (ecx));
> }
>
> +/*
> + * Re-enable interrupts right upon calling mwait in such a way that
> + * no interrupt can fire _before_ the execution of mwait, ie: no
> + * instruction must be placed between "sti" and "mwait".
> + *
> + * This is necessary because if an interrupt queues a timer before
> + * executing mwait, it would otherwise go unnoticed and the next tick
> + * would not be reprogrammed accordingly before mwait ever wakes up.
> + */
> static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> {
> mds_idle_clear_cpu_buffers();
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 02/10] cpuidle: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
2023-08-11 17:00 ` [PATCH 01/10] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:35 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 03/10] cpuidle: Report illegal tick stopped while polling Frederic Weisbecker
` (8 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Peter Zijlstra, Frederic Weisbecker, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen, Jacob Pan,
Len Brown
From: Peter Zijlstra <peterz@infradead.org>
intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
may fire before mwait() is eventually called. If such an interrupt queues
a timer, it may go unnoticed until mwait returns and the idle loop
handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
help because a local timer enqueue doesn't set that flag.
The issue is mitigated by the fact that this idle handler is only invoked
for shallow C-states when, presumably, the next tick is supposed to be
close enough. There may still be rare cases though when the next tick
is far away and the selected C-state is shallow, resulting in a timer
getting ignored for a while.
Fix this by using sti_mwait(), whose IRQ re-enablement only triggers
upon calling mwait(), dealing with the race while keeping the interrupt
latency within acceptable bounds.
Fixes: c227233ad64c (intel_idle: enable interrupts before C1 on Xeons)
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/x86/include/asm/mwait.h | 11 +++++++++--
drivers/idle/intel_idle.c | 19 +++++++------------
2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 341ee4f1d91e..920426d691ce 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -124,8 +124,15 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
}
__monitor((void *)&current_thread_info()->flags, 0, 0);
- if (!need_resched())
- __mwait(eax, ecx);
+
+ if (!need_resched()) {
+ if (ecx & 1) {
+ __mwait(eax, ecx);
+ } else {
+ __sti_mwait(eax, ecx);
+ raw_local_irq_disable();
+ }
+ }
}
current_clr_polling();
}
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 256c2d42e350..d676d32741da 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -131,11 +131,12 @@ static unsigned int mwait_substates __initdata;
#define MWAIT2flg(eax) ((eax & 0xFF) << 24)
static __always_inline int __intel_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+ struct cpuidle_driver *drv,
+ int index, bool irqoff)
{
struct cpuidle_state *state = &drv->states[index];
unsigned long eax = flg2MWAIT(state->flags);
- unsigned long ecx = 1; /* break on interrupt flag */
+ unsigned long ecx = 1*irqoff; /* break on interrupt flag */
mwait_idle_with_hints(eax, ecx);
@@ -159,19 +160,13 @@ static __always_inline int __intel_idle(struct cpuidle_device *dev,
static __cpuidle int intel_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- return __intel_idle(dev, drv, index);
+ return __intel_idle(dev, drv, index, true);
}
static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- int ret;
-
- raw_local_irq_enable();
- ret = __intel_idle(dev, drv, index);
- raw_local_irq_disable();
-
- return ret;
+ return __intel_idle(dev, drv, index, false);
}
static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
@@ -184,7 +179,7 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
if (smt_active)
native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
- ret = __intel_idle(dev, drv, index);
+ ret = __intel_idle(dev, drv, index, true);
if (smt_active)
native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
@@ -196,7 +191,7 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
fpu_idle_fpregs();
- return __intel_idle(dev, drv, index);
+ return __intel_idle(dev, drv, index, true);
}
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 02/10] cpuidle: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
2023-08-11 17:00 ` [PATCH 02/10] cpuidle: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
@ 2023-08-11 17:35 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:35 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen, Jacob Pan, Len Brown
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
> may fire before mwait() is eventually called. If such an interrupt queues
> a timer, it may go unnoticed until mwait returns and the idle loop
> handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
> help because a local timer enqueue doesn't set that flag.
>
> The issue is mitigated by the fact that this idle handler is only invoked
> for shallow C-states when, presumably, the next tick is supposed to be
> close enough. There may still be rare cases though when the next tick
> is far away and the selected C-state is shallow, resulting in a timer
> getting ignored for a while.
>
> Fix this with using sti_mwait() whose IRQ-reenablement only triggers
> upon calling mwait(), dealing with the race while keeping the interrupt
> latency within acceptable bounds.
>
> Fixes: c227233ad64c (intel_idle: enable interrupts before C1 on Xeons)
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> arch/x86/include/asm/mwait.h | 11 +++++++++--
> drivers/idle/intel_idle.c | 19 +++++++------------
> 2 files changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 341ee4f1d91e..920426d691ce 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -124,8 +124,15 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
> }
>
> __monitor((void *)&current_thread_info()->flags, 0, 0);
> - if (!need_resched())
> - __mwait(eax, ecx);
> +
> + if (!need_resched()) {
> + if (ecx & 1) {
> + __mwait(eax, ecx);
> + } else {
> + __sti_mwait(eax, ecx);
> + raw_local_irq_disable();
> + }
> + }
> }
> current_clr_polling();
> }
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 256c2d42e350..d676d32741da 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -131,11 +131,12 @@ static unsigned int mwait_substates __initdata;
> #define MWAIT2flg(eax) ((eax & 0xFF) << 24)
>
> static __always_inline int __intel_idle(struct cpuidle_device *dev,
> - struct cpuidle_driver *drv, int index)
> + struct cpuidle_driver *drv,
> + int index, bool irqoff)
> {
> struct cpuidle_state *state = &drv->states[index];
> unsigned long eax = flg2MWAIT(state->flags);
> - unsigned long ecx = 1; /* break on interrupt flag */
> + unsigned long ecx = 1*irqoff; /* break on interrupt flag */
>
> mwait_idle_with_hints(eax, ecx);
>
> @@ -159,19 +160,13 @@ static __always_inline int __intel_idle(struct cpuidle_device *dev,
> static __cpuidle int intel_idle(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> - return __intel_idle(dev, drv, index);
> + return __intel_idle(dev, drv, index, true);
> }
>
> static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> - int ret;
> -
> - raw_local_irq_enable();
> - ret = __intel_idle(dev, drv, index);
> - raw_local_irq_disable();
> -
> - return ret;
> + return __intel_idle(dev, drv, index, false);
> }
>
> static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
> @@ -184,7 +179,7 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
> if (smt_active)
> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>
> - ret = __intel_idle(dev, drv, index);
> + ret = __intel_idle(dev, drv, index, true);
>
> if (smt_active)
> native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> @@ -196,7 +191,7 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> fpu_idle_fpregs();
> - return __intel_idle(dev, drv, index);
> + return __intel_idle(dev, drv, index, true);
> }
>
> /**
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 03/10] cpuidle: Report illegal tick stopped while polling
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
2023-08-11 17:00 ` [PATCH 01/10] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
2023-08-11 17:00 ` [PATCH 02/10] cpuidle: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:37 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 04/10] loongson: Fix idle VS timer enqueue Frederic Weisbecker
` (7 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
poll_idle() can't be called while the tick is stopped because it enables
interrupts and only polls on TIF_NEED_RESCHED, which doesn't tell whether
an interrupt has queued a timer that would require tick re-programming.
There is no point in polling with the tick stopped anyway, so add a check
to make sure it never happens.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
drivers/cpuidle/poll_state.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..009f46f121ae 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -7,6 +7,7 @@
#include <linux/sched.h>
#include <linux/sched/clock.h>
#include <linux/sched/idle.h>
+#include <linux/tick.h>
#define POLL_IDLE_RELAX_COUNT 200
@@ -19,6 +20,13 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
dev->poll_time_limit = false;
+ /*
+ * This re-enables IRQs and only polls on TIF_NEED_RESCHED.
+ * A timer queued by an interrupt here may go unnoticed if
+ * the tick is stopped.
+ */
+ WARN_ON_ONCE(tick_nohz_tick_stopped());
+
raw_local_irq_enable();
if (!current_set_polling_and_test()) {
unsigned int loop_count = 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 03/10] cpuidle: Report illegal tick stopped while polling
2023-08-11 17:00 ` [PATCH 03/10] cpuidle: Report illegal tick stopped while polling Frederic Weisbecker
@ 2023-08-11 17:37 ` Rafael J. Wysocki
2023-08-29 13:04 ` Frederic Weisbecker
0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:37 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> poll_idle() can't be called while the tick is stopped because it enables
> interrupts and only polls on TIF_NEED_RESCHED, which doesn't tell if an
> interrupt queues a timer that would require a tick re-programming.
>
> There is no point anyway to poll with the tick stopped so add a check
> to make sure it never happens.
I'd rather update governors so they never use polling states when the
tick has been stopped and then add the WARN_ON().
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> drivers/cpuidle/poll_state.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 9b6d90a72601..009f46f121ae 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -7,6 +7,7 @@
> #include <linux/sched.h>
> #include <linux/sched/clock.h>
> #include <linux/sched/idle.h>
> +#include <linux/tick.h>
>
> #define POLL_IDLE_RELAX_COUNT 200
>
> @@ -19,6 +20,13 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>
> dev->poll_time_limit = false;
>
> + /*
> + * This re-enables IRQs and only polls on TIF_NEED_RESCHED.
> + * A timer queued by an interrupt here may go unnoticed if
> + * the tick is stopped.
> + */
> + WARN_ON_ONCE(tick_nohz_tick_stopped());
> +
> raw_local_irq_enable();
> if (!current_set_polling_and_test()) {
> unsigned int loop_count = 0;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 03/10] cpuidle: Report illegal tick stopped while polling
2023-08-11 17:37 ` Rafael J. Wysocki
@ 2023-08-29 13:04 ` Frederic Weisbecker
0 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-29 13:04 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: LKML, Peter Zijlstra, Daniel Lezcano, Thomas Gleixner,
Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 07:37:55PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > poll_idle() can't be called while the tick is stopped because it enables
> > interrupts and only polls on TIF_NEED_RESCHED, which doesn't tell if an
> > interrupt queues a timer that would require a tick re-programming.
> >
> > There is no point anyway to poll with the tick stopped so add a check
> > to make sure it never happens.
>
> I'd rather update governors so they never use polling states when the
> tick has been stopped and then add the WARN_ON().
Sounds good, let's put this one aside.
In the meantime feel free to pick up those you acked and see fit in your
queue.
Thanks!
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 04/10] loongson: Fix idle VS timer enqueue
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (2 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 03/10] cpuidle: Report illegal tick stopped while polling Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-14 2:58 ` bibo mao
2023-08-11 17:00 ` [PATCH 05/10] cpuidle: Comment about timers requirements VS idle handler Frederic Weisbecker
` (6 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Peter Zijlstra, Frederic Weisbecker, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen, Bibo Mao,
WANG Xuerui, Bibo
From: Peter Zijlstra <peterz@infradead.org>
Loongson re-enables interrupts on its idle routine and performs a
TIF_NEED_RESCHED check afterwards before putting the CPU to sleep.
The IRQs firing between the check and the idling instruction may set the
TIF_NEED_RESCHED flag. In order to deal with such a race, IRQs
interrupting __arch_cpu_idle() rollback their return address to the
beginning of __arch_cpu_idle() so that TIF_NEED_RESCHED is checked
again before going back to sleep.
However idle IRQs can also queue timers that may require a tick
reprogramming through a new generic idle loop iteration but those timers
would go unnoticed here because __arch_cpu_idle() only checks
TIF_NEED_RESCHED. It doesn't check for pending timers.
Fix this by fast-forwarding the idle IRQs' return address to the end of
the idle routine instead of the beginning, so that the generic idle loop
handles both TIF_NEED_RESCHED and pending timers.
Fixes: 0603839b18f4 (LoongArch: Add exception/interrupt handling)
Tested-by: Bibo, Mao <maobibo@loongson.cn>
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/loongarch/kernel/genex.S | 32 ++++++++++++++++++--------------
arch/loongarch/kernel/idle.c | 1 -
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S
index 78f066384657..359d693f112e 100644
--- a/arch/loongarch/kernel/genex.S
+++ b/arch/loongarch/kernel/genex.S
@@ -18,27 +18,31 @@
.align 5
SYM_FUNC_START(__arch_cpu_idle)
- /* start of rollback region */
- LONG_L t0, tp, TI_FLAGS
- nop
- andi t0, t0, _TIF_NEED_RESCHED
- bnez t0, 1f
- nop
- nop
- nop
+ /* start of idle interrupt region */
+ move t0, CSR_CRMD_IE
+ csrxchg t0, t0, LOONGARCH_CSR_CRMD
+ /*
+ * If an interrupt lands here; between enabling interrupts above and
+ * going idle on the next instruction, we must *NOT* go idle since the
+ * interrupt could have set TIF_NEED_RESCHED or caused a timer to need
+ * reprogramming. Fall through -- see handle_vint() below -- and have
+ * the idle loop take care of things.
+ */
idle 0
- /* end of rollback region */
-1: jr ra
+ nop
+ /* end of idle interrupt region */
+SYM_INNER_LABEL(__arch_cpu_idle_exit, SYM_L_LOCAL)
+ jr ra
SYM_FUNC_END(__arch_cpu_idle)
SYM_FUNC_START(handle_vint)
BACKUP_T0T1
SAVE_ALL
- la_abs t1, __arch_cpu_idle
+ la_abs t1, __arch_cpu_idle_exit
LONG_L t0, sp, PT_ERA
- /* 32 byte rollback region */
- ori t0, t0, 0x1f
- xori t0, t0, 0x1f
+ /* 16 byte idle interrupt region */
+ ori t0, t0, 0x0f
+ addi.d t0, t0, 1
bne t0, t1, 1f
LONG_S t0, sp, PT_ERA
1: move a0, sp
diff --git a/arch/loongarch/kernel/idle.c b/arch/loongarch/kernel/idle.c
index 0b5dd2faeb90..5ba72d229920 100644
--- a/arch/loongarch/kernel/idle.c
+++ b/arch/loongarch/kernel/idle.c
@@ -11,7 +11,6 @@
void __cpuidle arch_cpu_idle(void)
{
- raw_local_irq_enable();
__arch_cpu_idle(); /* idle instruction needs irq enabled */
raw_local_irq_disable();
}
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH 04/10] loongson: Fix idle VS timer enqueue
2023-08-11 17:00 ` [PATCH 04/10] loongson: Fix idle VS timer enqueue Frederic Weisbecker
@ 2023-08-14 2:58 ` bibo mao
0 siblings, 0 replies; 24+ messages in thread
From: bibo mao @ 2023-08-14 2:58 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen, WANG Xuerui, LKML,
Huacai Chen
+Add huacai
The asm instruction "move" should be replaced with "li.w"; the rest looks good to me.
diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S
index 359d693f112e..8a98023ecafd 100644
--- a/arch/loongarch/kernel/genex.S
+++ b/arch/loongarch/kernel/genex.S
@@ -19,7 +19,7 @@
.align 5
SYM_FUNC_START(__arch_cpu_idle)
/* start of idle interrupt region */
- move t0, CSR_CRMD_IE
+ li.w t0, CSR_CRMD_IE
csrxchg t0, t0, LOONGARCH_CSR_CRMD
/*
* If an interrupt lands here; between enabling interrupts above and
By the way, __arch_cpu_idle() is called by machine_halt()/machine_power_off() etc.,
so IRQs will be enabled with the new patch. Do we need to add something like this?
diff --git a/arch/loongarch/kernel/reset.c b/arch/loongarch/kernel/reset.c
index 1ef8c6383535..9ecd42c0c804 100644
--- a/arch/loongarch/kernel/reset.c
+++ b/arch/loongarch/kernel/reset.c
@@ -20,6 +20,11 @@
void (*pm_power_off)(void);
EXPORT_SYMBOL(pm_power_off);
+static __always_inline void native_halt(void)
+{
+ asm volatile("idle 0": : :"memory");
+}
+
void machine_halt(void)
{
#ifdef CONFIG_SMP
@@ -32,9 +37,9 @@ void machine_halt(void)
pr_notice("\n\n** You can safely turn off the power now **\n\n");
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
- while (true) {
- __arch_cpu_idle();
- }
+ while (1) {
+ native_halt();
+ };
}
void machine_power_off(void)
@@ -52,9 +57,9 @@ void machine_power_off(void)
efi.reset_system(EFI_RESET_SHUTDOWN, EFI_SUCCESS, 0, NULL);
#endif
- while (true) {
- __arch_cpu_idle();
- }
+ while (1) {
+ native_halt();
+ };
}
void machine_restart(char *command)
@@ -73,7 +78,7 @@ void machine_restart(char *command)
if (!acpi_disabled)
acpi_reboot();
- while (true) {
- __arch_cpu_idle();
- }
+ while (1) {
+ native_halt();
+ };
}
Regards
Bibo Mao
在 2023/8/12 01:00, Frederic Weisbecker 写道:
> From: Peter Zijlstra <peterz@infradead.org>
>
> Loongson re-enables interrupts on its idle routine and performs a
> TIF_NEED_RESCHED check afterwards before putting the CPU to sleep.
>
> The IRQs firing between the check and the idling instruction may set the
> TIF_NEED_RESCHED flag. In order to deal with such a race, IRQs
> interrupting __arch_cpu_idle() rollback their return address to the
> beginning of __arch_cpu_idle() so that TIF_NEED_RESCHED is checked
> again before going back to sleep.
>
> However idle IRQs can also queue timers that may require a tick
> reprogramming through a new generic idle loop iteration but those timers
> would go unnoticed here because __arch_cpu_idle() only checks
> TIF_NEED_RESCHED. It doesn't check for pending timers.
>
> Fix this by fast-forwarding the idle IRQs' return address to the end of the
> idle routine instead of the beginning, so that the generic idle loop
> handles both TIF_NEED_RESCHED and pending timers.
>
> Fixes: 0603839b18f4 (LoongArch: Add exception/interrupt handling)
> Tested-by: Bibo, Mao <maobibo@loongson.cn>
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: WANG Xuerui <kernel@xen0n.name>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> arch/loongarch/kernel/genex.S | 32 ++++++++++++++++++--------------
> arch/loongarch/kernel/idle.c | 1 -
> 2 files changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S
> index 78f066384657..359d693f112e 100644
> --- a/arch/loongarch/kernel/genex.S
> +++ b/arch/loongarch/kernel/genex.S
> @@ -18,27 +18,31 @@
>
> .align 5
> SYM_FUNC_START(__arch_cpu_idle)
> - /* start of rollback region */
> - LONG_L t0, tp, TI_FLAGS
> - nop
> - andi t0, t0, _TIF_NEED_RESCHED
> - bnez t0, 1f
> - nop
> - nop
> - nop
> + /* start of idle interrupt region */
> + move t0, CSR_CRMD_IE
> + csrxchg t0, t0, LOONGARCH_CSR_CRMD
> + /*
> + * If an interrupt lands here; between enabling interrupts above and
> + * going idle on the next instruction, we must *NOT* go idle since the
> + * interrupt could have set TIF_NEED_RESCHED or caused a timer to need
> + * reprogramming. Fall through -- see handle_vint() below -- and have
> + * the idle loop take care of things.
> + */
> idle 0
> - /* end of rollback region */
> -1: jr ra
> + nop
> + /* end of idle interrupt region */
> +SYM_INNER_LABEL(__arch_cpu_idle_exit, SYM_L_LOCAL)
> + jr ra
> SYM_FUNC_END(__arch_cpu_idle)
>
> SYM_FUNC_START(handle_vint)
> BACKUP_T0T1
> SAVE_ALL
> - la_abs t1, __arch_cpu_idle
> + la_abs t1, __arch_cpu_idle_exit
> LONG_L t0, sp, PT_ERA
> - /* 32 byte rollback region */
> - ori t0, t0, 0x1f
> - xori t0, t0, 0x1f
> + /* 16 byte idle interrupt region */
> + ori t0, t0, 0x0f
> + addi.d t0, t0, 1
> bne t0, t1, 1f
> LONG_S t0, sp, PT_ERA
> 1: move a0, sp
> diff --git a/arch/loongarch/kernel/idle.c b/arch/loongarch/kernel/idle.c
> index 0b5dd2faeb90..5ba72d229920 100644
> --- a/arch/loongarch/kernel/idle.c
> +++ b/arch/loongarch/kernel/idle.c
> @@ -11,7 +11,6 @@
>
> void __cpuidle arch_cpu_idle(void)
> {
> - raw_local_irq_enable();
> __arch_cpu_idle(); /* idle instruction needs irq enabled */
> raw_local_irq_disable();
> }
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 05/10] cpuidle: Comment about timers requirements VS idle handler
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (3 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 04/10] loongson: Fix idle VS timer enqueue Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:38 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 06/10] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
` (5 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/sched/idle.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 342f58a329f5..d52f6e3e3854 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -258,6 +258,36 @@ static void do_idle(void)
while (!need_resched()) {
rmb();
+ /*
+ * Interrupts shouldn't be re-enabled from that point on until
+ * the CPU sleeping instruction is reached. Otherwise an interrupt
+ * may fire and queue a timer that would be ignored until the CPU
+ * wakes from the sleeping instruction. And testing need_resched()
+ * doesn't tell about pending needed timer reprogram.
+ *
+ * Several cases to consider:
+ *
+ * - SLEEP-UNTIL-PENDING-INTERRUPT based instructions such as
+ * "wfi" or "mwait" are fine because they can be entered with
+ * interrupt disabled.
+ *
+ * - sti;mwait() couple is fine because the interrupts are
+ * re-enabled only upon the execution of mwait, leaving no gap
+ * in-between.
+ *
+ * - ROLLBACK based idle handlers with the sleeping instruction
+ * called with interrupts enabled are NOT fine. In this scheme
+ * when the interrupt detects it has interrupted an idle handler,
+ * it rolls back to its beginning which performs the
+ * need_resched() check before re-executing the sleeping
+ * instruction. This can leak a pending needed timer reprogram.
+ * If such a scheme is really mandatory due to the lack of an
+ * appropriate CPU sleeping instruction, then a FAST-FORWARD
+ * must instead be applied: when the interrupt detects it has
+ * interrupted an idle handler, it must resume to the end of
+ * this idle handler so that the generic idle loop is iterated
+ * again to reprogram the tick.
+ */
local_irq_disable();
if (cpu_is_offline(cpu)) {
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread* Re: [PATCH 05/10] cpuidle: Comment about timers requirements VS idle handler
2023-08-11 17:00 ` [PATCH 05/10] cpuidle: Comment about timers requirements VS idle handler Frederic Weisbecker
@ 2023-08-11 17:38 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:38 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> kernel/sched/idle.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 342f58a329f5..d52f6e3e3854 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -258,6 +258,36 @@ static void do_idle(void)
> while (!need_resched()) {
> rmb();
>
> + /*
> + * Interrupts shouldn't be re-enabled from that point on until
> + * the CPU sleeping instruction is reached. Otherwise an interrupt
> + * may fire and queue a timer that would be ignored until the CPU
> + * wakes from the sleeping instruction. And testing need_resched()
> + * doesn't tell about pending needed timer reprogram.
> + *
> + * Several cases to consider:
> + *
> + * - SLEEP-UNTIL-PENDING-INTERRUPT based instructions such as
> + * "wfi" or "mwait" are fine because they can be entered with
> + * interrupt disabled.
> + *
> + * - sti;mwait() couple is fine because the interrupts are
> + * re-enabled only upon the execution of mwait, leaving no gap
> + * in-between.
> + *
> + * - ROLLBACK based idle handlers with the sleeping instruction
> + * called with interrupts enabled are NOT fine. In this scheme
> + * when the interrupt detects it has interrupted an idle handler,
> + * it rolls back to its beginning which performs the
> + * need_resched() check before re-executing the sleeping
> + * instruction. This can leak a pending needed timer reprogram.
> + * If such a scheme is really mandatory due to the lack of an
> + * appropriate CPU sleeping instruction, then a FAST-FORWARD
> + * must instead be applied: when the interrupt detects it has
> + * interrupted an idle handler, it must resume to the end of
> + * this idle handler so that the generic idle loop is iterated
> + * again to reprogram the tick.
> + */
> local_irq_disable();
>
> if (cpu_is_offline(cpu)) {
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 06/10] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (4 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 05/10] cpuidle: Comment about timers requirements VS idle handler Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:39 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 07/10] cpuidle: Remove unnecessary current_clr_polling() on poll_idle() Frederic Weisbecker
` (4 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Marcelo Tosatti, Peter Zijlstra,
Rafael J . Wysocki, Daniel Lezcano, Thomas Gleixner,
Anna-Maria Behnsen
When a cpuidle driver's ->enter() callback is called, the TIF_NR_POLLING
flag has already been cleared and TIF_NEED_RESCHED checked by call_cpuidle().
Therefore calling current_clr_polling_and_test() here is redundant: any
further setting of TIF_NEED_RESCHED will result in an IPI and thus an
idle loop exit. This call can be safely removed.
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
drivers/cpuidle/cpuidle-haltpoll.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
index e66df22f9695..b641bc535102 100644
--- a/drivers/cpuidle/cpuidle-haltpoll.c
+++ b/drivers/cpuidle/cpuidle-haltpoll.c
@@ -28,11 +28,8 @@ static enum cpuhp_state haltpoll_hp_state;
static int default_enter_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- if (current_clr_polling_and_test()) {
- local_irq_enable();
- return index;
- }
arch_cpu_idle();
+
return index;
}
--
2.34.1
^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: [PATCH 06/10] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll
2023-08-11 17:00 ` [PATCH 06/10] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
@ 2023-08-11 17:39 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:39 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Marcelo Tosatti, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> When a cpuidle driver's ->enter() callback is called, the TIF_NR_POLLING
> flag has already been cleared and TIF_NEED_RESCHED checked by call_cpuidle().
>
> Therefore calling current_clr_polling_and_test() here is redundant: any
> further setting of TIF_NEED_RESCHED will result in an IPI and thus an
> idle loop exit. This call can be safely removed.
>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> drivers/cpuidle/cpuidle-haltpoll.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
> index e66df22f9695..b641bc535102 100644
> --- a/drivers/cpuidle/cpuidle-haltpoll.c
> +++ b/drivers/cpuidle/cpuidle-haltpoll.c
> @@ -28,11 +28,8 @@ static enum cpuhp_state haltpoll_hp_state;
> static int default_enter_idle(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> - if (current_clr_polling_and_test()) {
> - local_irq_enable();
> - return index;
> - }
> arch_cpu_idle();
> +
> return index;
> }
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 07/10] cpuidle: Remove unnecessary current_clr_polling() on poll_idle()
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (5 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 06/10] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:40 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 08/10] x86: Remove __current_clr_polling() from mwait_idle() Frederic Weisbecker
` (3 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
There is no point in clearing TIF_NR_POLLING and folding TIF_NEED_RESCHED
upon poll_idle() exit because cpuidle_idle_call() is going to set
TIF_NR_POLLING again anyway. Also, if TIF_NEED_RESCHED is set, it will be
folded and TIF_NR_POLLING cleared at the end of do_idle().
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
drivers/cpuidle/poll_state.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 009f46f121ae..44a656464d06 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -48,8 +48,6 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
}
raw_local_irq_disable();
- current_clr_polling();
-
return index;
}
--
2.34.1
^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: [PATCH 07/10] cpuidle: Remove unnecessary current_clr_polling() on poll_idle()
2023-08-11 17:00 ` [PATCH 07/10] cpuidle: Remove unnecessary current_clr_polling() on poll_idle() Frederic Weisbecker
@ 2023-08-11 17:40 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:40 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> There is no point in clearing TIF_NR_POLLING and folding TIF_NEED_RESCHED
> upon poll_idle() exit because cpuidle_idle_call() is going to set
> TIF_NR_POLLING again anyway. Also, if TIF_NEED_RESCHED is set, it will be
> folded and TIF_NR_POLLING cleared at the end of do_idle().
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> drivers/cpuidle/poll_state.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 009f46f121ae..44a656464d06 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -48,8 +48,6 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
> }
> raw_local_irq_disable();
>
> - current_clr_polling();
> -
> return index;
> }
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 08/10] x86: Remove __current_clr_polling() from mwait_idle()
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (6 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 07/10] cpuidle: Remove unnecessary current_clr_polling() on poll_idle() Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:40 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 09/10] x86: Remove the current_clr_polling() call upon mwait exit Frederic Weisbecker
` (2 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
mwait_idle() is only ever called through cpuidle, either from
default_idle_call() or from cpuidle_enter(). In either case
cpuidle_idle_call() sets TIF_NR_POLLING again afterwards, so there
is no point in this atomic operation upon idle exit.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/x86/kernel/process.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 72015dba72ab..f1eb769770b1 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -918,7 +918,6 @@ static __cpuidle void mwait_idle(void)
raw_local_irq_disable();
}
}
- __current_clr_polling();
}
void select_idle_routine(const struct cpuinfo_x86 *c)
--
2.34.1
^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: [PATCH 08/10] x86: Remove __current_clr_polling() from mwait_idle()
2023-08-11 17:00 ` [PATCH 08/10] x86: Remove __current_clr_polling() from mwait_idle() Frederic Weisbecker
@ 2023-08-11 17:40 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:40 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> mwait_idle() is only ever called through cpuidle, either from
> default_idle_call() or from cpuidle_enter(). In either case
> cpuidle_idle_call() sets TIF_NR_POLLING again afterwards, so there
> is no point in this atomic operation upon idle exit.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> arch/x86/kernel/process.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 72015dba72ab..f1eb769770b1 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -918,7 +918,6 @@ static __cpuidle void mwait_idle(void)
> raw_local_irq_disable();
> }
> }
> - __current_clr_polling();
> }
>
> void select_idle_routine(const struct cpuinfo_x86 *c)
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 09/10] x86: Remove the current_clr_polling() call upon mwait exit
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (7 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 08/10] x86: Remove __current_clr_polling() from mwait_idle() Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:41 ` Rafael J. Wysocki
2023-08-11 17:00 ` [PATCH 10/10] sched/timers: Explain why idle task schedules out on remote timer enqueue Frederic Weisbecker
2023-08-15 16:10 ` [PATCH 00/10] timers/cpuidle: Fixes and cleanups Peter Zijlstra
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Len Brown, Peter Zijlstra,
Rafael J . Wysocki, Daniel Lezcano, Thomas Gleixner,
Anna-Maria Behnsen
mwait_idle_with_hints() is mostly called from cpuidle, which already sets
TIF_NR_POLLING back and folds TIF_NEED_RESCHED if necessary.
The only non-cpuidle caller is power_saving_thread(), which acts as an idle
loop and is the only reason why mwait_idle_with_hints() carries a costly,
fully ordered atomic operation upon idle exit.
Move this overhead into power_saving_thread() instead, which already
duplicates quite some cpuidle code.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
arch/x86/include/asm/mwait.h | 1 -
drivers/acpi/acpi_pad.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 920426d691ce..c1be3775b94a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -134,7 +134,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
}
}
}
- current_clr_polling();
}
/*
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index 7a453c5ff303..5a44afafe185 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -174,6 +174,7 @@ static int power_saving_thread(void *data)
stop_critical_timings();
mwait_idle_with_hints(power_saving_mwait_eax, 1);
+ current_clr_polling();
start_critical_timings();
tick_broadcast_exit();
--
2.34.1
^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: [PATCH 09/10] x86: Remove the current_clr_polling() call upon mwait exit
2023-08-11 17:00 ` [PATCH 09/10] x86: Remove the current_clr_polling() call upon mwait exit Frederic Weisbecker
@ 2023-08-11 17:41 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:41 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Len Brown, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> mwait_idle_with_hints() is mostly called from cpuidle, which already sets
> TIF_NR_POLLING back and folds TIF_NEED_RESCHED if necessary.
>
> The only non-cpuidle caller is power_saving_thread(), which acts as an idle
> loop and is the only reason why mwait_idle_with_hints() carries a costly,
> fully ordered atomic operation upon idle exit.
>
> Move this overhead into power_saving_thread() instead, which already
> duplicates quite some cpuidle code.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> arch/x86/include/asm/mwait.h | 1 -
> drivers/acpi/acpi_pad.c | 1 +
> 2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 920426d691ce..c1be3775b94a 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -134,7 +134,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
> }
> }
> }
> - current_clr_polling();
> }
>
> /*
> diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
> index 7a453c5ff303..5a44afafe185 100644
> --- a/drivers/acpi/acpi_pad.c
> +++ b/drivers/acpi/acpi_pad.c
> @@ -174,6 +174,7 @@ static int power_saving_thread(void *data)
> stop_critical_timings();
>
> mwait_idle_with_hints(power_saving_mwait_eax, 1);
> + current_clr_polling();
>
> start_critical_timings();
> tick_broadcast_exit();
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 10/10] sched/timers: Explain why idle task schedules out on remote timer enqueue
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (8 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 09/10] x86: Remove the current_clr_polling() call upon mwait exit Frederic Weisbecker
@ 2023-08-11 17:00 ` Frederic Weisbecker
2023-08-11 17:43 ` Rafael J. Wysocki
2023-08-15 16:10 ` [PATCH 00/10] timers/cpuidle: Fixes and cleanups Peter Zijlstra
10 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-11 17:00 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Anna-Maria Behnsen
Testing showed that trying to avoid this doesn't bring much value, so add
a comment explaining why.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/sched/core.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..e53b892167ad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1135,6 +1135,28 @@ static void wake_up_idle_cpu(int cpu)
if (cpu == smp_processor_id())
return;
+ /*
+ * Set TIF_NEED_RESCHED and send an IPI if in the non-polling
+ * part of the idle loop. This forces an exit from the idle loop
+ * and a round trip to schedule(). Now this could be optimized,
+ * because a simple new idle loop iteration is enough to
+ * re-evaluate the next tick, provided some re-ordering of the
+ * tick nohz functions so that they follow TIF_NR_POLLING
+ * clearing:
+ *
+ * - On most archs, a simple fetch_or on ti::flags with a
+ * "0" value would be enough to know if an IPI needs to be sent.
+ *
+ * - x86 needs to perform a last need_resched() check between
+ * monitor and mwait which doesn't take timers into account.
+ * There a dedicated TIF_TIMER flag would be required to
+ * fetch_or here and be checked along with TIF_NEED_RESCHED
+ * before mwait().
+ *
+ * However, remote timer enqueue is not such a frequent event
+ * and testing of the above solutions didn't appear to bring
+ * much benefit.
+ */
if (set_nr_and_not_polling(rq->idle))
smp_send_reschedule(cpu);
else
--
2.34.1
^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: [PATCH 10/10] sched/timers: Explain why idle task schedules out on remote timer enqueue
2023-08-11 17:00 ` [PATCH 10/10] sched/timers: Explain why idle task schedules out on remote timer enqueue Frederic Weisbecker
@ 2023-08-11 17:43 ` Rafael J. Wysocki
0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11 17:43 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
Thomas Gleixner, Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Testing showed that trying to avoid this doesn't bring much value, so add
> a comment explaining why.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> ---
> kernel/sched/core.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c52c2eba7c73..e53b892167ad 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1135,6 +1135,28 @@ static void wake_up_idle_cpu(int cpu)
> if (cpu == smp_processor_id())
> return;
>
> + /*
> + * Set TIF_NEED_RESCHED and send an IPI if in the non-polling
> + * part of the idle loop. This forces an exit from the idle loop
> + * and a round trip to schedule(). Now this could be optimized,
> + * because a simple new idle loop iteration is enough to
> + * re-evaluate the next tick, provided some re-ordering of the
> + * tick nohz functions so that they follow TIF_NR_POLLING
> + * clearing:
> + *
> + * - On most archs, a simple fetch_or on ti::flags with a
> + * "0" value would be enough to know if an IPI needs to be sent.
> + *
> + * - x86 needs to perform a last need_resched() check between
> + * monitor and mwait which doesn't take timers into account.
> + * There a dedicated TIF_TIMER flag would be required to
> + * fetch_or here and be checked along with TIF_NEED_RESCHED
> + * before mwait().
> + *
> + * However, remote timer enqueue is not such a frequent event
> + * and testing of the above solutions didn't appear to bring
> + * much benefit.
> + */
> if (set_nr_and_not_polling(rq->idle))
> smp_send_reschedule(cpu);
> else
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 00/10] timers/cpuidle: Fixes and cleanups
2023-08-11 17:00 [PATCH 00/10] timers/cpuidle: Fixes and cleanups Frederic Weisbecker
` (9 preceding siblings ...)
2023-08-11 17:00 ` [PATCH 10/10] sched/timers: Explain why idle task schedules out on remote timer enqueue Frederic Weisbecker
@ 2023-08-15 16:10 ` Peter Zijlstra
2023-08-29 11:28 ` Frederic Weisbecker
10 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2023-08-15 16:10 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Rafael J . Wysocki, Daniel Lezcano, Thomas Gleixner,
Anna-Maria Behnsen
On Fri, Aug 11, 2023 at 07:00:39PM +0200, Frederic Weisbecker wrote:
> Hi,
>
> The first part of the series is cpuidle callback fixes against timers,
> some of which haven't been Signed by Peter yet.
>
> Another part is removing the overhead of useless TIF_NR_POLLING clearing.
So I've again forgotten why we don't simply set TIF_NEED_RESCHED if we
need the timer re-programmed. That is by far the simplest fix.
I'm sure we talked about it, but this was a long time ago and I can't
remember :/
Anyway, the patches look good, except I think there's a whole bunch of
architectures that still need fixing. In particular since loongson
'borrowed' the whole lot from MIPS, they need an identical fix. But I'm
sure there's more architectures affected.
^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 00/10] timers/cpuidle: Fixes and cleanups
2023-08-15 16:10 ` [PATCH 00/10] timers/cpuidle: Fixes and cleanups Peter Zijlstra
@ 2023-08-29 11:28 ` Frederic Weisbecker
0 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2023-08-29 11:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Rafael J . Wysocki, Daniel Lezcano, Thomas Gleixner,
Anna-Maria Behnsen
On Tue, Aug 15, 2023 at 06:10:43PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 11, 2023 at 07:00:39PM +0200, Frederic Weisbecker wrote:
> > Hi,
> >
> > The first part of the series is cpuidle callback fixes against timers,
> > some of which haven't been Signed by Peter yet.
> >
> > Another part is removing the overhead of useless TIF_NR_POLLING clearing.
>
> So I've again forgotten why we don't simply set TIF_NEED_RESCHED if we
> need the timer re-programmed. That is by far the simplest fix.
>
> I'm sure we talked about it, but this was a long time ago and I can't
> remember :/
I don't think we did, but the rationale is that by forcing TIF_NEED_RESCHED
you force a needless timer restart (which is then going to be cancelled
shortly after) and a round trip to the scheduler with the rq lock overhead,
etc...
Just for the fun I just tried the following change:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..ec43d135cf65 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1132,8 +1132,10 @@ static void wake_up_idle_cpu(int cpu)
{
struct rq *rq = cpu_rq(cpu);
- if (cpu == smp_processor_id())
+ if (cpu == smp_processor_id()) {
+ set_tsk_need_resched(current);
return;
+ }
if (set_nr_and_not_polling(rq->idle))
smp_send_reschedule(cpu);
Then I computed the average of 100 runs of "make clean -C tools/perf; make -C
tools/perf/" before and after this patch.
On average I observed a regression of 1.27% less time spent in C-states.
So this has a measurable impact.
>
> Anyway, the patches look good, except I think there's a whole bunch of
> architectures that still need fixing. In particular since loongson
> 'borrowed' the whole lot from MIPS, they need an identical fix. But I'm
> sure there's more architectures affected.
MIPS at least, yes. I only did a quick check and it seems that most archs
use a "wfi"-like instruction. I'll check for others.
Thanks.
^ permalink raw reply related [flat|nested] 24+ messages in thread