* [PATCH 0/4] x86/cpuidle fixes and optimization
@ 2023-11-15 15:13 Frederic Weisbecker
2023-11-15 15:13 ` [PATCH 1/4] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2023-11-15 15:13 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, Peter Zijlstra, x86
This is the x86 part of a cpuidle series I posted some time ago. It
contains a fix from Peter (still tagged Not-yet-signed-off-by, btw); the
rest are comments and optimizations.
Frederic Weisbecker (3):
x86: Add a comment about the "magic" behind shadow sti before mwait
x86: Remove __current_clr_polling() from mwait_idle()
x86: Remove the current_clr_polling() call upon mwait exit
Peter Zijlstra (1):
x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
arch/x86/include/asm/mwait.h | 21 ++++++++++++++++++---
arch/x86/kernel/process.c | 1 -
drivers/acpi/acpi_pad.c | 1 +
drivers/idle/intel_idle.c | 19 +++++++------------
4 files changed, 26 insertions(+), 16 deletions(-)
--
2.42.1
^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] x86: Add a comment about the "magic" behind shadow sti before mwait
  2023-11-15 15:13 [PATCH 0/4] x86/cpuidle fixes and optimization Frederic Weisbecker
@ 2023-11-15 15:13 ` Frederic Weisbecker
  2023-11-29 14:55   ` [tip: x86/core] " tip-bot2 for Frederic Weisbecker
  2023-11-15 15:13 ` [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2023-11-15 15:13 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Peter Zijlstra, x86,
	Rafael J. Wysocki

Add a note to make sure we never miss or break the requirements behind
it.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/include/asm/mwait.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..341ee4f1d91e 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -87,6 +87,15 @@ static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
 		     :: "a" (eax), "b" (ebx), "c" (ecx));
 }
 
+/*
+ * Re-enable interrupts right upon calling mwait in such a way that
+ * no interrupt can fire _before_ the execution of mwait, ie: no
+ * instruction must be placed between "sti" and "mwait".
+ *
+ * This is necessary because if an interrupt queues a timer before
+ * executing mwait, it would otherwise go unnoticed and the next tick
+ * would not be reprogrammed accordingly before mwait ever wakes up.
+ */
 static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	mds_idle_clear_cpu_buffers();
-- 
2.42.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [tip: x86/core] x86: Add a comment about the "magic" behind shadow sti before mwait
  2023-11-15 15:13 ` [PATCH 1/4] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
@ 2023-11-29 14:55   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-11-29 14:55 UTC (permalink / raw)
To: linux-tip-commits
Cc: Frederic Weisbecker, Peter Zijlstra (Intel), Rafael J. Wysocki,
	x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     7d09a052a3bdb62de9a86d43359d6c22eeaf105a
Gitweb:        https://git.kernel.org/tip/7d09a052a3bdb62de9a86d43359d6c22eeaf105a
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 15 Nov 2023 10:13:22 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Nov 2023 15:44:01 +01:00

x86: Add a comment about the "magic" behind shadow sti before mwait

Add a note to make sure we never miss or break the requirements behind
it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lkml.kernel.org/r/20231115151325.6262-2-frederic@kernel.org
---
 arch/x86/include/asm/mwait.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05..341ee4f 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -87,6 +87,15 @@ static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
 		     :: "a" (eax), "b" (ebx), "c" (ecx));
 }
 
+/*
+ * Re-enable interrupts right upon calling mwait in such a way that
+ * no interrupt can fire _before_ the execution of mwait, ie: no
+ * instruction must be placed between "sti" and "mwait".
+ *
+ * This is necessary because if an interrupt queues a timer before
+ * executing mwait, it would otherwise go unnoticed and the next tick
+ * would not be reprogrammed accordingly before mwait ever wakes up.
+ */
 static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	mds_idle_clear_cpu_buffers();

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  2023-11-15 15:13 [PATCH 0/4] x86/cpuidle fixes and optimization Frederic Weisbecker
  2023-11-15 15:13 ` [PATCH 1/4] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
@ 2023-11-15 15:13 ` Frederic Weisbecker
  2023-11-15 15:52   ` Peter Zijlstra
  2023-11-29 14:55   ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
  2023-11-15 15:13 ` [PATCH 3/4] x86: Remove __current_clr_polling() from mwait_idle() Frederic Weisbecker
  2023-11-15 15:13 ` [PATCH 4/4] x86: Remove the current_clr_polling() call upon mwait exit Frederic Weisbecker
  3 siblings, 2 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2023-11-15 15:13 UTC (permalink / raw)
To: LKML
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Rafael J. Wysocki, Frederic Weisbecker

From: Peter Zijlstra <peterz@infradead.org>

intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
may fire before mwait() is eventually called. If such an interrupt queues
a timer, it may go unnoticed until mwait returns and the idle loop
handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
help because a local timer enqueue doesn't set that flag.

The issue is mitigated by the fact that this idle handler is only invoked
for shallow C-states when, presumably, the next tick is supposed to be
close enough. There may still be rare cases though when the next tick
is far away and the selected C-state is shallow, resulting in a timer
getting ignored for a while.

Fix this by using sti_mwait(), whose IRQ re-enablement only takes effect
upon calling mwait(), dealing with the race while keeping the interrupt
latency within acceptable bounds.

Fixes: c227233ad64c (intel_idle: enable interrupts before C1 on Xeons)
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/include/asm/mwait.h | 11 +++++++++--
 drivers/idle/intel_idle.c    | 19 +++++++------------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 341ee4f1d91e..920426d691ce 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -124,8 +124,15 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
-			__mwait(eax, ecx);
+
+		if (!need_resched()) {
+			if (ecx & 1) {
+				__mwait(eax, ecx);
+			} else {
+				__sti_mwait(eax, ecx);
+				raw_local_irq_disable();
+			}
+		}
 	}
 	current_clr_polling();
 }
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index dcda0afecfc5..3e01a6b23e75 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -131,11 +131,12 @@ static unsigned int mwait_substates __initdata;
 #define MWAIT2flg(eax) ((eax & 0xFF) << 24)
 
 static __always_inline int __intel_idle(struct cpuidle_device *dev,
-					struct cpuidle_driver *drv, int index)
+					struct cpuidle_driver *drv,
+					int index, bool irqoff)
 {
 	struct cpuidle_state *state = &drv->states[index];
 	unsigned long eax = flg2MWAIT(state->flags);
-	unsigned long ecx = 1; /* break on interrupt flag */
+	unsigned long ecx = 1*irqoff; /* break on interrupt flag */
 
 	mwait_idle_with_hints(eax, ecx);
 
@@ -159,19 +160,13 @@ static __always_inline int __intel_idle(struct cpuidle_device *dev,
 static __cpuidle int intel_idle(struct cpuidle_device *dev,
 				struct cpuidle_driver *drv, int index)
 {
-	return __intel_idle(dev, drv, index);
+	return __intel_idle(dev, drv, index, true);
 }
 
 static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
 				    struct cpuidle_driver *drv, int index)
 {
-	int ret;
-
-	raw_local_irq_enable();
-	ret = __intel_idle(dev, drv, index);
-	raw_local_irq_disable();
-
-	return ret;
+	return __intel_idle(dev, drv, index, false);
 }
 
 static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
@@ -184,7 +179,7 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
 	if (smt_active)
 		__update_spec_ctrl(0);
 
-	ret = __intel_idle(dev, drv, index);
+	ret = __intel_idle(dev, drv, index, true);
 
 	if (smt_active)
 		__update_spec_ctrl(spec_ctrl);
@@ -196,7 +191,7 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
 				       struct cpuidle_driver *drv, int index)
 {
 	fpu_idle_fpregs();
-	return __intel_idle(dev, drv, index);
+	return __intel_idle(dev, drv, index, true);
 }
 
 /**
-- 
2.42.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  2023-11-15 15:13 ` [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
@ 2023-11-15 15:52   ` Peter Zijlstra
  2023-11-15 15:57     ` Frederic Weisbecker
  2023-11-29 14:55   ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2023-11-15 15:52 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, Rafael J. Wysocki

On Wed, Nov 15, 2023 at 10:13:23AM -0500, Frederic Weisbecker wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
> may fire before mwait() is eventually called. If such an interrupt queues
> a timer, it may go unnoticed until mwait returns and the idle loop
> handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
> help because a local timer enqueue doesn't set that flag.
> 
> The issue is mitigated by the fact that this idle handler is only invoked
> for shallow C-states when, presumably, the next tick is supposed to be
> close enough. There may still be rare cases though when the next tick
> is far away and the selected C-state is shallow, resulting in a timer
> getting ignored for a while.
> 
> Fix this by using sti_mwait(), whose IRQ re-enablement only takes effect
> upon calling mwait(), dealing with the race while keeping the interrupt
> latency within acceptable bounds.
> 
> Fixes: c227233ad64c (intel_idle: enable interrupts before C1 on Xeons)
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>

Feel free to change to normal SOB, I'm assuming it actually compiles and
works by now :-)

> Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  arch/x86/include/asm/mwait.h | 11 +++++++++--
>  drivers/idle/intel_idle.c    | 19 +++++++------------
>  2 files changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 341ee4f1d91e..920426d691ce 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -124,8 +124,15 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
>  		}
>  
>  		__monitor((void *)&current_thread_info()->flags, 0, 0);
> -		if (!need_resched())
> -			__mwait(eax, ecx);
> +
> +		if (!need_resched()) {
> +			if (ecx & 1) {
> +				__mwait(eax, ecx);
> +			} else {
> +				__sti_mwait(eax, ecx);
> +				raw_local_irq_disable();
> +			}
> +		}
>  	}
>  	current_clr_polling();
>  }
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index dcda0afecfc5..3e01a6b23e75 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -131,11 +131,12 @@ static unsigned int mwait_substates __initdata;
>  #define MWAIT2flg(eax) ((eax & 0xFF) << 24)
>  
>  static __always_inline int __intel_idle(struct cpuidle_device *dev,
> -					struct cpuidle_driver *drv, int index)
> +					struct cpuidle_driver *drv,
> +					int index, bool irqoff)
>  {
>  	struct cpuidle_state *state = &drv->states[index];
>  	unsigned long eax = flg2MWAIT(state->flags);
> -	unsigned long ecx = 1; /* break on interrupt flag */
> +	unsigned long ecx = 1*irqoff; /* break on interrupt flag */
>  
>  	mwait_idle_with_hints(eax, ecx);
>  
> @@ -159,19 +160,13 @@ static __always_inline int __intel_idle(struct cpuidle_device *dev,
>  static __cpuidle int intel_idle(struct cpuidle_device *dev,
>  				struct cpuidle_driver *drv, int index)
>  {
> -	return __intel_idle(dev, drv, index);
> +	return __intel_idle(dev, drv, index, true);
>  }
>  
>  static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
>  				    struct cpuidle_driver *drv, int index)
>  {
> -	int ret;
> -
> -	raw_local_irq_enable();
> -	ret = __intel_idle(dev, drv, index);
> -	raw_local_irq_disable();
> -
> -	return ret;
> +	return __intel_idle(dev, drv, index, false);
>  }
>  
>  static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
> @@ -184,7 +179,7 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
>  	if (smt_active)
>  		__update_spec_ctrl(0);
>  
> -	ret = __intel_idle(dev, drv, index);
> +	ret = __intel_idle(dev, drv, index, true);
>  
>  	if (smt_active)
>  		__update_spec_ctrl(spec_ctrl);
> @@ -196,7 +191,7 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
>  				       struct cpuidle_driver *drv, int index)
>  {
>  	fpu_idle_fpregs();
> -	return __intel_idle(dev, drv, index);
> +	return __intel_idle(dev, drv, index, true);
>  }
>  
>  /**
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  2023-11-15 15:52   ` Peter Zijlstra
@ 2023-11-15 15:57     ` Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2023-11-15 15:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, Rafael J. Wysocki

On Wed, Nov 15, 2023 at 04:52:32PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 15, 2023 at 10:13:23AM -0500, Frederic Weisbecker wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
> > may fire before mwait() is eventually called. If such an interrupt queues
> > a timer, it may go unnoticed until mwait returns and the idle loop
> > handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
> > help because a local timer enqueue doesn't set that flag.
> > 
> > The issue is mitigated by the fact that this idle handler is only invoked
> > for shallow C-states when, presumably, the next tick is supposed to be
> > close enough. There may still be rare cases though when the next tick
> > is far away and the selected C-state is shallow, resulting in a timer
> > getting ignored for a while.
> > 
> > Fix this by using sti_mwait(), whose IRQ re-enablement only takes effect
> > upon calling mwait(), dealing with the race while keeping the interrupt
> > latency within acceptable bounds.
> > 
> > Fixes: c227233ad64c (intel_idle: enable interrupts before C1 on Xeons)
> > Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> 
> Feel free to change to normal SOB, I'm assuming it actually compiles and
> works by now :-)

Not sure, I might have tested it at some point ;-)

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [tip: x86/core] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  2023-11-15 15:13 ` [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
  2023-11-15 15:52   ` Peter Zijlstra
@ 2023-11-29 14:55   ` tip-bot2 for Peter Zijlstra
  2023-11-30 11:15     ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-11-29 14:55 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Frederic Weisbecker, Rafael J. Wysocki,
	x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     edc8fc01f608108b0b7580cb2c29dfb5135e5f0e
Gitweb:        https://git.kernel.org/tip/edc8fc01f608108b0b7580cb2c29dfb5135e5f0e
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 15 Nov 2023 10:13:23 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 29 Nov 2023 15:44:01 +01:00

x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram

intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
may fire before mwait() is eventually called. If such an interrupt queues
a timer, it may go unnoticed until mwait returns and the idle loop
handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
help because a local timer enqueue doesn't set that flag.

The issue is mitigated by the fact that this idle handler is only invoked
for shallow C-states when, presumably, the next tick is supposed to be
close enough. There may still be rare cases though when the next tick
is far away and the selected C-state is shallow, resulting in a timer
getting ignored for a while.

Fix this by using sti_mwait(), whose IRQ re-enablement only takes effect
upon calling mwait(), dealing with the race while keeping the interrupt
latency within acceptable bounds.

Fixes: c227233ad64c (intel_idle: enable interrupts before C1 on Xeons)
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lkml.kernel.org/r/20231115151325.6262-3-frederic@kernel.org
---
 arch/x86/include/asm/mwait.h | 11 +++++++++--
 drivers/idle/intel_idle.c    | 19 +++++++------------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 341ee4f..920426d 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -124,8 +124,15 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
-			__mwait(eax, ecx);
+
+		if (!need_resched()) {
+			if (ecx & 1) {
+				__mwait(eax, ecx);
+			} else {
+				__sti_mwait(eax, ecx);
+				raw_local_irq_disable();
+			}
+		}
 	}
 	current_clr_polling();
 }
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index dcda0af..3e01a6b 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -131,11 +131,12 @@ static unsigned int mwait_substates __initdata;
 #define MWAIT2flg(eax) ((eax & 0xFF) << 24)
 
 static __always_inline int __intel_idle(struct cpuidle_device *dev,
-					struct cpuidle_driver *drv, int index)
+					struct cpuidle_driver *drv,
+					int index, bool irqoff)
 {
 	struct cpuidle_state *state = &drv->states[index];
 	unsigned long eax = flg2MWAIT(state->flags);
-	unsigned long ecx = 1; /* break on interrupt flag */
+	unsigned long ecx = 1*irqoff; /* break on interrupt flag */
 
 	mwait_idle_with_hints(eax, ecx);
 
@@ -159,19 +160,13 @@ static __always_inline int __intel_idle(struct cpuidle_device *dev,
 static __cpuidle int intel_idle(struct cpuidle_device *dev,
 				struct cpuidle_driver *drv, int index)
 {
-	return __intel_idle(dev, drv, index);
+	return __intel_idle(dev, drv, index, true);
 }
 
 static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
 				    struct cpuidle_driver *drv, int index)
 {
-	int ret;
-
-	raw_local_irq_enable();
-	ret = __intel_idle(dev, drv, index);
-	raw_local_irq_disable();
-
-	return ret;
+	return __intel_idle(dev, drv, index, false);
 }
 
 static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
@@ -184,7 +179,7 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
 	if (smt_active)
 		__update_spec_ctrl(0);
 
-	ret = __intel_idle(dev, drv, index);
+	ret = __intel_idle(dev, drv, index, true);
 
 	if (smt_active)
 		__update_spec_ctrl(spec_ctrl);
@@ -196,7 +191,7 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
 				       struct cpuidle_driver *drv, int index)
 {
 	fpu_idle_fpregs();
-	return __intel_idle(dev, drv, index);
+	return __intel_idle(dev, drv, index, true);
 }
 
 /**

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [tip: x86/core] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  2023-11-29 14:55   ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
@ 2023-11-30 11:15     ` Peter Zijlstra
  2023-12-12 13:46       ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2023-11-30 11:15 UTC (permalink / raw)
To: linux-kernel, Andrew Cooper
Cc: linux-tip-commits, Frederic Weisbecker, Rafael J. Wysocki, x86

On Wed, Nov 29, 2023 at 02:55:55PM -0000, tip-bot2 for Peter Zijlstra wrote:
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 341ee4f..920426d 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -124,8 +124,15 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
>  		}
>  
>  		__monitor((void *)&current_thread_info()->flags, 0, 0);
> -		if (!need_resched())
> -			__mwait(eax, ecx);
> +
> +		if (!need_resched()) {
> +			if (ecx & 1) {
> +				__mwait(eax, ecx);
> +			} else {
> +				__sti_mwait(eax, ecx);
> +				raw_local_irq_disable();
> +			}
> +		}

Andrew noted that this is only safe if it precludes #DB from happening
on mwait, because #DB can wreck the STI shadow thing.

> @@ -159,19 +160,13 @@ static __always_inline int __intel_idle(struct cpuidle_device *dev,
>  static __cpuidle int intel_idle(struct cpuidle_device *dev,
>  				struct cpuidle_driver *drv, int index)
>  {
> +	return __intel_idle(dev, drv, index, true);
>  }
>  
>  static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
>  				    struct cpuidle_driver *drv, int index)
>  {
> +	return __intel_idle(dev, drv, index, false);
>  }
>  
>  static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
> @@ -184,7 +179,7 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
>  	if (smt_active)
>  		__update_spec_ctrl(0);
>  
> +	ret = __intel_idle(dev, drv, index, true);
>  
>  	if (smt_active)
>  		__update_spec_ctrl(spec_ctrl);
> @@ -196,7 +191,7 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
>  				       struct cpuidle_driver *drv, int index)
>  {
>  	fpu_idle_fpregs();
> +	return __intel_idle(dev, drv, index, true);
>  }

This is so, because all mwait users should be in the __cpuidle section,
which itself is part of the noinstr section and as such
kprobes/hw-breakpoints etc. are disallowed.

Notably, vmlinux.lds.h has:

#define NOINSTR_TEXT							\
		ALIGN_FUNCTION();					\
		__noinstr_text_start = .;				\
		*(.noinstr.text)					\
		__cpuidle_text_start = .;				\
		*(.cpuidle.text)					\
		__cpuidle_text_end = .;					\
		__noinstr_text_end = .;

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [tip: x86/core] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  2023-11-30 11:15     ` Peter Zijlstra
@ 2023-12-12 13:46       ` Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2023-12-12 13:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Andrew Cooper, linux-tip-commits, Rafael J. Wysocki,
	x86

On Thu, Nov 30, 2023 at 12:15:19PM +0100, Peter Zijlstra wrote:
> This is so, because all mwait users should be in the __cpuidle section,
> which itself is part of the noinstr section and as such
> kprobes/hw-breakpoints etc. are disallowed.
> 
> Notably, vmlinux.lds.h has:
> 
> #define NOINSTR_TEXT							\
> 		ALIGN_FUNCTION();					\
> 		__noinstr_text_start = .;				\
> 		*(.noinstr.text)					\
> 		__cpuidle_text_start = .;				\
> 		*(.cpuidle.text)					\
> 		__cpuidle_text_end = .;					\
> 		__noinstr_text_end = .;

So #DB isn't supposed to happen then, right? Or have you noticed an
mwait user that doesn't have __cpuidle?

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH 3/4] x86: Remove __current_clr_polling() from mwait_idle()
  2023-11-15 15:13 [PATCH 0/4] x86/cpuidle fixes and optimization Frederic Weisbecker
  2023-11-15 15:13 ` [PATCH 1/4] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
  2023-11-15 15:13 ` [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
@ 2023-11-15 15:13 ` Frederic Weisbecker
  2023-11-16 15:13   ` Peter Zijlstra
  2023-11-15 15:13 ` [PATCH 4/4] x86: Remove the current_clr_polling() call upon mwait exit Frederic Weisbecker
  3 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2023-11-15 15:13 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Peter Zijlstra, x86,
	Rafael J. Wysocki

mwait_idle() is only ever called by cpuidle, either from
default_idle_call() or from cpuidle_enter(). In either case
cpuidle_idle_call() sets TIF_POLLING_NRFLAG again after calling it, so
there is no point in this atomic operation upon idle exit.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/kernel/process.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b6f4e8399fca..fc7a38084606 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -930,7 +930,6 @@ static __cpuidle void mwait_idle(void)
 			raw_local_irq_disable();
 		}
 	}
-	__current_clr_polling();
 }
 
 void select_idle_routine(const struct cpuinfo_x86 *c)
-- 
2.42.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH 3/4] x86: Remove __current_clr_polling() from mwait_idle()
  2023-11-15 15:13 ` [PATCH 3/4] x86: Remove __current_clr_polling() from mwait_idle() Frederic Weisbecker
@ 2023-11-16 15:13   ` Peter Zijlstra
  2023-11-16 18:48     ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2023-11-16 15:13 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, Rafael J. Wysocki

On Wed, Nov 15, 2023 at 10:13:24AM -0500, Frederic Weisbecker wrote:
> mwait_idle() is only ever called by cpuidle, either from
> default_idle_call() or from cpuidle_enter(). In either case
> cpuidle_idle_call() sets TIF_POLLING_NRFLAG again after calling it, so
> there is no point in this atomic operation upon idle exit.
> 
> Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  arch/x86/kernel/process.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index b6f4e8399fca..fc7a38084606 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -930,7 +930,6 @@ static __cpuidle void mwait_idle(void)
>  			raw_local_irq_disable();
>  		}
>  	}
> -	__current_clr_polling();
>  }
>  
>  void select_idle_routine(const struct cpuinfo_x86 *c)

Urgh at this and the next one... That is, yes we can do this, but it
makes these functions asymmetric and doesn't actually solve the
underlying problem that all of the polling stuff is inside-out.

Idle loop sets polling, then clears polling because it assumes all
arch/driver idle loops are non-polling, then individual drivers re-set
polling, and to be symmetric (above) clear it again, for the generic
code to set it again, only to clear it again when leaving idle.

Follow that? ;-)

Anyway, drivers ought to tell up-front if they're polling and then we
can avoid the whole dance and everything is better. Something like the
very crude below.

---
 arch/x86/include/asm/mwait.h  |  3 +--
 arch/x86/kernel/process.c     |  3 +--
 drivers/acpi/processor_idle.c |  2 +-
 drivers/idle/intel_idle.c     |  3 +++
 include/linux/cpuidle.h       |  6 ++++++
 include/linux/sched/idle.h    |  7 ++++++-
 kernel/sched/idle.c           | 22 +++++++++++++++++-----
 7 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..c0c60fde7a4d 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -107,7 +107,7 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
  */
 static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 {
-	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
+	if (static_cpu_has_bug(X86_BUG_MONITOR) || !need_resched()) {
 		if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
 			mb();
 			clflush((void *)&current_thread_info()->flags);
@@ -118,7 +118,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
 		if (!need_resched())
 			__mwait(eax, ecx);
 	}
-	current_clr_polling();
 }
 
 /*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b6f4e8399fca..73baa82584c2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -917,7 +917,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
  */
 static __cpuidle void mwait_idle(void)
 {
-	if (!current_set_polling_and_test()) {
+	if (!need_resched()) {
 		if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
 			mb(); /* quirk */
 			clflush((void *)&current_thread_info()->flags);
@@ -930,7 +930,6 @@ static __cpuidle void mwait_idle(void)
 			raw_local_irq_disable();
 		}
 	}
-	__current_clr_polling();
 }
 
 void select_idle_routine(const struct cpuinfo_x86 *c)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 3a34a8c425fe..58765557b1b8 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1219,7 +1219,7 @@ static int acpi_processor_setup_lpi_states(struct acpi_processor *pr)
 		state->target_residency = lpi->min_residency;
 		state->flags |= arch_get_idle_state_flags(lpi->arch_flags);
 		if (i != 0 && lpi->entry_method == ACPI_CSTATE_FFH)
-			state->flags |= CPUIDLE_FLAG_RCU_IDLE;
+			state->flags |= CPUIDLE_FLAG_RCU_IDLE | CPUIDLE_FLAG_NO_IPI;
 		state->enter = acpi_idle_lpi_enter;
 		drv->safe_state_index = i;
 	}
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index dcda0afecfc5..496b6a309f4f 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1569,6 +1569,7 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
 			state->target_residency *= 3;
 
 		state->flags = MWAIT2flg(cx->address);
+		state->flags |= CPUIDLE_FLAG_NO_IPI;
 		if (cx->type > ACPI_STATE_C2)
 			state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
 
@@ -1841,6 +1842,8 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 
 static void state_update_enter_method(struct cpuidle_state *state, int cstate)
 {
+	state->flags |= CPUIDLE_FLAG_NO_IPI;
+
 	if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
 		/*
 		 * Combining with XSTATE with IBRS or IRQ_ENABLE flags
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3183aeb7f5b4..623d88bd7658 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -85,6 +85,7 @@ struct cpuidle_state {
 #define CPUIDLE_FLAG_OFF		BIT(4) /* disable this state by default */
 #define CPUIDLE_FLAG_TLB_FLUSHED	BIT(5) /* idle-state flushes TLBs */
 #define CPUIDLE_FLAG_RCU_IDLE		BIT(6) /* idle-state takes care of RCU */
+#define CPUIDLE_FLAG_NO_IPI		BIT(7) /* has TIF_POLLING_NRFLAG */
 
 struct cpuidle_device_kobj;
 struct cpuidle_state_kobj;
@@ -168,6 +169,11 @@ struct cpuidle_driver {
 };
 
 #ifdef CONFIG_CPU_IDLE
+static inline bool cpuidle_state_needs_ipi(struct cpuidle_driver *drv, int state)
+{
+	return !(drv->states[state].flags & CPUIDLE_FLAG_NO_IPI);
+}
+
 extern void disable_cpuidle(void);
 extern bool cpuidle_not_available(struct cpuidle_driver *drv,
 				  struct cpuidle_device *dev);
diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 478084f9105e..24b29bb5d43b 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -68,6 +68,8 @@ static __always_inline bool __must_check current_set_polling_and_test(void)
 
 static __always_inline bool __must_check current_clr_polling_and_test(void)
 {
+	bool ret;
+
 	__current_clr_polling();
 
 	/*
@@ -76,7 +78,10 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
 	 */
 	smp_mb__after_atomic();
 
-	return unlikely(tif_need_resched());
+	ret = unlikely(tif_need_resched());
+	if (ret)
+		__current_set_polling();
+	return ret;
 }
 
 #else
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 565f8374ddbb..65e2474cb82b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -94,11 +94,12 @@ void __cpuidle default_idle_call(void)
 			stop_critical_timings();
 
 			ct_cpuidle_enter();
-			arch_cpu_idle();
+			arch_cpu_idle(); // XXX assumes !polling
 			ct_cpuidle_exit();
 
 			start_critical_timings();
 			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+			__current_set_polling();
 		}
 		local_irq_enable();
 		instrumentation_end();
@@ -107,20 +108,27 @@ void __cpuidle default_idle_call(void)
 static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
 			       struct cpuidle_device *dev)
 {
+	int ret;
+
 	if (current_clr_polling_and_test())
 		return -EBUSY;
 
-	return cpuidle_enter_s2idle(drv, dev);
+	ret = cpuidle_enter_s2idle(drv, dev);
+	__current_set_polling();
+	return ret;
 }
 
 static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			int next_state)
 {
+	bool need_ipi = cpuidle_state_needs_ipi(drv, next_state);
+	int ret;
+
 	/*
	 * The idle task must be scheduled, it is pointless to go to idle, just
	 * update no idle residency and return.
	 */
-	if (current_clr_polling_and_test()) {
+	if (need_ipi && current_clr_polling_and_test()) {
 		dev->last_residency_ns = 0;
 		local_irq_enable();
 		return -EBUSY;
@@ -131,7 +139,12 @@ static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
	 * This function will block until an interrupt occurs and will take
	 * care of re-enabling the local interrupts
	 */
-	return cpuidle_enter(drv, dev, next_state);
+	ret = cpuidle_enter(drv, dev, next_state);
+
+	if (need_ipi)
+		__current_set_polling();
+
+	return ret;
 }
 
 /**
@@ -220,7 +233,6 @@ static void cpuidle_idle_call(void)
 	}
 
 exit_idle:
-	__current_set_polling();
 
 	/*
	 * It is up to the idle functions to reenable local interrupts
	 */

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH 3/4] x86: Remove __current_clr_polling() from mwait_idle()
  2023-11-16 15:13     ` Peter Zijlstra
@ 2023-11-16 18:48       ` Frederic Weisbecker
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2023-11-16 18:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, Rafael J . Wysocki

On Thu, Nov 16, 2023 at 04:13:16PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 15, 2023 at 10:13:24AM -0500, Frederic Weisbecker wrote:
> > mwait_idle() is only ever called through cpuidle, either from
> > default_idle_call() or from cpuidle_enter(). In any case
> > cpuidle_idle_call() sets TIF_POLLING_NRFLAG again after calling it, so
> > there is no point in this atomic operation upon idle exit.
> >
> > Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  arch/x86/kernel/process.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index b6f4e8399fca..fc7a38084606 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -930,7 +930,6 @@ static __cpuidle void mwait_idle(void)
> >  			raw_local_irq_disable();
> >  		}
> >  	}
> > -	__current_clr_polling();
> >  }
> >
> >  void select_idle_routine(const struct cpuinfo_x86 *c)
>
> Urgh at this and the next one... That is, yes we can do this, but it
> makes these functions asymmetric and doesn't actually solve the
> underlying problem that all of the polling stuff is inside-out.
>
> Idle loop sets polling, then clears polling because it assumes all
> arch/driver idle loops are non-polling, then individual drivers re-set
> polling, and to be symmetric (above) clear it again, for the generic
> code to set it again, only to clear it again when leaving idle.
>
> Follow that? ;-)

That's right :-)

> Anyway, drivers ought to tell up-front if they're polling and then we
> can avoid the whole dance and everything is better.
>
> Something like the very crude below.

Yeah, that makes perfect sense (can I use your SoB right away?)

Though I sometimes wonder why we even bother with setting
TIF_POLLING_NRFLAG for short parts of the generic idle loop, even with
!mwait and !cpuidle-state-polling states. For example, why do we bother
with setting TIF_POLLING_NRFLAG for just the portion of the generic idle
loop that looks up the cpuidle state and stops the tick, only to clear
TIF_POLLING_NRFLAG again before calling wfi on ARM? Or maybe it's a
frequent pattern to have a remote wakeup happen while entering the idle
loop?

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH 4/4] x86: Remove the current_clr_polling() call upon mwait exit
  2023-11-15 15:13 [PATCH 0/4] x86/cpuidle fixes and optimization Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2023-11-15 15:13 ` [PATCH 3/4] x86: Remove __current_clr_polling() from mwait_idle() Frederic Weisbecker
@ 2023-11-15 15:13 ` Frederic Weisbecker
  3 siblings, 0 replies; 13+ messages in thread
From: Frederic Weisbecker @ 2023-11-15 15:13 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Peter Zijlstra, x86,
	Rafael J . Wysocki

mwait_idle_with_hints() is mostly called from cpuidle, which already sets
TIF_POLLING_NRFLAG back and folds TIF_NEED_RESCHED if necessary. The only
non-cpuidle caller is power_saving_thread(), which acts as an idle loop
and is the only reason why mwait_idle_with_hints() carries a costly,
fully ordered atomic operation upon idle exit.

Move this overhead into power_saving_thread() instead, which already
duplicates quite some cpuidle code.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/include/asm/mwait.h | 1 -
 drivers/acpi/acpi_pad.c      | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 920426d691ce..c1be3775b94a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -134,7 +134,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
 			}
 		}
 	}
-	current_clr_polling();
 }

 /*
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index bd1ad07f0290..86b57123713f 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -175,6 +175,7 @@ static int power_saving_thread(void *data)

 			stop_critical_timings();
 			mwait_idle_with_hints(power_saving_mwait_eax, 1);
+			current_clr_polling();
 			start_critical_timings();

 			tick_broadcast_exit();
--
2.42.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2023-12-12 13:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-15 15:13 [PATCH 0/4] x86/cpuidle fixes and optimization Frederic Weisbecker
2023-11-15 15:13 ` [PATCH 1/4] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
2023-11-29 14:55   ` [tip: x86/core] " tip-bot2 for Frederic Weisbecker
2023-11-15 15:13 ` [PATCH 2/4] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
2023-11-15 15:52   ` Peter Zijlstra
2023-11-15 15:57     ` Frederic Weisbecker
2023-11-29 14:55   ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2023-11-30 11:15     ` Peter Zijlstra
2023-12-12 13:46       ` Frederic Weisbecker
2023-11-15 15:13 ` [PATCH 3/4] x86: Remove __current_clr_polling() from mwait_idle() Frederic Weisbecker
2023-11-16 15:13   ` Peter Zijlstra
2023-11-16 18:48     ` Frederic Weisbecker
2023-11-15 15:13 ` [PATCH 4/4] x86: Remove the current_clr_polling() call upon mwait exit Frederic Weisbecker