* [patch 0/4] timer/nohz: Fix timer/nohz woes
@ 2017-12-22 14:51 Thomas Gleixner
2017-12-22 14:51 ` [patch 1/4] timer: Use deferrable base independent of base::nohz_active Thomas Gleixner
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-22 14:51 UTC (permalink / raw)
To: LKML
Cc: Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar
Paul was observing weird stalls which are hard to reproduce and decode. We
were finally able to reproduce and decode the wreckage on RT.
The following series addresses the issues and hopefully nails the root
cause completely.
Please review carefully and expose it to the dreaded rcu torture tests
which seem to be the only way to trigger it.
Thanks,
tglx
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 1/4] timer: Use deferrable base independent of base::nohz_active
2017-12-22 14:51 [patch 0/4] timer/nohz: Fix timer/nohz woes Thomas Gleixner
@ 2017-12-22 14:51 ` Thomas Gleixner
2017-12-25 16:24 ` Frederic Weisbecker
2017-12-29 22:45 ` [tip:timers/urgent] timers: " tip-bot for Anna-Maria Gleixner
2017-12-22 14:51 ` [patch 2/4] nohz: Prevent erroneous tick stop invocations Thomas Gleixner
` (3 subsequent siblings)
4 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-22 14:51 UTC (permalink / raw)
To: LKML
Cc: Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable, rt
[-- Attachment #1: timers--Use_deferrable_base_unconditional_of_nohz_active.patch --]
[-- Type: text/plain, Size: 2601 bytes --]
From: Anna-Maria Gleixner <anna-maria@linutronix.de>
During boot and before base::nohz_active is set in the timer bases, deferrable
timers are enqueued into the standard timer base. This works correctly as
long as base::nohz_active is false.
Once it base::nohz_active is set and a timer which was enqueued before that
is accessed the lock selector code choses the lock of the deferred
base. This causes unlocked access to the standard base and in case the
timer is removed it does not clear the pending flag in the standard base
bitmap which causes get_next_timer_interrupt() to return bogus values.
To prevent that, the deferrable timers must be enqueued in the deferrable
base, even when base::nohz_active is not set. Those deferrable timers also
need to be expired unconditional.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: rt@linutronix.de
---
kernel/time/timer.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -823,11 +823,10 @@ static inline struct timer_base *get_tim
struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);
/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
return base;
}
@@ -837,11 +836,10 @@ static inline struct timer_base *get_tim
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = this_cpu_ptr(&timer_bases[BASE_DEF]);
return base;
}
@@ -1684,7 +1682,7 @@ static __latent_entropy void run_timer_s
base->must_forward_clk = false;
__run_timers(base);
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 2/4] nohz: Prevent erroneous tick stop invocations
2017-12-22 14:51 [patch 0/4] timer/nohz: Fix timer/nohz woes Thomas Gleixner
2017-12-22 14:51 ` [patch 1/4] timer: Use deferrable base independent of base::nohz_active Thomas Gleixner
@ 2017-12-22 14:51 ` Thomas Gleixner
2017-12-26 15:17 ` Frederic Weisbecker
2017-12-22 14:51 ` [patch 3/4] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
` (2 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-22 14:51 UTC (permalink / raw)
To: LKML
Cc: Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable
[-- Attachment #1: nohz--Prevent-erroneous-tick-stop-invocations.patch --]
[-- Type: text/plain, Size: 1619 bytes --]
The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
This is too permissive in various aspects:
1) If need_resched() is set, then the tick cannot be stopped whether
the CPU is idle or in nohz full mode.
2) If need_resched() is not set, but softirqs are pending then this is an
indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted
task takes precedence over softirqd.
Invoking tick_nohz_irq_exit() in these cases can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....
Prevent that by making the conditions proper and only allow invokation when
in idle or nohz full mode and neither need_resched() nor
local_softirq_pending() are set.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
kernel/softirq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -382,7 +382,8 @@ static inline void tick_irq_exit(void)
int cpu = smp_processor_id();
/* Make sure that timer wheel updates are propagated */
- if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+ if ((idle_cpu(cpu) || tick_nohz_full_cpu(cpu)) &&
+ !need_resched() && !local_softirq_pending()) {
if (!in_interrupt())
tick_nohz_irq_exit();
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 3/4] timer: Invoke timer_start_debug() where it makes sense
2017-12-22 14:51 [patch 0/4] timer/nohz: Fix timer/nohz woes Thomas Gleixner
2017-12-22 14:51 ` [patch 1/4] timer: Use deferrable base independent of base::nohz_active Thomas Gleixner
2017-12-22 14:51 ` [patch 2/4] nohz: Prevent erroneous tick stop invocations Thomas Gleixner
@ 2017-12-22 14:51 ` Thomas Gleixner
2017-12-29 22:47 ` [tip:timers/urgent] timers: " tip-bot for Thomas Gleixner
2017-12-22 14:51 ` [patch 4/4] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
2017-12-22 17:09 ` [patch 0/4] timer/nohz: Fix timer/nohz woes Paul E. McKenney
4 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-22 14:51 UTC (permalink / raw)
To: LKML
Cc: Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable, rt
[-- Attachment #1: timer--Invoke_timer_start_debug--_where_it_makes_sense.patch --]
[-- Type: text/plain, Size: 978 bytes --]
From: Thomas Gleixner <tglx@linutronix.de>
The timer start debug function is called before the proper timer base is
set. As a consequence the trace data contains the stale CPU and flags
values.
Call the debug function after setting the new base and flags.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: rt@linutronix.de
---
kernel/time/timer.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1007,8 +1007,6 @@ static inline int
if (!ret && (options & MOD_TIMER_PENDING_ONLY))
goto out_unlock;
- debug_activate(timer, expires);
-
new_base = get_target_base(base, timer->flags);
if (base != new_base) {
@@ -1032,6 +1030,8 @@ static inline int
}
}
+ debug_activate(timer, expires);
+
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance
^ permalink raw reply [flat|nested] 23+ messages in thread
* [patch 4/4] timerqueue: Document return values of timerqueue_add/del()
2017-12-22 14:51 [patch 0/4] timer/nohz: Fix timer/nohz woes Thomas Gleixner
` (2 preceding siblings ...)
2017-12-22 14:51 ` [patch 3/4] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
@ 2017-12-22 14:51 ` Thomas Gleixner
2017-12-29 22:47 ` [tip:timers/urgent] " tip-bot for Thomas Gleixner
2017-12-22 17:09 ` [patch 0/4] timer/nohz: Fix timer/nohz woes Paul E. McKenney
4 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-22 14:51 UTC (permalink / raw)
To: LKML
Cc: Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, rt
[-- Attachment #1: timerqueue--Document_return_values_of_timerqueue_add-del--.patch --]
[-- Type: text/plain, Size: 1161 bytes --]
From: Thomas Gleixner <tglx@linutronix.de>
The return values of timerqueue_add/del() are not documented in the kernel doc
comment. Add proper documentation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
---
lib/timerqueue.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/lib/timerqueue.c
+++ b/lib/timerqueue.c
@@ -33,8 +33,9 @@
* @head: head of timerqueue
* @node: timer node to be added
*
- * Adds the timer node to the timerqueue, sorted by the
- * node's expires value.
+ * Adds the timer node to the timerqueue, sorted by the node's expires
+ * value. Returns true if the newly added timer is the first expiring timer in
+ * the queue.
*/
bool timerqueue_add(struct timerqueue_head *head, struct timerqueue_node *node)
{
@@ -70,7 +71,8 @@ EXPORT_SYMBOL_GPL(timerqueue_add);
* @head: head of timerqueue
* @node: timer node to be removed
*
- * Removes the timer node from the timerqueue.
+ * Removes the timer node from the timerqueue. Returns true if the queue is
+ * not empty after the remove.
*/
bool timerqueue_del(struct timerqueue_head *head, struct timerqueue_node *node)
{
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 0/4] timer/nohz: Fix timer/nohz woes
2017-12-22 14:51 [patch 0/4] timer/nohz: Fix timer/nohz woes Thomas Gleixner
` (3 preceding siblings ...)
2017-12-22 14:51 ` [patch 4/4] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
@ 2017-12-22 17:09 ` Paul E. McKenney
2017-12-24 1:21 ` Paul E. McKenney
4 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2017-12-22 17:09 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Peter Zijlstra,
Frederic Weisbecker, Ingo Molnar
On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> Paul was observing weird stalls which are hard to reproduce and decode. We
> were finally able to reproduce and decode the wreckage on RT.
>
> The following series addresses the issues and hopefully nails the root
> cause completely.
>
> Please review carefully and expose it to the dreaded rcu torture tests
> which seem to be the only way to trigger it.
Best Christmas present ever, thank you!!!
Just started up three concurrent 10-hour runs of the infamous rcutorture
TREE01 scenario, and will let you know how it goes!
Thanx, Paul
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 0/4] timer/nohz: Fix timer/nohz woes
2017-12-22 17:09 ` [patch 0/4] timer/nohz: Fix timer/nohz woes Paul E. McKenney
@ 2017-12-24 1:21 ` Paul E. McKenney
2017-12-24 1:29 ` Paul E. McKenney
2017-12-27 20:55 ` Thomas Gleixner
0 siblings, 2 replies; 23+ messages in thread
From: Paul E. McKenney @ 2017-12-24 1:21 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Peter Zijlstra,
Frederic Weisbecker, Ingo Molnar
On Fri, Dec 22, 2017 at 09:09:07AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> > Paul was observing weird stalls which are hard to reproduce and decode. We
> > were finally able to reproduce and decode the wreckage on RT.
> >
> > The following series addresses the issues and hopefully nails the root
> > cause completely.
> >
> > Please review carefully and expose it to the dreaded rcu torture tests
> > which seem to be the only way to trigger it.
>
> Best Christmas present ever, thank you!!!
>
> Just started up three concurrent 10-hour runs of the infamous rcutorture
> TREE01 scenario, and will let you know how it goes!
Well, I messed up the first test and then reran it. Which had the benefit
of giving me a baseline. The rerun (with all four patches) produced
failures, so I ran it again with an additional patch of mine. I score
these tests by recording the time at first failure, or, if there is no
failure, the duration of the test. Summing the values gives the score.
And here are the scores, where 30 is a perfect score:
1. Baseline: 3.0+2.5+10=15.5
2. Four patches from Anna-Marie and Thomas: 10+2.7+1.7=14.4
3. Ditto plus the patch below: 10+4.3+10=24.3
Please note that these are nowhere near anything even resembling
statistical significance. However, they are encouraging. I will do
more runs, but also do shorter five-hour runs to increase the amount
of data per unit time. Please note also that my patch by itself never
did provide that great of an improvement, so there might be some sort
of combination effect going on here. Or maybe it is just luck, who knows?
Please note that I have not yet ported my diagnostic patches on top of
these, however, the stacks have the usual schedule_timeout() entries.
This is not too surprising from a software-engineering viewpoint:
Locating several bugs at a given point of time usually indicates that
there are more to be found. So in a sense we are lucky that the
same test triggers at least one of those additional bugs.
Thanx, Paul
------------------------------------------------------------------------
commit accb0edb85526a05b934eac49658d05ea0216fc4
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Thu Dec 7 13:18:44 2017 -0800
timers: Ensure that timer_base ->clk accounts for time offline
The timer_base ->must_forward_clk is set to indicate that the next timer
operation on that timer_base must check for passage of time. One instance
of time passage is when the timer wheel goes idle, and another is when
the corresponding CPU is offline. Note that it is not appropriate to set
->is_idle because that could result in IPIing an offline CPU. Therefore,
this commit instead sets ->must_forward_clk at CPU-offline time.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffebcf878fba..94cce780c574 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)
BUG_ON(old_base->running_timer);
+ old_base->must_forward_clk = true;
for (i = 0; i < WHEEL_SIZE; i++)
migrate_timer_list(new_base, old_base->vectors + i);
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [patch 0/4] timer/nohz: Fix timer/nohz woes
2017-12-24 1:21 ` Paul E. McKenney
@ 2017-12-24 1:29 ` Paul E. McKenney
2018-01-05 19:41 ` Paul E. McKenney
2017-12-27 20:55 ` Thomas Gleixner
1 sibling, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2017-12-24 1:29 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Peter Zijlstra,
Frederic Weisbecker, Ingo Molnar
On Sat, Dec 23, 2017 at 05:21:20PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 22, 2017 at 09:09:07AM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> > > Paul was observing weird stalls which are hard to reproduce and decode. We
> > > were finally able to reproduce and decode the wreckage on RT.
> > >
> > > The following series addresses the issues and hopefully nails the root
> > > cause completely.
> > >
> > > Please review carefully and expose it to the dreaded rcu torture tests
> > > which seem to be the only way to trigger it.
> >
> > Best Christmas present ever, thank you!!!
> >
> > Just started up three concurrent 10-hour runs of the infamous rcutorture
> > TREE01 scenario, and will let you know how it goes!
>
> Well, I messed up the first test and then reran it. Which had the benefit
> of giving me a baseline. The rerun (with all four patches) produced
> failures, so I ran it again with an additional patch of mine. I score
> these tests by recording the time at first failure, or, if there is no
> failure, the duration of the test. Summing the values gives the score.
> And here are the scores, where 30 is a perfect score:
Sigh. They were five-hour tests, not ten-hour tests.
1. Baseline: 3.0+2.5+5=10.5
2. Four patches from Anna-Marie and Thomas: 5+2.7+1.7=9.4
3. Ditto plus the patch below: 5+4.3+5=14.3
Oh, and the reason for my suspecting that #2 is actually an improvement
over #1 is that my patch by itself produced a very small improvement in
reliability. This leads to the hypothesis that #2 really is helping out
in some way or another.
Thanx, Paul
> 1. Baseline: 3.0+2.5+10=15.5
>
> 2. Four patches from Anna-Marie and Thomas: 10+2.7+1.7=14.4
>
> 3. Ditto plus the patch below: 10+4.3+10=24.3
>
> Please note that these are nowhere near anything even resembling
> statistical significance. However, they are encouraging. I will do
> more runs, but also do shorter five-hour runs to increase the amount
> of data per unit time. Please note also that my patch by itself never
> did provide that great of an improvement, so there might be some sort
> of combination effect going on here. Or maybe it is just luck, who knows?
>
> Please note that I have not yet ported my diagnostic patches on top of
> these, however, the stacks have the usual schedule_timeout() entries.
> This is not too surprising from a software-engineering viewpoint:
> Locating several bugs at a given point of time usually indicates that
> there are more to be found. So in a sense we are lucky that the
> same test triggers at least one of those additional bugs.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit accb0edb85526a05b934eac49658d05ea0216fc4
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date: Thu Dec 7 13:18:44 2017 -0800
>
> timers: Ensure that timer_base ->clk accounts for time offline
>
> The timer_base ->must_forward_clk is set to indicate that the next timer
> operation on that timer_base must check for passage of time. One instance
> of time passage is when the timer wheel goes idle, and another is when
> the corresponding CPU is offline. Note that it is not appropriate to set
> ->is_idle because that could result in IPIing an offline CPU. Therefore,
> this commit instead sets ->must_forward_clk at CPU-offline time.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ffebcf878fba..94cce780c574 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)
>
> BUG_ON(old_base->running_timer);
>
> + old_base->must_forward_clk = true;
> for (i = 0; i < WHEEL_SIZE; i++)
> migrate_timer_list(new_base, old_base->vectors + i);
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 1/4] timer: Use deferrable base independent of base::nohz_active
2017-12-22 14:51 ` [patch 1/4] timer: Use deferrable base independent of base::nohz_active Thomas Gleixner
@ 2017-12-25 16:24 ` Frederic Weisbecker
2017-12-29 22:45 ` [tip:timers/urgent] timers: " tip-bot for Anna-Maria Gleixner
1 sibling, 0 replies; 23+ messages in thread
From: Frederic Weisbecker @ 2017-12-25 16:24 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable, rt
On Fri, Dec 22, 2017 at 03:51:12PM +0100, Thomas Gleixner wrote:
> From: Anna-Maria Gleixner <anna-maria@linutronix.de>
>
> During boot and before base::nohz_active is set in the timer bases, deferrable
> timers are enqueued into the standard timer base. This works correctly as
> long as base::nohz_active is false.
>
> Once it base::nohz_active is set and a timer which was enqueued before that
> is accessed the lock selector code choses the lock of the deferred
> base. This causes unlocked access to the standard base and in case the
> timer is removed it does not clear the pending flag in the standard base
> bitmap which causes get_next_timer_interrupt() to return bogus values.
>
> To prevent that, the deferrable timers must be enqueued in the deferrable
> base, even when base::nohz_active is not set. Those deferrable timers also
> need to be expired unconditional.
>
> Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Cc: rt@linutronix.de
Nice catch!
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Thanks!
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations
2017-12-22 14:51 ` [patch 2/4] nohz: Prevent erroneous tick stop invocations Thomas Gleixner
@ 2017-12-26 15:17 ` Frederic Weisbecker
2017-12-27 18:22 ` Thomas Gleixner
0 siblings, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2017-12-26 15:17 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable
On Fri, Dec 22, 2017 at 03:51:13PM +0100, Thomas Gleixner wrote:
> The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
>
> if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
>
> This is too permissive in various aspects:
>
> 1) If need_resched() is set, then the tick cannot be stopped whether
> the CPU is idle or in nohz full mode.
That's not exactly true. In nohz full mode the tick is not restarted on the
switch from idle to a single task. And if an idle interrupt wakes up a
single task and enqueues a timer, we want that timer to be programmed even
though we have need_resched().
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations
2017-12-26 15:17 ` Frederic Weisbecker
@ 2017-12-27 18:22 ` Thomas Gleixner
2017-12-27 18:24 ` Thomas Gleixner
0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-27 18:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable
On Tue, 26 Dec 2017, Frederic Weisbecker wrote:
> On Fri, Dec 22, 2017 at 03:51:13PM +0100, Thomas Gleixner wrote:
> > The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
> >
> > if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> >
> > This is too permissive in various aspects:
> >
> > 1) If need_resched() is set, then the tick cannot be stopped whether
> > the CPU is idle or in nohz full mode.
>
> That's not exactly true. In nohz full mode the tick is not restarted on the
> switch from idle to a single task. And if an idle interrupt wakes up a
> single task and enqueues a timer, we want that timer to be programmed even
> though we have need_resched().
Hrmm, so the check for softirq_pending() should be sufficient, right?
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -382,7 +382,8 @@ static inline void tick_irq_exit(void)
int cpu = smp_processor_id();
/* Make sure that timer wheel updates are propagated */
- if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+ if (((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) &&
+ if ((idle_cpu(cpu) || tick_nohz_full_cpu(cpu)) &&
+ !local_softirq_pending()) {
if (!in_interrupt())
tick_nohz_irq_exit();
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations
2017-12-27 18:22 ` Thomas Gleixner
@ 2017-12-27 18:24 ` Thomas Gleixner
2017-12-27 20:58 ` Thomas Gleixner
0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-27 18:24 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable
On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> On Tue, 26 Dec 2017, Frederic Weisbecker wrote:
>
> > On Fri, Dec 22, 2017 at 03:51:13PM +0100, Thomas Gleixner wrote:
> > > The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
> > >
> > > if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> > >
> > > This is too permissive in various aspects:
> > >
> > > 1) If need_resched() is set, then the tick cannot be stopped whether
> > > the CPU is idle or in nohz full mode.
> >
> > That's not exactly true. In nohz full mode the tick is not restarted on the
> > switch from idle to a single task. And if an idle interrupt wakes up a
> > single task and enqueues a timer, we want that timer to be programmed even
> > though we have need_resched().
>
> Hrmm, so the check for softirq_pending() should be sufficient, right?
>
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -382,7 +382,8 @@ static inline void tick_irq_exit(void)
> int cpu = smp_processor_id();
>
> /* Make sure that timer wheel updates are propagated */
> - if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> + if (((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) &&
> + if ((idle_cpu(cpu) || tick_nohz_full_cpu(cpu)) &&
> + !local_softirq_pending()) {
> if (!in_interrupt())
> tick_nohz_irq_exit();
> }
Bah, no. We need to move that into the nohz logic somehow to prevent that
repetitive expiry yesterday reprogramming. Lemme think about it some more.
Thanks,
tglx
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 0/4] timer/nohz: Fix timer/nohz woes
2017-12-24 1:21 ` Paul E. McKenney
2017-12-24 1:29 ` Paul E. McKenney
@ 2017-12-27 20:55 ` Thomas Gleixner
2017-12-29 22:46 ` [tip:timers/urgent] timers: Reinitialize per cpu bases on hotplug tip-bot for Thomas Gleixner
1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-27 20:55 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Peter Zijlstra,
Frederic Weisbecker, Ingo Molnar
On Sat, 23 Dec 2017, Paul E. McKenney wrote:
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ffebcf878fba..94cce780c574 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)
>
> BUG_ON(old_base->running_timer);
>
> + old_base->must_forward_clk = true;
> for (i = 0; i < WHEEL_SIZE; i++)
> migrate_timer_list(new_base, old_base->vectors + i);
>
That's daft because base->next_expiry is also stale and that might cause
interesting issues in forward_timer_base(). But you are right, leaving the
base stale on cpu hotplug is not really a brilliant thing to do.
Instead of that workaround, I think we want a proper callback in the
prepare stage which brings the bases of the to be plugged CPU up to date.
Thanks,
tglx
8<---------------------
Subject: timers: Reinitialize per cpu bases on hotplug
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 27 Dec 2017 21:37:25 +0100
The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry valuem, which can cause
trouble then the CPU is plugged.
Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and reset the control flags to a known state.
Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
include/linux/cpuhotplug.h | 2 +-
include/linux/timer.h | 4 +++-
kernel/cpu.c | 4 ++--
kernel/time/timer.c | 14 ++++++++++++++
4 files changed, 20 insertions(+), 4 deletions(-)
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -86,7 +86,7 @@ enum cpuhp_state {
CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_KVM_PPC_BOOK3S_PREPARE,
CPUHP_ZCOMP_PREPARE,
- CPUHP_TIMERS_DEAD,
+ CPUHP_TIMERS_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -207,9 +207,11 @@ unsigned long round_jiffies_up(unsigned
unsigned long round_jiffies_up_relative(unsigned long j);
#ifdef CONFIG_HOTPLUG_CPU
+int timers_prepare_cpu(unsigned int cpu);
int timers_dead_cpu(unsigned int cpu);
#else
-#define timers_dead_cpu NULL
+#define timers_prepare_cpu NULL
+#define timers_dead_cpu NULL
#endif
#endif
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1277,9 +1277,9 @@ static struct cpuhp_step cpuhp_bp_states
* before blk_mq_queue_reinit_notify() from notify_dead(),
* otherwise a RCU stall occurs.
*/
- [CPUHP_TIMERS_DEAD] = {
+ [CPUHP_TIMERS_PREPARE] = {
.name = "timers:dead",
- .startup.single = NULL,
+ .startup.single = timers_prepare_cpu,
.teardown.single = timers_dead_cpu,
},
/* Kicks the plugged cpu into life */
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1853,6 +1853,20 @@ static void migrate_timer_list(struct ti
}
}
+int timers_prepare_cpu(unsigned int cpu)
+{
+ struct timer_base *base;
+ int b;
+
+ for (b = 0; b < NR_BASES; b++) {
+ base = per_cpu_ptr(&timer_bases[b], cpu);
+ base->clk = jiffies;
+ base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
+ base->is_idle = false;
+ base->must_forward_clk = true;
+ }
+}
+
int timers_dead_cpu(unsigned int cpu)
{
struct timer_base *old_base;
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations
2017-12-27 18:24 ` Thomas Gleixner
@ 2017-12-27 20:58 ` Thomas Gleixner
2017-12-29 16:12 ` Frederic Weisbecker
2017-12-29 22:46 ` [tip:timers/urgent] nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() tip-bot for Thomas Gleixner
0 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-27 20:58 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable
On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> Bah, no. We need to move that into the nohz logic somehow to prevent that
> repetitive expiry yesterday reprogramming. Lemme think about it some more.
The patch below should be the proper cure.
Thanks,
tglx
8<-------------------
Subject: nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 22 Dec 2017 15:51:13 +0100
From: Thomas Gleixner <tglx@linutronix.de>
The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:
if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.
Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....
Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
---
kernel/time/tick-sched.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tic
ts->next_tick = 0;
}
+static inline bool local_timer_softirq_pending(void)
+{
+ return local_softirq_pending & TIMER_SOFTIRQ;
+}
+
static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
ktime_t now, int cpu)
{
@@ -666,8 +671,8 @@ static ktime_t tick_nohz_stop_sched_tick
} while (read_seqretry(&jiffies_lock, seq));
ts->last_jiffies = basejiff;
- if (rcu_needs_cpu(basemono, &next_rcu) ||
- arch_needs_cpu() || irq_work_needs_cpu()) {
+ if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
+ irq_work_needs_cpu() || local_timer_softirq_pending()) {
next_tick = basemono + TICK_NSEC;
} else {
/*
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations
2017-12-27 20:58 ` Thomas Gleixner
@ 2017-12-29 16:12 ` Frederic Weisbecker
2017-12-29 22:46 ` [tip:timers/urgent] nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() tip-bot for Thomas Gleixner
1 sibling, 0 replies; 23+ messages in thread
From: Frederic Weisbecker @ 2017-12-29 16:12 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Paul McKenney,
Peter Zijlstra, Frederic Weisbecker, Ingo Molnar, stable
On Wed, Dec 27, 2017 at 09:58:08PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> > Bah, no. We need to move that into the nohz logic somehow to prevent that
> > repetitive expiry yesterday reprogramming. Lemme think about it some more.
>
> The patch below should be the proper cure.
>
> Thanks,
>
> tglx
>
> 8<-------------------
> Subject: nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 22 Dec 2017 15:51:13 +0100
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
> subsequently invokes tick_nohz_stop_sched_tick() are:
>
> if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
>
> If need_resched() is not set, but a timer softirq is pending then this is
> an indication that the softirq code punted and delegated the execution to
> softirqd. need_resched() is not true because the current interrupted task
> takes precedence over softirqd.
>
> Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
> timer interrupts because the timer wheel contains an expired timer, but
> softirqs are not yet executed. So it returns an immediate expiry request,
> which causes the timer to fire immediately again. Lather, rinse and
> repeat....
>
> Prevent that by adding a check for a pending timer soft interrupt to the
> conditions in tick_nohz_stop_sched_tick() which avoid calling
> get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
> prevents a repetitive programming of an already expired timer.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: stable@vger.kernel.org
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
>
> ---
> kernel/time/tick-sched.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tic
> ts->next_tick = 0;
> }
>
> +static inline bool local_timer_softirq_pending(void)
> +{
> + return local_softirq_pending & TIMER_SOFTIRQ;
> +}
> +
> static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> ktime_t now, int cpu)
> {
> @@ -666,8 +671,8 @@ static ktime_t tick_nohz_stop_sched_tick
> } while (read_seqretry(&jiffies_lock, seq));
> ts->last_jiffies = basejiff;
>
> - if (rcu_needs_cpu(basemono, &next_rcu) ||
> - arch_needs_cpu() || irq_work_needs_cpu()) {
> + if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
> + irq_work_needs_cpu() || local_timer_softirq_pending()) {
Much better. This may need a comment though because it's not immediately
obvious why we have this check while softirqs are processed just before
tick_irq_exit().
Thanks.
Acked-by: Frederic Weisbecker <frederic@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [tip:timers/urgent] timers: Use deferrable base independent of base::nohz_active
2017-12-22 14:51 ` [patch 1/4] timer: Use deferrable base independent of base::nohz_active Thomas Gleixner
2017-12-25 16:24 ` Frederic Weisbecker
@ 2017-12-29 22:45 ` tip-bot for Anna-Maria Gleixner
1 sibling, 0 replies; 23+ messages in thread
From: tip-bot for Anna-Maria Gleixner @ 2017-12-29 22:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, anna-maria, peterz, tglx, hpa, bigeasy, fweisbec, paulmck,
linux-kernel
Commit-ID: ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd
Gitweb: https://git.kernel.org/tip/ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd
Author: Anna-Maria Gleixner <anna-maria@linutronix.de>
AuthorDate: Fri, 22 Dec 2017 15:51:12 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 29 Dec 2017 23:13:09 +0100
timers: Use deferrable base independent of base::nohz_active
During boot and before base::nohz_active is set in the timer bases, deferrable
timers are enqueued into the standard timer base. This works correctly as
long as base::nohz_active is false.
Once it base::nohz_active is set and a timer which was enqueued before that
is accessed the lock selector code choses the lock of the deferred
base. This causes unlocked access to the standard base and in case the
timer is removed it does not clear the pending flag in the standard base
bitmap which causes get_next_timer_interrupt() to return bogus values.
To prevent that, the deferrable timers must be enqueued in the deferrable
base, even when base::nohz_active is not set. Those deferrable timers also
need to be expired unconditional.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Cc: rt@linutronix.de
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
---
kernel/time/timer.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffebcf8..19a9c3d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -823,11 +823,10 @@ static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);
/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
return base;
}
@@ -837,11 +836,10 @@ static inline struct timer_base *get_timer_this_cpu_base(u32 tflags)
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = this_cpu_ptr(&timer_bases[BASE_DEF]);
return base;
}
@@ -1684,7 +1682,7 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
base->must_forward_clk = false;
__run_timers(base);
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
}
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:timers/urgent] timers: Reinitialize per cpu bases on hotplug
2017-12-27 20:55 ` Thomas Gleixner
@ 2017-12-29 22:46 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-12-29 22:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: paulmck, bigeasy, anna-maria, fweisbec, mingo, linux-kernel, tglx,
peterz, hpa
Commit-ID: 26456f87aca7157c057de65c9414b37f1ab881d1
Gitweb: https://git.kernel.org/tip/26456f87aca7157c057de65c9414b37f1ab881d1
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 27 Dec 2017 21:37:25 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 29 Dec 2017 23:13:09 +0100
timers: Reinitialize per cpu bases on hotplug
The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry valuem, which can cause
trouble then the CPU is plugged.
Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and reset the control flags to a known state.
Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
---
include/linux/cpuhotplug.h | 2 +-
include/linux/timer.h | 4 +++-
kernel/cpu.c | 4 ++--
kernel/time/timer.c | 15 +++++++++++++++
4 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 201ab72..1a32e55 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -86,7 +86,7 @@ enum cpuhp_state {
CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_KVM_PPC_BOOK3S_PREPARE,
CPUHP_ZCOMP_PREPARE,
- CPUHP_TIMERS_DEAD,
+ CPUHP_TIMERS_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 04af640..2448f9c 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -207,9 +207,11 @@ unsigned long round_jiffies_up(unsigned long j);
unsigned long round_jiffies_up_relative(unsigned long j);
#ifdef CONFIG_HOTPLUG_CPU
+int timers_prepare_cpu(unsigned int cpu);
int timers_dead_cpu(unsigned int cpu);
#else
-#define timers_dead_cpu NULL
+#define timers_prepare_cpu NULL
+#define timers_dead_cpu NULL
#endif
#endif
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 41376c3..9785847 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1277,9 +1277,9 @@ static struct cpuhp_step cpuhp_bp_states[] = {
* before blk_mq_queue_reinit_notify() from notify_dead(),
* otherwise a RCU stall occurs.
*/
- [CPUHP_TIMERS_DEAD] = {
+ [CPUHP_TIMERS_PREPARE] = {
.name = "timers:dead",
- .startup.single = NULL,
+ .startup.single = timers_prepare_cpu,
.teardown.single = timers_dead_cpu,
},
/* Kicks the plugged cpu into life */
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 19a9c3d..6be576e 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1853,6 +1853,21 @@ static void migrate_timer_list(struct timer_base *new_base, struct hlist_head *h
}
}
+int timers_prepare_cpu(unsigned int cpu)
+{
+ struct timer_base *base;
+ int b;
+
+ for (b = 0; b < NR_BASES; b++) {
+ base = per_cpu_ptr(&timer_bases[b], cpu);
+ base->clk = jiffies;
+ base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
+ base->is_idle = false;
+ base->must_forward_clk = true;
+ }
+ return 0;
+}
+
int timers_dead_cpu(unsigned int cpu)
{
struct timer_base *old_base;
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:timers/urgent] nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
2017-12-27 20:58 ` Thomas Gleixner
2017-12-29 16:12 ` Frederic Weisbecker
@ 2017-12-29 22:46 ` tip-bot for Thomas Gleixner
1 sibling, 0 replies; 23+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-12-29 22:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, bigeasy, mingo, bigeasy, peterz, paulmck, fweisbec, tglx,
linux-kernel, anna-maria
Commit-ID: 5d62c183f9e9df1deeea0906d099a94e8a43047a
Gitweb: https://git.kernel.org/tip/5d62c183f9e9df1deeea0906d099a94e8a43047a
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 22 Dec 2017 15:51:13 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 29 Dec 2017 23:13:10 +0100
nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:
if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.
Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....
Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.
Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
---
kernel/time/tick-sched.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 77555fa..f7cc7ab 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
ts->next_tick = 0;
}
+static inline bool local_timer_softirq_pending(void)
+{
+ return local_softirq_pending() & TIMER_SOFTIRQ;
+}
+
static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
ktime_t now, int cpu)
{
@@ -666,8 +671,18 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
} while (read_seqretry(&jiffies_lock, seq));
ts->last_jiffies = basejiff;
- if (rcu_needs_cpu(basemono, &next_rcu) ||
- arch_needs_cpu() || irq_work_needs_cpu()) {
+ /*
+ * Keep the periodic tick, when RCU, architecture or irq_work
+ * requests it.
+ * Aside of that check whether the local timer softirq is
+ * pending. If so its a bad idea to call get_next_timer_interrupt()
+ * because there is an already expired timer, so it will request
+ * immeditate expiry, which rearms the hardware timer with a
+ * minimal delta which brings us back to this place
+ * immediately. Lather, rinse and repeat...
+ */
+ if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
+ irq_work_needs_cpu() || local_timer_softirq_pending()) {
next_tick = basemono + TICK_NSEC;
} else {
/*
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:timers/urgent] timers: Invoke timer_start_debug() where it makes sense
2017-12-22 14:51 ` [patch 3/4] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
@ 2017-12-29 22:47 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-12-29 22:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: peterz, anna-maria, fweisbec, hpa, linux-kernel, mingo, paulmck,
bigeasy, tglx
Commit-ID: fd45bb77ad682be728d1002431d77b8c73342836
Gitweb: https://git.kernel.org/tip/fd45bb77ad682be728d1002431d77b8c73342836
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 22 Dec 2017 15:51:14 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 29 Dec 2017 23:13:10 +0100
timers: Invoke timer_start_debug() where it makes sense
The timer start debug function is called before the proper timer base is
set. As a consequence the trace data contains the stale CPU and flags
values.
Call the debug function after setting the new base and flags.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Cc: rt@linutronix.de
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
---
kernel/time/timer.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 6be576e..89a9e1b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1007,8 +1007,6 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
if (!ret && (options & MOD_TIMER_PENDING_ONLY))
goto out_unlock;
- debug_activate(timer, expires);
-
new_base = get_target_base(base, timer->flags);
if (base != new_base) {
@@ -1032,6 +1030,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
}
}
+ debug_activate(timer, expires);
+
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [tip:timers/urgent] timerqueue: Document return values of timerqueue_add/del()
2017-12-22 14:51 ` [patch 4/4] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
@ 2017-12-29 22:47 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-12-29 22:47 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, fweisbec, mingo, bigeasy, anna-maria, peterz, linux-kernel,
paulmck, hpa
Commit-ID: 9f4533cd7334235cd4c9b9fb1b0b8791e2ba01a7
Gitweb: https://git.kernel.org/tip/9f4533cd7334235cd4c9b9fb1b0b8791e2ba01a7
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 22 Dec 2017 15:51:15 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 29 Dec 2017 23:13:10 +0100
timerqueue: Document return values of timerqueue_add/del()
The return values of timerqueue_add/del() are not documented in the kernel doc
comment. Add proper documentation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: rt@linutronix.de
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/20171222145337.872681338@linutronix.de
---
lib/timerqueue.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/lib/timerqueue.c b/lib/timerqueue.c
index 4a720ed..0d54bcb 100644
--- a/lib/timerqueue.c
+++ b/lib/timerqueue.c
@@ -33,8 +33,9 @@
* @head: head of timerqueue
* @node: timer node to be added
*
- * Adds the timer node to the timerqueue, sorted by the
- * node's expires value.
+ * Adds the timer node to the timerqueue, sorted by the node's expires
+ * value. Returns true if the newly added timer is the first expiring timer in
+ * the queue.
*/
bool timerqueue_add(struct timerqueue_head *head, struct timerqueue_node *node)
{
@@ -70,7 +71,8 @@ EXPORT_SYMBOL_GPL(timerqueue_add);
* @head: head of timerqueue
* @node: timer node to be removed
*
- * Removes the timer node from the timerqueue.
+ * Removes the timer node from the timerqueue. Returns true if the queue is
+ * not empty after the remove.
*/
bool timerqueue_del(struct timerqueue_head *head, struct timerqueue_node *node)
{
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [patch 0/4] timer/nohz: Fix timer/nohz woes
2017-12-24 1:29 ` Paul E. McKenney
@ 2018-01-05 19:41 ` Paul E. McKenney
2018-01-06 21:18 ` Thomas Gleixner
0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2018-01-05 19:41 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Peter Zijlstra,
Frederic Weisbecker, Ingo Molnar
On Sat, Dec 23, 2017 at 05:29:24PM -0800, Paul E. McKenney wrote:
> On Sat, Dec 23, 2017 at 05:21:20PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 22, 2017 at 09:09:07AM -0800, Paul E. McKenney wrote:
> > > On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> > > > Paul was observing weird stalls which are hard to reproduce and decode. We
> > > > were finally able to reproduce and decode the wreckage on RT.
> > > >
> > > > The following series addresses the issues and hopefully nails the root
> > > > cause completely.
> > > >
> > > > Please review carefully and expose it to the dreaded rcu torture tests
> > > > which seem to be the only way to trigger it.
> > >
> > > Best Christmas present ever, thank you!!!
> > >
> > > Just started up three concurrent 10-hour runs of the infamous rcutorture
> > > TREE01 scenario, and will let you know how it goes!
> >
> > Well, I messed up the first test and then reran it. Which had the benefit
> > of giving me a baseline. The rerun (with all four patches) produced
> > failures, so I ran it again with an additional patch of mine. I score
> > these tests by recording the time at first failure, or, if there is no
> > failure, the duration of the test. Summing the values gives the score.
> > And here are the scores, where 30 is a perfect score:
>
> Sigh. They were five-hour tests, not ten-hour tests.
>
> 1. Baseline: 3.0+2.5+5=10.5
>
> 2. Four patches from Anna-Marie and Thomas: 5+2.7+1.7=9.4
>
> 3. Ditto plus the patch below: 5+4.3+5=14.3
>
> Oh, and the reason for my suspecting that #2 is actually an improvement
> over #1 is that my patch by itself produced a very small improvement in
> reliability. This leads to the hypothesis that #2 really is helping out
> in some way or another.
But after more than 1,000 hours of test runs, split roughly evenly
among the above three scenarios, there is no statistically significant
difference in error rate among them. This means that there is some
other bug lurking somewhere, and having the same appearance (lost timer).
Were you guys ever able to reproduce this via rcutorture?
More details below.
Thanx, Paul
------------------------------------------------------------------------
I ran sets of three-hour runs. I took the time of first error (if
any), and excluded the rest of that particular three-hour run from
consideration. This means that if a given run failed at two hours,
we add one to the "errors" column and two to the "duration" column.
Runs without errors contributed three hours "duration" column, but of
course nothing to the "errors" column. An overall errors/hour rate
is then computed for each scenario:
1. Baseline: (378 hours total runtime)
74 errors in 218.8 hours error-free runtime, 0.338 errors/hour.
2. Four patches from Anna-Marie and Thomas: (315 hours total runtime)
65 errors in 195.2 hours error-free runtime, 0.333 errors/hour.
3. Ditto plus the patch below: (315 hours total runtime)
66 errors in 179.4 hours error-free runtime, 0.368 errors/hour.
Applying Poisson statistics shows that we need to drop below 0.270
errors/hour to assert that a fix had a 95% chance of having reduced the
error rate, and none of the runs achieve this level of improvement.
In fact, even the least probable scenario had more than a 25% probability
of happening by chance.
These calculations were carried out using maxima:
load(distrib);
bfloat(cdf_poisson(59,218.8*0.338));
(%o11) 4.267467688401431b-2
This is 4.2% probability of the result having happened due to random
chance, just a bit better than 95% confidence.
bfloat(cdf_poisson(60,218.8*0.338));
(%o8) 5.525461180734715b-2
This is 5.5% probability of the result having happened due to random
chance, just a bit worse than 95% confidence. So, dividing 59 by the
218.8 hours of error-free runs on baseline gives the aforementioned
0.270 errors/hour.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 0/4] timer/nohz: Fix timer/nohz woes
2018-01-05 19:41 ` Paul E. McKenney
@ 2018-01-06 21:18 ` Thomas Gleixner
2018-01-06 23:21 ` Paul E. McKenney
0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2018-01-06 21:18 UTC (permalink / raw)
To: Paul E. McKenney
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Peter Zijlstra,
Frederic Weisbecker, Ingo Molnar
On Fri, 5 Jan 2018, Paul E. McKenney wrote:
> But after more than 1,000 hours of test runs, split roughly evenly
> among the above three scenarios, there is no statistically significant
> difference in error rate among them. This means that there is some
> other bug lurking somewhere, and having the same appearance (lost timer).
> Were you guys ever able to reproduce this via rcutorture?
No.
We'll setup more testing on Monday. Which of the tests fails or at least
exposes the highest failure rate?
Thanks,
tglx
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [patch 0/4] timer/nohz: Fix timer/nohz woes
2018-01-06 21:18 ` Thomas Gleixner
@ 2018-01-06 23:21 ` Paul E. McKenney
0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2018-01-06 23:21 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Anna-Maria Gleixner, Sebastian Siewior, Peter Zijlstra,
Frederic Weisbecker, Ingo Molnar
On Sat, Jan 06, 2018 at 10:18:40PM +0100, Thomas Gleixner wrote:
> On Fri, 5 Jan 2018, Paul E. McKenney wrote:
> > But after more than 1,000 hours of test runs, split roughly evenly
> > among the above three scenarios, there is no statistically significant
> > difference in error rate among them. This means that there is some
> > other bug lurking somewhere, and having the same appearance (lost timer).
> > Were you guys ever able to reproduce this via rcutorture?
>
> No.
I was afraid of that... ;-)
> We'll setup more testing on Monday. Which of the tests fails or at least
> exposes the highest failure rate?
TREE01, as in:
bash tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 48 --duration 180 --configs "42*TREE01"
This results in 42 runs of TREE01 consuming about 21 hours of wall-clock
time. (Each run of TREE01 uses 8 CPUs.)
Thanx, Paul
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2018-01-06 23:21 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-22 14:51 [patch 0/4] timer/nohz: Fix timer/nohz woes Thomas Gleixner
2017-12-22 14:51 ` [patch 1/4] timer: Use deferrable base independent of base::nohz_active Thomas Gleixner
2017-12-25 16:24 ` Frederic Weisbecker
2017-12-29 22:45 ` [tip:timers/urgent] timers: " tip-bot for Anna-Maria Gleixner
2017-12-22 14:51 ` [patch 2/4] nohz: Prevent erroneous tick stop invocations Thomas Gleixner
2017-12-26 15:17 ` Frederic Weisbecker
2017-12-27 18:22 ` Thomas Gleixner
2017-12-27 18:24 ` Thomas Gleixner
2017-12-27 20:58 ` Thomas Gleixner
2017-12-29 16:12 ` Frederic Weisbecker
2017-12-29 22:46 ` [tip:timers/urgent] nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() tip-bot for Thomas Gleixner
2017-12-22 14:51 ` [patch 3/4] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
2017-12-29 22:47 ` [tip:timers/urgent] timers: " tip-bot for Thomas Gleixner
2017-12-22 14:51 ` [patch 4/4] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
2017-12-29 22:47 ` [tip:timers/urgent] " tip-bot for Thomas Gleixner
2017-12-22 17:09 ` [patch 0/4] timer/nohz: Fix timer/nohz woes Paul E. McKenney
2017-12-24 1:21 ` Paul E. McKenney
2017-12-24 1:29 ` Paul E. McKenney
2018-01-05 19:41 ` Paul E. McKenney
2018-01-06 21:18 ` Thomas Gleixner
2018-01-06 23:21 ` Paul E. McKenney
2017-12-27 20:55 ` Thomas Gleixner
2017-12-29 22:46 ` [tip:timers/urgent] timers: Reinitialize per cpu bases on hotplug tip-bot for Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).