* [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
@ 2023-08-13 3:15 Joel Fernandes (Google)
2023-08-13 16:34 ` Greg KH
0 siblings, 1 reply; 8+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-13 3:15 UTC (permalink / raw)
To: stable
Cc: Guenter Roeck, Steven Rostedt, Joel Fernandes, Paul McKenney,
Frederic Weisbecker, Zhouyi Zhou, Davidlohr Bueso
From: Joel Fernandes <joel@joelfernandes.org>
During shutdown of rcutorture, the shutdown thread in
rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
readers and fakewriters to break out of their main while loop and start
shutting down.
Once out of their main loop, they call torture_kthread_stopping(),
which in turn waits for kthread_stop() to be called. However,
rcu_torture_cleanup() has not yet called kthread_stop() on those
threads; it does that a bit later. Before it gets a chance to do so,
torture_kthread_stopping() calls schedule_timeout_uninterruptible(1)
in a tight loop. Tracing confirmed that this makes the timer softirq
constantly execute timer callbacks without ever returning to the
softirq exit path, so it is essentially "locked up". If the softirq
preempts the shutdown thread, kthread_stop() may never be called.
This commit improves the situation dramatically by increasing the
timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
second. This keeps the timer softirq from locking up a CPU, and
everything works fine. Testing has shown 100 runs of TREE07 passing
reliably, which was not the case before because of RCU stalls.
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
kernel/torture.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492f14bd..477d9b601438 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
 	VERBOSE_TOROUT_STRING(buf);
 	while (!kthread_should_stop()) {
 		torture_shutdown_absorb(title);
-		schedule_timeout_uninterruptible(1);
+		schedule_timeout_uninterruptible(HZ/20);
 	}
 }
 EXPORT_SYMBOL_GPL(torture_kthread_stopping);
--
2.41.0.640.ga95def55d0-goog
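As a rough illustration of why the larger timeout helps, the following userspace sketch (not kernel code) estimates how often each stopping kthread wakes up per second before and after the change. The function names are hypothetical, and the model assumes one wakeup per expired timeout, which is only an approximation of what the timer code does.

```c
#include <assert.h>

/*
 * Userspace sketch, not kernel code. Both function names are
 * hypothetical; hz stands in for the kernel's HZ (jiffies per second).
 */

/* Same integer expression as the patch: HZ/20, roughly 50 ms. */
unsigned int timeout_after_patch(unsigned int hz)
{
	return hz / 20;
}

/*
 * Approximate wakeups per second for a loop that sleeps
 * timeout_jiffies on every iteration. Assumes one wakeup per
 * expired timeout and ignores scheduling latency.
 */
unsigned int wakeups_per_second(unsigned int hz, unsigned int timeout_jiffies)
{
	assert(hz > 0 && timeout_jiffies > 0);
	return hz / timeout_jiffies;
}
```

Under these assumptions and with hz = 1000, wakeups_per_second(1000, 1) is 1000, while wakeups_per_second(1000, timeout_after_patch(1000)) is 20. Every reader and fakewriter thread waiting in torture_kthread_stopping() contributes its own timer, so the reduction applies per thread.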
* Re: [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
2023-08-13 3:15 Joel Fernandes (Google)
@ 2023-08-13 16:34 ` Greg KH
2023-08-13 20:24 ` Joel Fernandes
0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2023-08-13 16:34 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: stable, Guenter Roeck, Steven Rostedt, Paul McKenney,
Frederic Weisbecker, Zhouyi Zhou, Davidlohr Bueso
On Sun, Aug 13, 2023 at 03:15:34AM +0000, Joel Fernandes (Google) wrote:
> From: Joel Fernandes <joel@joelfernandes.org>
>
> During shutdown of rcutorture, the shutdown thread in
> rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
> to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
> readers and fakewriters to break out of their main while loop and start
> shutting down.
>
> Once out of their main loop, they call torture_kthread_stopping(),
> which in turn waits for kthread_stop() to be called. However,
> rcu_torture_cleanup() has not yet called kthread_stop() on those
> threads; it does that a bit later. Before it gets a chance to do so,
> torture_kthread_stopping() calls schedule_timeout_uninterruptible(1)
> in a tight loop. Tracing confirmed that this makes the timer softirq
> constantly execute timer callbacks without ever returning to the
> softirq exit path, so it is essentially "locked up". If the softirq
> preempts the shutdown thread, kthread_stop() may never be called.
>
> This commit improves the situation dramatically by increasing the
> timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
> second. This keeps the timer softirq from locking up a CPU, and
> everything works fine. Testing has shown 100 runs of TREE07 passing
> reliably, which was not the case before because of RCU stalls.
>
> Cc: Paul McKenney <paulmck@kernel.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
> Cc: <stable@vger.kernel.org> # 6.0.x
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
> ---
> kernel/torture.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Any hint as to what the git commit id in Linus's tree for this, and the
other patches you just sent, are? I kind of need that to keep track of
things...
thanks,
greg k-h
* Re: [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
2023-08-13 16:34 ` Greg KH
@ 2023-08-13 20:24 ` Joel Fernandes
2023-08-13 20:39 ` Greg KH
0 siblings, 1 reply; 8+ messages in thread
From: Joel Fernandes @ 2023-08-13 20:24 UTC (permalink / raw)
To: Greg KH
Cc: stable, Guenter Roeck, Steven Rostedt, Paul McKenney,
Frederic Weisbecker, Zhouyi Zhou, Davidlohr Bueso
On Sun, Aug 13, 2023 at 06:34:27PM +0200, Greg KH wrote:
> On Sun, Aug 13, 2023 at 03:15:34AM +0000, Joel Fernandes (Google) wrote:
> > From: Joel Fernandes <joel@joelfernandes.org>
> >
> > During shutdown of rcutorture, the shutdown thread in
> > rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
> > to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
> > readers and fakewriters to break out of their main while loop and start
> > shutting down.
> >
> > Once out of their main loop, they call torture_kthread_stopping(),
> > which in turn waits for kthread_stop() to be called. However,
> > rcu_torture_cleanup() has not yet called kthread_stop() on those
> > threads; it does that a bit later. Before it gets a chance to do so,
> > torture_kthread_stopping() calls schedule_timeout_uninterruptible(1)
> > in a tight loop. Tracing confirmed that this makes the timer softirq
> > constantly execute timer callbacks without ever returning to the
> > softirq exit path, so it is essentially "locked up". If the softirq
> > preempts the shutdown thread, kthread_stop() may never be called.
> >
> > This commit improves the situation dramatically by increasing the
> > timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
> > second. This keeps the timer softirq from locking up a CPU, and
> > everything works fine. Testing has shown 100 runs of TREE07 passing
> > reliably, which was not the case before because of RCU stalls.
> >
> > Cc: Paul McKenney <paulmck@kernel.org>
> > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
> > Cc: <stable@vger.kernel.org> # 6.0.x
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> > Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
> > ---
> > kernel/torture.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Any hint as to what the git commit id in Linus's tree for this, and the
> other patches you just sent, are? I kind of need that to keep track of
> things...
Apologies, I added the SHA to the 5.15 ones but not 5.10. Here they are for 5.10:
1/3
d52d3a2bf408ff86f3a79560b5cce80efb340239
("torture: Fix hang during kthread shutdown phase")
2/3
a1ff03cd6fb9c501fff63a4a2bface9adcfa81cd
("tick: Detect and fix jiffies update stall")
3/3
62c1256d544747b38e77ca9b5bfe3a26f9592576
("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
In case you wish to pull them in via git, I have uploaded them to:
Git: https://github.com/joelagnel/linux-kernel.git
Branch: rcu/linux-5.10.y.aug13.greg
thanks,
- Joel
* Re: [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
2023-08-13 20:24 ` Joel Fernandes
@ 2023-08-13 20:39 ` Greg KH
0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2023-08-13 20:39 UTC (permalink / raw)
To: Joel Fernandes
Cc: stable, Guenter Roeck, Steven Rostedt, Paul McKenney,
Frederic Weisbecker, Zhouyi Zhou, Davidlohr Bueso
On Sun, Aug 13, 2023 at 08:24:39PM +0000, Joel Fernandes wrote:
> On Sun, Aug 13, 2023 at 06:34:27PM +0200, Greg KH wrote:
> > On Sun, Aug 13, 2023 at 03:15:34AM +0000, Joel Fernandes (Google) wrote:
> > > From: Joel Fernandes <joel@joelfernandes.org>
> > >
> > > During shutdown of rcutorture, the shutdown thread in
> > > rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
> > > to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
> > > readers and fakewriters to break out of their main while loop and start
> > > shutting down.
> > >
> > > Once out of their main loop, they call torture_kthread_stopping(),
> > > which in turn waits for kthread_stop() to be called. However,
> > > rcu_torture_cleanup() has not yet called kthread_stop() on those
> > > threads; it does that a bit later. Before it gets a chance to do so,
> > > torture_kthread_stopping() calls schedule_timeout_uninterruptible(1)
> > > in a tight loop. Tracing confirmed that this makes the timer softirq
> > > constantly execute timer callbacks without ever returning to the
> > > softirq exit path, so it is essentially "locked up". If the softirq
> > > preempts the shutdown thread, kthread_stop() may never be called.
> > >
> > > This commit improves the situation dramatically by increasing the
> > > timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
> > > second. This keeps the timer softirq from locking up a CPU, and
> > > everything works fine. Testing has shown 100 runs of TREE07 passing
> > > reliably, which was not the case before because of RCU stalls.
> > >
> > > Cc: Paul McKenney <paulmck@kernel.org>
> > > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
> > > Cc: <stable@vger.kernel.org> # 6.0.x
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> > > Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
> > > ---
> > > kernel/torture.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Any hint as to what the git commit id in Linus's tree for this, and the
> > other patches you just sent, are? I kind of need that to keep track of
> > things...
>
> Apologies, I added the SHA to the 5.15 ones but not 5.10. Here they are for 5.10:
>
> 1/3
> d52d3a2bf408ff86f3a79560b5cce80efb340239
> ("torture: Fix hang during kthread shutdown phase")
>
> 2/3
> a1ff03cd6fb9c501fff63a4a2bface9adcfa81cd
> ("tick: Detect and fix jiffies update stall")
>
> 3/3
> 62c1256d544747b38e77ca9b5bfe3a26f9592576
> ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
>
> In case you wish to pull them in via git, I have uploaded them to:
> Git: https://github.com/joelagnel/linux-kernel.git
> Branch: rcu/linux-5.10.y.aug13.greg
Can you resend these with the git sha1 in the message like you did for
5.15.y (but the correct one) so I can take them that way? My scripts
are set up for email, not github pulls :)
thanks,
greg k-h
* [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
@ 2023-08-14 3:39 Joel Fernandes (Google)
2023-08-14 3:39 ` [PATCH 5.10 2/3] tick: Detect and fix jiffies update stall Joel Fernandes (Google)
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-14 3:39 UTC (permalink / raw)
To: stable, Davidlohr Bueso, Paul E. McKenney, Josh Triplett; +Cc: gregkh
From: Joel Fernandes <joel@joelfernandes.org>
[ Upstream commit d52d3a2bf408ff86f3a79560b5cce80efb340239 ]
During shutdown of rcutorture, the shutdown thread in
rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
readers and fakewriters to break out of their main while loop and start
shutting down.
Once out of their main loop, they call torture_kthread_stopping(),
which in turn waits for kthread_stop() to be called. However,
rcu_torture_cleanup() has not yet called kthread_stop() on those
threads; it does that a bit later. Before it gets a chance to do so,
torture_kthread_stopping() calls schedule_timeout_uninterruptible(1)
in a tight loop. Tracing confirmed that this makes the timer softirq
constantly execute timer callbacks without ever returning to the
softirq exit path, so it is essentially "locked up". If the softirq
preempts the shutdown thread, kthread_stop() may never be called.
This commit improves the situation dramatically by increasing the
timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
second. This keeps the timer softirq from locking up a CPU, and
everything works fine. Testing has shown 100 runs of TREE07 passing
reliably, which was not the case before because of RCU stalls.
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
kernel/torture.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492f14bd..477d9b601438 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
 	VERBOSE_TOROUT_STRING(buf);
 	while (!kthread_should_stop()) {
 		torture_shutdown_absorb(title);
-		schedule_timeout_uninterruptible(1);
+		schedule_timeout_uninterruptible(HZ/20);
 	}
 }
 EXPORT_SYMBOL_GPL(torture_kthread_stopping);
--
2.41.0.640.ga95def55d0-goog
* [PATCH 5.10 2/3] tick: Detect and fix jiffies update stall
2023-08-14 3:39 [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase Joel Fernandes (Google)
@ 2023-08-14 3:39 ` Joel Fernandes (Google)
2023-08-14 3:39 ` [PATCH 5.10 3/3] timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped Joel Fernandes (Google)
2023-08-27 8:04 ` [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase Greg KH
2 siblings, 0 replies; 8+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-14 3:39 UTC (permalink / raw)
To: stable, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar; +Cc: gregkh
From: Frederic Weisbecker <frederic@kernel.org>
[ Upstream commit a1ff03cd6fb9c501fff63a4a2bface9adcfa81cd ]
In some rare cases, the timekeeper CPU may delay its jiffies update
duty for a while. Known causes include:
* The timekeeper is waiting on stop_machine in a MULTI_STOP_DISABLE_IRQ
  or MULTI_STOP_RUN state. Disabled interrupts prevent timekeeping
  updates while waiting for the target CPU to complete its
  stop_machine() callback.
* The timekeeper vcpu has VMEXIT'ed for a long while due to some overload
  on the host.
Detect and fix these situations with emergency timekeeping catchups.
Original-patch-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/time/tick-sched.c | 17 +++++++++++++++++
kernel/time/tick-sched.h | 4 ++++
2 files changed, 21 insertions(+)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d07de3ff42ac..bb51619c9b63 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -148,6 +148,8 @@ static ktime_t tick_init_jiffy_update(void)
 	return period;
 }
+#define MAX_STALLED_JIFFIES 5
+
 static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 {
 	int cpu = smp_processor_id();
@@ -175,6 +177,21 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 	if (tick_do_timer_cpu == cpu)
 		tick_do_update_jiffies64(now);
+	/*
+	 * If jiffies update stalled for too long (timekeeper in stop_machine()
+	 * or VMEXIT'ed for several msecs), force an update.
+	 */
+	if (ts->last_tick_jiffies != jiffies) {
+		ts->stalled_jiffies = 0;
+		ts->last_tick_jiffies = READ_ONCE(jiffies);
+	} else {
+		if (++ts->stalled_jiffies == MAX_STALLED_JIFFIES) {
+			tick_do_update_jiffies64(now);
+			ts->stalled_jiffies = 0;
+			ts->last_tick_jiffies = READ_ONCE(jiffies);
+		}
+	}
+
 	if (ts->inidle)
 		ts->got_idle_tick = 1;
 }
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 4fb06527cf64..1e7ec5c968a5 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -49,6 +49,8 @@ enum tick_nohz_mode {
  * @timer_expires_base: Base time clock monotonic for @timer_expires
  * @next_timer: Expiry time of next expiring timer for debugging purpose only
  * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick
+ * @last_tick_jiffies: Value of jiffies seen on last tick
+ * @stalled_jiffies: Number of stalled jiffies detected across ticks
  */
 struct tick_sched {
 	struct hrtimer sched_timer;
@@ -77,6 +79,8 @@ struct tick_sched {
 	u64 next_timer;
 	ktime_t idle_expires;
 	atomic_t tick_dep_mask;
+	unsigned long last_tick_jiffies;
+	unsigned int stalled_jiffies;
 };
 extern struct tick_sched *tick_get_tick_sched(int cpu);
--
2.41.0.640.ga95def55d0-goog
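The detection logic in this patch can be exercised outside the kernel. The sketch below is a userspace model under stated assumptions: struct stall_detector and stall_detector_tick() are hypothetical names, the caller passes in the jiffies value seen on each tick, and a true return stands in for the forced tick_do_update_jiffies64() call. Unlike the patch, the model does not re-read jiffies after a forced update; it simply picks up the new value on the next call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STALLED_JIFFIES 5

/* Hypothetical stand-in for the two fields added to struct tick_sched. */
struct stall_detector {
	unsigned long last_tick_jiffies;
	unsigned int stalled_jiffies;
};

/*
 * Call once per simulated tick with the jiffies value observed on that
 * tick. Returns true when jiffies has stayed unchanged for
 * MAX_STALLED_JIFFIES consecutive ticks, meaning the caller should
 * force an update.
 */
bool stall_detector_tick(struct stall_detector *sd, unsigned long jiffies_now)
{
	assert(sd != NULL);

	if (sd->last_tick_jiffies != jiffies_now) {
		/* jiffies advanced since the last tick: no stall. */
		sd->stalled_jiffies = 0;
		sd->last_tick_jiffies = jiffies_now;
		return false;
	}

	if (++sd->stalled_jiffies == MAX_STALLED_JIFFIES) {
		/* Stalled for too long: reset the count and ask for an update. */
		sd->stalled_jiffies = 0;
		return true;
	}

	return false;
}
```

In this model, a detector that first sees a given jiffies value and then sees the same value on five further ticks returns true on the fifth of those ticks. Any tick on which jiffies has moved resets the count.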
* [PATCH 5.10 3/3] timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
2023-08-14 3:39 [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase Joel Fernandes (Google)
2023-08-14 3:39 ` [PATCH 5.10 2/3] tick: Detect and fix jiffies update stall Joel Fernandes (Google)
@ 2023-08-14 3:39 ` Joel Fernandes (Google)
2023-08-27 8:04 ` [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase Greg KH
2 siblings, 0 replies; 8+ messages in thread
From: Joel Fernandes (Google) @ 2023-08-14 3:39 UTC (permalink / raw)
To: stable, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar; +Cc: gregkh
From: Nicholas Piggin <npiggin@gmail.com>
[ Upstream commit 62c1256d544747b38e77ca9b5bfe3a26f9592576 ]
When tick_nohz_stop_tick() stops the tick and high resolution timers are
disabled, then the clock event device is not put into ONESHOT_STOPPED
mode. This can lead to spurious timer interrupts with some clock event
device drivers that don't shut down entirely after firing.
Eliminate these by putting the device into ONESHOT_STOPPED mode at points
where it is not being reprogrammed. When there are no timers active, then
tick_program_event() with KTIME_MAX can be used to stop the device. When
there is a timer active, the device can be stopped at the next tick (any
new timer added by timers will reprogram the tick).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220422141446.915024-1-npiggin@gmail.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/time/tick-sched.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bb51619c9b63..fc79b04b5947 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -884,6 +884,8 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 	if (unlikely(expires == KTIME_MAX)) {
 		if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
 			hrtimer_cancel(&ts->sched_timer);
+		else
+			tick_program_event(KTIME_MAX, 1);
 		return;
 	}
@@ -1274,9 +1276,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	tick_sched_do_timer(ts, now);
 	tick_sched_handle(ts, regs);
-	/* No need to reprogram if we are running tickless */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		/*
+		 * The clockevent device is not reprogrammed, so change the
+		 * clock event device to ONESHOT_STOPPED to avoid spurious
+		 * interrupts on devices which might not be truly one shot.
+		 */
+		tick_program_event(KTIME_MAX, 1);
 		return;
+	}
 	hrtimer_forward(&ts->sched_timer, now, TICK_NSEC);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
--
2.41.0.640.ga95def55d0-goog
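To make the spurious-interrupt argument concrete, here is a toy userspace model of a clock event device that may not be truly one-shot. Everything in it is hypothetical (the type names, the truly_oneshot flag, and toy_run_stopped_tick()); it does not reflect the real clockevents API and only mirrors the reasoning in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_state { TOY_ONESHOT, TOY_ONESHOT_STOPPED };

struct toy_clockevent {
	enum toy_state state;
	bool truly_oneshot;      /* false: hardware keeps firing until stopped */
	unsigned int interrupts; /* interrupts delivered so far */
};

/*
 * Simulate up to `periods` tick periods during which the tick is
 * stopped and the handler never reprograms the device. When
 * handler_stops_device is true, the handler switches the device to the
 * stopped state on the first interrupt, as the patched handler does.
 * Returns the total number of interrupts delivered.
 */
unsigned int toy_run_stopped_tick(struct toy_clockevent *dev,
				  unsigned int periods,
				  bool handler_stops_device)
{
	assert(dev != NULL);

	for (unsigned int i = 0; i < periods; i++) {
		if (dev->state == TOY_ONESHOT_STOPPED)
			break;          /* device is off, nothing fires */
		dev->interrupts++;      /* interrupt fires, handler runs */
		if (handler_stops_device)
			dev->state = TOY_ONESHOT_STOPPED;
		else if (dev->truly_oneshot)
			break;          /* well-behaved hardware fires only once */
	}
	return dev->interrupts;
}
```

In this model, a device with truly_oneshot set to false delivers 10 interrupts over 10 periods when the handler just returns, and only 1 when the handler stops the device. A truly one-shot device delivers 1 interrupt either way, which is why the problem only shows up with some drivers.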
* Re: [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
2023-08-14 3:39 [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase Joel Fernandes (Google)
2023-08-14 3:39 ` [PATCH 5.10 2/3] tick: Detect and fix jiffies update stall Joel Fernandes (Google)
2023-08-14 3:39 ` [PATCH 5.10 3/3] timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped Joel Fernandes (Google)
@ 2023-08-27 8:04 ` Greg KH
2 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2023-08-27 8:04 UTC (permalink / raw)
To: Joel Fernandes (Google)
Cc: stable, Davidlohr Bueso, Paul E. McKenney, Josh Triplett
On Mon, Aug 14, 2023 at 03:39:31AM +0000, Joel Fernandes (Google) wrote:
> From: Joel Fernandes <joel@joelfernandes.org>
>
> [ Upstream commit d52d3a2bf408ff86f3a79560b5cce80efb340239 ]
> During shutdown of rcutorture, the shutdown thread in
> rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
> to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
> readers and fakewriters to break out of their main while loop and start
> shutting down.
>
> Once out of their main loop, they call torture_kthread_stopping(),
> which in turn waits for kthread_stop() to be called. However,
> rcu_torture_cleanup() has not yet called kthread_stop() on those
> threads; it does that a bit later. Before it gets a chance to do so,
> torture_kthread_stopping() calls schedule_timeout_uninterruptible(1)
> in a tight loop. Tracing confirmed that this makes the timer softirq
> constantly execute timer callbacks without ever returning to the
> softirq exit path, so it is essentially "locked up". If the softirq
> preempts the shutdown thread, kthread_stop() may never be called.
>
> This commit improves the situation dramatically by increasing the
> timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
> second. This keeps the timer softirq from locking up a CPU, and
> everything works fine. Testing has shown 100 runs of TREE07 passing
> reliably, which was not the case before because of RCU stalls.
>
> Cc: Paul McKenney <paulmck@kernel.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
> Cc: <stable@vger.kernel.org> # 6.0.x
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
> ---
> kernel/torture.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/torture.c b/kernel/torture.c
> index 1061492f14bd..477d9b601438 100644
> --- a/kernel/torture.c
> +++ b/kernel/torture.c
> @@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
>  	VERBOSE_TOROUT_STRING(buf);
>  	while (!kthread_should_stop()) {
>  		torture_shutdown_absorb(title);
> -		schedule_timeout_uninterruptible(1);
> +		schedule_timeout_uninterruptible(HZ/20);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(torture_kthread_stopping);
> --
> 2.41.0.640.ga95def55d0-goog
>
All now queued up, thanks.
greg k-h