public inbox for stable@vger.kernel.org
* [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
@ 2023-08-13  3:15 Joel Fernandes (Google)
From: Joel Fernandes (Google) @ 2023-08-13  3:15 UTC
  To: stable
  Cc: Guenter Roeck, Steven Rostedt, Joel Fernandes, Paul McKenney,
	Frederic Weisbecker, Zhouyi Zhou, Davidlohr Bueso

From: Joel Fernandes <joel@joelfernandes.org>

During shutdown of rcutorture, the shutdown thread in
rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
to FULLSTOP_RMMOD. This alone is enough to make the rcutorture reader
and fakewriter threads break out of their main while loop and start
shutting down.

Once out of their main loop, they call torture_kthread_stopping(),
which in turn waits for kthread_stop() to be called. However,
rcu_torture_cleanup() has not yet called kthread_stop() on those
threads; it does so a bit later. Until then, torture_kthread_stopping()
calls schedule_timeout_uninterruptible(1) in a tight loop. Tracing
confirmed that this makes the timer softirq constantly execute timer
callbacks without ever returning to the softirq exit path, so it is
essentially "locked up". If the softirq preempts the shutdown thread,
kthread_stop() may never be called.

This commit improves the situation dramatically by increasing the
timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
second (HZ/20 jiffies). This keeps the timer softirq from locking up a
CPU, and shutdown proceeds normally. In testing, 100 runs of TREE07
passed reliably, which was not the case before because of RCU stalls.

Cc: Paul McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
 kernel/torture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492f14bd..477d9b601438 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
 	VERBOSE_TOROUT_STRING(buf);
 	while (!kthread_should_stop()) {
 		torture_shutdown_absorb(title);
-		schedule_timeout_uninterruptible(1);
+		schedule_timeout_uninterruptible(HZ/20);
 	}
 }
 EXPORT_SYMBOL_GPL(torture_kthread_stopping);
-- 
2.41.0.640.ga95def55d0-goog


* [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase
@ 2023-08-14  3:39 Joel Fernandes (Google)
From: Joel Fernandes (Google) @ 2023-08-14  3:39 UTC
  To: stable, Davidlohr Bueso, Paul E. McKenney, Josh Triplett; +Cc: gregkh

From: Joel Fernandes <joel@joelfernandes.org>

[ Upstream commit d52d3a2bf408ff86f3a79560b5cce80efb340239 ]

During shutdown of rcutorture, the shutdown thread in
rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop
to FULLSTOP_RMMOD. This alone is enough to make the rcutorture reader
and fakewriter threads break out of their main while loop and start
shutting down.

Once out of their main loop, they call torture_kthread_stopping(),
which in turn waits for kthread_stop() to be called. However,
rcu_torture_cleanup() has not yet called kthread_stop() on those
threads; it does so a bit later. Until then, torture_kthread_stopping()
calls schedule_timeout_uninterruptible(1) in a tight loop. Tracing
confirmed that this makes the timer softirq constantly execute timer
callbacks without ever returning to the softirq exit path, so it is
essentially "locked up". If the softirq preempts the shutdown thread,
kthread_stop() may never be called.

This commit improves the situation dramatically by increasing the
timeout passed to schedule_timeout_uninterruptible() to 1/20th of a
second (HZ/20 jiffies). This keeps the timer softirq from locking up a
CPU, and shutdown proceeds normally. In testing, 100 runs of TREE07
passed reliably, which was not the case before because of RCU stalls.

Cc: Paul McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
 kernel/torture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492f14bd..477d9b601438 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
 	VERBOSE_TOROUT_STRING(buf);
 	while (!kthread_should_stop()) {
 		torture_shutdown_absorb(title);
-		schedule_timeout_uninterruptible(1);
+		schedule_timeout_uninterruptible(HZ/20);
 	}
 }
 EXPORT_SYMBOL_GPL(torture_kthread_stopping);
-- 
2.41.0.640.ga95def55d0-goog




Thread overview: 8+ messages
2023-08-13  3:15 [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase Joel Fernandes (Google)
2023-08-13  3:15 ` [PATCH 5.10 2/3] tick: Detect and fix jiffies update stall Joel Fernandes (Google)
2023-08-13  3:15 ` [PATCH 5.10 3/3] timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped Joel Fernandes (Google)
2023-08-13 16:34 ` [PATCH 5.10 1/3] torture: Fix hang during kthread shutdown phase Greg KH
2023-08-13 20:24   ` Joel Fernandes
2023-08-13 20:39     ` Greg KH
  -- strict thread matches above, loose matches on Subject: below --
2023-08-14  3:39 Joel Fernandes (Google)
2023-08-27  8:04 ` Greg KH
