From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
linux-kernel@vger.kernel.org,
Frederic Weisbecker <fweisbec@gmail.com>,
Zhouyi Zhou <zhouzhouyi@gmail.com>,
stable@vger.kernel.org, Josh Triplett <josh@joshtriplett.org>,
rcu@vger.kernel.org
Subject: Re: [PATCH] torture: Fix hang during kthread shutdown phase
Date: Tue, 3 Jan 2023 18:24:23 +0000 [thread overview]
Message-ID: <Y7Ry1yTT/mltqSUI@google.com> (raw)
In-Reply-To: <20230103180404.GA4028633@paulmck-ThinkPad-P17-Gen-1>
On Tue, Jan 03, 2023 at 10:04:04AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 02, 2023 at 08:43:10AM -0800, Davidlohr Bueso wrote:
> > On Sun, 01 Jan 2023, Joel Fernandes (Google) wrote:
> >
> > > During shutdown of rcutorture, the shutdown thread in
> > > rcu_torture_cleanup() calls torture_cleanup_begin() which sets fullstop
> > > to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
> > > readers and fakewriters to breakout of their main while loop and start
> > > shutting down.
> > >
> > > Once out of their main loop, they then call torture_kthread_stopping()
> > > which in turn waits for kthread_stop() to be called, however
> > > rcu_torture_cleanup() has not even called kthread_stop() on those
> > > threads yet, it does that a bit later. However, before it gets a chance
> > > to do so, torture_kthread_stopping() calls
> > > schedule_timeout_interruptible(1) in a tight loop. Tracing confirmed
> > > this makes the timer softirq constantly execute timer callbacks, while
> > > never returning back to the softirq exit path and is essentially "locked
> > > up" because of that. If the softirq preempts the shutdown thread,
> > > kthread_stop() may never be called.
> > >
> > > This commit improves the situation dramatically, by increasing timeout
> > > passed to schedule_timeout_interruptible() 1/20th of a second. This
> > > causes the timer softirq to not lock up a CPU and everything works fine.
> > > Testing has shown 100 runs of TREE07 passing reliably, which was not the
> > > case before because of RCU stalls.
> > >
> > > Cc: Paul McKenney <paulmck@kernel.org>
> > > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
> > > Cc: <stable@vger.kernel.org> # 6.0.x
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> > Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>
> Queued for further review and testing, thank you all!
>
> One thing still puzzles me. Assuming HZ=1000 and given 16 CPUs, each
> timer hander must have consumed many tens of microseconds in order
> to keep the system busy, which seems a bit longer than it should be.
> Or am I underestimating the number of tasks involved?
Here are the traces between successive calls to process_timeout() which is the timer callback handler:
[ 1320.444210] <idle>-0 0dNs.. 314229620us : __run_timers: Calling timerfn 5: process_timeout
[ 1320.444215] <idle>-0 0dNs.. 314229620us : sched_waking: comm=rcu_torture_fak pid=145 prio=139 target_cpu=008
[ 1320.463393] <idle>-0 7d.... 314229655us : sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_torture_wri next_pid=144 next_prio=120
[ 1320.478870] rcu_tort-144 7d.... 314229658us : sched_switch: prev_comm=rcu_torture_wri prev_pid=144 prev_prio=120 prev_state=D ==> next_comm=swapper/7 next_pid=0 next_prio=120
[ 1320.494324] <idle>-0 0dNs.. 314229738us : __run_timers: Calling timerfn 6: process_timeout
It appears the time delta in the above occurrence is 118 micro seconds
between 2 timer callbacks. It does appear to be doing a cross-CPU wake up.
Maybe that adds to the long time?
Here are the full logs with traces (in case it helps, search for "=D" for the
D-state sched_switch event before the "panic now" trace happens):
http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-6.0.y/26/artifact/tools/testing/selftests/rcutorture/res/2022.12.31-23.04.42/TREE07.2/console.log
thanks,
- Joel
next prev parent reply other threads:[~2023-01-03 18:24 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-01 6:15 [PATCH] torture: Fix hang during kthread shutdown phase Joel Fernandes (Google)
2023-01-01 6:20 ` Joel Fernandes
2023-01-01 8:48 ` Greg KH
2023-01-01 14:15 ` Joel Fernandes
2023-01-01 13:02 ` Zhouyi Zhou
2023-01-01 14:16 ` Joel Fernandes
2023-01-02 16:43 ` Davidlohr Bueso
2023-01-03 18:04 ` Paul E. McKenney
2023-01-03 18:24 ` Joel Fernandes [this message]
2023-01-03 18:59 ` Paul E. McKenney
2023-01-03 19:51 ` Joel Fernandes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y7Ry1yTT/mltqSUI@google.com \
--to=joel@joelfernandes.org \
--cc=dave@stgolabs.net \
--cc=fweisbec@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=zhouzhouyi@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.