From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: RCU stall
Date: Tue, 22 Mar 2016 19:29:02 -0700
Message-ID: <20160323022902.GA28227@linux.vnet.ibm.com>
In-Reply-To: <20160323015932.GX4287@linux.vnet.ibm.com>

On Tue, Mar 22, 2016 at 06:59:32PM -0700, Paul E. McKenney wrote:
> On Tue, Mar 22, 2016 at 04:53:26PM -0700, Bart Van Assche wrote:
> > On 03/22/2016 01:45 PM, Paul E. McKenney wrote:
> > >You are getting a soft lockup as well as an RCU CPU stall warning, so
> > >it looks like something is taking a very long time in blk_done_softirq().
> > >
> > >You have multiple occurrences at different times, so it looks to be
> > >a long time as opposed to an infinite time.  Are you perhaps doing
> > >something that would make a huge amount of work for blk_done_softirq()?
> > >
> > >See Documentation/RCU/stallwarn.txt in the kernel source tree for more
> > >info on how to debug this sort of thing.
> > 
> > Hello Paul,
> > 
> > None of the drivers involved in the test I ran contain RCU code that
> > has been changed recently. The block and SCSI subsystems process
> > I/O completions in softirq context but until last week I hadn't seen
> > any RCU lockup complaints when I ran an SRP test against a kernel
> > with lockdep and several other kernel debugging options enabled.
> > This is why I sent an e-mail to you. I have read
> > Documentation/RCU/stallwarn.txt after I received your reply but this
> > didn't provide me any clue about where to look for the root cause.
> > Any further help would be appreciated.
> 
> My suggestion would be to check the block/SCSI softirq handler for
> event traces.  If there are some, enable them and see what the loop
> is doing.  Documentation/trace/ftrace.txt describes how to enable
> existing event tracing.
> 
> If there is no event tracing, consider adding some in your local
> view.  Failing that, there is always printk().  ;-)
> 
> Or perhaps you have some sort of debug setup.
> 
> Either way, the next step is to work out why that CPU is spending
> so much time in that loop.
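
As a rough illustration of the "enable existing event tracing" step
quoted above, here is a minimal userspace sketch in C.  It assumes
that tracefs is reachable at /sys/kernel/debug/tracing and that the
kernel exposes a "block" event subsystem there; the helper names
write_str() and set_trace_events() are made up for this example, and
it needs to run as root to touch the real files.

```c
#include <stdio.h>

/* Write a short string to a file.  Returns 0 on success, -1 on failure. */
int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fputs(val, f) >= 0);
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Enable (on != 0) or disable (on == 0) all trace events of one
 * subsystem by writing to <tracing_dir>/events/<subsys>/enable.
 * The tracing_dir location is an assumption about the local setup.
 */
int set_trace_events(const char *tracing_dir, const char *subsys, int on)
{
	char path[512];
	int n = snprintf(path, sizeof(path), "%s/events/%s/enable",
			 tracing_dir, subsys);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit in the buffer */
	return write_str(path, on ? "1" : "0");
}
```

Something like set_trace_events("/sys/kernel/debug/tracing", "block", 1)
before the test run, followed by reading the "trace" file in the same
directory after the stall, would be one way to see what the handler
was doing.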

And the dmesg leading up to the stall might have some clues.

Note that a soft lockup triggered at 10509.568010, well before the RCU
CPU stall warning.  And you have a second soft lockup at 10537.567212,
with the same function scsi_request_fn() at the top of the stack in both
stack traces.  That function has a nice big "for (;;)" loop that does
not appear to have any iteration-limiting mechanism.  (Though perhaps
there is such a mechanism implemented in one of the called functions,
but that would be something for you to look into.)  As you saw when
reading stallwarn.txt, having a too-long loop in the kernel is a good
way to get RCU CPU stall warnings.
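
To make "iteration-limiting mechanism" concrete, here is a small
userspace sketch in C.  It is not the code of scsi_request_fn(); the
struct work_queue type, the drain_bounded() function, and the budget
parameter are all hypothetical and only show the general shape of a
"for (;;)" loop that gives up after a fixed amount of work.

```c
#include <stddef.h>

/* Hypothetical queue: only tracks how many requests are pending. */
struct work_queue {
	size_t pending;
};

/*
 * Handle at most `budget` requests per call and return how many
 * were handled.  Without the budget check, the loop would run
 * until the queue is empty, however long that takes.
 */
size_t drain_bounded(struct work_queue *q, size_t budget)
{
	size_t done = 0;

	for (;;) {
		if (q->pending == 0)
			break;		/* nothing left to do */
		if (done >= budget)
			break;		/* budget spent: let the caller defer the rest */
		q->pending--;		/* stand-in for handling one request */
		done++;
	}
	return done;
}
```

When a call returns with q->pending still nonzero, the caller would
be expected to arrange for the remaining work to run later instead
of looping again right away.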

Also, before the soft lockups, you have a bunch of FAIL indications
and other nasty-looking error messages.  Might you have some sort of
configuration or hardware problem?

							Thanx, Paul

