From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: rcu_prempt stalls / lockup
Date: Mon, 31 Mar 2014 17:48:01 -0700
Message-ID: <20140401004801.GQ4284@linux.vnet.ibm.com>
In-Reply-To: <20140331233552.GB30019@redhat.com>

On Mon, Mar 31, 2014 at 07:35:52PM -0400, Dave Jones wrote:
> On Mon, Mar 31, 2014 at 04:22:21PM -0700, Paul E. McKenney wrote:
>  > On Mon, Mar 31, 2014 at 07:02:41PM -0400, Dave Jones wrote:
>  > > You can tell the merge window is open, because I'm back to breaking RCU.
>  > > 
>  > > ... 
>  > > [ 3558.120739] INFO: Stall ended before state dump start
>  > > 
>  > > at that point, userspace stopped responding. cursor on console was blinking,
>  > > but I couldn't even switch tty's, or sysrq dump.

Hmmm...  I am having a very hard time imagining any of this merge
window's RCU changes preventing a sysrq dump.  On the other hand,
having a single grace period persist without anything blocking it
is pretty strange as well.

I would hope that the sysrq path does not allocate memory, but who knows?
After all, one possible reason for the eventual hang is memory exhaustion.
So one thing to try is to do sysrq earlier in the process.  (Yeah,
I know, tough to do if you have lots of scripted systems.)

>  > > rc8 was fine, so this is todays rcu changes.
>  > 
>  > New one on me!  Any chance of a .config file?
> 
> http://paste.fedoraproject.org/90449/30888213/raw/

Given that you have CONFIG_RCU_NOCB_CPU_ALL=y, all the grace-period
activity is being driven by the grace-period kthreads ("rcu_preempt"
in this case).  This leads me to wonder if your workload is preventing
RCU's grace-period kthreads from running.  These kthreads are SCHED_OTHER,
so they could potentially be preempted for a long time.  But I would expect
a softlockup message in that case.

Alternatively, I suppose a wakeup could be getting lost.  The main change
in this merge window related to wakeups was ffa83fb565fb, which eliminated
idle wakeups from RCU in the CONFIG_RCU_NOCB_CPU_ALL=y case.

So, could you please try reverting ffa83fb565fb?

If that doesn't work, I will need to put together some diagnostic patches,
starting with the one below.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0c47e300210a..c5a163378710 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -936,7 +936,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	       smp_processor_id(), (long)(jiffies - rsp->gp_start),
 	       rsp->gpnum, rsp->completed, totqlen);
 	if (ndetected == 0)
-		pr_err("INFO: Stall ended before state dump start\n");
+		pr_err("INFO: Stall ended before state dump start, gp_kthread state: %#lx\n", rsp->gp_kthread->state);
 	else if (!trigger_all_cpu_backtrace())
 		rcu_dump_cpu_stacks(rsp);
 

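For reading the value that the patch above prints, here is a rough
userspace sketch that turns a raw task-state word into names.  The bit
values are assumptions taken from include/linux/sched.h of roughly this
era, so please check them against the tree you are actually running.
The helper name describe_task_state() is made up for this illustration
and is not a kernel function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Decode a task->state value as printed by the diagnostic patch.
 * ASSUMPTION: bit values match include/linux/sched.h of that era
 * (TASK_RUNNING is 0, the rest are single bits).  Verify against your tree.
 * Bits not listed here are reported as a raw hex remainder.
 */
static const char *describe_task_state(unsigned long state, char *buf, size_t len)
{
	static const struct {
		unsigned long bit;
		const char *name;
	} names[] = {
		{ 0x1, "TASK_INTERRUPTIBLE" },
		{ 0x2, "TASK_UNINTERRUPTIBLE" },
		{ 0x4, "__TASK_STOPPED" },
		{ 0x8, "__TASK_TRACED" },
	};
	unsigned long known = 0;
	size_t i, used;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	/* Zero means runnable: either on a CPU or waiting for one. */
	if (state == 0) {
		snprintf(buf, len, "TASK_RUNNING");
		return buf;
	}

	/* Append the name of each known bit that is set, '|'-separated. */
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		known |= names[i].bit;
		if (state & names[i].bit) {
			used = strlen(buf);
			snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", names[i].name);
		}
	}

	/* Anything left over is printed in hex rather than guessed at. */
	if (state & ~known) {
		used = strlen(buf);
		snprintf(buf + used, len - used, "%s%#lx",
			 used ? "|" : "", state & ~known);
	}
	return buf;
}
```

Under those assumptions, a printed state of 0 would suggest that the
grace-period kthread is runnable but not getting a CPU (the preemption
theory), while 0x1 would suggest that it is sleeping and waiting for a
wakeup (the lost-wakeup theory).  This is only a reading aid, and the
two cases are hints rather than proof.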
