All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Jones <davej@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: RCU stalls -> lockup.
Date: Thu, 9 Oct 2014 16:09:55 -0400	[thread overview]
Message-ID: <20141009200955.GA15094@redhat.com> (raw)
In-Reply-To: <20141005021556.GC8549@htj.dyndns.org>

On Sat, Oct 04, 2014 at 10:15:56PM -0400, Tejun Heo wrote:
 > On Thu, Oct 02, 2014 at 12:36:55PM -0700, Paul E. McKenney wrote:
 > > On Thu, Oct 02, 2014 at 01:55:15PM -0400, Dave Jones wrote:
 > > > I just hit this on my box running 3.17rc7
 > > > It was followed by a userspace lockup. (Could still ping, and sysrq
 > > > from the console, but even getty wasn't responding on the console).
 > > > 
 > > > I was trying to reproduce another bug faster, and had ramped up the
 > > > number of processes trinity to uses to 512. This didn't take long
 > > > to fall out..
 > > 
 > > This might be related to an exchange I had with Tejun (CCed), where
 > > the work queues were running all out, preventing any quiescent states
 > > from happening.  One fix under consideration is to add a quiescent state,
 > > similar to the one in softirq handling.
 > 
 > Dave, can you please test whether the following patch makes a
 > difference if the problem is reproducible?
 > 
 >   http://lkml.kernel.org/r/20141003153701.7c7da030@jlaw-desktop.mno.stratus.com

The only rcu related stuff I'm seeing now is the spew below, and unlike
the issue before the above patch, it does seem to recover at least..

	Dave

INFO: rcu_preempt detected stalls on CPUs/tasks:
	Tasks blocked on level-0 rcu_node (CPUs 0-3): P5890 P6169 P6164
	Tasks blocked on level-0 rcu_node (CPUs 0-3): P5890 P6169 P6164
	(detected by 0, t=6502 jiffies, g=51433, c=51432, q=0)
trinity-c393    R  running task    12808  5890   5008 0x00000000
 ffff880235b6bd08 0000000000000002 00000000001d8088 ffff88019c1ac680
 00000000001d4080 0000000000000002 ffff880235b6bfd8 00000000001d4080
 ffff880011e80000 ffff88019c1ac680 ffff880235b6bfd8 0000000000000000
Call Trace:
 [<ffffffff9e189150>] ? perf_event_comm_output+0x1e0/0x1e0
 [<ffffffff9e82b4e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff9e832a50>] retint_kernel+0x20/0x30
 [<ffffffff9e19011e>] ? perf_event_mmap+0x24e/0x370
 [<ffffffff9e18af84>] ? perf_event_aux+0xe4/0x380
 [<ffffffff9e18af9f>] ? perf_event_aux+0xff/0x380
 [<ffffffff9e18aea0>] ? perf_cpu_notify+0x50/0x50
 [<ffffffff9e19011e>] perf_event_mmap+0x24e/0x370
 [<ffffffff9e1d111d>] do_brk+0x24d/0x350
 [<ffffffff9e1d13ee>] SyS_brk+0x14e/0x170
 [<ffffffff9e831ee4>] tracesys+0xdd/0xe2
trinity-c375    R  running task    14696  6169   5872 0x00000000
 ffff8801cfd17e58 0000000000000002 ffff8801a7319780 ffff8801a7319780
 00000000001d4080 0000000000000000 ffff8801cfd17fd8 00000000001d4080
 ffff88008b49af00 ffff8801a7319780 ffff8801cfd17fd8 0000000000000000
Call Trace:
 [<ffffffff9e82b4e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff9e832a50>] retint_kernel+0x20/0x30
 [<ffffffff9e09c82d>] ? __task_pid_nr_ns+0x10d/0x1b0
 [<ffffffff9e0e9c74>] ? rcu_is_watching+0x34/0x60
 [<ffffffff9e09c7b3>] ? __task_pid_nr_ns+0x93/0x1b0
 [<ffffffff9e09c82d>] ? __task_pid_nr_ns+0x10d/0x1b0
 [<ffffffff9e09c725>] ? __task_pid_nr_ns+0x5/0x1b0
 [<ffffffff9e0ad17e>] schedule_tail+0x5e/0xb0
 [<ffffffff9e831bbf>] ret_from_fork+0xf/0xb0
trinity-c377    R  running task    14632  6164   5874 0x00000000
 ffff88000c397df8 0000000000000002 0000000000000002 ffff880066d59780
 00000000001d4080 0000000000000000 ffff88000c397fd8 00000000001d4080
 ffff88008b49af00 ffff880066d59780 ffff88000c397fd8 0000000000000000
Call Trace:
 [<ffffffff9e82b4e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff9e832a50>] retint_kernel+0x20/0x30
 [<ffffffff9e0d496d>] ? lock_acquire+0x9d/0x1b0
 [<ffffffff9e09c725>] ? __task_pid_nr_ns+0x5/0x1b0
 [<ffffffff9e09c763>] __task_pid_nr_ns+0x43/0x1b0
 [<ffffffff9e09c725>] ? __task_pid_nr_ns+0x5/0x1b0
 [<ffffffff9e0ad17e>] schedule_tail+0x5e/0xb0
 [<ffffffff9e831bbf>] ret_from_fork+0xf/0xb0
trinity-c393    R  running task    12808  5890   5008 0x00000000
 ffff880235b6bd08 0000000000000002 00000000001d8088 ffff88019c1ac680
 00000000001d4080 0000000000000002 ffff880235b6bfd8 00000000001d4080
 ffff880011e80000 ffff88019c1ac680 ffff880235b6bfd8 0000000000000000
Call Trace:
 [<ffffffff9e189150>] ? perf_event_comm_output+0x1e0/0x1e0
 [<ffffffff9e82b4e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff9e832a50>] retint_kernel+0x20/0x30
 [<ffffffff9e19011e>] ? perf_event_mmap+0x24e/0x370
 [<ffffffff9e18af84>] ? perf_event_aux+0xe4/0x380
 [<ffffffff9e18af9f>] ? perf_event_aux+0xff/0x380
 [<ffffffff9e18aea0>] ? perf_cpu_notify+0x50/0x50
 [<ffffffff9e19011e>] perf_event_mmap+0x24e/0x370
 [<ffffffff9e1d111d>] do_brk+0x24d/0x350
 [<ffffffff9e1d13ee>] SyS_brk+0x14e/0x170
 [<ffffffff9e831ee4>] tracesys+0xdd/0xe2
trinity-c375    R  running task    14696  6169   5872 0x00000000
 ffff8801cfd17e58 0000000000000002 ffff8801a7319780 ffff8801a7319780
 00000000001d4080 0000000000000000 ffff8801cfd17fd8 00000000001d4080
 ffff88008b49af00 ffff8801a7319780 ffff8801cfd17fd8 0000000000000000
Call Trace:
 [<ffffffff9e82b4e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff9e832a50>] retint_kernel+0x20/0x30
 [<ffffffff9e09c82d>] ? __task_pid_nr_ns+0x10d/0x1b0
 [<ffffffff9e0e9c74>] ? rcu_is_watching+0x34/0x60
 [<ffffffff9e09c7b3>] ? __task_pid_nr_ns+0x93/0x1b0
 [<ffffffff9e09c82d>] ? __task_pid_nr_ns+0x10d/0x1b0
 [<ffffffff9e09c725>] ? __task_pid_nr_ns+0x5/0x1b0
 [<ffffffff9e0ad17e>] schedule_tail+0x5e/0xb0
 [<ffffffff9e831bbf>] ret_from_fork+0xf/0xb0
trinity-c377    R  running task    14632  6164   5874 0x00000000
 ffff88000c397df8 0000000000000002 0000000000000002 ffff880066d59780
 00000000001d4080 0000000000000000 ffff88000c397fd8 00000000001d4080
 ffff88008b49af00 ffff880066d59780 ffff88000c397fd8 0000000000000000
Call Trace:
 [<ffffffff9e82b4e2>] preempt_schedule_irq+0x52/0xb0
 [<ffffffff9e832a50>] retint_kernel+0x20/0x30
 [<ffffffff9e0d496d>] ? lock_acquire+0x9d/0x1b0
 [<ffffffff9e09c725>] ? __task_pid_nr_ns+0x5/0x1b0
 [<ffffffff9e09c763>] __task_pid_nr_ns+0x43/0x1b0
 [<ffffffff9e09c725>] ? __task_pid_nr_ns+0x5/0x1b0
 [<ffffffff9e0ad17e>] schedule_tail+0x5e/0xb0
 [<ffffffff9e831bbf>] ret_from_fork+0xf/0xb0
INFO: rcu_preempt detected stalls on CPUs/tasks:
	Tasks blocked on level-0 rcu_node (CPUs 0-3):
	Tasks blocked on level-0 rcu_node (CPUs 0-3):
	(detected by 0, t=26007 jiffies, g=51433, c=51432, q=0)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks:
	Tasks blocked on level-0 rcu_node (CPUs 0-3):
	Tasks blocked on level-0 rcu_node (CPUs 0-3):
	(detected by 0, t=45512 jiffies, g=51433, c=51432, q=0)
INFO: Stall ended before state dump start
INFO: rcu_preempt detected stalls on CPUs/tasks:
	Tasks blocked on level-0 rcu_node (CPUs 0-3):
	Tasks blocked on level-0 rcu_node (CPUs 0-3):
	(detected by 0, t=65017 jiffies, g=51433, c=51432, q=0)
INFO: Stall ended before state dump start


      parent reply	other threads:[~2014-10-09 20:10 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-02 17:55 RCU stalls -> lockup Dave Jones
2014-10-02 19:36 ` Paul E. McKenney
2014-10-05  2:15   ` Tejun Heo
2014-10-07  1:27     ` Dave Jones
2014-10-09 20:09     ` Dave Jones [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141009200955.GA15094@redhat.com \
    --to=davej@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.