Re: spinlock lockup, rcu stalls etc.

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: spinlock lockup, rcu stalls etc.
Date: Wed, 24 Jul 2013 07:13:34 -0700	[thread overview]
Message-ID: <20130724141334.GA3668@linux.vnet.ibm.com> (raw)
In-Reply-To: <CA+55aFw9Vo_vugB63kbjh6=jPDoJZP47Nw+hEgMCxFPiExYn5w@mail.gmail.com>

On Tue, Jul 23, 2013 at 07:24:31PM -0700, Linus Torvalds wrote:
> [ Added Thomas and Ingo due to that timer/nmi thing. ]
> 
> On Tue, Jul 23, 2013 at 9:28 AM, Dave Jones <davej@redhat.com> wrote:
> > Woke up to a box that I could log into, but would hang as soon as I tried to
> > do any disk IO.  This is what hit the logs before that.
> >
> > [28853.503179] hrtimer: interrupt took 4847 ns
> 
> Ugh. There's some nasty timer congestion there..
> 
> > [28932.599607] BUG: spinlock lockup suspected on CPU#0, trinity-child2/6990
> > [28932.600419]  lock: inode_sb_list_lock+0x0/0x80, .magic: dead4ead, .owner: trinity-child1/6763, .owner_cpu: 1
> 
> So the current owner of the lock is cpu 1. The other CPU's agree:
> 
> > [28932.606666] BUG: spinlock lockup suspected on CPU#2, trinity-child2/6764
> > [28932.606669]  lock: inode_sb_list_lock+0x0/0x80, .magic: dead4ead, .owner: trinity-child1/6763, .owner_cpu: 1
> > [28932.617231] sending NMI to all CPUs:
> > [28932.635092] BUG: spinlock lockup suspected on CPU#3, trinity-child3/6975
> > [28932.635095]  lock: inode_sb_list_lock+0x0/0x80, .magic: dead4ead, .owner: trinity-child1/6763, .owner_cpu: 1
> 
> .. and their backtrace all points to them trying to take that
> spinlock. So that is all consistent.
> 
> And here's cpu1, edited down a bit:
> 
> > [28932.777623] NMI backtrace for cpu 1
> > [28932.777625] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 91.230 msecs
> 
> Whee. 91 msec? We have something going on there too. irq entry locks?

Not sure what the deal is with the other stuff, but the stacks are being
dumped from NMI handlers, and maybe the console got backed up a bit,
slowing down the dump.  -Might- account for the 91msec in an NMI handler.

							Thanx, Paul

> > [28932.779440] CPU: 1 PID: 6763 Comm: trinity-child1 Not tainted 3.11.0-rc2+ #54
> > [28932.782283] RIP: 0010:[<ffffffff81710385>]  [<ffffffff81710385>] add_preempt_count+0x25/0xf0
> > [28932.797761] Call Trace:
> > [28932.798737]  <IRQ>
> > [28932.799715]  [<ffffffff81320851>] delay_tsc+0x61/0xe0
> > [28932.800693]  [<ffffffff81320789>] __const_udelay+0x29/0x30
> > [28932.801674]  [<ffffffff810816d4>] __rcu_read_unlock+0x54/0xa0
> > [28932.802657]  [<ffffffff810af531>] cpuacct_account_field+0xf1/0x200
> > [28932.804613]  [<ffffffff8109e430>] account_system_time+0xb0/0x1b0
> > [28932.805561]  [<ffffffff8109e565>] __vtime_account_system+0x35/0x40
> > [28932.806506]  [<ffffffff8109e59d>] vtime_account_system.part.2+0x2d/0x50
> > [28932.807445]  [<ffffffff8109e8e5>] vtime_account_irq_enter+0x55/0x80
> > [28932.808365]  [<ffffffff8105f42f>] irq_enter+0x4f/0x90
> > [28932.809269]  [<ffffffff81716a15>] smp_apic_timer_interrupt+0x35/0x60
> > [28932.810156]  [<ffffffff817156af>] apic_timer_interrupt+0x6f/0x80
> > [28932.812756]  [<ffffffff8105f53d>] irq_exit+0xcd/0xe0
> > [28932.813618]  [<ffffffff81716a25>] smp_apic_timer_interrupt+0x45/0x60
> > [28932.814483]  [<ffffffff817156af>] apic_timer_interrupt+0x6f/0x80
> > [28932.815345]  <EOI>
> > [28932.816207]  [<ffffffff8170c6a0>] ? retint_restore_args+0xe/0xe
> > [28932.817069]  [<ffffffff810c1275>] ? lock_acquired+0x105/0x3f0
> > [28932.817924]  [<ffffffff811ec8e2>] ? sync_inodes_sb+0x1c2/0x2a0
> > [28932.818767]  [<ffffffff8170b62c>] _raw_spin_lock+0x6c/0x80
> > [28932.819616]  [<ffffffff811ec8e2>] ? sync_inodes_sb+0x1c2/0x2a0
> > [28932.820468]  [<ffffffff811ec8e2>] sync_inodes_sb+0x1c2/0x2a0
> > [28932.821310]  [<ffffffff81708c1f>] ? wait_for_completion+0xdf/0x110
> > [28932.823819]  [<ffffffff811f2a49>] sync_inodes_one_sb+0x19/0x20
> > [28932.824649]  [<ffffffff811c2b12>] iterate_supers+0xb2/0x110
> > [28932.825477]  [<ffffffff811f2cb5>] sys_sync+0x35/0x90
> > [28932.826300]  [<ffffffff81714c54>] tracesys+0xdd/0xe2
> > [28932.827119]  [<ffffffffa0000000>] ? 0xffffffff9fffffff
> 
> .. and again, it actually looks like the time is not necessarily spent
> inside the spinlock in sync_inodes_sb(), but in a timer interrupt that
> just happened to go off during that. I wonder if this is the same
> issue that caused that earlier hrtimer long delay.. I'm not
> necessarily seeing 91 msecs worth, but..
> 
> You seem to have CONFIG_PROVE_RCU_DELAY enabled, which explains that
> delay_tsc() call in there. I wonder how much things like that make it
> worse. Together with the (crazy bad) back-off in __spin_lock_debug(),
> there might be a *lot* of these delays.
> 
> That said, the fact that your machine is dead after all this implies
> that there is something else wrong than just things being very slow.
> But I suspect at least *part* of your problems may be due to these
> kinds of debugging options that make things much much worse.
> 
>               Linus
>

next prev parent reply	other threads:[~2013-07-24 14:13 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-23 16:28 spinlock lockup, rcu stalls etc Dave Jones
2013-07-24  2:24 ` Linus Torvalds
2013-07-24 14:13   ` Paul E. McKenney [this message]
2013-07-24 12:33 ` Paul E. McKenney
2013-07-24 23:01   ` Paul E. McKenney
2013-07-25 12:42     ` Josh Boyer
2013-07-25 13:17       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130724141334.GA3668@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=davej@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).