Re: futex funkiness -- massive lockups

From: Ingo Molnar <mingo@kernel.org>
To: Davidlohr Bueso <davidlohr@hp.com>
Cc: tglx@linutronix.de, dvhart@linux.intel.com, peterz@infradead.org,
	paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: futex funkiness -- massive lockups
Date: Wed, 5 Mar 2014 10:01:13 +0100
Message-ID: <20140305090113.GE2705@gmail.com>
In-Reply-To: <1393983784.2512.40.camel@buesod1.americas.hpqcorp.net>


* Davidlohr Bueso <davidlohr@hp.com> wrote:

> Hi,
> 
> A large amount of lockups are seen on a 480 core system doing some sort
> of database-like workload. All except one are soft lockups. This is a
> SLES11 system with most of the recent futex changes backported,
> including commits 63b1a816, b0c29f79, 99b60ce6, a52b89eb, 0d00c7b2,
> 5cdec2d8 and f12d5bfc.
> 
> The following are some traces I put together in chronological order from
> the report I received. While the traces aren't perfect, I believe it
> exemplifies the issue pretty well. There are a lot more, but just of the
> same.
> 
> [212046.044098] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 22
> [212046.044098] Pid: 312554, comm: XXX Tainted: GF     D W  N  3.0.101-0.15-default #1
> [212046.044098] Call Trace:
> [212046.044098]  [<ffffffff81004935>] dump_trace+0x75/0x310
> [212046.044098]  [<ffffffff8145e0b3>] dump_stack+0x69/0x6f
> [212046.044098]  [<ffffffff8145e14c>] panic+0x93/0x201
> [212046.044098]  [<ffffffff810c65e4>] watchdog_overflow_callback+0xb4/0xc0
> [212046.044098]  [<ffffffff810f2d9a>] __perf_event_overflow+0xaa/0x230
> [212046.044098]  [<ffffffff81018210>] intel_pmu_handle_irq+0x1a0/0x330
> [212046.044098]  [<ffffffff81462ae1>] perf_event_nmi_handler+0x31/0xa0
> [212046.044098]  [<ffffffff81464c37>] notifier_call_chain+0x37/0x70
> [212046.044098]  [<ffffffff81464c7d>] __atomic_notifier_call_chain+0xd/0x20
> [212046.044098]  [<ffffffff81464ccd>] notify_die+0x2d/0x40
> [212046.044098]  [<ffffffff81462127>] default_do_nmi+0x37/0x200
> [212046.044098]  [<ffffffff81462358>] do_nmi+0x68/0x80
> [212046.044098]  [<ffffffff814618ad>] restart_nmi+0x1a/0x1e

Is this the end of the traceback, i.e. does the first anomalous lockup 
show that the NMI interrupted user-space mode? If yes then that's 
highly unusual.

The 'GF D W' taint also suggests that there was something going on 
before this triggered: 'W' suggests that something warned before, 'D' 
suggests something died anomalously before and 'F' suggests a forced 
or unsigned module.

So even the earliest traces look like after-effects.
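
To make the taint reading above easier to reproduce, here is a minimal 
sketch that splits a taint string from an oops header into its letters. 
The meanings of 'F', 'D' and 'W' are the ones given above; 'G' (no 
proprietary modules loaded) is taken from the kernel's tainted-kernels 
documentation. The function name and table are illustrative only, and 
any other letter (such as the 'N' in this report) is deliberately left 
undecoded rather than guessed at:

```python
# Illustrative helper, not part of any kernel tooling.
# 'F', 'D', 'W' follow the descriptions in the message above; 'G' is the
# standard "no proprietary modules" flag. Everything else is left undecoded.
TAINT_MEANINGS = {
    "G": "no proprietary modules loaded",
    "F": "a module was forced or unsigned",
    "D": "something died anomalously earlier",
    "W": "something warned earlier",
}


def decode_taint(flags):
    """Return (letter, meaning) pairs for each non-blank taint letter."""
    decoded = []
    for letter in flags:
        if letter.isspace():
            continue  # blank positions mean "flag not set"
        meaning = TAINT_MEANINGS.get(letter, "not covered by this sketch")
        decoded.append((letter, meaning))
    return decoded


if __name__ == "__main__":
    # Taint field copied from the quoted "Pid:" line.
    for letter, meaning in decode_taint("GF     D W  N"):
        print(f"{letter}: {meaning}")
```

Any 'D' or 'W' in the output means an earlier oops or warning should 
exist in the log, and that is the trace worth looking at first.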

Thanks,

	Ingo
