From: Ingo Molnar <mingo@kernel.org>
To: Davidlohr Bueso <davidlohr@hp.com>
Cc: tglx@linutronix.de, dvhart@linux.intel.com, peterz@infradead.org,
paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: futex funkiness -- massive lockups
Date: Wed, 5 Mar 2014 10:01:13 +0100
Message-ID: <20140305090113.GE2705@gmail.com>
In-Reply-To: <1393983784.2512.40.camel@buesod1.americas.hpqcorp.net>

* Davidlohr Bueso <davidlohr@hp.com> wrote:
> Hi,
>
> A large amount of lockups are seen on a 480 core system doing some sort
> of database-like workload. All except one are soft lockups. This is a
> SLES11 system with most of the recent futex changes backported,
> including commits 63b1a816, b0c29f79, 99b60ce6, a52b89eb, 0d00c7b2,
> 5cdec2d8 and f12d5bfc.
>
> The following are some traces I put together in chronological order from
> the report I received. While the traces aren't perfect, I believe it
> exemplifies the issue pretty well. There are a lot more, but just of the
> same.
>
> [212046.044098] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 22
> [212046.044098] Pid: 312554, comm: XXX Tainted: GF D W N 3.0.101-0.15-default #1
> [212046.044098] Call Trace:
> [212046.044098] [<ffffffff81004935>] dump_trace+0x75/0x310
> [212046.044098] [<ffffffff8145e0b3>] dump_stack+0x69/0x6f
> [212046.044098] [<ffffffff8145e14c>] panic+0x93/0x201
> [212046.044098] [<ffffffff810c65e4>] watchdog_overflow_callback+0xb4/0xc0
> [212046.044098] [<ffffffff810f2d9a>] __perf_event_overflow+0xaa/0x230
> [212046.044098] [<ffffffff81018210>] intel_pmu_handle_irq+0x1a0/0x330
> [212046.044098] [<ffffffff81462ae1>] perf_event_nmi_handler+0x31/0xa0
> [212046.044098] [<ffffffff81464c37>] notifier_call_chain+0x37/0x70
> [212046.044098] [<ffffffff81464c7d>] __atomic_notifier_call_chain+0xd/0x20
> [212046.044098] [<ffffffff81464ccd>] notify_die+0x2d/0x40
> [212046.044098] [<ffffffff81462127>] default_do_nmi+0x37/0x200
> [212046.044098] [<ffffffff81462358>] do_nmi+0x68/0x80
> [212046.044098] [<ffffffff814618ad>] restart_nmi+0x1a/0x1e

Is this the end of the traceback, i.e. does the first anomalous lockup
show that the NMI interrupted user-space mode? If so, that's highly
unusual.

The 'GF D W' taint also suggests that something was going on before
this triggered: 'W' suggests that something warned before, 'D'
suggests that something died anomalously before, and 'F' suggests a
forced or unsigned module.

So even the earliest traces look like aftereffects.
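
For reference, the taint reading above can be written as a small lookup
table. This is only a sketch: the meanings for 'W', 'D' and 'F' are the
ones given above, 'G' is assumed to be the usual "only GPL-compatible
modules loaded" marker, and the helper name decode_taint is made up for
illustration. It is not the kernel's own decoder, and letters it does
not know (such as the trailing 'N' in the quoted trace) are reported as
not covered rather than guessed at.

```python
# Taint letters discussed in this mail. 'G' is an assumption added for
# completeness; any other letter is deliberately left undecoded.
TAINT_MEANINGS = {
    "G": "only GPL-compatible modules loaded (assumed)",
    "F": "a forced or unsigned module was loaded",
    "D": "something died anomalously before",
    "W": "something warned before",
}


def decode_taint(flags):
    """Return (letter, meaning) pairs for each non-blank taint letter."""
    decoded = []
    for letter in flags:
        if letter.isspace():
            continue  # blanks only mark flags that are not set
        meaning = TAINT_MEANINGS.get(letter, "not covered by this sketch")
        decoded.append((letter, meaning))
    return decoded


if __name__ == "__main__":
    # Taint string as it appears in the quoted panic message.
    for letter, meaning in decode_taint("GF D W N"):
        print(f"{letter}: {meaning}")
```

The point is simply that 'W' and 'D' together mean the first quoted
panic was not the first thing to go wrong on that machine.
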
Thanks,
Ingo
Thread overview: 5+ messages
2014-03-05 1:43 futex funkiness -- massive lockups Davidlohr Bueso
2014-03-05 3:36 ` Linus Torvalds
2014-03-05 4:45 ` Davidlohr Bueso
2014-03-05 8:16 ` Peter Zijlstra
2014-03-05 9:01 ` Ingo Molnar [this message]