From: Ingo Molnar <mingo@kernel.org>
To: Davidlohr Bueso <davidlohr@hp.com>
Cc: tglx@linutronix.de, dvhart@linux.intel.com, peterz@infradead.org,
paulmck@linux.vnet.ibm.com, torvalds@linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: futex funkiness -- massive lockups
Date: Wed, 5 Mar 2014 10:01:13 +0100
Message-ID: <20140305090113.GE2705@gmail.com>
In-Reply-To: <1393983784.2512.40.camel@buesod1.americas.hpqcorp.net>
* Davidlohr Bueso <davidlohr@hp.com> wrote:
> Hi,
>
> A large amount of lockups are seen on a 480 core system doing some sort
> of database-like workload. All except one are soft lockups. This is a
> SLES11 system with most of the recent futex changes backported,
> including commits 63b1a816, b0c29f79, 99b60ce6, a52b89eb, 0d00c7b2,
> 5cdec2d8 and f12d5bfc.
>
> The following are some traces I put together in chronological order from
> the report I received. While the traces aren't perfect, I believe it
> exemplifies the issue pretty well. There are a lot more, but just of the
> same.
>
> [212046.044098] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 22
> [212046.044098] Pid: 312554, comm: XXX Tainted: GF D W N 3.0.101-0.15-default #1
> [212046.044098] Call Trace:
> [212046.044098] [<ffffffff81004935>] dump_trace+0x75/0x310
> [212046.044098] [<ffffffff8145e0b3>] dump_stack+0x69/0x6f
> [212046.044098] [<ffffffff8145e14c>] panic+0x93/0x201
> [212046.044098] [<ffffffff810c65e4>] watchdog_overflow_callback+0xb4/0xc0
> [212046.044098] [<ffffffff810f2d9a>] __perf_event_overflow+0xaa/0x230
> [212046.044098] [<ffffffff81018210>] intel_pmu_handle_irq+0x1a0/0x330
> [212046.044098] [<ffffffff81462ae1>] perf_event_nmi_handler+0x31/0xa0
> [212046.044098] [<ffffffff81464c37>] notifier_call_chain+0x37/0x70
> [212046.044098] [<ffffffff81464c7d>] __atomic_notifier_call_chain+0xd/0x20
> [212046.044098] [<ffffffff81464ccd>] notify_die+0x2d/0x40
> [212046.044098] [<ffffffff81462127>] default_do_nmi+0x37/0x200
> [212046.044098] [<ffffffff81462358>] do_nmi+0x68/0x80
> [212046.044098] [<ffffffff814618ad>] restart_nmi+0x1a/0x1e
Is this the end of the traceback, i.e. does the first anomalous lockup
show that the NMI interrupted user-space mode? If yes then that's
highly unusual.

The 'GF D W' taint also suggests that there was something going on
before this triggered: 'W' suggests that something warned before, 'D'
suggests something died anomalously before and 'F' suggests a forced
or unsigned module.

So even the earliest traces look like after-effects.
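
To make the taint reading above easier to repeat on other traces, here is
a minimal sketch that pulls the taint letters out of a "Pid: ... Tainted:"
line and annotates them. The function names and the table are hypothetical
helpers, not kernel or distro tooling. The sketch assumes the flags field is
followed by a kernel version that starts with digits, a dot and digits. Only
the letters discussed in this mail are decoded, plus 'G', which is the
counterpart of 'P' and means no proprietary module was loaded. Every other
letter, including the 'N' in this trace, is deliberately left undecoded.

```python
import re

# Meanings for 'F', 'D' and 'W' follow the wording of the mail above.
# 'G' is the counterpart of 'P' (no proprietary module loaded).
# This table is intentionally incomplete; it is an illustration only.
TAINT_MEANINGS = {
    "G": "no proprietary module loaded",
    "F": "forced or unsigned module",
    "D": "something died anomalously before",
    "W": "something warned before",
}


def parse_taint(line):
    """Return the taint letters found on a 'Tainted:' line, in order.

    Assumption: the flags are followed by whitespace and a kernel
    version such as '3.0.101-0.15-default'.
    """
    match = re.search(r"Tainted:\s+(.*?)\s+\d+\.\d+", line)
    if match is None:
        return []
    return [ch for ch in match.group(1) if not ch.isspace()]


def describe_taint(line):
    """Pair each taint letter with a meaning, or mark it as undecoded."""
    return [
        (flag, TAINT_MEANINGS.get(flag, "not decoded in this sketch"))
        for flag in parse_taint(line)
    ]


if __name__ == "__main__":
    sample = ("[212046.044098] Pid: 312554, comm: XXX "
              "Tainted: GF D W N 3.0.101-0.15-default #1")
    for flag, meaning in describe_taint(sample):
        print(f"{flag}: {meaning}")
```

Run against the Pid line quoted above, this lists G, F, D, W and N in that
order. The presence of 'D' and 'W' is what indicates that the kernel had
already oopsed and warned before the first lockup was reported.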
Thanks,
Ingo