From: Hank Leininger <hlein@marc.info>
To: linux-kernel@vger.kernel.org
Subject: BUG: spinlock lockup, async_umap_flush_lock in 3.4, 3.7, 3.8
Date: Sun, 12 May 2013 23:48:35 -0400
Message-ID: <20130513034835.GA1130@marklar.spinoli.org>
I've got several systems with similar hardware which crash with BUG:
spinlock errors on async_umap_flush_lock such as:
BUG: spinlock lockup suspected on CPU#0, sh/1166
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
(More examples below.)
In general these crashes are very rare, but a specific userland workload
(lots of mongodb + sqlite reads & writes, while other CPUs are running
compute-heavy tasks) seems to trigger one within a few minutes to a few
hours. After 1-3 "spinlock lockup suspected" errors, the system locks up
and does not respond to Alt+SysRq.
In the last couple of days I've reproduced the crash on one system with
3.7.1-gentoo, 3.8.11-gentoo, 3.8.11 vanilla, and 3.4.44 vanilla. When
I looked further back, over the past year another system crashed with
similar errors (under similar workload) running 3.7.0-gentoo and
3.8.4-gentoo. Further back than that there are 2-3 crashes on those
and other similar systems using 2.6.x and 3.0.x, but their errors are
different enough that they may not be related.
These systems each have:
Supermicro X8DTU-F motherboard
2x Xeon E5645 (6 cores each + hyperthreading)
24 GB ECC RAM
Adaptec 51645 RAID controller w/bbu
12x 2TB SAS disks
They are using hw raid, 11 disks in a RAID6 with 1 hot-spare; main
partition is 16 TB.
They all use loop-aes v3.6g as a replacement loop.ko module to encrypt
their / filesystem (using the aes-ni instruction set).
3.8.11 .config pastebin: http://pastebin.com/u3BDPTvP
3.4.44 .config pastebin: http://pastebin.com/1Rpk9RVf
Generally speaking, 3.8.x and 3.4.44 kernels were compiled with GCC 4.7;
the older 3.7.x kernels were compiled with GCC 4.6.
Error messages, captured by serial consoles, newest crashes first:
Host1:
3.4.44
BUG: spinlock lockup on CPU#0, john/21637
lock: ffffffff816558d0, .magic: dead4ead, .owner: mongod/27646, .owner_cpu: 8
BUG: spinlock lockup on CPU#6, mongod/3256
lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18
BUG: spinlock lockup on CPU#20, khugepaged/735
lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18
3.8.11
BUG: spinlock lockup suspected on CPU#0, sh/1166
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
3.8.11-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/3678, .owner_cpu: 4
BUG: spinlock lockup suspected on CPU#16, mongod/3115
lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5
BUG: spinlock lockup suspected on CPU#6, khugepaged/744
lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5
3.7.1-gentoo
BUG: spinlock lockup suspected on CPU#0, john/32030
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#19, mongod/18985
lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2
BUG: spinlock lockup suspected on CPU#3, scsi_eh_0/1407
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#9, khugepaged/741
lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2
Host2:
3.8.4-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/22377, .owner_cpu: 9
BUG: spinlock lockup suspected on CPU#4, mongod/3377
lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14
BUG: spinlock lockup suspected on CPU#21, mongod/3375
lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14
3.7.0-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongo/16561, .owner_cpu: 3
(The repeated crashes on Host2 led to irreparable ext4 corruption.)
I can provide System.map files if they are interesting. I'd be happy
to try a specific kernel, add patches to harvest more information in
the event of a crash, etc.
Thanks,
--
Hank Leininger <hlein@marc.info>
3C2A 4EEE ED36 D136 18F2 1B30 47A8 D14B E13E 9C6A