* BUG: spinlock lockup, async_umap_flush_lock in 3.4, 3.7, 3.8
@ 2013-05-13  3:48 Hank Leininger
From: Hank Leininger @ 2013-05-13  3:48 UTC (permalink / raw)
  To: linux-kernel


I've got several systems with similar hardware which crash with BUG:
spinlock errors on async_umap_flush_lock such as:

BUG: spinlock lockup suspected on CPU#0, sh/1166
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23

(More examples below.)

In general these happen very rarely--but a specific userland workload
(lots of mongodb + sqlite reads & writes, while other CPUs are running
compute-heavy tasks) seems to trigger it within a few minutes to hours.
After 1-3 "spinlock lockup suspected" errors, the system locks up and
no longer responds to alt+sysrq.

I've gotten the crash on one system in the last couple of days with
3.7.1-gentoo, 3.8.11-gentoo, 3.8.11 vanilla, and 3.4.44 vanilla.  When
I looked further back, over the past year another system crashed with
similar errors (under similar workload) running 3.7.0-gentoo and
3.8.4-gentoo.  Further back than that there are 2-3 crashes on those
and other similar systems using 2.6.x and 3.0.x, but their errors are
different enough that they may not be related.

These systems each have:

Supermicro X8DTU-F motherboard
2x Xeon E5645 (6 cores each + hyperthreading)
24 GB ECC RAM
Adaptec 51645 RAID controller w/bbu
12x 2TB SAS disks

They are using hw raid, 11 disks in a RAID6 with 1 hot-spare; main
partition is 16 TB.

They all use loop-aes v3.6g as a replacement loop.ko module to encrypt
their / filesystem (using the aes-ni instruction set).

3.8.11 .config pastebin: http://pastebin.com/u3BDPTvP

3.4.44 .config pastebin: http://pastebin.com/1Rpk9RVf

Generally speaking, 3.8.x and 3.4.44 kernels were compiled with GCC 4.7;
the older 3.7.x kernels were compiled with GCC 4.6.

Error messages, captured by serial consoles, newest crashes first:

Host1:

3.4.44
BUG: spinlock lockup on CPU#0, john/21637
 lock: ffffffff816558d0, .magic: dead4ead, .owner: mongod/27646, .owner_cpu: 8
BUG: spinlock lockup on CPU#6, mongod/3256
 lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18
BUG: spinlock lockup on CPU#20, khugepaged/735
 lock: ffff880621867860, .magic: dead4ead, .owner: mongod/3251, .owner_cpu: 18

3.8.11
BUG: spinlock lockup suspected on CPU#0, sh/1166
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23
BUG: spinlock lockup suspected on CPU#19, scsi_eh_0/1408
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/23/0, .owner_cpu: 23

3.8.11-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/3678, .owner_cpu: 4
BUG: spinlock lockup suspected on CPU#16, mongod/3115
 lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5
BUG: spinlock lockup suspected on CPU#6, khugepaged/744
 lock: 0xffff880620ab47a8, .magic: dead4ead, .owner: flush-7:4/1915, .owner_cpu: 5

3.7.1-gentoo
BUG: spinlock lockup suspected on CPU#0, john/32030
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#19, mongod/18985
 lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2
BUG: spinlock lockup suspected on CPU#3, scsi_eh_0/1407
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: swapper/13/0, .owner_cpu: 13
BUG: spinlock lockup suspected on CPU#9, khugepaged/741
 lock: 0xffff8806221f7860, .magic: dead4ead, .owner: mongod/18975, .owner_cpu: 2

Host2:

3.8.4-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongod/22377, .owner_cpu: 9
BUG: spinlock lockup suspected on CPU#4, mongod/3377
 lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14
BUG: spinlock lockup suspected on CPU#21, mongod/3375
 lock: 0xffff880621d00f68, .magic: dead4ead, .owner: kswapd0/689, .owner_cpu: 14

3.7.0-gentoo
BUG: spinlock lockup suspected on CPU#0, swapper/0/0
 lock: async_umap_flush_lock+0x0/0x20, .magic: dead4ead, .owner: mongo/16561, .owner_cpu: 3

(The repeated crashes on Host2 led to irreparable ext4 corruption.)
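
For anyone who wants to tally reports like the ones above from a longer
serial console capture, here is a minimal sketch in Python. It assumes the
two-line message layout shown in this mail (a "BUG: spinlock lockup" line
followed by a " lock: ..." line); the function name summarize() and both
regular expressions are illustrative, not part of any kernel tool.

```python
import re
from collections import Counter

# Header line; "suspected" is optional because the 3.4.44 messages omit it.
HEADER = re.compile(r"BUG: spinlock lockup(?: suspected)? on CPU#(\d+), (\S+)")
# Detail line, as printed directly after the header in the logs above.
DETAIL = re.compile(
    r"lock: (\S+), \.magic: (\w+), \.owner: (\S+), \.owner_cpu: (\d+)"
)


def summarize(log_text):
    """Count stuck waiters per (lock, owner, owner_cpu) in a console log."""
    counts = Counter()
    have_header = False
    for line in log_text.splitlines():
        if HEADER.search(line):
            have_header = True
            continue
        detail = DETAIL.search(line)
        if detail and have_header:
            lock, _magic, owner, owner_cpu = detail.groups()
            counts[(lock, owner, int(owner_cpu))] += 1
            have_header = False
    return counts
```

Feeding it the 3.8.11 capture from Host1 would group both waiters under
async_umap_flush_lock held by swapper/23/0 on CPU 23, which makes it easier
to see at a glance which owner the other CPUs are stuck behind.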

I can provide System.map files if they are interesting.  I'd be happy
to try a specific kernel, add patches to harvest more information in
the event of a crash, etc.
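
The 3.4.44 messages print raw lock addresses rather than symbol names, so
a System.map is what would turn an address such as ffffffff816558d0 into a
name. A rough sketch of that lookup follows, assuming the usual
"address type name" layout of System.map; load_system_map() and resolve()
are hypothetical helper names chosen for illustration.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr lies below every symbol in the map
    base, name = symbols[i]
    return f"{name}+{addr - base:#x}"
```

Usage would be along the lines of
resolve(load_system_map(open("System.map")), 0xffffffff816558d0). The
result is only meaningful for addresses inside the kernel image; the
ffff8806... addresses in the logs look like dynamically allocated locks
and would not map to a useful symbol this way.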

Thanks,

-- 

Hank Leininger <hlein@marc.info>
3C2A 4EEE ED36 D136 18F2  1B30 47A8 D14B E13E 9C6A
