From: Patrick Schmid <schmid@phys.ethz.ch>
To: linux-btrfs@vger.kernel.org
Subject: soft lockup - CPU#0 stuck - Kernel 3.17.2
Date: Thu, 13 Nov 2014 14:32:11 +0100
Message-ID: <5464B2DB.7070008@phys.ethz.ch>
[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]
Hi all,
We run a > 500 TiB backup system on iSCSI targets using 19 BTRFS
filesystems (the biggest of which is 110 TiB) on Ubuntu 14.04 LTS with
various kernel versions and Btrfs-Progs v3.17.1. The hardware is a 24-core
Xeon E5-2620 on an Intel S2600GZ board with 128 GiB RAM.
Since btrfs switched to kworkers (I think in 3.15), the frontend
server has been crashing somewhat randomly with soft lockups (see
attachment). The system is rock solid with the 3.14.22 kernel.
The lockups happen during the nightly cron-controlled rsync backups and
occur at random times during this process.
We are totally aware of the fact that this tends to be one of
those "it doesn't work" bug reports, but it's really hard to pin
down the source of the problem other than it seems to be related to the
kworkers. We'd love to provide any feedback we can, please let us know
what you need.
Regards
Patrick
--
Patrick Schmid <schmid@phys.ethz.ch> support: +41 44 633 2668
IT Services Group, HPT H 8 voice: +41 44 633 3997
Departement Physik, ETH Zurich
CH-8093 Zurich, Switzerland
[-- Attachment #2: NMI_soft_lockup_crash.txt --]
[-- Type: text/plain, Size: 5999 bytes --]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207104] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u481:26:108963]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207147] Modules linked in: btrfs(E) xor(E) raid6_pq(E) tcp_diag(E) inet_diag(E) autofs4(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) mousedev(E) cryptd(E) ioatdma(E) sb_edac(E) microcode(E) ipmi_si(E) edac_core(E) lpc_ich(E) mei_me(E) ipmi_msghandler(E) tpm_tis(E) mei(E) wmi(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) sunrpc(E) fscache(E) lp(E) parport(E) hid_generic(E) usbhid(E) hid(E) igb(E) ixgbe(E) i2c_algo_bit(E) dca(E) isci(E) ptp(E) ahci(E) libsas(E) scsi_transport_sas(E) libahci(E) mdio(E) arcmsr(E) pps_core(E)
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207152] CPU: 0 PID: 108963 Comm: kworker/u481:26 Tainted: G EL 3.17.2-stable.slub #6
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207154] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207185] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207186] task: ffff8802e34a8000 ti: ffff88070a5a8000 task.ti: ffff88070a5a8000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207194] RIP: 0010:[<ffffffff810b0b35>] [<ffffffff810b0b35>] queue_read_lock_slowpath+0xb5/0xd0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207195] RSP: 0018:ffff88070a5aba00 EFLAGS: 00000206
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207196] RAX: 00000000000041b8 RBX: ffff8806bdac3a18 RCX: 0000000000003bcc
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207197] RDX: ffff8800a2c4f350 RSI: 0000000000003bcc RDI: ffff8800a2c4f354
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207198] RBP: ffff88070a5aba08 R08: 0000000000003bc6 R09: 0000000000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207199] R10: 00000000ffffffff R11: 0000000000000001 R12: ffff88081ee14300
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207200] R13: ffff88100e6e0000 R14: ffffffff810946ac R15: ffff88070a5ab9a8
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207202] FS: 0000000000000000(0000) GS:ffff88081ee00000(0000) knlGS:0000000000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207204] CR2: 0000000002b97fc8 CR3: 0000000001c16000 CR4: 00000000000407f0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207205] Stack:
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207207] ffffffff8173b07c ffff88070a5aba68 ffffffffa04d8a3b 0000000000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207209] ffff88070a5aba78 ffffffffa04757af 00003f66a0497f6e ffff88061c29af68
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207211] ffff8800a2c4f2e0 ffff88100f36d800 ffff880000000000 0000160000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207212] Call Trace:
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207218] [<ffffffff8173b07c>] ? _raw_read_lock+0x1c/0x30
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207233] [<ffffffffa04d8a3b>] btrfs_tree_read_lock+0x5b/0x120 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207241] [<ffffffffa04757af>] ? leaf_space_used+0xcf/0x110 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207249] [<ffffffffa0477d6b>] btrfs_read_lock_root_node+0x3b/0x50 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207258] [<ffffffffa047cbee>] btrfs_search_slot+0x50e/0xa10 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207269] [<ffffffffa0494257>] btrfs_lookup_file_extent+0x37/0x40 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207282] [<ffffffffa04b35da>] __btrfs_drop_extents+0x16a/0xd90 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207285] [<ffffffff810946ac>] ? try_to_wake_up+0x1fc/0x340
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207299] [<ffffffffa04bc65b>] ? __set_extent_bit+0x15b/0x540 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207302] [<ffffffff811b0a12>] ? kmem_cache_alloc+0x122/0x130
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207311] [<ffffffffa0477aea>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207323] [<ffffffffa04a36ce>] insert_reserved_file_extent.constprop.59+0x9e/0x2f0 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207335] [<ffffffffa04a94c5>] btrfs_finish_ordered_io+0x2e5/0x5f0 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207345] [<ffffffffa04a9ad5>] finish_ordered_fn+0x15/0x20 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207358] [<ffffffffa04cf3e2>] normal_work_helper+0xc2/0x2b0 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207362] [<ffffffff8107fe09>] ? pwq_activate_delayed_work+0x39/0x80
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207374] [<ffffffffa04cf742>] btrfs_endio_write_helper+0x12/0x20 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207377] [<ffffffff81082000>] process_one_work+0x150/0x3f0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207379] [<ffffffff810826f1>] worker_thread+0x121/0x520
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207381] [<ffffffff810825d0>] ? rescuer_thread+0x330/0x330
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207385] [<ffffffff81087992>] kthread+0xd2/0xf0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207388] [<ffffffff810878c0>] ? kthread_create_on_node+0x180/0x180
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207390] [<ffffffff8173b6bc>] ret_from_fork+0x7c/0xb0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207393] [<ffffffff810878c0>] ? kthread_create_on_node+0x180/0x180
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207413] Code: 8b 02 3c ff 74 f8 f3 c3 55 48 89 e5 e8 a8 df 67 00 5d c3 83 e1 fe 0f b7 f1 b8 00 80 00 00 44 0f b7 42 04 66 44 39 c1 74 83 f3 90 <83> e8 01 75 ee 66 66 66 90 66 66 90 eb e0 66 2e 0f 1f 84 00 00
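For triage, a trace like the one above can be reduced to a plain list of
symbols. The following is a minimal sketch and not part of the original
report: the names FRAME_RE, parse_frames and SAMPLE are hypothetical, and
the pattern assumes the 3.x-era frame format seen in this log
("[<address>] symbol+0xoff/0xlen [module]"). Frames prefixed with "?" are
ones the kernel's stack unwinder considers unreliable, so the sketch flags
them instead of dropping them.

```python
import re

# One call-trace frame as printed in this log, for example:
#   [<ffffffffa04d8a3b>] btrfs_tree_read_lock+0x5b/0x120 [btrfs]
# Assumption: 16-digit hex addresses and the "+0xoff/0xlen" suffix.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module, reliable) tuples for frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            # A non-frame line (such as "Code: ...") ends the trace.
            in_trace = False
            continue
        reliable = match.group("unreliable") is None
        frames.append((match.group("symbol"), match.group("module"), reliable))
    return frames


# Three lines shortened from the attachment above, plus a truncated Code line.
SAMPLE = """\
[29411.207212] Call Trace:
[29411.207218] [<ffffffff8173b07c>] ? _raw_read_lock+0x1c/0x30
[29411.207233] [<ffffffffa04d8a3b>] btrfs_tree_read_lock+0x5b/0x120 [btrfs]
[29411.207413] Code: 8b 02 3c ff
"""

if __name__ == "__main__":
    for symbol, module, reliable in parse_frames(SAMPLE):
        marker = " " if reliable else "?"
        print(marker, symbol, "[%s]" % module if module else "")
```

Filtering the result on module == "btrfs" and reliable == True leaves the
chain from btrfs_endio_write_helper down to btrfs_tree_read_lock, which is
the path the stuck kworker was on when the watchdog fired.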