From: bfields@fieldses.org (J. Bruce Fields)
To: Philipp Hahn <hahn@univention.de>
Cc: linux-fsdevel@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: kernel BUG at linux-3.16.7-ckt25/fs/dcache.c:2373!
Date: Thu, 2 Jun 2016 16:02:01 -0400 [thread overview]
Message-ID: <20160602200201.GA15429@fieldses.org> (raw)
In-Reply-To: <574FE57F.7080209@univention.de>
On Thu, Jun 02, 2016 at 09:51:27AM +0200, Philipp Hahn wrote:
> Hello,
>
> probably during heavy IO our file server crashed on a BUG_ON in dache.c,
> probably triggered by NFS:
>
> > ------------[ cut here ]------------
> > kernel BUG at /var/build/temp/tmp.BPql4ErveJ/pbuilder/linux-3.16.7-ckt25/fs/dcache.c:2373!
> > invalid opcode: 0000 [#1] SMP
> > Modules linked in: nfsv4 dns_resolver nfsv3 btrfs raid6_pq xor crc32c_generic ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
> > s_acl auth_rpcgss nfs fscache lockd sunrpc quota_v2 quota_tree lpc_ich mfd_core psmouse i2c_i801 tpm_tis processor pcspkr serio_raw i2c_core tpm rng_core thermal_sys
> > rc16 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sd_mod crc_t10dif crct10dif_common sg sr_mod cdrom ata_generic ehci_pci uhci_hcd ata_piix e1000 ehci_hcd mptspi mptscsih mptbase floppy scsi_transport_spi usbcore libata button usb_common
> > CPU: 0 PID: 1455 Comm: nfsd Not tainted 3.16.0-ucs193-amd64 #1 Debian 3.16.7-ckt25-2~bpo70+1~ucs3.3.193.201604181018
> > Hardware name: FUJITSU SIEMENS PRIMERGY RX200S2/D1790/M73IL, BIOS 6.0 Rev. R04A5F5.1790 01/16/2006
> > task: ffff880121cb03d0 ti: ffff8800d0cac000 task.ti: ffff8800d0cac000
> > RIP: 0010:[<ffffffff811d2613>] [<ffffffff811d2613>] __d_rehash+0x53/0x60
> > RSP: 0018:ffff8800d0cafaa0 EFLAGS: 00010282
> > RAX: 0000000000043e72 RBX: ffff8800939d4578 RCX: 000000000000000d
> > RDX: ffffc90000005000 RSI: ffffc90000224390 RDI: ffff8800b8d53018
> > RBP: ffff8800b8d53018 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0100000000000b15 R11: 0000000000000000 R12: ffff8800d1388198
> > R13: ffff8800b8d53018 R14: ffff8800b8d53b58 R15: ffff8800d0cafbe0
> > FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffffffffff600400 CR3: 00000001007d7000 CR4: 00000000000007f0
> > Stack:
> > ffffffff811d2a3d ffff8800d0cafb27 ffff8800d1388198 ffff8800b8d53b58
> > ffff8800d1388198 ffff8800b8d53b58 ffff8800b8d53b58 ffff880122829860
> > ffff8800b8d53b58 ffff8800d0cafbe0 ffffffff811c44f9 ffff8800b8d53b58
> > Call Trace:
> > [<ffffffff811d2a3d>] ? d_materialise_unique+0x8d/0x3d0
> > [<ffffffff811c44f9>] ? lookup_real+0x19/0x50
> > [<ffffffff811c4b27>] ? __lookup_hash+0x37/0x50
> > [<ffffffff811c92fd>] ? lookup_one_len+0xbd/0x110
> > [<ffffffff81241159>] ? reconnect_path+0x109/0x2d0
> > [<ffffffffa0367e20>] ? nfsd_setuser_and_check_port+0xa0/0xa0 [nfsd]
> > [<ffffffff8124144f>] ? exportfs_decode_fh+0x12f/0x300
> > [<ffffffffa036cee5>] ? expkey_match+0x45/0x50 [nfsd]
> > [<ffffffffa02df314>] ? cache_check+0x114/0x3b0 [sunrpc]
> > [<ffffffffa036da5d>] ? exp_find+0xfd/0x1b0 [nfsd]
> > [<ffffffffa036898c>] ? fh_verify+0x2fc/0x5a0 [nfsd]
> > [<ffffffffa0378df3>] ? nfsd4_proc_compound+0x4c3/0x790 [nfsd]
> > [<ffffffffa0364df4>] ? nfsd_dispatch+0xe4/0x230 [nfsd]
> > [<ffffffffa02d3dae>] ? svc_process_common+0x33e/0x6a0 [sunrpc]
> > [<ffffffffa0364680>] ? nfsd_destroy+0x70/0x70 [nfsd]
> > [<ffffffffa02d44d0>] ? svc_process+0x110/0x160 [sunrpc]
> > [<ffffffffa0364747>] ? nfsd+0xc7/0x140 [nfsd]
> > [<ffffffff8108e141>] ? kthread+0xc1/0xe0
> > [<ffffffff8108e080>] ? flush_kthread_worker+0x80/0x80
> > [<ffffffff81572598>] ? ret_from_fork+0x58/0x90
> > [<ffffffff8108e080>] ? flush_kthread_worker+0x80/0x80
> > Code: 89 47 08 74 04 48 89 50 08 48 89 77 10 48 83 ca 01 48 89 16 0f ba 36 00 c3 0f 1f 80 00 00 00 00 f3 90 48 8b 06 a8 01 75 f7 eb b9 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47
> > RIP [<ffffffff811d2613>] __d_rehash+0x53/0x60
> > RSP <ffff8800d0cafaa0>
> > ---[ end trace 18587f42c106645d ]---
>
> There was a report 4 years ago where the same BUG_ON() was hit, but
> during some renames:
> <http://linux-fsdevel.vger.kernel.narkive.com/GcMw2p94/kernel-bug-at-fs-dcache-c-2105-d-rehash-bug-on-d-unhashed-entry>
>
>
> Our kernel is 3.16.7-ckt25 from Debian + ckt26 and ckt27 on top.
So this is the BUG_ON(!d_unhashed(..)) at the top of __d_rehash? I
don't know the -ckt kernels, and I don't have a BUG at line 2373 of
dcache.c, or something else?
There have been a lot of changes in the area since 3.16.7.
--b.
> I don't know exactly what our server was doing at that time, but we were
> setting up a Debian mirror at that time on "btrfs" on an iSCSI storage
> connected to that host.
> The host also imports 2 NFS shares from other servers and exports 3 NFS
> volumes itself.
>
>
> Last week-end the same server got stuck while being backed up:
>
> > BUG: soft lockup - CPU#1 stuck for 22s! [bacula-fd:16612]
> > Modules linked in: nfsv4 dns_resolver nfsv3 btrfs raid6_pq xor crc32c_generic ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rpcsec_gss_krb5 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc quota_v2 quota_tree lpc_ich mfd_core psmouse tpm_tis tpm i2c_i801 rng_core serio_raw processor i2c_core e752x_edac evdev shpchp edac_core pcspkr thermal_sys ext4 jbd2 crc16 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sd_mod crc_t10dif crct10dif_common usb_storage sg sr_mod cdrom ata_generic floppy ehci_pci uhci_hcd ata_piix ehci_hcd mptspi mptscsih libata usbcore e1000 mptbase scsi_transport_spi button usb_common
> > CPU: 1 PID: 16612 Comm: bacula-fd Not tainted 3.16.0-ucs193-amd64 #1 Debian 3.16.7-ckt25-2~bpo70+1~ucs3.3.193.201604181018
> > Hardware name: FUJITSU SIEMENS PRIMERGY RX200S2/D1790/M73IL, BIOS 6.0 Rev. R04A5F5.1790 01/16/2006
> > task: ffff8800c6bf6ce0 ti: ffff8800732cc000 task.ti: ffff8800732cc000
> > RIP: 0010:[<ffffffff811d36c0>] [<ffffffff811d36c0>] __d_lookup_rcu+0x80/0x150
> > RSP: 0018:ffff8800732cfc58 EFLAGS: 00000282
> > RAX: 0000000000000000 RBX: 000102d000000000 RCX: 000000000000000d
> > RDX: ffffc90000005000 RSI: ffff8800732cfe10 RDI: ffff880012c2b918
> > RBP: ffff880012c2b918 R08: ffff880104804050 R09: 0000000000000023
> > R10: ffffffffffffffff R11: 7bfe9e189e400000 R12: ffff8800beb5f080
> > R13: ffff88011b2102e8 R14: ffffffff811a035e R15: 00000000000102d0
> > FS: 00007fa2e701f700(0000) GS:ffff880127c80000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffffffffff600400 CR3: 00000000bd005000 CR4: 00000000000007e0
> > Stack:
> > 0000000000000006 ffff880104804050 0000000000000000 ffffffff811c6449
> > ffff8800732cfcc0 ffff8800732cfd48 ffff8800732cfd38 0000000000000000
> > ffff880012c2b918 ffff880121f0e060 ffff8800732cfe00 ffffffff811c569e
> > Call Trace:
> > [<ffffffff811c6449>] ? link_path_walk+0x69/0x850
> > [<ffffffff811c569e>] ? lookup_fast+0x3e/0x2c0
> > [<ffffffff811c6d68>] ? path_lookupat+0x138/0x740
> > [<ffffffff814bda33>] ? tcp_sendmsg+0xd3/0xd20
> > [<ffffffff812138f5>] ? posix_acl_xattr_get+0x45/0xb0
> > [<ffffffff811c73a4>] ? filename_lookup+0x34/0xd0
> > [<ffffffff811c5cf1>] ? getname_flags.part.21+0x91/0x140
> > [<ffffffff811cb289>] ? user_path_at_empty+0x99/0x120
> > [<ffffffff811f7db4>] ? fsnotify+0x234/0x300
> > [<ffffffff811ba4cf>] ? do_sync_write+0x5f/0x90
> > [<ffffffff811bfc00>] ? vfs_fstatat+0x40/0x90
> > [<ffffffff811bfcc2>] ? SYSC_newlstat+0x12/0x30
> > [<ffffffff811bb33c>] ? vfs_write+0x16c/0x200
> > [<ffffffff811bb80c>] ? SyS_write+0x7c/0xb0
> > [<ffffffff8157264d>] ? system_call_fast_compare_end+0x10/0x15
> > Code: 00 00 00 4d 89 e9 49 c7 c2 ff ff ff ff 49 c1 e9 20 eb 14 0f 1f 84 00 00 00 00 00 48 8b 1b 48 85 db 0f 84 84 00 00 00 4c 8d 63 f8 <8b> 43 fc 48 39 6b 10 75 e7 48 83 7b 08 00 74 e0 89 c2 83 e2 fe
>
> The trace-back triggered by arch_trigger_all_cpu_backtrace() shows the
> following two functions in addition on top of link_path_walk():
>
> > [<ffffffff811d36c7>] ? __d_lookup_rcu+0x87/0x150
> > [<ffffffff811c5f3b>] ? __inode_permission+0x7b/0xd0
> > [<ffffffff811c6449>] ? link_path_walk+0x69/0x850
> ...
>
> Has someone seen similar problems with 3.16?
>
> If you need more info, just ask.
>
> Please cc: on replies as I'm not subscribed to linux-fsdevel (but LKML)
>
> Philipp
> --
> Philipp Hahn
> Open Source Software Engineer
>
> Univention GmbH
> be open.
> Mary-Somerville-Str. 1
> D-28359 Bremen
> Tel.: +49 421 22232-0
> Fax : +49 421 22232-99
> hahn@univention.de
>
> http://www.univention.de/
> Geschäftsführer: Peter H. Ganten
> HRB 20755 Amtsgericht Bremen
> Steuer-Nr.: 71-597-02876
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2016-06-02 20:02 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-02 7:51 kernel BUG at linux-3.16.7-ckt25/fs/dcache.c:2373! Philipp Hahn
2016-06-02 20:02 ` J. Bruce Fields [this message]
2016-06-03 9:17 ` Philipp Hahn
2016-06-03 13:14 ` J. Bruce Fields
-- strict thread matches above, loose matches on Subject: below --
2016-12-09 9:15 Daniel Pocock
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160602200201.GA15429@fieldses.org \
--to=bfields@fieldses.org \
--cc=hahn@univention.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.