From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from fieldses.org ([173.255.197.46]:41510 "EHLO fieldses.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752782AbcFBUCD (ORCPT ); Thu, 2 Jun 2016 16:02:03 -0400 Date: Thu, 2 Jun 2016 16:02:01 -0400 To: Philipp Hahn Cc: linux-fsdevel@vger.kernel.org, "linux-kernel@vger.kernel.org" Subject: Re: kernel BUG at linux-3.16.7-ckt25/fs/dcache.c:2373! Message-ID: <20160602200201.GA15429@fieldses.org> References: <574FE57F.7080209@univention.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <574FE57F.7080209@univention.de> From: bfields@fieldses.org (J. Bruce Fields) Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Thu, Jun 02, 2016 at 09:51:27AM +0200, Philipp Hahn wrote: > Hello, > > probably during heavy IO our file server crashed on a BUG_ON in dache.c, > probably triggered by NFS: > > > ------------[ cut here ]------------ > > kernel BUG at /var/build/temp/tmp.BPql4ErveJ/pbuilder/linux-3.16.7-ckt25/fs/dcache.c:2373! > > invalid opcode: 0000 [#1] SMP > > Modules linked in: nfsv4 dns_resolver nfsv3 btrfs raid6_pq xor crc32c_generic ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi > > s_acl auth_rpcgss nfs fscache lockd sunrpc quota_v2 quota_tree lpc_ich mfd_core psmouse i2c_i801 tpm_tis processor pcspkr serio_raw i2c_core tpm rng_core thermal_sys > > rc16 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sd_mod crc_t10dif crct10dif_common sg sr_mod cdrom ata_generic ehci_pci uhci_hcd ata_piix e1000 ehci_hcd mptspi mptscsih mptbase floppy scsi_transport_spi usbcore libata button usb_common > > CPU: 0 PID: 1455 Comm: nfsd Not tainted 3.16.0-ucs193-amd64 #1 Debian 3.16.7-ckt25-2~bpo70+1~ucs3.3.193.201604181018 > > Hardware name: FUJITSU SIEMENS PRIMERGY RX200S2/D1790/M73IL, BIOS 6.0 Rev. R04A5F5.1790 01/16/2006 > > task: ffff880121cb03d0 ti: ffff8800d0cac000 task.ti: ffff8800d0cac000 > > RIP: 0010:[] [] __d_rehash+0x53/0x60 > > RSP: 0018:ffff8800d0cafaa0 EFLAGS: 00010282 > > RAX: 0000000000043e72 RBX: ffff8800939d4578 RCX: 000000000000000d > > RDX: ffffc90000005000 RSI: ffffc90000224390 RDI: ffff8800b8d53018 > > RBP: ffff8800b8d53018 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0100000000000b15 R11: 0000000000000000 R12: ffff8800d1388198 > > R13: ffff8800b8d53018 R14: ffff8800b8d53b58 R15: ffff8800d0cafbe0 > > FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > CR2: ffffffffff600400 CR3: 00000001007d7000 CR4: 00000000000007f0 > > Stack: > > ffffffff811d2a3d ffff8800d0cafb27 ffff8800d1388198 ffff8800b8d53b58 > > ffff8800d1388198 ffff8800b8d53b58 ffff8800b8d53b58 ffff880122829860 > > ffff8800b8d53b58 ffff8800d0cafbe0 ffffffff811c44f9 ffff8800b8d53b58 > > Call Trace: > > [] ? d_materialise_unique+0x8d/0x3d0 > > [] ? lookup_real+0x19/0x50 > > [] ? __lookup_hash+0x37/0x50 > > [] ? lookup_one_len+0xbd/0x110 > > [] ? reconnect_path+0x109/0x2d0 > > [] ? nfsd_setuser_and_check_port+0xa0/0xa0 [nfsd] > > [] ? exportfs_decode_fh+0x12f/0x300 > > [] ? expkey_match+0x45/0x50 [nfsd] > > [] ? cache_check+0x114/0x3b0 [sunrpc] > > [] ? exp_find+0xfd/0x1b0 [nfsd] > > [] ? fh_verify+0x2fc/0x5a0 [nfsd] > > [] ? nfsd4_proc_compound+0x4c3/0x790 [nfsd] > > [] ? nfsd_dispatch+0xe4/0x230 [nfsd] > > [] ? svc_process_common+0x33e/0x6a0 [sunrpc] > > [] ? nfsd_destroy+0x70/0x70 [nfsd] > > [] ? svc_process+0x110/0x160 [sunrpc] > > [] ? nfsd+0xc7/0x140 [nfsd] > > [] ? kthread+0xc1/0xe0 > > [] ? flush_kthread_worker+0x80/0x80 > > [] ? ret_from_fork+0x58/0x90 > > [] ? flush_kthread_worker+0x80/0x80 > > Code: 89 47 08 74 04 48 89 50 08 48 89 77 10 48 83 ca 01 48 89 16 0f ba 36 00 c3 0f 1f 80 00 00 00 00 f3 90 48 8b 06 a8 01 75 f7 eb b9 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 > > RIP [] __d_rehash+0x53/0x60 > > RSP > > ---[ end trace 18587f42c106645d ]--- > > There was a report 4 years ago where the same BUG_ON() was hit, but > during some renames: > > > > Our kernel is 3.16.7-ckt25 from Debian + ckt26 and ckt27 on top. So this is the BUG_ON(!d_unhashed(..)) at the top of __d_rehash? I don't know the -ckt kernels, and I don't have a BUG at line 2373 of dcache.c, or something else? There have been a lot of changes in the area since 3.16.7. --b. > I don't know exactly what our server was doing at that time, but we were > setting up a Debian mirror at that time on "btrfs" on an iSCSI storage > connected to that host. > The host also imports 2 NFS shares from other servers and exports 3 NFS > volumes itself. > > > Last week-end the same server got stuck while being backed up: > > > BUG: soft lockup - CPU#1 stuck for 22s! [bacula-fd:16612] > > Modules linked in: nfsv4 dns_resolver nfsv3 btrfs raid6_pq xor crc32c_generic ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rpcsec_gss_krb5 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc quota_v2 quota_tree lpc_ich mfd_core psmouse tpm_tis tpm i2c_i801 rng_core serio_raw processor i2c_core e752x_edac evdev shpchp edac_core pcspkr thermal_sys ext4 jbd2 crc16 dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod sd_mod crc_t10dif crct10dif_common usb_storage sg sr_mod cdrom ata_generic floppy ehci_pci uhci_hcd ata_piix ehci_hcd mptspi mptscsih libata usbcore e1000 mptbase scsi_transport_spi button usb_common > > CPU: 1 PID: 16612 Comm: bacula-fd Not tainted 3.16.0-ucs193-amd64 #1 Debian 3.16.7-ckt25-2~bpo70+1~ucs3.3.193.201604181018 > > Hardware name: FUJITSU SIEMENS PRIMERGY RX200S2/D1790/M73IL, BIOS 6.0 Rev. R04A5F5.1790 01/16/2006 > > task: ffff8800c6bf6ce0 ti: ffff8800732cc000 task.ti: ffff8800732cc000 > > RIP: 0010:[] [] __d_lookup_rcu+0x80/0x150 > > RSP: 0018:ffff8800732cfc58 EFLAGS: 00000282 > > RAX: 0000000000000000 RBX: 000102d000000000 RCX: 000000000000000d > > RDX: ffffc90000005000 RSI: ffff8800732cfe10 RDI: ffff880012c2b918 > > RBP: ffff880012c2b918 R08: ffff880104804050 R09: 0000000000000023 > > R10: ffffffffffffffff R11: 7bfe9e189e400000 R12: ffff8800beb5f080 > > R13: ffff88011b2102e8 R14: ffffffff811a035e R15: 00000000000102d0 > > FS: 00007fa2e701f700(0000) GS:ffff880127c80000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: ffffffffff600400 CR3: 00000000bd005000 CR4: 00000000000007e0 > > Stack: > > 0000000000000006 ffff880104804050 0000000000000000 ffffffff811c6449 > > ffff8800732cfcc0 ffff8800732cfd48 ffff8800732cfd38 0000000000000000 > > ffff880012c2b918 ffff880121f0e060 ffff8800732cfe00 ffffffff811c569e > > Call Trace: > > [] ? link_path_walk+0x69/0x850 > > [] ? lookup_fast+0x3e/0x2c0 > > [] ? path_lookupat+0x138/0x740 > > [] ? tcp_sendmsg+0xd3/0xd20 > > [] ? posix_acl_xattr_get+0x45/0xb0 > > [] ? filename_lookup+0x34/0xd0 > > [] ? getname_flags.part.21+0x91/0x140 > > [] ? user_path_at_empty+0x99/0x120 > > [] ? fsnotify+0x234/0x300 > > [] ? do_sync_write+0x5f/0x90 > > [] ? vfs_fstatat+0x40/0x90 > > [] ? SYSC_newlstat+0x12/0x30 > > [] ? vfs_write+0x16c/0x200 > > [] ? SyS_write+0x7c/0xb0 > > [] ? system_call_fast_compare_end+0x10/0x15 > > Code: 00 00 00 4d 89 e9 49 c7 c2 ff ff ff ff 49 c1 e9 20 eb 14 0f 1f 84 00 00 00 00 00 48 8b 1b 48 85 db 0f 84 84 00 00 00 4c 8d 63 f8 <8b> 43 fc 48 39 6b 10 75 e7 48 83 7b 08 00 74 e0 89 c2 83 e2 fe > > The trace-back triggered by arch_trigger_all_cpu_backtrace() shows the > following two functions in addition on top of link_path_walk(): > > > [] ? __d_lookup_rcu+0x87/0x150 > > [] ? __inode_permission+0x7b/0xd0 > > [] ? link_path_walk+0x69/0x850 > ... > > Has someone seen similar problems with 3.16? > > If you need more info, just ask. > > Please cc: on replies as I'm not subscribed to linux-fsdevel (but LKML) > > Philipp > -- > Philipp Hahn > Open Source Software Engineer > > Univention GmbH > be open. > Mary-Somerville-Str. 1 > D-28359 Bremen > Tel.: +49 421 22232-0 > Fax : +49 421 22232-99 > hahn@univention.de > > http://www.univention.de/ > Geschäftsführer: Peter H. Ganten > HRB 20755 Amtsgericht Bremen > Steuer-Nr.: 71-597-02876 > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html