* blocked on the spin lock @ 2016-05-02 7:29 Mkrtchyan, Tigran 2016-05-02 13:30 ` Weston Andros Adamson 2016-05-16 19:02 ` Dave Wysochanski 0 siblings, 2 replies; 7+ messages in thread From: Mkrtchyan, Tigran @ 2016-05-02 7:29 UTC (permalink / raw) To: Weston Andros Adamson; +Cc: Linux NFS Mailing List, Steve Dickson, Andy Adamson Hi Dros et al. We have seen the following stack trace on one of our systems: Apr 28 22:09:51 bird510 kernel: BUG: soft lockup - CPU#7 stuck for 67s! [tee:13755] Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] Apr 28 22:09:51 bird510 kernel: CPU 7 Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] Apr 28 22:09:51 bird510 kernel: Apr 28 22:09:51 bird510 kernel: Pid: 13755, comm: tee Tainted: P -- ------------ 2.6.32-573.12.1.el6.x86_64 #1 Dell Inc.
PowerEdge M610/0N582M Apr 28 22:09:51 bird510 kernel: RIP: 0010:[<ffffffff8153bde1>] [<ffffffff8153bde1>] _spin_lock+0x21/0x30 Apr 28 22:09:51 bird510 kernel: RSP: 0018:ffff8803528179c8 EFLAGS: 00000297 Apr 28 22:09:51 bird510 kernel: RAX: 00000000000003b1 RBX: ffff8803528179c8 RCX: 0000000000000000 Apr 28 22:09:51 bird510 kernel: RDX: 00000000000003b0 RSI: ffff880159aeb820 RDI: ffff880159aeb8d0 Apr 28 22:09:51 bird510 kernel: RBP: ffffffff8100bc0e R08: ffff880352817af0 R09: 0000000000000002 Apr 28 22:09:51 bird510 kernel: R10: ffffea0008efc428 R11: 0000000000000003 R12: 0000000000000001 Apr 28 22:09:51 bird510 kernel: R13: 001310facb5806bd R14: 0000000000000000 R15: 0000000000000000 Apr 28 22:09:51 bird510 kernel: FS: 00007f3e78118700(0000) GS:ffff880028260000(0000) knlGS:0000000000000000 Apr 28 22:09:51 bird510 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Apr 28 22:09:51 bird510 kernel: CR2: 00007fefa21ff000 CR3: 0000000352959000 CR4: 00000000000007e0 Apr 28 22:09:51 bird510 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 28 22:09:51 bird510 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 28 22:09:51 bird510 kernel: Process tee (pid: 13755, threadinfo ffff880352814000, task ffff880636064ab0) Apr 28 22:09:51 bird510 kernel: Stack: Apr 28 22:09:51 bird510 kernel: ffff880352817a18 ffffffffa0490dc1 ffff880159aeb8d0 ffff880159aeb770 Apr 28 22:09:51 bird510 kernel: <d> 0000000000000000 0000000000000000 ffffffffa04bd040 ffffffffa048f39a Apr 28 22:09:51 bird510 kernel: <d> ffff8803378761c0 ffffea000a13dcc0 ffff880352817a48 ffffffffa0490ec8 Apr 28 22:09:51 bird510 kernel: Call Trace: Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? nfs_clear_request_commit+0xa1/0xf0 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? nfs_page_find_request_locked+0x2a/0x40 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? nfs_wb_page_cancel+0xb8/0xf0 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? nfs_invalidate_page+0x47/0x80 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? do_invalidatepage+0x25/0x30 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? truncate_inode_page+0xa2/0xc0 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? truncate_inode_pages_range+0x16f/0x500 Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? mempool_free_slab+0x17/0x20 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? truncate_inode_pages+0x15/0x20 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? truncate_pagecache+0x4f/0x70 Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? nfs_setattr_update_inode+0xb9/0x150 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? nfs3_proc_setattr+0xdb/0x120 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? security_inode_permission+0x1f/0x30 Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? strncpy_from_user+0x4a/0x90 Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? 
do_sys_open+0x67/0x130 Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b Apr 28 22:09:51 bird510 kernel: Code: 01 74 05 e8 52 1c d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f Apr 28 22:09:51 bird510 kernel: Call Trace: Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? nfs_clear_request_commit+0xa1/0xf0 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? nfs_page_find_request_locked+0x2a/0x40 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? nfs_wb_page_cancel+0xb8/0xf0 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? nfs_invalidate_page+0x47/0x80 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? do_invalidatepage+0x25/0x30 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? truncate_inode_page+0xa2/0xc0 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? truncate_inode_pages_range+0x16f/0x500 Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? mempool_free_slab+0x17/0x20 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? truncate_inode_pages+0x15/0x20 Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? truncate_pagecache+0x4f/0x70 Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? nfs_setattr_update_inode+0xb9/0x150 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? nfs3_proc_setattr+0xdb/0x120 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? security_inode_permission+0x1f/0x30 Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? strncpy_from_user+0x4a/0x90 Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b The questionable spin lock was removed by commit 411a99adf. Was there a problem? Do we need a backport in the RHEL kernel? Thanks a lot, Tigran. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: blocked on the spin lock 2016-05-02 7:29 blocked on the spin lock Mkrtchyan, Tigran @ 2016-05-02 13:30 ` Weston Andros Adamson 2016-05-02 20:07 ` Mkrtchyan, Tigran 2016-05-16 19:02 ` Dave Wysochanski 1 sibling, 1 reply; 7+ messages in thread From: Weston Andros Adamson @ 2016-05-02 13:30 UTC (permalink / raw) To: Tigran Mkrtchyan; +Cc: linux-nfs list, Steve Dickson, William Andros Adamson > On May 2, 2016, at 3:29 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote: > > > > Hi Dros et. al. > > We have seen the following stack trace on one of out systems: > > > Apr 28 22:09:51 bird510 kernel: BUG: soft lockup - CPU#7 stuck for 67s! [tee:13755] > Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy > dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p > tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] > Apr 28 22:09:51 bird510 kernel: CPU 7 > Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy > dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p > tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] > Apr 28 22:09:51 bird510 kernel: > Apr 28 22:09:51 bird510 kernel: Pid: 13755, comm: tee Tainted: P -- ------------ 2.6.32-573.12.1.el6.x86_64 #1 Dell Inc. 
PowerEdge M610/0N582M > Apr 28 22:09:51 bird510 kernel: RIP: 0010:[<ffffffff8153bde1>] [<ffffffff8153bde1>] _spin_lock+0x21/0x30 > Apr 28 22:09:51 bird510 kernel: RSP: 0018:ffff8803528179c8 EFLAGS: 00000297 > Apr 28 22:09:51 bird510 kernel: RAX: 00000000000003b1 RBX: ffff8803528179c8 RCX: 0000000000000000 > Apr 28 22:09:51 bird510 kernel: RDX: 00000000000003b0 RSI: ffff880159aeb820 RDI: ffff880159aeb8d0 > Apr 28 22:09:51 bird510 kernel: RBP: ffffffff8100bc0e R08: ffff880352817af0 R09: 0000000000000002 > Apr 28 22:09:51 bird510 kernel: R10: ffffea0008efc428 R11: 0000000000000003 R12: 0000000000000001 > Apr 28 22:09:51 bird510 kernel: R13: 001310facb5806bd R14: 0000000000000000 R15: 0000000000000000 > Apr 28 22:09:51 bird510 kernel: FS: 00007f3e78118700(0000) GS:ffff880028260000(0000) knlGS:0000000000000000 > Apr 28 22:09:51 bird510 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > Apr 28 22:09:51 bird510 kernel: CR2: 00007fefa21ff000 CR3: 0000000352959000 CR4: 00000000000007e0 > Apr 28 22:09:51 bird510 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Apr 28 22:09:51 bird510 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Apr 28 22:09:51 bird510 kernel: Process tee (pid: 13755, threadinfo ffff880352814000, task ffff880636064ab0) > Apr 28 22:09:51 bird510 kernel: Stack: > Apr 28 22:09:51 bird510 kernel: ffff880352817a18 ffffffffa0490dc1 ffff880159aeb8d0 ffff880159aeb770 > Apr 28 22:09:51 bird510 kernel: <d> 0000000000000000 0000000000000000 ffffffffa04bd040 ffffffffa048f39a > Apr 28 22:09:51 bird510 kernel: <d> ffff8803378761c0 ffffea000a13dcc0 ffff880352817a48 ffffffffa0490ec8 > Apr 28 22:09:51 bird510 kernel: Call Trace: > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? nfs_clear_request_commit+0xa1/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? nfs_page_find_request_locked+0x2a/0x40 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? nfs_wb_page_cancel+0xb8/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? nfs_invalidate_page+0x47/0x80 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? do_invalidatepage+0x25/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? truncate_inode_page+0xa2/0xc0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? truncate_inode_pages_range+0x16f/0x500 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? mempool_free_slab+0x17/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? truncate_inode_pages+0x15/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? truncate_pagecache+0x4f/0x70 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? nfs_setattr_update_inode+0xb9/0x150 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? nfs3_proc_setattr+0xdb/0x120 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? security_inode_permission+0x1f/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? strncpy_from_user+0x4a/0x90 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? 
alloc_fd+0x92/0x160 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b > Apr 28 22:09:51 bird510 kernel: Code: 01 74 05 e8 52 1c d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f > Apr 28 22:09:51 bird510 kernel: Call Trace: > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? nfs_clear_request_commit+0xa1/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? nfs_page_find_request_locked+0x2a/0x40 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? nfs_wb_page_cancel+0xb8/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? nfs_invalidate_page+0x47/0x80 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? do_invalidatepage+0x25/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? truncate_inode_page+0xa2/0xc0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? truncate_inode_pages_range+0x16f/0x500 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? mempool_free_slab+0x17/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? truncate_inode_pages+0x15/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? truncate_pagecache+0x4f/0x70 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? nfs_setattr_update_inode+0xb9/0x150 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? nfs3_proc_setattr+0xdb/0x120 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? security_inode_permission+0x1f/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? strncpy_from_user+0x4a/0x90 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b > > > > > The questionable spin lock was remove by commit 411a99adf. Is there was a problem? > Do we need a packport in RHEL kernel? > > Thanks a lot, > Tigran. > Hey Tigran, It’s been a while since I looked at this, so I don’t know off the top of my head, but IIRC the patch series that 411a99adf is part of are some important bug fixes to the effort to split sub-page regions: 411a99adffb4 nfs: clear_request_commit while holding i_lock e6cf82d1830f pnfs: add pnfs_put_lseg_async 02d1426c7053 pnfs: find swapped pages on pnfs commit lists too b412ddf0661e nfs: fix comment and add warn_on for PG_INODE_REF e7029206ff43 nfs: check wait_on_bit_lock err in page_group_lock Also, if you are using the pnfs_nfs commit path, there are several recent fixes that are pretty important. 
Until recently, the only server to do NFS-backed pNFS did only STABLE writes, so that stuff was pretty much untested… In fact, I have a patch to post today that fixes a problem in the commit path, although it probably isn’t an issue in RHEL as it was a regression introduced by a recent patch. -dros ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: blocked on the spin lock 2016-05-02 13:30 ` Weston Andros Adamson @ 2016-05-02 20:07 ` Mkrtchyan, Tigran 2016-05-02 21:08 ` Weston Andros Adamson 0 siblings, 1 reply; 7+ messages in thread From: Mkrtchyan, Tigran @ 2016-05-02 20:07 UTC (permalink / raw) To: Weston Andros Adamson; +Cc: linux-nfs list, Steve Dickson, Andy Adamson Hi Dros, as you can see from the stack trace, this is NFSv3. For now, it happened only once. May be there is some change with our workflow. I will let you know, if we see it again. Thanks, Tigran. ----- Original Message ----- > From: "Weston Andros Adamson" <dros@monkey.org> > To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de> > Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>, "Steve Dickson" <steved@redhat.com>, "Andy Adamson" > <william.adamson@netapp.com> > Sent: Monday, May 2, 2016 3:30:29 PM > Subject: Re: blocked on the spin lock >> On May 2, 2016, at 3:29 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote: >> >> >> >> Hi Dros et. al. >> >> We have seen the following stack trace on one of out systems: >> >> >> Apr 28 22:09:51 bird510 kernel: BUG: soft lockup - CPU#7 stuck for 67s! >> [tee:13755] >> Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu >> nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat >> usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc >> cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy >> dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support >> bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 >> mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib >> ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p >> tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: >> ipmi_devintf] >> Apr 28 22:09:51 bird510 kernel: CPU 7 >> Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu >> nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat >> usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc >> cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy >> dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support >> bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 >> mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib >> ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p >> tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: >> ipmi_devintf] >> Apr 28 22:09:51 bird510 kernel: >> Apr 28 22:09:51 bird510 kernel: Pid: 13755, comm: tee Tainted: P -- >> ------------ 2.6.32-573.12.1.el6.x86_64 #1 Dell Inc. 
PowerEdge M610/0N582M >> Apr 28 22:09:51 bird510 kernel: RIP: 0010:[<ffffffff8153bde1>] >> [<ffffffff8153bde1>] _spin_lock+0x21/0x30 >> Apr 28 22:09:51 bird510 kernel: RSP: 0018:ffff8803528179c8 EFLAGS: 00000297 >> Apr 28 22:09:51 bird510 kernel: RAX: 00000000000003b1 RBX: ffff8803528179c8 RCX: >> 0000000000000000 >> Apr 28 22:09:51 bird510 kernel: RDX: 00000000000003b0 RSI: ffff880159aeb820 RDI: >> ffff880159aeb8d0 >> Apr 28 22:09:51 bird510 kernel: RBP: ffffffff8100bc0e R08: ffff880352817af0 R09: >> 0000000000000002 >> Apr 28 22:09:51 bird510 kernel: R10: ffffea0008efc428 R11: 0000000000000003 R12: >> 0000000000000001 >> Apr 28 22:09:51 bird510 kernel: R13: 001310facb5806bd R14: 0000000000000000 R15: >> 0000000000000000 >> Apr 28 22:09:51 bird510 kernel: FS: 00007f3e78118700(0000) >> GS:ffff880028260000(0000) knlGS:0000000000000000 >> Apr 28 22:09:51 bird510 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: >> 000000008005003b >> Apr 28 22:09:51 bird510 kernel: CR2: 00007fefa21ff000 CR3: 0000000352959000 CR4: >> 00000000000007e0 >> Apr 28 22:09:51 bird510 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> Apr 28 22:09:51 bird510 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: >> 0000000000000400 >> Apr 28 22:09:51 bird510 kernel: Process tee (pid: 13755, threadinfo >> ffff880352814000, task ffff880636064ab0) >> Apr 28 22:09:51 bird510 kernel: Stack: >> Apr 28 22:09:51 bird510 kernel: ffff880352817a18 ffffffffa0490dc1 >> ffff880159aeb8d0 ffff880159aeb770 >> Apr 28 22:09:51 bird510 kernel: <d> 0000000000000000 0000000000000000 >> ffffffffa04bd040 ffffffffa048f39a >> Apr 28 22:09:51 bird510 kernel: <d> ffff8803378761c0 ffffea000a13dcc0 >> ffff880352817a48 ffffffffa0490ec8 >> Apr 28 22:09:51 bird510 kernel: Call Trace: >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? >> nfs_clear_request_commit+0xa1/0xf0 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? >> nfs_page_find_request_locked+0x2a/0x40 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? >> nfs_wb_page_cancel+0xb8/0xf0 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? >> nfs_invalidate_page+0x47/0x80 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? >> do_invalidatepage+0x25/0x30 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? >> truncate_inode_page+0xa2/0xc0 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? >> truncate_inode_pages_range+0x16f/0x500 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? >> mempool_free_slab+0x17/0x20 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? >> truncate_inode_pages+0x15/0x20 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? >> truncate_pagecache+0x4f/0x70 >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? >> nfs_setattr_update_inode+0xb9/0x150 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? >> nfs3_proc_setattr+0xdb/0x120 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 >> [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 >> [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? >> security_inode_permission+0x1f/0x30 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? 
do_filp_open+0x861/0xd20 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? >> strncpy_from_user+0x4a/0x90 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? >> system_call_fastpath+0x16/0x1b >> Apr 28 22:09:51 bird510 kernel: Code: 01 74 05 e8 52 1c d6 ff c9 c3 55 48 89 e5 >> 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 >> 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f >> Apr 28 22:09:51 bird510 kernel: Call Trace: >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? >> nfs_clear_request_commit+0xa1/0xf0 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? >> nfs_page_find_request_locked+0x2a/0x40 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? >> nfs_wb_page_cancel+0xb8/0xf0 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? >> nfs_invalidate_page+0x47/0x80 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? >> do_invalidatepage+0x25/0x30 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? >> truncate_inode_page+0xa2/0xc0 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? >> truncate_inode_pages_range+0x16f/0x500 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? >> mempool_free_slab+0x17/0x20 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? >> truncate_inode_pages+0x15/0x20 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? >> truncate_pagecache+0x4f/0x70 >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? >> nfs_setattr_update_inode+0xb9/0x150 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? >> nfs3_proc_setattr+0xdb/0x120 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 >> [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 >> [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 >> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? >> security_inode_permission+0x1f/0x30 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? >> strncpy_from_user+0x4a/0x90 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 >> Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? >> system_call_fastpath+0x16/0x1b >> >> >> >> >> The questionable spin lock was remove by commit 411a99adf. Is there was a >> problem? >> Do we need a packport in RHEL kernel? >> >> Thanks a lot, >> Tigran. 
>> > > Hey Tigran, > > It’s been a while since I looked at this, so I don’t know off the top of my > head, but IIRC the patch series that 411a99adf is part of are some important > bug fixes to the effort to split sub-page regions: > > 411a99adffb4 nfs: clear_request_commit while holding i_lock > e6cf82d1830f pnfs: add pnfs_put_lseg_async > 02d1426c7053 pnfs: find swapped pages on pnfs commit lists too > b412ddf0661e nfs: fix comment and add warn_on for PG_INODE_REF > e7029206ff43 nfs: check wait_on_bit_lock err in page_group_lock > > Also, if you are using the pnfs_nfs commit path, there are several recent fixes > that are pretty important. Until recently, the only server to do NFS backed > pnfs did only STABLE writes, so that stuff was pretty much untested… > > In fact, I have a patch to post today that fixes a problem in the commit path, > although it probably isn’t an issue in RHEL as it was a regression introduced > by a recent patch. > > -dros ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: blocked on the spin lock 2016-05-02 20:07 ` Mkrtchyan, Tigran @ 2016-05-02 21:08 ` Weston Andros Adamson 0 siblings, 0 replies; 7+ messages in thread From: Weston Andros Adamson @ 2016-05-02 21:08 UTC (permalink / raw) To: Tigran Mkrtchyan; +Cc: linux-nfs list, Steve Dickson, William Andros Adamson > On May 2, 2016, at 4:07 PM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote: > > Hi Dros, > > as you can see from the stack trace, this is NFSv3. Oh, I didn't notice that! > For now, it happened only once. May be there is some change with > our workflow. I will let you know, if we see it again. OK. If you do, try applying that patch set and see if it’s still reproducible. -dros > > > Thanks, > Tigran. > > ----- Original Message ----- >> From: "Weston Andros Adamson" <dros@monkey.org> >> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de> >> Cc: "linux-nfs list" <linux-nfs@vger.kernel.org>, "Steve Dickson" <steved@redhat.com>, "Andy Adamson" >> <william.adamson@netapp.com> >> Sent: Monday, May 2, 2016 3:30:29 PM >> Subject: Re: blocked on the spin lock > >>> On May 2, 2016, at 3:29 AM, Mkrtchyan, Tigran <tigran.mkrtchyan@desy.de> wrote: >>> >>> >>> >>> Hi Dros et. al. >>> >>> We have seen the following stack trace on one of out systems: >>> >>> >>> Apr 28 22:09:51 bird510 kernel: BUG: soft lockup - CPU#7 stuck for 67s! >>> [tee:13755] >>> Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu >>> nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat >>> usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc >>> cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy >>> dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support >>> bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 >>> mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib >>> ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p >>> tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: >>> ipmi_devintf] >>> Apr 28 22:09:51 bird510 kernel: CPU 7 >>> Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu >>> nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat >>> usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc >>> cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy >>> dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support >>> bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 >>> mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib >>> ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p >>> tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: >>> ipmi_devintf] >>> Apr 28 22:09:51 bird510 kernel: >>> Apr 28 22:09:51 bird510 kernel: Pid: 13755, comm: tee Tainted: P -- >>> ------------ 2.6.32-573.12.1.el6.x86_64 #1 Dell Inc. 
PowerEdge M610/0N582M >>> Apr 28 22:09:51 bird510 kernel: RIP: 0010:[<ffffffff8153bde1>] >>> [<ffffffff8153bde1>] _spin_lock+0x21/0x30 >>> Apr 28 22:09:51 bird510 kernel: RSP: 0018:ffff8803528179c8 EFLAGS: 00000297 >>> Apr 28 22:09:51 bird510 kernel: RAX: 00000000000003b1 RBX: ffff8803528179c8 RCX: >>> 0000000000000000 >>> Apr 28 22:09:51 bird510 kernel: RDX: 00000000000003b0 RSI: ffff880159aeb820 RDI: >>> ffff880159aeb8d0 >>> Apr 28 22:09:51 bird510 kernel: RBP: ffffffff8100bc0e R08: ffff880352817af0 R09: >>> 0000000000000002 >>> Apr 28 22:09:51 bird510 kernel: R10: ffffea0008efc428 R11: 0000000000000003 R12: >>> 0000000000000001 >>> Apr 28 22:09:51 bird510 kernel: R13: 001310facb5806bd R14: 0000000000000000 R15: >>> 0000000000000000 >>> Apr 28 22:09:51 bird510 kernel: FS: 00007f3e78118700(0000) >>> GS:ffff880028260000(0000) knlGS:0000000000000000 >>> Apr 28 22:09:51 bird510 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: >>> 000000008005003b >>> Apr 28 22:09:51 bird510 kernel: CR2: 00007fefa21ff000 CR3: 0000000352959000 CR4: >>> 00000000000007e0 >>> Apr 28 22:09:51 bird510 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>> 0000000000000000 >>> Apr 28 22:09:51 bird510 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: >>> 0000000000000400 >>> Apr 28 22:09:51 bird510 kernel: Process tee (pid: 13755, threadinfo >>> ffff880352814000, task ffff880636064ab0) >>> Apr 28 22:09:51 bird510 kernel: Stack: >>> Apr 28 22:09:51 bird510 kernel: ffff880352817a18 ffffffffa0490dc1 >>> ffff880159aeb8d0 ffff880159aeb770 >>> Apr 28 22:09:51 bird510 kernel: <d> 0000000000000000 0000000000000000 >>> ffffffffa04bd040 ffffffffa048f39a >>> Apr 28 22:09:51 bird510 kernel: <d> ffff8803378761c0 ffffea000a13dcc0 >>> ffff880352817a48 ffffffffa0490ec8 >>> Apr 28 22:09:51 bird510 kernel: Call Trace: >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? >>> nfs_clear_request_commit+0xa1/0xf0 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? >>> nfs_page_find_request_locked+0x2a/0x40 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? >>> nfs_wb_page_cancel+0xb8/0xf0 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? >>> nfs_invalidate_page+0x47/0x80 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? >>> do_invalidatepage+0x25/0x30 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? >>> truncate_inode_page+0xa2/0xc0 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? >>> truncate_inode_pages_range+0x16f/0x500 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? >>> mempool_free_slab+0x17/0x20 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? >>> truncate_inode_pages+0x15/0x20 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? >>> truncate_pagecache+0x4f/0x70 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? >>> nfs_setattr_update_inode+0xb9/0x150 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? >>> nfs3_proc_setattr+0xdb/0x120 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 >>> [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 >>> [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? 
>>> security_inode_permission+0x1f/0x30 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? >>> strncpy_from_user+0x4a/0x90 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? >>> system_call_fastpath+0x16/0x1b >>> Apr 28 22:09:51 bird510 kernel: Code: 01 74 05 e8 52 1c d6 ff c9 c3 55 48 89 e5 >>> 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 >>> 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f >>> Apr 28 22:09:51 bird510 kernel: Call Trace: >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? >>> nfs_clear_request_commit+0xa1/0xf0 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? >>> nfs_page_find_request_locked+0x2a/0x40 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? >>> nfs_wb_page_cancel+0xb8/0xf0 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? >>> nfs_invalidate_page+0x47/0x80 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? >>> do_invalidatepage+0x25/0x30 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? >>> truncate_inode_page+0xa2/0xc0 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? >>> truncate_inode_pages_range+0x16f/0x500 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? >>> mempool_free_slab+0x17/0x20 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? >>> truncate_inode_pages+0x15/0x20 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? >>> truncate_pagecache+0x4f/0x70 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? >>> nfs_setattr_update_inode+0xb9/0x150 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? >>> nfs3_proc_setattr+0xdb/0x120 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 >>> [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 >>> [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? >>> security_inode_permission+0x1f/0x30 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? >>> strncpy_from_user+0x4a/0x90 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 >>> Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? >>> system_call_fastpath+0x16/0x1b >>> >>> >>> >>> >>> The questionable spin lock was remove by commit 411a99adf. Is there was a >>> problem? >>> Do we need a packport in RHEL kernel? >>> >>> Thanks a lot, >>> Tigran. 
>>> >> >> Hey Tigran, >> >> It’s been a while since I looked at this, so I don’t know off the top of my >> head, but IIRC the patch series that 411a99adf is part of are some important >> bug fixes to the effort to split sub-page regions: >> >> 411a99adffb4 nfs: clear_request_commit while holding i_lock >> e6cf82d1830f pnfs: add pnfs_put_lseg_async >> 02d1426c7053 pnfs: find swapped pages on pnfs commit lists too >> b412ddf0661e nfs: fix comment and add warn_on for PG_INODE_REF >> e7029206ff43 nfs: check wait_on_bit_lock err in page_group_lock >> >> Also, if you are using the pnfs_nfs commit path, there are several recent fixes >> that are pretty important. Until recently, the only server to do NFS backed >> pnfs did only STABLE writes, so that stuff was pretty much untested… >> >> In fact, I have a patch to post today that fixes a problem in the commit path, >> although it probably isn’t an issue in RHEL as it was a regression introduced >> by a recent patch. >> >> -dros ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: blocked on the spin lock 2016-05-02 7:29 blocked on the spin lock Mkrtchyan, Tigran 2016-05-02 13:30 ` Weston Andros Adamson @ 2016-05-16 19:02 ` Dave Wysochanski 2016-05-16 19:15 ` Weston Andros Adamson 2016-05-16 20:59 ` Mkrtchyan, Tigran 1 sibling, 2 replies; 7+ messages in thread From: Dave Wysochanski @ 2016-05-16 19:02 UTC (permalink / raw) To: Mkrtchyan, Tigran Cc: Weston Andros Adamson, Linux NFS Mailing List, Steve Dickson, Andy Adamson On Mon, 2016-05-02 at 09:29 +0200, Mkrtchyan, Tigran wrote: > > Hi Dros et. al. > > We have seen the following stack trace on one of out systems: > > > Apr 28 22:09:51 bird510 kernel: BUG: soft lockup - CPU#7 stuck for 67s! [tee:13755] > Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy > dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p > tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] > Apr 28 22:09:51 bird510 kernel: CPU 7 > Apr 28 22:09:51 bird510 kernel: Modules linked in: ipmi_devintf dell_rbu nfs_layout_nfsv41_files fuse nfs lockd fscache auth_rpcgss nfs_acl vfat fat usb_storage mpt3sas mpt2sas raid_class mptctl openafs(P)(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf sg joy > dev power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support bnx2 microcode dcdbas lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en p > tp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf] > Apr 28 22:09:51 bird510 kernel: > Apr 28 22:09:51 bird510 kernel: Pid: 13755, comm: tee Tainted: P -- ------------ 2.6.32-573.12.1.el6.x86_64 #1 Dell Inc. 
PowerEdge M610/0N582M > Apr 28 22:09:51 bird510 kernel: RIP: 0010:[<ffffffff8153bde1>] [<ffffffff8153bde1>] _spin_lock+0x21/0x30 > Apr 28 22:09:51 bird510 kernel: RSP: 0018:ffff8803528179c8 EFLAGS: 00000297 > Apr 28 22:09:51 bird510 kernel: RAX: 00000000000003b1 RBX: ffff8803528179c8 RCX: 0000000000000000 > Apr 28 22:09:51 bird510 kernel: RDX: 00000000000003b0 RSI: ffff880159aeb820 RDI: ffff880159aeb8d0 > Apr 28 22:09:51 bird510 kernel: RBP: ffffffff8100bc0e R08: ffff880352817af0 R09: 0000000000000002 > Apr 28 22:09:51 bird510 kernel: R10: ffffea0008efc428 R11: 0000000000000003 R12: 0000000000000001 > Apr 28 22:09:51 bird510 kernel: R13: 001310facb5806bd R14: 0000000000000000 R15: 0000000000000000 > Apr 28 22:09:51 bird510 kernel: FS: 00007f3e78118700(0000) GS:ffff880028260000(0000) knlGS:0000000000000000 > Apr 28 22:09:51 bird510 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > Apr 28 22:09:51 bird510 kernel: CR2: 00007fefa21ff000 CR3: 0000000352959000 CR4: 00000000000007e0 > Apr 28 22:09:51 bird510 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Apr 28 22:09:51 bird510 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Apr 28 22:09:51 bird510 kernel: Process tee (pid: 13755, threadinfo ffff880352814000, task ffff880636064ab0) > Apr 28 22:09:51 bird510 kernel: Stack: > Apr 28 22:09:51 bird510 kernel: ffff880352817a18 ffffffffa0490dc1 ffff880159aeb8d0 ffff880159aeb770 > Apr 28 22:09:51 bird510 kernel: <d> 0000000000000000 0000000000000000 ffffffffa04bd040 ffffffffa048f39a > Apr 28 22:09:51 bird510 kernel: <d> ffff8803378761c0 ffffea000a13dcc0 ffff880352817a48 ffffffffa0490ec8 > Apr 28 22:09:51 bird510 kernel: Call Trace: > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? nfs_clear_request_commit+0xa1/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? nfs_page_find_request_locked+0x2a/0x40 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? nfs_wb_page_cancel+0xb8/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? nfs_invalidate_page+0x47/0x80 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? do_invalidatepage+0x25/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? truncate_inode_page+0xa2/0xc0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? truncate_inode_pages_range+0x16f/0x500 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? mempool_free_slab+0x17/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? truncate_inode_pages+0x15/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? truncate_pagecache+0x4f/0x70 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? nfs_setattr_update_inode+0xb9/0x150 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? nfs3_proc_setattr+0xdb/0x120 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? security_inode_permission+0x1f/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? strncpy_from_user+0x4a/0x90 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? 
alloc_fd+0x92/0x160 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b > Apr 28 22:09:51 bird510 kernel: Code: 01 74 05 e8 52 1c d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f > Apr 28 22:09:51 bird510 kernel: Call Trace: > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490dc1>] ? nfs_clear_request_commit+0xa1/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f39a>] ? nfs_page_find_request_locked+0x2a/0x40 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0490ec8>] ? nfs_wb_page_cancel+0xb8/0xf0 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa047e4f7>] ? nfs_invalidate_page+0x47/0x80 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113edb5>] ? do_invalidatepage+0x25/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f0d2>] ? truncate_inode_page+0xa2/0xc0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f47f>] ? truncate_inode_pages_range+0x16f/0x500 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81129cd7>] ? mempool_free_slab+0x17/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8a5>] ? truncate_inode_pages+0x15/0x20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8113f8ff>] ? truncate_pagecache+0x4f/0x70 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480eb9>] ? nfs_setattr_update_inode+0xb9/0x150 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0493a6b>] ? nfs3_proc_setattr+0xdb/0x120 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa048f6b3>] ? nfs_wb_all+0x43/0x50 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0482610>] ? nfs_setattr+0xf0/0x170 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1118>] ? notify_change+0x168/0x340 > Apr 28 22:09:51 bird510 kernel: [<ffffffffa0480de8>] ? nfs_open+0x78/0x90 [nfs] > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118fcb4>] ? do_truncate+0x64/0xa0 > Apr 28 22:09:51 bird510 kernel: [<ffffffff81232b9f>] ? security_inode_permission+0x1f/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811a4fc1>] ? do_filp_open+0x861/0xd20 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8129df8a>] ? strncpy_from_user+0x4a/0x90 > Apr 28 22:09:51 bird510 kernel: [<ffffffff811b1ff2>] ? alloc_fd+0x92/0x160 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118e967>] ? do_sys_open+0x67/0x130 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8118ea70>] ? sys_open+0x20/0x30 > Apr 28 22:09:51 bird510 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b > > > > > The questionable spin lock was remove by commit 411a99adf. Is there was a problem? > Do we need a packport in RHEL kernel? > This is a known bug 1322823. It should be fixed in RHEL6.7 kernel-2.6.32-573.27.1.el6 or above as well as RHEL6.8 kernel-2.6.32-642.el6 or higher. It only affects kernels 2.6.32-573.10.2.el6 until kernel-2.6.32-573.27.1.el6. The bug is private but see the description below. This bug was an unfortunate side-effect of a fix for another nasty bug Bug 1135601 - System panics with "kernel BUG at fs/nfs/inode.c:113! But 1135601 was fixed by a RHEL6 equivalent of the following upstream patch: 3e21704 nfs: handle multiple reqs in nfs_wb_page_cancel Saving &inode->i_lock in an NFS specific structure seems problematic which maybe why it was missed in the backport, and partially removed in commit 411a99adf. 
I posted a related cleanup patch in this area https://www.spinics.net/lists/linux-nfs/msg57194.html Bug 1322823 - RHEL6.7.z: nfs client kernel deadlock due to inode->i_lock == nfs_commit_info.lock in code path nfs_wb_page_cancel -> nfs_clear_request_commit due to patch to fix bug 1135601 Dave Wysochanski 2016-03-31 07:50:18 EDT Description of problem: Repeated kernel deadlock hit multiple times by a customer, and caused by the current fix to bug 1135601. Basically the NFS client / RPC deadlocks due to a spinlock being taken twice. We have vmcores. I think in the fix for 1135601 the following "finer point" of the NFS client was missed: So inode->i_lock == nfs_commit_info.lock? If that's true it may be ok but it's not obvious and it's unclear why the code isn't just using inode->i_lock but apparently copying the address into nfs_commit_info.lock ? Indeed looking at the code we have: ~~~ static void nfs_init_cinfo_from_inode(struct nfs_commit_info *cinfo, struct inode *inode) { cinfo->lock = &inode->i_lock; cinfo->mds = &NFS_I(inode)->commit_info; cinfo->ds = pnfs_get_ds_info(inode); cinfo->dreq = NULL; cinfo->completion_ops = &nfs_commit_completion_ops; } "fs/nfs/write.c" 1927L, 50230C ~~~ == Technical explanation of deadlock seen in vmcore == Ok with the understanding that inode->i_lock == nfs_commit_info.lock, take a look at the pid 11305 ~~~ crash> bt PID: 11305 TASK: ffff881c7715aab0 CPU: 8 COMMAND: "sas" #0 [ffff881078806e90] crash_nmi_callback at ffffffff81033cf6 #1 [ffff881078806ea0] notifier_call_chain at ffffffff8153f545 #2 [ffff881078806ee0] atomic_notifier_call_chain at ffffffff8153f5aa #3 [ffff881078806ef0] notify_die at ffffffff810a783e #4 [ffff881078806f20] do_nmi at ffffffff8153d203 #5 [ffff881078806f50] nmi at ffffffff8153cac0 [exception RIP: _spin_lock+0x1c] RIP: ffffffff8153c32c RSP: ffff88104032fb58 RFLAGS: 00000297 RAX: 00000000000009c1 RBX: ffff880a82b84bc0 RCX: 0000000000000000 RDX: 00000000000009c0 RSI: ffff8809ff681820 RDI: ffff8809ff6818d0 RBP: ffff88104032fb58 R8: ffff88104032fc80 R9: 0000000000000002 R10: ffffea0022eec790 R11: 0000000000000003 R12: ffff88104032fb68 R13: 0000000000000000 R14: ffff8809ff6818d0 R15: 0000000000000020 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #6 [ffff88104032fb58] _spin_lock at ffffffff8153c32c #7 [ffff88104032fb60] nfs_clear_request_commit at ffffffffa03e9dc1 [nfs] #8 [ffff88104032fbb0] nfs_wb_page_cancel at ffffffffa03e9ec8 [nfs] #9 [ffff88104032fbe0] nfs_invalidate_page at ffffffffa03d74f7 [nfs] #10 [ffff88104032fc00] do_invalidatepage at ffffffff8113edb5 #11 [ffff88104032fc10] truncate_inode_page at ffffffff8113f0d2 #12 [ffff88104032fc30] truncate_inode_pages_range at ffffffff8113f47f #13 [ffff88104032fd20] truncate_inode_pages at ffffffff8113f8a5 #14 [ffff88104032fd30] truncate_pagecache at ffffffff8113f8ff #15 [ffff88104032fd60] nfs_setattr_update_inode at ffffffffa03d9eb9 [nfs] #16 [ffff88104032fd90] nfs3_proc_setattr at ffffffffa03eca6b [nfs] #17 [ffff88104032fe20] nfs_setattr at ffffffffa03db610 [nfs] #18 [ffff88104032fe60] notify_change at ffffffff811b1338 #19 [ffff88104032fed0] do_truncate at ffffffff8118fee4 #20 [ffff88104032ff40] sys_ftruncate at ffffffff81190290 #21 [ffff88104032ff80] system_call_fastpath at ffffffff8100b0d2 RIP: 00002b53f548a4d7 RSP: 00002b543fb36b50 RFLAGS: 00010246 RAX: 000000000000004d RBX: ffffffff8100b0d2 RCX: 00002b5426a29c48 RDX: 00002b543f11a880 RSI: 0000000000020000 RDI: 0000000000000028 RBP: 00002b543f11a880 R8: 0000000000010000 R9: 00002b544e8a63b0 R10: 
000000000000002a R11: 0000000000000206 R12: 00002b543f11ad88 R13: 00002b53f9ae7fe0 R14: 00002b53fa43af20 R15: 00002b543f11a880 ORIG_RAX: 000000000000004d CS: 0033 SS: 002b crash> dis -lr ffffffffa03e9ec8 | tail 0xffffffffa03e9eb6 <nfs_wb_page_cancel+0xa6>: pop %r13 0xffffffffa03e9eb8 <nfs_wb_page_cancel+0xa8>: pop %r14 0xffffffffa03e9eba <nfs_wb_page_cancel+0xaa>: leaveq 0xffffffffa03e9ebb <nfs_wb_page_cancel+0xab>: retq 0xffffffffa03e9ebc <nfs_wb_page_cancel+0xac>: nopl 0x0(%rax) /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/fs/nfs/write.c: 1787 0xffffffffa03e9ec0 <nfs_wb_page_cancel+0xb0>: mov %rbx,%rdi 0xffffffffa03e9ec3 <nfs_wb_page_cancel+0xb3>: callq 0xffffffffa03e9d20 <nfs_clear_request_commit> /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/arch/x86/include/asm/spinlock.h: 174 0xffffffffa03e9ec8 <nfs_wb_page_cancel+0xb8>: mov %r14,%rax crash> dis -lr ffffffffa03e9dc1 | tail 0xffffffffa03e9dab <nfs_clear_request_commit+0x8b>: mov -0x10(%rbp),%rbx 0xffffffffa03e9daf <nfs_clear_request_commit+0x8f>: mov -0x8(%rbp),%r12 0xffffffffa03e9db3 <nfs_clear_request_commit+0x93>: leaveq 0xffffffffa03e9db4 <nfs_clear_request_commit+0x94>: retq 0xffffffffa03e9db5 <nfs_clear_request_commit+0x95>: nopl (%rax) /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/fs/nfs/write.c: 558 0xffffffffa03e9db8 <nfs_clear_request_commit+0x98>: mov -0x40(%rbp),%rdi 0xffffffffa03e9dbc <nfs_clear_request_commit+0x9c>: callq 0xffffffff8153c310 <_spin_lock> /usr/src/debug/kernel-2.6.32-573.18.1.el6/linux-2.6.32-573.18.1.el6.x86_64/arch/x86/include/asm/bitops.h: 252 0xffffffffa03e9dc1 <nfs_clear_request_commit+0xa1>: lock btrl $0x2,0x40(%rbx) ~~~ Now look at the code, specifically nfs_wb_page_cancel and nfs_clear_request_commit. On line 1780 we take inode->i_lock. ~~~ fs/nfs/write.c 1771 1772 int nfs_wb_page_cancel(struct inode *inode, struct page *page) 1773 { 1774 struct nfs_page *req; 1775 int ret = 0; 1776 1777 BUG_ON(!PageLocked(page)); 1778 for (;;) { 1779 wait_on_page_writeback(page); 1780 spin_lock(&inode->i_lock); <----------- Takes spinlock the first time 1781 req = nfs_page_find_request_locked(page); 1782 if (req == NULL) { 1783 spin_unlock(&inode->i_lock); 1784 break; 1785 } 1786 if (nfs_lock_request(req)) { 1787 nfs_clear_request_commit(req); <---------- executing inside here 1788 spin_unlock(&inode->i_lock); 549 static void 550 nfs_clear_request_commit(struct nfs_page *req) 551 { 552 if (test_bit(PG_CLEAN, &req->wb_flags)) { 553 struct inode *inode = req->wb_context->dentry->d_inode; 554 struct nfs_commit_info cinfo; 555 556 nfs_init_cinfo_from_inode(&cinfo, inode); 557 if (!pnfs_clear_request_commit(req, &cinfo)) { 558--> spin_lock(cinfo.lock); <-------------- Tries to take spinlock a second time 559 nfs_request_remove_commit_list(req, &cinfo); 560 spin_unlock(cinfo.lock); 561 } 562 nfs_clear_page_commit(req->wb_page); 563 } 564 } ~~~ Version-Release number of selected component (if applicable): 2.6.32-573.18.1.el6 How reproducible: multiple times by one customer so far Steps to Reproduce: TBD Actual results: NFS client system deadlocks ^ permalink raw reply [flat|nested] 7+ messages in thread
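To see the reported pattern in isolation, here is a minimal userspace sketch (not the RHEL or upstream NFS code; the struct and function names such as fake_inode, fake_cinfo and wb_page_cancel are invented for illustration) of how stashing &inode->i_lock behind a second pointer lets a callee try to acquire a non-recursive spinlock that its caller already holds. pthread_spin_trylock is used so the demo prints a diagnostic instead of spinning forever the way the soft-lockup report above does:

~~~
/*
 * Illustrative only: mirrors the locking pattern from the analysis
 * above using POSIX spinlocks, not the actual kernel implementation.
 * Build with: cc demo.c -o demo -lpthread
 */
#include <pthread.h>
#include <stdio.h>

struct fake_inode {
	pthread_spinlock_t i_lock;
};

struct fake_cinfo {
	pthread_spinlock_t *lock;	/* aliases &inode->i_lock */
};

static void init_cinfo_from_inode(struct fake_cinfo *cinfo,
				  struct fake_inode *inode)
{
	cinfo->lock = &inode->i_lock;	/* the problematic aliasing */
}

/* Callee: takes what looks like "its own" lock through the alias. */
static void clear_request_commit(struct fake_inode *inode)
{
	struct fake_cinfo cinfo;

	init_cinfo_from_inode(&cinfo, inode);
	/* trylock so the demo reports the problem instead of hanging */
	if (pthread_spin_trylock(cinfo.lock) != 0) {
		printf("would deadlock: cinfo.lock is the caller's i_lock "
		       "and is already held\n");
		return;
	}
	/* ... remove the request from the commit list ... */
	pthread_spin_unlock(cinfo.lock);
}

/* Caller: holds i_lock across the call, as in the backtrace above. */
static void wb_page_cancel(struct fake_inode *inode)
{
	pthread_spin_lock(&inode->i_lock);	/* first acquisition */
	clear_request_commit(inode);		/* second acquisition inside */
	pthread_spin_unlock(&inode->i_lock);
}

int main(void)
{
	struct fake_inode inode;

	pthread_spin_init(&inode.i_lock, PTHREAD_PROCESS_PRIVATE);
	wb_page_cancel(&inode);
	pthread_spin_destroy(&inode.i_lock);
	return 0;
}
~~~

As the subject of the upstream fix ("nfs: clear_request_commit while holding i_lock") suggests, the way out is to stop re-acquiring the lock through the alias and instead let the callee operate on the commit list with the caller's i_lock already held.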
* Re: blocked on the spin lock
  2016-05-16 19:02 ` Dave Wysochanski
@ 2016-05-16 19:15   ` Weston Andros Adamson
  2016-05-16 20:59   ` Mkrtchyan, Tigran
  1 sibling, 0 replies; 7+ messages in thread
From: Weston Andros Adamson @ 2016-05-16 19:15 UTC (permalink / raw)
To: dwysocha
Cc: Tigran Mkrtchyan, linux-nfs list, Steve Dickson, William Andros Adamson

> On May 16, 2016, at 3:02 PM, Dave Wysochanski <dwysocha@redhat.com> wrote:
>
> On Mon, 2016-05-02 at 09:29 +0200, Mkrtchyan, Tigran wrote:
>>
>> Hi Dros et. al.
>>
>> We have seen the following stack trace on one of out systems:
>>
>> [soft-lockup trace trimmed -- it is quoted in full in the original
>>  report at the top of this thread]
>>
>> The questionable spin lock was remove by commit 411a99adf. Is there was a problem?
>> Do we need a packport in RHEL kernel?
>>
>
> This is a known bug 1322823.  It should be fixed in RHEL6.7
> kernel-2.6.32-573.27.1.el6 or above as well as RHEL6.8
> kernel-2.6.32-642.el6 or higher.  It only affects kernels
> 2.6.32-573.10.2.el6 until kernel-2.6.32-573.27.1.el6.  The bug is
> private but see the description below.

Thanks for the detailed explanation!

-dros

> This bug was an unfortunate side-effect of a fix for another nasty bug
> Bug 1135601 - System panics with "kernel BUG at fs/nfs/inode.c:113!"
> But 1135601 was fixed by a RHEL6 equivalent of the following upstream
> patch: 3e21704 nfs: handle multiple reqs in nfs_wb_page_cancel
>
> Saving &inode->i_lock in an NFS specific structure seems problematic
> which maybe why it was missed in the backport, and partially removed in
> commit 411a99adf.  I posted a related cleanup patch in this area
> https://www.spinics.net/lists/linux-nfs/msg57194.html
>
> [quoted Bugzilla description of Bug 1322823 -- including the vmcore
>  analysis and the fs/nfs/write.c excerpts already shown in Dave's
>  message above -- trimmed]

-dros

^ permalink raw reply [flat|nested] 7+ messages in thread
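Dave's remarks about commit 411a99adf and the cleanup patch point at the general shape of the remedy: make the locking contract explicit so that a helper never re-acquires a lock its caller already holds, instead of stashing &inode->i_lock behind another name. The sketch below is a hypothetical userspace illustration of that "caller holds the lock" convention, in the same terms as the earlier example; it is not the actual upstream patch, and the function names are invented.

~~~
/*
 * Sketch of the explicit-contract fix: the helper is documented (and
 * named) as requiring the lock to be held by its caller, so the only
 * acquisition happens once in the outer function and the self-deadlock
 * cannot occur.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_spinlock_t i_lock;

/* Contract: the caller must already hold i_lock. */
static void remove_commit_locked(const char *page)
{
	printf("removing %s from the commit list (lock held by caller)\n", page);
}

static void wb_page_cancel(const char *page)
{
	pthread_spin_lock(&i_lock);      /* the one and only acquisition */
	remove_commit_locked(page);
	pthread_spin_unlock(&i_lock);
}

int main(void)
{
	pthread_spin_init(&i_lock, PTHREAD_PROCESS_PRIVATE);
	wb_page_cancel("page 0");
	return 0;
}
~~~

The kernel uses the same idea in foo()/__foo_locked() pairs, often with lockdep_assert_held() or assert_spin_locked() inside the locked variant so that a violated contract is caught loudly rather than by a silent spin.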
* Re: blocked on the spin lock
  2016-05-16 19:02 ` Dave Wysochanski
  2016-05-16 19:15   ` Weston Andros Adamson
@ 2016-05-16 20:59   ` Mkrtchyan, Tigran
  1 sibling, 0 replies; 7+ messages in thread
From: Mkrtchyan, Tigran @ 2016-05-16 20:59 UTC (permalink / raw)
To: dwysocha
Cc: Weston Andros Adamson, Linux NFS Mailing List, Steve Dickson, Andy Adamson

Hi Dave,

thanks a lot for the update!

Tigran.

----- Original Message -----
> From: "Dave Wysochanski" <dwysocha@redhat.com>
> To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> Cc: "Weston Andros Adamson" <dros@primarydata.com>, "Linux NFS Mailing List" <linux-nfs@vger.kernel.org>,
>     "Steve Dickson" <steved@redhat.com>, "Andy Adamson" <william.adamson@netapp.com>
> Sent: Monday, May 16, 2016 9:02:50 PM
> Subject: Re: blocked on the spin lock
>
> [full quote of Dave's reply and of the original soft-lockup report
>  trimmed; see the messages above]

^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads: [~2016-05-16 20:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
2016-05-02  7:29 blocked on the spin lock  Mkrtchyan, Tigran
2016-05-02 13:30 ` Weston Andros Adamson
2016-05-02 20:07 ` Mkrtchyan, Tigran
2016-05-02 21:08 ` Weston Andros Adamson
2016-05-16 19:02 ` Dave Wysochanski
2016-05-16 19:15 ` Weston Andros Adamson
2016-05-16 20:59 ` Mkrtchyan, Tigran