From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from userp1040.oracle.com ([156.151.31.81]:50905 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752651AbcBCXTy (ORCPT ); Wed, 3 Feb 2016 18:19:54 -0500 Date: Wed, 3 Feb 2016 15:19:41 -0800 From: Liu Bo To: fdmanana@kernel.org Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl Message-ID: <20160203231941.GA13006@localhost.localdomain> Reply-To: bo.li.liu@oracle.com References: <1454531210-7629-1-git-send-email-fdmanana@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <1454531210-7629-1-git-send-email-fdmanana@kernel.org> Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana > > While doing some tests I ran into an hang on an extent buffer's rwlock > that produced the following trace: > > [39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166] > [39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165] > [39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs] > [39389.800016] irq event stamp: 0 > [39389.800016] hardirqs last enabled at (0): [< (null)>] (null) > [39389.800016] hardirqs last disabled at (0): [] copy_process+0x638/0x1a35 > [39389.800016] softirqs last enabled at (0): [] copy_process+0x638/0x1a35 > [39389.800016] softirqs last disabled at (0): [< (null)>] (null) > [39389.800016] CPU: 14 PID: 32165 Comm: btrfs Not tainted 4.4.0-rc6-btrfs-next-18+ #1 > [39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 > [39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000 > [39389.800016] RIP: 0010:[] [] queued_spin_lock_slowpath+0x57/0x158 > [39389.800016] RSP: 0018:ffff8800a185fb80 EFLAGS: 00000202 > [39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101 > [39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001 > [39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000 > [39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98 > [39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40 > [39389.800016] FS: 00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000 > [39389.800016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0 > [39389.800016] Stack: > [39389.800016] ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0 > [39389.800016] ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895 > [39389.800016] ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c > [39389.800016] Call Trace: > [39389.800016] [] queued_read_lock_slowpath+0x46/0x60 > [39389.800016] [] do_raw_read_lock+0x3e/0x41 > [39389.800016] [] _raw_read_lock+0x3d/0x44 > [39389.800016] [] ? btrfs_tree_read_lock+0x54/0x125 [btrfs] > [39389.800016] [] btrfs_tree_read_lock+0x54/0x125 [btrfs] > [39389.800016] [] ? btrfs_find_item+0xa7/0xd2 [btrfs] > [39389.800016] [] btrfs_ref_to_path+0xd6/0x174 [btrfs] > [39389.800016] [] inode_to_path+0x53/0xa2 [btrfs] > [39389.800016] [] paths_from_inode+0x117/0x2ec [btrfs] > [39389.800016] [] btrfs_ioctl+0xd5b/0x2793 [btrfs] > [39389.800016] [] ? arch_local_irq_save+0x9/0xc > [39389.800016] [] ? __this_cpu_preempt_check+0x13/0x15 > [39389.800016] [] ? arch_local_irq_save+0x9/0xc > [39389.800016] [] ? rcu_read_unlock+0x3e/0x5d > [39389.800016] [] do_vfs_ioctl+0x42b/0x4ea > [39389.800016] [] ? __fget_light+0x62/0x71 > [39389.800016] [] SyS_ioctl+0x57/0x79 > [39389.800016] [] entry_SYSCALL_64_fastpath+0x12/0x6f > [39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8 > [39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs] > [39389.800012] irq event stamp: 0 > [39389.800012] hardirqs last enabled at (0): [< (null)>] (null) > [39389.800012] hardirqs last disabled at (0): [] copy_process+0x638/0x1a35 > [39389.800012] softirqs last enabled at (0): [] copy_process+0x638/0x1a35 > [39389.800012] softirqs last disabled at (0): [< (null)>] (null) > [39389.800012] CPU: 15 PID: 32166 Comm: btrfs Tainted: G L 4.4.0-rc6-btrfs-next-18+ #1 > [39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 > [39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000 > [39389.800012] RIP: 0010:[] [] queued_write_lock_slowpath+0x62/0x72 > [39389.800012] RSP: 0018:ffff880034a639f0 EFLAGS: 00000206 > [39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000 > [39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c > [39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000 > [39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98 > [39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00 > [39389.800012] FS: 00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000 > [39389.800012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0 > [39389.800012] Stack: > [39389.800012] ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98 > [39389.800012] ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00 > [39389.800012] ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58 > [39389.800012] Call Trace: > [39389.800012] [] do_raw_write_lock+0x72/0x8c > [39389.800012] [] _raw_write_lock+0x3a/0x41 > [39389.800012] [] ? btrfs_tree_lock+0x119/0x251 [btrfs] > [39389.800012] [] btrfs_tree_lock+0x119/0x251 [btrfs] > [39389.800012] [] ? rcu_read_unlock+0x5b/0x5d [btrfs] > [39389.800012] [] ? btrfs_root_node+0xda/0xe6 [btrfs] > [39389.800012] [] btrfs_lock_root_node+0x22/0x42 [btrfs] > [39389.800012] [] btrfs_search_slot+0x1b8/0x758 [btrfs] > [39389.800012] [] ? time_hardirqs_on+0x15/0x28 > [39389.800012] [] btrfs_lookup_inode+0x31/0x95 [btrfs] > [39389.800012] [] ? trace_hardirqs_on+0xd/0xf > [39389.800012] [] ? mutex_lock_nested+0x397/0x3bc > [39389.800012] [] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs] > [39389.800012] [] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs] > [39389.800012] [] ? _raw_spin_unlock+0x31/0x44 > [39389.800012] [] __btrfs_run_delayed_items+0xa4/0x15c [btrfs] > [39389.800012] [] btrfs_run_delayed_items+0x11/0x13 [btrfs] > [39389.800012] [] btrfs_commit_transaction+0x234/0x96e [btrfs] > [39389.800012] [] btrfs_sync_fs+0x145/0x1ad [btrfs] > [39389.800012] [] btrfs_ioctl+0x11d2/0x2793 [btrfs] > [39389.800012] [] ? arch_local_irq_save+0x9/0xc > [39389.800012] [] ? __might_fault+0x4c/0xa7 > [39389.800012] [] ? __might_fault+0x4c/0xa7 > [39389.800012] [] ? arch_local_irq_save+0x9/0xc > [39389.800012] [] ? rcu_read_unlock+0x3e/0x5d > [39389.800012] [] do_vfs_ioctl+0x42b/0x4ea > [39389.800012] [] ? __fget_light+0x62/0x71 > [39389.800012] [] SyS_ioctl+0x57/0x79 > [39389.800012] [] entry_SYSCALL_64_fastpath+0x12/0x6f > [39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00 > > This happens because in the code path executed by the inode_paths ioctl we > end up nesting two calls to read lock a leaf's rwlock when after the first > call to read_lock() and before the second call to read_lock(), another > task (running the delayed items as part of a transaction commit) has > already called write_lock() against the leaf's rwlock. This situation is > illustrated by the following diagram: > > Task A Task B > > btrfs_ref_to_path() btrfs_commit_transaction() > read_lock(&eb->lock); > > btrfs_run_delayed_items() > __btrfs_commit_inode_delayed_items() > __btrfs_update_delayed_inode() > btrfs_lookup_inode() > > write_lock(&eb->lock); > --> task waits for lock > > read_lock(&eb->lock); > --> makes this task hang > forever (and task B too > of course) > > So fix this by avoiding doing the nested read lock, which is easily > avoidable. This issue does not happen if task B calls write_lock() after > task A does the second call to read_lock(), however there does not seem > to exist anything in the documentation that mentions what is the expected > behaviour for recursive locking of rwlocks (leaving the idea that doing > so is not a good usage of rwlocks). iterate_inode_refs() does it in anothe way, it's using btrfs_clone_extent_buffer() to keep btrfs from similar deadlock, it creates a clone so we don't have trouble on the original eb's read lock, it's more reasonable to me. Thanks, -liubo > > Cc: stable@vger.kernel.org > Signed-off-by: Filipe Manana > --- > fs/btrfs/backref.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c > index b90cd37..198a0f8 100644 > --- a/fs/btrfs/backref.c > +++ b/fs/btrfs/backref.c > @@ -1426,9 +1426,9 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path, > eb = path->nodes[0]; > /* make sure we can use eb after releasing the path */ > if (eb != eb_in) { > - atomic_inc(&eb->refs); > - btrfs_tree_read_lock(eb); > btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); > + path->nodes[0] = NULL; > + path->locks[0] = 0; > } > btrfs_release_path(path); > iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref); > -- > 2.7.0.rc3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html