* [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
@ 2016-02-03 20:26 fdmanana
2016-02-03 23:19 ` Liu Bo
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: fdmanana @ 2016-02-03 20:26 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
While doing some tests I ran into a hang on an extent buffer's rwlock
that produced the following trace:
[39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
[39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
[39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800016] irq event stamp: 0
[39389.800016] hardirqs last enabled at (0): [< (null)>] (null)
[39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last disabled at (0): [< (null)>] (null)
[39389.800016] CPU: 14 PID: 32165 Comm: btrfs Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
[39389.800016] RIP: 0010:[<ffffffff810902af>] [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
[39389.800016] RSP: 0018:ffff8800a185fb80 EFLAGS: 00000202
[39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
[39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
[39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
[39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
[39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
[39389.800016] FS: 00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
[39389.800016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800016] Stack:
[39389.800016] ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
[39389.800016] ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
[39389.800016] ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
[39389.800016] Call Trace:
[39389.800016] [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
[39389.800016] [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
[39389.800016] [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
[39389.800016] [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016] [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016] [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
[39389.800016] [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
[39389.800016] [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
[39389.800016] [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
[39389.800016] [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
[39389.800016] [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016] [<ffffffff81276727>] ? __this_cpu_preempt_check+0x13/0x15
[39389.800016] [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016] [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800016] [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800016] [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800016] [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800016] [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
[39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800012] irq event stamp: 0
[39389.800012] hardirqs last enabled at (0): [< (null)>] (null)
[39389.800012] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last disabled at (0): [< (null)>] (null)
[39389.800012] CPU: 15 PID: 32166 Comm: btrfs Tainted: G L 4.4.0-rc6-btrfs-next-18+ #1
[39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
[39389.800012] RIP: 0010:[<ffffffff81091e8d>] [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
[39389.800012] RSP: 0018:ffff880034a639f0 EFLAGS: 00000206
[39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
[39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
[39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
[39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
[39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
[39389.800012] FS: 00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
[39389.800012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800012] Stack:
[39389.800012] ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
[39389.800012] ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
[39389.800012] ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
[39389.800012] Call Trace:
[39389.800012] [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
[39389.800012] [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
[39389.800012] [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012] [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012] [<ffffffffa061aeba>] ? rcu_read_unlock+0x5b/0x5d [btrfs]
[39389.800012] [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
[39389.800012] [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
[39389.800012] [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
[39389.800012] [<ffffffff810fc6b0>] ? time_hardirqs_on+0x15/0x28
[39389.800012] [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
[39389.800012] [<ffffffff8108d62f>] ? trace_hardirqs_on+0xd/0xf
[39389.800012] [<ffffffff8148482b>] ? mutex_lock_nested+0x397/0x3bc
[39389.800012] [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
[39389.800012] [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
[39389.800012] [<ffffffff81486ab7>] ? _raw_spin_unlock+0x31/0x44
[39389.800012] [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
[39389.800012] [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
[39389.800012] [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
[39389.800012] [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[39389.800012] [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
[39389.800012] [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012] [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012] [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012] [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012] [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800012] [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800012] [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800012] [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800012] [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00
This happens because the code path executed by the inode_paths ioctl
nests two calls to read lock a leaf's rwlock, and between the first and
the second call to read_lock() another task (running the delayed items
as part of a transaction commit) has already called write_lock() against
the leaf's rwlock. The following diagram illustrates the situation:
           Task A                                  Task B

   btrfs_ref_to_path()                    btrfs_commit_transaction()
     read_lock(&eb->lock);

                                            btrfs_run_delayed_items()
                                              __btrfs_commit_inode_delayed_items()
                                                __btrfs_update_delayed_inode()
                                                  btrfs_lookup_inode()

                                                    write_lock(&eb->lock);
                                                      --> task waits for lock

     read_lock(&eb->lock);
       --> makes this task hang
           forever (and task B too
           of course)
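
The hazard is easy to reproduce outside the kernel as well. The
following userspace program is an illustrative sketch only, not kernel
code: it relies on glibc's writer-preferring rwlock kind, which, like
the kernel's queued rwlocks, parks new readers behind a queued writer,
so the nested rdlock hangs in the same way:

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_rwlock_t lock;

	static void *task_b(void *arg)
	{
		sleep(1);                      /* let task A take its first read lock */
		pthread_rwlock_wrlock(&lock);  /* queued writer, waits for task A */
		pthread_rwlock_unlock(&lock);
		return NULL;
	}

	int main(void)
	{
		pthread_rwlockattr_t attr;
		pthread_t b;

		pthread_rwlockattr_init(&attr);
		pthread_rwlockattr_setkind_np(&attr,
				PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
		pthread_rwlock_init(&lock, &attr);

		pthread_create(&b, NULL, task_b, NULL);
		pthread_rwlock_rdlock(&lock);  /* task A: first read lock */
		sleep(2);                      /* make sure the writer is queued */
		fprintf(stderr, "taking nested read lock...\n");
		pthread_rwlock_rdlock(&lock);  /* hangs: we wait for the writer,
						  the writer waits for us */
		fprintf(stderr, "never reached\n");
		return 0;
	}

With the default reader-preferring kind the second rdlock would
succeed, which matches the observation below that the hang depends on
task B's write_lock() landing between the two read_lock() calls.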
So fix this by avoiding the nested read lock, which is easily
avoidable. This issue does not happen if task B calls write_lock()
after task A does the second call to read_lock(); however, nothing in
the documentation specifies the expected behaviour for recursive
locking of rwlocks, which suggests that doing so is not a good use of
rwlocks.
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/backref.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index b90cd37..198a0f8 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1426,9 +1426,9 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
eb = path->nodes[0];
/* make sure we can use eb after releasing the path */
if (eb != eb_in) {
- atomic_inc(&eb->refs);
- btrfs_tree_read_lock(eb);
btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+ path->nodes[0] = NULL;
+ path->locks[0] = 0;
}
btrfs_release_path(path);
iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
--
2.7.0.rc3
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-03 20:26 [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl fdmanana
@ 2016-02-03 23:19 ` Liu Bo
  2016-02-03 23:24   ` Filipe Manana
  2016-02-04 23:03   ` Liu Bo
  2016-02-05  2:22 ` [PATCH v2] " fdmanana
  2 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2016-02-03 23:19 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> While doing some tests I ran into a hang on an extent buffer's rwlock
> that produced the following trace:
>
> [...]
>
> So fix this by avoiding the nested read lock, which is easily
> avoidable. This issue does not happen if task B calls write_lock()
> after task A does the second call to read_lock(); however, nothing in
> the documentation specifies the expected behaviour for recursive
> locking of rwlocks, which suggests that doing so is not a good use of
> rwlocks.

iterate_inode_refs() does it another way: it uses
btrfs_clone_extent_buffer() to keep btrfs from a similar deadlock.
It creates a clone, so we don't have trouble with the original eb's
read lock; that looks more reasonable to me.

Thanks,

-liubo

> [...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
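
For context, the clone-based pattern Liu Bo refers to looks roughly
like this. This is a minimal sketch assuming the btrfs helpers behave
as in the 4.4-era tree; the exact sequence and error handling in
iterate_inode_refs() differ in their details:

	struct extent_buffer *eb;

	/* Copy the leaf while the path still holds its read lock, then
	 * drop the path so that no lock is held on the shared buffer
	 * while its items are read. */
	eb = btrfs_clone_extent_buffer(path->nodes[0]);
	if (!eb)
		return -ENOMEM;
	btrfs_release_path(path);	/* unlocks and drops the original leaf */

	/* ... read items from the private clone, no eb->lock needed ... */

	free_extent_buffer(eb);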
* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-03 23:19 ` Liu Bo
@ 2016-02-03 23:24   ` Filipe Manana
  2016-02-03 23:50     ` Chris Mason
  2016-02-04  0:20     ` Liu Bo
  0 siblings, 2 replies; 10+ messages in thread
From: Filipe Manana @ 2016-02-03 23:24 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs@vger.kernel.org

On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
>> [...]
>
> iterate_inode_refs() does it another way: it uses
> btrfs_clone_extent_buffer() to keep btrfs from a similar deadlock.

Yeah, I know, I made that fix a long time ago.

> It creates a clone, so we don't have trouble with the original eb's
> read lock; that looks more reasonable to me.

Here it would be overkill to take a similar approach, and it's a
different problem: we're not attempting to lock one leaf while having
some other leaf locked. We just want to transition a lock from
spinning to blocking mode - so it was silly to double read lock; we
can do the transition directly and make sure the release path call
below doesn't release the leaf. No need to waste memory and time
cloning a leaf.

thanks

> [...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
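
To spell that out, the patched hunk reads as below. The code is taken
verbatim from the patch above; the comments beyond the original one are
annotations describing the ownership transfer:

	eb = path->nodes[0];
	/* make sure we can use eb after releasing the path */
	if (eb != eb_in) {
		/* Transition the read lock we already hold from
		 * spinning to blocking mode, instead of taking a
		 * second, nested read lock. */
		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
		/* Detach the leaf from the path so that the
		 * btrfs_release_path() call below neither unlocks it
		 * nor drops the reference we are still using. */
		path->nodes[0] = NULL;
		path->locks[0] = 0;
	}
	btrfs_release_path(path);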
* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-03 23:24   ` Filipe Manana
@ 2016-02-03 23:50     ` Chris Mason
  2016-02-04  0:20     ` Liu Bo
  1 sibling, 0 replies; 10+ messages in thread
From: Chris Mason @ 2016-02-03 23:50 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Liu Bo, linux-btrfs@vger.kernel.org

On Wed, Feb 03, 2016 at 11:24:33PM +0000, Filipe Manana wrote:
> On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
> >> So fix this by avoiding the nested read lock, which is easily
> >> avoidable. This issue does not happen if task B calls write_lock()
> >> after task A does the second call to read_lock(); however, nothing
> >> in the documentation specifies the expected behaviour for recursive
> >> locking of rwlocks, which suggests that doing so is not a good use
> >> of rwlocks.
> >
> > iterate_inode_refs() does it another way: it uses
> > btrfs_clone_extent_buffer() to keep btrfs from a similar deadlock.
>
> Yeah, I know, I made that fix a long time ago.
>
> > It creates a clone, so we don't have trouble with the original eb's
> > read lock; that looks more reasonable to me.
>
> Here it would be overkill to take a similar approach, and it's a
> different problem: we're not attempting to lock one leaf while having
> some other leaf locked. We just want to transition a lock from
> spinning to blocking mode - so it was silly to double read lock; we
> can do the transition directly and make sure the release path call
> below doesn't release the leaf. No need to waste memory and time
> cloning a leaf.

Yeah, this is just like unlock_up(), we just want to keep this one
extent buffer locked.  Filipe's patch should be fine.

-chris

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-03 23:24   ` Filipe Manana
  2016-02-03 23:50     ` Chris Mason
@ 2016-02-04  0:20     ` Liu Bo
  2016-02-04  0:23       ` Filipe Manana
  1 sibling, 1 reply; 10+ messages in thread
From: Liu Bo @ 2016-02-04  0:20 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

On Wed, Feb 03, 2016 at 11:24:33PM +0000, Filipe Manana wrote:
> On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > [...]
> >
> > It creates a clone, so we don't have trouble with the original eb's
> > read lock; that looks more reasonable to me.
>
> Here it would be overkill to take a similar approach, and it's a
> different problem: we're not attempting to lock one leaf while having
> some other leaf locked. We just want to transition a lock from
> spinning to blocking mode - so it was silly to double read lock; we
> can do the transition directly and make sure the release path call
> below doesn't release the leaf. No need to waste memory and time
> cloning a leaf.

Correct, I missed that fact, so the patch is good.

Since we're passing a clone eb into btrfs_ref_to_path(), the check
'if (eb != eb_in)' will only be false the first time it enters the
while() loop; maybe we can add a 'likely' preference for it.

Thanks,

-liubo

> [...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
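
For reference, the variant Liu Bo is suggesting would presumably look
like this. It is hypothetical and not part of the posted patch;
likely() is the kernel's branch-prediction hint from
<linux/compiler.h>:

	eb = path->nodes[0];
	/* make sure we can use eb after releasing the path */
	if (likely(eb != eb_in)) {	/* hypothetical hint: the test
					 * is false only on the first
					 * loop iteration */
		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
		path->nodes[0] = NULL;
		path->locks[0] = 0;
	}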
* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl 2016-02-04 0:20 ` Liu Bo @ 2016-02-04 0:23 ` Filipe Manana 2016-02-04 0:28 ` Liu Bo 0 siblings, 1 reply; 10+ messages in thread From: Filipe Manana @ 2016-02-04 0:23 UTC (permalink / raw) To: Liu Bo; +Cc: linux-btrfs@vger.kernel.org On Thu, Feb 4, 2016 at 12:20 AM, Liu Bo <bo.li.liu@oracle.com> wrote: > On Wed, Feb 03, 2016 at 11:24:33PM +0000, Filipe Manana wrote: >> On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote: >> > On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote: >> >> From: Filipe Manana <fdmanana@suse.com> >> >> >> >> While doing some tests I ran into an hang on an extent buffer's rwlock >> >> that produced the following trace: >> >> >> >> [39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166] >> >> [39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165] >> >> [39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs] >> >> [39389.800016] irq event stamp: 0 >> >> [39389.800016] hardirqs last enabled at (0): [< (null)>] (null) >> >> [39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35 >> >> [39389.800016] softirqs last enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35 >> >> [39389.800016] softirqs last disabled at (0): [< (null)>] (null) >> >> [39389.800016] CPU: 14 PID: 32165 Comm: btrfs Not tainted 4.4.0-rc6-btrfs-next-18+ #1 >> >> [39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 >> >> [39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000 >> >> [39389.800016] RIP: 0010:[<ffffffff810902af>] [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158 >> >> [39389.800016] RSP: 0018:ffff8800a185fb80 EFLAGS: 00000202 >> >> [39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101 >> >> [39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001 >> >> [39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000 >> >> [39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98 >> >> [39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40 >> >> [39389.800016] FS: 00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000 >> >> [39389.800016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> >> [39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0 >> >> [39389.800016] Stack: >> >> [39389.800016] ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0 >> >> [39389.800016] ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895 >> >> [39389.800016] ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c >> >> [39389.800016] Call Trace: >> >> [39389.800016] [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60 >> >> [39389.800016] [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41 >> >> [39389.800016] [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44 >> >> [39389.800016] 
[<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
>> >> [39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
>> >> [39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
>> >> [39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
>> >> [39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
>> >> [39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
>> >> [39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
>> >> [snip: remainder of the CPU#14 trace and the full CPU#15 trace]
>> >>
>> >> This happens because in the code path executed by the inode_paths ioctl we
>> >> end up nesting two calls to read lock a leaf's rwlock when, after the first
>> >> call to read_lock() and before the second call to read_lock(), another
>> >> task (running the delayed items as part of a transaction commit) has
>> >> already called write_lock() against the leaf's rwlock. This situation is
>> >> illustrated by the following diagram:
>> >>
>> >>          Task A                                  Task B
>> >>
>> >>   btrfs_ref_to_path()                    btrfs_commit_transaction()
>> >>     read_lock(&eb->lock);
>> >>
>> >>                                            btrfs_run_delayed_items()
>> >>                                              __btrfs_commit_inode_delayed_items()
>> >>                                                __btrfs_update_delayed_inode()
>> >>                                                  btrfs_lookup_inode()
>> >>
>> >>                                                    write_lock(&eb->lock);
>> >>                                                      --> task waits for lock
>> >>
>> >>     read_lock(&eb->lock);
>> >>       --> makes this task hang
>> >>           forever (and task B too
>> >>           of course)
>> >>
>> >> So fix this by avoiding doing the nested read lock, which is easily
>> >> avoidable. This issue does not happen if task B calls write_lock() after
>> >> task A does the second call to read_lock(), however there does not seem
>> >> to exist anything in the documentation that mentions what the expected
>> >> behaviour for recursive locking of rwlocks is (leaving the idea that doing
>> >> so is not a good usage of rwlocks).
>> >
>> > iterate_inode_refs() does it in another way, it's using
>> > btrfs_clone_extent_buffer() to keep btrfs from a similar deadlock,
>>
>> Yeah, I know, I made that fix a long time ago.
>>
>> > it creates a clone so we don't have trouble on the original eb's read
>> > lock, it's more reasonable to me.
>>
>> Here it would be overkill to do a similar approach, and it's a different
>> problem, we're not attempting to lock one leaf while having some other
>> leaf locked. We just want to transition a lock from spinning to
>> blocking mode - so it was silly to double read lock, we can do
>> the transition directly and make sure the release path call below
>> doesn't release the leaf. No need to waste memory and time cloning a
>> leaf.
>
> Correct, I missed that fact, so the patch is good.
>
> Since we're passing a clone eb into btrfs_ref_to_path(), the check 'if
> (eb != eb_in)' will only be false the first time it enters the while()
> loop, maybe we can add a 'likely' preference for it.

What do you mean by 'likely' preference? The macro likely/unlikely?

>
> Thanks,
>
> -liubo
>
>>
>> thanks
>>
>> >
>> > Thanks,
>> >
>> > -liubo
>> >
>> >> Cc: stable@vger.kernel.org
>> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> >> ---
>> >>  fs/btrfs/backref.c | 4 ++--
>> >>  1 file changed, 2 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
>> >> index b90cd37..198a0f8 100644
>> >> --- a/fs/btrfs/backref.c
>> >> +++ b/fs/btrfs/backref.c
>> >> @@ -1426,9 +1426,9 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
>> >>  		eb = path->nodes[0];
>> >>  		/* make sure we can use eb after releasing the path */
>> >>  		if (eb != eb_in) {
>> >> -			atomic_inc(&eb->refs);
>> >> -			btrfs_tree_read_lock(eb);
>> >>  			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
>> >> +			path->nodes[0] = NULL;
>> >> +			path->locks[0] = 0;
>> >>  		}
>> >>  		btrfs_release_path(path);
>> >>  		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
>> >> --
>> >> 2.7.0.rc3
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread
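Restating the hunk under discussion as an annotated sketch may help; this is
a simplified paraphrase of the v1 diff quoted above (same identifiers, error
handling omitted), not separate new code:

	/* Before the fix: take a second, nested read lock on a leaf that
	 * is already read locked through the path. */
	eb = path->nodes[0];
	if (eb != eb_in) {
		atomic_inc(&eb->refs);       /* keep eb alive past the path release */
		btrfs_tree_read_lock(eb);    /* nested read_lock(); hangs if a writer
						queued on eb->lock in the meantime */
		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
	}
	btrfs_release_path(path);            /* drops the path's own lock and ref */

	/* After the fix: no second read_lock() at all; reuse the lock and
	 * the reference the path already holds on eb. */
	eb = path->nodes[0];
	if (eb != eb_in) {
		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); /* spinning -> blocking */
		path->nodes[0] = NULL;       /* steal the path's reference on eb... */
		path->locks[0] = 0;          /* ...and its lock, so the release below
						leaves eb referenced and read locked */
	}
	btrfs_release_path(path);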
* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-04  0:23             ` Filipe Manana
@ 2016-02-04  0:28               ` Liu Bo
  2016-02-04  0:33                 ` Filipe Manana
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2016-02-04  0:28 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs@vger.kernel.org

On Thu, Feb 04, 2016 at 12:23:12AM +0000, Filipe Manana wrote:
> On Thu, Feb 4, 2016 at 12:20 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > On Wed, Feb 03, 2016 at 11:24:33PM +0000, Filipe Manana wrote:
> >> On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> >> > On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
> >> >> From: Filipe Manana <fdmanana@suse.com>
> >> >>
> >> >> [snip: soft lockup traces, deadlock explanation, diagram, v1 patch
> >> >> and the discussion of the btrfs_clone_extent_buffer() alternative]
> >
> > Correct, I missed that fact, so the patch is good.
> >
> > Since we're passing a clone eb into btrfs_ref_to_path(), the check 'if
> > (eb != eb_in)' will only be false the first time it enters the while()
> > loop, maybe we can add a 'likely' preference for it.
>
> What do you mean by 'likely' preference? The macro likely/unlikely?

Exactly, the macro likely/unlikely.

Thanks,

-liubo

> [snip: remainder of quoted sign-offs and the v1 patch]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-04  0:28               ` Liu Bo
@ 2016-02-04  0:33                 ` Filipe Manana
  0 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2016-02-04  0:33 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs@vger.kernel.org

On Thu, Feb 4, 2016 at 12:28 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Thu, Feb 04, 2016 at 12:23:12AM +0000, Filipe Manana wrote:
>> On Thu, Feb 4, 2016 at 12:20 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> > On Wed, Feb 03, 2016 at 11:24:33PM +0000, Filipe Manana wrote:
>> >> On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> >> > On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
>> >> >> From: Filipe Manana <fdmanana@suse.com>
>> >> >>
>> >> >> [snip: soft lockup traces, deadlock explanation, diagram and v1 patch]
>> >
>> > Since we're passing a clone eb into btrfs_ref_to_path(), the check 'if
>> > (eb != eb_in)' will only be false the first time it enters the while()
>> > loop, maybe we can add a 'likely' preference for it.
>>
>> What do you mean by 'likely' preference? The macro likely/unlikely?
>
> Exactly, the macro likely/unlikely.

What's the point? It makes the code more unreadable and most often has
negative effects (and CPUs are good at branch prediction nowadays, I
believe). Plus this isn't a performance critical piece of code (if it
were, btrfs would be the fastest fs ever).

Also, it's not necessarily the case that the check is false only on the
first iteration: when used from send, for example, eb_in isn't a cloned
eb.

Thanks.

>
> Thanks,
>
> -liubo
>
> [snip: remainder of quoted discussion and the v1 patch]

^ permalink raw reply	[flat|nested] 10+ messages in thread
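For readers unfamiliar with the macros under discussion: likely() and
unlikely() are thin wrappers around GCC's __builtin_expect(), as defined
in include/linux/compiler.h. A minimal standalone sketch of the definition
and its effect (a userspace example, not btrfs code; build with gcc -O2
to see the branch layout change):

	#include <stdio.h>

	/* Same definitions as the kernel's include/linux/compiler.h. */
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	int main(void)
	{
		int err = 0;

		/* The hint only tells the compiler which branch to lay out
		 * as the straight-line fast path; it never changes the
		 * semantics of the condition. */
		if (unlikely(err)) {
			fprintf(stderr, "slow path\n");
			return 1;
		}
		printf("fast path\n");
		return 0;
	}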
* Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-03 20:26 [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl fdmanana
  2016-02-03 23:19 ` Liu Bo
@ 2016-02-04 23:03 ` Liu Bo
  2016-02-05  2:22 ` [PATCH v2] " fdmanana
  2 siblings, 0 replies; 10+ messages in thread
From: Liu Bo @ 2016-02-04 23:03 UTC (permalink / raw)
  To: linux-btrfs

On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> While doing some tests I ran into a hang on an extent buffer's rwlock
> that produced the following trace:
>
> [snip: soft lockup traces, deadlock explanation and diagram]
>
> So fix this by avoiding doing the nested read lock, which is easily
> avoidable. This issue does not happen if task B calls write_lock() after
> task A does the second call to read_lock(), however there does not seem
> to exist anything in the documentation that mentions what the expected
> behaviour for recursive locking of rwlocks is (leaving the idea that doing
> so is not a good usage of rwlocks).

For anyone who cares about why the 2nd read_lock is blocked...

On archs which have CONFIG_QUEUED_RWLOCKS enabled, a nested read_lock
that is not taken in interrupt context will block if a write_lock is
already waiting in line.  This is the fair queued rwlock [1].

[1] commit 70af2f8a4f48d6cebdf92d533d3aef37853ce6de
    ("locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks")

Thanks,

-liubo

> [snip: v1 patch]

^ permalink raw reply	[flat|nested] 10+ messages in thread
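The fairness Liu Bo describes can be reproduced in userspace with a
writer-preferring rwlock, a glibc extension (this is only an analogue of
the kernel's qrwlock, not the same code). A minimal sketch, assuming
Linux/glibc and built with gcc -pthread, which is expected to hang on the
second rdlock just like the traces above:

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_rwlock_t lock;

	/* Task B: queue a write lock while task A holds a read lock. */
	static void *writer(void *arg)
	{
		(void)arg;
		sleep(1);                     /* let main() take the first read lock */
		pthread_rwlock_wrlock(&lock); /* waits behind the held read lock */
		pthread_rwlock_unlock(&lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		pthread_rwlockattr_t attr;

		pthread_rwlockattr_init(&attr);
		/* Writer-preferring, i.e. fair to writers, like queued rwlocks. */
		pthread_rwlockattr_setkind_np(&attr,
				PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
		pthread_rwlock_init(&lock, &attr);

		pthread_create(&t, NULL, writer, NULL);

		pthread_rwlock_rdlock(&lock); /* task A: first read_lock() */
		sleep(2);                     /* give the writer time to queue */
		pthread_rwlock_rdlock(&lock); /* nested read_lock(): hangs here */

		puts("not reached with a writer-preferring rwlock");
		return 0;
	}

With glibc's default reader-preferring kind the second rdlock succeeds,
which is why recursive read locking only bites on fair or queued rwlock
implementations.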
* [PATCH v2] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  2016-02-03 20:26 [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl fdmanana
  2016-02-03 23:19 ` Liu Bo
  2016-02-04 23:03 ` Liu Bo
@ 2016-02-05  2:22 ` fdmanana
  2 siblings, 0 replies; 10+ messages in thread
From: fdmanana @ 2016-02-05  2:22 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

While doing some tests I ran into a hang on an extent buffer's rwlock
that produced the following trace:

[39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
[39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
[snip: module list, irq event stamps and register state for CPU#14]
[39389.800016] CPU: 14 PID: 32165 Comm: fdm-stress Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[39389.800016] RIP: 0010:[<ffffffff810902af>]  [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
[39389.800016] Call Trace:
[39389.800016]  [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
[39389.800016]  [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
[39389.800016]  [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
[39389.800016]  [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
[39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
[39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
[39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
[39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
[39389.800016]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800016]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800016]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[snip: code bytes for CPU#14; module list, irq event stamps and register state for CPU#15]
[39389.800012] CPU: 15 PID: 32166 Comm: fdm-stress Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
[39389.800012] RIP: 0010:[<ffffffff81091e8d>]  [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
[39389.800012] Call Trace:
[39389.800012]  [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
[39389.800012]  [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
[39389.800012]  [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
[39389.800012]  [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
[39389.800012]  [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
[39389.800012]  [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
[39389.800012]  [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
[39389.800012]  [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
[39389.800012]  [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
[39389.800012]  [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
[39389.800012]  [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
[39389.800012]  [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[39389.800012]  [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
[39389.800012]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800012]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800012]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[snip: code bytes for CPU#15]

This happens because in the code path executed by the inode_paths ioctl we
end up nesting two calls to read lock a leaf's rwlock when, after the first
call to read_lock() and before the second call to read_lock(), another
task (running the delayed items as part of a transaction commit) has
already called write_lock() against the leaf's rwlock. This situation is
illustrated by the following diagram:

         Task A                                  Task B

  btrfs_ref_to_path()                    btrfs_commit_transaction()
    read_lock(&eb->lock);

                                           btrfs_run_delayed_items()
                                             __btrfs_commit_inode_delayed_items()
                                               __btrfs_update_delayed_inode()
                                                 btrfs_lookup_inode()

                                                   write_lock(&eb->lock);
                                                     --> task waits for lock

    read_lock(&eb->lock);
      --> makes this task hang
          forever (and task B too
          of course)

So fix this by avoiding doing the nested read lock, which is easily
avoidable. This issue does not happen if task B calls write_lock() after
task A does the second call to read_lock(), however there does not seem
to exist anything in the documentation that mentions what the expected
behaviour for recursive locking of rwlocks is (leaving the idea that doing
so is not a good usage of rwlocks).

Also, as a side effect necessary for this fix, make sure we do not
needlessly read lock extent buffers when the input path has skip_locking
set (used when called from send).

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Folded the patch titled "Btrfs: send, fix extent buffer tree lock
    assertion failure (BUG_ON)" into this patch, since the problem it fixes
    could not happen without the first version of this patch (the only
    problem was unnecessary locking of extent buffers when
    path->skip_locking == 1).

 fs/btrfs/backref.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index b90cd37..f6dac40 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1406,7 +1406,8 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
 		read_extent_buffer(eb, dest + bytes_left, name_off, name_len);
 		if (eb != eb_in) {
-			btrfs_tree_read_unlock_blocking(eb);
+			if (!path->skip_locking)
+				btrfs_tree_read_unlock_blocking(eb);
 			free_extent_buffer(eb);
 		}
 		ret = btrfs_find_item(fs_root, path, parent, 0,
@@ -1426,9 +1427,10 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
 		eb = path->nodes[0];
 		/* make sure we can use eb after releasing the path */
 		if (eb != eb_in) {
-			atomic_inc(&eb->refs);
-			btrfs_tree_read_lock(eb);
-			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+			if (!path->skip_locking)
+				btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+			path->nodes[0] = NULL;
+			path->locks[0] = 0;
 		}
 		btrfs_release_path(path);
 		iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
--
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 10+ messages in thread
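To make the skip_locking interaction in v2 explicit, here is the second
hunk restated as an annotated sketch (a paraphrase of the diff above, not
separate code):

	if (eb != eb_in) {
		/* The inode_paths ioctl path holds a read lock on the leaf,
		 * so transition it from spinning to blocking mode. Send sets
		 * path->skip_locking, holds no lock on the leaf at all, and
		 * locking or unlocking eb here would trip the tree lock
		 * assertions (the BUG_ON mentioned in the folded patch). */
		if (!path->skip_locking)
			btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
		/* In both cases keep the path's reference on eb (and its
		 * lock, if any) across the release call below. */
		path->nodes[0] = NULL;
		path->locks[0] = 0;
	}
	btrfs_release_path(path);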
end of thread, other threads:[~2016-02-05  8:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03 20:26 [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl fdmanana
2016-02-03 23:19 ` Liu Bo
2016-02-03 23:24   ` Filipe Manana
2016-02-03 23:50     ` Chris Mason
2016-02-04  0:20       ` Liu Bo
2016-02-04  0:23         ` Filipe Manana
2016-02-04  0:28           ` Liu Bo
2016-02-04  0:33             ` Filipe Manana
2016-02-04 23:03 ` Liu Bo
2016-02-05  2:22 ` [PATCH v2] " fdmanana
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).