linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
Date: Wed, 3 Feb 2016 16:28:13 -0800	[thread overview]
Message-ID: <20160204002813.GC13006@localhost.localdomain> (raw)
In-Reply-To: <CAL3q7H5FJmvmKLOWZh9QGc3ukvgLT5Z=iJuoYRh+wKws8N-sqw@mail.gmail.com>

On Thu, Feb 04, 2016 at 12:23:12AM +0000, Filipe Manana wrote:
> On Thu, Feb 4, 2016 at 12:20 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > On Wed, Feb 03, 2016 at 11:24:33PM +0000, Filipe Manana wrote:
> >> On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> >> > On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
> >> >> From: Filipe Manana <fdmanana@suse.com>
> >> >>
> >> >> While doing some tests I ran into an hang on an extent buffer's rwlock
> >> >> that produced the following trace:
> >> >>
> >> >> [39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
> >> >> [39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
> >> >> [39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
> >> >> [39389.800016] irq event stamp: 0
> >> >> [39389.800016] hardirqs last  enabled at (0): [<          (null)>]           (null)
> >> >> [39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
> >> >> [39389.800016] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
> >> >> [39389.800016] softirqs last disabled at (0): [<          (null)>]           (null)
> >> >> [39389.800016] CPU: 14 PID: 32165 Comm: btrfs Not tainted 4.4.0-rc6-btrfs-next-18+ #1
> >> >> [39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
> >> >> [39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
> >> >> [39389.800016] RIP: 0010:[<ffffffff810902af>]  [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
> >> >> [39389.800016] RSP: 0018:ffff8800a185fb80  EFLAGS: 00000202
> >> >> [39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
> >> >> [39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
> >> >> [39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
> >> >> [39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
> >> >> [39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
> >> >> [39389.800016] FS:  00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
> >> >> [39389.800016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >> [39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
> >> >> [39389.800016] Stack:
> >> >> [39389.800016]  ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
> >> >> [39389.800016]  ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
> >> >> [39389.800016]  ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
> >> >> [39389.800016] Call Trace:
> >> >> [39389.800016]  [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
> >> >> [39389.800016]  [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
> >> >> [39389.800016]  [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
> >> >> [39389.800016]  [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
> >> >> [39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
> >> >> [39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
> >> >> [39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
> >> >> [39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
> >> >> [39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
> >> >> [39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
> >> >> [39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
> >> >> [39389.800016]  [<ffffffff81276727>] ? __this_cpu_preempt_check+0x13/0x15
> >> >> [39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
> >> >> [39389.800016]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
> >> >> [39389.800016]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
> >> >> [39389.800016]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
> >> >> [39389.800016]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
> >> >> [39389.800016]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
> >> >> [39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
> >> >> [39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
> >> >> [39389.800012] irq event stamp: 0
> >> >> [39389.800012] hardirqs last  enabled at (0): [<          (null)>]           (null)
> >> >> [39389.800012] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
> >> >> [39389.800012] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
> >> >> [39389.800012] softirqs last disabled at (0): [<          (null)>]           (null)
> >> >> [39389.800012] CPU: 15 PID: 32166 Comm: btrfs Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
> >> >> [39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
> >> >> [39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
> >> >> [39389.800012] RIP: 0010:[<ffffffff81091e8d>]  [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
> >> >> [39389.800012] RSP: 0018:ffff880034a639f0  EFLAGS: 00000206
> >> >> [39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
> >> >> [39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
> >> >> [39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
> >> >> [39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
> >> >> [39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
> >> >> [39389.800012] FS:  00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
> >> >> [39389.800012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >> [39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
> >> >> [39389.800012] Stack:
> >> >> [39389.800012]  ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
> >> >> [39389.800012]  ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
> >> >> [39389.800012]  ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
> >> >> [39389.800012] Call Trace:
> >> >> [39389.800012]  [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
> >> >> [39389.800012]  [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
> >> >> [39389.800012]  [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
> >> >> [39389.800012]  [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
> >> >> [39389.800012]  [<ffffffffa061aeba>] ? rcu_read_unlock+0x5b/0x5d [btrfs]
> >> >> [39389.800012]  [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
> >> >> [39389.800012]  [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
> >> >> [39389.800012]  [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
> >> >> [39389.800012]  [<ffffffff810fc6b0>] ? time_hardirqs_on+0x15/0x28
> >> >> [39389.800012]  [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
> >> >> [39389.800012]  [<ffffffff8108d62f>] ? trace_hardirqs_on+0xd/0xf
> >> >> [39389.800012]  [<ffffffff8148482b>] ? mutex_lock_nested+0x397/0x3bc
> >> >> [39389.800012]  [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
> >> >> [39389.800012]  [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
> >> >> [39389.800012]  [<ffffffff81486ab7>] ? _raw_spin_unlock+0x31/0x44
> >> >> [39389.800012]  [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
> >> >> [39389.800012]  [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
> >> >> [39389.800012]  [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
> >> >> [39389.800012]  [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
> >> >> [39389.800012]  [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
> >> >> [39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
> >> >> [39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
> >> >> [39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
> >> >> [39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
> >> >> [39389.800012]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
> >> >> [39389.800012]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
> >> >> [39389.800012]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
> >> >> [39389.800012]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
> >> >> [39389.800012]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
> >> >> [39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00
> >> >>
> >> >> This happens because in the code path executed by the inode_paths ioctl we
> >> >> end up nesting two calls to read lock a leaf's rwlock when after the first
> >> >> call to read_lock() and before the second call to read_lock(), another
> >> >> task (running the delayed items as part of a transaction commit) has
> >> >> already called write_lock() against the leaf's rwlock. This situation is
> >> >> illustrated by the following diagram:
> >> >>
> >> >>          Task A                       Task B
> >> >>
> >> >>   btrfs_ref_to_path()               btrfs_commit_transaction()
> >> >>     read_lock(&eb->lock);
> >> >>
> >> >>                                       btrfs_run_delayed_items()
> >> >>                                         __btrfs_commit_inode_delayed_items()
> >> >>                                           __btrfs_update_delayed_inode()
> >> >>                                             btrfs_lookup_inode()
> >> >>
> >> >>                                               write_lock(&eb->lock);
> >> >>                                                 --> task waits for lock
> >> >>
> >> >>     read_lock(&eb->lock);
> >> >>     --> makes this task hang
> >> >>         forever (and task B too
> >> >>       of course)
> >> >>
> >> >> So fix this by avoiding doing the nested read lock, which is easily
> >> >> avoidable. This issue does not happen if task B calls write_lock() after
> >> >> task A does the second call to read_lock(), however there does not seem
> >> >> to exist anything in the documentation that mentions what is the expected
> >> >> behaviour for recursive locking of rwlocks (leaving the idea that doing
> >> >> so is not a good usage of rwlocks).
> >> >
> >> > iterate_inode_refs() does it in anothe way, it's using
> >> > btrfs_clone_extent_buffer() to keep btrfs from similar deadlock,
> >>
> >> Yeah, I know, I made that fix long time ago.
> >>
> >> > it creates a clone so we don't have trouble on the original eb's read
> >> > lock, it's more reasonable to me.
> >>
> >> Here it would be overkill to do similar approach, and it's a different
> >> problem, we're not attempting to lock one leaf while having some other
> >> leaf locked. We just want to transition a lock from spinning to
> >> blocking mode - so it was silly to double read lock, we can do
> >> directly the transition and make sure the release path call below
> >> doesn't release the leaf. No need to waste memory and time cloning a
> >> leaf.
> >
> > Correct, I missed that fact, so the patch is good.
> >
> > Since we're passing a clone eb into btrfs_ref_to_path(), the check 'if
> > (eb != eb_in) will only be false for the first time when it enters into
> > while(), maybe we can add 'likely' preference for it.
> 
> What do you mean by 'likely' preference? The macro likely/unlikely?

Exactly, the macro likely/unlikely.

Thanks,

-liubo

> 
> 
> >
> > Thanks,
> >
> > -liubo
> >
> >>
> >> thanks
> >>
> >> >
> >> > Thanks,
> >> >
> >> > -liubo
> >> >
> >> >>
> >> >> Cc: stable@vger.kernel.org
> >> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >> >> ---
> >> >>  fs/btrfs/backref.c | 4 ++--
> >> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> >> >> index b90cd37..198a0f8 100644
> >> >> --- a/fs/btrfs/backref.c
> >> >> +++ b/fs/btrfs/backref.c
> >> >> @@ -1426,9 +1426,9 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
> >> >>               eb = path->nodes[0];
> >> >>               /* make sure we can use eb after releasing the path */
> >> >>               if (eb != eb_in) {
> >> >> -                     atomic_inc(&eb->refs);
> >> >> -                     btrfs_tree_read_lock(eb);
> >> >>                       btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
> >> >> +                     path->nodes[0] = NULL;
> >> >> +                     path->locks[0] = 0;
> >> >>               }
> >> >>               btrfs_release_path(path);
> >> >>               iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
> >> >> --
> >> >> 2.7.0.rc3
> >> >>
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2016-02-04  0:28 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-03 20:26 [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl fdmanana
2016-02-03 23:19 ` Liu Bo
2016-02-03 23:24   ` Filipe Manana
2016-02-03 23:50     ` Chris Mason
2016-02-04  0:20     ` Liu Bo
2016-02-04  0:23       ` Filipe Manana
2016-02-04  0:28         ` Liu Bo [this message]
2016-02-04  0:33           ` Filipe Manana
2016-02-04 23:03 ` Liu Bo
2016-02-05  2:22 ` [PATCH v2] " fdmanana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160204002813.GC13006@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).