From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 1/2] btrfs: fix deadlock between chunk allocation and chunk btree modifications
Date: Thu, 14 Oct 2021 11:20:44 -0400
Message-ID: <YWhKzHgFFRopolVA@localhost.localdomain>
In-Reply-To: <0747812264412ce1a8474ff2ec223010a6dce3a0.1634115580.git.fdmanana@suse.com>

On Wed, Oct 13, 2021 at 10:12:49AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When a task is doing some modification to the chunk btree and it is not in
> the context of a chunk allocation or a chunk removal, it can deadlock with
> another task that is currently allocating a new data or metadata chunk.
> 
> These contexts are the following:
> 
> * When relocating a system chunk, when we need to COW the extent buffers
>   that belong to the chunk btree;
> 
> * When adding a new device (ioctl), where we need to add a new device item
>   to the chunk btree;
> 
> * When removing a device (ioctl), where we need to remove a device item
>   from the chunk btree;
> 
> * When resizing a device (ioctl), where we need to update a device item in
>   the chunk btree and may need to relocate a system chunk that lies beyond
>   the new device size when shrinking a device.
> 
> The problem happens due to a sequence of steps like the following:
> 
> 1) Task A starts a data or metadata chunk allocation and it locks the
>    chunk mutex;
> 
> 2) Task B is relocating a system chunk, and when it needs to COW an extent
>    buffer of the chunk btree, it has locked both that extent buffer and
>    its parent extent buffer;
> 
> 3) Since there is not enough available system space, either because none
>    of the existing system block groups have enough free space or because
>    the only one with enough free space is in RO mode due to the relocation,
>    task B triggers a new system chunk allocation. It blocks when trying to
>    acquire the chunk mutex, currently held by task A;
> 
> 4) Task A enters btrfs_chunk_alloc_add_chunk_item(), in order to insert
>    the new chunk item into the chunk btree and update the existing device
>    items there. But in order to do that, it has to lock the extent buffer
>    that task B locked at step 2, or its parent extent buffer, but task B
>    is waiting on the chunk mutex, which is currently locked by task A,
>    therefore resulting in a deadlock.
> 
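The four steps above form a classic lock-order inversion. A minimal user-space sketch of it follows, assuming a toy model with two locks and two tasks. The names model_lock(), model_deadlocked() and replay_reported_sequence() are hypothetical and only illustrate the wait-for cycle; they are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: two locks, two tasks. */
enum { CHUNK_MUTEX, CHUNK_TREE_EB, NR_LOCKS };
enum { TASK_A, TASK_B, NR_TASKS };
#define NONE (-1)

struct model {
	int owner[NR_LOCKS];      /* task holding each lock, or NONE */
	int waiting_on[NR_TASKS]; /* lock each task is blocked on, or NONE */
};

static void model_init(struct model *m)
{
	for (int i = 0; i < NR_LOCKS; i++)
		m->owner[i] = NONE;
	for (int i = 0; i < NR_TASKS; i++)
		m->waiting_on[i] = NONE;
}

/* Take the lock if it is free, otherwise record that the task blocks on it. */
static bool model_lock(struct model *m, int task, int lock)
{
	if (m->owner[lock] == NONE) {
		m->owner[lock] = task;
		return true;
	}
	m->waiting_on[task] = lock;
	return false;
}

/*
 * Follow the chain "task -> lock it waits on -> owner of that lock".
 * Arriving back at the starting task means a wait-for cycle, i.e. a deadlock.
 */
static bool model_deadlocked(const struct model *m, int start)
{
	int task = start;

	for (int hops = 0; hops < NR_TASKS; hops++) {
		int lock = m->waiting_on[task];

		if (lock == NONE)
			return false;
		task = m->owner[lock];
		if (task == NONE)
			return false;
		if (task == start)
			return true;
	}
	return false;
}

/* Replay steps 1) to 4) from the description above. */
static bool replay_reported_sequence(void)
{
	struct model m;
	bool got;

	model_init(&m);
	got = model_lock(&m, TASK_A, CHUNK_MUTEX);   /* step 1 */
	assert(got);
	got = model_lock(&m, TASK_B, CHUNK_TREE_EB); /* step 2 */
	assert(got);
	(void)got;
	model_lock(&m, TASK_B, CHUNK_MUTEX);         /* step 3: B blocks on A */
	model_lock(&m, TASK_A, CHUNK_TREE_EB);       /* step 4: A blocks on B */
	return model_deadlocked(&m, TASK_A);
}
```

In this model replay_reported_sequence() reports a cycle: task A waits for the extent buffer held by task B, while task B waits for the chunk mutex held by task A.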
> One example report when the deadlock happens with system chunk relocation:
> 
>   INFO: task kworker/u9:5:546 blocked for more than 143 seconds.
>         Not tainted 5.15.0-rc3+ #1
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:kworker/u9:5    state:D stack:25936 pid:  546 ppid:     2 flags:0x00004000
>   Workqueue: events_unbound btrfs_async_reclaim_metadata_space
>   Call Trace:
>    context_switch kernel/sched/core.c:4940 [inline]
>    __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
>    schedule+0xd3/0x270 kernel/sched/core.c:6366
>    rwsem_down_read_slowpath+0x4ee/0x9d0 kernel/locking/rwsem.c:993
>    __down_read_common kernel/locking/rwsem.c:1214 [inline]
>    __down_read kernel/locking/rwsem.c:1223 [inline]
>    down_read_nested+0xe6/0x440 kernel/locking/rwsem.c:1590
>    __btrfs_tree_read_lock+0x31/0x350 fs/btrfs/locking.c:47
>    btrfs_tree_read_lock fs/btrfs/locking.c:54 [inline]
>    btrfs_read_lock_root_node+0x8a/0x320 fs/btrfs/locking.c:191
>    btrfs_search_slot_get_root fs/btrfs/ctree.c:1623 [inline]
>    btrfs_search_slot+0x13b4/0x2140 fs/btrfs/ctree.c:1728
>    btrfs_update_device+0x11f/0x500 fs/btrfs/volumes.c:2794
>    btrfs_chunk_alloc_add_chunk_item+0x34d/0xea0 fs/btrfs/volumes.c:5504
>    do_chunk_alloc fs/btrfs/block-group.c:3408 [inline]
>    btrfs_chunk_alloc+0x84d/0xf50 fs/btrfs/block-group.c:3653
>    flush_space+0x54e/0xd80 fs/btrfs/space-info.c:670
>    btrfs_async_reclaim_metadata_space+0x396/0xa90 fs/btrfs/space-info.c:953
>    process_one_work+0x9df/0x16d0 kernel/workqueue.c:2297
>    worker_thread+0x90/0xed0 kernel/workqueue.c:2444
>    kthread+0x3e5/0x4d0 kernel/kthread.c:319
>    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>   INFO: task syz-executor:9107 blocked for more than 143 seconds.
>         Not tainted 5.15.0-rc3+ #1
>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   task:syz-executor    state:D stack:23200 pid: 9107 ppid:  7792 flags:0x00004004
>   Call Trace:
>    context_switch kernel/sched/core.c:4940 [inline]
>    __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
>    schedule+0xd3/0x270 kernel/sched/core.c:6366
>    schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
>    __mutex_lock_common kernel/locking/mutex.c:669 [inline]
>    __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
>    btrfs_chunk_alloc+0x31a/0xf50 fs/btrfs/block-group.c:3631
>    find_free_extent_update_loop fs/btrfs/extent-tree.c:3986 [inline]
>    find_free_extent+0x25cb/0x3a30 fs/btrfs/extent-tree.c:4335
>    btrfs_reserve_extent+0x1f1/0x500 fs/btrfs/extent-tree.c:4415
>    btrfs_alloc_tree_block+0x203/0x1120 fs/btrfs/extent-tree.c:4813
>    __btrfs_cow_block+0x412/0x1620 fs/btrfs/ctree.c:415
>    btrfs_cow_block+0x2f6/0x8c0 fs/btrfs/ctree.c:570
>    btrfs_search_slot+0x1094/0x2140 fs/btrfs/ctree.c:1768
>    relocate_tree_block fs/btrfs/relocation.c:2694 [inline]
>    relocate_tree_blocks+0xf73/0x1770 fs/btrfs/relocation.c:2757
>    relocate_block_group+0x47e/0xc70 fs/btrfs/relocation.c:3673
>    btrfs_relocate_block_group+0x48a/0xc60 fs/btrfs/relocation.c:4070
>    btrfs_relocate_chunk+0x96/0x280 fs/btrfs/volumes.c:3181
>    __btrfs_balance fs/btrfs/volumes.c:3911 [inline]
>    btrfs_balance+0x1f03/0x3cd0 fs/btrfs/volumes.c:4301
>    btrfs_ioctl_balance+0x61e/0x800 fs/btrfs/ioctl.c:4137
>    btrfs_ioctl+0x39ea/0x7b70 fs/btrfs/ioctl.c:4949
>    vfs_ioctl fs/ioctl.c:51 [inline]
>    __do_sys_ioctl fs/ioctl.c:874 [inline]
>    __se_sys_ioctl fs/ioctl.c:860 [inline]
>    __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> So fix this by making sure that whenever we try to modify the chunk btree
> and we are neither in a chunk allocation context nor in a chunk removal
> context, we reserve system space before modifying the chunk btree.
> 
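The idea of reserving before modifying can be sketched in user-space C as below. This is only an illustration under assumptions: struct sys_space, reserve_system_space(), modify_chunk_tree() and SYS_CHUNK_BYTES are hypothetical names, and the helper the patch actually adds is not shown in this message. The point is that any system chunk allocation happens before chunk btree extent buffers are locked, so the later COW never has to allocate.

```c
#include <assert.h>
#include <stdbool.h>

#define SYS_CHUNK_BYTES 4096UL /* hypothetical size of one system chunk */

/* Hypothetical accounting for system space; not the kernel's structures. */
struct sys_space {
	unsigned long free_bytes;     /* total space in system chunks */
	unsigned long reserved_bytes; /* part of free_bytes already promised */
	unsigned int chunk_allocs;    /* system chunks allocated so far */
	bool tree_locked;             /* chunk btree extent buffers locked? */
};

/*
 * Reserve space up front. If the existing system chunks cannot cover the
 * request, allocate new ones now, while no chunk btree locks are held.
 */
static void reserve_system_space(struct sys_space *s, unsigned long bytes)
{
	assert(!s->tree_locked);
	while (s->free_bytes - s->reserved_bytes < bytes) {
		s->free_bytes += SYS_CHUNK_BYTES;
		s->chunk_allocs++;
	}
	s->reserved_bytes += bytes;
}

/*
 * Modify the chunk btree. Returns false when the COW is not covered by a
 * prior reservation, which is the case that would force a chunk allocation
 * with extent buffers locked (the deadlock-prone path).
 */
static bool modify_chunk_tree(struct sys_space *s, unsigned long bytes)
{
	bool covered;

	s->tree_locked = true;
	covered = s->reserved_bytes >= bytes;
	if (covered) {
		s->reserved_bytes -= bytes;
		s->free_bytes -= bytes;
	}
	s->tree_locked = false;
	return covered;
}
```

A caller in this sketch would invoke reserve_system_space() first and modify_chunk_tree() second. Calling modify_chunk_tree() without a reservation returns false, standing in for the path that previously had to allocate under the tree locks.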
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CACkBjsax51i4mu6C0C3vJqQN3NR_iVuucoeG3U1HXjrgzn5FFQ@mail.gmail.com/
> Fixes: 79bd37120b1495 ("btrfs: rework chunk allocation to avoid exhaustion of the system chunk array")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Looks good

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

