Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Johannes Thumshirn <jth@kernel.org>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: Re: [PATCH v4 5/7] btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
Date: Sat, 6 Jul 2024 08:58:54 +0930
Message-ID: <29cd4e79-de21-41ea-8241-2706d37fe4ae@gmx.com>
In-Reply-To: <20240705-b4-rst-updates-v4-5-f3eed3f2cfad@kernel.org>



On 2024/7/6 00:43, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Don't hold the dev_replace rwsem for the entirety of btrfs_map_block().
>
> It is only needed to protect
> a) calls to find_live_mirror() and
> b) calling into handle_ops_on_dev_replace().
>
> But there is no need to hold the rwsem for any kind of set_io_stripe()
> calls.
>
> So narrow the dev_replace rwsem to protect only these two cases, and check
> whether the device replace status has changed in the meantime. If it has,
> the find_live_mirror() calls have to be redone.
>
> This fixes a deadlock on raid-stripe-tree where device replace performs a
> scrub operation, which in turn calls into btrfs_map_block() to find the
> physical location of the block.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/volumes.c | 28 +++++++++++++++++-----------
>   1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index fcedc43ef291..4209419244a1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6650,14 +6650,9 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   	max_len = btrfs_max_io_len(map, map_offset, &io_geom);
>   	*length = min_t(u64, map->chunk_len - map_offset, max_len);
>
> +again:
>   	down_read(&dev_replace->rwsem);
>   	dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
> -	/*
> -	 * Hold the semaphore for read during the whole operation, write is
> -	 * requested at commit time but must wait.
> -	 */
> -	if (!dev_replace_is_ongoing)
> -		up_read(&dev_replace->rwsem);
>
>   	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>   	case BTRFS_BLOCK_GROUP_RAID0:
> @@ -6695,6 +6690,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   			   "stripe index math went horribly wrong, got stripe_index=%u, num_stripes=%u",
>   			   io_geom.stripe_index, map->num_stripes);
>   		ret = -EINVAL;
> +		up_read(&dev_replace->rwsem);
>   		goto out;
>   	}
>
> @@ -6710,6 +6706,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   		 */
>   		num_alloc_stripes += 2;
>
> +	up_read(&dev_replace->rwsem);
> +
>   	/*
>   	 * If this I/O maps to a single device, try to return the device and
>   	 * physical block information on the stack instead of allocating an
> @@ -6782,6 +6780,18 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   		goto out;
>   	}
>
> +	/*
> +	 * Check if something changed the dev_replace state since
> +	 * we last checked it, and if so, redo the whole mapping
> +	 * operation.
> +	 */
> +	down_read(&dev_replace->rwsem);
> +	if (dev_replace_is_ongoing !=
> +	    btrfs_dev_replace_is_ongoing(dev_replace)) {
> +		up_read(&dev_replace->rwsem);
> +		goto again;
> +	}
> +
>   	if (op != BTRFS_MAP_READ)
>   		io_geom.max_errors = btrfs_chunk_max_errors(map);
>
> @@ -6789,6 +6799,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   	    op != BTRFS_MAP_READ) {
>   		handle_ops_on_dev_replace(bioc, dev_replace, logical, &io_geom);
>   	}
> +	up_read(&dev_replace->rwsem);
>
>   	*bioc_ret = bioc;
>   	bioc->num_stripes = io_geom.num_stripes;
> @@ -6796,11 +6807,6 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   	bioc->mirror_num = io_geom.mirror_num;
>
>   out:
> -	if (dev_replace_is_ongoing) {
> -		lockdep_assert_held(&dev_replace->rwsem);
> -		/* Unlock and let waiting writers proceed */
> -		up_read(&dev_replace->rwsem);
> -	}
>   	btrfs_free_chunk_map(map);
>   	return ret;
>   }
>
