public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v4 3/6] btrfs: repurpose split_zoned_em for partial dio splitting
Date: Tue, 21 Mar 2023 13:33:58 -0700	[thread overview]
Message-ID: <20230321203358.GA24993@zen> (raw)
In-Reply-To: <5faf0148f526b4e9eb373c177de3c70284999ce7.1679416511.git.boris@bur.io>

On Tue, Mar 21, 2023 at 09:45:30AM -0700, Boris Burkov wrote:
> In a subsequent patch I will be "extracting" a partial dio write bio
> from its ordered extent, creating a 1:1 bio<->ordered_extent relation.
> This is necessary to avoid triggering an assertion in unpin_extent_cache
> called from btrfs_finish_ordered_io that checks that the em matches the
> finished ordered extent.
> 
> Since there is already a function which splits an uncompressed
> extent_map for a zoned bio use case, adapt it to this new, similar
> use case. Mostly, modify it to handle the case where the extent_map is
> bigger than the ordered_extent, and we cannot assume the em "post" split
> can be computed from the ordered_extent and bio. This comes up in
> btrfs/250, for example.
> 
> I felt that these relaxations where not so damaging to the legibility of
> the zoned case as to merit a fully separate codepath, but I admit that is
> not my area of expertise.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/inode.c | 104 ++++++++++++++++++++++++++++++++---------------
>  1 file changed, 71 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5ab486f448eb..2f8baf4797ea 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2512,37 +2512,59 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
>  }
>  
>  /*
> - * Split an extent_map at [start, start + len]
> + * Split out a middle extent_map [start, start+len] from within an extent_map.
>   *
> - * This function is intended to be used only for extract_ordered_extent().
> + * @inode: the inode to which the extent map belongs.
> + * @start: the start index of the middle split
> + * @len: the length of the middle split
> + *
> + * The result is two or three extent_maps inserted in the tree, depending on
> + * whether start and len imply an uncovered area at the beginning or end of the
> + * extent map. If the implied split lines up with the end or beginning, there
> + * will only be two extent maps in the resulting split, otherwise there will be
> + * three. (If they both match, the split operation is a no-op)
> + *
> + * extent map splitting assumptions:
> + * end = start + len
> + * em-end = em-start + em-len
> + * start >= em-start
> + * len < em-len
> + * end <= em-end
> + *
> + * Diagrams explaining the splitting cases:
> + * original em:
> + *   [em-start---start---end---em-end)
> + * resulting ems:
> + * start != em-start && end != em-end (full tri split):
> + *   [em-start---start) [start---end) [end---em-end)
> + * start == em-start (no pre split):
> + *   [em-start---end) [end---em-end)
> + * end == em-end (no post split):
> + *   [em-start---start) [start--em-end)
> + *
> + * Returns: 0 on success, -errno on failure.
>   */
> -static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
> -			  u64 pre, u64 post)
> +static int split_em(struct btrfs_inode *inode, u64 start, u64 len)
>  {
>  	struct extent_map_tree *em_tree = &inode->extent_tree;
>  	struct extent_map *em;
> +	u64 pre_start, pre_len, pre_end;
> +	u64 mid_start, mid_len, mid_end;
> +	u64 post_start, post_len, post_end;
>  	struct extent_map *split_pre = NULL;
>  	struct extent_map *split_mid = NULL;
>  	struct extent_map *split_post = NULL;
>  	int ret = 0;
>  	unsigned long flags;
>  
> -	/* Sanity check */
> -	if (pre == 0 && post == 0)
> -		return 0;
> -
>  	split_pre = alloc_extent_map();
> -	if (pre)
> -		split_mid = alloc_extent_map();
> -	if (post)
> -		split_post = alloc_extent_map();
> -	if (!split_pre || (pre && !split_mid) || (post && !split_post)) {
> +	split_mid = alloc_extent_map();
> +	split_post = alloc_extent_map();
> +	if (!split_pre || !split_mid || !split_post) {
>  		ret = -ENOMEM;
>  		goto out;
>  	}
>  
> -	ASSERT(pre + post < len);
> -
>  	lock_extent(&inode->io_tree, start, start + len - 1, NULL);
>  	write_lock(&em_tree->lock);
>  	em = lookup_extent_mapping(em_tree, start, len);
> @@ -2551,19 +2573,38 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
>  		goto out_unlock;
>  	}
>  
> -	ASSERT(em->len == len);
> +	pre_start = em->start;
> +	pre_end = start;
> +	pre_len = pre_end - pre_start;
> +	mid_start = start;
> +	mid_end = start + len;
> +	mid_len = len;
> +	post_start = mid_end;
> +	post_end = em->start + em->len;
> +	post_len = post_end - post_start;
> +	ASSERT(pre_start == em->start);
> +	ASSERT(pre_start + pre_len == mid_start);
> +	ASSERT(mid_start + mid_len == post_start);
> +	ASSERT(post_start + post_len == em->start + em->len);
> +
> +	/* Sanity check */
> +	if (pre_len == 0 && post_len == 0) {
> +		ret = 0;
> +		goto out_unlock;
> +	}
> +
>  	ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
>  	ASSERT(em->block_start < EXTENT_MAP_LAST_BYTE);
> -	ASSERT(test_bit(EXTENT_FLAG_PINNED, &em->flags));

I am currently digging into this more, as I think we want the em to be
pinned. The length is likely the same issue, so maybe I can figure them
both out. I was seeing it violated on btrfs/250.

>  	ASSERT(!test_bit(EXTENT_FLAG_LOGGING, &em->flags));
>  	ASSERT(!list_empty(&em->list));
>  
>  	flags = em->flags;
> -	clear_bit(EXTENT_FLAG_PINNED, &em->flags);
> +	if (test_bit(EXTENT_FLAG_PINNED, &em->flags))
> +		clear_bit(EXTENT_FLAG_PINNED, &em->flags);
>  
>  	/* First, replace the em with a new extent_map starting from * em->start */
>  	split_pre->start = em->start;
> -	split_pre->len = (pre ? pre : em->len - post);
> +	split_pre->len = (pre_len ? pre_len : mid_len);
>  	split_pre->orig_start = split_pre->start;
>  	split_pre->block_start = em->block_start;
>  	split_pre->block_len = split_pre->len;
> @@ -2577,16 +2618,15 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
>  
>  	/*
>  	 * Now we only have an extent_map at:
> -	 *     [em->start, em->start + pre] if pre != 0
> -	 *     [em->start, em->start + em->len - post] if pre == 0
> +	 *     [em->start, em->start + pre_len] if pre_len != 0
> +	 *     [em->start, em->start + mid_len] if pre == 0
>  	 */
> -
> -	if (pre) {
> +	if (pre_len) {
>  		/* Insert the middle extent_map */
> -		split_mid->start = em->start + pre;
> -		split_mid->len = em->len - pre - post;
> +		split_mid->start = mid_start;
> +		split_mid->len = mid_len;
>  		split_mid->orig_start = split_mid->start;
> -		split_mid->block_start = em->block_start + pre;
> +		split_mid->block_start = em->block_start + pre_len;
>  		split_mid->block_len = split_mid->len;
>  		split_mid->orig_block_len = split_mid->block_len;
>  		split_mid->ram_bytes = split_mid->len;
> @@ -2596,11 +2636,11 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
>  		add_extent_mapping(em_tree, split_mid, 1);
>  	}
>  
> -	if (post) {
> -		split_post->start = em->start + em->len - post;
> -		split_post->len = post;
> +	if (post_len) {
> +		split_post->start = post_start;
> +		split_post->len = post_len;
>  		split_post->orig_start = split_post->start;
> -		split_post->block_start = em->block_start + em->len - post;
> +		split_post->block_start = em->block_start + pre_len + mid_len;
>  		split_post->block_len = split_post->len;
>  		split_post->orig_block_len = split_post->block_len;
>  		split_post->ram_bytes = split_post->len;
> @@ -2632,7 +2672,6 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
>  	u64 len = bbio->bio.bi_iter.bi_size;
>  	struct btrfs_inode *inode = bbio->inode;
>  	struct btrfs_ordered_extent *ordered;
> -	u64 file_len;
>  	u64 end = start + len;
>  	u64 ordered_end;
>  	u64 pre, post;
> @@ -2671,14 +2710,13 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
>  		goto out;
>  	}
>  
> -	file_len = ordered->num_bytes;
>  	pre = start - ordered->disk_bytenr;
>  	post = ordered_end - end;
>  
>  	ret = btrfs_split_ordered_extent(ordered, pre, post);
>  	if (ret)
>  		goto out;
> -	ret = split_zoned_em(inode, bbio->file_offset, file_len, pre, post);
> +	ret = split_em(inode, bbio->file_offset, len);
>  
>  out:
>  	btrfs_put_ordered_extent(ordered);
> -- 
> 2.38.1
> 

  reply	other threads:[~2023-03-21 20:34 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-21 16:45 [PATCH v4 0/6] btrfs: fix corruption caused by partial dio writes Boris Burkov
2023-03-21 16:45 ` [PATCH v4 1/6] btrfs: add function to create and return an ordered extent Boris Burkov
2023-03-21 16:45 ` [PATCH v4 2/6] btrfs: stash ordered extent in dio_data during iomap dio Boris Burkov
2023-03-21 16:45 ` [PATCH v4 3/6] btrfs: repurpose split_zoned_em for partial dio splitting Boris Burkov
2023-03-21 20:33   ` Boris Burkov [this message]
2023-03-21 16:45 ` [PATCH v4 4/6] btrfs: return ordered_extent splits from bio extraction Boris Burkov
2023-03-21 16:45 ` [PATCH v4 5/6] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent Boris Burkov
2023-03-21 16:45 ` [PATCH v4 6/6] btrfs: split partial dio bios before submit Boris Burkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230321203358.GA24993@zen \
    --to=boris@bur.io \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox