From: Boris Burkov <boris@bur.io>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v4 3/6] btrfs: repurpose split_zoned_em for partial dio splitting
Date: Tue, 21 Mar 2023 13:33:58 -0700 [thread overview]
Message-ID: <20230321203358.GA24993@zen> (raw)
In-Reply-To: <5faf0148f526b4e9eb373c177de3c70284999ce7.1679416511.git.boris@bur.io>
On Tue, Mar 21, 2023 at 09:45:30AM -0700, Boris Burkov wrote:
> In a subsequent patch I will be "extracting" a partial dio write bio
> from its ordered extent, creating a 1:1 bio<->ordered_extent relation.
> This is necessary to avoid triggering an assertion in unpin_extent_cache
> called from btrfs_finish_ordered_io that checks that the em matches the
> finished ordered extent.
>
> Since there is already a function which splits an uncompressed
> extent_map for a zoned bio use case, adapt it to this new, similar
> use case. Mostly, modify it to handle the case where the extent_map is
> bigger than the ordered_extent, and we cannot assume the em "post" split
> can be computed from the ordered_extent and bio. This comes up in
> btrfs/250, for example.
>
> I felt that these relaxations where not so damaging to the legibility of
> the zoned case as to merit a fully separate codepath, but I admit that is
> not my area of expertise.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/inode.c | 104 ++++++++++++++++++++++++++++++++---------------
> 1 file changed, 71 insertions(+), 33 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5ab486f448eb..2f8baf4797ea 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2512,37 +2512,59 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
> }
>
> /*
> - * Split an extent_map at [start, start + len]
> + * Split out a middle extent_map [start, start+len] from within an extent_map.
> *
> - * This function is intended to be used only for extract_ordered_extent().
> + * @inode: the inode to which the extent map belongs.
> + * @start: the start index of the middle split
> + * @len: the length of the middle split
> + *
> + * The result is two or three extent_maps inserted in the tree, depending on
> + * whether start and len imply an uncovered area at the beginning or end of the
> + * extent map. If the implied split lines up with the end or beginning, there
> + * will only be two extent maps in the resulting split, otherwise there will be
> + * three. (If they both match, the split operation is a no-op)
> + *
> + * extent map splitting assumptions:
> + * end = start + len
> + * em-end = em-start + em-len
> + * start >= em-start
> + * len < em-len
> + * end <= em-end
> + *
> + * Diagrams explaining the splitting cases:
> + * original em:
> + * [em-start---start---end---em-end)
> + * resulting ems:
> + * start != em-start && end != em-end (full tri split):
> + * [em-start---start) [start---end) [end---em-end)
> + * start == em-start (no pre split):
> + * [em-start---end) [end---em-end)
> + * end == em-end (no post split):
> + * [em-start---start) [start--em-end)
> + *
> + * Returns: 0 on success, -errno on failure.
> */
> -static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
> - u64 pre, u64 post)
> +static int split_em(struct btrfs_inode *inode, u64 start, u64 len)
> {
> struct extent_map_tree *em_tree = &inode->extent_tree;
> struct extent_map *em;
> + u64 pre_start, pre_len, pre_end;
> + u64 mid_start, mid_len, mid_end;
> + u64 post_start, post_len, post_end;
> struct extent_map *split_pre = NULL;
> struct extent_map *split_mid = NULL;
> struct extent_map *split_post = NULL;
> int ret = 0;
> unsigned long flags;
>
> - /* Sanity check */
> - if (pre == 0 && post == 0)
> - return 0;
> -
> split_pre = alloc_extent_map();
> - if (pre)
> - split_mid = alloc_extent_map();
> - if (post)
> - split_post = alloc_extent_map();
> - if (!split_pre || (pre && !split_mid) || (post && !split_post)) {
> + split_mid = alloc_extent_map();
> + split_post = alloc_extent_map();
> + if (!split_pre || !split_mid || !split_post) {
> ret = -ENOMEM;
> goto out;
> }
>
> - ASSERT(pre + post < len);
> -
> lock_extent(&inode->io_tree, start, start + len - 1, NULL);
> write_lock(&em_tree->lock);
> em = lookup_extent_mapping(em_tree, start, len);
> @@ -2551,19 +2573,38 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
> goto out_unlock;
> }
>
> - ASSERT(em->len == len);
> + pre_start = em->start;
> + pre_end = start;
> + pre_len = pre_end - pre_start;
> + mid_start = start;
> + mid_end = start + len;
> + mid_len = len;
> + post_start = mid_end;
> + post_end = em->start + em->len;
> + post_len = post_end - post_start;
> + ASSERT(pre_start == em->start);
> + ASSERT(pre_start + pre_len == mid_start);
> + ASSERT(mid_start + mid_len == post_start);
> + ASSERT(post_start + post_len == em->start + em->len);
> +
> + /* Sanity check */
> + if (pre_len == 0 && post_len == 0) {
> + ret = 0;
> + goto out_unlock;
> + }
> +
> ASSERT(!test_bit(EXTENT_FLAG_COMPRESSED, &em->flags));
> ASSERT(em->block_start < EXTENT_MAP_LAST_BYTE);
> - ASSERT(test_bit(EXTENT_FLAG_PINNED, &em->flags));
I am currently digging into this more, as I think we want the em to be
pinned. The length is likely the same issue, so maybe I can figure them
both out. I was seeing it violated on btrfs/250.
> ASSERT(!test_bit(EXTENT_FLAG_LOGGING, &em->flags));
> ASSERT(!list_empty(&em->list));
>
> flags = em->flags;
> - clear_bit(EXTENT_FLAG_PINNED, &em->flags);
> + if (test_bit(EXTENT_FLAG_PINNED, &em->flags))
> + clear_bit(EXTENT_FLAG_PINNED, &em->flags);
>
> /* First, replace the em with a new extent_map starting from * em->start */
> split_pre->start = em->start;
> - split_pre->len = (pre ? pre : em->len - post);
> + split_pre->len = (pre_len ? pre_len : mid_len);
> split_pre->orig_start = split_pre->start;
> split_pre->block_start = em->block_start;
> split_pre->block_len = split_pre->len;
> @@ -2577,16 +2618,15 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
>
> /*
> * Now we only have an extent_map at:
> - * [em->start, em->start + pre] if pre != 0
> - * [em->start, em->start + em->len - post] if pre == 0
> + * [em->start, em->start + pre_len] if pre_len != 0
> + * [em->start, em->start + mid_len] if pre == 0
> */
> -
> - if (pre) {
> + if (pre_len) {
> /* Insert the middle extent_map */
> - split_mid->start = em->start + pre;
> - split_mid->len = em->len - pre - post;
> + split_mid->start = mid_start;
> + split_mid->len = mid_len;
> split_mid->orig_start = split_mid->start;
> - split_mid->block_start = em->block_start + pre;
> + split_mid->block_start = em->block_start + pre_len;
> split_mid->block_len = split_mid->len;
> split_mid->orig_block_len = split_mid->block_len;
> split_mid->ram_bytes = split_mid->len;
> @@ -2596,11 +2636,11 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
> add_extent_mapping(em_tree, split_mid, 1);
> }
>
> - if (post) {
> - split_post->start = em->start + em->len - post;
> - split_post->len = post;
> + if (post_len) {
> + split_post->start = post_start;
> + split_post->len = post_len;
> split_post->orig_start = split_post->start;
> - split_post->block_start = em->block_start + em->len - post;
> + split_post->block_start = em->block_start + pre_len + mid_len;
> split_post->block_len = split_post->len;
> split_post->orig_block_len = split_post->block_len;
> split_post->ram_bytes = split_post->len;
> @@ -2632,7 +2672,6 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
> u64 len = bbio->bio.bi_iter.bi_size;
> struct btrfs_inode *inode = bbio->inode;
> struct btrfs_ordered_extent *ordered;
> - u64 file_len;
> u64 end = start + len;
> u64 ordered_end;
> u64 pre, post;
> @@ -2671,14 +2710,13 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
> goto out;
> }
>
> - file_len = ordered->num_bytes;
> pre = start - ordered->disk_bytenr;
> post = ordered_end - end;
>
> ret = btrfs_split_ordered_extent(ordered, pre, post);
> if (ret)
> goto out;
> - ret = split_zoned_em(inode, bbio->file_offset, file_len, pre, post);
> + ret = split_em(inode, bbio->file_offset, len);
>
> out:
> btrfs_put_ordered_extent(ordered);
> --
> 2.38.1
>
next prev parent reply other threads:[~2023-03-21 20:34 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-21 16:45 [PATCH v4 0/6] btrfs: fix corruption caused by partial dio writes Boris Burkov
2023-03-21 16:45 ` [PATCH v4 1/6] btrfs: add function to create and return an ordered extent Boris Burkov
2023-03-21 16:45 ` [PATCH v4 2/6] btrfs: stash ordered extent in dio_data during iomap dio Boris Burkov
2023-03-21 16:45 ` [PATCH v4 3/6] btrfs: repurpose split_zoned_em for partial dio splitting Boris Burkov
2023-03-21 20:33 ` Boris Burkov [this message]
2023-03-21 16:45 ` [PATCH v4 4/6] btrfs: return ordered_extent splits from bio extraction Boris Burkov
2023-03-21 16:45 ` [PATCH v4 5/6] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent Boris Burkov
2023-03-21 16:45 ` [PATCH v4 6/6] btrfs: split partial dio bios before submit Boris Burkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230321203358.GA24993@zen \
--to=boris@bur.io \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox