From: Filipe Manana <fdmanana@kernel.org>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v2 1/2] btrfs: defrag: bring back the old file extent search behavior
Date: Wed, 9 Feb 2022 15:35:28 +0000 [thread overview]
Message-ID: <YgPfQKpcmLPlsAzZ@debian9.Home> (raw)
In-Reply-To: <342f827cc636b27472a7e8c7981eb7ef00a4deb2.1644301903.git.wqu@suse.com>
On Tue, Feb 08, 2022 at 02:36:30PM +0800, Qu Wenruo wrote:
> For defrag, we don't really want to use btrfs_get_extent() to iterate
> all extent maps of an inode.
>
> The reasons are:
>
> - btrfs_get_extent() can merge extent maps
> And the result em has the higher generation of the two, causing defrag
> to mark unnecessary part of such merged large extent map.
>
> This in fact can result extra IO for autodefrag in v5.16+ kernels.
>
> However this patch is not going to completely solve the problem, as
> one can still using read() to trigger extent map reading, and got
> them merged.
>
> The completely solution for the extent map merging generation problem
> will come as an standalone fix.
>
> - btrfs_get_extent() caches the extent map result
> Normally it's fine, but for defrag the target range may not get
> another read/write for a long long time.
> Such cache would only increase the memory usage.
>
> - btrfs_get_extent() doesn't skip older extent map
> Unlike the old find_new_extent() which uses btrfs_search_forward() to
> skip the older subtree, thus it will pick up unnecessary extent maps.
>
> This patch will fix the regression by introducing defrag_get_extent() to
> replace the btrfs_get_extent() call.
>
> This helper will:
>
> - Not cache the file extent we found
> It will search the file extent and manually convert it to em.
>
> - Use btrfs_search_foward() to skip entire ranges which is modified in
> the past
>
> This should reduce the IO for autodefrag.
>
> Reported-by: Filipe Manana <fdmanana@suse.com>
> Fixes: 7b508037d4ca ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/ioctl.c | 150 +++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 146 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 0b417d7cefa8..fb4c25e5ff5f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1017,8 +1017,144 @@ static noinline int btrfs_mksnapshot(const struct path *parent,
> return ret;
> }
>
> +/*
> + * Defrag specific helper to get an extent map.
> + *
> + * Differences between this and btrfs_get_extent() are:
> + * - No extent_map will be added to inode->extent_tree
> + * To reduce memory usage in the long run.
> + *
> + * - Extra optimization to skip file extents older than @newer_than
> + * By using btrfs_search_forward() we can skip entire file ranges that
> + * have extents created in past transactions, because btrfs_search_forward()
> + * will not visit leaves and nodes with a generation smaller than given
> + * minimal generation threshold (@newer_than).
> + *
> + * Return valid em if we find a file extent matching the requirement.
> + * Return NULL if we can not find a file extent matching the requirement.
> + *
> + * Return ERR_PTR() for error.
> + */
> +static struct extent_map *defrag_get_extent(struct btrfs_inode *inode,
> + u64 start, u64 newer_than)
> +{
> + struct btrfs_root *root = inode->root;
> + struct btrfs_file_extent_item *fi;
> + struct btrfs_path path = {};
> + struct extent_map *em;
> + struct btrfs_key key;
> + u64 ino = btrfs_ino(inode);
> + int ret;
> +
> + em = alloc_extent_map();
> + if (!em) {
> + ret = -ENOMEM;
> + goto err;
> + }
> +
> + key.objectid = ino;
> + key.type = BTRFS_EXTENT_DATA_KEY;
> + key.offset = start;
> +
> + if (newer_than) {
> + ret = btrfs_search_forward(root, &key, &path, newer_than);
> + if (ret < 0)
> + goto err;
> + /* Can't find anything newer */
> + if (ret > 0)
> + goto not_found;
> + } else {
> + ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
> + if (ret < 0)
> + goto err;
> + }
> + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
This isn't safe, because in case an exact key was not found (btrfs_search_slot()
returned 1), path.slots[0] may point to a slot beyond the last item in a leaf.
> + /* Perfect match, no need to go one slot back */
> + if (key.objectid == ino && key.type == BTRFS_EXTENT_DATA_KEY &&
> + key.offset == start)
> + goto iterate;
> +
> + /* We didn't find a perfect match, needs to go one slot back */
> + if (path.slots[0] > 0) {
> + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
> + if (key.objectid == ino && key.type == BTRFS_EXTENT_DATA_KEY)
> + path.slots[0]--;
> + }
> +
> +iterate:
> + /* Iterate through the path to find a file extent covering @start */
> + while (true) {
> + u64 extent_end;
> +
> + if (path.slots[0] >= btrfs_header_nritems(path.nodes[0]))
> + goto next;
> +
> + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
> +
> + /*
> + * We may go one slot back to INODE_REF/XATTR item, then
> + * need to go forward until we reach an EXTENT_DATA.
> + */
> + if (key.objectid < ino || key.type < BTRFS_EXTENT_DATA_KEY)
> + goto next;
I would make it:
if (WARN_ON(key.objectid < ino) || key.type < ...)
If we end up at a key with a lower objectid, than something went seriously
wrong somewhere else. In case a key does not exists, btrfs_search_slot() (and
search_forward(), or anything that uses the common binary search), always
leaves us at the first key greater then the one we want or at one slot past
the last item in a leaf (i.e. the location where the key should be at).
So if an inode exists, we should always at least have its inode item, and in
case it has no extent items, btrfs_search_slot() will leave us at the slot where
the key should be at, which can not have an objectid less than 'ino', and should
be the next inode, as we don't key types > BTRFS_EXTENT_DATA_KEY in subvolume
trees associated with inodes.
We do such warning like that at btrfs_drop_extents() for example. It's not a
critical error, as we can safely proceed and then exit and do nothing, but it
helps to signal that something odd (but harmless for this code) happened.
The rest looks fine.
Thanks.
> +
> + /* It's beyond our target range, definitely not extent found */
> + if (key.objectid > ino || key.type > BTRFS_EXTENT_DATA_KEY)
> + goto not_found;
> +
> + /*
> + * | |<- File extent ->|
> + * \- start
> + *
> + * This means there is a hole between start and key.offset.
> + */
> + if (key.offset > start) {
> + em->start = start;
> + em->orig_start = start;
> + em->block_start = EXTENT_MAP_HOLE;
> + em->len = key.offset - start;
> + break;
> + }
> +
> + fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
> + struct btrfs_file_extent_item);
> + extent_end = btrfs_file_extent_end(&path);
> +
> + /*
> + * |<- file extent ->| |
> + * \- start
> + *
> + * We haven't reach start, search next slot.
> + */
> + if (extent_end <= start)
> + goto next;
> +
> + /* Now this extent covers @start, convert it to em */
> + btrfs_extent_item_to_extent_map(inode, &path, fi, false, em);
> + break;
> +next:
> + ret = btrfs_next_item(root, &path);
> + if (ret < 0)
> + goto err;
> + if (ret > 0)
> + goto not_found;
> + }
> + btrfs_release_path(&path);
> + return em;
> +
> +not_found:
> + btrfs_release_path(&path);
> + free_extent_map(em);
> + return NULL;
> +
> +err:
> + btrfs_release_path(&path);
> + free_extent_map(em);
> + return ERR_PTR(ret);
> +}
> +
> static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
> - bool locked)
> + u64 newer_than, bool locked)
> {
> struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
> struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> @@ -1040,7 +1176,7 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start,
> /* get the big lock and read metadata off disk */
> if (!locked)
> lock_extent_bits(io_tree, start, end, &cached);
> - em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, sectorsize);
> + em = defrag_get_extent(BTRFS_I(inode), start, newer_than);
> if (!locked)
> unlock_extent_cached(io_tree, start, end, &cached);
>
> @@ -1068,7 +1204,12 @@ static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em,
> if (em->start + em->len >= i_size_read(inode))
> return ret;
>
> - next = defrag_lookup_extent(inode, em->start + em->len, locked);
> + /*
> + * We want to check if the next extent can be merged with the current
> + * one, which can be an extent created in a past generation, so we pass
> + * a minimum generation of 0 to defrag_lookup_extent().
> + */
> + next = defrag_lookup_extent(inode, em->start + em->len, 0, locked);
> /* No more em or hole */
> if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE)
> goto out;
> @@ -1214,7 +1355,8 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
> u64 range_len;
>
> last_is_target = false;
> - em = defrag_lookup_extent(&inode->vfs_inode, cur, locked);
> + em = defrag_lookup_extent(&inode->vfs_inode, cur,
> + ctrl->newer_than, locked);
> if (!em)
> break;
>
> --
> 2.35.0
>
next prev parent reply other threads:[~2022-02-09 15:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-08 6:36 [PATCH v2 0/2] btrfs: defrag: bring back the old file extent search behavior and address merged extent map generation problem Qu Wenruo
2022-02-08 6:36 ` [PATCH v2 1/2] btrfs: defrag: bring back the old file extent search behavior Qu Wenruo
2022-02-09 15:35 ` Filipe Manana [this message]
2022-02-08 6:36 ` [PATCH v2 2/2] btrfs: defrag: don't use merged extent map for their generation check Qu Wenruo
2022-02-09 15:46 ` Filipe Manana
2022-02-10 0:40 ` Qu Wenruo
2022-02-10 15:49 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YgPfQKpcmLPlsAzZ@debian9.Home \
--to=fdmanana@kernel.org \
--cc=fdmanana@suse.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox