From: Filipe Manana <fdmanana@kernel.org>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: remove some duplicated checks in btrfs_previous_*_item()
Date: Tue, 14 Dec 2021 10:27:42 +0000 [thread overview]
Message-ID: <Ybhxngswi6vN+vH4@debian9.Home> (raw)
In-Reply-To: <20211214071411.48324-1-wqu@suse.com>
On Tue, Dec 14, 2021 at 03:14:11PM +0800, Qu Wenruo wrote:
> In btrfs_previous_item() and btrfs_previous_extent_item() we check
> btrfs_header_nritems() in a loop.
>
> But in fact we don't need to do it in a loop at all.
>
> Firstly, if a tree block is empty, the whole tree is empty and nodes[0]
> is the tree root.
> We don't need to do anything and can exit immediately.
>
> Secondly, the only timing we could get a slots[0] >= nritems is when the
> function get called. After the first slots[0]--, the slot should always
> be <= nritems.
>
> So this patch will move all the nritems related checks out of the loop
> by:
>
> - Check nritems of nodes[0] to do a quick exit
>
> - Sanitize path->slots[0] before entering the loop
> I doubt if there is any caller setting path->slots[0] beyond
> nritems + 1 (setting to nritems is possible when item is not found).
> Sanitize path->slots[0] to nritems won't hurt anyway.
>
> - Unconditionally reduce path->slots[0]
> Since we're sure all tree blocks should not be empty, and
> btrfs_prev_leaf() will return with path->slots[0] == nritems, we
> can safely reduce slots[0] unconditionally.
> Just keep an extra ASSERT() to make sure no tree block is empty.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/ctree.c | 52 +++++++++++++++++++++++++++++++++---------------
> 1 file changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 781537692a4a..555345aed84d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -4704,23 +4704,39 @@ int btrfs_previous_item(struct btrfs_root *root,
> {
> struct btrfs_key found_key;
> struct extent_buffer *leaf;
> - u32 nritems;
> + const u32 nritems = btrfs_header_nritems(path->nodes[0]);
> int ret;
>
> + /*
> + * Check nritems first, if the tree is empty we exit immediately.
> + * And if this leave is not empty, none of the tree blocks of this root
> + * should be empty.
> + */
> + if (nritems == 0)
> + return 1;
> +
> + /*
> + * If we're several slots beyond nritems, we reset slot to nritems,
> + * and it will be handled properly inside the loop.
> + */
> + if (unlikely(path->slots[0] > nritems))
> + path->slots[0] = nritems;
> +
> while (1) {
> if (path->slots[0] == 0) {
> ret = btrfs_prev_leaf(root, path);
> if (ret != 0)
> return ret;
> - } else {
> - path->slots[0]--;
> }
> leaf = path->nodes[0];
> - nritems = btrfs_header_nritems(leaf);
> - if (nritems == 0)
> - return 1;
> - if (path->slots[0] == nritems)
> - path->slots[0]--;
> + ASSERT(btrfs_header_nritems(leaf));
> + /*
> + * This is for both regular case and above btrfs_prev_leaf() case.
> + * As btrfs_prev_leaf() will return with path->slots[0] == nritems,
> + * and we're sure no tree block is empty, we can go safely
> + * reduce slots[0] here.
> + */
> + path->slots[0]--;
No, this is wrong.
btrfs_prev_leaf() computes the previous key and does a search_slot() for it.
With this unconditional decrement we can miss the previous key in 2 ways:
1) The previous key exists, and btrfs_prev_leaf() leaves us in a leaf that has it
and the slot is btrfs_header_nritems(prev_leaf) - 1 -> the last key on a leaf;
2) The previous key exists, but after btrfs_prev_leaf() released the path and
before it called search_slot(), there was a balance operation and it pushed the
previous key in the middle of the leaf we had, or some other leaf, and the slot
will be something less than btrfs_header_nritems(), it can even be 0.
That's why we have the call to header_nritems() in the loop, and check if slots[0]
is == to nritems before decrementing...
Thanks.
>
> btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> if (found_key.objectid < min_objectid)
> @@ -4745,23 +4761,27 @@ int btrfs_previous_extent_item(struct btrfs_root *root,
> {
> struct btrfs_key found_key;
> struct extent_buffer *leaf;
> - u32 nritems;
> + const u32 nritems = btrfs_header_nritems(path->nodes[0]);
> int ret;
>
> + /*
> + * Refer to btrfs_previous_item() for the reason of all nritems related
> + * checks/modifications.
> + */
> + if (nritems == 0)
> + return 1;
> + if (unlikely(path->slots[0] > nritems))
> + path->slots[0] = nritems;
> +
> while (1) {
> if (path->slots[0] == 0) {
> ret = btrfs_prev_leaf(root, path);
> if (ret != 0)
> return ret;
> - } else {
> - path->slots[0]--;
> }
> leaf = path->nodes[0];
> - nritems = btrfs_header_nritems(leaf);
> - if (nritems == 0)
> - return 1;
> - if (path->slots[0] == nritems)
> - path->slots[0]--;
> + ASSERT(btrfs_header_nritems(leaf));
> + path->slots[0]--;
>
> btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> if (found_key.objectid < min_objectid)
> --
> 2.34.1
>
next prev parent reply other threads:[~2021-12-14 10:28 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-14 7:14 [PATCH] btrfs: remove some duplicated checks in btrfs_previous_*_item() Qu Wenruo
2021-12-14 10:27 ` Filipe Manana [this message]
2021-12-14 10:50 ` Qu Wenruo
2021-12-14 14:37 ` Josef Bacik
2021-12-14 23:33 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ybhxngswi6vN+vH4@debian9.Home \
--to=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.