public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: Sun YangKai <sunk67188@gmail.com>, linux-btrfs@vger.kernel.org
Cc: fdmanana@kernel.org
Subject: Re: [PATCH v2 4/4] btrfs: ctree: cleanup btrfs_prev_leaf()
Date: Sun, 8 Feb 2026 09:39:30 +1030	[thread overview]
Message-ID: <bf7c477d-ce4c-4d6e-9538-1baa33e39a02@suse.com> (raw)
In-Reply-To: <20251211072442.15920-6-sunk67188@gmail.com>



在 2025/12/11 17:52, Sun YangKai 写道:
> There's a common parttern in callers of btrfs_prev_leaf:
> p->slots[0]-- if p->slots[0] points to a slot with invalid item(nritem).
> 
> So just make btrfs_prev_leaf() ensure that path->slots[0] points to a
> valid slot and cleanup its over complex logic.
> 
> And then remove the related logic and cleanup the callers.
> 
> This will make things much simpler.
> 
> No functional changes.
> 
> A. Details about changes in callers:
> 
> ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0])) in callers
> is enough to make sure that nritems != 0 and slots[0] points to a valid
> btrfs_item.

I don't think the change is safe.

There are callers doing proper checks for empty trees, e.g. 
btrfs_previous_extent_item() and btrfs_previous_item(), now you will 
just ignore that for kernels without CONFIG_BTRFS_ASSERT, or crash the 
kernel if CONFIG_BTRFS_ASSERT is selected.

If you're just cleaning up btrfs_prev_leaf(), then you should keep the 
callers the same without changing their behaviors.

I think there may be some corner case races when tree deleting and some 
other work are happening at the same time.

> 
> And getting a `nritems==0` when btrfs_prev_leaf() returns 0 is a logic
> error because btrfs_pref_leaf() should always
> 
> 1. either find a non-empty leaf

Nope, there is no such guarantee in the first place.
btrfs_prev_leaf() will unlock the path, thus all kinds of modification 
can happen between btrfs_release_path() and btrfs_search_slot().

This assumption is just wrong.

> 2. or return 1
> 
> So we can use ASSERT safely here.
> 
> B. Details about cleanup of btrfs_prev_leaf().
> 
> The previous implementation works like this:
> 
> 0) Get a previous key by "dec by 1" of the original key. Let's call it
>     search key. It's obvious that search key is less than original key
>     and there's no key between them.
> 
> 1) Call btrfs_search_slot() with search key.
> 
> 2) If we got an error or an exact match, early return.
> 
> 3) If p->slots[0] points to the original item, p->slots[0]-- to make sure
>     that we will not return the same item again. This may happen because
>     there might be some tree balancing happened so the original item is no
>     longer at slot 0.
> 
> 4) Check if the key of the item at slot 0 is (less than the original key
>     / less than or equal to search key) to verify if we got a previous leaf.
> 
> However, 3) and 4) are over complex. We only need to check if
> p->slots[0] == 0 because:
> 
> 3a) If p->slots[0] == 0, there's no key less than or equal to search key
>      in the tree, which means the original key is lowest in the tree. In
>      this case, there's no previous leaf, we should return 1.
> 
> 3b) If p->slots[0] != 0, using p->slots[0]-- is enough to get a valid
>      previous item neither missing anything nor return the original item
>      again because:
> 
>      i) p->slots[0] == nritems, which means all keys in the leaf are less
>         than search key so the leaf is a previous leaf. We only need to
>         p->slots[0]-- to get a valid previous item.
> 
>      ii) p->slots[0] < nritems, p->slots[0] points to an item whose key
>          is greater than search key(probably the original item if it was
>          not deleted), and p->slots[0] - 1 points to an item whose key is
>          less than search key. So use p->slots[0]-- to get the previous
>          item and we will neigher miss anything nor return the original
>          item again. This handles the case 3) in original implementation.
> 
> Signed-off-by: Sun YangKai <sunk67188@gmail.com>
> ---
>   fs/btrfs/ctree.c | 99 ++++++++++--------------------------------------
>   1 file changed, 19 insertions(+), 80 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 0a0157db0b0c..3026d956c7fb 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2376,12 +2376,9 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
>   static int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
>   {
>   	struct btrfs_key key;
> -	struct btrfs_key orig_key;
> -	struct btrfs_disk_key found_key;
>   	int ret;
>   
>   	btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
> -	orig_key = key;
>   
>   	if (key.offset > 0) {
>   		key.offset--;
> @@ -2401,48 +2398,12 @@ static int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
>   	if (ret <= 0)
>   		return ret;
>   
> -	/*
> -	 * Previous key not found. Even if we were at slot 0 of the leaf we had
> -	 * before releasing the path and calling btrfs_search_slot(), we now may
> -	 * be in a slot pointing to the same original key - this can happen if
> -	 * after we released the path, one of more items were moved from a
> -	 * sibling leaf into the front of the leaf we had due to an insertion
> -	 * (see push_leaf_right()).
> -	 * If we hit this case and our slot is > 0 and just decrement the slot
> -	 * so that the caller does not process the same key again, which may or
> -	 * may not break the caller, depending on its logic.
> -	 */

Although the change is mostly fine, if you want to keep the existing 
behavior, you should return 0 for empty tree to keep the old behavior.

Furthermore, you have to explain why it's safe not only in the commit 
message, but also comments.

This not a small change, and very low-level. Just doing black magic is 
not acceptable.

And you need something better than your commit message as comments.

In short, you should:

- Not change the callers

- Better comments

> -	if (path->slots[0] < btrfs_header_nritems(path->nodes[0])) {
> -		btrfs_item_key(path->nodes[0], &found_key, path->slots[0]);
> -		ret = btrfs_comp_keys(&found_key, &orig_key);
> -		if (ret == 0) {
> -			if (path->slots[0] > 0) {
> -				path->slots[0]--;
> -				return 0;
> -			}
> -			/*
> -			 * At slot 0, same key as before, it means orig_key is
> -			 * the lowest, leftmost, key in the tree. We're done.
> -			 */
> -			return 1;
> -		}
> -	}
> +	/* There's no smaller keys in the whole tree. */
> +	if (path->slots[0] == 0)
> +		return 1;
>   
> -	btrfs_item_key(path->nodes[0], &found_key, 0);
> -	ret = btrfs_comp_keys(&found_key, &key);
> -	/*
> -	 * We might have had an item with the previous key in the tree right
> -	 * before we released our path. And after we released our path, that
> -	 * item might have been pushed to the first slot (0) of the leaf we
> -	 * were holding due to a tree balance. Alternatively, an item with the
> -	 * previous key can exist as the only element of a leaf (big fat item).
> -	 * Therefore account for these 2 cases, so that our callers (like
> -	 * btrfs_previous_item) don't miss an existing item with a key matching
> -	 * the previous key we computed above.
> -	 */
> -	if (ret <= 0)
> -		return 0;
> -	return 1;
> +	path->slots[0]--;
> +	return 0;
>   }
>   
>   /*
> @@ -2473,19 +2434,11 @@ int btrfs_search_slot_for_read(struct btrfs_root *root,
>   		if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
>   			return btrfs_next_leaf(root, p);
>   	} else {
> -		if (p->slots[0] == 0) {
> -			ret = btrfs_prev_leaf(root, p);
> -			if (ret < 0)
> -				return ret;
> -			if (!ret) {
> -				if (p->slots[0] == btrfs_header_nritems(p->nodes[0]))
> -					p->slots[0]--;
> -				return 0;
> -			}
> +		/* We have no lower key in the tree. */
> +		if (p->slots[0] == 0)
>   			return 1;
> -		} else {
> -			p->slots[0]--;
> -		}
> +
> +		p->slots[0]--;
>   	}
>   	return 0;
>   }
> @@ -4969,26 +4922,19 @@ int btrfs_previous_item(struct btrfs_root *root,
>   			int type)
>   {
>   	struct btrfs_key found_key;
> -	struct extent_buffer *leaf;
> -	u32 nritems;
>   	int ret;
>   
>   	while (1) {
>   		if (path->slots[0] == 0) {
>   			ret = btrfs_prev_leaf(root, path);
> -			if (ret != 0)
> +			if (ret)
>   				return ret;
> -		} else {
> -			path->slots[0]--;
> -		}
> -		leaf = path->nodes[0];
> -		nritems = btrfs_header_nritems(leaf);
> -		if (nritems == 0)
> -			return 1;
> -		if (path->slots[0] == nritems)
> +		} else
>   			path->slots[0]--;
>   
> -		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +		ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0]));
> +
> +		btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]);
>   		if (found_key.objectid < min_objectid)
>   			break;
>   		if (found_key.type == type)
> @@ -5010,26 +4956,19 @@ int btrfs_previous_extent_item(struct btrfs_root *root,
>   			struct btrfs_path *path, u64 min_objectid)
>   {
>   	struct btrfs_key found_key;
> -	struct extent_buffer *leaf;
> -	u32 nritems;
>   	int ret;
>   
>   	while (1) {
>   		if (path->slots[0] == 0) {
>   			ret = btrfs_prev_leaf(root, path);
> -			if (ret != 0)
> +			if (ret)
>   				return ret;
> -		} else {
> -			path->slots[0]--;
> -		}
> -		leaf = path->nodes[0];
> -		nritems = btrfs_header_nritems(leaf);
> -		if (nritems == 0)
> -			return 1;
> -		if (path->slots[0] == nritems)
> +		} else
>   			path->slots[0]--;
>   
> -		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +		ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0]));
> +
> +		btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]);
>   		if (found_key.objectid < min_objectid)
>   			break;
>   		if (found_key.type == BTRFS_EXTENT_ITEM_KEY ||


  reply	other threads:[~2026-02-07 23:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-11  7:22 [PATCH v2 0/4] btrfs: some cleanups for two ctree functions Sun YangKai
2025-12-11  7:22 ` [PATCH v2 1/4] btrfs: don't set @return_any for btrfs_search_slot_for_read in btrfs_read_qgroup_config Sun YangKai
2025-12-11  7:22 ` [PATCH v2 2/4] btrfs: don't set return_any @return_any for btrfs_search_slot_for_read in get_last_extent() Sun YangKai
2025-12-11  7:22 ` [PATCH v2 3/4] btrfs: cleanup btrfs_search_slot_for_read() Sun YangKai
2025-12-11  7:22 ` [PATCH v2 4/4] btrfs: ctree: cleanup btrfs_prev_leaf() Sun YangKai
2026-02-07 23:09   ` Qu Wenruo [this message]
2026-02-08  2:42     ` Sun YangKai
2026-02-08  3:13       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bf7c477d-ce4c-4d6e-9538-1baa33e39a02@suse.com \
    --to=wqu@suse.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=sunk67188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox