From: Sun YangKai <sunk67188@gmail.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: fdmanana@kernel.org
Subject: Re: [PATCH v2 4/4] btrfs: ctree: cleanup btrfs_prev_leaf()
Date: Sun, 8 Feb 2026 10:42:17 +0800 [thread overview]
Message-ID: <ed9925c6-3664-4158-87ef-48e7316a7128@gmail.com> (raw)
In-Reply-To: <bf7c477d-ce4c-4d6e-9538-1baa33e39a02@suse.com>
On 2026/2/8 07:09, Qu Wenruo wrote:
>
>
> 在 2025/12/11 17:52, Sun YangKai 写道:
>> There's a common parttern in callers of btrfs_prev_leaf:
>> p->slots[0]-- if p->slots[0] points to a slot with invalid item(nritem).
>>
>> So just make btrfs_prev_leaf() ensure that path->slots[0] points to a
>> valid slot and cleanup its over complex logic.
>>
>> And then remove the related logic and cleanup the callers.
>>
>> This will make things much simpler.
>>
>> No functional changes.
>>
>> A. Details about changes in callers:
>>
>> ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0])) in callers
>> is enough to make sure that nritems != 0 and slots[0] points to a valid
>> btrfs_item.
>
> I don't think the change is safe.
>
> There are callers doing proper checks for empty trees, e.g.
> btrfs_previous_extent_item() and btrfs_previous_item(), now you will
> just ignore that for kernels without CONFIG_BTRFS_ASSERT, or crash the
> kernel if CONFIG_BTRFS_ASSERT is selected.
>
> If you're just cleaning up btrfs_prev_leaf(), then you should keep the
> callers the same without changing their behaviors.
Thanks a lot for your code review and it makes a lot of sense. I'll not
touch the callers in the next version.
> I think there may be some corner case races when tree deleting and some
> other work are happening at the same time.
>
>>
>> And getting a `nritems==0` when btrfs_prev_leaf() returns 0 is a logic
>> error because btrfs_pref_leaf() should always
>>
>> 1. either find a non-empty leaf
>
> Nope, there is no such guarantee in the first place.
> btrfs_prev_leaf() will unlock the path, thus all kinds of modification
> can happen between btrfs_release_path() and btrfs_search_slot().
>
> This assumption is just wrong.
Sorry, I got a little confused here. When btrfs_search_slot() returns an
empty leaf, path->slots[0] will be 0 and btrfs_prev_leaf() will return
1. Therefore, if btrfs_prev_leaf() returns 0, the leaf we get should
always be non-empty. Which part of this reasoning is incorrect? Or maybe
I don't get what you want to say.
>> 2. or return 1
>>
>> So we can use ASSERT safely here.
>>
>> B. Details about cleanup of btrfs_prev_leaf().
>>
>> The previous implementation works like this:
>>
>> 0) Get a previous key by "dec by 1" of the original key. Let's call it
>> search key. It's obvious that search key is less than original key
>> and there's no key between them.
>>
>> 1) Call btrfs_search_slot() with search key.
>>
>> 2) If we got an error or an exact match, early return.
>>
>> 3) If p->slots[0] points to the original item, p->slots[0]-- to make sure
>> that we will not return the same item again. This may happen because
>> there might be some tree balancing happened so the original item
>> is no
>> longer at slot 0.
>>
>> 4) Check if the key of the item at slot 0 is (less than the original key
>> / less than or equal to search key) to verify if we got a previous
>> leaf.
>>
>> However, 3) and 4) are over complex. We only need to check if
>> p->slots[0] == 0 because:
>>
>> 3a) If p->slots[0] == 0, there's no key less than or equal to search key
>> in the tree, which means the original key is lowest in the tree. In
>> this case, there's no previous leaf, we should return 1.
>>
>> 3b) If p->slots[0] != 0, using p->slots[0]-- is enough to get a valid
>> previous item neither missing anything nor return the original item
>> again because:
>>
>> i) p->slots[0] == nritems, which means all keys in the leaf are less
>> than search key so the leaf is a previous leaf. We only need to
>> p->slots[0]-- to get a valid previous item.
>>
>> ii) p->slots[0] < nritems, p->slots[0] points to an item whose key
>> is greater than search key(probably the original item if it was
>> not deleted), and p->slots[0] - 1 points to an item whose key is
>> less than search key. So use p->slots[0]-- to get the previous
>> item and we will neigher miss anything nor return the original
>> item again. This handles the case 3) in original implementation.
>>
>> Signed-off-by: Sun YangKai <sunk67188@gmail.com>
>> ---
>> fs/btrfs/ctree.c | 99 ++++++++++--------------------------------------
>> 1 file changed, 19 insertions(+), 80 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 0a0157db0b0c..3026d956c7fb 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -2376,12 +2376,9 @@ int btrfs_search_old_slot(struct btrfs_root
>> *root, const struct btrfs_key *key,
>> static int btrfs_prev_leaf(struct btrfs_root *root, struct
>> btrfs_path *path)
>> {
>> struct btrfs_key key;
>> - struct btrfs_key orig_key;
>> - struct btrfs_disk_key found_key;
>> int ret;
>> btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
>> - orig_key = key;
>> if (key.offset > 0) {
>> key.offset--;
>> @@ -2401,48 +2398,12 @@ static int btrfs_prev_leaf(struct btrfs_root
>> *root, struct btrfs_path *path)
>> if (ret <= 0)
>> return ret;
>> - /*
>> - * Previous key not found. Even if we were at slot 0 of the leaf
>> we had
>> - * before releasing the path and calling btrfs_search_slot(), we
>> now may
>> - * be in a slot pointing to the same original key - this can
>> happen if
>> - * after we released the path, one of more items were moved from a
>> - * sibling leaf into the front of the leaf we had due to an
>> insertion
>> - * (see push_leaf_right()).
>> - * If we hit this case and our slot is > 0 and just decrement the
>> slot
>> - * so that the caller does not process the same key again, which
>> may or
>> - * may not break the caller, depending on its logic.
>> - */
>
> Although the change is mostly fine, if you want to keep the existing
> behavior, you should return 0 for empty tree to keep the old behavior.
I've not thought about the empty tree case before. But I think the
previous behavior in empty tree case is not proper and I don't want to
keep the old behavior in this case. All callers is doing
path->slots[0]-- if btrfs_prev_leaf() returns 0 and it will cause an
unexpected underflow although they'll check if nritems is 0 later. So I
will mention this behavior change in commit message and comments in the
next version.
> Furthermore, you have to explain why it's safe not only in the commit
> message, but also comments.
>
> This not a small change, and very low-level. Just doing black magic is
> not acceptable.
>
> And you need something better than your commit message as comments.
>
> In short, you should:
>
> - Not change the callers
>
> - Better comments
Thanks. Very helpful for me. And wish you have a good weekend :)
>
>> - if (path->slots[0] < btrfs_header_nritems(path->nodes[0])) {
>> - btrfs_item_key(path->nodes[0], &found_key, path->slots[0]);
>> - ret = btrfs_comp_keys(&found_key, &orig_key);
>> - if (ret == 0) {
>> - if (path->slots[0] > 0) {
>> - path->slots[0]--;
>> - return 0;
>> - }
>> - /*
>> - * At slot 0, same key as before, it means orig_key is
>> - * the lowest, leftmost, key in the tree. We're done.
>> - */
>> - return 1;
>> - }
>> - }
>> + /* There's no smaller keys in the whole tree. */
>> + if (path->slots[0] == 0)
>> + return 1;
>> - btrfs_item_key(path->nodes[0], &found_key, 0);
>> - ret = btrfs_comp_keys(&found_key, &key);
>> - /*
>> - * We might have had an item with the previous key in the tree right
>> - * before we released our path. And after we released our path, that
>> - * item might have been pushed to the first slot (0) of the leaf we
>> - * were holding due to a tree balance. Alternatively, an item
>> with the
>> - * previous key can exist as the only element of a leaf (big fat
>> item).
>> - * Therefore account for these 2 cases, so that our callers (like
>> - * btrfs_previous_item) don't miss an existing item with a key
>> matching
>> - * the previous key we computed above.
>> - */
>> - if (ret <= 0)
>> - return 0;
>> - return 1;
>> + path->slots[0]--;
>> + return 0;
>> }
>> /*
>> @@ -2473,19 +2434,11 @@ int btrfs_search_slot_for_read(struct
>> btrfs_root *root,
>> if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
>> return btrfs_next_leaf(root, p);
>> } else {
>> - if (p->slots[0] == 0) {
>> - ret = btrfs_prev_leaf(root, p);
>> - if (ret < 0)
>> - return ret;
>> - if (!ret) {
>> - if (p->slots[0] == btrfs_header_nritems(p->nodes[0]))
>> - p->slots[0]--;
>> - return 0;
>> - }
>> + /* We have no lower key in the tree. */
>> + if (p->slots[0] == 0)
>> return 1;
>> - } else {
>> - p->slots[0]--;
>> - }
>> +
>> + p->slots[0]--;
>> }
>> return 0;
>> }
>> @@ -4969,26 +4922,19 @@ int btrfs_previous_item(struct btrfs_root *root,
>> int type)
>> {
>> struct btrfs_key found_key;
>> - struct extent_buffer *leaf;
>> - u32 nritems;
>> int ret;
>> while (1) {
>> if (path->slots[0] == 0) {
>> ret = btrfs_prev_leaf(root, path);
>> - if (ret != 0)
>> + if (ret)
>> return ret;
>> - } else {
>> - path->slots[0]--;
>> - }
>> - leaf = path->nodes[0];
>> - nritems = btrfs_header_nritems(leaf);
>> - if (nritems == 0)
>> - return 1;
>> - if (path->slots[0] == nritems)
>> + } else
>> path->slots[0]--;
>> - btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> + ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0]));
>> +
>> + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path-
>> >slots[0]);
>> if (found_key.objectid < min_objectid)
>> break;
>> if (found_key.type == type)
>> @@ -5010,26 +4956,19 @@ int btrfs_previous_extent_item(struct
>> btrfs_root *root,
>> struct btrfs_path *path, u64 min_objectid)
>> {
>> struct btrfs_key found_key;
>> - struct extent_buffer *leaf;
>> - u32 nritems;
>> int ret;
>> while (1) {
>> if (path->slots[0] == 0) {
>> ret = btrfs_prev_leaf(root, path);
>> - if (ret != 0)
>> + if (ret)
>> return ret;
>> - } else {
>> - path->slots[0]--;
>> - }
>> - leaf = path->nodes[0];
>> - nritems = btrfs_header_nritems(leaf);
>> - if (nritems == 0)
>> - return 1;
>> - if (path->slots[0] == nritems)
>> + } else
>> path->slots[0]--;
>> - btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> + ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0]));
>> +
>> + btrfs_item_key_to_cpu(path->nodes[0], &found_key, path-
>> >slots[0]);
>> if (found_key.objectid < min_objectid)
>> break;
>> if (found_key.type == BTRFS_EXTENT_ITEM_KEY ||
>
next prev parent reply other threads:[~2026-02-08 2:42 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-11 7:22 [PATCH v2 0/4] btrfs: some cleanups for two ctree functions Sun YangKai
2025-12-11 7:22 ` [PATCH v2 1/4] btrfs: don't set @return_any for btrfs_search_slot_for_read in btrfs_read_qgroup_config Sun YangKai
2025-12-11 7:22 ` [PATCH v2 2/4] btrfs: don't set return_any @return_any for btrfs_search_slot_for_read in get_last_extent() Sun YangKai
2025-12-11 7:22 ` [PATCH v2 3/4] btrfs: cleanup btrfs_search_slot_for_read() Sun YangKai
2025-12-11 7:22 ` [PATCH v2 4/4] btrfs: ctree: cleanup btrfs_prev_leaf() Sun YangKai
2026-02-07 23:09 ` Qu Wenruo
2026-02-08 2:42 ` Sun YangKai [this message]
2026-02-08 3:13 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ed9925c6-3664-4158-87ef-48e7316a7128@gmail.com \
--to=sunk67188@gmail.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox