Re: [PATCH v2 4/4] btrfs: ctree: cleanup btrfs_prev_leaf()

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Sun YangKai <sunk67188@gmail.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: fdmanana@kernel.org
Subject: Re: [PATCH v2 4/4] btrfs: ctree: cleanup btrfs_prev_leaf()
Date: Sun, 8 Feb 2026 10:42:17 +0800	[thread overview]
Message-ID: <ed9925c6-3664-4158-87ef-48e7316a7128@gmail.com> (raw)
In-Reply-To: <bf7c477d-ce4c-4d6e-9538-1baa33e39a02@suse.com>



On 2026/2/8 07:09, Qu Wenruo wrote:
> 
> 
> 在 2025/12/11 17:52, Sun YangKai 写道:
>> There's a common parttern in callers of btrfs_prev_leaf:
>> p->slots[0]-- if p->slots[0] points to a slot with invalid item(nritem).
>>
>> So just make btrfs_prev_leaf() ensure that path->slots[0] points to a
>> valid slot and cleanup its over complex logic.
>>
>> And then remove the related logic and cleanup the callers.
>>
>> This will make things much simpler.
>>
>> No functional changes.
>>
>> A. Details about changes in callers:
>>
>> ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0])) in callers
>> is enough to make sure that nritems != 0 and slots[0] points to a valid
>> btrfs_item.
> 
> I don't think the change is safe.
> 
> There are callers doing proper checks for empty trees, e.g. 
> btrfs_previous_extent_item() and btrfs_previous_item(), now you will 
> just ignore that for kernels without CONFIG_BTRFS_ASSERT, or crash the 
> kernel if CONFIG_BTRFS_ASSERT is selected.
> 
> If you're just cleaning up btrfs_prev_leaf(), then you should keep the 
> callers the same without changing their behaviors.

Thanks a lot for your code review and it makes a lot of sense. I'll not 
touch the callers in the next version.

> I think there may be some corner case races when tree deleting and some 
> other work are happening at the same time.
> 
>>
>> And getting a `nritems==0` when btrfs_prev_leaf() returns 0 is a logic
>> error because btrfs_pref_leaf() should always
>>
>> 1. either find a non-empty leaf
> 
> Nope, there is no such guarantee in the first place.
> btrfs_prev_leaf() will unlock the path, thus all kinds of modification 
> can happen between btrfs_release_path() and btrfs_search_slot().
> 
> This assumption is just wrong.

Sorry, I got a little confused here. When btrfs_search_slot() returns an 
empty leaf, path->slots[0] will be 0 and btrfs_prev_leaf() will return 
1. Therefore, if btrfs_prev_leaf() returns 0, the leaf we get should 
always be non-empty. Which part of this reasoning is incorrect? Or maybe 
I don't get what you want to say.

>> 2. or return 1
>>
>> So we can use ASSERT safely here.
>>
>> B. Details about cleanup of btrfs_prev_leaf().
>>
>> The previous implementation works like this:
>>
>> 0) Get a previous key by "dec by 1" of the original key. Let's call it
>>     search key. It's obvious that search key is less than original key
>>     and there's no key between them.
>>
>> 1) Call btrfs_search_slot() with search key.
>>
>> 2) If we got an error or an exact match, early return.
>>
>> 3) If p->slots[0] points to the original item, p->slots[0]-- to make sure
>>     that we will not return the same item again. This may happen because
>>     there might be some tree balancing happened so the original item 
>> is no
>>     longer at slot 0.
>>
>> 4) Check if the key of the item at slot 0 is (less than the original key
>>     / less than or equal to search key) to verify if we got a previous 
>> leaf.
>>
>> However, 3) and 4) are over complex. We only need to check if
>> p->slots[0] == 0 because:
>>
>> 3a) If p->slots[0] == 0, there's no key less than or equal to search key
>>      in the tree, which means the original key is lowest in the tree. In
>>      this case, there's no previous leaf, we should return 1.
>>
>> 3b) If p->slots[0] != 0, using p->slots[0]-- is enough to get a valid
>>      previous item neither missing anything nor return the original item
>>      again because:
>>
>>      i) p->slots[0] == nritems, which means all keys in the leaf are less
>>         than search key so the leaf is a previous leaf. We only need to
>>         p->slots[0]-- to get a valid previous item.
>>
>>      ii) p->slots[0] < nritems, p->slots[0] points to an item whose key
>>          is greater than search key(probably the original item if it was
>>          not deleted), and p->slots[0] - 1 points to an item whose key is
>>          less than search key. So use p->slots[0]-- to get the previous
>>          item and we will neigher miss anything nor return the original
>>          item again. This handles the case 3) in original implementation.
>>
>> Signed-off-by: Sun YangKai <sunk67188@gmail.com>
>> ---
>>   fs/btrfs/ctree.c | 99 ++++++++++--------------------------------------
>>   1 file changed, 19 insertions(+), 80 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 0a0157db0b0c..3026d956c7fb 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -2376,12 +2376,9 @@ int btrfs_search_old_slot(struct btrfs_root 
>> *root, const struct btrfs_key *key,
>>   static int btrfs_prev_leaf(struct btrfs_root *root, struct 
>> btrfs_path *path)
>>   {
>>       struct btrfs_key key;
>> -    struct btrfs_key orig_key;
>> -    struct btrfs_disk_key found_key;
>>       int ret;
>>       btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
>> -    orig_key = key;
>>       if (key.offset > 0) {
>>           key.offset--;
>> @@ -2401,48 +2398,12 @@ static int btrfs_prev_leaf(struct btrfs_root 
>> *root, struct btrfs_path *path)
>>       if (ret <= 0)
>>           return ret;
>> -    /*
>> -     * Previous key not found. Even if we were at slot 0 of the leaf 
>> we had
>> -     * before releasing the path and calling btrfs_search_slot(), we 
>> now may
>> -     * be in a slot pointing to the same original key - this can 
>> happen if
>> -     * after we released the path, one of more items were moved from a
>> -     * sibling leaf into the front of the leaf we had due to an 
>> insertion
>> -     * (see push_leaf_right()).
>> -     * If we hit this case and our slot is > 0 and just decrement the 
>> slot
>> -     * so that the caller does not process the same key again, which 
>> may or
>> -     * may not break the caller, depending on its logic.
>> -     */
> 
> Although the change is mostly fine, if you want to keep the existing 
> behavior, you should return 0 for empty tree to keep the old behavior.

I've not thought about the empty tree case before. But I think the 
previous behavior in empty tree case is not proper and I don't want to 
keep the old behavior in this case. All callers is doing
path->slots[0]-- if btrfs_prev_leaf() returns 0 and it will cause an 
unexpected underflow although they'll check if nritems is 0 later. So I 
will mention this behavior change in commit message and comments in the 
next version.

> Furthermore, you have to explain why it's safe not only in the commit 
> message, but also comments.
> 
> This not a small change, and very low-level. Just doing black magic is 
> not acceptable.
> 
> And you need something better than your commit message as comments.
> 
> In short, you should:
> 
> - Not change the callers
> 
> - Better comments

Thanks. Very helpful for me. And wish you have a good weekend :)

> 
>> -    if (path->slots[0] < btrfs_header_nritems(path->nodes[0])) {
>> -        btrfs_item_key(path->nodes[0], &found_key, path->slots[0]);
>> -        ret = btrfs_comp_keys(&found_key, &orig_key);
>> -        if (ret == 0) {
>> -            if (path->slots[0] > 0) {
>> -                path->slots[0]--;
>> -                return 0;
>> -            }
>> -            /*
>> -             * At slot 0, same key as before, it means orig_key is
>> -             * the lowest, leftmost, key in the tree. We're done.
>> -             */
>> -            return 1;
>> -        }
>> -    }
>> +    /* There's no smaller keys in the whole tree. */
>> +    if (path->slots[0] == 0)
>> +        return 1;
>> -    btrfs_item_key(path->nodes[0], &found_key, 0);
>> -    ret = btrfs_comp_keys(&found_key, &key);
>> -    /*
>> -     * We might have had an item with the previous key in the tree right
>> -     * before we released our path. And after we released our path, that
>> -     * item might have been pushed to the first slot (0) of the leaf we
>> -     * were holding due to a tree balance. Alternatively, an item 
>> with the
>> -     * previous key can exist as the only element of a leaf (big fat 
>> item).
>> -     * Therefore account for these 2 cases, so that our callers (like
>> -     * btrfs_previous_item) don't miss an existing item with a key 
>> matching
>> -     * the previous key we computed above.
>> -     */
>> -    if (ret <= 0)
>> -        return 0;
>> -    return 1;
>> +    path->slots[0]--;
>> +    return 0;
>>   }
>>   /*
>> @@ -2473,19 +2434,11 @@ int btrfs_search_slot_for_read(struct 
>> btrfs_root *root,
>>           if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
>>               return btrfs_next_leaf(root, p);
>>       } else {
>> -        if (p->slots[0] == 0) {
>> -            ret = btrfs_prev_leaf(root, p);
>> -            if (ret < 0)
>> -                return ret;
>> -            if (!ret) {
>> -                if (p->slots[0] == btrfs_header_nritems(p->nodes[0]))
>> -                    p->slots[0]--;
>> -                return 0;
>> -            }
>> +        /* We have no lower key in the tree. */
>> +        if (p->slots[0] == 0)
>>               return 1;
>> -        } else {
>> -            p->slots[0]--;
>> -        }
>> +
>> +        p->slots[0]--;
>>       }
>>       return 0;
>>   }
>> @@ -4969,26 +4922,19 @@ int btrfs_previous_item(struct btrfs_root *root,
>>               int type)
>>   {
>>       struct btrfs_key found_key;
>> -    struct extent_buffer *leaf;
>> -    u32 nritems;
>>       int ret;
>>       while (1) {
>>           if (path->slots[0] == 0) {
>>               ret = btrfs_prev_leaf(root, path);
>> -            if (ret != 0)
>> +            if (ret)
>>                   return ret;
>> -        } else {
>> -            path->slots[0]--;
>> -        }
>> -        leaf = path->nodes[0];
>> -        nritems = btrfs_header_nritems(leaf);
>> -        if (nritems == 0)
>> -            return 1;
>> -        if (path->slots[0] == nritems)
>> +        } else
>>               path->slots[0]--;
>> -        btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +        ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0]));
>> +
>> +        btrfs_item_key_to_cpu(path->nodes[0], &found_key, path- 
>> >slots[0]);
>>           if (found_key.objectid < min_objectid)
>>               break;
>>           if (found_key.type == type)
>> @@ -5010,26 +4956,19 @@ int btrfs_previous_extent_item(struct 
>> btrfs_root *root,
>>               struct btrfs_path *path, u64 min_objectid)
>>   {
>>       struct btrfs_key found_key;
>> -    struct extent_buffer *leaf;
>> -    u32 nritems;
>>       int ret;
>>       while (1) {
>>           if (path->slots[0] == 0) {
>>               ret = btrfs_prev_leaf(root, path);
>> -            if (ret != 0)
>> +            if (ret)
>>                   return ret;
>> -        } else {
>> -            path->slots[0]--;
>> -        }
>> -        leaf = path->nodes[0];
>> -        nritems = btrfs_header_nritems(leaf);
>> -        if (nritems == 0)
>> -            return 1;
>> -        if (path->slots[0] == nritems)
>> +        } else
>>               path->slots[0]--;
>> -        btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +        ASSERT(path->slots[0] < btrfs_header_nritems(path->nodes[0]));
>> +
>> +        btrfs_item_key_to_cpu(path->nodes[0], &found_key, path- 
>> >slots[0]);
>>           if (found_key.objectid < min_objectid)
>>               break;
>>           if (found_key.type == BTRFS_EXTENT_ITEM_KEY ||
>

next prev parent reply	other threads:[~2026-02-08  2:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-11  7:22 [PATCH v2 0/4] btrfs: some cleanups for two ctree functions Sun YangKai
2025-12-11  7:22 ` [PATCH v2 1/4] btrfs: don't set @return_any for btrfs_search_slot_for_read in btrfs_read_qgroup_config Sun YangKai
2025-12-11  7:22 ` [PATCH v2 2/4] btrfs: don't set return_any @return_any for btrfs_search_slot_for_read in get_last_extent() Sun YangKai
2025-12-11  7:22 ` [PATCH v2 3/4] btrfs: cleanup btrfs_search_slot_for_read() Sun YangKai
2025-12-11  7:22 ` [PATCH v2 4/4] btrfs: ctree: cleanup btrfs_prev_leaf() Sun YangKai
2026-02-07 23:09   ` Qu Wenruo
2026-02-08  2:42     ` Sun YangKai [this message]
2026-02-08  3:13       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ed9925c6-3664-4158-87ef-48e7316a7128@gmail.com \
    --to=sunk67188@gmail.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox