From: "Miquel Sabaté Solà" <mssola@mssola.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs@vger.kernel.org, clm@fb.com, dsterba@suse.com,
fdmanana@suse.com, wqu@suse.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fs: btrfs: prevent a double kfree on delayed-ref
Date: Tue, 30 Sep 2025 20:39:28 +0200 [thread overview]
Message-ID: <87bjmraiov.fsf@> (raw)
In-Reply-To: <CAL3q7H66AOc_hbXX_PN-DGP5fT36NnxE7p4j2LqjPXyRaOu=iA@mail.gmail.com> (Filipe Manana's message of "Tue, 30 Sep 2025 17:07:47 +0100")
[-- Attachment #1: Type: text/plain, Size: 7140 bytes --]
Filipe Manana @ 2025-09-30 17:07 +01:
> On Tue, Sep 30, 2025 at 2:05 PM Miquel Sabaté Solà <mssola@mssola.com> wrote:
>>
>> In the previous code it was possible to incur into a double kfree()
>> scenario when calling 'add_delayed_ref_head'. This could happen if the
>> record was reported to already exist in the
>> 'btrfs_qgroup_trace_extent_nolock' call, but then there was an error
>> later on 'add_delayed_ref_head'. In this case, since
>> 'add_delayed_ref_head' returned an error, the caller went to free the
>> record. Since 'add_delayed_ref_head' couldn't set this kfree'd pointer
>> to NULL, then kfree() would have acted on a non-NULL 'record' object
>> which was pointing to memory already freed by the callee.
>>
>> The problem comes from the fact that the responsibility to kfree the
>> object is on both the caller and the callee at the same time. Hence, the
>> fix for this is to shift the ownership of the 'qrecord' object out of
>> the 'add_delayed_ref_head'. That is, we will never attempt to kfree()
>> the given object inside of this function, and will expect the caller to
>> act on the 'qrecord' object on its own. The only exception where the
>> 'qrecord' object cannot be kfree'd is if it was inserted into the
>> tracing logic, for which we already have the 'qrecord_inserted_ret'
>> boolean to account for this. Hence, the caller has to kfree the object
>> only if 'add_delayed_ref_head' reports not to have inserted it on the
>> tracing logic.
>>
>> As a side-effect of the above, we must guarantee that
>> 'qrecord_inserted_ret' is properly initialized at the start of the
>> function, not at the end, and then set when an actual insert
>> happens. This way we avoid 'qrecord_inserted_ret' having an invalid
>> value on an early exit.
>>
>> The documentation from the 'add_delayed_ref_head' has also been updated
>> to reflect on the exact ownership of the 'qrecord' object.
>>
>> Fixes: 6ef8fbce0104 ("btrfs: fix missing error handling when adding delayed ref with qgroups enabled")
>> Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
>> ---
>> fs/btrfs/delayed-ref.c | 39 +++++++++++++++++++++++++++++++--------
>> 1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>> index 481802efaa14..bc61e0eacc69 100644
>> --- a/fs/btrfs/delayed-ref.c
>> +++ b/fs/btrfs/delayed-ref.c
>> @@ -798,10 +798,14 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
>> }
>>
>> /*
>> - * helper function to actually insert a head node into the rbtree.
>> + * Helper function to actually insert a head node into the rbtree.
>
> Since you are updating this line just to capitalize the first word,
> you might as well replace rbtree with xarray as we don't use rbtree
> anymore.
>
>> * this does all the dirty work in terms of maintaining the correct
>> * overall modification count.
>> *
>> + * The caller is responsible for calling kfree() on @qrecord. More specifically,
>> + * if this function reports that it did not insert it as noted in
>> + * @qrecord_inserted_ret, then it's safe to call kfree() on it.
>> + *
>> * Returns an error pointer in case of an error.
>> */
>> static noinline struct btrfs_delayed_ref_head *
>> @@ -814,7 +818,14 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>> struct btrfs_delayed_ref_head *existing;
>> struct btrfs_delayed_ref_root *delayed_refs;
>> const unsigned long index = (head_ref->bytenr >> fs_info->sectorsize_bits);
>> - bool qrecord_inserted = false;
>> +
>> + /*
>> + * If 'qrecord_inserted_ret' is provided, then the first thing we need
>> + * to do is to initialize it to false just in case we have an exit
>> + * before trying to insert the record.
>> + */
>> + if (qrecord_inserted_ret)
>> + *qrecord_inserted_ret = false;
>>
>> delayed_refs = &trans->transaction->delayed_refs;
>> lockdep_assert_held(&delayed_refs->lock);
>> @@ -833,6 +844,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>>
>> /* Record qgroup extent info if provided */
>> if (qrecord) {
>> + /*
>> + * Setting 'qrecord' but not 'qrecord_inserted_ret' will likely
>> + * result in a memory leakage.
>> + */
>> + WARN_ON(!qrecord_inserted_ret);
>
> For this sort of mandatory stuff, we use assertions, not warnings:
>
> ASSERT(qrecord_insert_ret != NULL);
>
>
>> +
>> int ret;
>>
>> ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord,
>> @@ -840,12 +857,10 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>> if (ret) {
>> /* Clean up if insertion fails or item exists. */
>> xa_release(&delayed_refs->dirty_extents, index);
>> - /* Caller responsible for freeing qrecord on error. */
>> if (ret < 0)
>> return ERR_PTR(ret);
>> - kfree(qrecord);
>> - } else {
>> - qrecord_inserted = true;
>> + } else if (qrecord_inserted_ret) {
>> + *qrecord_inserted_ret = true;
>> }
>> }
>>
>> @@ -888,8 +903,6 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>> delayed_refs->num_heads++;
>> delayed_refs->num_heads_ready++;
>> }
>> - if (qrecord_inserted_ret)
>> - *qrecord_inserted_ret = qrecord_inserted;
>>
>> return head_ref;
>> }
>> @@ -1049,6 +1062,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>> xa_release(&delayed_refs->head_refs, index);
>> spin_unlock(&delayed_refs->lock);
>> ret = PTR_ERR(new_head_ref);
>> +
>> + /*
>> + * It's only safe to call kfree() on 'qrecord' if
>> + * 'add_delayed_ref_head' has _not_ inserted it for
>
> The notation we use for function names is function_name(), not 'function_name'.
I've included all your comments into a v2 patch set. I'll wait a bit
before sending it in case I get more comments.
>
> Otherwise it looks good, thanks.
>
Thanks to you for the review!
>> + * tracing. Otherwise we need to handle this here.
>> + */
>> + if (!qrecord_reserved || qrecord_inserted)
>> + goto free_head_ref;
>> goto free_record;
>> }
>> head_ref = new_head_ref;
>> @@ -1071,6 +1092,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>>
>> if (qrecord_inserted)
>> return btrfs_qgroup_trace_extent_post(trans, record, generic_ref->bytenr);
>> +
>> + kfree(record);
>> return 0;
>>
>> free_record:
>> --
>> 2.51.0
>>
>>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 897 bytes --]
prev parent reply other threads:[~2025-09-30 18:39 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-30 13:04 [PATCH] fs: btrfs: prevent a double kfree on delayed-ref Miquel Sabaté Solà
2025-09-30 16:07 ` Filipe Manana
2025-09-30 16:12 ` Filipe Manana
2025-09-30 18:39 ` Miquel Sabaté Solà [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bjmraiov.fsf@ \
--to=mssola@mssola.com \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=fdmanana@kernel.org \
--cc=fdmanana@suse.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).