Filipe Manana @ 2025-09-30 17:07 +01: > On Tue, Sep 30, 2025 at 2:05 PM Miquel Sabaté Solà wrote: >> >> In the previous code it was possible to incur into a double kfree() >> scenario when calling 'add_delayed_ref_head'. This could happen if the >> record was reported to already exist in the >> 'btrfs_qgroup_trace_extent_nolock' call, but then there was an error >> later on 'add_delayed_ref_head'. In this case, since >> 'add_delayed_ref_head' returned an error, the caller went to free the >> record. Since 'add_delayed_ref_head' couldn't set this kfree'd pointer >> to NULL, then kfree() would have acted on a non-NULL 'record' object >> which was pointing to memory already freed by the callee. >> >> The problem comes from the fact that the responsibility to kfree the >> object is on both the caller and the callee at the same time. Hence, the >> fix for this is to shift the ownership of the 'qrecord' object out of >> the 'add_delayed_ref_head'. That is, we will never attempt to kfree() >> the given object inside of this function, and will expect the caller to >> act on the 'qrecord' object on its own. The only exception where the >> 'qrecord' object cannot be kfree'd is if it was inserted into the >> tracing logic, for which we already have the 'qrecord_inserted_ret' >> boolean to account for this. Hence, the caller has to kfree the object >> only if 'add_delayed_ref_head' reports not to have inserted it on the >> tracing logic. >> >> As a side-effect of the above, we must guarantee that >> 'qrecord_inserted_ret' is properly initialized at the start of the >> function, not at the end, and then set when an actual insert >> happens. This way we avoid 'qrecord_inserted_ret' having an invalid >> value on an early exit. >> >> The documentation from the 'add_delayed_ref_head' has also been updated >> to reflect on the exact ownership of the 'qrecord' object. >> >> Fixes: 6ef8fbce0104 ("btrfs: fix missing error handling when adding delayed ref with qgroups enabled") >> Signed-off-by: Miquel Sabaté Solà >> --- >> fs/btrfs/delayed-ref.c | 39 +++++++++++++++++++++++++++++++-------- >> 1 file changed, 31 insertions(+), 8 deletions(-) >> >> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c >> index 481802efaa14..bc61e0eacc69 100644 >> --- a/fs/btrfs/delayed-ref.c >> +++ b/fs/btrfs/delayed-ref.c >> @@ -798,10 +798,14 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref, >> } >> >> /* >> - * helper function to actually insert a head node into the rbtree. >> + * Helper function to actually insert a head node into the rbtree. > > Since you are updating this line just to capitalize the first word, > you might as well replace rbtree with xarray as we don't use rbtree > anymore. > >> * this does all the dirty work in terms of maintaining the correct >> * overall modification count. >> * >> + * The caller is responsible for calling kfree() on @qrecord. More specifically, >> + * if this function reports that it did not insert it as noted in >> + * @qrecord_inserted_ret, then it's safe to call kfree() on it. >> + * >> * Returns an error pointer in case of an error. >> */ >> static noinline struct btrfs_delayed_ref_head * >> @@ -814,7 +818,14 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, >> struct btrfs_delayed_ref_head *existing; >> struct btrfs_delayed_ref_root *delayed_refs; >> const unsigned long index = (head_ref->bytenr >> fs_info->sectorsize_bits); >> - bool qrecord_inserted = false; >> + >> + /* >> + * If 'qrecord_inserted_ret' is provided, then the first thing we need >> + * to do is to initialize it to false just in case we have an exit >> + * before trying to insert the record. >> + */ >> + if (qrecord_inserted_ret) >> + *qrecord_inserted_ret = false; >> >> delayed_refs = &trans->transaction->delayed_refs; >> lockdep_assert_held(&delayed_refs->lock); >> @@ -833,6 +844,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, >> >> /* Record qgroup extent info if provided */ >> if (qrecord) { >> + /* >> + * Setting 'qrecord' but not 'qrecord_inserted_ret' will likely >> + * result in a memory leakage. >> + */ >> + WARN_ON(!qrecord_inserted_ret); > > For this sort of mandatory stuff, we use assertions, not warnings: > > ASSERT(qrecord_insert_ret != NULL); > > >> + >> int ret; >> >> ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord, >> @@ -840,12 +857,10 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, >> if (ret) { >> /* Clean up if insertion fails or item exists. */ >> xa_release(&delayed_refs->dirty_extents, index); >> - /* Caller responsible for freeing qrecord on error. */ >> if (ret < 0) >> return ERR_PTR(ret); >> - kfree(qrecord); >> - } else { >> - qrecord_inserted = true; >> + } else if (qrecord_inserted_ret) { >> + *qrecord_inserted_ret = true; >> } >> } >> >> @@ -888,8 +903,6 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans, >> delayed_refs->num_heads++; >> delayed_refs->num_heads_ready++; >> } >> - if (qrecord_inserted_ret) >> - *qrecord_inserted_ret = qrecord_inserted; >> >> return head_ref; >> } >> @@ -1049,6 +1062,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans, >> xa_release(&delayed_refs->head_refs, index); >> spin_unlock(&delayed_refs->lock); >> ret = PTR_ERR(new_head_ref); >> + >> + /* >> + * It's only safe to call kfree() on 'qrecord' if >> + * 'add_delayed_ref_head' has _not_ inserted it for > > The notation we use for function names is function_name(), not 'function_name'. I've included all your comments into a v2 patch set. I'll wait a bit before sending it in case I get more comments. > > Otherwise it looks good, thanks. > Thanks to you for the review! >> + * tracing. Otherwise we need to handle this here. >> + */ >> + if (!qrecord_reserved || qrecord_inserted) >> + goto free_head_ref; >> goto free_record; >> } >> head_ref = new_head_ref; >> @@ -1071,6 +1092,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans, >> >> if (qrecord_inserted) >> return btrfs_qgroup_trace_extent_post(trans, record, generic_ref->bytenr); >> + >> + kfree(record); >> return 0; >> >> free_record: >> -- >> 2.51.0 >> >>