Re: [PATCH 1/5] btrfs: fix qgroup reservation leak on failure to allocate ordered extent

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Qu Wenruo <wqu@suse.com>
To: Boris Burkov <boris@bur.io>, fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/5] btrfs: fix qgroup reservation leak on failure to allocate ordered extent
Date: Thu, 8 May 2025 08:41:56 +0930	[thread overview]
Message-ID: <ccacb32f-e345-49c7-9c6b-8bdc72c69a7f@suse.com> (raw)
In-Reply-To: <20250507223347.GB332956@zen.localdomain>



在 2025/5/8 08:03, Boris Burkov 写道:
> On Wed, May 07, 2025 at 06:23:13PM +0100, fdmanana@kernel.org wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> If we fail to allocate an ordered extent for a COW write we end up leaking
>> a qgroup data reservation since we called btrfs_qgroup_release_data() but
>> we didn't call btrfs_qgroup_free_refroot() (which would happen when
>> running the respective data delayed ref created by ordered extent
>> completion or when finishing the ordered extent in case an error happened).
>>
>> So make sure we call btrfs_qgroup_free_refroot() if we fail to allocate an
>> ordered extent for a COW write.
> 
> I haven't tried it myself yet, but I believe that this patch will double
> free reservation from the qgroup when this case occurs.
> 
> Can you share the context where you saw this bug? Have you run fstests
> with qgroups or squotas enabled? I think this should show pretty quickly
> in generic/475 with qgroups on.
> 
> Consider, for example, the following execution of the dio case:
> 
> btrfs_dio_iomap_begin
>    btrfs_check_data_free_space // reserves the data into `reserved`, sets dio_data->data_space_reserved
>    btrfs_get_blocks_direct_write
>      btrfs_create_dio_extent
>        btrfs_alloc_ordered_extent
>          alloc_ordered_extent // fails and frees refroot, reserved is "wrong" now.
>        // error propagates up
>      // error propagates up via PTR_ERR
> 
> which brings us to the code:
> if (ret < 0)
>          goto unlock_err;
> ...
> unlock_err:
> ...
> if (dio_data->data_space_reserved) {
>          btrfs_free_reserved_data_space()
> }
> 
> so the execution continues...
> 
> btrfs_free_reserved_data_space
>    btrfs_qgroup_free_data
>      __btrfs_qgroup_release_data
>        qgroup_free_reserved_data
>          btrfs_qgroup_free_refroot
> 
> This will result in a underflow of the reservation once everything
> outstanding gets released.

That should still be safe.

Firstly at alloc_ordered_extent() we released the qgroup space already, 
thus there will be no EXTENT_QGROUP_RESERVED range in extent-io tree 
anymore.

Then at the final cleanup, qgroup_free_reserved_data_space() will 
release/free nothing, because the extent-io tree no longer has the 
corresponding range with EXTENT_QGROUP_RESERVED.

This is the core design of qgroup data reserved space, which allows us 
to call btrfs_release/free_data() twice without double accounting.

> 
> Furthermore, raw calls to free_refroot in cases where we have a reserved
> changeset make me worried, because they will run afoul of races with
> multiple threads touching the various bits. I don't see the bugs here,
> but the reservation lifetime is really tricky so I wouldn't be surprised
> if something like that was wrong too.

This free_refroot() is the common practice here. The extent-io tree 
based accounting can only cover the reserved space before ordered extent 
is allocated.

Then the reserved space is "transferred" to ordered extent, and 
eventually go to the new data extent, and finally freed at 
btrfs_qgroup_account_extents(), which also goes the free_refroot() way.

> 
> As of the last time I looked at this, I think cow_file_range handles
> this correctly as well. Looking really quickly now, it looks like maybe
> submit_one_async_extent() might not do a qgroup free, but I'm not sure
> where the corresponding reservation is coming from.
> 
> I think if you have indeed found a different codepath that makes a data
> reservation but doesn't release the qgroup part when allocating the
> ordered extent fails, then the fastest path to a fix is to do that at
> the same level as where it calls btrfs_check_data_free_space or (however
> it gets the reservation), as is currently done by the main
> ordered_extent allocation paths. With async_extent, we might need to
> plumb through the reserved extent changeset through the async chunk to
> do it perfectly...

I agree with you that, the extent io tree based double freeing 
prevention should only be the last safe net, not something we should 
abuse when possible.

But I can't find a better solution, mostly due to the fact that after 
the btrfs_qgroup_release_data() call, the qgroup reserved space is 
already released, and we have no way to undo that...

Maybe we can delayed the qgroup release/free calls until the ordered 
extent @entry is allocated?

Thanks,
Qu


> 
> Thanks,
> Boris
> 
>>
>> Fixes: 7dbeaad0af7d ("btrfs: change timing for qgroup reserved space for ordered extents to fix reserved space leak")
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> ---
>>   fs/btrfs/ordered-data.c | 12 +++++++++---
>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
>> index ae49f87b27e8..e44d3dd17caf 100644
>> --- a/fs/btrfs/ordered-data.c
>> +++ b/fs/btrfs/ordered-data.c
>> @@ -153,9 +153,10 @@ static struct btrfs_ordered_extent *alloc_ordered_extent(
>>   	struct btrfs_ordered_extent *entry;
>>   	int ret;
>>   	u64 qgroup_rsv = 0;
>> +	const bool is_nocow = (flags &
>> +	       ((1U << BTRFS_ORDERED_NOCOW) | (1U << BTRFS_ORDERED_PREALLOC)));
>>   
>> -	if (flags &
>> -	    ((1U << BTRFS_ORDERED_NOCOW) | (1U << BTRFS_ORDERED_PREALLOC))) {
>> +	if (is_nocow) {
>>   		/* For nocow write, we can release the qgroup rsv right now */
>>   		ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes, &qgroup_rsv);
>>   		if (ret < 0)
>> @@ -170,8 +171,13 @@ static struct btrfs_ordered_extent *alloc_ordered_extent(
>>   			return ERR_PTR(ret);
>>   	}
>>   	entry = kmem_cache_zalloc(btrfs_ordered_extent_cache, GFP_NOFS);
>> -	if (!entry)
>> +	if (!entry) {
>> +		if (!is_nocow)
>> +			btrfs_qgroup_free_refroot(inode->root->fs_info,
>> +						  btrfs_root_id(inode->root),
>> +						  qgroup_rsv, BTRFS_QGROUP_RSV_DATA);
>>   		return ERR_PTR(-ENOMEM);
>> +	}
>>   
>>   	entry->file_offset = file_offset;
>>   	entry->num_bytes = num_bytes;
>> -- 
>> 2.47.2
>>
>

next prev parent reply	other threads:[~2025-05-07 23:12 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-07 17:23 [PATCH 0/5] btrfs: fixes and cleanups for ordered extent allocation fdmanana
2025-05-07 17:23 ` [PATCH 1/5] btrfs: fix qgroup reservation leak on failure to allocate ordered extent fdmanana
2025-05-07 22:33   ` Boris Burkov
2025-05-07 23:11     ` Qu Wenruo [this message]
2025-05-07 23:39       ` Boris Burkov
2025-05-08  0:08         ` Boris Burkov
2025-05-08 13:43       ` Filipe Manana
2025-05-08 16:07         ` Boris Burkov
2025-05-08 21:57         ` Qu Wenruo
2025-05-08 13:40     ` Filipe Manana
2025-05-08 16:19       ` Boris Burkov
2025-05-07 17:23 ` [PATCH 2/5] btrfs: check we grabbed inode reference when allocating an " fdmanana
2025-05-07 17:23 ` [PATCH 3/5] btrfs: fold error checks when allocating ordered extent and update comments fdmanana
2025-05-07 17:23 ` [PATCH 4/5] btrfs: use boolean for delalloc argument to btrfs_free_reserved_bytes() fdmanana
2025-05-07 17:23 ` [PATCH 5/5] btrfs: use boolean for delalloc argument to btrfs_free_reserved_extent() fdmanana
2025-05-08 16:20 ` [PATCH 0/5] btrfs: fixes and cleanups for ordered extent allocation Boris Burkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ccacb32f-e345-49c7-9c6b-8bdc72c69a7f@suse.com \
    --to=wqu@suse.com \
    --cc=boris@bur.io \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox