public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: Boris Burkov <boris@bur.io>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/5] btrfs: fix qgroup reservation leak on failure to allocate ordered extent
Date: Fri, 9 May 2025 07:27:08 +0930	[thread overview]
Message-ID: <5578dbe2-33b2-4cd8-a008-05e6f340ac59@suse.com> (raw)
In-Reply-To: <CAL3q7H4YWzsCTsSC=oup6_typt_PdNVAdAuK1RVhuH1nEto-eQ@mail.gmail.com>



在 2025/5/8 23:13, Filipe Manana 写道:
> On Thu, May 8, 2025 at 12:12 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> 在 2025/5/8 08:03, Boris Burkov 写道:
>>> On Wed, May 07, 2025 at 06:23:13PM +0100, fdmanana@kernel.org wrote:
>>>> From: Filipe Manana <fdmanana@suse.com>
>>>>
>>>> If we fail to allocate an ordered extent for a COW write we end up leaking
>>>> a qgroup data reservation since we called btrfs_qgroup_release_data() but
>>>> we didn't call btrfs_qgroup_free_refroot() (which would happen when
>>>> running the respective data delayed ref created by ordered extent
>>>> completion or when finishing the ordered extent in case an error happened).
>>>>
>>>> So make sure we call btrfs_qgroup_free_refroot() if we fail to allocate an
>>>> ordered extent for a COW write.
>>>
>>> I haven't tried it myself yet, but I believe that this patch will double
>>> free reservation from the qgroup when this case occurs.
>>>
>>> Can you share the context where you saw this bug? Have you run fstests
>>> with qgroups or squotas enabled? I think this should show pretty quickly
>>> in generic/475 with qgroups on.
>>>
>>> Consider, for example, the following execution of the dio case:
>>>
>>> btrfs_dio_iomap_begin
>>>     btrfs_check_data_free_space // reserves the data into `reserved`, sets dio_data->data_space_reserved
>>>     btrfs_get_blocks_direct_write
>>>       btrfs_create_dio_extent
>>>         btrfs_alloc_ordered_extent
>>>           alloc_ordered_extent // fails and frees refroot, reserved is "wrong" now.
>>>         // error propagates up
>>>       // error propagates up via PTR_ERR
>>>
>>> which brings us to the code:
>>> if (ret < 0)
>>>           goto unlock_err;
>>> ...
>>> unlock_err:
>>> ...
>>> if (dio_data->data_space_reserved) {
>>>           btrfs_free_reserved_data_space()
>>> }
>>>
>>> so the execution continues...
>>>
>>> btrfs_free_reserved_data_space
>>>     btrfs_qgroup_free_data
>>>       __btrfs_qgroup_release_data
>>>         qgroup_free_reserved_data
>>>           btrfs_qgroup_free_refroot
>>>
>>> This will result in a underflow of the reservation once everything
>>> outstanding gets released.
>>
>> That should still be safe.
>>
>> Firstly at alloc_ordered_extent() we released the qgroup space already,
>> thus there will be no EXTENT_QGROUP_RESERVED range in extent-io tree
>> anymore.
>>
>> Then at the final cleanup, qgroup_free_reserved_data_space() will
>> release/free nothing, because the extent-io tree no longer has the
>> corresponding range with EXTENT_QGROUP_RESERVED.
>>
>> This is the core design of qgroup data reserved space, which allows us
>> to call btrfs_release/free_data() twice without double accounting.
>>
>>>
>>> Furthermore, raw calls to free_refroot in cases where we have a reserved
>>> changeset make me worried, because they will run afoul of races with
>>> multiple threads touching the various bits. I don't see the bugs here,
>>> but the reservation lifetime is really tricky so I wouldn't be surprised
>>> if something like that was wrong too.
>>
>> This free_refroot() is the common practice here. The extent-io tree
>> based accounting can only cover the reserved space before ordered extent
>> is allocated.
>>
>> Then the reserved space is "transferred" to ordered extent, and
>> eventually go to the new data extent, and finally freed at
>> btrfs_qgroup_account_extents(), which also goes the free_refroot() way.
>>
>>>
>>> As of the last time I looked at this, I think cow_file_range handles
>>> this correctly as well. Looking really quickly now, it looks like maybe
>>> submit_one_async_extent() might not do a qgroup free, but I'm not sure
>>> where the corresponding reservation is coming from.
>>>
>>> I think if you have indeed found a different codepath that makes a data
>>> reservation but doesn't release the qgroup part when allocating the
>>> ordered extent fails, then the fastest path to a fix is to do that at
>>> the same level as where it calls btrfs_check_data_free_space or (however
>>> it gets the reservation), as is currently done by the main
>>> ordered_extent allocation paths. With async_extent, we might need to
>>> plumb through the reserved extent changeset through the async chunk to
>>> do it perfectly...
>>
>> I agree with you that, the extent io tree based double freeing
>> prevention should only be the last safe net, not something we should
>> abuse when possible.
>>
>> But I can't find a better solution, mostly due to the fact that after
>> the btrfs_qgroup_release_data() call, the qgroup reserved space is
>> already released, and we have no way to undo that...
>>
>> Maybe we can delayed the qgroup release/free calls until the ordered
>> extent @entry is allocated?
> 
> At some point I considered allocating first the ordered extent and
> then doing the qgroup free/release calls, and that would fix the leak
> too.
> At the moment it seemed more clear to me the way I did, but if
> everyone prefers that other way I'm fine with it and will change it.

I'm fine either way for the fix.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> 
> Thanks.

  parent reply	other threads:[~2025-05-08 21:57 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-07 17:23 [PATCH 0/5] btrfs: fixes and cleanups for ordered extent allocation fdmanana
2025-05-07 17:23 ` [PATCH 1/5] btrfs: fix qgroup reservation leak on failure to allocate ordered extent fdmanana
2025-05-07 22:33   ` Boris Burkov
2025-05-07 23:11     ` Qu Wenruo
2025-05-07 23:39       ` Boris Burkov
2025-05-08  0:08         ` Boris Burkov
2025-05-08 13:43       ` Filipe Manana
2025-05-08 16:07         ` Boris Burkov
2025-05-08 21:57         ` Qu Wenruo [this message]
2025-05-08 13:40     ` Filipe Manana
2025-05-08 16:19       ` Boris Burkov
2025-05-07 17:23 ` [PATCH 2/5] btrfs: check we grabbed inode reference when allocating an " fdmanana
2025-05-07 17:23 ` [PATCH 3/5] btrfs: fold error checks when allocating ordered extent and update comments fdmanana
2025-05-07 17:23 ` [PATCH 4/5] btrfs: use boolean for delalloc argument to btrfs_free_reserved_bytes() fdmanana
2025-05-07 17:23 ` [PATCH 5/5] btrfs: use boolean for delalloc argument to btrfs_free_reserved_extent() fdmanana
2025-05-08 16:20 ` [PATCH 0/5] btrfs: fixes and cleanups for ordered extent allocation Boris Burkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5578dbe2-33b2-4cd8-a008-05e6f340ac59@suse.com \
    --to=wqu@suse.com \
    --cc=boris@bur.io \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox