From: Ivan Shapovalov <intelfx@intelfx.name>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: don't refill whole delayed refs block reserve when starting transaction
Date: Fri, 02 Feb 2024 21:55:42 +0100
Message-ID: <8dbde6d822e8af18d3249a0273e9bf12a951b6f2.camel@intelfx.name>
In-Reply-To: <eba624e8cef9a1e84c9e1ba0c8f32347aa487e63.1706892030.git.fdmanana@suse.com>
On 2024-02-02 at 16:42 +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Since commit 28270e25c69a ("btrfs: always reserve space for delayed refs
> when starting transaction") we started not only to reserve metadata space
> for the delayed refs a caller of btrfs_start_transaction() might generate
> but also to try to fully refill the delayed refs block reserve, because
> there are several cases where we generate delayed refs and haven't reserved
> space for them, relying on the global block reserve. Relying too much on
> the global block reserve is not always safe, and can result in hitting
> -ENOSPC during transaction commits or worse, in rare cases, being unable
> to mount a filesystem that needs to do orphan cleanup or anything that
> requires modifying the filesystem during mount, and has no more
> unallocated space and the metadata space is nearly full. This was
> explained in detail in that commit's change log.
>
> However the gap between the reserved amount and the size of the delayed
> refs block reserve can be huge, so attempting to reserve space for such
> a gap can result in allocating many metadata block groups that end up
> not being used. After a recent patch, with the subject:
>
> "btrfs: add new unused block groups to the list of unused block groups"
>
> We started to add new block groups that are unused to the list of unused
> block groups, to avoid having them around for a very long time in case
> they are never used, because a block group is only added to the list of
> unused block groups when we deallocate the last extent or when mounting
> the filesystem and the block group has 0 bytes used. This is not a problem
> introduced by the commit mentioned earlier, it always existed as our
> metadata space reservations are, most of the time, pessimistic and end up
> not using all the space they reserved, so we can occasionally end up with
> one or two unused metadata block groups for a long period. However after
> that commit mentioned earlier, we are just more pessimistic in the
> metadata space reservations when starting a transaction and therefore the
> issue is more likely to happen.
>
> This however is not always enough because we might create unused metadata
> block groups when reserving metadata space at a high rate if there's
> always a gap in the delayed refs block reserve and the cleaner kthread
> isn't triggered often enough or is busy with other work (running delayed
> iputs, cleaning deleted roots, etc), not to mention the block group's
> allocated space is only usable for a new block group after the transaction
> used to remove it is committed.
>
> A user reported that he's getting a lot of allocated metadata block groups
> but the usage percentage of metadata space was very low compared to the
> total allocated space, especially after running a series of block group
> relocations.
>
> So for now stop trying to refill the gap in the delayed refs block reserve
> and reserve space only for the delayed refs we are expected to generate
> when starting a transaction.
>
> CC: stable@vger.kernel.org # 6.7+
> Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
> Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
> Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy87+obZXXZpQ@mail.gmail.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
Tested-by: Ivan Shapovalov <intelfx@intelfx.name>
Thanks!
--
Ivan Shapovalov / intelfx /
> fs/btrfs/transaction.c | 38 ++------------------------------------
> 1 file changed, 2 insertions(+), 36 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 70d7abd1f772..3575b2bf3042 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -562,56 +562,22 @@ static int btrfs_reserve_trans_metadata(struct btrfs_fs_info *fs_info,
>  					u64 num_bytes,
>  					u64 *delayed_refs_bytes)
>  {
> -	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
>  	struct btrfs_space_info *si = fs_info->trans_block_rsv.space_info;
> -	u64 extra_delayed_refs_bytes = 0;
> -	u64 bytes;
> +	u64 bytes = num_bytes + *delayed_refs_bytes;
>  	int ret;
>
> -	/*
> -	 * If there's a gap between the size of the delayed refs reserve and
> -	 * its reserved space, than some tasks have added delayed refs or bumped
> -	 * its size otherwise (due to block group creation or removal, or block
> -	 * group item update). Also try to allocate that gap in order to prevent
> -	 * using (and possibly abusing) the global reserve when committing the
> -	 * transaction.
> -	 */
> -	if (flush == BTRFS_RESERVE_FLUSH_ALL &&
> -	    !btrfs_block_rsv_full(delayed_refs_rsv)) {
> -		spin_lock(&delayed_refs_rsv->lock);
> -		if (delayed_refs_rsv->size > delayed_refs_rsv->reserved)
> -			extra_delayed_refs_bytes = delayed_refs_rsv->size -
> -				delayed_refs_rsv->reserved;
> -		spin_unlock(&delayed_refs_rsv->lock);
> -	}
> -
> -	bytes = num_bytes + *delayed_refs_bytes + extra_delayed_refs_bytes;
> -
>  	/*
>  	 * We want to reserve all the bytes we may need all at once, so we only
>  	 * do 1 enospc flushing cycle per transaction start.
>  	 */
>  	ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
> -	if (ret == 0) {
> -		if (extra_delayed_refs_bytes > 0)
> -			btrfs_migrate_to_delayed_refs_rsv(fs_info,
> -							  extra_delayed_refs_bytes);
> -		return 0;
> -	}
> -
> -	if (extra_delayed_refs_bytes > 0) {
> -		bytes -= extra_delayed_refs_bytes;
> -		ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
> -		if (ret == 0)
> -			return 0;
> -	}
>
>  	/*
>  	 * If we are an emergency flush, which can steal from the global block
>  	 * reserve, then attempt to not reserve space for the delayed refs, as
>  	 * we will consume space for them from the global block reserve.
>  	 */
> -	if (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
> +	if (ret && flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
>  		bytes -= *delayed_refs_bytes;
>  		*delayed_refs_bytes = 0;
>  		ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
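
For anyone skimming the hunk above, here is a rough userspace sketch of how
the size of the first reservation attempt changes. It is only an
illustration, not kernel code: struct toy_block_rsv and the toy_* helpers are
hypothetical names made up for this example, and the model ignores locking,
the flush mode check and the fallback paths.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for struct btrfs_block_rsv, keeping only the two
 * fields the removed hunk reads. This is not the kernel definition.
 */
struct toy_block_rsv {
	uint64_t size;		/* bytes the reserve wants to hold */
	uint64_t reserved;	/* bytes it currently holds */
};

/* Gap the old code tried to refill; 0 when the reserve is already full. */
static uint64_t toy_rsv_gap(const struct toy_block_rsv *rsv)
{
	return rsv->size > rsv->reserved ? rsv->size - rsv->reserved : 0;
}

/*
 * First reservation attempt before the patch, in the case where the old
 * code added the gap (it only did so for BTRFS_RESERVE_FLUSH_ALL).
 */
static uint64_t toy_bytes_before(const struct toy_block_rsv *rsv,
				 uint64_t num_bytes,
				 uint64_t delayed_refs_bytes)
{
	return num_bytes + delayed_refs_bytes + toy_rsv_gap(rsv);
}

/* First reservation attempt after the patch: the gap is no longer added. */
static uint64_t toy_bytes_after(uint64_t num_bytes,
				uint64_t delayed_refs_bytes)
{
	return num_bytes + delayed_refs_bytes;
}
```

With a reserve of size 1000 holding 200 bytes, a caller asking for 10 + 5
bytes would request 815 bytes in the old model and 15 in the new one. The
numbers are arbitrary and chosen only to make the difference visible.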