From: Boris Burkov <boris@bur.io>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: adjust overcommit logic when very close to full
Date: Mon, 18 Sep 2023 13:14:41 -0700
Message-ID: <20230918201441.GA299788@zen>
In-Reply-To: <b97e47ce0ce1d41d221878de7d6090b90aa7a597.1695065233.git.josef@toxicpanda.com>
On Mon, Sep 18, 2023 at 03:27:47PM -0400, Josef Bacik wrote:
> A user reported some unpleasant behavior with very small file systems.
> The reproducer is this:
>
> mkfs.btrfs -f -m single -b 8g /dev/vdb
> mount /dev/vdb /mnt/test
> dd if=/dev/zero of=/mnt/test/testfile bs=512M count=20
>
> This attempts to write 10GiB (20 x 512MiB) to the 8GiB file system,
> and results in usage that looks like this:
>
> Overall:
> Device size: 8.00GiB
> Device allocated: 8.00GiB
> Device unallocated: 1.00MiB
> Device missing: 0.00B
> Device slack: 2.00GiB
> Used: 5.47GiB
> Free (estimated): 2.52GiB (min: 2.52GiB)
> Free (statfs, df): 0.00B
> Data ratio: 1.00
> Metadata ratio: 1.00
> Global reserve: 5.50MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,single: Size:7.99GiB, Used:5.46GiB (68.41%)
> /dev/vdb 7.99GiB
>
> Metadata,single: Size:8.00MiB, Used:5.77MiB (72.07%)
> /dev/vdb 8.00MiB
>
> System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
> /dev/vdb 4.00MiB
>
> Unallocated:
> /dev/vdb 1.00MiB
>
> As you can see, we've gotten ourselves quite full on metadata, while
> all of the disk has been allocated for data.
>
> On smaller file systems there's not a lot of time before we get full, so
> our overcommit behavior bites us here. Generally speaking data
> reservations result in chunk allocations as we assume reservation ==
> actual use for data. This means at any point we could end up with a
> chunk allocation for data, and if we're very close to full we could do
> this before we have a chance to figure out that we need another metadata
> chunk.
>
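
To make the race concrete, here is a toy userspace model of the
accounting (purely illustrative, not btrfs code; the numbers and names
are made up):

    /* toy_overcommit.c - toy model of the overcommit race (not btrfs code) */
    #include <stdio.h>

    #define SZ_1G (1024ULL * 1024 * 1024)

    int main(void)
    {
            unsigned long long unallocated = SZ_1G;      /* last free chunk space */
            unsigned long long meta_free   = 2ULL << 20; /* 2M left in metadata chunks */
            unsigned long long meta_need   = 8ULL << 20; /* incoming metadata reservation */

            /* Old check: count all unallocated space as available to metadata. */
            if (meta_need < meta_free + unallocated)
                    printf("overcommit OK: promised %llu bytes of metadata\n",
                           meta_need);

            /* Meanwhile a data reservation allocates that last chunk... */
            unallocated = 0;

            /* ...and the space we promised to metadata no longer exists. */
            if (meta_need >= meta_free + unallocated)
                    printf("ENOSPC: metadata overcommitted against space data took\n");

            return 0;
    }

With the change below, the first check would also subtract SZ_1G from
the available space, refuse to overcommit, and instead force a metadata
chunk allocation while unallocated space still exists.
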
> Address this by adjusting the overcommit logic. Simply put, we need to
> take away 1 chunk from the available chunk space in case of a data
> reservation. This will allow us to stop overcommitting before we
> potentially lose this space to a data allocation. With this fix in
> place we properly allocate a metadata chunk before we're completely
> full, allowing for enough slack space in metadata.
LGTM, this should help and I've been kicking around the same idea in my
head for a while.
I do think this is kind of a band-aid, though. It isn't hard to imagine
allocating data chunks right up to the reserved 1G, then allocating a
metadata chunk, then fragmenting/under-utilizing the data chunks to the
point that metadata actually fills up and we land right back in this
same spot.
Long term, I think we still need more/smarter reclaim, but this should
be a good safety valve for the simple cases where we deterministically
gobble up all the unallocated space for data.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/space-info.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index d7e8cd4f140c..7aa53058d893 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -365,6 +365,23 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
> factor = btrfs_bg_type_to_factor(profile);
> avail = div_u64(avail, factor);
>
> + /*
> + * Since data allocations immediately use block groups as part of the
> + * reservation, because we assume that data reservations will == actual
> + * usage, we could potentially overcommit and then immediately have that
> + * available space used by a data allocation, which could put us in a
> + * bind when we get close to filling the file system.
> + *
> + * To handle this simply remove 1G (which is our current maximum chunk
> + * allocation size) from the available space. If we are relatively
> + * empty this won't affect our ability to overcommit much, and if we're
> + * very close to full it'll keep us from getting into a position where
> + * we've given ourselves very little metadata wiggle room.
> + */
> + if (avail < SZ_1G)
> + return 0;
> + avail -= SZ_1G;
> +
> /*
> * If we aren't flushing all things, let us overcommit up to
> * 1/2th of the space. If we can flush, don't let us overcommit
> --
> 2.41.0
>
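For context, the consumer of calc_available_free_space() is the
overcommit check. Paraphrased from memory, it looks roughly like the
below (a sketch, not a verbatim copy of this version of
fs/btrfs/space-info.c):

    int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
                             struct btrfs_space_info *space_info, u64 bytes,
                             enum btrfs_reserve_flush_enum flush)
    {
            u64 avail;
            u64 used;

            /* Data never overcommits, since reservation == actual use. */
            if (space_info->flags & BTRFS_BLOCK_GROUP_DATA)
                    return 0;

            used = btrfs_space_info_used(space_info, true);
            avail = calc_available_free_space(fs_info, space_info, flush);

            /* The new 1G haircut shrinks 'avail' right here. */
            if (used + bytes < space_info->total_bytes + avail)
                    return 1;
            return 0;
    }

So trimming one max-size chunk off avail simply makes metadata
reservations fall back to flushing and chunk allocation a little
earlier, instead of betting on space that a data chunk allocation may
claim first.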