From: Boris Burkov <boris@bur.io>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: adjust overcommit logic when very close to full
Date: Mon, 18 Sep 2023 13:14:41 -0700 [thread overview]
Message-ID: <20230918201441.GA299788@zen> (raw)
In-Reply-To: <b97e47ce0ce1d41d221878de7d6090b90aa7a597.1695065233.git.josef@toxicpanda.com>
On Mon, Sep 18, 2023 at 03:27:47PM -0400, Josef Bacik wrote:
> A user reported some unpleasant behavior with very small file systems.
> The reproducer is this
>
> mkfs.btrfs -f -m single -b 8g /dev/vdb
> mount /dev/vdb /mnt/test
> dd if=/dev/zero of=/mnt/test/testfile bs=512M count=20
>
> This will result in usage that looks like this
>
> Overall:
> Device size: 8.00GiB
> Device allocated: 8.00GiB
> Device unallocated: 1.00MiB
> Device missing: 0.00B
> Device slack: 2.00GiB
> Used: 5.47GiB
> Free (estimated): 2.52GiB (min: 2.52GiB)
> Free (statfs, df): 0.00B
> Data ratio: 1.00
> Metadata ratio: 1.00
> Global reserve: 5.50MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,single: Size:7.99GiB, Used:5.46GiB (68.41%)
> /dev/vdb 7.99GiB
>
> Metadata,single: Size:8.00MiB, Used:5.77MiB (72.07%)
> /dev/vdb 8.00MiB
>
> System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
> /dev/vdb 4.00MiB
>
> Unallocated:
> /dev/vdb 1.00MiB
>
> As you can see we've gotten ourselves quite full with metadata, with all
> of the disk being allocated for data.
>
> On smaller file systems there's not a lot of time before we get full, so
> our overcommit behavior bites us here. Generally speaking data
> reservations result in chunk allocations as we assume reservation ==
> actual use for data. This means at any point we could end up with a
> chunk allocation for data, and if we're very close to full we could do
> this before we have a chance to figure out that we need another metadata
> chunk.
>
> Address this by adjusting the overcommit logic. Simply put we need to
> take away 1 chunk from the available chunk space in case of a data
> reservation. This will allow us to stop overcommitting before we
> potentially lose this space to a data allocation. With this fix in
> place we properly allocate a metadata chunk before we're completely
> full, allowing for enough slack space in metadata.
LGTM, this should help and I've been kicking around the same idea in my
head for a while.
I do think this is kind of a band-aid, though. It isn't hard to imagine
that you allocate data chunks up to the 1G, then allocate a metadata
chunk, then fragment/under-utilize the data to the point that you
actually fill up the metadata and get right back to this same point.
Long term, I think we still need more/smarter reclaim, but this should
be a good steam valve for the simple cases where we deterministically
gobble up all the unallocated space for data.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/space-info.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index d7e8cd4f140c..7aa53058d893 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -365,6 +365,23 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
> factor = btrfs_bg_type_to_factor(profile);
> avail = div_u64(avail, factor);
>
> + /*
> + * Since data allocations immediately use block groups as part of the
> + * reservation, because we assume that data reservations will == actual
> + * usage, we could potentially overcommit and then immediately have that
> + * available space used by a data allocation, which could put us in a
> + * bind when we get close to filling the file system.
> + *
> + * To handle this simply remove 1G (which is our current maximum chunk
> + * allocation size) from the available space. If we are relatively
> + * empty this won't affect our ability to overcommit much, and if we're
> + * very close to full it'll keep us from getting into a position where
> + * we've given ourselves very little metadata wiggle room.
> + */
> + if (avail < SZ_1G)
> + return 0;
> + avail -= SZ_1G;
> +
> /*
> * If we aren't flushing all things, let us overcommit up to
> * 1/2th of the space. If we can flush, don't let us overcommit
> --
> 2.41.0
>
next prev parent reply other threads:[~2023-09-18 20:14 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-18 19:27 [PATCH] btrfs: adjust overcommit logic when very close to full Josef Bacik
2023-09-18 20:14 ` Boris Burkov [this message]
2023-09-20 13:59 ` Josef Bacik
2023-09-20 19:02 ` David Sterba
2023-09-18 21:29 ` David Sterba
2023-09-20 14:01 ` Josef Bacik
2023-09-20 14:04 ` David Sterba
2023-09-20 19:05 ` David Sterba
2023-09-21 22:50 ` Filipe Manana
2023-09-22 10:25 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230918201441.GA299788@zen \
--to=boris@bur.io \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.