All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: David Sterba <dsterba@suse.cz>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: adjust overcommit logic when very close to full
Date: Wed, 20 Sep 2023 10:01:47 -0400	[thread overview]
Message-ID: <20230920140147.GB3796940@perftesting> (raw)
In-Reply-To: <20230918212950.GP2747@twin.jikos.cz>

On Mon, Sep 18, 2023 at 11:29:50PM +0200, David Sterba wrote:
> On Mon, Sep 18, 2023 at 03:27:47PM -0400, Josef Bacik wrote:
> > A user reported some unpleasant behavior with very small file systems.
> > The reproducer is this
> > 
> > mkfs.btrfs -f -m single -b 8g /dev/vdb
> > mount /dev/vdb /mnt/test
> > dd if=/dev/zero of=/mnt/test/testfile bs=512M count=20
> > 
> > This will result in usage that looks like this
> > 
> > Overall:
> >     Device size:                   8.00GiB
> >     Device allocated:              8.00GiB
> >     Device unallocated:            1.00MiB
> >     Device missing:                  0.00B
> >     Device slack:                  2.00GiB
> >     Used:                          5.47GiB
> >     Free (estimated):              2.52GiB      (min: 2.52GiB)
> >     Free (statfs, df):               0.00B
> >     Data ratio:                       1.00
> >     Metadata ratio:                   1.00
> >     Global reserve:                5.50MiB      (used: 0.00B)
> >     Multiple profiles:                  no
> > 
> > Data,single: Size:7.99GiB, Used:5.46GiB (68.41%)
> >    /dev/vdb        7.99GiB
> > 
> > Metadata,single: Size:8.00MiB, Used:5.77MiB (72.07%)
> >    /dev/vdb        8.00MiB
> > 
> > System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
> >    /dev/vdb        4.00MiB
> > 
> > Unallocated:
> >    /dev/vdb        1.00MiB
> > 
> > As you can see we've gotten ourselves quite full with metadata, with all
> > of the disk being allocated for data.
> > 
> > On smaller file systems there's not a lot of time before we get full, so
> > our overcommit behavior bites us here.  Generally speaking data
> > reservations result in chunk allocations as we assume reservation ==
> > actual use for data.  This means at any point we could end up with a
> > chunk allocation for data, and if we're very close to full we could do
> > this before we have a chance to figure out that we need another metadata
> > chunk.
> > 
> > Address this by adjusting the overcommit logic.  Simply put we need to
> > take away 1 chunk from the available chunk space in case of a data
> > reservation.  This will allow us to stop overcommitting before we
> > potentially lose this space to a data allocation.  With this fix in
> > place we properly allocate a metadata chunk before we're completely
> > full, allowing for enough slack space in metadata.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/space-info.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> > index d7e8cd4f140c..7aa53058d893 100644
> > --- a/fs/btrfs/space-info.c
> > +++ b/fs/btrfs/space-info.c
> > @@ -365,6 +365,23 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
> >  	factor = btrfs_bg_type_to_factor(profile);
> >  	avail = div_u64(avail, factor);
> >  
> > +	/*
> > +	 * Since data allocations immediately use block groups as part of the
> > +	 * reservation, because we assume that data reservations will == actual
> > +	 * usage, we could potentially overcommit and then immediately have that
> > +	 * available space used by a data allocation, which could put us in a
> > +	 * bind when we get close to filling the file system.
> > +	 *
> > +	 * To handle this simply remove 1G (which is our current maximum chunk
> > +	 * allocation size) from the available space.  If we are relatively
> > +	 * empty this won't affect our ability to overcommit much, and if we're
> > +	 * very close to full it'll keep us from getting into a position where
> > +	 * we've given ourselves very little metadata wiggle room.
> > +	 */
> > +	if (avail < SZ_1G)
> > +		return 0;
> > +	avail -= SZ_1G;
> 
> Should the value be derived from the alloc_chunk_ctl::max_chunk_size or
> chunk_size? Or at least use a named constant, similar to
> BTRFS_MAX_DATA_CHUNK_SIZE .

I originally had this, but we still have this logic in the chunk allocator

	/* Stripe size should not go beyond 1G. */
	ctl->stripe_size = min_t(u64, ctl->stripe_size, SZ_1G);

max_chunk_size tends to be larger than this, BTRFS_MAX_DATA_CHUNK_SIZE is 10G.

We need to go back and clean up the logic to not have the SZ_1G here, rename it
to BTRFS_MAX_CHUNK_SIZE or something like that, and then update my patch to use
this.

I opted for the cleanest fix for now, the cleanups can come after.
Alternatively I can do the cleanups and do the fix, it's up to you.  Thanks,

Josef

  reply	other threads:[~2023-09-20 14:01 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-18 19:27 [PATCH] btrfs: adjust overcommit logic when very close to full Josef Bacik
2023-09-18 20:14 ` Boris Burkov
2023-09-20 13:59   ` Josef Bacik
2023-09-20 19:02     ` David Sterba
2023-09-18 21:29 ` David Sterba
2023-09-20 14:01   ` Josef Bacik [this message]
2023-09-20 14:04     ` David Sterba
2023-09-20 19:05 ` David Sterba
2023-09-21 22:50   ` Filipe Manana
2023-09-22 10:25     ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230920140147.GB3796940@perftesting \
    --to=josef@toxicpanda.com \
    --cc=dsterba@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.