Re: [PATCH] btrfs: adjust overcommit logic when very close to full

From: Josef Bacik <josef@toxicpanda.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: adjust overcommit logic when very close to full
Date: Wed, 20 Sep 2023 09:59:23 -0400
Message-ID: <20230920135923.GA3796940@perftesting>
In-Reply-To: <20230918201441.GA299788@zen>

On Mon, Sep 18, 2023 at 01:14:41PM -0700, Boris Burkov wrote:
> On Mon, Sep 18, 2023 at 03:27:47PM -0400, Josef Bacik wrote:
> > A user reported some unpleasant behavior with very small file systems.
> > The reproducer is this
> > 
> > mkfs.btrfs -f -m single -b 8g /dev/vdb
> > mount /dev/vdb /mnt/test
> > dd if=/dev/zero of=/mnt/test/testfile bs=512M count=20
> > 
> > This will result in usage that looks like this
> > 
> > Overall:
> >     Device size:                   8.00GiB
> >     Device allocated:              8.00GiB
> >     Device unallocated:            1.00MiB
> >     Device missing:                  0.00B
> >     Device slack:                  2.00GiB
> >     Used:                          5.47GiB
> >     Free (estimated):              2.52GiB      (min: 2.52GiB)
> >     Free (statfs, df):               0.00B
> >     Data ratio:                       1.00
> >     Metadata ratio:                   1.00
> >     Global reserve:                5.50MiB      (used: 0.00B)
> >     Multiple profiles:                  no
> > 
> > Data,single: Size:7.99GiB, Used:5.46GiB (68.41%)
> >    /dev/vdb        7.99GiB
> > 
> > Metadata,single: Size:8.00MiB, Used:5.77MiB (72.07%)
> >    /dev/vdb        8.00MiB
> > 
> > System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
> >    /dev/vdb        4.00MiB
> > 
> > Unallocated:
> >    /dev/vdb        1.00MiB
> > 
> > As you can see we've gotten ourselves quite full with metadata, with all
> > of the disk being allocated for data.
> > 
> > On smaller file systems there's not a lot of time before we get full, so
> > our overcommit behavior bites us here.  Generally speaking data
> > reservations result in chunk allocations as we assume reservation ==
> > actual use for data.  This means at any point we could end up with a
> > chunk allocation for data, and if we're very close to full we could do
> > this before we have a chance to figure out that we need another metadata
> > chunk.
> > 
> > Address this by adjusting the overcommit logic.  Simply put we need to
> > take away 1 chunk from the available chunk space in case of a data
> > reservation.  This will allow us to stop overcommitting before we
> > potentially lose this space to a data allocation.  With this fix in
> > place we properly allocate a metadata chunk before we're completely
> > full, allowing for enough slack space in metadata.
> 
> LGTM, this should help and I've been kicking around the same idea in my
> head for a while.
> 
> I do think this is kind of a band-aid, though. It isn't hard to imagine
> that you allocate data chunks up to the 1G, then allocate a metadata
> chunk, then fragment/under-utilize the data to the point that you
> actually fill up the metadata and get right back to this same point.
> 
> Long term, I think we still need more/smarter reclaim, but this should
> be a good steam valve for the simple cases where we deterministically
> gobble up all the unallocated space for data.

This is definitely a bit of a band-aid, because any number of things can
allocate a chunk at any given time.  However, this is more of a concern for
small file systems where we only have the initial 8MiB metadata block group.
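
To make the adjustment from the patch description concrete, here is a rough
user-space sketch of the idea, not the actual kernel change.  Every name in it
(sketch_overcommit_avail, sketch_can_overcommit, and their parameters) is made
up for illustration, and it leaves out the other scaling the real overcommit
code applies to the available space.  It only shows the one step discussed
here: set aside one data chunk's worth of unallocated space before deciding
whether metadata may overcommit.

```c
#include <stdint.h>

/*
 * Illustrative only: these helpers are hypothetical and do not exist in
 * the kernel under these names.
 *
 * Unallocated space that metadata overcommit may count on, after setting
 * aside one data chunk that a data reservation could claim at any time.
 */
uint64_t sketch_overcommit_avail(uint64_t unallocated,
				 uint64_t data_chunk_size)
{
	/* Not enough room for even one data chunk: nothing to count on. */
	if (unallocated <= data_chunk_size)
		return 0;

	return unallocated - data_chunk_size;
}

/*
 * Simplified overcommit check: a metadata reservation of @request bytes
 * is allowed if it fits within the space already allocated to metadata
 * (@total) plus the adjusted unallocated space.
 */
int sketch_can_overcommit(uint64_t total, uint64_t used, uint64_t request,
			  uint64_t unallocated, uint64_t data_chunk_size)
{
	uint64_t avail = sketch_overcommit_avail(unallocated,
						 data_chunk_size);

	return used + request <= total + avail;
}
```

With 1MiB unallocated and a 1GiB data chunk set aside, the adjusted available
space is 0, so a reservation that does not fit in the existing 8MiB metadata
block group is refused instead of being overcommitted.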

We spoke offline, but the long-term fix here is to have a chunk reservation
system and use that in overcommit, so we never have to worry about suddenly not
being able to allocate a chunk.  Then if we need to revoke that reservation, we
can force flush everything to get us under the overcommit threshold and then
disable overcommit, because we'll have allocated that chunk.

For now this fixes the problem with the least surprise.  Thanks,

Josef
