Re: permanently wedged in filesystem, fs/btrfs/relocation.c:1937 prepare_to_merge

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Boris Burkov <boris@bur.io>
To: Nicholas D Steeves <nsteeves@gmail.com>
Cc: Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: permanently wedged in filesystem, fs/btrfs/relocation.c:1937 prepare_to_merge
Date: Fri, 4 Aug 2023 11:00:18 -0700	[thread overview]
Message-ID: <20230804180018.GA3699656@zen> (raw)
In-Reply-To: <87fs4ztxbd.fsf@digitalMercury.freeddns.org>

On Thu, Aug 03, 2023 at 09:23:34PM -0400, Nicholas D Steeves wrote:
> Boris Burkov <boris@bur.io> writes:
> 
> > On Thu, Jul 20, 2023 at 09:42:37AM -0400, Chris Murphy wrote:
> >
> > The btrfs allocator is far from perfect and despite a few measures that
> > attempt to prevent fragmentation, it can still happen. If you have a
> > system that reproduces this, you can consider using the scripts I wrote
> > here: https://github.com/josefbacik/fsperf/tree/master/src/frag to dump
> > the fragmentation level of the FS (and even visualize it) to confirm my
> > hypothesis. I'm happy to help you get that up and running.
> >
> > Now let's suppose you do have a workload that challenges our allocator,
> > fragments the data block groups, and chews through all the unallocated
> > space. We have a lot of those at Meta, so luckily, there is some relief
> > available.
> >
> > Fundamentally the remediation is to defragment the disk, which we do
> > do with data block group balancing. You can invoke this manually with:
> > `btrfs balance start -d<thresh> <fs>`
> > where <thresh> is a percentage fullness of data block_groups to target
> > with balancing. Lower is more conservative so you can start low and
> > increase it to 80 or so till you reclaim enough space. If you use that,
> > it's better to do it proactively periodically rather than after you get
> > stuck, 'cause as you saw, balances start failing with ENOSPC too.
> > (see point 2. above :))
> 
> Would it be useful to use fsperf's frag (module?) in combination with
> the required btrd to periodically assess the state of fragmentation?
> What are the downsides of doing this?

I think this is probably overkill, compared to experimenting with
auto-relocation and monitoring relocation/IO. Btrd is designed to run on
a mounted filesystem and uses the SEARCH_V2 ioctl so it should be "fine"
to use, but the script walks the entire extent tree so on a large file
system it will be slow and use lots of memory (it ooms on my test vms
when I'm not careful..)

I wrote this as a helper for testing out allocator changes targeting
fragmentation. fsperf is our perf testbed, so it runs some workload and
then when it's done on a basically inactive test fs, it runs the script.

I would say that it is unsupported for serious production use, and I
wouldn't use it in that way, but it doesn't use any insane features and
shouldn't crash your system besides normal resource hogging type issues.

I don't have concrete plans for btrfs to track block_group fragmentation
directly (haven't figured out if I can do it efficiently) but it would
be an interesting project for the future.

> 
> I'm specifically interested in minimising the risk of "everything was
> fine until the fs blew up", and it seems like running this test
> periodically would provide useful data that would inform the sysadmin
> about whether the risk of rewriting data at rest with a rebalance is
> less than the risk of encountering issues triggered by the less than
> perfect allocator.
> 
> Because it sounds like there still exist workloads that necessitate
> periodic rebalancing, sysadmins need a way to determine the degree of
> need for rebalancing in order to define a mitigation policy in a
> fact-based way.
> 
> Is fsperf the correct tool for this general case, or should we be using
> something else?

We monitor "unallocated" via btrfs filesystem usage. Unallocated
trending down while data usage % is relatively low is a good sign of
fragmentation and data over-allocation where balance would help.

> 
> 
> Thanks!
> Nicholas
> 
> P.S. Please CC me in replies.

     prev parent reply	other threads:[~2023-08-04 18:02 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-20 13:42 permanently wedged in filesystem, fs/btrfs/relocation.c:1937 prepare_to_merge Chris Murphy
2023-08-03 21:12 ` Boris Burkov
2023-08-04  1:23   ` Nicholas D Steeves
2023-08-04 18:00     ` Boris Burkov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230804180018.GA3699656@zen \
    --to=boris@bur.io \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=lists@colorremedies.com \
    --cc=nsteeves@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.