Re: permanently wedged in filesystem, fs/btrfs/relocation.c:1937 prepare_to_merge

public inbox for linux-btrfs@vger.kernel.org

From: Boris Burkov <boris@bur.io>
To: Nicholas D Steeves <nsteeves@gmail.com>
Cc: Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: permanently wedged in filesystem, fs/btrfs/relocation.c:1937 prepare_to_merge
Date: Fri, 4 Aug 2023 11:00:18 -0700
Message-ID: <20230804180018.GA3699656@zen>
In-Reply-To: <87fs4ztxbd.fsf@digitalMercury.freeddns.org>

On Thu, Aug 03, 2023 at 09:23:34PM -0400, Nicholas D Steeves wrote:
> Boris Burkov <boris@bur.io> writes:
> 
> > On Thu, Jul 20, 2023 at 09:42:37AM -0400, Chris Murphy wrote:
> >
> > The btrfs allocator is far from perfect and despite a few measures that
> > attempt to prevent fragmentation, it can still happen. If you have a
> > system that reproduces this, you can consider using the scripts I wrote
> > here: https://github.com/josefbacik/fsperf/tree/master/src/frag to dump
> > the fragmentation level of the FS (and even visualize it) to confirm my
> > hypothesis. I'm happy to help you get that up and running.
> >
> > Now let's suppose you do have a workload that challenges our allocator,
> > fragments the data block groups, and chews through all the unallocated
> > space. We have a lot of those at Meta, so luckily, there is some relief
> > available.
> >
> > Fundamentally the remediation is to defragment the disk, which we do
> > with data block group balancing. You can invoke this manually with:
> > `btrfs balance start -dusage=<thresh> <fs>`
> > where <thresh> is a percentage fullness of data block groups to target
> > with balancing: only block groups less full than <thresh> are relocated.
> > Lower is more conservative, so you can start low and increase it to 80
> > or so until you reclaim enough space. If you use that, it's better to
> > run it proactively and periodically rather than after you get stuck,
> > 'cause as you saw, balances start failing with ENOSPC too.
> > (see point 2. above :))
> 
> Would it be useful to use fsperf's frag (module?) in combination with
> the required btrd to periodically assess the state of fragmentation?
> What are the downsides of doing this?

I think this is probably overkill compared to experimenting with
auto-relocation and monitoring relocation and IO. Btrd is designed to run
on a mounted filesystem and uses the BTRFS_IOC_TREE_SEARCH_V2 ioctl, so it
should be "fine" to use, but the script walks the entire extent tree, so
on a large filesystem it will be slow and use lots of memory (it OOMs on
my test VMs when I'm not careful).

I wrote this as a helper for testing out allocator changes targeting
fragmentation. fsperf is our performance testbed: it runs a workload and
then, once that finishes, runs the script against the now mostly idle
test fs.

I would consider it unsupported for serious production use, and I
wouldn't use it that way myself, but it doesn't use any exotic features
and shouldn't crash your system beyond the usual resource-hogging issues.

I don't have concrete plans for btrfs to track block_group fragmentation
directly (haven't figured out if I can do it efficiently) but it would
be an interesting project for the future.

> 
> I'm specifically interested in minimising the risk of "everything was
> fine until the fs blew up", and it seems like running this test
> periodically would provide useful data that would inform the sysadmin
> about whether the risk of rewriting data at rest with a rebalance is
> less than the risk of encountering issues triggered by the
> less-than-perfect allocator.
> 
> Because it sounds like there still exist workloads that necessitate
> periodic rebalancing, sysadmins need a way to determine the degree of
> need for rebalancing in order to define a mitigation policy in a
> fact-based way.
> 
> Is fsperf the correct tool for this general case, or should we be using
> something else?

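We monitor "unallocated" (the "Device unallocated" line) via
`btrfs filesystem usage`. Unallocated space trending down while the data
usage percentage (Used relative to Size of the data block groups) stays
relatively low is a good sign of fragmentation and data over-allocation,
which is exactly the case where balance helps.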

> 
> 
> Thanks!
> Nicholas
> 
> P.S. Please CC me in replies.

Thread overview: 4+ messages
2023-07-20 13:42 permanently wedged in filesystem, fs/btrfs/relocation.c:1937 prepare_to_merge Chris Murphy
2023-08-03 21:12 ` Boris Burkov
2023-08-04  1:23   ` Nicholas D Steeves
2023-08-04 18:00     ` Boris Burkov [this message]