From: Boris Burkov <boris@bur.io>
To: Chris Murphy <lists@colorremedies.com>
Cc: Leo Martins <loemra.dev@gmail.com>,
kernel-team@fb.com, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] btrfs: make periodic dynamic reclaim the default for data
Date: Tue, 21 Oct 2025 18:02:15 -0700 [thread overview]
Message-ID: <20251022010215.GA167205@zen.localdomain> (raw)
In-Reply-To: <a254c33c-2a05-4e75-9c3b-12f823ebc8a7@app.fastmail.com>
On Tue, Oct 21, 2025 at 08:37:18PM -0400, Chris Murphy wrote:
> Thanks for the response.
>
> On Tue, Oct 21, 2025, at 6:39 PM, Leo Martins wrote:
>
> >
> > Wanted to provide some data from the Meta rollout to give more context on the
> > decision to enable dynamic+periodic reclaim by default for data. All the before
> > numbers are with bg_reclaim_threshold set to 30.
> >
> > Enabling dynamic+periodic reclaim for data block groups dramatically decreases
> > number of reclaims per host, going from 150/day to just 5/day (p99), and from
> > 6/day to 0/day (p50). The trade-offs are increased fragmentation and
> > a slight uptick in ENOSPCs.
> >
> > I don't have direct fragmentation metrics yet (that is a work in
> > progress), so in the meantime I'm tracking FP as a proxy for fragmentation.
> >
> > FP = (allocated - used) / allocated
> > So if there are 100G allocated for data and 80G are used, FP = (100 -
> > 80) / 100 = 20%.
> >
> > FP has increased from 30% to 45% (p99), and from 5% to 7% (p50).
> > Enospc rates have gone from around 0.5/day to 1/day per 100k hosts.
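For anyone wanting to reproduce the FP numbers above, the proxy is just
allocated-minus-used over allocated. A minimal sketch (function name and
byte units are mine, not from Leo's tooling), using the worked example
from the mail:

```python
def fragmentation_proxy(allocated_bytes: int, used_bytes: int) -> float:
    """FP = (allocated - used) / allocated, as defined above."""
    if allocated_bytes == 0:
        return 0.0
    return (allocated_bytes - used_bytes) / allocated_bytes

# Worked example from the mail: 100G allocated, 80G used -> FP = 20%.
G = 1024 ** 3
print(fragmentation_proxy(100 * G, 80 * G))  # 0.2
```

The allocated/used inputs would come from something like
`btrfs filesystem df`, per space-info type.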
Leo, correct me if I'm wrong, but we have yet to find a system where,
since the introduction of dynamic reclaim, unallocated space steadily
marched down to 0 and the filesystem then hit ENOSPC, right? If there
is a strong, undeniable increase in ENOSPCs, we should absolutely look
for such systems in those regions to motivate further improvements for
full/filling filesystems.
There is also the confounding variable of the bug fixed here:
https://lore.kernel.org/linux-btrfs/22e8b64df3d4984000713433a89cfc14309b75fc.1759430967.git.boris@bur.io/
which has been plaguing our fleet and causing ENOSPC issues.
> > This is a doubling in rate, but still a very small absolute number
> > of enospcs. The unallocated space on disk decreases by ~15G (p99)
> > and ~5G (p50) after rollout.
>
> I'm curious how it compares with the default btrfsmaintenance btrfs-balance.timer/service - I'm guessing this is a bit harder to test at Meta in production due to the strictly time-based trigger. And customization ends up being a choice between even higher reclaim and higher enospc.
>
Yeah, we don't have that data unfortunately.
> > That being said I don't think bg_reclaim_threshold is enabled by default,
> > and I am comfortable saying dynamic+periodic reclaim is better than no
> > automatic reclaim!
>
> So there are still corner cases occurring even with dynamic periodic reclaim. What do those look like? Is the file system unable to write metadata for arbitrary deletes to back the file system out? Or is it stuck in some cases?
>
I would imagine the cases that are tough for dynamic reclaim are:
1. a genuinely quite full fs
2. a fs that rapidly needs a big hunk of metadata after entering the
dynamic reclaim zone but before the cleaner thread / reclaim worker
can run.
> ext4 users are used to 5% of space being held in reserve for root user processes. I'm not sure if xfs has such a concept. Btrfs's global reserve is different in that even root can't use it; it's really reserved for the kernel. But sometimes it's still possible to exhaust this metadata space and be unable to delete files or balance even one data bg to back the file system out of the situation.
>
> The wedged file system that keeps going read-only and appears stuck is a big concern, since users have no idea what to do, and internet searches tend to produce results that are less help than no help.
>
> --
> Chris Murphy
Anyway, I think Leo's forthcoming detailed per-BG fragmentation data
should be the most telling. System-level fragmentation percentage
isn't the most useful metric, IMO.
Thanks,
Boris
Thread overview: 14+ messages
2025-07-15 18:58 [PATCH] btrfs: make periodic dynamic reclaim the default for data Boris Burkov
2025-07-16 6:24 ` Johannes Thumshirn
2025-07-16 15:56 ` Boris Burkov
2025-07-17 12:55 ` Johannes Thumshirn
2025-10-21 18:52 ` Chris Murphy
2025-10-21 22:39 ` Leo Martins
2025-10-22 0:37 ` Chris Murphy
2025-10-22 1:02 ` Boris Burkov [this message]
2025-10-23 23:27 ` Leo Martins
2025-12-13 22:09 ` Neal Gompa
2025-12-26 3:07 ` Sun Yangkai
2025-12-30 0:00 ` Boris Burkov
2025-12-30 1:29 ` Sun Yangkai
2025-12-30 1:41 ` Sun Yangkai