From: Josef Bacik <josef@toxicpanda.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 0/6] btrfs: dynamic and periodic block_group reclaim
Date: Mon, 24 Jun 2024 11:25:14 -0400
Message-ID: <20240624152514.GB2513195@perftesting>
In-Reply-To: <cover.1718665689.git.boris@bur.io>

On Mon, Jun 17, 2024 at 04:11:12PM -0700, Boris Burkov wrote:
> Btrfs's block_group allocator suffers from a well-known problem: it can
> eagerly allocate too much space to either data or metadata (most often
> data, absent bugs) and later be unable to allocate more space for the
> other when needed. When data starves metadata, the result is especially
> painful: a read-only filesystem that needs careful manual balancing to
> fix.
> 
> This can be worked around by:
> - enabling automatic reclaim
> - periodically running balance
> 
> The latter is widely deployed via btrfsmaintenance
> (https://github.com/kdave/btrfsmaintenance) and the former is used at
> scale at Meta with good results. However, neither of those solutions is
> perfect, as they both currently use a fixed threshold. A fixed threshold
> is vulnerable to workloads that trigger high amounts of reclaim. This
> has led to btrfsmaintenance setting very conservative thresholds of 5
> and 10 percent of data block groups
> (https://github.com/kdave/btrfsmaintenance/commit/edbbfffe592f47c2849a8825f523e2ccc38b15f5).
> At Meta, we deal with an elevated level of reclaim that we would like
> to reduce.
> 
> This patch set expands on automatic reclaim, adding the ability to set a
> dynamic reclaim threshold that scales with the global file system
> allocation conditions, as well as periodic reclaim, which runs that
> reclaim sweep in the cleaner thread. Together, I believe they constitute
> a robust and general automatic reclaim system that should avoid
> unfortunate read-only filesystems in all but extreme conditions, where
> space is running quite low anyway and failure is more reasonable.
> 
> At a very high level, the dynamic threshold's strategy is to set a fixed
> target of unallocated block groups (10 block groups) and linearly scale
> its aggression the further we are from that target. That way we do no
> automatic reclaim until we actually press against the unallocated
> target, allowing the allocator to gradually fill fragmented space with
> new extents, but do claw back space after workloads that use and free a
> bunch of space, perhaps with fragmentation.
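
As a rough illustration of the strategy described above (not code from
the patch series), the linear scaling can be sketched as follows. The
function name, the percent-based return value, and the use of a single
block group size are assumptions made for the example; only the target
of 10 block groups comes from the cover letter.

```python
# Hypothetical sketch of a linearly scaled reclaim threshold.
# Only the target of 10 unallocated block groups is taken from the
# cover letter; every name and unit below is an assumption.
UNALLOC_TARGET_BLOCK_GROUPS = 10


def dynamic_reclaim_threshold(unalloc_bytes: int, block_group_bytes: int) -> int:
    """Return a reclaim threshold in percent; 0 means no automatic reclaim."""
    target = UNALLOC_TARGET_BLOCK_GROUPS * block_group_bytes
    if unalloc_bytes >= target:
        # Not pressing against the unallocated target yet: stay idle.
        return 0
    # Aggression grows linearly with the distance from the target.
    shortfall = target - unalloc_bytes
    return shortfall * 100 // target
```

With 1 GiB block groups, this sketch stays at 0 while 10 GiB or more is
unallocated and rises linearly toward 100 as unallocated space approaches
zero.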
> 
> I ran it on three workloads (described in detail in the dynamic reclaim
> patch). In brief:
> 1. bounce allocations around X% full.
> 2. fill up all the way and introduce full fragmentation.
> 3. write in a fragmented way until the filesystem is just about full.
> The script can be found here:
> https://github.com/boryas/scripts/tree/main/fio/reclaim
> 
> The important results can be seen here (full results explorable at
> https://bur.io/dyn-rec/)
> 
> bounce at 30%, higher relocations with a fixed threshold:
> https://bur.io/dyn-rec/bounce/reclaims.png
> https://bur.io/dyn-rec/bounce/reclaim_bytes.png
> https://bur.io/dyn-rec/bounce/unalloc_bytes.png
> 
> hard 30% fragmentation, dynamic actually reclaims, relocs not crazy:
> https://bur.io/dyn-rec/strict_frag/reclaims.png
> https://bur.io/dyn-rec/strict_frag/reclaim_bytes.png
> https://bur.io/dyn-rec/strict_frag/unalloc_bytes.png
> 
> fill it all the way up in a fragmented way, then keep making
> allocations:
> https://bur.io/dyn-rec/last_gig/reclaims.png
> https://bur.io/dyn-rec/last_gig/reclaim_bytes.png
> https://bur.io/dyn-rec/last_gig/unalloc_bytes.png

These results are great. Once you fix up the one comment I had, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to the whole series.  Thanks,

Josef
