public inbox for linux-btrfs@vger.kernel.org
* [RFC PATCH 0/6] btrfs: dynamic and periodic block_group reclaim
@ 2024-02-02 23:12 Boris Burkov
From: Boris Burkov @ 2024-02-02 23:12 UTC
  To: linux-btrfs, kernel-team

Btrfs's block_group allocator suffers from a well-known problem: it can
eagerly allocate too much space to either data or metadata (most often
data, absent bugs) and then later be unable to allocate more space for
the other when needed. When data starves metadata, the result is
especially painful: a read-only filesystem that needs careful manual
balancing to fix.

This can be worked around by:
- enabling automatic reclaim
- periodically running balance

Neither of these enjoys widespread use, as far as I know, though the
former is used at scale at Meta with good results.

This patch set expands on automatic reclaim by adding two features: a
dynamic reclaim threshold that scales with the global filesystem
allocation conditions, and periodic reclaim, which runs the reclaim
sweep in the cleaner thread. Together, I believe they constitute a
robust and general automatic reclaim system that should avoid
unfortunate read-only filesystems in all but extreme conditions, where
space is running quite low anyway and failure is more reasonable.

I ran it on three workloads (described in detail in the dynamic reclaim
patch). In brief:
1. bounce allocations around X% full.
2. fill up all the way and introduce full fragmentation.
3. write in a fragmented way until the filesystem is just about full.
The script can be found here:
https://github.com/boryas/scripts/tree/main/fio/reclaim

The important results can be seen here (full results explorable at
https://bur.io/dyn-rec/):

bounce at 30%, much higher relocations with a fixed threshold:
https://bur.io/dyn-rec/bounce-30/relocs.png

hard 30% fragmentation, dynamic actually reclaims, relocs not crazy:
https://bur.io/dyn-rec/strict_frag-30/unalloc_bytes.png
https://bur.io/dyn-rec/strict_frag-30/relocs.png

fill it all the way up, not crazy churn, but saving a buffer:
https://bur.io/dyn-rec/last_gig/unalloc_bytes.png
https://bur.io/dyn-rec/last_gig/relocs.png
https://bur.io/dyn-rec/last_gig/thresh.png

Boris Burkov (6):
  btrfs: report reclaim count in sysfs
  btrfs: store fs_info on space_info
  btrfs: dynamic block_group reclaim threshold
  btrfs: periodic block_group reclaim
  btrfs: urgent periodic reclaim pass
  btrfs: prevent pathological periodic reclaim loops

 fs/btrfs/block-group.c |  26 ++++---
 fs/btrfs/block-group.h |   1 +
 fs/btrfs/space-info.c  | 165 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/space-info.h  |  28 +++++++
 fs/btrfs/sysfs.c       |  79 +++++++++++++++++++-
 5 files changed, 289 insertions(+), 10 deletions(-)

-- 
2.43.0


* [PATCH 0/6] btrfs: dynamic and periodic block_group reclaim
@ 2024-04-03 19:38 Boris Burkov
From: Boris Burkov @ 2024-04-03 19:38 UTC
  To: linux-btrfs, kernel-team

Btrfs's block_group allocator suffers from a well-known problem: it can
eagerly allocate too much space to either data or metadata (most often
data, absent bugs) and then later be unable to allocate more space for
the other when needed. When data starves metadata, the result is
especially painful: a read-only filesystem that needs careful manual
balancing to fix.

This can be worked around by:
- enabling automatic reclaim
- periodically running balance

The latter is widely deployed via btrfsmaintenance
(https://github.com/kdave/btrfsmaintenance) and the former is used at
scale at Meta with good results. However, neither of those solutions is
perfect, as they both currently use a fixed threshold. A fixed threshold
is vulnerable to workloads that trigger high amounts of reclaim. This
has led btrfsmaintenance to set very conservative thresholds of 5 and 10
percent of data block groups
(https://github.com/kdave/btrfsmaintenance/commit/edbbfffe592f47c2849a8825f523e2ccc38b15f5).
At Meta, we see an elevated level of reclaim that we would like to
reduce.

This patch set expands on automatic reclaim by adding two features: a
dynamic reclaim threshold that scales with the global filesystem
allocation conditions, and periodic reclaim, which runs the reclaim
sweep in the cleaner thread. Together, I believe they constitute a
robust and general automatic reclaim system that should avoid
unfortunate read-only filesystems in all but extreme conditions, where
space is running quite low anyway and failure is more reasonable.

At a very high level, the dynamic threshold's strategy is to set a fixed
target of unallocated block groups (10 block groups) and linearly scale
its aggression the further we are from that target. That way, we do no
automatic reclaim until we actually press against the unallocated
target, allowing the allocator to gradually fill fragmented space with
new extents, but we do claw back space after workloads that use and free
a bunch of space, perhaps with fragmentation.
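
To make the linear scaling concrete, here is a rough userspace sketch of
the idea. It is a simplified model, not the code from the patches: the
function names, the 1 GiB block group size, and the 0 to 100 percent
range are all assumptions made for illustration. Only the target of 10
unallocated block groups comes from the description above.

```c
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from the patches. */
#define BG_SIZE            (1024ULL * 1024 * 1024) /* assume 1 GiB block groups */
#define TARGET_UNALLOC_BGS 10ULL                    /* target from the cover letter */
#define MAX_THRESH_PCT     100ULL                   /* assumed upper bound */

/*
 * Map the amount of unallocated space to a reclaim threshold in percent.
 * At or above the target, the threshold is 0 (no automatic reclaim).
 * Below the target, it grows linearly as unallocated space shrinks,
 * reaching MAX_THRESH_PCT when nothing is left unallocated.
 */
int dynamic_reclaim_thresh_pct(uint64_t unalloc_bytes)
{
	const uint64_t target = TARGET_UNALLOC_BGS * BG_SIZE;

	if (unalloc_bytes >= target)
		return 0;

	return (int)(((target - unalloc_bytes) * MAX_THRESH_PCT) / target);
}

/*
 * A block group is a reclaim candidate when its used fraction is
 * strictly below the current threshold. A threshold of 0 disables
 * reclaim entirely.
 */
int should_reclaim(uint64_t bg_used, uint64_t bg_len, int thresh_pct)
{
	if (thresh_pct <= 0 || bg_len == 0)
		return 0;

	return bg_used * 100 < bg_len * (uint64_t)thresh_pct;
}
```

With these assumed numbers, a filesystem with 5 GiB unallocated gets a
50 percent threshold, so a block group that is 30 percent used would be
a candidate while one that is 60 percent used would not. With 10 GiB or
more unallocated, nothing is reclaimed.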

I ran it on three workloads (described in detail in the dynamic reclaim
patch). In brief:
1. bounce allocations around X% full.
2. fill up all the way and introduce full fragmentation.
3. write in a fragmented way until the filesystem is just about full.
The script can be found here:
https://github.com/boryas/scripts/tree/main/fio/reclaim

The important results can be seen here (full results explorable at
https://bur.io/dyn-rec/):

bounce at 30%, higher relocations with a fixed threshold:
https://bur.io/dyn-rec/bounce/reclaims.png
https://bur.io/dyn-rec/bounce/reclaim_bytes.png
https://bur.io/dyn-rec/bounce/unalloc_bytes.png

hard 30% fragmentation, dynamic actually reclaims, relocs not crazy:
https://bur.io/dyn-rec/strict_frag/reclaims.png
https://bur.io/dyn-rec/strict_frag/reclaim_bytes.png
https://bur.io/dyn-rec/strict_frag/unalloc_bytes.png

fill it all the way up in a fragmented way, then keep making
allocations:
https://bur.io/dyn-rec/last_gig/reclaims.png
https://bur.io/dyn-rec/last_gig/reclaim_bytes.png
https://bur.io/dyn-rec/last_gig/unalloc_bytes.png

Boris Burkov (6):
  btrfs: report reclaim stats in sysfs
  btrfs: store fs_info on space_info
  btrfs: dynamic block_group reclaim threshold
  btrfs: periodic block_group reclaim
  btrfs: prevent pathological periodic reclaim loops
  btrfs: urgent periodic reclaim pass

 fs/btrfs/block-group.c |  42 ++++++--
 fs/btrfs/block-group.h |   1 +
 fs/btrfs/space-info.c  | 240 +++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/space-info.h  |  42 ++++++++
 fs/btrfs/sysfs.c       |  81 +++++++++++++-
 5 files changed, 383 insertions(+), 23 deletions(-)

-- 
2.44.0



Thread overview: 12+ messages
2024-02-02 23:12 [RFC PATCH 0/6] btrfs: dynamic and periodic block_group reclaim Boris Burkov
2024-02-02 23:12 ` [PATCH 1/6] btrfs: report reclaim count in sysfs Boris Burkov
2024-02-02 23:12 ` [PATCH 2/6] btrfs: store fs_info on space_info Boris Burkov
2024-02-02 23:12 ` [PATCH 3/6] btrfs: dynamic block_group reclaim threshold Boris Burkov
2024-02-02 23:12 ` [PATCH 4/6] btrfs: periodic block_group reclaim Boris Burkov
2024-02-04 18:19   ` kernel test robot
2024-02-02 23:12 ` [PATCH 5/6] btrfs: urgent periodic reclaim pass Boris Burkov
2024-02-02 23:12 ` [PATCH 6/6] btrfs: prevent pathological periodic reclaim loops Boris Burkov
2024-02-06 14:55 ` [RFC PATCH 0/6] btrfs: dynamic and periodic block_group reclaim David Sterba
2024-02-06 22:07   ` Boris Burkov
2024-02-19 19:38     ` David Sterba
  -- strict thread matches above, loose matches on Subject: below --
2024-04-03 19:38 [PATCH " Boris Burkov
2024-04-03 19:38 ` [PATCH 4/6] btrfs: " Boris Burkov
