From: Dave Chinner <david@fromorbit.com>
To: Hans Holmberg <Hans.Holmberg@wdc.com>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"Darrick J . Wong" <djwong@kernel.org>, hch <hch@lst.de>
Subject: Re: [RFC PATCH] xfs: add mount option for zone gc pressure
Date: Wed, 19 Mar 2025 20:11:46 +1100
Message-ID: <Z9qKUt1iPsQTTKu-@dread.disaster.area>
In-Reply-To: <20250319081818.6406-1-hans.holmberg@wdc.com>
On Wed, Mar 19, 2025 at 08:19:19AM +0000, Hans Holmberg wrote:
> Presently we start garbage collection late - when we start running
> out of free zones to backfill max_open_zones. This is a reasonable
> default as it minimizes write amplification. The longer we wait,
> the more blocks are invalidated and reclaim cost less in terms
> of blocks to relocate.
>
> Starting this late however introduces a risk of GC being outcompeted
> by user writes. If GC can't keep up, user writes will be forced to
> wait for free zones with high tail latencies as a result.
>
> This is not a problem under normal circumstances, but if fragmentation
> is bad and user write pressure is high (multiple full-throttle
> writers) we will "bottom out" of free zones.
>
> To mitigate this, introduce a gc_pressure mount option that lets the
> user specify a percentage of how much of the unused space that gc
> should keep available for writing. A high value will reclaim more of
> the space occupied by unused blocks, creating a larger buffer against
> write bursts.
>
> This comes at a cost as write amplification is increased. To
> illustrate this using a sample workload, setting gc_pressure to 60%
> avoids high (500ms) max latencies while increasing write amplification
> by 15%.
It seems to me that this is runtime workload dependent, and so maybe
a tunable variable in /sys/fs/xfs/<dev>/.... might suit better?
That way it can be controlled by a userspace agent as the filesystem
fills and empties rather than being fixed at mount time and never
really being optimal for a changing workload...
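
As a rough illustration of the validation such a tunable would need, here is a
userspace sketch of parsing a percentage written as text, the way a sysfs
store handler receives it. The helper name parse_percent() is hypothetical and
is not taken from the patch or from the kernel; a real handler would use the
kernel's own string-parsing helpers rather than strtoul().

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): parse a decimal percentage
 * in the range [0, 100] from a NUL-terminated string, tolerating one
 * trailing newline as produced by "echo 60 > file".
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 */
static int parse_percent(const char *buf, unsigned int *out)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;		/* out of range or no digits at all */
	if (*end == '\n')
		end++;
	if (*end != '\0' || val > 100)
		return -EINVAL;		/* trailing junk or not a percentage */

	*out = (unsigned int)val;
	return 0;
}
```

Rejecting values above 100 at write time keeps the threshold comparison in the
GC path free of range checks.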
> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
> ---
>
> A patch for xfsprogs documenting the option will follow (if it makes
> it beyond RFC)
New mount options should also be documented in the kernel admin
guide here -> Documentation/admin-guide/xfs.rst.
....
>
> fs/xfs/xfs_mount.h | 1 +
> fs/xfs/xfs_super.c | 14 +++++++++++++-
> fs/xfs/xfs_zone_alloc.c | 5 +++++
> fs/xfs/xfs_zone_gc.c | 16 ++++++++++++++--
> 4 files changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 799b84220ebb..af595024de00 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -229,6 +229,7 @@ typedef struct xfs_mount {
> bool m_finobt_nores; /* no per-AG finobt resv. */
> bool m_update_sb; /* sb needs update in mount */
> unsigned int m_max_open_zones;
> + unsigned int m_gc_pressure;
This is not explicitly initialised anywhere. If the zero it gets
from the mount structure being zeroed on allocation means this
feature is turned off, there needs to be a comment somewhere
explaining why it is turned completely off rather than having a
default of, say, 5% like we have for the low space thresholds in
various other low space allocation and reclaim algorithms....
> --- a/fs/xfs/xfs_zone_gc.c
> +++ b/fs/xfs/xfs_zone_gc.c
> @@ -162,18 +162,30 @@ struct xfs_zone_gc_data {
>
> /*
> * We aim to keep enough zones free in stock to fully use the open zone limit
> - * for data placement purposes.
> + * for data placement purposes. Additionally, the gc_pressure mount option
> + * can be set to make sure a fraction of the unused/free blocks are available
> + * for writing.
> */
> bool
> xfs_zoned_need_gc(
> struct xfs_mount *mp)
> {
> + s64 available, free;
> +
> if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE))
> return false;
> - if (xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE) <
> +
> + available = xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE);
> +
> + if (available <
> mp->m_groups[XG_TYPE_RTG].blocks *
> (mp->m_max_open_zones - XFS_OPEN_GC_ZONES))
> return true;
> +
> + free = xfs_estimate_freecounter(mp, XC_FREE_RTEXTENTS);
> + if (available < div_s64(free * mp->m_gc_pressure, 100))
Use mult_frac(free, mp->m_gc_pressure, 100) here to avoid overflow.
Also, this is really a free space threshold, not a dynamic
"pressure" measurement...
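
To illustrate the overflow point, here is a userspace sketch of the
divide-first approach that mult_frac() takes: split the value into quotient
and remainder before multiplying, so the intermediate product stays small.
The names frac_of() and naive_overflows() are made up for this example, and
__builtin_mul_overflow() is a GCC/Clang builtin, so this assumes one of those
compilers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace analogue of the divide-first idea behind mult_frac():
 * compute x * n / d as (x / d) * n + (x % d) * n / d, which avoids
 * forming the full product x * n.
 */
static int64_t frac_of(int64_t x, int64_t n, int64_t d)
{
	int64_t q, r;

	assert(d != 0);
	q = x / d;
	r = x % d;
	return q * n + r * n / d;
}

/* Returns nonzero when the naive product x * n would overflow int64_t. */
static int naive_overflows(int64_t x, int64_t n)
{
	int64_t prod;

	return __builtin_mul_overflow(x, n, &prod);
}
```

For small inputs both forms agree, e.g. frac_of(1050, 60, 100) gives 630. For
a value such as INT64_MAX / 50 with a 60% threshold the naive product
overflows, while the divide-first form still returns a result.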
-Dave.
--
Dave Chinner
david@fromorbit.com