* fix for selecting a zone with active GC I/O for GC v2
@ 2025-10-23 15:17 Christoph Hellwig
2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
2025-10-23 15:17 ` [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent Christoph Hellwig
0 siblings, 2 replies; 4+ messages in thread
From: Christoph Hellwig @ 2025-10-23 15:17 UTC (permalink / raw)
To: Carlos Maiolino; +Cc: Hans Holmberg, Darrick J. Wong, linux-xfs
Hi all,
the first patch in the series adds a counter to ensure that we never
select a zone for which there is active GC I/O for GC again, as we'd
try to copy the same blocks again, only to see the mapping had changed
in I/O completion to discard that I/O.
The second one documents an non-obvious version of that GC race detection
that took us way too long to track down.
Changes since v1:
- fix text in a comment and a commit message
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/2] xfs: prevent gc from picking the same zone twice
2025-10-23 15:17 fix for selecting a zone with active GC I/O for GC v2 Christoph Hellwig
@ 2025-10-23 15:17 ` Christoph Hellwig
2025-10-31 11:19 ` Carlos Maiolino
2025-10-23 15:17 ` [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent Christoph Hellwig
1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2025-10-23 15:17 UTC (permalink / raw)
To: Carlos Maiolino
Cc: Hans Holmberg, Darrick J. Wong, linux-xfs, Damien Le Moal,
Carlos Maiolino
When we are picking a zone for gc it might already be in the pipeline
which can lead to us moving the same data twice resulting in in write
amplification and a very unfortunate case where we keep on garbage
collecting the zone we just filled with migrated data stopping all
forward progress.
Fix this by introducing a count of on-going GC operations on a zone, and
skip any zone with ongoing GC when picking a new victim.
Fixes: 080d01c41 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
---
fs/xfs/libxfs/xfs_rtgroup.h | 6 ++++++
fs/xfs/xfs_zone_gc.c | 27 +++++++++++++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index d36a6ae0abe5..d4fcf591e63d 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -50,6 +50,12 @@ struct xfs_rtgroup {
uint8_t *rtg_rsum_cache;
struct xfs_open_zone *rtg_open_zone;
};
+
+ /*
+ * Count of outstanding GC operations for zoned XFS. Any RTG with a
+ * non-zero rtg_gccount will not be picked as new GC victim.
+ */
+ atomic_t rtg_gccount;
};
/*
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index 109877d9a6bf..4ade54445532 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -114,6 +114,8 @@ struct xfs_gc_bio {
/* Open Zone being written to */
struct xfs_open_zone *oz;
+ struct xfs_rtgroup *victim_rtg;
+
/* Bio used for reads and writes, including the bvec used by it */
struct bio_vec bv;
struct bio bio; /* must be last */
@@ -264,6 +266,7 @@ xfs_zone_gc_iter_init(
iter->rec_count = 0;
iter->rec_idx = 0;
iter->victim_rtg = victim_rtg;
+ atomic_inc(&victim_rtg->rtg_gccount);
}
/*
@@ -362,6 +365,7 @@ xfs_zone_gc_query(
return 0;
done:
+ atomic_dec(&iter->victim_rtg->rtg_gccount);
xfs_rtgroup_rele(iter->victim_rtg);
iter->victim_rtg = NULL;
return 0;
@@ -451,6 +455,20 @@ xfs_zone_gc_pick_victim_from(
if (!rtg)
continue;
+ /*
+ * If the zone is already undergoing GC, don't pick it again.
+ *
+ * This prevents us from picking one of the zones for which we
+ * already submitted GC I/O, but for which the remapping hasn't
+ * concluded yet. This won't cause data corruption, but
+ * increases write amplification and slows down GC, so this is
+ * a bad thing.
+ */
+ if (atomic_read(&rtg->rtg_gccount)) {
+ xfs_rtgroup_rele(rtg);
+ continue;
+ }
+
/* skip zones that are just waiting for a reset */
if (rtg_rmap(rtg)->i_used_blocks == 0 ||
rtg_rmap(rtg)->i_used_blocks >= victim_used) {
@@ -688,6 +706,9 @@ xfs_zone_gc_start_chunk(
chunk->scratch = &data->scratch[data->scratch_idx];
chunk->data = data;
chunk->oz = oz;
+ chunk->victim_rtg = iter->victim_rtg;
+ atomic_inc(&chunk->victim_rtg->rtg_group.xg_active_ref);
+ atomic_inc(&chunk->victim_rtg->rtg_gccount);
bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock);
bio->bi_end_io = xfs_zone_gc_end_io;
@@ -710,6 +731,8 @@ static void
xfs_zone_gc_free_chunk(
struct xfs_gc_bio *chunk)
{
+ atomic_dec(&chunk->victim_rtg->rtg_gccount);
+ xfs_rtgroup_rele(chunk->victim_rtg);
list_del(&chunk->entry);
xfs_open_zone_put(chunk->oz);
xfs_irele(chunk->ip);
@@ -770,6 +793,10 @@ xfs_zone_gc_split_write(
split_chunk->oz = chunk->oz;
atomic_inc(&chunk->oz->oz_ref);
+ split_chunk->victim_rtg = chunk->victim_rtg;
+ atomic_inc(&chunk->victim_rtg->rtg_group.xg_active_ref);
+ atomic_inc(&chunk->victim_rtg->rtg_gccount);
+
chunk->offset += split_len;
chunk->len -= split_len;
chunk->old_startblock += XFS_B_TO_FSB(data->mp, split_len);
--
2.47.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent
2025-10-23 15:17 fix for selecting a zone with active GC I/O for GC v2 Christoph Hellwig
2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
@ 2025-10-23 15:17 ` Christoph Hellwig
1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2025-10-23 15:17 UTC (permalink / raw)
To: Carlos Maiolino
Cc: Hans Holmberg, Darrick J. Wong, linux-xfs, Damien Le Moal,
Carlos Maiolino
Besides blocks being invalidated, there is another case when the original
mapping could have changed between querying the rmap for GC and calling
xfs_zoned_map_extent. Document it there as it took us quite some time
to figure out what is going on while developing the multiple-GC
protection fix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_zone_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 27276519a4a7..336c9f184a75 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -246,6 +246,14 @@ xfs_zoned_map_extent(
* If a data write raced with this GC write, keep the existing data in
* the data fork, mark our newly written GC extent as reclaimable, then
* move on to the next extent.
+ *
+ * Note that this can also happen when racing with operations that do
+ * not actually invalidate the data, but just move it to a different
+ * inode (XFS_IOC_EXCHANGE_RANGE), or to a different offset inside the
+ * inode (FALLOC_FL_COLLAPSE_RANGE / FALLOC_FL_INSERT_RANGE). If the
+ * data was just moved around, GC fails to free the zone, but the zone
+ * becomes a GC candidate again as soon as all previous GC I/O has
+ * finished and these blocks will be moved out eventually.
*/
if (old_startblock != NULLFSBLOCK &&
old_startblock != data.br_startblock)
--
2.47.3
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 1/2] xfs: prevent gc from picking the same zone twice
2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
@ 2025-10-31 11:19 ` Carlos Maiolino
0 siblings, 0 replies; 4+ messages in thread
From: Carlos Maiolino @ 2025-10-31 11:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Hans Holmberg, Darrick J. Wong, linux-xfs, Damien Le Moal,
Carlos Maiolino
On Thu, 23 Oct 2025 17:17:02 +0200, Christoph Hellwig wrote:
> When we are picking a zone for gc it might already be in the pipeline
> which can lead to us moving the same data twice resulting in in write
> amplification and a very unfortunate case where we keep on garbage
> collecting the zone we just filled with migrated data stopping all
> forward progress.
>
> Fix this by introducing a count of on-going GC operations on a zone, and
> skip any zone with ongoing GC when picking a new victim.
>
> [...]
Applied to for-next, thanks!
[1/2] xfs: prevent gc from picking the same zone twice
commit: 83bac569c762651ac6dff9a86f54ecc13d911f7d
[2/2] xfs: document another racy GC case in xfs_zoned_map_extent
commit: 0db22d7ee462c42c1284e98d47840932792c1adb
Best regards,
--
Carlos Maiolino <cem@kernel.org>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-10-31 11:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-23 15:17 fix for selecting a zone with active GC I/O for GC v2 Christoph Hellwig
2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
2025-10-31 11:19 ` Carlos Maiolino
2025-10-23 15:17 ` [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).