fix for selecting a zone with active GC I/O for GC v2

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* fix for selecting a zone with active GC I/O for GC v2
@ 2025-10-23 15:17 Christoph Hellwig
  2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
  2025-10-23 15:17 ` [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent Christoph Hellwig
  0 siblings, 2 replies; 4+ messages in thread
From: Christoph Hellwig @ 2025-10-23 15:17 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Hans Holmberg, Darrick J. Wong, linux-xfs

Hi all,

the first patch in the series adds a counter to ensure that we never
select a zone for which there is active GC I/O for GC again, as we'd
try to copy the same blocks again, only to see the mapping had changed
in I/O completion to discard that I/O.

The second one documents an non-obvious version of that GC race detection
that took us way too long to track down.

Changes since v1:
 - fix text in a comment and a commit message

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] xfs: prevent gc from picking the same zone twice
  2025-10-23 15:17 fix for selecting a zone with active GC I/O for GC v2 Christoph Hellwig
@ 2025-10-23 15:17 ` Christoph Hellwig
  2025-10-31 11:19   ` Carlos Maiolino
  2025-10-23 15:17 ` [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent Christoph Hellwig
  1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2025-10-23 15:17 UTC (permalink / raw)
  To: Carlos Maiolino
  Cc: Hans Holmberg, Darrick J. Wong, linux-xfs, Damien Le Moal,
	Carlos Maiolino

When we are picking a zone for gc it might already be in the pipeline
which can lead to us moving the same data twice resulting in in write
amplification and a very unfortunate case where we keep on garbage
collecting the zone we just filled with migrated data stopping all
forward progress.

Fix this by introducing a count of on-going GC operations on a zone, and
skip any zone with ongoing GC when picking a new victim.

Fixes: 080d01c41 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
---
 fs/xfs/libxfs/xfs_rtgroup.h |  6 ++++++
 fs/xfs/xfs_zone_gc.c        | 27 +++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index d36a6ae0abe5..d4fcf591e63d 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -50,6 +50,12 @@ struct xfs_rtgroup {
 		uint8_t			*rtg_rsum_cache;
 		struct xfs_open_zone	*rtg_open_zone;
 	};
+
+	/*
+	 * Count of outstanding GC operations for zoned XFS.  Any RTG with a
+	 * non-zero rtg_gccount will not be picked as new GC victim.
+	 */
+	atomic_t		rtg_gccount;
 };
 
 /*
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index 109877d9a6bf..4ade54445532 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -114,6 +114,8 @@ struct xfs_gc_bio {
 	/* Open Zone being written to */
 	struct xfs_open_zone		*oz;
 
+	struct xfs_rtgroup		*victim_rtg;
+
 	/* Bio used for reads and writes, including the bvec used by it */
 	struct bio_vec			bv;
 	struct bio			bio;	/* must be last */
@@ -264,6 +266,7 @@ xfs_zone_gc_iter_init(
 	iter->rec_count = 0;
 	iter->rec_idx = 0;
 	iter->victim_rtg = victim_rtg;
+	atomic_inc(&victim_rtg->rtg_gccount);
 }
 
 /*
@@ -362,6 +365,7 @@ xfs_zone_gc_query(
 
 	return 0;
 done:
+	atomic_dec(&iter->victim_rtg->rtg_gccount);
 	xfs_rtgroup_rele(iter->victim_rtg);
 	iter->victim_rtg = NULL;
 	return 0;
@@ -451,6 +455,20 @@ xfs_zone_gc_pick_victim_from(
 		if (!rtg)
 			continue;
 
+		/*
+		 * If the zone is already undergoing GC, don't pick it again.
+		 *
+		 * This prevents us from picking one of the zones for which we
+		 * already submitted GC I/O, but for which the remapping hasn't
+		 * concluded yet.  This won't cause data corruption, but
+		 * increases write amplification and slows down GC, so this is
+		 * a bad thing.
+		 */
+		if (atomic_read(&rtg->rtg_gccount)) {
+			xfs_rtgroup_rele(rtg);
+			continue;
+		}
+
 		/* skip zones that are just waiting for a reset */
 		if (rtg_rmap(rtg)->i_used_blocks == 0 ||
 		    rtg_rmap(rtg)->i_used_blocks >= victim_used) {
@@ -688,6 +706,9 @@ xfs_zone_gc_start_chunk(
 	chunk->scratch = &data->scratch[data->scratch_idx];
 	chunk->data = data;
 	chunk->oz = oz;
+	chunk->victim_rtg = iter->victim_rtg;
+	atomic_inc(&chunk->victim_rtg->rtg_group.xg_active_ref);
+	atomic_inc(&chunk->victim_rtg->rtg_gccount);
 
 	bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock);
 	bio->bi_end_io = xfs_zone_gc_end_io;
@@ -710,6 +731,8 @@ static void
 xfs_zone_gc_free_chunk(
 	struct xfs_gc_bio	*chunk)
 {
+	atomic_dec(&chunk->victim_rtg->rtg_gccount);
+	xfs_rtgroup_rele(chunk->victim_rtg);
 	list_del(&chunk->entry);
 	xfs_open_zone_put(chunk->oz);
 	xfs_irele(chunk->ip);
@@ -770,6 +793,10 @@ xfs_zone_gc_split_write(
 	split_chunk->oz = chunk->oz;
 	atomic_inc(&chunk->oz->oz_ref);
 
+	split_chunk->victim_rtg = chunk->victim_rtg;
+	atomic_inc(&chunk->victim_rtg->rtg_group.xg_active_ref);
+	atomic_inc(&chunk->victim_rtg->rtg_gccount);
+
 	chunk->offset += split_len;
 	chunk->len -= split_len;
 	chunk->old_startblock += XFS_B_TO_FSB(data->mp, split_len);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent
  2025-10-23 15:17 fix for selecting a zone with active GC I/O for GC v2 Christoph Hellwig
  2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
@ 2025-10-23 15:17 ` Christoph Hellwig
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2025-10-23 15:17 UTC (permalink / raw)
  To: Carlos Maiolino
  Cc: Hans Holmberg, Darrick J. Wong, linux-xfs, Damien Le Moal,
	Carlos Maiolino

Besides blocks being invalidated, there is another case when the original
mapping could have changed between querying the rmap for GC and calling
xfs_zoned_map_extent.  Document it there as it took us quite some time
to figure out what is going on while developing the multiple-GC
protection fix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_zone_alloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 27276519a4a7..336c9f184a75 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -246,6 +246,14 @@ xfs_zoned_map_extent(
 	 * If a data write raced with this GC write, keep the existing data in
 	 * the data fork, mark our newly written GC extent as reclaimable, then
 	 * move on to the next extent.
+	 *
+	 * Note that this can also happen when racing with operations that do
+	 * not actually invalidate the data, but just move it to a different
+	 * inode (XFS_IOC_EXCHANGE_RANGE), or to a different offset inside the
+	 * inode (FALLOC_FL_COLLAPSE_RANGE / FALLOC_FL_INSERT_RANGE).  If the
+	 * data was just moved around, GC fails to free the zone, but the zone
+	 * becomes a GC candidate again as soon as all previous GC I/O has
+	 * finished and these blocks will be moved out eventually.
 	 */
 	if (old_startblock != NULLFSBLOCK &&
 	    old_startblock != data.br_startblock)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] xfs: prevent gc from picking the same zone twice
  2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
@ 2025-10-31 11:19   ` Carlos Maiolino
  0 siblings, 0 replies; 4+ messages in thread
From: Carlos Maiolino @ 2025-10-31 11:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hans Holmberg, Darrick J. Wong, linux-xfs, Damien Le Moal,
	Carlos Maiolino

On Thu, 23 Oct 2025 17:17:02 +0200, Christoph Hellwig wrote:
> When we are picking a zone for gc it might already be in the pipeline
> which can lead to us moving the same data twice resulting in in write
> amplification and a very unfortunate case where we keep on garbage
> collecting the zone we just filled with migrated data stopping all
> forward progress.
> 
> Fix this by introducing a count of on-going GC operations on a zone, and
> skip any zone with ongoing GC when picking a new victim.
> 
> [...]

Applied to for-next, thanks!

[1/2] xfs: prevent gc from picking the same zone twice
      commit: 83bac569c762651ac6dff9a86f54ecc13d911f7d
[2/2] xfs: document another racy GC case in xfs_zoned_map_extent
      commit: 0db22d7ee462c42c1284e98d47840932792c1adb

Best regards,
-- 
Carlos Maiolino <cem@kernel.org>


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-10-31 11:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-23 15:17 fix for selecting a zone with active GC I/O for GC v2 Christoph Hellwig
2025-10-23 15:17 ` [PATCH 1/2] xfs: prevent gc from picking the same zone twice Christoph Hellwig
2025-10-31 11:19   ` Carlos Maiolino
2025-10-23 15:17 ` [PATCH 2/2] xfs: document another racy GC case in xfs_zoned_map_extent Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).