* improve zoned XFS GC buffer management
@ 2025-12-18 6:31 Christoph Hellwig
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw)
To: Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs
Hi all,
the zoned XFS GC code currently uses a weird bank switching strategy to
manage the scratch buffer.  The reason for that was that I/O could
proceed out of order in the early days, but that was changed before the
code was upstreamed to avoid fragmentation.  This series replaces the
logic with a simple ring buffer, which makes the buffer space
utilization much more efficient.
Before that, two patches make the reuse of the bios a lot less fragile,
and for that we need a new block layer helper that should eventually also
be useful in other places.  Jens, let me know if merging this through the
XFS tree is ok.
Diffstat:
block/bio.c | 25 +++++++++++
fs/xfs/xfs_zone_gc.c | 113 +++++++++++++++++++++++++++------------------------
include/linux/bio.h | 1
3 files changed, 86 insertions(+), 53 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] block: add a bio_reuse helper
2025-12-18 6:31 improve zoned XFS GC buffer management Christoph Hellwig
@ 2025-12-18 6:31 ` Christoph Hellwig
2025-12-18 6:43 ` Damien Le Moal
` (3 more replies)
2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig
2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig
2 siblings, 4 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw)
To: Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs
Add a helper to allow an existing bio to be resubmitted without
having to re-add the payload.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/bio.c | 25 +++++++++++++++++++++++++
include/linux/bio.h | 1 +
2 files changed, 26 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index e726c0e280a8..1b68ae877468 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)
}
EXPORT_SYMBOL(bio_reset);
+/**
+ * bio_reuse - reuse a bio with the payload left intact
+ * @bio: bio to reuse
+ *
+ * Allow reusing an existing bio for another operation with all set up
+ * fields including the payload, device and end_io handler left intact.
+ *
+ * Typically used for bios first used to read data which is then written
+ * to another location without modification.
+ */
+void bio_reuse(struct bio *bio)
+{
+ unsigned short vcnt = bio->bi_vcnt, i;
+ bio_end_io_t *end_io = bio->bi_end_io;
+ void *private = bio->bi_private;
+
+ bio_reset(bio, bio->bi_bdev, bio->bi_opf);
+ for (i = 0; i < vcnt; i++)
+ bio->bi_iter.bi_size += bio->bi_io_vec[i].bv_len;
+ bio->bi_vcnt = vcnt;
+ bio->bi_private = private;
+ bio->bi_end_io = end_io;
+}
+EXPORT_SYMBOL_GPL(bio_reuse);
+
static struct bio *__bio_chain_endio(struct bio *bio)
{
struct bio *parent = bio->bi_private;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ad2d57908c1c..c0190f8badde 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -414,6 +414,7 @@ static inline void bio_init_inline(struct bio *bio, struct block_device *bdev,
}
extern void bio_uninit(struct bio *);
void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf);
+void bio_reuse(struct bio *bio);
void bio_chain(struct bio *, struct bio *);
int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len,
--
2.47.3
* [PATCH 2/3] xfs: use bio_reuse in the zone GC code
2025-12-18 6:31 improve zoned XFS GC buffer management Christoph Hellwig
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
@ 2025-12-18 6:31 ` Christoph Hellwig
2025-12-18 6:45 ` Damien Le Moal
2025-12-19 7:48 ` Hans Holmberg
2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig
2 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw)
To: Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs
Replace our somewhat fragile open-coded bio reuse, which caused a
regression in the past, with the new block layer bio_reuse helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_zone_gc.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index 065005f13fe1..f3dec13a620c 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -812,8 +812,6 @@ xfs_zone_gc_write_chunk(
{
struct xfs_zone_gc_data *data = chunk->data;
struct xfs_mount *mp = chunk->ip->i_mount;
- phys_addr_t bvec_paddr =
- bvec_phys(bio_first_bvec_all(&chunk->bio));
struct xfs_gc_bio *split_chunk;
if (chunk->bio.bi_status)
@@ -826,10 +824,7 @@ xfs_zone_gc_write_chunk(
WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW);
list_move_tail(&chunk->entry, &data->writing);
- bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE);
- bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len,
- offset_in_folio(chunk->scratch->folio, bvec_paddr));
-
+ bio_reuse(&chunk->bio);
while ((split_chunk = xfs_zone_gc_split_write(data, chunk)))
xfs_zone_gc_submit_write(data, split_chunk);
xfs_zone_gc_submit_write(data, chunk);
--
2.47.3
* [PATCH 3/3] xfs: rework zone GC buffer management
2025-12-18 6:31 improve zoned XFS GC buffer management Christoph Hellwig
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig
@ 2025-12-18 6:31 ` Christoph Hellwig
2025-12-18 7:12 ` Damien Le Moal
2025-12-19 8:06 ` Hans Holmberg
2 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw)
To: Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs
The double buffering, where just one scratch area is used at a time,
does not use the available memory efficiently.  It was originally
implemented when GC I/O could happen out of order, but that was removed
before upstream submission to avoid fragmentation.  Now that all GC I/Os
are processed in order, manage the buffers as a simple ring buffer
instead.
For a synthetic benchmark that fills 256MiB HDD zones and punches out
holes to free half the space this leads to a decrease of GC time by
a little more than 25%.
Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and
benchmarking.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_zone_gc.c | 106 ++++++++++++++++++++++++-------------------
1 file changed, 59 insertions(+), 47 deletions(-)
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index f3dec13a620c..90f70473c3d5 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -50,23 +50,11 @@
*/
/*
- * Size of each GC scratch pad. This is also the upper bound for each
- * GC I/O, which helps to keep latency down.
+ * Size of each GC scratch allocation, and the number of buffers.
*/
-#define XFS_GC_CHUNK_SIZE SZ_1M
-
-/*
- * Scratchpad data to read GCed data into.
- *
- * The offset member tracks where the next allocation starts, and freed tracks
- * the amount of space that is not used anymore.
- */
-#define XFS_ZONE_GC_NR_SCRATCH 2
-struct xfs_zone_scratch {
- struct folio *folio;
- unsigned int offset;
- unsigned int freed;
-};
+#define XFS_GC_BUF_SIZE SZ_1M
+#define XFS_GC_NR_BUFS 2
+static_assert(XFS_GC_NR_BUFS < BIO_MAX_VECS);
/*
* Chunk that is read and written for each GC operation.
@@ -141,10 +129,14 @@ struct xfs_zone_gc_data {
struct bio_set bio_set;
/*
- * Scratchpad used, and index to indicated which one is used.
+ * Scratchpad to buffer GC data, organized as a ring buffer over
+ * discontiguous folios. scratch_head is where the buffer is filled,
+ * and scratch_tail tracks the buffer space freed.
*/
- struct xfs_zone_scratch scratch[XFS_ZONE_GC_NR_SCRATCH];
- unsigned int scratch_idx;
+ struct folio *scratch_folios[XFS_GC_NR_BUFS];
+ unsigned int scratch_size;
+ unsigned int scratch_head;
+ unsigned int scratch_tail;
/*
* List of bios currently being read, written and reset.
@@ -210,20 +202,16 @@ xfs_zone_gc_data_alloc(
if (!data->iter.recs)
goto out_free_data;
- /*
- * We actually only need a single bio_vec. It would be nice to have
- * a flag that only allocates the inline bvecs and not the separate
- * bvec pool.
- */
if (bioset_init(&data->bio_set, 16, offsetof(struct xfs_gc_bio, bio),
BIOSET_NEED_BVECS))
goto out_free_recs;
- for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++) {
- data->scratch[i].folio =
- folio_alloc(GFP_KERNEL, get_order(XFS_GC_CHUNK_SIZE));
- if (!data->scratch[i].folio)
+ for (i = 0; i < XFS_GC_NR_BUFS; i++) {
+ data->scratch_folios[i] =
+ folio_alloc(GFP_KERNEL, get_order(XFS_GC_BUF_SIZE));
+ if (!data->scratch_folios[i])
goto out_free_scratch;
}
+ data->scratch_size = XFS_GC_BUF_SIZE * XFS_GC_NR_BUFS;
INIT_LIST_HEAD(&data->reading);
INIT_LIST_HEAD(&data->writing);
INIT_LIST_HEAD(&data->resetting);
@@ -232,7 +220,7 @@ xfs_zone_gc_data_alloc(
out_free_scratch:
while (--i >= 0)
- folio_put(data->scratch[i].folio);
+ folio_put(data->scratch_folios[i]);
bioset_exit(&data->bio_set);
out_free_recs:
kfree(data->iter.recs);
@@ -247,8 +235,8 @@ xfs_zone_gc_data_free(
{
int i;
- for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++)
- folio_put(data->scratch[i].folio);
+ for (i = 0; i < XFS_GC_NR_BUFS; i++)
+ folio_put(data->scratch_folios[i]);
bioset_exit(&data->bio_set);
kfree(data->iter.recs);
kfree(data);
@@ -591,7 +579,12 @@ static unsigned int
xfs_zone_gc_scratch_available(
struct xfs_zone_gc_data *data)
{
- return XFS_GC_CHUNK_SIZE - data->scratch[data->scratch_idx].offset;
+ if (!data->scratch_tail)
+ return data->scratch_size - data->scratch_head;
+
+ if (!data->scratch_head)
+ return data->scratch_tail;
+ return (data->scratch_size - data->scratch_head) + data->scratch_tail;
}
static bool
@@ -665,6 +658,28 @@ xfs_zone_gc_alloc_blocks(
return oz;
}
+static void
+xfs_zone_gc_add_data(
+ struct xfs_gc_bio *chunk)
+{
+ struct xfs_zone_gc_data *data = chunk->data;
+ unsigned int len = chunk->len;
+ unsigned int off = data->scratch_head;
+
+ do {
+ unsigned int this_off = off % XFS_GC_BUF_SIZE;
+ unsigned int this_len = min(len, XFS_GC_BUF_SIZE - this_off);
+
+ bio_add_folio_nofail(&chunk->bio,
+ data->scratch_folios[off / XFS_GC_BUF_SIZE],
+ this_len, this_off);
+ len -= this_len;
+ off += this_len;
+ if (off == data->scratch_size)
+ off = 0;
+ } while (len);
+}
+
static bool
xfs_zone_gc_start_chunk(
struct xfs_zone_gc_data *data)
@@ -678,6 +693,7 @@ xfs_zone_gc_start_chunk(
struct xfs_inode *ip;
struct bio *bio;
xfs_daddr_t daddr;
+ unsigned int len;
bool is_seq;
if (xfs_is_shutdown(mp))
@@ -692,17 +708,19 @@ xfs_zone_gc_start_chunk(
return false;
}
- bio = bio_alloc_bioset(bdev, 1, REQ_OP_READ, GFP_NOFS, &data->bio_set);
+ len = XFS_FSB_TO_B(mp, irec.rm_blockcount);
+ bio = bio_alloc_bioset(bdev,
+ min(howmany(len, XFS_GC_BUF_SIZE) + 1, XFS_GC_NR_BUFS),
+ REQ_OP_READ, GFP_NOFS, &data->bio_set);
chunk = container_of(bio, struct xfs_gc_bio, bio);
chunk->ip = ip;
chunk->offset = XFS_FSB_TO_B(mp, irec.rm_offset);
- chunk->len = XFS_FSB_TO_B(mp, irec.rm_blockcount);
+ chunk->len = len;
chunk->old_startblock =
xfs_rgbno_to_rtb(iter->victim_rtg, irec.rm_startblock);
chunk->new_daddr = daddr;
chunk->is_seq = is_seq;
- chunk->scratch = &data->scratch[data->scratch_idx];
chunk->data = data;
chunk->oz = oz;
chunk->victim_rtg = iter->victim_rtg;
@@ -711,13 +729,9 @@ xfs_zone_gc_start_chunk(
bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock);
bio->bi_end_io = xfs_zone_gc_end_io;
- bio_add_folio_nofail(bio, chunk->scratch->folio, chunk->len,
- chunk->scratch->offset);
- chunk->scratch->offset += chunk->len;
- if (chunk->scratch->offset == XFS_GC_CHUNK_SIZE) {
- data->scratch_idx =
- (data->scratch_idx + 1) % XFS_ZONE_GC_NR_SCRATCH;
- }
+ xfs_zone_gc_add_data(chunk);
+ data->scratch_head = (data->scratch_head + len) % data->scratch_size;
+
WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW);
list_add_tail(&chunk->entry, &data->reading);
xfs_zone_gc_iter_advance(iter, irec.rm_blockcount);
@@ -835,6 +849,7 @@ xfs_zone_gc_finish_chunk(
struct xfs_gc_bio *chunk)
{
uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
+ struct xfs_zone_gc_data *data = chunk->data;
struct xfs_inode *ip = chunk->ip;
struct xfs_mount *mp = ip->i_mount;
int error;
@@ -846,11 +861,8 @@ xfs_zone_gc_finish_chunk(
return;
}
- chunk->scratch->freed += chunk->len;
- if (chunk->scratch->freed == chunk->scratch->offset) {
- chunk->scratch->offset = 0;
- chunk->scratch->freed = 0;
- }
+ data->scratch_tail =
+ (data->scratch_tail + chunk->len) % data->scratch_size;
/*
* Cycle through the iolock and wait for direct I/O and layouts to
--
2.47.3
* Re: [PATCH 1/3] block: add a bio_reuse helper
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
@ 2025-12-18 6:43 ` Damien Le Moal
2025-12-18 9:20 ` Hannes Reinecke
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2025-12-18 6:43 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Carlos Maiolino
Cc: Hans Holmberg, linux-block, linux-xfs
On 12/18/25 15:31, Christoph Hellwig wrote:
> Add a helper to allow an existing bio to be resubmitted without
> having to re-add the payload.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 2/3] xfs: use bio_reuse in the zone GC code
2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig
@ 2025-12-18 6:45 ` Damien Le Moal
2025-12-19 7:48 ` Hans Holmberg
1 sibling, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2025-12-18 6:45 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Carlos Maiolino
Cc: Hans Holmberg, linux-block, linux-xfs
On 12/18/25 15:31, Christoph Hellwig wrote:
> Replace our somewhat fragile open-coded bio reuse, which caused a
> regression in the past, with the new block layer bio_reuse helper.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 3/3] xfs: rework zone GC buffer management
2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig
@ 2025-12-18 7:12 ` Damien Le Moal
2025-12-19 8:06 ` Hans Holmberg
1 sibling, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2025-12-18 7:12 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Carlos Maiolino
Cc: Hans Holmberg, linux-block, linux-xfs
On 12/18/25 15:31, Christoph Hellwig wrote:
> The double buffering, where just one scratch area is used at a time,
> does not use the available memory efficiently.  It was originally
> implemented when GC I/O could happen out of order, but that was removed
> before upstream submission to avoid fragmentation.  Now that all GC I/Os
> are processed in order, manage the buffers as a simple ring buffer
> instead.
>
> For a synthetic benchmark that fills 256MiB HDD zones and punches out
> holes to free half the space this leads to a decrease of GC time by
> a little more than 25%.
>
> Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and
> benchmarking.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 1/3] block: add a bio_reuse helper
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
2025-12-18 6:43 ` Damien Le Moal
@ 2025-12-18 9:20 ` Hannes Reinecke
2025-12-18 9:35 ` Christoph Hellwig
2025-12-18 9:40 ` Ming Lei
2025-12-19 7:45 ` Hans Holmberg
3 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2025-12-18 9:20 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs
On 12/18/25 07:31, Christoph Hellwig wrote:
> Add a helper to allow an existing bio to be resubmitted without
> having to re-add the payload.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/bio.c | 25 +++++++++++++++++++++++++
> include/linux/bio.h | 1 +
> 2 files changed, 26 insertions(+)
>
> diff --git a/block/bio.c b/block/bio.c
> index e726c0e280a8..1b68ae877468 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)
> }
> EXPORT_SYMBOL(bio_reset);
>
> +/**
> + * bio_reuse - reuse a bio with the payload left intact
> + * @bio: bio to reuse
> + *
> + * Allow reusing an existing bio for another operation with all set up
> + * fields including the payload, device and end_io handler left intact.
> + *
> + * Typically used for bios first used to read data which is then written
> + * to another location without modification.
> + */
> +void bio_reuse(struct bio *bio)
> +{
> + unsigned short vcnt = bio->bi_vcnt, i;
> + bio_end_io_t *end_io = bio->bi_end_io;
> + void *private = bio->bi_private;
> +
> + bio_reset(bio, bio->bi_bdev, bio->bi_opf);
> + for (i = 0; i < vcnt; i++)
> + bio->bi_iter.bi_size += bio->bi_io_vec[i].bv_len;
> + bio->bi_vcnt = vcnt;
> + bio->bi_private = private;
> + bio->bi_end_io = end_io;
> +}
> +EXPORT_SYMBOL_GPL(bio_reuse);
> +
'reuse' has a different connotation for me; I would have expected
a 'reused' bio to be usable for any purpose.
Maybe 'bio_reprep'?
Otherwise looks good.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 1/3] block: add a bio_reuse helper
2025-12-18 9:20 ` Hannes Reinecke
@ 2025-12-18 9:35 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-12-18 9:35 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Jens Axboe, Carlos Maiolino, Damien Le Moal,
Hans Holmberg, linux-block, linux-xfs
On Thu, Dec 18, 2025 at 10:20:31AM +0100, Hannes Reinecke wrote:
> 'reuse' has a different connotation for me; I would have expected
> a 'reused' bio to be usable for any purpose.
You can reuse it for any purpose. That's the point - it keeps the
payload, but you can resubmit it.
* Re: [PATCH 1/3] block: add a bio_reuse helper
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
2025-12-18 6:43 ` Damien Le Moal
2025-12-18 9:20 ` Hannes Reinecke
@ 2025-12-18 9:40 ` Ming Lei
2025-12-18 9:45 ` Christoph Hellwig
2025-12-19 7:45 ` Hans Holmberg
3 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2025-12-18 9:40 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Carlos Maiolino, Damien Le Moal, Hans Holmberg,
linux-block, linux-xfs
On Thu, Dec 18, 2025 at 07:31:45AM +0100, Christoph Hellwig wrote:
> Add a helper to allow an existing bio to be resubmitted without
> having to re-add the payload.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/bio.c | 25 +++++++++++++++++++++++++
> include/linux/bio.h | 1 +
> 2 files changed, 26 insertions(+)
>
> diff --git a/block/bio.c b/block/bio.c
> index e726c0e280a8..1b68ae877468 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)
> }
> EXPORT_SYMBOL(bio_reset);
>
> +/**
> + * bio_reuse - reuse a bio with the payload left intact
> + * @bio: bio to reuse
> + *
> + * Allow reusing an existing bio for another operation with all set up
> + * fields including the payload, device and end_io handler left intact.
> + *
> + * Typically used for bios first used to read data which is then written
> + * to another location without modification.
> + */
> +void bio_reuse(struct bio *bio)
> +{
> + unsigned short vcnt = bio->bi_vcnt, i;
> + bio_end_io_t *end_io = bio->bi_end_io;
> + void *private = bio->bi_private;
The incoming bio can't be a cloned bio, so
WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
should be added.
Otherwise, it looks fine.
Thanks,
Ming
* Re: [PATCH 1/3] block: add a bio_reuse helper
2025-12-18 9:40 ` Ming Lei
@ 2025-12-18 9:45 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-12-18 9:45 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Carlos Maiolino, Damien Le Moal,
Hans Holmberg, linux-block, linux-xfs
On Thu, Dec 18, 2025 at 05:40:00PM +0800, Ming Lei wrote:
> > +/**
> > + * bio_reuse - reuse a bio with the payload left intact
> > + * @bio bio to reuse
> > + *
> > + * Allow reusing an existing bio for another operation with all set up
> > + * fields including the payload, device and end_io handler left intact.
> > + *
> > + * Typically used for bios first used to read data which is then written
> > + * to another location without modification.
> > + */
> > +void bio_reuse(struct bio *bio)
> > +{
> > + unsigned short vcnt = bio->bi_vcnt, i;
> > + bio_end_io_t *end_io = bio->bi_end_io;
> > + void *private = bio->bi_private;
>
> The incoming bio can't be a cloned bio, so
It better not be, yes.
* Re: [PATCH 1/3] block: add a bio_reuse helper
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
` (2 preceding siblings ...)
2025-12-18 9:40 ` Ming Lei
@ 2025-12-19 7:45 ` Hans Holmberg
3 siblings, 0 replies; 15+ messages in thread
From: Hans Holmberg @ 2025-12-19 7:45 UTC (permalink / raw)
To: hch, Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, linux-block@vger.kernel.org,
linux-xfs@vger.kernel.org
On 18/12/2025 07:32, Christoph Hellwig wrote:
> Add a helper to allow an existing bio to be resubmitted without
> having to re-add the payload.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/bio.c | 25 +++++++++++++++++++++++++
> include/linux/bio.h | 1 +
> 2 files changed, 26 insertions(+)
>
> diff --git a/block/bio.c b/block/bio.c
> index e726c0e280a8..1b68ae877468 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)
> }
> EXPORT_SYMBOL(bio_reset);
>
> +/**
> + * bio_reuse - reuse a bio with the payload left intact
> + * @bio bio to reuse
> + *
> + * Allow reusing an existing bio for another operation with all set up
> + * fields including the payload, device and end_io handler left intact.
> + *
> + * Typically used for bios first used to read data which is then written
> + * to another location without modification.
> + */
> +void bio_reuse(struct bio *bio)
> +{
> + unsigned short vcnt = bio->bi_vcnt, i;
> + bio_end_io_t *end_io = bio->bi_end_io;
> + void *private = bio->bi_private;
> +
> + bio_reset(bio, bio->bi_bdev, bio->bi_opf);
> + for (i = 0; i < vcnt; i++)
> + bio->bi_iter.bi_size += bio->bi_io_vec[i].bv_len;
> + bio->bi_vcnt = vcnt;
> + bio->bi_private = private;
> + bio->bi_end_io = end_io;
> +}
> +EXPORT_SYMBOL_GPL(bio_reuse);
> +
> static struct bio *__bio_chain_endio(struct bio *bio)
> {
> struct bio *parent = bio->bi_private;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index ad2d57908c1c..c0190f8badde 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -414,6 +414,7 @@ static inline void bio_init_inline(struct bio *bio, struct block_device *bdev,
> }
> extern void bio_uninit(struct bio *);
> void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf);
> +void bio_reuse(struct bio *bio);
> void bio_chain(struct bio *, struct bio *);
>
> int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len,
Looks good to me,
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
* Re: [PATCH 2/3] xfs: use bio_reuse in the zone GC code
2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig
2025-12-18 6:45 ` Damien Le Moal
@ 2025-12-19 7:48 ` Hans Holmberg
1 sibling, 0 replies; 15+ messages in thread
From: Hans Holmberg @ 2025-12-19 7:48 UTC (permalink / raw)
To: hch, Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, linux-block@vger.kernel.org,
linux-xfs@vger.kernel.org
On 18/12/2025 07:32, Christoph Hellwig wrote:
> Replace our somewhat fragile open-coded bio reuse, which caused a
> regression in the past, with the new block layer bio_reuse helper.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_zone_gc.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
> index 065005f13fe1..f3dec13a620c 100644
> --- a/fs/xfs/xfs_zone_gc.c
> +++ b/fs/xfs/xfs_zone_gc.c
> @@ -812,8 +812,6 @@ xfs_zone_gc_write_chunk(
> {
> struct xfs_zone_gc_data *data = chunk->data;
> struct xfs_mount *mp = chunk->ip->i_mount;
> - phys_addr_t bvec_paddr =
> - bvec_phys(bio_first_bvec_all(&chunk->bio));
> struct xfs_gc_bio *split_chunk;
>
> if (chunk->bio.bi_status)
> @@ -826,10 +824,7 @@ xfs_zone_gc_write_chunk(
> WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW);
> list_move_tail(&chunk->entry, &data->writing);
>
> - bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE);
> - bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len,
> - offset_in_folio(chunk->scratch->folio, bvec_paddr));
> -
> + bio_reuse(&chunk->bio);
> while ((split_chunk = xfs_zone_gc_split_write(data, chunk)))
> xfs_zone_gc_submit_write(data, split_chunk);
> xfs_zone_gc_submit_write(data, chunk);
Nice!
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
* Re: [PATCH 3/3] xfs: rework zone GC buffer management
2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig
2025-12-18 7:12 ` Damien Le Moal
@ 2025-12-19 8:06 ` Hans Holmberg
1 sibling, 0 replies; 15+ messages in thread
From: Hans Holmberg @ 2025-12-19 8:06 UTC (permalink / raw)
To: hch, Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, linux-block@vger.kernel.org,
linux-xfs@vger.kernel.org
On 18/12/2025 07:32, Christoph Hellwig wrote:
> The double buffering, where just one scratch area is used at a time,
> does not use the available memory efficiently.  It was originally
> implemented when GC I/O could happen out of order, but that was removed
> before upstream submission to avoid fragmentation.  Now that all GC I/Os
> are processed in order, manage the buffers as a simple ring buffer
> instead.
>
> For a synthetic benchmark that fills 256MiB HDD zones and punches out
> holes to free half the space this leads to a decrease of GC time by
> a little more than 25%.
>
> Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and
> benchmarking.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_zone_gc.c | 106 ++++++++++++++++++++++++-------------------
> 1 file changed, 59 insertions(+), 47 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
> index f3dec13a620c..90f70473c3d5 100644
> --- a/fs/xfs/xfs_zone_gc.c
> +++ b/fs/xfs/xfs_zone_gc.c
> @@ -50,23 +50,11 @@
> */
>
> /*
> - * Size of each GC scratch pad. This is also the upper bound for each
> - * GC I/O, which helps to keep latency down.
> + * Size of each GC scratch allocation, and the number of buffers.
> */
> -#define XFS_GC_CHUNK_SIZE SZ_1M
> -
> -/*
> - * Scratchpad data to read GCed data into.
> - *
> - * The offset member tracks where the next allocation starts, and freed tracks
> - * the amount of space that is not used anymore.
> - */
> -#define XFS_ZONE_GC_NR_SCRATCH 2
> -struct xfs_zone_scratch {
> - struct folio *folio;
> - unsigned int offset;
> - unsigned int freed;
> -};
> +#define XFS_GC_BUF_SIZE SZ_1M
> +#define XFS_GC_NR_BUFS 2
> +static_assert(XFS_GC_NR_BUFS < BIO_MAX_VECS);
>
> /*
> * Chunk that is read and written for each GC operation.
> @@ -141,10 +129,14 @@ struct xfs_zone_gc_data {
> struct bio_set bio_set;
>
> /*
> - * Scratchpad used, and index to indicated which one is used.
> + * Scratchpad to buffer GC data, organized as a ring buffer over
> + * discontiguous folios. scratch_head is where the buffer is filled,
> + * and scratch_tail tracks the buffer space freed.
> */
> - struct xfs_zone_scratch scratch[XFS_ZONE_GC_NR_SCRATCH];
> - unsigned int scratch_idx;
> + struct folio *scratch_folios[XFS_GC_NR_BUFS];
> + unsigned int scratch_size;
> + unsigned int scratch_head;
> + unsigned int scratch_tail;
>
> /*
> * List of bios currently being read, written and reset.
> @@ -210,20 +202,16 @@ xfs_zone_gc_data_alloc(
> if (!data->iter.recs)
> goto out_free_data;
>
> - /*
> - * We actually only need a single bio_vec. It would be nice to have
> - * a flag that only allocates the inline bvecs and not the separate
> - * bvec pool.
> - */
> if (bioset_init(&data->bio_set, 16, offsetof(struct xfs_gc_bio, bio),
> BIOSET_NEED_BVECS))
> goto out_free_recs;
> - for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++) {
> - data->scratch[i].folio =
> - folio_alloc(GFP_KERNEL, get_order(XFS_GC_CHUNK_SIZE));
> - if (!data->scratch[i].folio)
> + for (i = 0; i < XFS_GC_NR_BUFS; i++) {
> + data->scratch_folios[i] =
> + folio_alloc(GFP_KERNEL, get_order(XFS_GC_BUF_SIZE));
> + if (!data->scratch_folios[i])
> goto out_free_scratch;
> }
> + data->scratch_size = XFS_GC_BUF_SIZE * XFS_GC_NR_BUFS;
> INIT_LIST_HEAD(&data->reading);
> INIT_LIST_HEAD(&data->writing);
> INIT_LIST_HEAD(&data->resetting);
> @@ -232,7 +220,7 @@ xfs_zone_gc_data_alloc(
>
> out_free_scratch:
> while (--i >= 0)
> - folio_put(data->scratch[i].folio);
> + folio_put(data->scratch_folios[i]);
> bioset_exit(&data->bio_set);
> out_free_recs:
> kfree(data->iter.recs);
> @@ -247,8 +235,8 @@ xfs_zone_gc_data_free(
> {
> int i;
>
> - for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++)
> - folio_put(data->scratch[i].folio);
> + for (i = 0; i < XFS_GC_NR_BUFS; i++)
> + folio_put(data->scratch_folios[i]);
> bioset_exit(&data->bio_set);
> kfree(data->iter.recs);
> kfree(data);
> @@ -591,7 +579,12 @@ static unsigned int
> xfs_zone_gc_scratch_available(
> struct xfs_zone_gc_data *data)
> {
> - return XFS_GC_CHUNK_SIZE - data->scratch[data->scratch_idx].offset;
> + if (!data->scratch_tail)
> + return data->scratch_size - data->scratch_head;
> +
> + if (!data->scratch_head)
> + return data->scratch_tail;
> + return (data->scratch_size - data->scratch_head) + data->scratch_tail;
> }
>
> static bool
> @@ -665,6 +658,28 @@ xfs_zone_gc_alloc_blocks(
> return oz;
> }
>
> +static void
> +xfs_zone_gc_add_data(
> + struct xfs_gc_bio *chunk)
> +{
> + struct xfs_zone_gc_data *data = chunk->data;
> + unsigned int len = chunk->len;
> + unsigned int off = data->scratch_head;
> +
> + do {
> + unsigned int this_off = off % XFS_GC_BUF_SIZE;
> + unsigned int this_len = min(len, XFS_GC_BUF_SIZE - this_off);
> +
> + bio_add_folio_nofail(&chunk->bio,
> + data->scratch_folios[off / XFS_GC_BUF_SIZE],
> + this_len, this_off);
> + len -= this_len;
> + off += this_len;
> + if (off == data->scratch_size)
> + off = 0;
> + } while (len);
> +}
> +
> static bool
> xfs_zone_gc_start_chunk(
> struct xfs_zone_gc_data *data)
> @@ -678,6 +693,7 @@ xfs_zone_gc_start_chunk(
> struct xfs_inode *ip;
> struct bio *bio;
> xfs_daddr_t daddr;
> + unsigned int len;
> bool is_seq;
>
> if (xfs_is_shutdown(mp))
> @@ -692,17 +708,19 @@ xfs_zone_gc_start_chunk(
> return false;
> }
>
> - bio = bio_alloc_bioset(bdev, 1, REQ_OP_READ, GFP_NOFS, &data->bio_set);
> + len = XFS_FSB_TO_B(mp, irec.rm_blockcount);
> + bio = bio_alloc_bioset(bdev,
> + min(howmany(len, XFS_GC_BUF_SIZE) + 1, XFS_GC_NR_BUFS),
> + REQ_OP_READ, GFP_NOFS, &data->bio_set);
>
> chunk = container_of(bio, struct xfs_gc_bio, bio);
> chunk->ip = ip;
> chunk->offset = XFS_FSB_TO_B(mp, irec.rm_offset);
> - chunk->len = XFS_FSB_TO_B(mp, irec.rm_blockcount);
> + chunk->len = len;
> chunk->old_startblock =
> xfs_rgbno_to_rtb(iter->victim_rtg, irec.rm_startblock);
> chunk->new_daddr = daddr;
> chunk->is_seq = is_seq;
> - chunk->scratch = &data->scratch[data->scratch_idx];
> chunk->data = data;
> chunk->oz = oz;
> chunk->victim_rtg = iter->victim_rtg;
> @@ -711,13 +729,9 @@ xfs_zone_gc_start_chunk(
>
> bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock);
> bio->bi_end_io = xfs_zone_gc_end_io;
> - bio_add_folio_nofail(bio, chunk->scratch->folio, chunk->len,
> - chunk->scratch->offset);
> - chunk->scratch->offset += chunk->len;
> - if (chunk->scratch->offset == XFS_GC_CHUNK_SIZE) {
> - data->scratch_idx =
> - (data->scratch_idx + 1) % XFS_ZONE_GC_NR_SCRATCH;
> - }
> + xfs_zone_gc_add_data(chunk);
> + data->scratch_head = (data->scratch_head + len) % data->scratch_size;
> +
> WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW);
> list_add_tail(&chunk->entry, &data->reading);
> xfs_zone_gc_iter_advance(iter, irec.rm_blockcount);
> @@ -835,6 +849,7 @@ xfs_zone_gc_finish_chunk(
> struct xfs_gc_bio *chunk)
> {
> uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> + struct xfs_zone_gc_data *data = chunk->data;
> struct xfs_inode *ip = chunk->ip;
> struct xfs_mount *mp = ip->i_mount;
> int error;
> @@ -846,11 +861,8 @@ xfs_zone_gc_finish_chunk(
> return;
> }
>
> - chunk->scratch->freed += chunk->len;
> - if (chunk->scratch->freed == chunk->scratch->offset) {
> - chunk->scratch->offset = 0;
> - chunk->scratch->freed = 0;
> - }
> + data->scratch_tail =
> + (data->scratch_tail + chunk->len) % data->scratch_size;
>
> /*
> * Cycle through the iolock and wait for direct I/O and layouts to
Looks good,
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
* [PATCH 2/3] xfs: use bio_reuse in the zone GC code
2026-01-06 7:58 improve zoned XFS GC buffer management v2 Christoph Hellwig
@ 2026-01-06 7:58 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2026-01-06 7:58 UTC (permalink / raw)
To: Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs
Replace our somewhat fragile bio reuse code, which caused a regression
in the past, with the new block layer bio_reuse helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
fs/xfs/xfs_zone_gc.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index 3c52cc1497d4..40408b1132e0 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -811,8 +811,6 @@ xfs_zone_gc_write_chunk(
{
struct xfs_zone_gc_data *data = chunk->data;
struct xfs_mount *mp = chunk->ip->i_mount;
- phys_addr_t bvec_paddr =
- bvec_phys(bio_first_bvec_all(&chunk->bio));
struct xfs_gc_bio *split_chunk;
if (chunk->bio.bi_status)
@@ -825,10 +823,7 @@ xfs_zone_gc_write_chunk(
WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW);
list_move_tail(&chunk->entry, &data->writing);
- bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE);
- bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len,
- offset_in_folio(chunk->scratch->folio, bvec_paddr));
-
+ bio_reuse(&chunk->bio);
while ((split_chunk = xfs_zone_gc_split_write(data, chunk)))
xfs_zone_gc_submit_write(data, split_chunk);
xfs_zone_gc_submit_write(data, chunk);
--
2.47.3