* improve zoned XFS GC buffer management
@ 2025-12-18 6:31 Christoph Hellwig
2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw)
To: Jens Axboe, Carlos Maiolino
Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs
Hi all,
the zoned XFS GC code currently uses a weird bank switching strategy to
manage the scratch buffer. The reason for that was that I/O could
proceed out of order in the early days, but that actually changed before
the code was upstreamed to avoid fragmentation. This series replaces the logic
with a simple ring buffer, which makes the buffer space utilization much
more efficient.
Before that, two patches make the reuse of the bios a lot less fragile,
and for that we need a new block layer helper that should eventually also
be useful in other places. Jens, let me know if merging this through the
XFS tree is ok.
Diffstat:
block/bio.c | 25 +++++++++++
fs/xfs/xfs_zone_gc.c | 113 +++++++++++++++++++++++++++------------------------
include/linux/bio.h | 1
3 files changed, 86 insertions(+), 53 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] block: add a bio_reuse helper 2025-12-18 6:31 improve zoned XFS GC buffer management Christoph Hellwig @ 2025-12-18 6:31 ` Christoph Hellwig 2025-12-18 6:43 ` Damien Le Moal ` (3 more replies) 2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig 2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig 2 siblings, 4 replies; 15+ messages in thread From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw) To: Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs Add a helper to allow an existing bio to be resubmitted without having to re-add the payload. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/bio.c | 25 +++++++++++++++++++++++++ include/linux/bio.h | 1 + 2 files changed, 26 insertions(+) diff --git a/block/bio.c b/block/bio.c index e726c0e280a8..1b68ae877468 100644 --- a/block/bio.c +++ b/block/bio.c @@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf) } EXPORT_SYMBOL(bio_reset); +/** + * bio_reuse - reuse a bio with the payload left intact + * @bio: bio to reuse + * + * Allow reusing an existing bio for another operation with all set up + * fields including the payload, device and end_io handler left intact. + * + * Typically used for bios first used to read data which is then written + * to another location without modification. + */ +void bio_reuse(struct bio *bio) +{ + unsigned short vcnt = bio->bi_vcnt, i; + bio_end_io_t *end_io = bio->bi_end_io; + void *private = bio->bi_private; + + bio_reset(bio, bio->bi_bdev, bio->bi_opf); + for (i = 0; i < vcnt; i++) + bio->bi_iter.bi_size += bio->bi_io_vec[i].bv_len; + bio->bi_vcnt = vcnt; + bio->bi_private = private; + bio->bi_end_io = end_io; +} +EXPORT_SYMBOL_GPL(bio_reuse); + static struct bio *__bio_chain_endio(struct bio *bio) { struct bio *parent = bio->bi_private; diff --git a/include/linux/bio.h b/include/linux/bio.h index ad2d57908c1c..c0190f8badde 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -414,6 +414,7 @@ static inline void bio_init_inline(struct bio *bio, struct block_device *bdev, } extern void bio_uninit(struct bio *); void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf); +void bio_reuse(struct bio *bio); void bio_chain(struct bio *, struct bio *); int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len, -- 2.47.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
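A minimal sketch of the calling pattern the helper is aimed at, assuming a submitter-owned, non-cloned bio whose payload was added once before the first submission. This is illustrative only and not part of the patch; move_extent(), src and dst are made-up names:

static void move_extent(struct bio *bio, sector_t src, sector_t dst)
{
	/* first pass: read the data to move into the bio's payload */
	bio->bi_opf = REQ_OP_READ;
	bio->bi_iter.bi_sector = src;
	submit_bio_wait(bio);

	/*
	 * bio_reuse() rebuilds bi_iter.bi_size from the bvecs that are
	 * still attached, so the pages added before the read get
	 * submitted again without another bio_add_page() loop.
	 */
	bio_reuse(bio);
	bio->bi_opf = REQ_OP_WRITE;
	bio->bi_iter.bi_sector = dst;
	submit_bio_wait(bio);
}

Error handling is omitted; a real caller would check the return value of submit_bio_wait() between the two phases.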
* Re: [PATCH 1/3] block: add a bio_reuse helper 2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig @ 2025-12-18 6:43 ` Damien Le Moal 2025-12-18 9:20 ` Hannes Reinecke ` (2 subsequent siblings) 3 siblings, 0 replies; 15+ messages in thread From: Damien Le Moal @ 2025-12-18 6:43 UTC (permalink / raw) To: Christoph Hellwig, Jens Axboe, Carlos Maiolino Cc: Hans Holmberg, linux-block, linux-xfs On 12/18/25 15:31, Christoph Hellwig wrote: > Add a helper to allow an existing bio to be resubmitted without > having to re-add the payload. > > Signed-off-by: Christoph Hellwig <hch@lst.de> Looks good. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: add a bio_reuse helper 2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig 2025-12-18 6:43 ` Damien Le Moal @ 2025-12-18 9:20 ` Hannes Reinecke 2025-12-18 9:35 ` Christoph Hellwig 2025-12-18 9:40 ` Ming Lei 2025-12-19 7:45 ` Hans Holmberg 3 siblings, 1 reply; 15+ messages in thread From: Hannes Reinecke @ 2025-12-18 9:20 UTC (permalink / raw) To: Christoph Hellwig, Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs On 12/18/25 07:31, Christoph Hellwig wrote: > Add a helper to allow an existing bio to be resubmitted without > having to re-add the payload. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/bio.c | 25 +++++++++++++++++++++++++ > include/linux/bio.h | 1 + > 2 files changed, 26 insertions(+) > > diff --git a/block/bio.c b/block/bio.c > index e726c0e280a8..1b68ae877468 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf) > } > EXPORT_SYMBOL(bio_reset); > > +/** > + * bio_reuse - reuse a bio with the payload left intact > + * @bio: bio to reuse > + * > + * Allow reusing an existing bio for another operation with all set up > + * fields including the payload, device and end_io handler left intact. > + * > + * Typically used for bios first used to read data which is then written > + * to another location without modification. > + */ > +void bio_reuse(struct bio *bio) > +{ > + unsigned short vcnt = bio->bi_vcnt, i; > + bio_end_io_t *end_io = bio->bi_end_io; > + void *private = bio->bi_private; > + > + bio_reset(bio, bio->bi_bdev, bio->bi_opf); > + for (i = 0; i < vcnt; i++) > + bio->bi_iter.bi_size += bio->bi_io_vec[i].bv_len; > + bio->bi_vcnt = vcnt; > + bio->bi_private = private; > + bio->bi_end_io = end_io; > +} > +EXPORT_SYMBOL_GPL(bio_reuse); > + 'reuse' has a different connotation for me; I would have expected a 'reused' bio to be used for any purpose. Maybe 'bio_reprep'? Otherwise looks good. Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: add a bio_reuse helper 2025-12-18 9:20 ` Hannes Reinecke @ 2025-12-18 9:35 ` Christoph Hellwig 0 siblings, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2025-12-18 9:35 UTC (permalink / raw) To: Hannes Reinecke Cc: Christoph Hellwig, Jens Axboe, Carlos Maiolino, Damien Le Moal, Hans Holmberg, linux-block, linux-xfs On Thu, Dec 18, 2025 at 10:20:31AM +0100, Hannes Reinecke wrote: > 'reuse' has a different connotation for me; I would have expected > a 'reused' bio to be used for any purpose. You can reuse it for any purpose. That's the point - it keeps the payload, but you can resubmit it. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: add a bio_reuse helper 2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig 2025-12-18 6:43 ` Damien Le Moal 2025-12-18 9:20 ` Hannes Reinecke @ 2025-12-18 9:40 ` Ming Lei 2025-12-18 9:45 ` Christoph Hellwig 2025-12-19 7:45 ` Hans Holmberg 3 siblings, 1 reply; 15+ messages in thread From: Ming Lei @ 2025-12-18 9:40 UTC (permalink / raw) To: Christoph Hellwig Cc: Jens Axboe, Carlos Maiolino, Damien Le Moal, Hans Holmberg, linux-block, linux-xfs On Thu, Dec 18, 2025 at 07:31:45AM +0100, Christoph Hellwig wrote: > Add a helper to allow an existing bio to be resubmitted without > having to re-add the payload. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/bio.c | 25 +++++++++++++++++++++++++ > include/linux/bio.h | 1 + > 2 files changed, 26 insertions(+) > > diff --git a/block/bio.c b/block/bio.c > index e726c0e280a8..1b68ae877468 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf) > } > EXPORT_SYMBOL(bio_reset); > > +/** > + * bio_reuse - reuse a bio with the payload left intact > + * @bio: bio to reuse > + * > + * Allow reusing an existing bio for another operation with all set up > + * fields including the payload, device and end_io handler left intact. > + * > + * Typically used for bios first used to read data which is then written > + * to another location without modification. > + */ > +void bio_reuse(struct bio *bio) > +{ > + unsigned short vcnt = bio->bi_vcnt, i; > + bio_end_io_t *end_io = bio->bi_end_io; > + void *private = bio->bi_private; The incoming bio can't be a cloned bio, so WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); should be added. Otherwise, it looks fine. Thanks, Ming ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: add a bio_reuse helper 2025-12-18 9:40 ` Ming Lei @ 2025-12-18 9:45 ` Christoph Hellwig 0 siblings, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2025-12-18 9:45 UTC (permalink / raw) To: Ming Lei Cc: Christoph Hellwig, Jens Axboe, Carlos Maiolino, Damien Le Moal, Hans Holmberg, linux-block, linux-xfs On Thu, Dec 18, 2025 at 05:40:00PM +0800, Ming Lei wrote: > > +/** > > + * bio_reuse - reuse a bio with the payload left intact > > + * @bio: bio to reuse > > + * > > + * Allow reusing an existing bio for another operation with all set up > > + * fields including the payload, device and end_io handler left intact. > > + * > > + * Typically used for bios first used to read data which is then written > > + * to another location without modification. > > + */ > > +void bio_reuse(struct bio *bio) > > +{ > > + unsigned short vcnt = bio->bi_vcnt, i; > > + bio_end_io_t *end_io = bio->bi_end_io; > > + void *private = bio->bi_private; > > The incoming bio can't be a cloned bio, so It had better not be, yes. ^ permalink raw reply [flat|nested] 15+ messages in thread
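To illustrate the constraint discussed here, a hedged sketch assuming clones created with bio_alloc_clone(), which borrow the parent's bvec table instead of owning one (so bi_vcnt typically stays zero on the clone):

struct bio *clone = bio_alloc_clone(bdev, parent, GFP_NOFS, bs);

/*
 * Wrong: the clone does not own parent->bi_io_vec, and with
 * bi_vcnt == 0 the size rebuild in bio_reuse() would yield an
 * empty bio. The v2 patch below turns this into a WARN_ON_ONCE().
 */
bio_reuse(clone);

/* Fine: the original bio owns its payload. */
bio_reuse(parent);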
* Re: [PATCH 1/3] block: add a bio_reuse helper 2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig ` (2 preceding siblings ...) 2025-12-18 9:40 ` Ming Lei @ 2025-12-19 7:45 ` Hans Holmberg 3 siblings, 0 replies; 15+ messages in thread From: Hans Holmberg @ 2025-12-19 7:45 UTC (permalink / raw) To: hch, Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, linux-block@vger.kernel.org, linux-xfs@vger.kernel.org On 18/12/2025 07:32, Christoph Hellwig wrote: > Add a helper to allow an existing bio to be resubmitted without > having to re-add the payload. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/bio.c | 25 +++++++++++++++++++++++++ > include/linux/bio.h | 1 + > 2 files changed, 26 insertions(+) > > diff --git a/block/bio.c b/block/bio.c > index e726c0e280a8..1b68ae877468 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -311,6 +311,31 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf) > } > EXPORT_SYMBOL(bio_reset); > > +/** > + * bio_reuse - reuse a bio with the payload left intact > + * @bio: bio to reuse > + * > + * Allow reusing an existing bio for another operation with all set up > + * fields including the payload, device and end_io handler left intact. > + * > + * Typically used for bios first used to read data which is then written > + * to another location without modification. > + */ > +void bio_reuse(struct bio *bio) > +{ > + unsigned short vcnt = bio->bi_vcnt, i; > + bio_end_io_t *end_io = bio->bi_end_io; > + void *private = bio->bi_private; > + > + bio_reset(bio, bio->bi_bdev, bio->bi_opf); > + for (i = 0; i < vcnt; i++) > + bio->bi_iter.bi_size += bio->bi_io_vec[i].bv_len; > + bio->bi_vcnt = vcnt; > + bio->bi_private = private; > + bio->bi_end_io = end_io; > +} > +EXPORT_SYMBOL_GPL(bio_reuse); > + > static struct bio *__bio_chain_endio(struct bio *bio) > { > struct bio *parent = bio->bi_private; > diff --git a/include/linux/bio.h b/include/linux/bio.h > index ad2d57908c1c..c0190f8badde 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -414,6 +414,7 @@ static inline void bio_init_inline(struct bio *bio, struct block_device *bdev, > } > extern void bio_uninit(struct bio *); > void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf); > +void bio_reuse(struct bio *bio); > void bio_chain(struct bio *, struct bio *); > > int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len, Looks good to me, Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] xfs: use bio_reuse in the zone GC code 2025-12-18 6:31 improve zoned XFS GC buffer management Christoph Hellwig 2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig @ 2025-12-18 6:31 ` Christoph Hellwig 2025-12-18 6:45 ` Damien Le Moal 2025-12-19 7:48 ` Hans Holmberg 2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig 2 siblings, 2 replies; 15+ messages in thread From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw) To: Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs Replace our somewhat fragile code to reuse the bio, which caused a regression in the past, with the block layer bio_reuse helper. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_zone_gc.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c index 065005f13fe1..f3dec13a620c 100644 --- a/fs/xfs/xfs_zone_gc.c +++ b/fs/xfs/xfs_zone_gc.c @@ -812,8 +812,6 @@ xfs_zone_gc_write_chunk( { struct xfs_zone_gc_data *data = chunk->data; struct xfs_mount *mp = chunk->ip->i_mount; - phys_addr_t bvec_paddr = - bvec_phys(bio_first_bvec_all(&chunk->bio)); struct xfs_gc_bio *split_chunk; if (chunk->bio.bi_status) @@ -826,10 +824,7 @@ xfs_zone_gc_write_chunk( WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW); list_move_tail(&chunk->entry, &data->writing); - bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE); - bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len, - offset_in_folio(chunk->scratch->folio, bvec_paddr)); - + bio_reuse(&chunk->bio); while ((split_chunk = xfs_zone_gc_split_write(data, chunk))) xfs_zone_gc_submit_write(data, split_chunk); xfs_zone_gc_submit_write(data, chunk); -- 2.47.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] xfs: use bio_reuse in the zone GC code 2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig @ 2025-12-18 6:45 ` Damien Le Moal 2025-12-19 7:48 ` Hans Holmberg 1 sibling, 0 replies; 15+ messages in thread From: Damien Le Moal @ 2025-12-18 6:45 UTC (permalink / raw) To: Christoph Hellwig, Jens Axboe, Carlos Maiolino Cc: Hans Holmberg, linux-block, linux-xfs On 12/18/25 15:31, Christoph Hellwig wrote: > Replace our somewhat fragile code to reuse the bio, which caused a > regression in the past, with the block layer bio_reuse helper. > > Signed-off-by: Christoph Hellwig <hch@lst.de> Looks good. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] xfs: use bio_reuse in the zone GC code 2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig 2025-12-18 6:45 ` Damien Le Moal @ 2025-12-19 7:48 ` Hans Holmberg 1 sibling, 0 replies; 15+ messages in thread From: Hans Holmberg @ 2025-12-19 7:48 UTC (permalink / raw) To: hch, Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, linux-block@vger.kernel.org, linux-xfs@vger.kernel.org On 18/12/2025 07:32, Christoph Hellwig wrote: > Replace our somewhat fragile code to reuse the bio, which caused a > regression in the past, with the block layer bio_reuse helper. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_zone_gc.c | 7 +------ > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c > index 065005f13fe1..f3dec13a620c 100644 > --- a/fs/xfs/xfs_zone_gc.c > +++ b/fs/xfs/xfs_zone_gc.c > @@ -812,8 +812,6 @@ xfs_zone_gc_write_chunk( > { > struct xfs_zone_gc_data *data = chunk->data; > struct xfs_mount *mp = chunk->ip->i_mount; > - phys_addr_t bvec_paddr = > - bvec_phys(bio_first_bvec_all(&chunk->bio)); > struct xfs_gc_bio *split_chunk; > > if (chunk->bio.bi_status) > @@ -826,10 +824,7 @@ xfs_zone_gc_write_chunk( > WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW); > list_move_tail(&chunk->entry, &data->writing); > > - bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE); > - bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len, > - offset_in_folio(chunk->scratch->folio, bvec_paddr)); > - > + bio_reuse(&chunk->bio); > while ((split_chunk = xfs_zone_gc_split_write(data, chunk))) > xfs_zone_gc_submit_write(data, split_chunk); > xfs_zone_gc_submit_write(data, chunk); Nice! Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/3] xfs: rework zone GC buffer management 2025-12-18 6:31 improve zoned XFS GC buffer management Christoph Hellwig 2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig 2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig @ 2025-12-18 6:31 ` Christoph Hellwig 2025-12-18 7:12 ` Damien Le Moal 2025-12-19 8:06 ` Hans Holmberg 2 siblings, 2 replies; 15+ messages in thread From: Christoph Hellwig @ 2025-12-18 6:31 UTC (permalink / raw) To: Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs The double buffering where just one scratch area is used at a time does not efficiently use the available memory. It was originally implemented when GC I/O could happen out of order, but that was removed before upstream submission to avoid fragmentation. Now that all GC I/Os are processed in order, just use a number of buffers as a simple ring buffer. For a synthetic benchmark that fills 256MiB HDD zones and punches out holes to free half the space this leads to a decrease of GC time by a little more than 25%. Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and benchmarking. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_zone_gc.c | 106 ++++++++++++++++++++++++------------------- 1 file changed, 59 insertions(+), 47 deletions(-) diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c index f3dec13a620c..90f70473c3d5 100644 --- a/fs/xfs/xfs_zone_gc.c +++ b/fs/xfs/xfs_zone_gc.c @@ -50,23 +50,11 @@ */ /* - * Size of each GC scratch pad. This is also the upper bound for each - * GC I/O, which helps to keep latency down. + * Size of each GC scratch allocation, and the number of buffers. */ -#define XFS_GC_CHUNK_SIZE SZ_1M - -/* - * Scratchpad data to read GCed data into. - * - * The offset member tracks where the next allocation starts, and freed tracks - * the amount of space that is not used anymore. - */ -#define XFS_ZONE_GC_NR_SCRATCH 2 -struct xfs_zone_scratch { - struct folio *folio; - unsigned int offset; - unsigned int freed; -}; +#define XFS_GC_BUF_SIZE SZ_1M +#define XFS_GC_NR_BUFS 2 +static_assert(XFS_GC_NR_BUFS < BIO_MAX_VECS); /* * Chunk that is read and written for each GC operation. @@ -141,10 +129,14 @@ struct xfs_zone_gc_data { struct bio_set bio_set; /* - * Scratchpad used, and index to indicated which one is used. + * Scratchpad to buffer GC data, organized as a ring buffer over + * discontiguous folios. scratch_head is where the buffer is filled, + * and scratch_tail tracks the buffer space freed. */ - struct xfs_zone_scratch scratch[XFS_ZONE_GC_NR_SCRATCH]; - unsigned int scratch_idx; + struct folio *scratch_folios[XFS_GC_NR_BUFS]; + unsigned int scratch_size; + unsigned int scratch_head; + unsigned int scratch_tail; /* * List of bios currently being read, written and reset. @@ -210,20 +202,16 @@ xfs_zone_gc_data_alloc( if (!data->iter.recs) goto out_free_data; - /* - * We actually only need a single bio_vec. It would be nice to have - * a flag that only allocates the inline bvecs and not the separate - * bvec pool.
- */ if (bioset_init(&data->bio_set, 16, offsetof(struct xfs_gc_bio, bio), BIOSET_NEED_BVECS)) goto out_free_recs; - for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++) { - data->scratch[i].folio = - folio_alloc(GFP_KERNEL, get_order(XFS_GC_CHUNK_SIZE)); - if (!data->scratch[i].folio) + for (i = 0; i < XFS_GC_NR_BUFS; i++) { + data->scratch_folios[i] = + folio_alloc(GFP_KERNEL, get_order(XFS_GC_BUF_SIZE)); + if (!data->scratch_folios[i]) goto out_free_scratch; } + data->scratch_size = XFS_GC_BUF_SIZE * XFS_GC_NR_BUFS; INIT_LIST_HEAD(&data->reading); INIT_LIST_HEAD(&data->writing); INIT_LIST_HEAD(&data->resetting); @@ -232,7 +220,7 @@ xfs_zone_gc_data_alloc( out_free_scratch: while (--i >= 0) - folio_put(data->scratch[i].folio); + folio_put(data->scratch_folios[i]); bioset_exit(&data->bio_set); out_free_recs: kfree(data->iter.recs); @@ -247,8 +235,8 @@ xfs_zone_gc_data_free( { int i; - for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++) - folio_put(data->scratch[i].folio); + for (i = 0; i < XFS_GC_NR_BUFS; i++) + folio_put(data->scratch_folios[i]); bioset_exit(&data->bio_set); kfree(data->iter.recs); kfree(data); @@ -591,7 +579,12 @@ static unsigned int xfs_zone_gc_scratch_available( struct xfs_zone_gc_data *data) { - return XFS_GC_CHUNK_SIZE - data->scratch[data->scratch_idx].offset; + if (!data->scratch_tail) + return data->scratch_size - data->scratch_head; + + if (!data->scratch_head) + return data->scratch_tail; + return (data->scratch_size - data->scratch_head) + data->scratch_tail; } static bool @@ -665,6 +658,28 @@ xfs_zone_gc_alloc_blocks( return oz; } +static void +xfs_zone_gc_add_data( + struct xfs_gc_bio *chunk) +{ + struct xfs_zone_gc_data *data = chunk->data; + unsigned int len = chunk->len; + unsigned int off = data->scratch_head; + + do { + unsigned int this_off = off % XFS_GC_BUF_SIZE; + unsigned int this_len = min(len, XFS_GC_BUF_SIZE - this_off); + + bio_add_folio_nofail(&chunk->bio, + data->scratch_folios[off / XFS_GC_BUF_SIZE], + this_len, this_off); + len -= this_len; + off += this_len; + if (off == data->scratch_size) + off = 0; + } while (len); +} + static bool xfs_zone_gc_start_chunk( struct xfs_zone_gc_data *data) @@ -678,6 +693,7 @@ xfs_zone_gc_start_chunk( struct xfs_inode *ip; struct bio *bio; xfs_daddr_t daddr; + unsigned int len; bool is_seq; if (xfs_is_shutdown(mp)) @@ -692,17 +708,19 @@ xfs_zone_gc_start_chunk( return false; } - bio = bio_alloc_bioset(bdev, 1, REQ_OP_READ, GFP_NOFS, &data->bio_set); + len = XFS_FSB_TO_B(mp, irec.rm_blockcount); + bio = bio_alloc_bioset(bdev, + min(howmany(len, XFS_GC_BUF_SIZE) + 1, XFS_GC_NR_BUFS), + REQ_OP_READ, GFP_NOFS, &data->bio_set); chunk = container_of(bio, struct xfs_gc_bio, bio); chunk->ip = ip; chunk->offset = XFS_FSB_TO_B(mp, irec.rm_offset); - chunk->len = XFS_FSB_TO_B(mp, irec.rm_blockcount); + chunk->len = len; chunk->old_startblock = xfs_rgbno_to_rtb(iter->victim_rtg, irec.rm_startblock); chunk->new_daddr = daddr; chunk->is_seq = is_seq; - chunk->scratch = &data->scratch[data->scratch_idx]; chunk->data = data; chunk->oz = oz; chunk->victim_rtg = iter->victim_rtg; @@ -711,13 +729,9 @@ xfs_zone_gc_start_chunk( bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock); bio->bi_end_io = xfs_zone_gc_end_io; - bio_add_folio_nofail(bio, chunk->scratch->folio, chunk->len, - chunk->scratch->offset); - chunk->scratch->offset += chunk->len; - if (chunk->scratch->offset == XFS_GC_CHUNK_SIZE) { - data->scratch_idx = - (data->scratch_idx + 1) % XFS_ZONE_GC_NR_SCRATCH; - } + xfs_zone_gc_add_data(chunk); + data->scratch_head =
(data->scratch_head + len) % data->scratch_size; + WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW); list_add_tail(&chunk->entry, &data->reading); xfs_zone_gc_iter_advance(iter, irec.rm_blockcount); @@ -835,6 +849,7 @@ xfs_zone_gc_finish_chunk( struct xfs_gc_bio *chunk) { uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; + struct xfs_zone_gc_data *data = chunk->data; struct xfs_inode *ip = chunk->ip; struct xfs_mount *mp = ip->i_mount; int error; @@ -846,11 +861,8 @@ xfs_zone_gc_finish_chunk( return; } - chunk->scratch->freed += chunk->len; - if (chunk->scratch->freed == chunk->scratch->offset) { - chunk->scratch->offset = 0; - chunk->scratch->freed = 0; - } + data->scratch_tail = + (data->scratch_tail + chunk->len) % data->scratch_size; /* * Cycle through the iolock and wait for direct I/O and layouts to -- 2.47.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
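For intuition about the new head/tail accounting, a small userspace model of xfs_zone_gc_scratch_available() can help. The constants mirror the patch; everything else here is illustrative, and it leans on the series' invariant that GC I/O completes in order, so scratch_head == scratch_tail only when the buffer is empty:

#include <assert.h>

#define BUF_SZ	(1u << 20)	/* XFS_GC_BUF_SIZE */
#define SIZE	(2u * BUF_SZ)	/* XFS_GC_NR_BUFS folios */

static unsigned int avail(unsigned int head, unsigned int tail)
{
	if (!tail)
		return SIZE - head;
	if (!head)
		return tail;
	return (SIZE - head) + tail;
}

int main(void)
{
	assert(avail(0, 0) == SIZE);			/* empty ring */
	assert(avail(3 * BUF_SZ / 2, 0) == BUF_SZ / 2);	/* 1.5 MiB in flight */
	assert(avail(0, BUF_SZ) == BUF_SZ);		/* head wrapped, 1 MiB freed */
	return 0;
}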
* Re: [PATCH 3/3] xfs: rework zone GC buffer management 2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig @ 2025-12-18 7:12 ` Damien Le Moal 2025-12-19 8:06 ` Hans Holmberg 1 sibling, 0 replies; 15+ messages in thread From: Damien Le Moal @ 2025-12-18 7:12 UTC (permalink / raw) To: Christoph Hellwig, Jens Axboe, Carlos Maiolino Cc: Hans Holmberg, linux-block, linux-xfs On 12/18/25 15:31, Christoph Hellwig wrote: > The double buffering where just one scratch area is used at a time does > not efficiently use the available memory. It was originally implemented > when GC I/O could happen out of order, but that was removed before > upstream submission to avoid fragmentation. Now that all GC I/Os are > processed in order, just use a number of buffers as a simple ring buffer. > > For a synthetic benchmark that fills 256MiB HDD zones and punches out > holes to free half the space this leads to a decrease of GC time by > a little more than 25%. > > Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and > benchmarking. > > Signed-off-by: Christoph Hellwig <hch@lst.de> Looks good. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] xfs: rework zone GC buffer management 2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig 2025-12-18 7:12 ` Damien Le Moal @ 2025-12-19 8:06 ` Hans Holmberg 1 sibling, 0 replies; 15+ messages in thread From: Hans Holmberg @ 2025-12-19 8:06 UTC (permalink / raw) To: hch, Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, linux-block@vger.kernel.org, linux-xfs@vger.kernel.org On 18/12/2025 07:32, Christoph Hellwig wrote: > The double buffering where just one scratch area is used at a time does > not efficiently use the available memory. It was originally implemented > when GC I/O could happen out of order, but that was removed before > upstream submission to avoid fragmentation. Now that all GC I/Os are > processed in order, just use a number of buffers as a simple ring buffer. > > For a synthetic benchmark that fills 256MiB HDD zones and punches out > holes to free half the space this leads to a decrease of GC time by > a little more than 25%. > > Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and > benchmarking. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_zone_gc.c | 106 ++++++++++++++++++++++++------------------- > 1 file changed, 59 insertions(+), 47 deletions(-) > > diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c > index f3dec13a620c..90f70473c3d5 100644 > --- a/fs/xfs/xfs_zone_gc.c > +++ b/fs/xfs/xfs_zone_gc.c > @@ -50,23 +50,11 @@ > */ > > /* > - * Size of each GC scratch pad. This is also the upper bound for each > - * GC I/O, which helps to keep latency down. > + * Size of each GC scratch allocation, and the number of buffers. > */ > -#define XFS_GC_CHUNK_SIZE SZ_1M > - > -/* > - * Scratchpad data to read GCed data into. > - * > - * The offset member tracks where the next allocation starts, and freed tracks > - * the amount of space that is not used anymore. > - */ > -#define XFS_ZONE_GC_NR_SCRATCH 2 > -struct xfs_zone_scratch { > - struct folio *folio; > - unsigned int offset; > - unsigned int freed; > -}; > +#define XFS_GC_BUF_SIZE SZ_1M > +#define XFS_GC_NR_BUFS 2 > +static_assert(XFS_GC_NR_BUFS < BIO_MAX_VECS); > > /* > * Chunk that is read and written for each GC operation. > @@ -141,10 +129,14 @@ struct xfs_zone_gc_data { > struct bio_set bio_set; > > /* > - * Scratchpad used, and index to indicated which one is used. > + * Scratchpad to buffer GC data, organized as a ring buffer over > + * discontiguous folios. scratch_head is where the buffer is filled, > + * and scratch_tail tracks the buffer space freed. > */ > - struct xfs_zone_scratch scratch[XFS_ZONE_GC_NR_SCRATCH]; > - unsigned int scratch_idx; > + struct folio *scratch_folios[XFS_GC_NR_BUFS]; > + unsigned int scratch_size; > + unsigned int scratch_head; > + unsigned int scratch_tail; > > /* > * List of bios currently being read, written and reset. > @@ -210,20 +202,16 @@ xfs_zone_gc_data_alloc( > if (!data->iter.recs) > goto out_free_data; > > - /* > - * We actually only need a single bio_vec. It would be nice to have > - * a flag that only allocates the inline bvecs and not the separate > - * bvec pool.
> - */ > if (bioset_init(&data->bio_set, 16, offsetof(struct xfs_gc_bio, bio), > BIOSET_NEED_BVECS)) > goto out_free_recs; > - for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++) { > - data->scratch[i].folio = > - folio_alloc(GFP_KERNEL, get_order(XFS_GC_CHUNK_SIZE)); > - if (!data->scratch[i].folio) > + for (i = 0; i < XFS_GC_NR_BUFS; i++) { > + data->scratch_folios[i] = > + folio_alloc(GFP_KERNEL, get_order(XFS_GC_BUF_SIZE)); > + if (!data->scratch_folios[i]) > goto out_free_scratch; > } > + data->scratch_size = XFS_GC_BUF_SIZE * XFS_GC_NR_BUFS; > INIT_LIST_HEAD(&data->reading); > INIT_LIST_HEAD(&data->writing); > INIT_LIST_HEAD(&data->resetting); > @@ -232,7 +220,7 @@ xfs_zone_gc_data_alloc( > > out_free_scratch: > while (--i >= 0) > - folio_put(data->scratch[i].folio); > + folio_put(data->scratch_folios[i]); > bioset_exit(&data->bio_set); > out_free_recs: > kfree(data->iter.recs); > @@ -247,8 +235,8 @@ xfs_zone_gc_data_free( > { > int i; > > - for (i = 0; i < XFS_ZONE_GC_NR_SCRATCH; i++) > - folio_put(data->scratch[i].folio); > + for (i = 0; i < XFS_GC_NR_BUFS; i++) > + folio_put(data->scratch_folios[i]); > bioset_exit(&data->bio_set); > kfree(data->iter.recs); > kfree(data); > @@ -591,7 +579,12 @@ static unsigned int > xfs_zone_gc_scratch_available( > struct xfs_zone_gc_data *data) > { > - return XFS_GC_CHUNK_SIZE - data->scratch[data->scratch_idx].offset; > + if (!data->scratch_tail) > + return data->scratch_size - data->scratch_head; > + > + if (!data->scratch_head) > + return data->scratch_tail; > + return (data->scratch_size - data->scratch_head) + data->scratch_tail; > } > > static bool > @@ -665,6 +658,28 @@ xfs_zone_gc_alloc_blocks( > return oz; > } > > +static void > +xfs_zone_gc_add_data( > + struct xfs_gc_bio *chunk) > +{ > + struct xfs_zone_gc_data *data = chunk->data; > + unsigned int len = chunk->len; > + unsigned int off = data->scratch_head; > + > + do { > + unsigned int this_off = off % XFS_GC_BUF_SIZE; > + unsigned int this_len = min(len, XFS_GC_BUF_SIZE - this_off); > + > + bio_add_folio_nofail(&chunk->bio, > + data->scratch_folios[off / XFS_GC_BUF_SIZE], > + this_len, this_off); > + len -= this_len; > + off += this_len; > + if (off == data->scratch_size) > + off = 0; > + } while (len); > +} > + > static bool > xfs_zone_gc_start_chunk( > struct xfs_zone_gc_data *data) > @@ -678,6 +693,7 @@ xfs_zone_gc_start_chunk( > struct xfs_inode *ip; > struct bio *bio; > xfs_daddr_t daddr; > + unsigned int len; > bool is_seq; > > if (xfs_is_shutdown(mp)) > @@ -692,17 +708,19 @@ xfs_zone_gc_start_chunk( > return false; > } > > - bio = bio_alloc_bioset(bdev, 1, REQ_OP_READ, GFP_NOFS, &data->bio_set); > + len = XFS_FSB_TO_B(mp, irec.rm_blockcount); > + bio = bio_alloc_bioset(bdev, > + min(howmany(len, XFS_GC_BUF_SIZE) + 1, XFS_GC_NR_BUFS), > + REQ_OP_READ, GFP_NOFS, &data->bio_set); > > chunk = container_of(bio, struct xfs_gc_bio, bio); > chunk->ip = ip; > chunk->offset = XFS_FSB_TO_B(mp, irec.rm_offset); > - chunk->len = XFS_FSB_TO_B(mp, irec.rm_blockcount); > + chunk->len = len; > chunk->old_startblock = > xfs_rgbno_to_rtb(iter->victim_rtg, irec.rm_startblock); > chunk->new_daddr = daddr; > chunk->is_seq = is_seq; > - chunk->scratch = &data->scratch[data->scratch_idx]; > chunk->data = data; > chunk->oz = oz; > chunk->victim_rtg = iter->victim_rtg; > @@ -711,13 +729,9 @@ xfs_zone_gc_start_chunk( > > bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock); > bio->bi_end_io = xfs_zone_gc_end_io; > - bio_add_folio_nofail(bio, chunk->scratch->folio, chunk->len, > -
chunk->scratch->offset); > - chunk->scratch->offset += chunk->len; > - if (chunk->scratch->offset == XFS_GC_CHUNK_SIZE) { > - data->scratch_idx = > - (data->scratch_idx + 1) % XFS_ZONE_GC_NR_SCRATCH; > - } > + xfs_zone_gc_add_data(chunk); > + data->scratch_head = (data->scratch_head + len) % data->scratch_size; > + > WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW); > list_add_tail(&chunk->entry, &data->reading); > xfs_zone_gc_iter_advance(iter, irec.rm_blockcount); > @@ -835,6 +849,7 @@ xfs_zone_gc_finish_chunk( > struct xfs_gc_bio *chunk) > { > uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; > + struct xfs_zone_gc_data *data = chunk->data; > struct xfs_inode *ip = chunk->ip; > struct xfs_mount *mp = ip->i_mount; > int error; > @@ -846,11 +861,8 @@ xfs_zone_gc_finish_chunk( > return; > } > > - chunk->scratch->freed += chunk->len; > - if (chunk->scratch->freed == chunk->scratch->offset) { > - chunk->scratch->offset = 0; > - chunk->scratch->freed = 0; > - } > + data->scratch_tail = > + (data->scratch_tail + chunk->len) % data->scratch_size; > > /* > * Cycle through the iolock and wait for direct I/O and layouts to Looks good, Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
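Similarly, a userspace walk of the xfs_zone_gc_add_data() loop shows why a chunk can need more than one bvec: a range that crosses a folio boundary (or wraps around the ring) is split per folio, which is what the min(howmany(len, XFS_GC_BUF_SIZE) + 1, XFS_GC_NR_BUFS) bound on the bio's vector count accounts for. The numbers below are illustrative:

#include <stdio.h>

#define BUF_SZ	(1u << 20)
#define SIZE	(2u * BUF_SZ)

static void add_data(unsigned int head, unsigned int len)
{
	unsigned int off = head;

	do {
		unsigned int rest = BUF_SZ - off % BUF_SZ;
		unsigned int this_len = len < rest ? len : rest;

		/* stands in for bio_add_folio_nofail() on folio off / BUF_SZ */
		printf("folio %u, offset %u, len %u\n",
		       off / BUF_SZ, off % BUF_SZ, this_len);
		len -= this_len;
		off = (off + this_len) % SIZE;
	} while (len);
}

int main(void)
{
	add_data(3 * BUF_SZ / 2, 3 * BUF_SZ / 4);	/* 768 KiB at head = 1.5 MiB */
	return 0;
}

This prints two segments, one of 512 KiB at the end of the second folio and one of 256 KiB at the start of the first, matching the two bio_add_folio_nofail() calls the kernel loop would make.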
* improve zoned XFS GC buffer management v2 @ 2026-01-06 7:58 Christoph Hellwig 2026-01-06 7:58 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig 0 siblings, 1 reply; 15+ messages in thread From: Christoph Hellwig @ 2026-01-06 7:58 UTC (permalink / raw) To: Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs Hi all, the zoned XFS GC code currently uses a weird bank switching strategy to manage the scratch buffer. The reason for that was that I/O could proceed out of order in the early days, but that actually changed before the code was upstreamed to avoid fragmentation. This series replaces the logic with a simple ring buffer, which makes the buffer space utilization much more efficient. Before that, two patches make the reuse of the bios a lot less fragile, and for that we need a new block layer helper that should eventually also be useful in other places. Jens, let me know if merging the bio helper through the XFS tree is ok. Changes since v1: - assert that the reused bio is not cloned, and update the comments for bio_reuse a bit Diffstat: block/bio.c | 29 +++++++++++++ fs/xfs/xfs_zone_gc.c | 113 +++++++++++++++++++++++++++------------------------ include/linux/bio.h | 1 3 files changed, 90 insertions(+), 53 deletions(-) ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] block: add a bio_reuse helper 2026-01-06 7:58 improve zoned XFS GC buffer management v2 Christoph Hellwig @ 2026-01-06 7:58 ` Christoph Hellwig 0 siblings, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2026-01-06 7:58 UTC (permalink / raw) To: Jens Axboe, Carlos Maiolino Cc: Damien Le Moal, Hans Holmberg, linux-block, linux-xfs Add a helper to allow an existing bio to be resubmitted without having to re-add the payload. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> --- block/bio.c | 29 +++++++++++++++++++++++++ include/linux/bio.h | 1 + 2 files changed, 30 insertions(+) diff --git a/block/bio.c b/block/bio.c index e726c0e280a8..ed99368c662f 100644 --- a/block/bio.c +++ b/block/bio.c @@ -311,6 +311,35 @@ void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf) } EXPORT_SYMBOL(bio_reset); +/** + * bio_reuse - reuse a bio with the payload left intact + * @bio: bio to reuse + * + * Allow reusing an existing bio for another operation with all set up + * fields including the payload, device and end_io handler left intact. + * + * Typically used for bios first used to read data which is then written + * to another location without modification. This must be used by the + * I/O submitter on a bio that is not in flight. Can't be used for + * cloned bios. + */ +void bio_reuse(struct bio *bio) +{ + unsigned short vcnt = bio->bi_vcnt, i; + bio_end_io_t *end_io = bio->bi_end_io; + void *private = bio->bi_private; + + WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); + + bio_reset(bio, bio->bi_bdev, bio->bi_opf); + for (i = 0; i < vcnt; i++) + bio->bi_iter.bi_size += bio->bi_io_vec[i].bv_len; + bio->bi_vcnt = vcnt; + bio->bi_private = private; + bio->bi_end_io = end_io; +} +EXPORT_SYMBOL_GPL(bio_reuse); + static struct bio *__bio_chain_endio(struct bio *bio) { struct bio *parent = bio->bi_private; diff --git a/include/linux/bio.h b/include/linux/bio.h index ad2d57908c1c..c0190f8badde 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -414,6 +414,7 @@ static inline void bio_init_inline(struct bio *bio, struct block_device *bdev, } extern void bio_uninit(struct bio *); void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf); +void bio_reuse(struct bio *bio); void bio_chain(struct bio *, struct bio *); int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len, -- 2.47.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-01-06 7:59 UTC | newest] Thread overview: 15+ messages 2025-12-18 6:31 improve zoned XFS GC buffer management Christoph Hellwig 2025-12-18 6:31 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig 2025-12-18 6:43 ` Damien Le Moal 2025-12-18 9:20 ` Hannes Reinecke 2025-12-18 9:35 ` Christoph Hellwig 2025-12-18 9:40 ` Ming Lei 2025-12-18 9:45 ` Christoph Hellwig 2025-12-19 7:45 ` Hans Holmberg 2025-12-18 6:31 ` [PATCH 2/3] xfs: use bio_reuse in the zone GC code Christoph Hellwig 2025-12-18 6:45 ` Damien Le Moal 2025-12-19 7:48 ` Hans Holmberg 2025-12-18 6:31 ` [PATCH 3/3] xfs: rework zone GC buffer management Christoph Hellwig 2025-12-18 7:12 ` Damien Le Moal 2025-12-19 8:06 ` Hans Holmberg -- strict thread matches above, loose matches on Subject: below -- 2026-01-06 7:58 improve zoned XFS GC buffer management v2 Christoph Hellwig 2026-01-06 7:58 ` [PATCH 1/3] block: add a bio_reuse helper Christoph Hellwig