* [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset
@ 2025-05-12 14:43 Christoph Hellwig
2025-05-12 17:04 ` Darrick J. Wong
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-05-12 14:43 UTC (permalink / raw)
To: cem; +Cc: djwong, hans.holmberg, linux-xfs, Hans Holmberg
xfs_zone_gc_write_chunk writes out the data buffer read in earlier using
the same bio, and currenly looks at bv_offset for the offset into the
scratch folio for that. But commit 26064d3e2b4d ("block: fix adding
folio to bio") changed how bv_page and bv_offset are calculated for
adding larger folios, breaking this fragile logic.
Switch to extracting the full physical address from the old bio_vec,
and calculate the offset into the folio from that instead.
This fixes data corruption during garbage collection with heavy rockdsb
workloads. Thanks to Hans for tracking down the culprit commit during
long bisection sessions.
Fixes: 26064d3e2b4d ("block: fix adding folio to bio")
Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_zone_gc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index 8c541ca71872..a045b1dedd68 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -801,7 +801,8 @@ xfs_zone_gc_write_chunk(
{
struct xfs_zone_gc_data *data = chunk->data;
struct xfs_mount *mp = chunk->ip->i_mount;
- unsigned int folio_offset = chunk->bio.bi_io_vec->bv_offset;
+ phys_addr_t bvec_paddr =
+ bvec_phys(bio_first_bvec_all(&chunk->bio));
struct xfs_gc_bio *split_chunk;
if (chunk->bio.bi_status)
@@ -816,7 +817,7 @@ xfs_zone_gc_write_chunk(
bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE);
bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len,
- folio_offset);
+ offset_in_folio(chunk->scratch->folio, bvec_paddr));
while ((split_chunk = xfs_zone_gc_split_write(data, chunk)))
xfs_zone_gc_submit_write(data, split_chunk);
--
2.47.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset
2025-05-12 14:43 [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset Christoph Hellwig
@ 2025-05-12 17:04 ` Darrick J. Wong
2025-05-13 5:20 ` Christoph Hellwig
2025-05-13 6:56 ` Hans Holmberg
2025-05-13 17:35 ` Carlos Maiolino
2 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2025-05-12 17:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: cem, hans.holmberg, linux-xfs
On Mon, May 12, 2025 at 04:43:05PM +0200, Christoph Hellwig wrote:
> xfs_zone_gc_write_chunk writes out the data buffer read in earlier using
> the same bio, and currenly looks at bv_offset for the offset into the
> scratch folio for that. But commit 26064d3e2b4d ("block: fix adding
> folio to bio") changed how bv_page and bv_offset are calculated for
> adding larger folios, breaking this fragile logic.
>
> Switch to extracting the full physical address from the old bio_vec,
> and calculate the offset into the folio from that instead.
>
> This fixes data corruption during garbage collection with heavy rockdsb
> workloads. Thanks to Hans for tracking down the culprit commit during
> long bisection sessions.
>
> Fixes: 26064d3e2b4d ("block: fix adding folio to bio")
> Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
> Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_zone_gc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
> index 8c541ca71872..a045b1dedd68 100644
> --- a/fs/xfs/xfs_zone_gc.c
> +++ b/fs/xfs/xfs_zone_gc.c
> @@ -801,7 +801,8 @@ xfs_zone_gc_write_chunk(
> {
> struct xfs_zone_gc_data *data = chunk->data;
> struct xfs_mount *mp = chunk->ip->i_mount;
> - unsigned int folio_offset = chunk->bio.bi_io_vec->bv_offset;
> + phys_addr_t bvec_paddr =
> + bvec_phys(bio_first_bvec_all(&chunk->bio));
Hmm. I started wondering why you can't reuse chunk->scratch->offset in
the bio_add_folio_nofail call below. I /think/ that's because
xfs_zone_gc_start_chunk increments chunk->scratch->offset after adding
the folio to the bio?
bio_add_folio_nofail(bio, chunk->scratch->folio, chunk->len,
chunk->scratch->offset);
chunk->scratch->offset += chunk->len;
And then we can attach the same scratch->folio to different read bios.
Each bio gets a different offset within the folio, right? So
xfs_zone_gc_write_chunk really does need to find the offset_in_folio
from the read bio. And I guess that's why you use bvec_phys to figure
that out (instead of, say, wasting space recording it again in the
xfs_gc_bio), correct?
If the answers to all my questions are 'yes' then I think I grok this
sufficiently to say
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> struct xfs_gc_bio *split_chunk;
>
> if (chunk->bio.bi_status)
> @@ -816,7 +817,7 @@ xfs_zone_gc_write_chunk(
>
> bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE);
> bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len,
> - folio_offset);
> + offset_in_folio(chunk->scratch->folio, bvec_paddr));
>
> while ((split_chunk = xfs_zone_gc_split_write(data, chunk)))
> xfs_zone_gc_submit_write(data, split_chunk);
> --
> 2.47.2
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset
2025-05-12 17:04 ` Darrick J. Wong
@ 2025-05-13 5:20 ` Christoph Hellwig
0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-05-13 5:20 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, hans.holmberg, linux-xfs
On Mon, May 12, 2025 at 10:04:00AM -0700, Darrick J. Wong wrote:
> Hmm. I started wondering why you can't reuse chunk->scratch->offset in
> the bio_add_folio_nofail call below. I /think/ that's because
> xfs_zone_gc_start_chunk increments chunk->scratch->offset after adding
> the folio to the bio?
Exactly, it's a basically a circular buffer.
> And then we can attach the same scratch->folio to different read bios.
> Each bio gets a different offset within the folio, right?
Yes.
> So
> xfs_zone_gc_write_chunk really does need to find the offset_in_folio
> from the read bio. And I guess that's why you use bvec_phys to figure
> that out (instead of, say, wasting space recording it again in the
> xfs_gc_bio), correct?
Yes.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset
2025-05-12 14:43 [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset Christoph Hellwig
2025-05-12 17:04 ` Darrick J. Wong
@ 2025-05-13 6:56 ` Hans Holmberg
2025-05-13 17:35 ` Carlos Maiolino
2 siblings, 0 replies; 5+ messages in thread
From: Hans Holmberg @ 2025-05-13 6:56 UTC (permalink / raw)
To: hch, cem@kernel.org; +Cc: djwong@kernel.org, linux-xfs@vger.kernel.org
On 12/05/2025 16:43, Christoph Hellwig wrote:
> xfs_zone_gc_write_chunk writes out the data buffer read in earlier using
> the same bio, and currenly looks at bv_offset for the offset into the
> scratch folio for that. But commit 26064d3e2b4d ("block: fix adding
> folio to bio") changed how bv_page and bv_offset are calculated for
> adding larger folios, breaking this fragile logic.
>
> Switch to extracting the full physical address from the old bio_vec,
> and calculate the offset into the folio from that instead.
>
> This fixes data corruption during garbage collection with heavy rockdsb
> workloads. Thanks to Hans for tracking down the culprit commit during
> long bisection sessions.
>
> Fixes: 26064d3e2b4d ("block: fix adding folio to bio")
> Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
> Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_zone_gc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
> index 8c541ca71872..a045b1dedd68 100644
> --- a/fs/xfs/xfs_zone_gc.c
> +++ b/fs/xfs/xfs_zone_gc.c
> @@ -801,7 +801,8 @@ xfs_zone_gc_write_chunk(
> {
> struct xfs_zone_gc_data *data = chunk->data;
> struct xfs_mount *mp = chunk->ip->i_mount;
> - unsigned int folio_offset = chunk->bio.bi_io_vec->bv_offset;
> + phys_addr_t bvec_paddr =
> + bvec_phys(bio_first_bvec_all(&chunk->bio));
> struct xfs_gc_bio *split_chunk;
>
> if (chunk->bio.bi_status)
> @@ -816,7 +817,7 @@ xfs_zone_gc_write_chunk(
>
> bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE);
> bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len,
> - folio_offset);
> + offset_in_folio(chunk->scratch->folio, bvec_paddr));
>
> while ((split_chunk = xfs_zone_gc_split_write(data, chunk)))
> xfs_zone_gc_submit_write(data, split_chunk);
Looks good and fixes the gc data corruption issue, thanks!
Reviewed-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Tested-by: Hans Holmberg <Hans.Holmberg@wdc.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset
2025-05-12 14:43 [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset Christoph Hellwig
2025-05-12 17:04 ` Darrick J. Wong
2025-05-13 6:56 ` Hans Holmberg
@ 2025-05-13 17:35 ` Carlos Maiolino
2 siblings, 0 replies; 5+ messages in thread
From: Carlos Maiolino @ 2025-05-13 17:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: djwong, hans.holmberg, linux-xfs, Hans Holmberg
On Mon, 12 May 2025 16:43:05 +0200, Christoph Hellwig wrote:
> xfs_zone_gc_write_chunk writes out the data buffer read in earlier using
> the same bio, and currenly looks at bv_offset for the offset into the
> scratch folio for that. But commit 26064d3e2b4d ("block: fix adding
> folio to bio") changed how bv_page and bv_offset are calculated for
> adding larger folios, breaking this fragile logic.
>
> Switch to extracting the full physical address from the old bio_vec,
> and calculate the offset into the folio from that instead.
>
> [...]
Applied to for-next, thanks!
[1/1] xfs: fix zoned GC data corruption due to wrong bv_offset
commit: 91ffea7cf2f0b53686bba24c3bd86fa5cedd48d6
Best regards,
--
Carlos Maiolino <cem@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-05-13 17:35 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-12 14:43 [PATCH] xfs: fix zoned GC data corruption due to wrong bv_offset Christoph Hellwig
2025-05-12 17:04 ` Darrick J. Wong
2025-05-13 5:20 ` Christoph Hellwig
2025-05-13 6:56 ` Hans Holmberg
2025-05-13 17:35 ` Carlos Maiolino
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).