* [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes
@ 2025-11-03 17:40 Darrick J. Wong
2025-11-03 17:44 ` [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin Darrick J. Wong
2025-11-04 10:08 ` [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes John Garry
0 siblings, 2 replies; 8+ messages in thread
From: Darrick J. Wong @ 2025-11-03 17:40 UTC (permalink / raw)
To: Carlos Maiolino
Cc: Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, john.g.garry,
linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
With the 20 Oct 2025 release of fstests, generic/521 fails for me on
regular (aka non-block-atomic-writes) storage:
QA output created by 521
dowrite: write: Input/output error
LOG DUMP (8553 total operations):
1( 1 mod 256): SKIPPED (no operation)
2( 2 mod 256): WRITE 0x7e000 thru 0x8dfff (0x10000 bytes) HOLE
3( 3 mod 256): READ 0x69000 thru 0x79fff (0x11000 bytes)
4( 4 mod 256): FALLOC 0x53c38 thru 0x5e853 (0xac1b bytes) INTERIOR
5( 5 mod 256): COPY 0x55000 thru 0x59fff (0x5000 bytes) to 0x25000 thru 0x29fff
6( 6 mod 256): WRITE 0x74000 thru 0x88fff (0x15000 bytes)
7( 7 mod 256): ZERO 0xedb1 thru 0x11693 (0x28e3 bytes)
with a warning in dmesg from iomap about XFS trying to give it a
delalloc mapping for a directio write. Fix the software atomic write
iomap_begin code to convert the reservation into a written mapping.
This doesn't fix the data corruption problems reported by generic/760,
but it's a start.
Cc: <stable@vger.kernel.org> # v6.16
Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_iomap.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d3f6e3e42a1191..e1da06b157cf94 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1130,7 +1130,7 @@ xfs_atomic_write_cow_iomap_begin(
return -EAGAIN;
trace_xfs_iomap_atomic_write_cow(ip, offset, length);
-
+retry:
xfs_ilock(ip, XFS_ILOCK_EXCL);
if (!ip->i_cowfp) {
@@ -1141,6 +1141,8 @@ xfs_atomic_write_cow_iomap_begin(
if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap))
cmap.br_startoff = end_fsb;
if (cmap.br_startoff <= offset_fsb) {
+ if (isnullstartblock(cmap.br_startblock))
+ goto convert;
xfs_trim_extent(&cmap, offset_fsb, count_fsb);
goto found;
}
@@ -1169,8 +1171,10 @@ xfs_atomic_write_cow_iomap_begin(
if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap))
cmap.br_startoff = end_fsb;
if (cmap.br_startoff <= offset_fsb) {
- xfs_trim_extent(&cmap, offset_fsb, count_fsb);
xfs_trans_cancel(tp);
+ if (isnullstartblock(cmap.br_startblock))
+ goto convert;
+ xfs_trim_extent(&cmap, offset_fsb, count_fsb);
goto found;
}
@@ -1210,6 +1214,19 @@ xfs_atomic_write_cow_iomap_begin(
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq);
+convert:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ error = xfs_bmapi_convert_delalloc(ip, XFS_COW_FORK, offset, iomap,
+ NULL);
+ if (error)
+ return error;
+
+ /*
+ * Try the lookup again, because the delalloc conversion might have
+ * turned the COW mapping into unwritten, but we need it to be in
+ * written state.
+ */
+ goto retry;
out_unlock:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
return error;
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin 2025-11-03 17:40 [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes Darrick J. Wong @ 2025-11-03 17:44 ` Darrick J. Wong 2025-11-04 12:07 ` John Garry 2025-11-04 10:08 ` [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes John Garry 1 sibling, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2025-11-03 17:44 UTC (permalink / raw) To: Carlos Maiolino Cc: Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, john.g.garry, linux-xfs From: Darrick J. Wong <djwong@kernel.org> I think there are several things wrong with this function: A) xfs_bmapi_write can return a much larger unwritten mapping than what the caller asked for. We convert part of that range to written, but return the entire written mapping to iomap even though that's inaccurate. B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an unwritten mapping could be *smaller* than the write range (or even the hole range). In this case, we convert too much file range to written state because we then return a smaller mapping to iomap. C) It doesn't handle delalloc mappings. This I covered in the patch that I already sent to the list. D) Reassigning count_fsb to handle the hole means that if the second cmap lookup attempt succeeds (due to racing with someone else) we trim the mapping more than is strictly necessary. The changing meaning of count_fsb makes this harder to notice. E) The tracepoint is kinda wrong because @length is mutated. That makes it harder to chase the data flows through this function because you can't just grep on the pos/bytecount strings. F) We don't actually check that the br_state = XFS_EXT_NORM assignment is accurate, i.e that the cow fork actually contains a written mapping for the range we're interested in G) Somewhat inadequate documentation of why we need to xfs_trim_extent so aggressively in this function. H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped the write range to s_maxbytes. Fix these issues, and then the atomic writes regressions in generic/760, generic/617, generic/091, generic/263, and generic/521 all go away for me. Cc: <stable@vger.kernel.org> # v6.16 Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> --- fs/xfs/xfs_iomap.c | 60 ++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 49 insertions(+), 11 deletions(-) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index e1da06b157cf94..469f34034daddd 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1091,6 +1091,29 @@ const struct iomap_ops xfs_zoned_direct_write_iomap_ops = { }; #endif /* CONFIG_XFS_RT */ +#ifdef DEBUG +static void +xfs_check_atomic_conversion( + struct xfs_inode *ip, + xfs_fileoff_t offset_fsb, + xfs_filblks_t count_fsb, + const struct xfs_bmbt_irec *cmap) +{ + struct xfs_iext_cursor icur; + struct xfs_bmbt_irec cmap2 = { }; + + if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap2)) + xfs_trim_extent(&cmap2, offset_fsb, count_fsb); + + ASSERT(cmap2.br_startoff == cmap->br_startoff); + ASSERT(cmap2.br_blockcount == cmap->br_blockcount); + ASSERT(cmap2.br_startblock == cmap->br_startblock); + ASSERT(cmap2.br_state == cmap->br_state); +} +#else +# define xfs_check_atomic_conversion(...) ((void)0) +#endif + static int xfs_atomic_write_cow_iomap_begin( struct inode *inode, @@ -1102,9 +1125,10 @@ xfs_atomic_write_cow_iomap_begin( { struct xfs_inode *ip = XFS_I(inode); struct xfs_mount *mp = ip->i_mount; - const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); - xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length); - xfs_filblks_t count_fsb = end_fsb - offset_fsb; + const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); + const xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + length); + const xfs_filblks_t count_fsb = end_fsb - offset_fsb; + xfs_filblks_t hole_count_fsb; int nmaps = 1; xfs_filblks_t resaligned; struct xfs_bmbt_irec cmap; @@ -1143,14 +1167,20 @@ xfs_atomic_write_cow_iomap_begin( if (cmap.br_startoff <= offset_fsb) { if (isnullstartblock(cmap.br_startblock)) goto convert; + + /* + * cmap could extend outside the write range due to previous + * speculative preallocations. We must trim cmap to the write + * range because the cow fork treats written mappings to mean + * "write in progress". + */ xfs_trim_extent(&cmap, offset_fsb, count_fsb); goto found; } - end_fsb = cmap.br_startoff; - count_fsb = end_fsb - offset_fsb; + hole_count_fsb = cmap.br_startoff - offset_fsb; - resaligned = xfs_aligned_fsb_count(offset_fsb, count_fsb, + resaligned = xfs_aligned_fsb_count(offset_fsb, hole_count_fsb, xfs_get_cowextsz_hint(ip)); xfs_iunlock(ip, XFS_ILOCK_EXCL); @@ -1186,7 +1216,7 @@ xfs_atomic_write_cow_iomap_begin( * atomic writes to that same range will be aligned (and don't require * this COW-based method). */ - error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, + error = xfs_bmapi_write(tp, ip, offset_fsb, hole_count_fsb, XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC | XFS_BMAPI_EXTSZALIGN, 0, &cmap, &nmaps); if (error) { @@ -1199,17 +1229,25 @@ xfs_atomic_write_cow_iomap_begin( if (error) goto out_unlock; + /* + * cmap could map more blocks than the range we passed into bmapi_write + * because of EXTSZALIGN or adjacent pre-existing unwritten mappings + * that were merged. Trim cmap to the original write range so that we + * don't convert more than we were asked to do for this write. + */ + xfs_trim_extent(&cmap, offset_fsb, count_fsb); + found: if (cmap.br_state != XFS_EXT_NORM) { - error = xfs_reflink_convert_cow_locked(ip, offset_fsb, - count_fsb); + error = xfs_reflink_convert_cow_locked(ip, cmap.br_startoff, + cmap.br_blockcount); if (error) goto out_unlock; cmap.br_state = XFS_EXT_NORM; + xfs_check_atomic_conversion(ip, offset_fsb, count_fsb, &cmap); } - length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount); - trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap); + trace_xfs_iomap_found(ip, offset, length, XFS_COW_FORK, &cmap); seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED); xfs_iunlock(ip, XFS_ILOCK_EXCL); return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin 2025-11-03 17:44 ` [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin Darrick J. Wong @ 2025-11-04 12:07 ` John Garry 2025-11-04 17:18 ` Darrick J. Wong 0 siblings, 1 reply; 8+ messages in thread From: John Garry @ 2025-11-04 12:07 UTC (permalink / raw) To: Darrick J. Wong, Carlos Maiolino Cc: Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, linux-xfs On 03/11/2025 17:44, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > I think there are several things wrong with this function: > > A) xfs_bmapi_write can return a much larger unwritten mapping than what > the caller asked for. We convert part of that range to written, but > return the entire written mapping to iomap even though that's > inaccurate. > > B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an > unwritten mapping could be *smaller* than the write range (or even > the hole range). In this case, we convert too much file range to > written state because we then return a smaller mapping to iomap. > > C) It doesn't handle delalloc mappings. This I covered in the patch > that I already sent to the list. > > D) Reassigning count_fsb to handle the hole means that if the second > cmap lookup attempt succeeds (due to racing with someone else) we > trim the mapping more than is strictly necessary. The changing > meaning of count_fsb makes this harder to notice. > > E) The tracepoint is kinda wrong because @length is mutated. That makes > it harder to chase the data flows through this function because you > can't just grep on the pos/bytecount strings. > > F) We don't actually check that the br_state = XFS_EXT_NORM assignment > is accurate, i.e that the cow fork actually contains a written > mapping for the range we're interested in > > G) Somewhat inadequate documentation of why we need to xfs_trim_extent > so aggressively in this function. > > H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped > the write range to s_maxbytes. Can you point out that code? I wonder if we should be rejecting anything which goes over s_maxbytes for RWF_ATOMIC. > > Fix these issues, and then the atomic writes regressions in generic/760, > generic/617, generic/091, generic/263, and generic/521 all go away for > me. > > Cc: <stable@vger.kernel.org> # v6.16 > Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> this looks ok, thanks Reviewed-by: John Garry <john.g.garry@oracle.com> I'm just doing powerfail testing - I'll let you know if any issues found > --- > fs/xfs/xfs_iomap.c | 60 ++++++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 49 insertions(+), 11 deletions(-) > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index e1da06b157cf94..469f34034daddd 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -1091,6 +1091,29 @@ const struct iomap_ops xfs_zoned_direct_write_iomap_ops = { > }; > #endif /* CONFIG_XFS_RT */ > > +#ifdef DEBUG > +static void > +xfs_check_atomic_conversion( xfs_check_atomic_cow_conversion() might be better, I don't think that it is so important > + struct xfs_inode *ip, > + xfs_fileoff_t offset_fsb, > + xfs_filblks_t count_fsb, > + const struct xfs_bmbt_irec *cmap) > +{ > + struct xfs_iext_cursor icur; > + struct xfs_bmbt_irec cmap2 = { }; > + > + if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap2)) > + xfs_trim_extent(&cmap2, offset_fsb, count_fsb); > + > + ASSERT(cmap2.br_startoff == cmap->br_startoff); > + ASSERT(cmap2.br_blockcount == cmap->br_blockcount); > + ASSERT(cmap2.br_startblock == cmap->br_startblock); > + ASSERT(cmap2.br_state == cmap->br_state); > +} > +#else > +# define xfs_check_atomic_conversion(...) ((void)0) > +#endif > + > static int > xfs_atomic_write_cow_iomap_begin( > struct inode *inode, > @@ -1102,9 +1125,10 @@ xfs_atomic_write_cow_iomap_begin( > { > struct xfs_inode *ip = XFS_I(inode); > struct xfs_mount *mp = ip->i_mount; > - const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); > - xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length); > - xfs_filblks_t count_fsb = end_fsb - offset_fsb; > + const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); > + const xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + length); > + const xfs_filblks_t count_fsb = end_fsb - offset_fsb; > + xfs_filblks_t hole_count_fsb; > int nmaps = 1; > xfs_filblks_t resaligned; > struct xfs_bmbt_irec cmap; > @@ -1143,14 +1167,20 @@ xfs_atomic_write_cow_iomap_begin( > if (cmap.br_startoff <= offset_fsb) { > if (isnullstartblock(cmap.br_startblock)) > goto convert; > + > + /* > + * cmap could extend outside the write range due to previous > + * speculative preallocations. We must trim cmap to the write > + * range because the cow fork treats written mappings to mean > + * "write in progress". > + */ > xfs_trim_extent(&cmap, offset_fsb, count_fsb); > goto found; > } > > - end_fsb = cmap.br_startoff; > - count_fsb = end_fsb - offset_fsb; > + hole_count_fsb = cmap.br_startoff - offset_fsb; > > - resaligned = xfs_aligned_fsb_count(offset_fsb, count_fsb, > + resaligned = xfs_aligned_fsb_count(offset_fsb, hole_count_fsb, > xfs_get_cowextsz_hint(ip)); > xfs_iunlock(ip, XFS_ILOCK_EXCL); > > @@ -1186,7 +1216,7 @@ xfs_atomic_write_cow_iomap_begin( > * atomic writes to that same range will be aligned (and don't require > * this COW-based method). > */ > - error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, > + error = xfs_bmapi_write(tp, ip, offset_fsb, hole_count_fsb, > XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC | > XFS_BMAPI_EXTSZALIGN, 0, &cmap, &nmaps); > if (error) { > @@ -1199,17 +1229,25 @@ xfs_atomic_write_cow_iomap_begin( > if (error) > goto out_unlock; > > + /* > + * cmap could map more blocks than the range we passed into bmapi_write > + * because of EXTSZALIGN or adjacent pre-existing unwritten mappings > + * that were merged. Trim cmap to the original write range so that we > + * don't convert more than we were asked to do for this write. > + */ > + xfs_trim_extent(&cmap, offset_fsb, count_fsb); > + > found: > if (cmap.br_state != XFS_EXT_NORM) { > - error = xfs_reflink_convert_cow_locked(ip, offset_fsb, > - count_fsb); > + error = xfs_reflink_convert_cow_locked(ip, cmap.br_startoff, > + cmap.br_blockcount); > if (error) > goto out_unlock; > cmap.br_state = XFS_EXT_NORM; > + xfs_check_atomic_conversion(ip, offset_fsb, count_fsb, &cmap); > } > > - length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount); > - trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap); > + trace_xfs_iomap_found(ip, offset, length, XFS_COW_FORK, &cmap); > seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED); > xfs_iunlock(ip, XFS_ILOCK_EXCL); > return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin 2025-11-04 12:07 ` John Garry @ 2025-11-04 17:18 ` Darrick J. Wong 2025-11-05 12:21 ` John Garry 0 siblings, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2025-11-04 17:18 UTC (permalink / raw) To: John Garry Cc: Carlos Maiolino, Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, linux-xfs On Tue, Nov 04, 2025 at 12:07:53PM +0000, John Garry wrote: > On 03/11/2025 17:44, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > I think there are several things wrong with this function: > > > > A) xfs_bmapi_write can return a much larger unwritten mapping than what > > the caller asked for. We convert part of that range to written, but > > return the entire written mapping to iomap even though that's > > inaccurate. > > > > B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an > > unwritten mapping could be *smaller* than the write range (or even > > the hole range). In this case, we convert too much file range to > > written state because we then return a smaller mapping to iomap. > > > > C) It doesn't handle delalloc mappings. This I covered in the patch > > that I already sent to the list. > > > > D) Reassigning count_fsb to handle the hole means that if the second > > cmap lookup attempt succeeds (due to racing with someone else) we > > trim the mapping more than is strictly necessary. The changing > > meaning of count_fsb makes this harder to notice. > > > > E) The tracepoint is kinda wrong because @length is mutated. That makes > > it harder to chase the data flows through this function because you > > can't just grep on the pos/bytecount strings. > > > > F) We don't actually check that the br_state = XFS_EXT_NORM assignment > > is accurate, i.e that the cow fork actually contains a written > > mapping for the range we're interested in > > > > G) Somewhat inadequate documentation of why we need to xfs_trim_extent > > so aggressively in this function. > > > > H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped > > the write range to s_maxbytes. > > Can you point out that code? I wonder if we should be rejecting anything > which goes over s_maxbytes for RWF_ATOMIC. generic_write_check_limits: https://elixir.bootlin.com/linux/v6.17.7/source/fs/read_write.c#L1709 xfs_file_dio_write_atomic -> xfs_file_write_checks -> generic_write_checks -> generic_write_checks_count -> generic_write_check_limits > > > > Fix these issues, and then the atomic writes regressions in generic/760, > > generic/617, generic/091, generic/263, and generic/521 all go away for > > me. > > > > Cc: <stable@vger.kernel.org> # v6.16 > > Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > > this looks ok, thanks > > Reviewed-by: John Garry <john.g.garry@oracle.com> > > I'm just doing powerfail testing - I'll let you know if any issues found > > > --- > > fs/xfs/xfs_iomap.c | 60 ++++++++++++++++++++++++++++++++++++++++++---------- > > 1 file changed, 49 insertions(+), 11 deletions(-) > > > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > > index e1da06b157cf94..469f34034daddd 100644 > > --- a/fs/xfs/xfs_iomap.c > > +++ b/fs/xfs/xfs_iomap.c > > @@ -1091,6 +1091,29 @@ const struct iomap_ops xfs_zoned_direct_write_iomap_ops = { > > }; > > #endif /* CONFIG_XFS_RT */ > > +#ifdef DEBUG > > +static void > > +xfs_check_atomic_conversion( > > xfs_check_atomic_cow_conversion() might be better, I don't think that it is > so important Ok. --D > > + struct xfs_inode *ip, > > + xfs_fileoff_t offset_fsb, > > + xfs_filblks_t count_fsb, > > + const struct xfs_bmbt_irec *cmap) > > +{ > > + struct xfs_iext_cursor icur; > > + struct xfs_bmbt_irec cmap2 = { }; > > + > > + if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap2)) > > + xfs_trim_extent(&cmap2, offset_fsb, count_fsb); > > + > > + ASSERT(cmap2.br_startoff == cmap->br_startoff); > > + ASSERT(cmap2.br_blockcount == cmap->br_blockcount); > > + ASSERT(cmap2.br_startblock == cmap->br_startblock); > > + ASSERT(cmap2.br_state == cmap->br_state); > > +} > > +#else > > +# define xfs_check_atomic_conversion(...) ((void)0) > > +#endif > > + > > static int > > xfs_atomic_write_cow_iomap_begin( > > struct inode *inode, > > @@ -1102,9 +1125,10 @@ xfs_atomic_write_cow_iomap_begin( > > { > > struct xfs_inode *ip = XFS_I(inode); > > struct xfs_mount *mp = ip->i_mount; > > - const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); > > - xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length); > > - xfs_filblks_t count_fsb = end_fsb - offset_fsb; > > + const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); > > + const xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + length); > > + const xfs_filblks_t count_fsb = end_fsb - offset_fsb; > > + xfs_filblks_t hole_count_fsb; > > int nmaps = 1; > > xfs_filblks_t resaligned; > > struct xfs_bmbt_irec cmap; > > @@ -1143,14 +1167,20 @@ xfs_atomic_write_cow_iomap_begin( > > if (cmap.br_startoff <= offset_fsb) { > > if (isnullstartblock(cmap.br_startblock)) > > goto convert; > > + > > + /* > > + * cmap could extend outside the write range due to previous > > + * speculative preallocations. We must trim cmap to the write > > + * range because the cow fork treats written mappings to mean > > + * "write in progress". > > + */ > > xfs_trim_extent(&cmap, offset_fsb, count_fsb); > > goto found; > > } > > - end_fsb = cmap.br_startoff; > > - count_fsb = end_fsb - offset_fsb; > > + hole_count_fsb = cmap.br_startoff - offset_fsb; > > - resaligned = xfs_aligned_fsb_count(offset_fsb, count_fsb, > > + resaligned = xfs_aligned_fsb_count(offset_fsb, hole_count_fsb, > > xfs_get_cowextsz_hint(ip)); > > xfs_iunlock(ip, XFS_ILOCK_EXCL); > > @@ -1186,7 +1216,7 @@ xfs_atomic_write_cow_iomap_begin( > > * atomic writes to that same range will be aligned (and don't require > > * this COW-based method). > > */ > > - error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, > > + error = xfs_bmapi_write(tp, ip, offset_fsb, hole_count_fsb, > > XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC | > > XFS_BMAPI_EXTSZALIGN, 0, &cmap, &nmaps); > > if (error) { > > @@ -1199,17 +1229,25 @@ xfs_atomic_write_cow_iomap_begin( > > if (error) > > goto out_unlock; > > + /* > > + * cmap could map more blocks than the range we passed into bmapi_write > > + * because of EXTSZALIGN or adjacent pre-existing unwritten mappings > > + * that were merged. Trim cmap to the original write range so that we > > + * don't convert more than we were asked to do for this write. > > + */ > > + xfs_trim_extent(&cmap, offset_fsb, count_fsb); > > + > > found: > > if (cmap.br_state != XFS_EXT_NORM) { > > - error = xfs_reflink_convert_cow_locked(ip, offset_fsb, > > - count_fsb); > > + error = xfs_reflink_convert_cow_locked(ip, cmap.br_startoff, > > + cmap.br_blockcount); > > if (error) > > goto out_unlock; > > cmap.br_state = XFS_EXT_NORM; > > + xfs_check_atomic_conversion(ip, offset_fsb, count_fsb, &cmap); > > } > > - length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount); > > - trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap); > > + trace_xfs_iomap_found(ip, offset, length, XFS_COW_FORK, &cmap); > > seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED); > > xfs_iunlock(ip, XFS_ILOCK_EXCL); > > return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq); > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin 2025-11-04 17:18 ` Darrick J. Wong @ 2025-11-05 12:21 ` John Garry 2025-11-05 19:18 ` Darrick J. Wong 0 siblings, 1 reply; 8+ messages in thread From: John Garry @ 2025-11-05 12:21 UTC (permalink / raw) To: Darrick J. Wong Cc: Carlos Maiolino, Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, linux-xfs On 04/11/2025 17:18, Darrick J. Wong wrote: >> Can you point out that code? I wonder if we should be rejecting anything >> which goes over s_maxbytes for RWF_ATOMIC. > generic_write_check_limits: > https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.17.7/ > source/fs/read_write.c*L1709__;Iw!!ACWV5N9M2RV99hQ!M- > J1QmUWrGHUBb6gf9LHN33HxhBfk3rHcNR5z_glUHClffIPQ5UFQ1zmHpsetFGz3_3lHzUrwyAASD_90A$ > > xfs_file_dio_write_atomic -> xfs_file_write_checks -> > generic_write_checks -> generic_write_checks_count -> > generic_write_check_limits ok, thanks for the pointer. So should we stop any possible truncation for RWF_ATOMIC, like: +++ b/fs/xfs/xfs_file.c @@ -440,6 +440,9 @@ xfs_file_write_checks( if (error <= 0) return error; + if (error != count && iocb->ki_flags & IOCB_ATOMIC) + return -EINVAL; + if (iocb->ki_flags & IOCB_NOWAIT) { error = break_layout(inode, false); if (error == -EWOULDBLOCK) Note that I do realize that this may be better in generic fs code. Thanks, John ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin 2025-11-05 12:21 ` John Garry @ 2025-11-05 19:18 ` Darrick J. Wong 0 siblings, 0 replies; 8+ messages in thread From: Darrick J. Wong @ 2025-11-05 19:18 UTC (permalink / raw) To: John Garry Cc: Carlos Maiolino, Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, linux-xfs On Wed, Nov 05, 2025 at 12:21:15PM +0000, John Garry wrote: > On 04/11/2025 17:18, Darrick J. Wong wrote: > > > Can you point out that code? I wonder if we should be rejecting anything > > > which goes over s_maxbytes for RWF_ATOMIC. > > generic_write_check_limits: > > https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.17.7/ > > source/fs/read_write.c*L1709__;Iw!!ACWV5N9M2RV99hQ!M- J1QmUWrGHUBb6gf9LHN33HxhBfk3rHcNR5z_glUHClffIPQ5UFQ1zmHpsetFGz3_3lHzUrwyAASD_90A$ > > > > xfs_file_dio_write_atomic -> xfs_file_write_checks -> > > generic_write_checks -> generic_write_checks_count -> > > generic_write_check_limits > > ok, thanks for the pointer. > > So should we stop any possible truncation for RWF_ATOMIC, like: > > +++ b/fs/xfs/xfs_file.c > @@ -440,6 +440,9 @@ xfs_file_write_checks( > if (error <= 0) > return error; > > + if (error != count && iocb->ki_flags & IOCB_ATOMIC) > + return -EINVAL; > + > if (iocb->ki_flags & IOCB_NOWAIT) { > error = break_layout(inode, false); > if (error == -EWOULDBLOCK) > > > Note that I do realize that this may be better in generic fs code. Yeah, I think that'd be better placed in generic_write_checks_count if _limit() changes &count. --D > Thanks, > John > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes 2025-11-03 17:40 [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes Darrick J. Wong 2025-11-03 17:44 ` [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin Darrick J. Wong @ 2025-11-04 10:08 ` John Garry 2025-11-04 17:24 ` Darrick J. Wong 1 sibling, 1 reply; 8+ messages in thread From: John Garry @ 2025-11-04 10:08 UTC (permalink / raw) To: Darrick J. Wong, Carlos Maiolino Cc: Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, linux-xfs On 03/11/2025 17:40, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > With the 20 Oct 2025 release of fstests, generic/521 fails for me on > regular (aka non-block-atomic-writes) storage: > > QA output created by 521 > dowrite: write: Input/output error > LOG DUMP (8553 total operations): > 1( 1 mod 256): SKIPPED (no operation) > 2( 2 mod 256): WRITE 0x7e000 thru 0x8dfff (0x10000 bytes) HOLE > 3( 3 mod 256): READ 0x69000 thru 0x79fff (0x11000 bytes) > 4( 4 mod 256): FALLOC 0x53c38 thru 0x5e853 (0xac1b bytes) INTERIOR > 5( 5 mod 256): COPY 0x55000 thru 0x59fff (0x5000 bytes) to 0x25000 thru 0x29fff > 6( 6 mod 256): WRITE 0x74000 thru 0x88fff (0x15000 bytes) > 7( 7 mod 256): ZERO 0xedb1 thru 0x11693 (0x28e3 bytes) > > with a warning in dmesg from iomap about XFS trying to give it a > delalloc mapping for a directio write. Fix the software atomic write > iomap_begin code to convert the reservation into a written mapping. > This doesn't fix the data corruption problems reported by generic/760, > but it's a start. > > Cc: <stable@vger.kernel.org> # v6.16 > Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> FWIW: Reviewed-by: John Garry <john.g.garry@oracle.com> > --- > fs/xfs/xfs_iomap.c | 21 +++++++++++++++++++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index d3f6e3e42a1191..e1da06b157cf94 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -1130,7 +1130,7 @@ xfs_atomic_write_cow_iomap_begin( > return -EAGAIN; > > trace_xfs_iomap_atomic_write_cow(ip, offset, length); > - > +retry: > xfs_ilock(ip, XFS_ILOCK_EXCL); > > if (!ip->i_cowfp) { > @@ -1141,6 +1141,8 @@ xfs_atomic_write_cow_iomap_begin( > if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap)) > cmap.br_startoff = end_fsb; > if (cmap.br_startoff <= offset_fsb) { > + if (isnullstartblock(cmap.br_startblock)) This following comment is unrelated to this patch and is only relevant to pre-existing code: isnullstartblock() seems to be a check specific to delayed allocation, so I don't why "null" is used in the name, and not "delalloc" or something else more specific. I guess that there is some history here (behind the naming). > + goto convert; > xfs_trim_extent(&cmap, offset_fsb, count_fsb); > goto found; > } > @@ -1169,8 +1171,10 @@ xfs_atomic_write_cow_iomap_begin( > if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap)) > cmap.br_startoff = end_fsb; > if (cmap.br_startoff <= offset_fsb) { > - xfs_trim_extent(&cmap, offset_fsb, count_fsb); > xfs_trans_cancel(tp); > + if (isnullstartblock(cmap.br_startblock)) > + goto convert; > + xfs_trim_extent(&cmap, offset_fsb, count_fsb); > goto found; > } > > @@ -1210,6 +1214,19 @@ xfs_atomic_write_cow_iomap_begin( > xfs_iunlock(ip, XFS_ILOCK_EXCL); > return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq); > > +convert: minor comment: could convert_delay be a better name, like used in xfs_buffered_write_iomap_begin()? > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > + error = xfs_bmapi_convert_delalloc(ip, XFS_COW_FORK, offset, iomap, > + NULL); > + if (error) > + return error; > + > + /* > + * Try the lookup again, because the delalloc conversion might have > + * turned the COW mapping into unwritten, but we need it to be in > + * written state. > + */ > + goto retry; > out_unlock: > xfs_iunlock(ip, XFS_ILOCK_EXCL); > return error; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes 2025-11-04 10:08 ` [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes John Garry @ 2025-11-04 17:24 ` Darrick J. Wong 0 siblings, 0 replies; 8+ messages in thread From: Darrick J. Wong @ 2025-11-04 17:24 UTC (permalink / raw) To: John Garry Cc: Carlos Maiolino, Ojaswin Mujoo, Zorro Lang, fstests, Ritesh Harjani, linux-xfs On Tue, Nov 04, 2025 at 10:08:10AM +0000, John Garry wrote: > On 03/11/2025 17:40, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > With the 20 Oct 2025 release of fstests, generic/521 fails for me on > > regular (aka non-block-atomic-writes) storage: > > > > QA output created by 521 > > dowrite: write: Input/output error > > LOG DUMP (8553 total operations): > > 1( 1 mod 256): SKIPPED (no operation) > > 2( 2 mod 256): WRITE 0x7e000 thru 0x8dfff (0x10000 bytes) HOLE > > 3( 3 mod 256): READ 0x69000 thru 0x79fff (0x11000 bytes) > > 4( 4 mod 256): FALLOC 0x53c38 thru 0x5e853 (0xac1b bytes) INTERIOR > > 5( 5 mod 256): COPY 0x55000 thru 0x59fff (0x5000 bytes) to 0x25000 thru 0x29fff > > 6( 6 mod 256): WRITE 0x74000 thru 0x88fff (0x15000 bytes) > > 7( 7 mod 256): ZERO 0xedb1 thru 0x11693 (0x28e3 bytes) > > > > with a warning in dmesg from iomap about XFS trying to give it a > > delalloc mapping for a directio write. Fix the software atomic write > > iomap_begin code to convert the reservation into a written mapping. > > This doesn't fix the data corruption problems reported by generic/760, > > but it's a start. > > > > Cc: <stable@vger.kernel.org> # v6.16 > > Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()") > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> > > FWIW: > > Reviewed-by: John Garry <john.g.garry@oracle.com> > > > --- > > fs/xfs/xfs_iomap.c | 21 +++++++++++++++++++-- > > 1 file changed, 19 insertions(+), 2 deletions(-) > > > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > > index d3f6e3e42a1191..e1da06b157cf94 100644 > > --- a/fs/xfs/xfs_iomap.c > > +++ b/fs/xfs/xfs_iomap.c > > @@ -1130,7 +1130,7 @@ xfs_atomic_write_cow_iomap_begin( > > return -EAGAIN; > > trace_xfs_iomap_atomic_write_cow(ip, offset, length); > > - > > +retry: > > xfs_ilock(ip, XFS_ILOCK_EXCL); > > if (!ip->i_cowfp) { > > @@ -1141,6 +1141,8 @@ xfs_atomic_write_cow_iomap_begin( > > if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap)) > > cmap.br_startoff = end_fsb; > > if (cmap.br_startoff <= offset_fsb) { > > + if (isnullstartblock(cmap.br_startblock)) > > This following comment is unrelated to this patch and is only relevant to > pre-existing code: > > isnullstartblock() seems to be a check specific to delayed allocation, so I > don't why "null" is used in the name, and not "delalloc" or something else > more specific. > > I guess that there is some history here (behind the naming). I think the "null" is meant in the sense of "null pointer to storage device", which is an odd way of saying "file range space reservation" :) If you use high-level function xfs_bmapi_read(), then it sets br_startblock to DELAYSTARTBLOCK which is a little more clear. But here we're doing a direct lookup in the iext tree, so we have to interpret the raw incore record. For a delayed allocation of N blocks, we reserve those N blocks from the free space counter and stuff that in br_blockcount; and enough space to handle btree expansions in the lower 17 bits of br_startblock. That's why isnullstartblock does a bunch of masking magic. > > + goto convert; > > xfs_trim_extent(&cmap, offset_fsb, count_fsb); > > goto found; > > } > > @@ -1169,8 +1171,10 @@ xfs_atomic_write_cow_iomap_begin( > > if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap)) > > cmap.br_startoff = end_fsb; > > if (cmap.br_startoff <= offset_fsb) { > > - xfs_trim_extent(&cmap, offset_fsb, count_fsb); > > xfs_trans_cancel(tp); > > + if (isnullstartblock(cmap.br_startblock)) > > + goto convert; > > + xfs_trim_extent(&cmap, offset_fsb, count_fsb); > > goto found; > > } > > @@ -1210,6 +1214,19 @@ xfs_atomic_write_cow_iomap_begin( > > xfs_iunlock(ip, XFS_ILOCK_EXCL); > > return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq); > > +convert: > > minor comment: > > could convert_delay be a better name, like used in > xfs_buffered_write_iomap_begin()? Yeah, that'll be more consistent. Thanks for reviewing both patches. --D > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > + error = xfs_bmapi_convert_delalloc(ip, XFS_COW_FORK, offset, iomap, > > + NULL); > > + if (error) > > + return error; > > + > > + /* > > + * Try the lookup again, because the delalloc conversion might have > > + * turned the COW mapping into unwritten, but we need it to be in > > + * written state. > > + */ > > + goto retry; > > out_unlock: > > xfs_iunlock(ip, XFS_ILOCK_EXCL); > > return error; > > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-11-05 19:18 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-03 17:40 [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes Darrick J. Wong 2025-11-03 17:44 ` [PATCH 2/2] xfs: fix various problems in xfs_atomic_write_cow_iomap_begin Darrick J. Wong 2025-11-04 12:07 ` John Garry 2025-11-04 17:18 ` Darrick J. Wong 2025-11-05 12:21 ` John Garry 2025-11-05 19:18 ` Darrick J. Wong 2025-11-04 10:08 ` [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes John Garry 2025-11-04 17:24 ` Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox