From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: Carlos Maiolino <cem@kernel.org>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	Zorro Lang <zlang@redhat.com>,
	fstests@vger.kernel.org, Ritesh Harjani <ritesh.list@gmail.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/2] xfs: fix delalloc write failures in software-provided atomic writes
Date: Tue, 4 Nov 2025 09:24:53 -0800
Message-ID: <20251104172453.GM196370@frogsfrogsfrogs>
In-Reply-To: <cb1f1963-8ca4-460f-b620-6026a26ce9eb@oracle.com>

On Tue, Nov 04, 2025 at 10:08:10AM +0000, John Garry wrote:
> On 03/11/2025 17:40, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > With the 20 Oct 2025 release of fstests, generic/521 fails for me on
> > regular (aka non-block-atomic-writes) storage:
> > 
> > QA output created by 521
> > dowrite: write: Input/output error
> > LOG DUMP (8553 total operations):
> > 1(  1 mod 256): SKIPPED (no operation)
> > 2(  2 mod 256): WRITE    0x7e000 thru 0x8dfff	(0x10000 bytes) HOLE
> > 3(  3 mod 256): READ     0x69000 thru 0x79fff	(0x11000 bytes)
> > 4(  4 mod 256): FALLOC   0x53c38 thru 0x5e853	(0xac1b bytes) INTERIOR
> > 5(  5 mod 256): COPY 0x55000 thru 0x59fff	(0x5000 bytes) to 0x25000 thru 0x29fff
> > 6(  6 mod 256): WRITE    0x74000 thru 0x88fff	(0x15000 bytes)
> > 7(  7 mod 256): ZERO     0xedb1 thru 0x11693	(0x28e3 bytes)
> > 
> > with a warning in dmesg from iomap about XFS trying to give it a
> > delalloc mapping for a directio write.  Fix the software atomic write
> > iomap_begin code to convert the reservation into a written mapping.
> > This doesn't fix the data corruption problems reported by generic/760,
> > but it's a start.
> > 
> > Cc: <stable@vger.kernel.org> # v6.16
> > Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> FWIW:
> 
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> 
> > ---
> >   fs/xfs/xfs_iomap.c |   21 +++++++++++++++++++--
> >   1 file changed, 19 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index d3f6e3e42a1191..e1da06b157cf94 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1130,7 +1130,7 @@ xfs_atomic_write_cow_iomap_begin(
> >   		return -EAGAIN;
> >   	trace_xfs_iomap_atomic_write_cow(ip, offset, length);
> > -
> > +retry:
> >   	xfs_ilock(ip, XFS_ILOCK_EXCL);
> >   	if (!ip->i_cowfp) {
> > @@ -1141,6 +1141,8 @@ xfs_atomic_write_cow_iomap_begin(
> >   	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap))
> >   		cmap.br_startoff = end_fsb;
> >   	if (cmap.br_startoff <= offset_fsb) {
> > +		if (isnullstartblock(cmap.br_startblock))
> 
> This following comment is unrelated to this patch and is only relevant to
> pre-existing code:
> 
> isnullstartblock() seems to be a check specific to delayed allocation, so I
> don't know why "null" is used in the name, and not "delalloc" or something
> else more specific.
> 
> I guess that there is some history here (behind the naming).

I think the "null" is meant in the sense of "null pointer to storage
device", which is an odd way of saying "file range space reservation" :)

If you use the high-level function xfs_bmapi_read(), it sets
br_startblock to DELAYSTARTBLOCK, which is a little clearer.

But here we're doing a direct lookup in the iext tree, so we have to
interpret the raw incore record.  For a delayed allocation of N blocks,
we reserve those N blocks from the free space counter and record N in
br_blockcount; we also reserve enough blocks to handle btree expansions
and stash that count in the lower 17 bits of br_startblock.  That's why
isnullstartblock does a bunch of masking magic.
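
To make the masking a bit more concrete, here is a rough userspace
sketch of the encoding.  The sk_* names are made up for illustration,
and the 35-bit marker width is an assumption from my memory of
xfs_format.h; only the 17-bit value field comes from the description
above, so check the real header before relying on any of this.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace sketch only; these are hypothetical stand-ins, not kernel code. */
#define SK_VALBITS	17		/* low bits: btree-expansion reservation */
#define SK_MASKBITS	(15 + 20)	/* ASSUMED width of the all-ones marker */
#define SK_MASK \
	((((uint64_t)1 << SK_MASKBITS) - 1) << SK_VALBITS)

/* Build a raw delalloc startblock that carries an indlen reservation. */
static uint64_t sk_nullstartblock(uint32_t indlen)
{
	assert(indlen < (1u << SK_VALBITS));
	return SK_MASK | indlen;
}

/* True if every marker bit is set, i.e. the record is a reservation. */
static int sk_isnullstartblock(uint64_t startblock)
{
	return (startblock & SK_MASK) == SK_MASK;
}

/* Recover the reservation count stored in the low bits. */
static uint64_t sk_startblockval(uint64_t startblock)
{
	return startblock & ~SK_MASK;
}
```

With this model, sk_isnullstartblock(sk_nullstartblock(5)) is true and
sk_startblockval() hands back the 5, whereas a small real block number
fails the check because none of its marker bits are set.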

> > +			goto convert;
> >   		xfs_trim_extent(&cmap, offset_fsb, count_fsb);
> >   		goto found;
> >   	}
> > @@ -1169,8 +1171,10 @@ xfs_atomic_write_cow_iomap_begin(
> >   	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap))
> >   		cmap.br_startoff = end_fsb;
> >   	if (cmap.br_startoff <= offset_fsb) {
> > -		xfs_trim_extent(&cmap, offset_fsb, count_fsb);
> >   		xfs_trans_cancel(tp);
> > +		if (isnullstartblock(cmap.br_startblock))
> > +			goto convert;
> > +		xfs_trim_extent(&cmap, offset_fsb, count_fsb);
> >   		goto found;
> >   	}
> > @@ -1210,6 +1214,19 @@ xfs_atomic_write_cow_iomap_begin(
> >   	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >   	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq);
> > +convert:
> 
> minor comment:
> 
> could convert_delay be a better name, like used in
> xfs_buffered_write_iomap_begin()?

Yeah, that'll be more consistent.  Thanks for reviewing both patches.

--D

> > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +	error = xfs_bmapi_convert_delalloc(ip, XFS_COW_FORK, offset, iomap,
> > +			NULL);
> > +	if (error)
> > +		return error;
> > +
> > +	/*
> > +	 * Try the lookup again, because the delalloc conversion might have
> > +	 * turned the COW mapping into unwritten, but we need it to be in
> > +	 * written state.
> > +	 */
> > +	goto retry;
> >   out_unlock:
> >   	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >   	return error;
> 
> 
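
For anyone following along, the convert-and-retry flow in the hunk
above can be modeled in userspace roughly like this.  Everything named
toy_* is hypothetical, the lock is just a flag, and the in-place
unwritten-to-written step is an assumed stand-in for whatever the real
function does after the lookup; it is a sketch of the control flow, not
the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: possible states of a COW fork mapping. */
enum toy_state { TOY_DELALLOC, TOY_UNWRITTEN, TOY_WRITTEN };

struct toy_inode {
	enum toy_state	cow_state;
	bool		locked;		/* stands in for the ILOCK */
	int		retries;	/* how often we redid the lookup */
};

static void toy_lock(struct toy_inode *ip)
{
	assert(!ip->locked);
	ip->locked = true;
}

static void toy_unlock(struct toy_inode *ip)
{
	assert(ip->locked);
	ip->locked = false;
}

/* Stand-in for delalloc conversion: called unlocked, leaves unwritten. */
static int toy_convert_delalloc(struct toy_inode *ip)
{
	assert(!ip->locked);
	ip->cow_state = TOY_UNWRITTEN;
	return 0;
}

static int toy_atomic_cow_begin(struct toy_inode *ip)
{
	int error;

retry:
	toy_lock(ip);
	if (ip->cow_state == TOY_DELALLOC)
		goto convert_delay;
	if (ip->cow_state == TOY_UNWRITTEN)
		ip->cow_state = TOY_WRITTEN;	/* ASSUMED conversion under lock */
	toy_unlock(ip);
	return 0;

convert_delay:
	/* Drop the lock, convert the reservation, then look up again. */
	toy_unlock(ip);
	error = toy_convert_delalloc(ip);
	if (error)
		return error;
	ip->retries++;
	goto retry;
}
```

Starting from TOY_DELALLOC, the function takes one trip through
convert_delay and the second lookup finishes with the mapping written
and the lock released.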

