linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>,
	Zorro Lang <zlang@redhat.com>,
	fstests@vger.kernel.org, Ritesh Harjani <ritesh.list@gmail.com>,
	djwong@kernel.org, tytso@mit.edu, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx
Date: Tue, 21 Oct 2025 07:30:32 -0400	[thread overview]
Message-ID: <aPdu2DtLSNrI7gfp@bfoster> (raw)
In-Reply-To: <aPdgR5gdA3l3oTLQ@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>

On Tue, Oct 21, 2025 at 03:58:23PM +0530, Ojaswin Mujoo wrote:
> On Mon, Oct 20, 2025 at 11:33:40AM +0100, John Garry wrote:
> > On 06/10/2025 14:20, Ojaswin Mujoo wrote:
> > > Hi Zorro, thanks for checking this. So correct me if im wrong but I
> > > understand that you have run this test on an atomic writes enabled
> > > kernel where the stack also supports atomic writes.
> > > 
> > > Looking at the bad data log:
> > > 
> > > 	+READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/xfstests/test/junk
> > > 	+OFFSET      GOOD    BAD     RANGE
> > > 	+0x1c000     0x0000  0xcdcd  0x0
> > > 	+operation# (mod 256) for the bad data may be 205
> > > 
> > > We see that 0x0000 was expected but we got 0xcdcd. Now the operation
> > > that caused this is indicated to be 205, but looking at that operation:
> > > 
> > > +205(205 mod 256): ZERO     0x6dbe6 thru 0x6e6aa	(0xac5 bytes)
> > > 
> > > This doesn't even overlap the range that is bad. (0x1c000 to 0x1c00f).
> > > Infact, it does seem like an unlikely coincidence that the actual data
> > > in the bad range is 0xcdcd which is something xfs_io -c "pwrite" writes
> > > to default (fsx writes random data in even offsets and operation num in
> > > odd).
> > > 
> > > I am able to replicate this but only on XFS but not on ext4 (atleast not
> > > in 20 runs).  I'm trying to better understand if this is a test issue or
> > > not. Will keep you update.
> > 
> > 
> > Hi Ojaswin,
> > 
> > Sorry for the very slow response.
> > 
> > Are you still checking this issue?
> > 
> > To replicate, should I just take latest xfs kernel and run this series on
> > top of latest xfstests? Is it 100% reproducible?
> > 
> > Thanks,
> > John
> 
> Hi John,
> 
> Yes Im looking into it but I'm now starting to run into some reflink/cow
> based concepts that are taking time to understand. Let me share what I
> have till now:
> 
> So the test.sh that I'm using can be found here [1] which just uses an
> fsx replay file (which replays all operations) present in the same repo
> [2]. If you see the replay file, there are a bunch of random operations
> followed by the last 2 commented out operations:
> 
> # copy_range 0xd000 0x1000 0x1d800 0x44000   <--- # operations <start> <len> <dest of copy> <filesize (can be ignored)>
> # mapread 0x1e000 0x1000 0x1e400 *
> 
> The copy_range here is the one which causes (or exposes) the corruption
> at 0x1e800 (the end of copy range destination gets corrupted).
> 
> To have more control, I commented these 2 operations and am doing it by
> hand in the test.sh file, with xfs_io. I'm also using a non atomic write
> device so we only have S/W fallback.
> 
> Now some observations:
> 
> 1. The copy_range operations is actually copying from a hole to a hole,
> so we should be reading all 0s. But What I see is the following happening:
> 
>   vfs_copy_file_range
>    do_splice_direct
>     do_splice_direct_actor
>      do_splice_read
>        # Adds the folio at src offset to the pipe. I confirmed this is all 0x0.
>      splice_direct_to_actor
>       direct_splice_actor
>        do_splice_from
>         iter_file_splice_write
>          xfs_file_write_iter
>           xfs_file_buffered_write
>            iomap_file_buferred_write
>             iomap_iter
>              xfs_buferred_write_iomap_begin
>                # Here we correctly see that there is noting at the
>                # destination in data fork, but somehow we find a mapped
>                # extent in cow fork which is returned to iomap.
>              iomap_write_iter
>               __iomap_write_begin
>                 # Here we notice folio is not uptodate and call
>                 # iomap_read_folio_range() to read from the cow_fork
>                 # mapping we found earlier. This results in folio having
>                 # incorrect data at 0x1e800 offset.
> 
>  So it seems like the fsx operations might be corrupting the cow fork state
>  somehow leading to stale data exposure. 
> 
> 2. If we disable atomic writes we dont hit the issue.
> 
> 3. If I do a -c pread of the destination range before doing the
> copy_range operation then I don't see the corruption any more.
> 
> I'm now trying to figure out why the mapping returned is not IOMAP_HOLE
> as it should be. I don't know the COW path in xfs so there are some gaps
> in my understanding. Let me know if you need any other information since
> I'm reliably able to replicate on 6.17.0-rc4.
> 

I haven't followed your issue closely, but just on this hole vs. COW
thing, XFS has a bit of a quirk where speculative COW fork preallocation
can expand out over holes in the data fork. If iomap lookup for buffered
write sees COW fork blocks present, it reports those blocks as the
primary mapping even if the data fork happens to be a hole (since
there's no point in allocating blocks to the data fork when we can just
remap).

Again I've no idea if this relates to your issue or what you're
referring to as a hole (i.e. data fork only?), but just pointing it out.
The latest iomap/xfs patches I posted a few days ago kind of dance
around this a bit, but I was somewhat hoping that maybe the cleanups
there would trigger some thoughts on better iomap reporting in that
regard.

Brian

> [1]
> https://github.com/OjaswinM/fsx-aw-issue/tree/master
> 
> [2] https://github.com/OjaswinM/fsx-aw-issue/blob/master/repro.fsxops
> 
> regards,
> ojaswin
> 


  reply	other threads:[~2025-10-21 11:26 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-19  6:47 [PATCH v7 00/11] Add more tests for multi fs block atomic writes Ojaswin Mujoo
2025-09-19  6:47 ` [PATCH v7 01/12] common/rc: Add _min() and _max() helpers Ojaswin Mujoo
2025-09-19  6:47 ` [PATCH v7 02/12] common/rc: Add fio atomic write helpers Ojaswin Mujoo
2025-09-19 16:27   ` Darrick J. Wong
2025-09-19  6:47 ` [PATCH v7 03/12] common/rc: Add a helper to run fsx on a given file Ojaswin Mujoo
2025-09-19  6:47 ` [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx Ojaswin Mujoo
2025-09-28  8:55   ` Zorro Lang
2025-09-28 13:19   ` Zorro Lang
2025-10-02 17:56     ` Ojaswin Mujoo
2025-10-03 17:19       ` Zorro Lang
2025-10-05 12:57         ` Ojaswin Mujoo
2025-10-05 15:39           ` Zorro Lang
2025-10-06 13:20             ` Ojaswin Mujoo
2025-10-07  9:58               ` Ojaswin Mujoo
2025-10-17 16:01                 ` Zorro Lang
2025-10-17 16:27                   ` Darrick J. Wong
2025-10-17 18:47                     ` Zorro Lang
2025-10-17 22:52                       ` Darrick J. Wong
2025-10-20 10:33               ` John Garry
2025-10-21 10:28                 ` Ojaswin Mujoo
2025-10-21 11:30                   ` Brian Foster [this message]
2025-10-21 11:58                     ` Ojaswin Mujoo
2025-10-21 17:44                       ` Darrick J. Wong
2025-10-22  7:40                         ` Ojaswin Mujoo
2025-10-23 15:44                           ` John Garry
2025-10-23 17:55                             ` Darrick J. Wong
2025-09-19  6:47 ` [PATCH v7 05/12] generic: Add atomic write test using fio crc check verifier Ojaswin Mujoo
2025-10-28  9:42   ` Ojaswin Mujoo
2025-11-01  9:00     ` Zorro Lang
2025-09-19  6:47 ` [PATCH v7 06/12] generic: Add atomic write test using fio verify on file mixed mappings Ojaswin Mujoo
2025-09-19  6:48 ` [PATCH v7 07/12] generic: Add atomic write multi-fsblock O_[D]SYNC tests Ojaswin Mujoo
2025-09-19  6:48 ` [PATCH v7 08/12] generic: Stress fsx with atomic writes enabled Ojaswin Mujoo
2025-09-19  6:48 ` [PATCH v7 09/12] generic: Add sudden shutdown tests for multi block atomic writes Ojaswin Mujoo
2025-09-19  6:48 ` [PATCH v7 10/12] ext4: Test atomic write and ioend codepaths with bigalloc Ojaswin Mujoo
2025-09-19  6:48 ` [PATCH v7 11/12] ext4: Test atomic writes allocation and write " Ojaswin Mujoo
2025-09-19  6:48 ` [PATCH v7 12/12] ext4: Atomic write test for extent split across leaf nodes Ojaswin Mujoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPdu2DtLSNrI7gfp@bfoster \
    --to=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=john.g.garry@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).