From: Brian Foster <bfoster@redhat.com>
To: Gregg Leventhal <gleventhal@janestreet.com>
Cc: Eric Hagberg <ehagberg@janestreet.com>,
hch@infradead.org, djwong@kernel.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, io-uring@vger.kernel.org,
Jens Axboe <axboe@kernel.dk>,
stable@vger.kernel.org
Subject: Re: [BUG] iomap/io_uring: O_APPEND async buffered write silently re-appends a data chunk (corruption) on XFS, 6.1.y/6.12.y
Date: Wed, 10 Jun 2026 13:34:06 -0400
Message-ID: <aimgDnzB_NYqOTx1@bfoster>
In-Reply-To: <CAFN_u7ELBj3YKncm6HA4-QUNyi-a3qPDEYxuLP+skVhm-r87uw@mail.gmail.com>

On Tue, Jun 09, 2026 at 01:14:40PM -0400, Gregg Leventhal wrote:
> I reproduce it by running ~25 concurrent instances of the attached reproducer,
> each writing its own file, on an otherwise-idle 15 GB VM:
>
> DIR=$(mktemp -d /tmp/uring.XXXXXX)
> for i in {1..25}; do
>     ./repro_uring_dup "$DIR/file_$i" 120 48 &
> done
> ...
> *** CORRUPTION DETECTED in /tmp/UmgK/file_17.1 ***
> bytes kernel said it wrote (sum of CQE results): 53621960
> actual file size: 56218824
> extra (duplicated) bytes: 2596864
> first mismatching offset: 6791168 (0x67a000) page_aligned=YES
> expected u64 848896 but found 524288 (content from byte offset
> 4194304 reappeared here)
> (file kept for inspection)
>
>
>
> wait
>
> *** CORRUPTION DETECTED in /tmp/Gznx/file_18.2 ***
> bytes kernel said it wrote (sum of CQE results): 58112616
> actual file size: 60303976
> extra (duplicated) bytes: 2191360
> first mismatching offset: 2191360 (0x217000) page_aligned=YES
> expected u64 273920 but found 0 (content from byte offset 0 reappeared here)
> (file kept for inspection)
>
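
For readers following the numbers in the quoted reports: they are consistent with each file holding consecutive u64 counters, so that the value at byte offset N is N / 8 (6791168 / 8 = 848896, and the value found, 524288, maps back to byte offset 524288 * 8 = 4194304). That layout is inferred from the output above, not taken from the reproducer source, and little-endian encoding is an assumption. Under those assumptions, a minimal sketch of such a check in Python might look like this; `first_mismatch` and `make_pattern` are hypothetical names, not part of the reproducer:

```python
import struct

def first_mismatch(data: bytes):
    """Return (offset, expected, found) for the first u64 whose value is not
    offset // 8, or None if the whole buffer matches the assumed pattern."""
    usable = len(data) - len(data) % 8  # ignore a trailing partial word
    for off in range(0, usable, 8):
        (found,) = struct.unpack_from("<Q", data, off)  # assumed little-endian
        expected = off // 8
        if found != expected:
            return off, expected, found
    return None

def make_pattern(nbytes: int) -> bytes:
    """Hypothetical helper: build nbytes of the assumed counter pattern."""
    return b"".join(struct.pack("<Q", i) for i in range(nbytes // 8))

good = make_pattern(4 * 4096)
# Simulate the reported symptom: the chunk starting at offset 0 shows up
# again at a page-aligned offset, growing the file by 4096 bytes.
bad = good[:8192] + good[:4096] + good[8192:]

print(first_mismatch(good))  # None
print(first_mismatch(bad))   # (8192, 1024, 0)
```

Multiplying the found value by 8 gives the byte offset the stray content originally belonged to, which is how the "content from byte offset ... reappeared here" lines can be read.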
Thanks. I had to bump up the concurrency a bit and then was able to
reproduce.

The patch I sent survived my regression testing, but on another look at
the upstream patch I realized something I had previously missed. The
code in master doesn't actually return -EAGAIN directly along with the
partial completion. It just returns the partial completion, loops again
in iomap, and then presumably returns -EAGAIN at that point, which makes
its way back to io_uring. I think that is mostly harmless but
technically a bug in the upstream patch, as the intent was to be able to
advance the iter, return -EAGAIN, and let the operation unwind from
there.

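
To make the difference between "revert and return -EAGAIN" and "return the partial completion" concrete, here is a toy user-space model in Python. It is not kernel code and is only a sketch under stated assumptions: `ToyFile`, `write_revert`, `write_partial`, `submit` and the `budget` parameter are all hypothetical, the retry logic in `submit` is a simplification of what a caller like io_uring might do, and the extra loop in iomap described above is not modeled.

```python
EAGAIN = -11  # Linux errno value, used here only as a sentinel

class ToyFile:
    """Append-only file: every write lands at the current end of file."""
    def __init__(self):
        self.data = bytearray()

def write_revert(f, buf, budget):
    """Assumed model of the problematic path: some bytes are appended, then
    progress is discarded and -EAGAIN is reported to the caller."""
    done = min(budget, len(buf))
    f.data += buf[:done]
    if done < len(buf):
        return EAGAIN  # caller sees no progress although `done` bytes landed
    return done

def write_partial(f, buf, budget):
    """Assumed model of the fixed path: report the short write as-is."""
    done = min(budget, len(buf))
    f.data += buf[:done]
    return done

def submit(f, buf, write_fn, budget):
    """Try a non-blocking write first, then retry whatever the caller believes
    is still unwritten from a context that is allowed to block."""
    res = write_fn(f, buf, budget)
    if res == EAGAIN:
        # Whole buffer is retried and appended at the *current* end of file.
        return write_fn(f, buf, len(buf))
    if res < len(buf):
        return res + write_fn(f, buf[res:], len(buf))
    return res

buf = b"ABCDEFGH"
bad, good = ToyFile(), ToyFile()
reported_bad = submit(bad, buf, write_revert, budget=3)
reported_good = submit(good, buf, write_partial, budget=3)
print(reported_bad, len(bad.data))    # 8 11
print(reported_good, len(good.data))  # 8 8
```

In the first case the reported byte count (8) is smaller than the file size (11) and the leading chunk appears twice, which has the same shape as the quoted reports where the sum of CQE results is smaller than the actual file size.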
I think this actually leaves at least a couple of options here. One is
that we could presumably just do the same thing on stable as current
master: forget the flag and just remove the iov revert and the direct
-EAGAIN return, at the cost of one more iter before returning to the
caller. Another is to fix up the code in master and use the patch I
posted as a customized stable backport of that.

WRT the latter, I suppose we could also just stick with this patch for
stable and I can follow up with a separate patch for the loop thing on
master. Hmm.. I want to think about it a little more, so if any iomap
folks have Opinions in the meantime, let me know.

Brian
>
> On Tue, Jun 9, 2026 at 12:20 PM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Mon, Jun 08, 2026 at 01:17:10PM -0400, Eric Hagberg wrote:
> > > On Mon, Jun 8, 2026 at 12:03 PM Brian Foster <bfoster@redhat.com> wrote:
> > > > Another idea that came to mind is to try and just replace the -EAGAIN
> > > > return sequence from the low level iterator with a flag that triggers
> > > > -EAGAIN from the next iter advance. The idea here is to allow the write
> > > > to return partial completion (i.e. so no iov_iter revert) without having
> > > > to return an error from the lowest level in the stack. I had claude come
> > > > up with a quick patch [1] for reference/experimentation.
> > > >
> > > > This is based on v6.12 stable and compile tested only. It needs more
> > > > review and testing in general but might be worth throwing your
> > > > reproducer at if you can..?
> > >
> > > With that patch applied, the reproducer runs clean - no errors - and
> > > gets roughly the same performance (maybe slightly better) as when run
> > > against a 6.18 kernel on the same VM.
> > >
> >
> > Thanks for testing. I'll look into some more regression testing of this
> > patch and try to clean it up and post it for proper review for stable.
> >
> > Are you using the reproducer program in your original mail to test? If
> > so, does it require some concurrent memory pressure to reproduce, and
> > are you using anything in particular for that?
> >
> > That test seems small enough that we could potentially include it in
> > fstests, though I'm still not so sure about the mem pressure part..
> > Since you guys wrote the test, any interest in porting into fstests? If
> > not I can look into it.
> >
> > Brian
> >
> > > Thanks,
> > > -Eric
> > >
> >
>
Thread overview: 7+ messages
2026-06-04 18:46 [BUG] iomap/io_uring: O_APPEND async buffered write silently re-appends a data chunk (corruption) on XFS, 6.1.y/6.12.y Gregg Leventhal
2026-06-05 15:55 ` Brian Foster
2026-06-08 16:02 ` Brian Foster
2026-06-08 17:17 ` Eric Hagberg
2026-06-09 16:20 ` Brian Foster
2026-06-09 17:14 ` Gregg Leventhal
2026-06-10 17:34 ` Brian Foster [this message]