From: Christoph Hellwig <hch@lst.de>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>, Carlos Maiolino <cem@kernel.org>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-raid@vger.kernel.org,
linux-block@vger.kernel.org
Subject: Re: fall back from direct to buffered I/O when stable writes are required
Date: Fri, 31 Oct 2025 14:00:50 +0100 [thread overview]
Message-ID: <20251031130050.GA15719@lst.de> (raw)
In-Reply-To: <aQPyVtkvTg4W1nyz@dread.disaster.area>
On Fri, Oct 31, 2025 at 10:18:46AM +1100, Dave Chinner wrote:
> I'm not asking about btrfs - I'm asking about actual, real world
> problems reported in production XFS environments.
The same thing applies once we have checksums with PI. But it seems
like you don't want to listen anyway.
> > For RAID you probably won't see too many reports, as with RAID the
> > problem will only show up as silent corruption long after a rebuild
> > happened that made use of the racy data.
>
> Yet we are not hearing about this, either. Nobody is reporting that
> their data is being found to be corrupt days/weeks/months/years down
> the track.
>
> This is important, because software RAID5 is pretty much the -only-
> common usage of BLK_FEAT_STABLE_WRITES that users are exposed to.
RAID5 bounce buffers by default, but there is a tunable to disable
that. Once it was turned on, it pretty much immediately caused data
corruption:
https://sbsfaq.com/qnap-fails-to-reveal-data-corruption-bug-that-affects-all-4-bay-and-higher-nas-devices/
https://sbsfaq.com/synology-nas-confirmed-to-have-same-data-corruption-bug-as-qnap/
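For reference, the md-level bounce behaviour is exposed per array via
sysfs; the attribute name (`skip_copy`) and path below reflect my
reading of drivers/md/raid5.c and may not exist on older kernels, and
the array name `md0` is just an example:

```shell
# 0 (default): raid5 copies write data into its stripe cache, so the
# caller may reuse the buffer immediately.
cat /sys/block/md0/md/skip_copy

# 1: skip the copy and use the caller's pages directly for parity
# computation -- this is what exposes the stable-write requirement.
echo 1 > /sys/block/md0/md/skip_copy
```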
> This patch set is effectively disallowing direct IO for anyone
> using software RAID5. That is simply not an acceptable outcome here.
Quite the contrary: fixing this properly allows STABLE_WRITES to
actually work without bouncing in lower layers, and to at least get
efficient buffered I/O.
>
> > With checksums
> > it is much easier to reproduce and trivially shown by various xfstests.
>
> Such as?
Basically anything using fsstress for long enough, plus a few others.
>
> > With increasing storage capacities checksums are becoming more and
> > more important, and I'm trying to get Linux in general and XFS
> > specifically to use them well.
>
> So when XFS implements checksums and that implementation is
> incompatible with Direct IO, then we can talk about disabling Direct
> IO on XFS when that feature is enabled. But right now, that feature
> does not exist, and ....
Every Linux file system supports checksums with a PI-capable device.
I've been trying to make that actually work for all cases and perform
well for a while now.
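To make the PI interaction concrete, here is a toy sketch (not the
real T10 PI code path, and using CRC-32 instead of the 16-bit guard
CRC): a guard checksum is computed over the payload when the I/O is
submitted, and the device verifies the data it actually receives, so a
buffer mutated in flight is rejected as corrupt:

```python
import zlib

BLOCK = 4096

def submit_write(buf: bytearray) -> int:
    """Compute the guard tag over the payload at submission time."""
    return zlib.crc32(bytes(buf))

def device_verify(buf: bytearray, guard: int) -> bool:
    """The 'device' re-checksums the data it actually receives via DMA."""
    return zlib.crc32(bytes(buf)) == guard

buf = bytearray(b"A" * BLOCK)
guard = submit_write(buf)

# Well-behaved application: buffer untouched while the I/O is in flight.
assert device_verify(buf, guard)

# Misbehaving application: buffer modified between checksum and DMA.
buf[0] = ord("B")
assert not device_verify(buf, guard)
print("guard mismatch detected on modified in-flight buffer")
```

With stable writes (or a buffered-I/O fallback) the snapshot the
checksum was computed over is the one that reaches the device, so the
mismatch cannot happen.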
>
> > Right now I don't think anyone is
> > using PI with XFS or any Linux file system given the amount of work
> > I had to put in to make it work well, and how often I see regressions
> > with it.
>
> .... as you say, "nobody is using PI with XFS".
>
> So patchset is a "fix" for a problem that no-one is actually having
> right now.
I'm making it work.
> Modifying an IO buffer whilst a DIO is in flight on that buffer has
> -always- been an application bug.
Says who?
> It is a vector for torn writes
> that don't get detected until the next read. It is a vector for
> in-memory data corruption of read buffers.
That assumes the particular use case cares about torn writes. We've
never documented any such requirement, and we can't just make it up
20+ years later.
> Indeed, it does not matter if the underlying storage asserts
> BLK_FEAT_STABLE_WRITES or not, modifying DIO buffers that are under
> IO will (eventually) result in data corruption.
It doesn't if that's not your assumption. More importantly, with
RAID5, if you modify the buffers you don't primarily corrupt your own
data, but other data in the stripe. It is a way for a malicious user
to corrupt other users' data.
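The cross-user corruption is easy to see in a toy RAID5 model (parity
as the XOR of the data chunks in a stripe, names invented for the
sketch): if one writer's buffer changes after parity was computed but
before the chunk is written, a later rebuild reconstructs a
*neighbouring* chunk incorrectly:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR, the parity operation in RAID5."""
    return bytes(x ^ y for x, y in zip(a, b))

chunk_a = bytearray(b"user A data!")   # the misbehaving writer's chunk
chunk_b = b"user B data!"              # an innocent neighbour in the stripe

parity = xor(chunk_a, chunk_b)         # computed from a snapshot of A

chunk_a[0] ^= 0xFF                     # A's buffer mutated while in flight
on_disk_a = bytes(chunk_a)             # the mutated version reaches the disk

# The disk holding chunk B dies; the rebuild reconstructs it from the
# surviving chunk A and the (now stale) parity:
rebuilt_b = xor(on_disk_a, parity)
assert rebuilt_b != chunk_b            # B's data is silently corrupted
```

Note that user A's own file may even read back fine; only user B pays,
and only after a rebuild, which is why this shows up long after the
fact.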
> Hence, by your
> logic, we should disable Direct IO for everyone.
That's your weird logic, not mine.