Re: [PATCH] xfs: report a writeback error on a read() call

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH] xfs: report a writeback error on a read() call
       [not found] <CAN2Y7hyi1HCrSiKsDT+KD8hBjQmsqzNp71Q9Z_RmBG0LLaZxCA@mail.gmail.com>
@ 2025-06-24 14:14 ` Christoph Hellwig
  2025-06-24 18:26   ` Jeff Layton
  2025-06-25  2:44   ` Yafang Shao
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Hellwig @ 2025-06-24 14:14 UTC (permalink / raw)
  To: ying chen; +Cc: djwong, linux-xfs, linux-kernel, linux-fsdevel, Jeff Layton

On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote:
> Normally, user space returns immediately after writing data to the
> buffer cache. However, if an error occurs during the actual disk
> write operation, data loss may ensue, and there is no way to report
> this error back to user space immediately. Current kernels may report
> writeback errors when fsync() is called, but frequent invocations of
> fsync() can degrade performance. Therefore, a new sysctl
> fs.xfs.report_writeback_error_on_read is introduced, which, when set
> to 1, reports writeback errors when read() is called. This allows user
> space to be notified of writeback errors more promptly.

That's really kernel wide policy and not something magic done by a
single file system.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-24 14:14 ` [PATCH] xfs: report a writeback error on a read() call Christoph Hellwig
@ 2025-06-24 18:26   ` Jeff Layton
  2025-06-24 19:56     ` Matthew Wilcox
  2025-06-25  2:44   ` Yafang Shao
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-06-24 18:26 UTC (permalink / raw)
  To: Christoph Hellwig, ying chen
  Cc: djwong, linux-xfs, linux-kernel, linux-fsdevel

On Tue, 2025-06-24 at 07:14 -0700, Christoph Hellwig wrote:
> On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote:
> > Normally, user space returns immediately after writing data to the
> > buffer cache. However, if an error occurs during the actual disk
> > write operation, data loss may ensue, and there is no way to report
> > this error back to user space immediately. Current kernels may report
> > writeback errors when fsync() is called, but frequent invocations of
> > fsync() can degrade performance. Therefore, a new sysctl
> > fs.xfs.report_writeback_error_on_read is introduced, which, when set
> > to 1, reports writeback errors when read() is called. This allows user
> > space to be notified of writeback errors more promptly.
> 
> That's really kernel wide policy and not something magic done by a
> single file system.

...not to mention that getting an error back on a read for a prior
writeback error would be completely unexpected by most applications.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-24 18:26   ` Jeff Layton
@ 2025-06-24 19:56     ` Matthew Wilcox
  2025-06-24 20:25       ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2025-06-24 19:56 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christoph Hellwig, ying chen, djwong, linux-xfs, linux-kernel,
	linux-fsdevel

On Tue, Jun 24, 2025 at 02:26:18PM -0400, Jeff Layton wrote:
> On Tue, 2025-06-24 at 07:14 -0700, Christoph Hellwig wrote:
> > On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote:
> > > Normally, user space returns immediately after writing data to the
> > > buffer cache. However, if an error occurs during the actual disk
> > > write operation, data loss may ensue, and there is no way to report
> > > this error back to user space immediately. Current kernels may report
> > > writeback errors when fsync() is called, but frequent invocations of
> > > fsync() can degrade performance. Therefore, a new sysctl
> > > fs.xfs.report_writeback_error_on_read is introduced, which, when set
> > > to 1, reports writeback errors when read() is called. This allows user
> > > space to be notified of writeback errors more promptly.
> > 
> > That's really kernel wide policy and not something magic done by a
> > single file system.
> 
> ...not to mention that getting an error back on a read for a prior
> writeback error would be completely unexpected by most applications.

Well.  It's somewhat understandable:

	write() (returns success)
	writeback happens, error logged
	memory pressure evicts folio
	read() brings folio into page cache
	attempt to read contents fails, error returned

I'm not sure it's a good solution, but it's plausible.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-24 19:56     ` Matthew Wilcox
@ 2025-06-24 20:25       ` Jeff Layton
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-06-24 20:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, ying chen, djwong, linux-xfs, linux-kernel,
	linux-fsdevel

On Tue, 2025-06-24 at 20:56 +0100, Matthew Wilcox wrote:
> On Tue, Jun 24, 2025 at 02:26:18PM -0400, Jeff Layton wrote:
> > On Tue, 2025-06-24 at 07:14 -0700, Christoph Hellwig wrote:
> > > On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote:
> > > > Normally, user space returns immediately after writing data to the
> > > > buffer cache. However, if an error occurs during the actual disk
> > > > write operation, data loss may ensue, and there is no way to report
> > > > this error back to user space immediately. Current kernels may report
> > > > writeback errors when fsync() is called, but frequent invocations of
> > > > fsync() can degrade performance. Therefore, a new sysctl
> > > > fs.xfs.report_writeback_error_on_read is introduced, which, when set
> > > > to 1, reports writeback errors when read() is called. This allows user
> > > > space to be notified of writeback errors more promptly.
> > > 
> > > That's really kernel wide policy and not something magic done by a
> > > single file system.
> > 
> > ...not to mention that getting an error back on a read for a prior
> > writeback error would be completely unexpected by most applications.
> 
> Well.  It's somewhat understandable:
> 
> 	write() (returns success)
> 	writeback happens, error logged
> 	memory pressure evicts folio
> 	read() brings folio into page cache
> 	attempt to read contents fails, error returned
> 
> I'm not sure it's a good solution, but it's plausible.

Personally, I find it confusing. The range you're trying to read might
actually be fine if the error happened in a different range.

It also has the same problem as reporting writeback errors on close(),
in that it's non-deterministic. You might not see an error if writeback
didn't happen yet. Just because you didn't get an error when reading,
that doesn't mean that your data is actually safe.

Maybe this is ok as an opt-in thing for some workloads, but it does
have some potential footguns.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-24 14:14 ` [PATCH] xfs: report a writeback error on a read() call Christoph Hellwig
  2025-06-24 18:26   ` Jeff Layton
@ 2025-06-25  2:44   ` Yafang Shao
  2025-06-25  7:01     ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Yafang Shao @ 2025-06-25  2:44 UTC (permalink / raw)
  To: hch, david
  Cc: djwong, jlayton, linux-fsdevel, linux-kernel, linux-xfs,
	yc1082463

> That's really kernel wide policy and not something magic done by a
> single file system.

XFS already supports an optional policy for handling metadata errors via:
/sys/fs/xfs/<disk>/error/metadata/

It would be reasonable to introduce a similar optional policy for data 
errors:
/sys/fs/xfs/<disk>/error/data/

This data error policy could allow the filesystem to shut down 
immediately if corrupted data is detected that might otherwise be 
exposed to userspace.

While it’s unclear whether such a policy should be implemented at the 
VFS level as a kernel-wide mechanism, we can certainly extend the 
existing XFS error-handling framework to support it. In other words, 
this would be a natural extension of current XFS functionality—not a 
reinvention.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-25  2:44   ` Yafang Shao
@ 2025-06-25  7:01     ` Christoph Hellwig
  2025-06-25 10:40       ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-06-25  7:01 UTC (permalink / raw)
  To: Yafang Shao
  Cc: hch, david, djwong, jlayton, linux-fsdevel, linux-kernel,
	linux-xfs, yc1082463

On Wed, Jun 25, 2025 at 10:44:57AM +0800, Yafang Shao wrote:
> > That's really kernel wide policy and not something magic done by a
> > single file system.
> 
> XFS already supports an optional policy for handling metadata errors via:
> /sys/fs/xfs/<disk>/error/metadata/
> 
> It would be reasonable to introduce a similar optional policy for data
> errors:
> /sys/fs/xfs/<disk>/error/data/
> 
> This data error policy could allow the filesystem to shut down immediately
> if corrupted data is detected that might otherwise be exposed to userspace.

I fully agree on that part, and would in fact argue for making it the
default.

But reporting writeback errors on read just on one file system and with
a specific option is really strange.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-25  7:01     ` Christoph Hellwig
@ 2025-06-25 10:40       ` Jeff Layton
  2025-06-25 11:21         ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-06-25 10:40 UTC (permalink / raw)
  To: Christoph Hellwig, Yafang Shao
  Cc: david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463

On Wed, 2025-06-25 at 00:01 -0700, Christoph Hellwig wrote:
> On Wed, Jun 25, 2025 at 10:44:57AM +0800, Yafang Shao wrote:
> > > That's really kernel wide policy and not something magic done by a
> > > single file system.
> > 
> > XFS already supports an optional policy for handling metadata errors via:
> > /sys/fs/xfs/<disk>/error/metadata/
> > 
> > It would be reasonable to introduce a similar optional policy for data
> > errors:
> > /sys/fs/xfs/<disk>/error/data/
> > 
> > This data error policy could allow the filesystem to shut down immediately
> > if corrupted data is detected that might otherwise be exposed to userspace.
> 
> I fully agree on that part, and would in fact argue for making it the
> default.
> 
> But reporting writeback errors on read just on one file system and with
> a specific option is really strange.


Another option:

We could expose this functionality in preadv2() with a new RWF_WBERR
flag (better names welcome). That way applications could opt-in to
checking for writeback errors like this. With that, the application is
at least explicitly saying that it wants this behavior.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-25 10:40       ` Jeff Layton
@ 2025-06-25 11:21         ` Christoph Hellwig
  2025-06-25 11:49           ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-06-25 11:21 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christoph Hellwig, Yafang Shao, david, djwong, linux-fsdevel,
	linux-kernel, linux-xfs, yc1082463

On Wed, Jun 25, 2025 at 06:40:07AM -0400, Jeff Layton wrote:
> Another option:
> 
> We could expose this functionality in preadv2() with a new RWF_WBERR
> flag (better names welcome). That way applications could opt-in to
> checking for writeback errors like this. With that, the application is
> at least explicitly saying that it wants this behavior.

That sounds like a really strange interface to me.

I have to admit I don't fully understand the use case where an
application cares about these errors, but also doesn't use f(data)sync
or sync(fs) to actually persist the data.  If we can come up with a
coherent use case for that we should simply add a new syscall or fcntl
to query the delayed writeback errors instead of overloading other
interfaces.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-25 11:21         ` Christoph Hellwig
@ 2025-06-25 11:49           ` Jeff Layton
  2025-06-25 11:56             ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-06-25 11:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yafang Shao, david, djwong, linux-fsdevel, linux-kernel,
	linux-xfs, yc1082463

On Wed, 2025-06-25 at 04:21 -0700, Christoph Hellwig wrote:
> On Wed, Jun 25, 2025 at 06:40:07AM -0400, Jeff Layton wrote:
> > Another option:
> > 
> > We could expose this functionality in preadv2() with a new RWF_WBERR
> > flag (better names welcome). That way applications could opt-in to
> > checking for writeback errors like this. With that, the application is
> > at least explicitly saying that it wants this behavior.
> 
> That sounds like a really strange interface to me.
> 
> I have to admit I don't fully understand the use case where an
> application cares about these errors, but also doesn't use f(data)sync
> or sync(fs) to actually persist the data.  If we can come up with a
> coherent use case for that we should simply add a new syscall or fcntl
> to query the delayed writeback errors instead of overloading other
> interfaces.

It is weird, but I do sort of get the motivation.

In a some cases you want to be able to stream writes as fast as
possible and let the kernel lazily write that back (because you don't
want to block a thread), but knowing if a prior writeback error has
occurred is a good thing too.

Another idea: add a new generic ioctl() that checks for writeback
errors without syncing anything. That would be fairly simple to do and
sounds like it would be useful, but I'd want to hear a better
description of the use-case before we did anything like that.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-25 11:49           ` Jeff Layton
@ 2025-06-25 11:56             ` Christoph Hellwig
  2025-06-25 14:06               ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-06-25 11:56 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christoph Hellwig, Yafang Shao, david, djwong, linux-fsdevel,
	linux-kernel, linux-xfs, yc1082463

On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote:
> Another idea: add a new generic ioctl() that checks for writeback
> errors without syncing anything. That would be fairly simple to do and
> sounds like it would be useful, but I'd want to hear a better
> description of the use-case before we did anything like that.

That's what I mean with my above proposal, except that I though of an
fcntl or syscall and not an ioctl.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-25 11:56             ` Christoph Hellwig
@ 2025-06-25 14:06               ` Jeff Layton
  2025-06-26  2:41                 ` Yafang Shao
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-06-25 14:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yafang Shao, david, djwong, linux-fsdevel, linux-kernel,
	linux-xfs, yc1082463

On Wed, 2025-06-25 at 04:56 -0700, Christoph Hellwig wrote:
> On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote:
> > Another idea: add a new generic ioctl() that checks for writeback
> > errors without syncing anything. That would be fairly simple to do and
> > sounds like it would be useful, but I'd want to hear a better
> > description of the use-case before we did anything like that.
> 
> That's what I mean with my above proposal, except that I though of an
> fcntl or syscall and not an ioctl.

Yeah, a fcntl() would be reasonable, I think.

For a syscall, I guess we could add an fsync2() which just adds a flags
field. Then add a FSYNC_JUSTCHECK flag that makes it just check for
errors and return.

Personally, I like the fcntl() idea better for this, but maybe we have
other uses for a fsync2().
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-25 14:06               ` Jeff Layton
@ 2025-06-26  2:41                 ` Yafang Shao
  2025-06-26  3:57                   ` Dave Chinner
  2025-06-26 10:23                   ` Christoph Hellwig
  0 siblings, 2 replies; 17+ messages in thread
From: Yafang Shao @ 2025-06-26  2:41 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christoph Hellwig, david, djwong, linux-fsdevel, linux-kernel,
	linux-xfs, yc1082463

On Wed, Jun 25, 2025 at 10:06 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2025-06-25 at 04:56 -0700, Christoph Hellwig wrote:
> > On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote:
> > > Another idea: add a new generic ioctl() that checks for writeback
> > > errors without syncing anything. That would be fairly simple to do and
> > > sounds like it would be useful, but I'd want to hear a better
> > > description of the use-case before we did anything like that.

As you mentioned earlier, calling fsync()/fdatasync() after every
write() blocks the thread, degrading performance—especially on HDDs.
However, this isn’t the main issue in practice.
The real problem is that users typically don’t understand "writeback
errors". If you warn them, "You should call fsync() because writeback
errors might occur," their response will likely be: "What the hell is
a writeback error?"

For example, our users (a big data platform) demanded that we
immediately shut down the filesystem upon writeback errors. These
users are algorithm analysts who write Python/Java UDFs for custom
logic—often involving temporary disk writes followed by reads to pass
data downstream. Yet, most have no idea how these underlying processes
work.

> >
> > That's what I mean with my above proposal, except that I though of an
> > fcntl or syscall and not an ioctl.
>
> Yeah, a fcntl() would be reasonable, I think.
>
> For a syscall, I guess we could add an fsync2() which just adds a flags
> field. Then add a FSYNC_JUSTCHECK flag that makes it just check for
> errors and return.
>
> Personally, I like the fcntl() idea better for this, but maybe we have
> other uses for a fsync2().

What do you expect users to do with this new fcntl() or fsync2()? Call
fsync2() after every write()? That would still require massive
application refactoring.

Writeback errors are fundamentally bugs in the filesystem/block
layer/driver stack. It makes no sense to expose kernel implementation
flaws to userspace, forcing developers to compensate for kernel-level
issues with additional syscalls.

Most users neither have the patience nor should they need to
understand the deepest intricacies of Linux kernel internals. The
proper fix should happen in the kernel—not by pushing workarounds onto
applications.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-26  2:41                 ` Yafang Shao
@ 2025-06-26  3:57                   ` Dave Chinner
  2025-06-26 10:25                     ` Christoph Hellwig
  2025-06-26 10:23                   ` Christoph Hellwig
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2025-06-26  3:57 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Jeff Layton, Christoph Hellwig, djwong, linux-fsdevel,
	linux-kernel, linux-xfs, yc1082463

On Thu, Jun 26, 2025 at 10:41:47AM +0800, Yafang Shao wrote:
> On Wed, Jun 25, 2025 at 10:06 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Wed, 2025-06-25 at 04:56 -0700, Christoph Hellwig wrote:
> > > On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote:
> > > > Another idea: add a new generic ioctl() that checks for writeback
> > > > errors without syncing anything. That would be fairly simple to do and
> > > > sounds like it would be useful, but I'd want to hear a better
> > > > description of the use-case before we did anything like that.
> 
> As you mentioned earlier, calling fsync()/fdatasync() after every
> write() blocks the thread, degrading performance—especially on HDDs.
> However, this isn’t the main issue in practice.
> The real problem is that users typically don’t understand "writeback
> errors". If you warn them, "You should call fsync() because writeback
> errors might occur," their response will likely be: "What the hell is
> a writeback error?"
> 
> For example, our users (a big data platform) demanded that we
> immediately shut down the filesystem upon writeback errors. These
> users are algorithm analysts who write Python/Java UDFs for custom
> logic—often involving temporary disk writes followed by reads to pass
> data downstream. Yet, most have no idea how these underlying processes
> work.

And that's exactly why XFS originally never threw away dirty data on
writeback errors. Because scientists and data analysts that wrote
programs to chew through large amounts of data didn't care about
persistence of their data mid-processing. They just wanted what they
wrote to be there the next time the processing pipeline read it.

> > > That's what I mean with my above proposal, except that I though of an
> > > fcntl or syscall and not an ioctl.
> >
> > Yeah, a fcntl() would be reasonable, I think.
> >
> > For a syscall, I guess we could add an fsync2() which just adds a flags
> > field. Then add a FSYNC_JUSTCHECK flag that makes it just check for
> > errors and return.
> >
> > Personally, I like the fcntl() idea better for this, but maybe we have
> > other uses for a fsync2().
> 
> What do you expect users to do with this new fcntl() or fsync2()? Call
> fsync2() after every write()? That would still require massive
> application refactoring.

<sigh>

We already have a user interface that provides exactly the desired
functionality.

$ man sync_file_range
....
   Some details
       SYNC_FILE_RANGE_WAIT_BEFORE  and  SYNC_FILE_RANGE_WAIT_AFTER
       will  detect  any I/O errors or ENOSPC conditions and will
       return these to the caller.
....

IOWs, checking for a past writeback IO error is as simple as:

	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE) < 0) {
		/* An unreported writeback error was pending on the file */
		wb_err = -errno;
		......
	}

This does not cause new IO to be issued, it only blocks on writeback
that is currently in progress, and it has no data integrity
requirements at all. If the writeback has already been done, all it
will do is sweep residual errors out to userspace.....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-26  3:57                   ` Dave Chinner
@ 2025-06-26 10:25                     ` Christoph Hellwig
  2025-06-26 22:22                       ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2025-06-26 10:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Yafang Shao, Jeff Layton, Christoph Hellwig, djwong,
	linux-fsdevel, linux-kernel, linux-xfs, yc1082463

On Thu, Jun 26, 2025 at 01:57:59PM +1000, Dave Chinner wrote:
> writeback errors. Because scientists and data analysts that wrote
> programs to chew through large amounts of data didn't care about
> persistence of their data mid-processing. They just wanted what they
> wrote to be there the next time the processing pipeline read it.

That's only going to work if your RAM is as large as your permanent
storage :)

> IOWs, checking for a past writeback IO error is as simple as:
> 
> 	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE) < 0) {
> 		/* An unreported writeback error was pending on the file */
> 		wb_err = -errno;
> 		......
> 	}
> 
> This does not cause new IO to be issued, it only blocks on writeback
> that is currently in progress, and it has no data integrity
> requirements at all. If the writeback has already been done, all it
> will do is sweep residual errors out to userspace.....

Not quite.

This will still wait for all I/O on the range, and given that
sync_file_range treats a 0 length as the entire file that might actually
do a significant amount of waiting.  But yes, it's the closest we get
right now.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-26 10:25                     ` Christoph Hellwig
@ 2025-06-26 22:22                       ` Dave Chinner
  2025-06-27 21:19                         ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2025-06-26 22:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yafang Shao, Jeff Layton, djwong, linux-fsdevel, linux-kernel,
	linux-xfs, yc1082463

On Thu, Jun 26, 2025 at 03:25:21AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 26, 2025 at 01:57:59PM +1000, Dave Chinner wrote:
> > writeback errors. Because scientists and data analysts that wrote
> > programs to chew through large amounts of data didn't care about
> > persistence of their data mid-processing. They just wanted what they
> > wrote to be there the next time the processing pipeline read it.
> 
> That's only going to work if your RAM is as large as your permanent
> storage :)

No, the old behaviour worked just fine with data sets larger than
RAM. When there is a random writeback error in a big data stream,
only those pages remained dirty and so never get tossed out of RAM. Hence
when a re-read of that file range occurred, the data was already in
RAM and the read succeeded, regardless of the fact that writeback
has been failing.

IOWs the behavioural problems that the user is reporting are present
because we got rid of the historic XFS writeback error handling
(leave the dirty pages in RAM and retry again later) and replaced it
with the historic Linux behaviour (toss the data out and mark the
mapping with an error).

The result of this change is exactly what the OP is having problems
with - reread of a range that had a writeback failure returns zeroes
or garbage, not the original data. If we kept the original XFS
behaviour, the user applications would handle these flakey writeback
failures just fine...

Put simply: we used to have more robust writeback failure handling
than we do now. That could (and probably should) be considered a
regression....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-26 22:22                       ` Dave Chinner
@ 2025-06-27 21:19                         ` Matthew Wilcox
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2025-06-27 21:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Yafang Shao, Jeff Layton, djwong,
	linux-fsdevel, linux-kernel, linux-xfs, yc1082463

On Fri, Jun 27, 2025 at 08:22:53AM +1000, Dave Chinner wrote:
> On Thu, Jun 26, 2025 at 03:25:21AM -0700, Christoph Hellwig wrote:
> > On Thu, Jun 26, 2025 at 01:57:59PM +1000, Dave Chinner wrote:
> > > writeback errors. Because scientists and data analysts that wrote
> > > programs to chew through large amounts of data didn't care about
> > > persistence of their data mid-processing. They just wanted what they
> > > wrote to be there the next time the processing pipeline read it.
> > 
> > That's only going to work if your RAM is as large as your permanent
> > storage :)
> 
> No, the old behaviour worked just fine with data sets larger than
> RAM. When there is a random writeback error in a big data stream,
> only those pages remained dirty and so never get tossed out of RAM. Hence
> when a re-read of that file range occurred, the data was already in
> RAM and the read succeeded, regardless of the fact that writeback
> has been failing.
> 
> IOWs the behavioural problems that the user is reporting are present
> because we got rid of the historic XFS writeback error handling
> (leave the dirty pages in RAM and retry again later) and replaced it
> with the historic Linux behaviour (toss the data out and mark the
> mapping with an error).
> 
> The result of this change is exactly what the OP is having problems
> with - reread of a range that had a writeback failure returns zeroes
> or garbage, not the original data. If we kept the original XFS
> behaviour, the user applications would handle these flakey writeback
> failures just fine...
> 
> Put simply: we used to have more robust writeback failure handling
> than we do now. That could (and probably should) be considered a
> regression....

When you say "used to" and "the old behaviour", when are you referring
to, exactly?  When I came to XFS/iomap, the behaviour on writeback errors
was to clear the Uptodate flag on writeback, which definitely did throw
away the written data and forced a re-read from storage.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] xfs: report a writeback error on a read() call
  2025-06-26  2:41                 ` Yafang Shao
  2025-06-26  3:57                   ` Dave Chinner
@ 2025-06-26 10:23                   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2025-06-26 10:23 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Jeff Layton, Christoph Hellwig, david, djwong, linux-fsdevel,
	linux-kernel, linux-xfs, yc1082463

On Thu, Jun 26, 2025 at 10:41:47AM +0800, Yafang Shao wrote:
> As you mentioned earlier, calling fsync()/fdatasync() after every
> write() blocks the thread, degrading performance—especially on HDDs.
> However, this isn’t the main issue in practice.
> The real problem is that users typically don’t understand "writeback
> errors". If you warn them, "You should call fsync() because writeback
> errors might occur," their response will likely be: "What the hell is
> a writeback error?"
> 
> For example, our users (a big data platform) demanded that we
> immediately shut down the filesystem upon writeback errors. These
> users are algorithm analysts who write Python/Java UDFs for custom
> logic—often involving temporary disk writes followed by reads to pass
> data downstream. Yet, most have no idea how these underlying processes
> work.

Well, if you want to immediately shutdown we should not report writeback
errors but do a file system shutdown.  Which given how we can't recover
from them in general is the right default.

> > Personally, I like the fcntl() idea better for this, but maybe we have
> > other uses for a fsync2().
> 
> What do you expect users to do with this new fcntl() or fsync2()? Call
> fsync2() after every write()? That would still require massive
> application refactoring.

That's why I'm asking what your intended use case for the writeback
reporting is.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-06-27 21:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAN2Y7hyi1HCrSiKsDT+KD8hBjQmsqzNp71Q9Z_RmBG0LLaZxCA@mail.gmail.com>
2025-06-24 14:14 ` [PATCH] xfs: report a writeback error on a read() call Christoph Hellwig
2025-06-24 18:26   ` Jeff Layton
2025-06-24 19:56     ` Matthew Wilcox
2025-06-24 20:25       ` Jeff Layton
2025-06-25  2:44   ` Yafang Shao
2025-06-25  7:01     ` Christoph Hellwig
2025-06-25 10:40       ` Jeff Layton
2025-06-25 11:21         ` Christoph Hellwig
2025-06-25 11:49           ` Jeff Layton
2025-06-25 11:56             ` Christoph Hellwig
2025-06-25 14:06               ` Jeff Layton
2025-06-26  2:41                 ` Yafang Shao
2025-06-26  3:57                   ` Dave Chinner
2025-06-26 10:25                     ` Christoph Hellwig
2025-06-26 22:22                       ` Dave Chinner
2025-06-27 21:19                         ` Matthew Wilcox
2025-06-26 10:23                   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).