* Re: [PATCH] xfs: report a writeback error on a read() call [not found] <CAN2Y7hyi1HCrSiKsDT+KD8hBjQmsqzNp71Q9Z_RmBG0LLaZxCA@mail.gmail.com> @ 2025-06-24 14:14 ` Christoph Hellwig 2025-06-24 18:26 ` Jeff Layton 2025-06-25 2:44 ` Yafang Shao 0 siblings, 2 replies; 17+ messages in thread From: Christoph Hellwig @ 2025-06-24 14:14 UTC (permalink / raw) To: ying chen; +Cc: djwong, linux-xfs, linux-kernel, linux-fsdevel, Jeff Layton On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote: > Normally, user space returns immediately after writing data to the > buffer cache. However, if an error occurs during the actual disk > write operation, data loss may ensue, and there is no way to report > this error back to user space immediately. Current kernels may report > writeback errors when fsync() is called, but frequent invocations of > fsync() can degrade performance. Therefore, a new sysctl > fs.xfs.report_writeback_error_on_read is introduced, which, when set > to 1, reports writeback errors when read() is called. This allows user > space to be notified of writeback errors more promptly. That's really kernel wide policy and not something magic done by a single file system. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-24 14:14 ` [PATCH] xfs: report a writeback error on a read() call Christoph Hellwig @ 2025-06-24 18:26 ` Jeff Layton 2025-06-24 19:56 ` Matthew Wilcox 2025-06-25 2:44 ` Yafang Shao 1 sibling, 1 reply; 17+ messages in thread From: Jeff Layton @ 2025-06-24 18:26 UTC (permalink / raw) To: Christoph Hellwig, ying chen Cc: djwong, linux-xfs, linux-kernel, linux-fsdevel On Tue, 2025-06-24 at 07:14 -0700, Christoph Hellwig wrote: > On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote: > > Normally, user space returns immediately after writing data to the > > buffer cache. However, if an error occurs during the actual disk > > write operation, data loss may ensue, and there is no way to report > > this error back to user space immediately. Current kernels may report > > writeback errors when fsync() is called, but frequent invocations of > > fsync() can degrade performance. Therefore, a new sysctl > > fs.xfs.report_writeback_error_on_read is introduced, which, when set > > to 1, reports writeback errors when read() is called. This allows user > > space to be notified of writeback errors more promptly. > > That's really kernel wide policy and not something magic done by a > single file system. ...not to mention that getting an error back on a read for a prior writeback error would be completely unexpected by most applications. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-24 18:26 ` Jeff Layton @ 2025-06-24 19:56 ` Matthew Wilcox 2025-06-24 20:25 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Matthew Wilcox @ 2025-06-24 19:56 UTC (permalink / raw) To: Jeff Layton Cc: Christoph Hellwig, ying chen, djwong, linux-xfs, linux-kernel, linux-fsdevel On Tue, Jun 24, 2025 at 02:26:18PM -0400, Jeff Layton wrote: > On Tue, 2025-06-24 at 07:14 -0700, Christoph Hellwig wrote: > > On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote: > > > Normally, user space returns immediately after writing data to the > > > buffer cache. However, if an error occurs during the actual disk > > > write operation, data loss may ensue, and there is no way to report > > > this error back to user space immediately. Current kernels may report > > > writeback errors when fsync() is called, but frequent invocations of > > > fsync() can degrade performance. Therefore, a new sysctl > > > fs.xfs.report_writeback_error_on_read is introduced, which, when set > > > to 1, reports writeback errors when read() is called. This allows user > > > space to be notified of writeback errors more promptly. > > > > That's really kernel wide policy and not something magic done by a > > single file system. > > ...not to mention that getting an error back on a read for a prior > writeback error would be completely unexpected by most applications. Well. It's somewhat understandable: write() (returns success) writeback happens, error logged memory pressure evicts folio read() brings folio into page cache attempt to read contents fails, error returned I'm not sure it's a good solution, but it's plausible. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-24 19:56 ` Matthew Wilcox @ 2025-06-24 20:25 ` Jeff Layton 0 siblings, 0 replies; 17+ messages in thread From: Jeff Layton @ 2025-06-24 20:25 UTC (permalink / raw) To: Matthew Wilcox Cc: Christoph Hellwig, ying chen, djwong, linux-xfs, linux-kernel, linux-fsdevel On Tue, 2025-06-24 at 20:56 +0100, Matthew Wilcox wrote: > On Tue, Jun 24, 2025 at 02:26:18PM -0400, Jeff Layton wrote: > > On Tue, 2025-06-24 at 07:14 -0700, Christoph Hellwig wrote: > > > On Sun, Jun 22, 2025 at 08:32:18PM +0800, ying chen wrote: > > > > Normally, user space returns immediately after writing data to the > > > > buffer cache. However, if an error occurs during the actual disk > > > > write operation, data loss may ensue, and there is no way to report > > > > this error back to user space immediately. Current kernels may report > > > > writeback errors when fsync() is called, but frequent invocations of > > > > fsync() can degrade performance. Therefore, a new sysctl > > > > fs.xfs.report_writeback_error_on_read is introduced, which, when set > > > > to 1, reports writeback errors when read() is called. This allows user > > > > space to be notified of writeback errors more promptly. > > > > > > That's really kernel wide policy and not something magic done by a > > > single file system. > > > > ...not to mention that getting an error back on a read for a prior > > writeback error would be completely unexpected by most applications. > > Well. It's somewhat understandable: > > write() (returns success) > writeback happens, error logged > memory pressure evicts folio > read() brings folio into page cache > attempt to read contents fails, error returned > > I'm not sure it's a good solution, but it's plausible. Personally, I find it confusing. The range you're trying to read might actually be fine if the error happened in a different range. It also has the same problem as reporting writeback errors on close(), in that it's non-deterministic. You might not see an error if writeback didn't happen yet. Just because you didn't get an error when reading, that doesn't mean that your data is actually safe. Maybe this is ok as an opt-in thing for some workloads, but it does have some potential footguns. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-24 14:14 ` [PATCH] xfs: report a writeback error on a read() call Christoph Hellwig 2025-06-24 18:26 ` Jeff Layton @ 2025-06-25 2:44 ` Yafang Shao 2025-06-25 7:01 ` Christoph Hellwig 1 sibling, 1 reply; 17+ messages in thread From: Yafang Shao @ 2025-06-25 2:44 UTC (permalink / raw) To: hch, david Cc: djwong, jlayton, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 > That's really kernel wide policy and not something magic done by a > single file system. XFS already supports an optional policy for handling metadata errors via: /sys/fs/xfs/<disk>/error/metadata/ It would be reasonable to introduce a similar optional policy for data errors: /sys/fs/xfs/<disk>/error/data/ This data error policy could allow the filesystem to shut down immediately if corrupted data is detected that might otherwise be exposed to userspace. While it’s unclear whether such a policy should be implemented at the VFS level as a kernel-wide mechanism, we can certainly extend the existing XFS error-handling framework to support it. In other words, this would be a natural extension of current XFS functionality—not a reinvention. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-25 2:44 ` Yafang Shao @ 2025-06-25 7:01 ` Christoph Hellwig 2025-06-25 10:40 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2025-06-25 7:01 UTC (permalink / raw) To: Yafang Shao Cc: hch, david, djwong, jlayton, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Wed, Jun 25, 2025 at 10:44:57AM +0800, Yafang Shao wrote: > > That's really kernel wide policy and not something magic done by a > > single file system. > > XFS already supports an optional policy for handling metadata errors via: > /sys/fs/xfs/<disk>/error/metadata/ > > It would be reasonable to introduce a similar optional policy for data > errors: > /sys/fs/xfs/<disk>/error/data/ > > This data error policy could allow the filesystem to shut down immediately > if corrupted data is detected that might otherwise be exposed to userspace. I fully agree on that part, and would in fact argue for making it the default. But reporting writeback errors on read just on one file system and with a specific option is really strange. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-25 7:01 ` Christoph Hellwig @ 2025-06-25 10:40 ` Jeff Layton 2025-06-25 11:21 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2025-06-25 10:40 UTC (permalink / raw) To: Christoph Hellwig, Yafang Shao Cc: david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Wed, 2025-06-25 at 00:01 -0700, Christoph Hellwig wrote: > On Wed, Jun 25, 2025 at 10:44:57AM +0800, Yafang Shao wrote: > > > That's really kernel wide policy and not something magic done by a > > > single file system. > > > > XFS already supports an optional policy for handling metadata errors via: > > /sys/fs/xfs/<disk>/error/metadata/ > > > > It would be reasonable to introduce a similar optional policy for data > > errors: > > /sys/fs/xfs/<disk>/error/data/ > > > > This data error policy could allow the filesystem to shut down immediately > > if corrupted data is detected that might otherwise be exposed to userspace. > > I fully agree on that part, and would in fact argue for making it the > default. > > But reporting writeback errors on read just on one file system and with > a specific option is really strange. Another option: We could expose this functionality in preadv2() with a new RWF_WBERR flag (better names welcome). That way applications could opt-in to checking for writeback errors like this. With that, the application is at least explicitly saying that it wants this behavior. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-25 10:40 ` Jeff Layton @ 2025-06-25 11:21 ` Christoph Hellwig 2025-06-25 11:49 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2025-06-25 11:21 UTC (permalink / raw) To: Jeff Layton Cc: Christoph Hellwig, Yafang Shao, david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Wed, Jun 25, 2025 at 06:40:07AM -0400, Jeff Layton wrote: > Another option: > > We could expose this functionality in preadv2() with a new RWF_WBERR > flag (better names welcome). That way applications could opt-in to > checking for writeback errors like this. With that, the application is > at least explicitly saying that it wants this behavior. That sounds like a really strange interface to me. I have to admit I don't fully understand the use case where an application cares about these errors, but also doesn't use f(data)sync or sync(fs) to actually persist the data. If we can come up with a coherent use case for that we should simply add a new syscall or fcntl to query the delayed writeback errors instead of overloading other interfaces. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-25 11:21 ` Christoph Hellwig @ 2025-06-25 11:49 ` Jeff Layton 2025-06-25 11:56 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2025-06-25 11:49 UTC (permalink / raw) To: Christoph Hellwig Cc: Yafang Shao, david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Wed, 2025-06-25 at 04:21 -0700, Christoph Hellwig wrote: > On Wed, Jun 25, 2025 at 06:40:07AM -0400, Jeff Layton wrote: > > Another option: > > > > We could expose this functionality in preadv2() with a new RWF_WBERR > > flag (better names welcome). That way applications could opt-in to > > checking for writeback errors like this. With that, the application is > > at least explicitly saying that it wants this behavior. > > That sounds like a really strange interface to me. > > I have to admit I don't fully understand the use case where an > application cares about these errors, but also doesn't use f(data)sync > or sync(fs) to actually persist the data. If we can come up with a > coherent use case for that we should simply add a new syscall or fcntl > to query the delayed writeback errors instead of overloading other > interfaces. It is weird, but I do sort of get the motivation. In a some cases you want to be able to stream writes as fast as possible and let the kernel lazily write that back (because you don't want to block a thread), but knowing if a prior writeback error has occurred is a good thing too. Another idea: add a new generic ioctl() that checks for writeback errors without syncing anything. That would be fairly simple to do and sounds like it would be useful, but I'd want to hear a better description of the use-case before we did anything like that. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-25 11:49 ` Jeff Layton @ 2025-06-25 11:56 ` Christoph Hellwig 2025-06-25 14:06 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2025-06-25 11:56 UTC (permalink / raw) To: Jeff Layton Cc: Christoph Hellwig, Yafang Shao, david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote: > Another idea: add a new generic ioctl() that checks for writeback > errors without syncing anything. That would be fairly simple to do and > sounds like it would be useful, but I'd want to hear a better > description of the use-case before we did anything like that. That's what I mean with my above proposal, except that I though of an fcntl or syscall and not an ioctl. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-25 11:56 ` Christoph Hellwig @ 2025-06-25 14:06 ` Jeff Layton 2025-06-26 2:41 ` Yafang Shao 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2025-06-25 14:06 UTC (permalink / raw) To: Christoph Hellwig Cc: Yafang Shao, david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Wed, 2025-06-25 at 04:56 -0700, Christoph Hellwig wrote: > On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote: > > Another idea: add a new generic ioctl() that checks for writeback > > errors without syncing anything. That would be fairly simple to do and > > sounds like it would be useful, but I'd want to hear a better > > description of the use-case before we did anything like that. > > That's what I mean with my above proposal, except that I though of an > fcntl or syscall and not an ioctl. Yeah, a fcntl() would be reasonable, I think. For a syscall, I guess we could add an fsync2() which just adds a flags field. Then add a FSYNC_JUSTCHECK flag that makes it just check for errors and return. Personally, I like the fcntl() idea better for this, but maybe we have other uses for a fsync2(). -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-25 14:06 ` Jeff Layton @ 2025-06-26 2:41 ` Yafang Shao 2025-06-26 3:57 ` Dave Chinner 2025-06-26 10:23 ` Christoph Hellwig 0 siblings, 2 replies; 17+ messages in thread From: Yafang Shao @ 2025-06-26 2:41 UTC (permalink / raw) To: Jeff Layton Cc: Christoph Hellwig, david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Wed, Jun 25, 2025 at 10:06 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Wed, 2025-06-25 at 04:56 -0700, Christoph Hellwig wrote: > > On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote: > > > Another idea: add a new generic ioctl() that checks for writeback > > > errors without syncing anything. That would be fairly simple to do and > > > sounds like it would be useful, but I'd want to hear a better > > > description of the use-case before we did anything like that. As you mentioned earlier, calling fsync()/fdatasync() after every write() blocks the thread, degrading performance—especially on HDDs. However, this isn’t the main issue in practice. The real problem is that users typically don’t understand "writeback errors". If you warn them, "You should call fsync() because writeback errors might occur," their response will likely be: "What the hell is a writeback error?" For example, our users (a big data platform) demanded that we immediately shut down the filesystem upon writeback errors. These users are algorithm analysts who write Python/Java UDFs for custom logic—often involving temporary disk writes followed by reads to pass data downstream. Yet, most have no idea how these underlying processes work. > > > > That's what I mean with my above proposal, except that I though of an > > fcntl or syscall and not an ioctl. > > Yeah, a fcntl() would be reasonable, I think. > > For a syscall, I guess we could add an fsync2() which just adds a flags > field. Then add a FSYNC_JUSTCHECK flag that makes it just check for > errors and return. > > Personally, I like the fcntl() idea better for this, but maybe we have > other uses for a fsync2(). What do you expect users to do with this new fcntl() or fsync2()? Call fsync2() after every write()? That would still require massive application refactoring. Writeback errors are fundamentally bugs in the filesystem/block layer/driver stack. It makes no sense to expose kernel implementation flaws to userspace, forcing developers to compensate for kernel-level issues with additional syscalls. Most users neither have the patience nor should they need to understand the deepest intricacies of Linux kernel internals. The proper fix should happen in the kernel—not by pushing workarounds onto applications. -- Regards Yafang ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-26 2:41 ` Yafang Shao @ 2025-06-26 3:57 ` Dave Chinner 2025-06-26 10:25 ` Christoph Hellwig 2025-06-26 10:23 ` Christoph Hellwig 1 sibling, 1 reply; 17+ messages in thread From: Dave Chinner @ 2025-06-26 3:57 UTC (permalink / raw) To: Yafang Shao Cc: Jeff Layton, Christoph Hellwig, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Thu, Jun 26, 2025 at 10:41:47AM +0800, Yafang Shao wrote: > On Wed, Jun 25, 2025 at 10:06 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > On Wed, 2025-06-25 at 04:56 -0700, Christoph Hellwig wrote: > > > On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote: > > > > Another idea: add a new generic ioctl() that checks for writeback > > > > errors without syncing anything. That would be fairly simple to do and > > > > sounds like it would be useful, but I'd want to hear a better > > > > description of the use-case before we did anything like that. > > As you mentioned earlier, calling fsync()/fdatasync() after every > write() blocks the thread, degrading performance—especially on HDDs. > However, this isn’t the main issue in practice. > The real problem is that users typically don’t understand "writeback > errors". If you warn them, "You should call fsync() because writeback > errors might occur," their response will likely be: "What the hell is > a writeback error?" > > For example, our users (a big data platform) demanded that we > immediately shut down the filesystem upon writeback errors. These > users are algorithm analysts who write Python/Java UDFs for custom > logic—often involving temporary disk writes followed by reads to pass > data downstream. Yet, most have no idea how these underlying processes > work. And that's exactly why XFS originally never threw away dirty data on writeback errors. Because scientists and data analysts that wrote programs to chew through large amounts of data didn't care about persistence of their data mid-processing. They just wanted what they wrote to be there the next time the processing pipeline read it. > > > That's what I mean with my above proposal, except that I though of an > > > fcntl or syscall and not an ioctl. > > > > Yeah, a fcntl() would be reasonable, I think. > > > > For a syscall, I guess we could add an fsync2() which just adds a flags > > field. Then add a FSYNC_JUSTCHECK flag that makes it just check for > > errors and return. > > > > Personally, I like the fcntl() idea better for this, but maybe we have > > other uses for a fsync2(). > > What do you expect users to do with this new fcntl() or fsync2()? Call > fsync2() after every write()? That would still require massive > application refactoring. <sigh> We already have a user interface that provides exactly the desired functionality. $ man sync_file_range .... Some details SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any I/O errors or ENOSPC conditions and will return these to the caller. .... IOWs, checking for a past writeback IO error is as simple as: if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE) < 0) { /* An unreported writeback error was pending on the file */ wb_err = -errno; ...... } This does not cause new IO to be issued, it only blocks on writeback that is currently in progress, and it has no data integrity requirements at all. If the writeback has already been done, all it will do is sweep residual errors out to userspace..... -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-26 3:57 ` Dave Chinner @ 2025-06-26 10:25 ` Christoph Hellwig 2025-06-26 22:22 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2025-06-26 10:25 UTC (permalink / raw) To: Dave Chinner Cc: Yafang Shao, Jeff Layton, Christoph Hellwig, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Thu, Jun 26, 2025 at 01:57:59PM +1000, Dave Chinner wrote: > writeback errors. Because scientists and data analysts that wrote > programs to chew through large amounts of data didn't care about > persistence of their data mid-processing. They just wanted what they > wrote to be there the next time the processing pipeline read it. That's only going to work if your RAM is as large as your permanent storage :) > IOWs, checking for a past writeback IO error is as simple as: > > if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE) < 0) { > /* An unreported writeback error was pending on the file */ > wb_err = -errno; > ...... > } > > This does not cause new IO to be issued, it only blocks on writeback > that is currently in progress, and it has no data integrity > requirements at all. If the writeback has already been done, all it > will do is sweep residual errors out to userspace..... Not quite. This will still wait for all I/O on the range, and given that sync_file_range treats a 0 length as the entire file that might actually do a significant amount of waiting. But yes, it's the closest we get right now. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-26 10:25 ` Christoph Hellwig @ 2025-06-26 22:22 ` Dave Chinner 2025-06-27 21:19 ` Matthew Wilcox 0 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2025-06-26 22:22 UTC (permalink / raw) To: Christoph Hellwig Cc: Yafang Shao, Jeff Layton, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Thu, Jun 26, 2025 at 03:25:21AM -0700, Christoph Hellwig wrote: > On Thu, Jun 26, 2025 at 01:57:59PM +1000, Dave Chinner wrote: > > writeback errors. Because scientists and data analysts that wrote > > programs to chew through large amounts of data didn't care about > > persistence of their data mid-processing. They just wanted what they > > wrote to be there the next time the processing pipeline read it. > > That's only going to work if your RAM is as large as your permanent > storage :) No, the old behaviour worked just fine with data sets larger than RAM. When there is a random writeback error in a big data stream, only those pages remained dirty and so never get tossed out of RAM. Hence when a re-read of that file range occurred, the data was already in RAM and the read succeeded, regardless of the fact that writeback has been failing. IOWs the behavioural problems that the user is reporting are present because we got rid of the historic XFS writeback error handling (leave the dirty pages in RAM and retry again later) and replaced it with the historic Linux behaviour (toss the data out and mark the mapping with an error). The result of this change is exactly what the OP is having problems with - reread of a range that had a writeback failure returns zeroes or garbage, not the original data. If we kept the original XFS behaviour, the user applications would handle these flakey writeback failures just fine... Put simply: we used to have more robust writeback failure handling than we do now. That could (and probably should) be considered a regression.... -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-26 22:22 ` Dave Chinner @ 2025-06-27 21:19 ` Matthew Wilcox 0 siblings, 0 replies; 17+ messages in thread From: Matthew Wilcox @ 2025-06-27 21:19 UTC (permalink / raw) To: Dave Chinner Cc: Christoph Hellwig, Yafang Shao, Jeff Layton, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Fri, Jun 27, 2025 at 08:22:53AM +1000, Dave Chinner wrote: > On Thu, Jun 26, 2025 at 03:25:21AM -0700, Christoph Hellwig wrote: > > On Thu, Jun 26, 2025 at 01:57:59PM +1000, Dave Chinner wrote: > > > writeback errors. Because scientists and data analysts that wrote > > > programs to chew through large amounts of data didn't care about > > > persistence of their data mid-processing. They just wanted what they > > > wrote to be there the next time the processing pipeline read it. > > > > That's only going to work if your RAM is as large as your permanent > > storage :) > > No, the old behaviour worked just fine with data sets larger than > RAM. When there is a random writeback error in a big data stream, > only those pages remained dirty and so never get tossed out of RAM. Hence > when a re-read of that file range occurred, the data was already in > RAM and the read succeeded, regardless of the fact that writeback > has been failing. > > IOWs the behavioural problems that the user is reporting are present > because we got rid of the historic XFS writeback error handling > (leave the dirty pages in RAM and retry again later) and replaced it > with the historic Linux behaviour (toss the data out and mark the > mapping with an error). > > The result of this change is exactly what the OP is having problems > with - reread of a range that had a writeback failure returns zeroes > or garbage, not the original data. If we kept the original XFS > behaviour, the user applications would handle these flakey writeback > failures just fine... > > Put simply: we used to have more robust writeback failure handling > than we do now. That could (and probably should) be considered a > regression.... When you say "used to" and "the old behaviour", when are you referring to, exactly? When I came to XFS/iomap, the behaviour on writeback errors was to clear the Uptodate flag on writeback, which definitely did throw away the written data and forced a re-read from storage. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] xfs: report a writeback error on a read() call 2025-06-26 2:41 ` Yafang Shao 2025-06-26 3:57 ` Dave Chinner @ 2025-06-26 10:23 ` Christoph Hellwig 1 sibling, 0 replies; 17+ messages in thread From: Christoph Hellwig @ 2025-06-26 10:23 UTC (permalink / raw) To: Yafang Shao Cc: Jeff Layton, Christoph Hellwig, david, djwong, linux-fsdevel, linux-kernel, linux-xfs, yc1082463 On Thu, Jun 26, 2025 at 10:41:47AM +0800, Yafang Shao wrote: > As you mentioned earlier, calling fsync()/fdatasync() after every > write() blocks the thread, degrading performance—especially on HDDs. > However, this isn’t the main issue in practice. > The real problem is that users typically don’t understand "writeback > errors". If you warn them, "You should call fsync() because writeback > errors might occur," their response will likely be: "What the hell is > a writeback error?" > > For example, our users (a big data platform) demanded that we > immediately shut down the filesystem upon writeback errors. These > users are algorithm analysts who write Python/Java UDFs for custom > logic—often involving temporary disk writes followed by reads to pass > data downstream. Yet, most have no idea how these underlying processes > work. Well, if you want to immediately shutdown we should not report writeback errors but do a file system shutdown. Which given how we can't recover from them in general is the right default. > > Personally, I like the fcntl() idea better for this, but maybe we have > > other uses for a fsync2(). > > What do you expect users to do with this new fcntl() or fsync2()? Call > fsync2() after every write()? That would still require massive > application refactoring. That's why I'm asking what your intended use case for the writeback reporting is. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-06-27 21:19 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <CAN2Y7hyi1HCrSiKsDT+KD8hBjQmsqzNp71Q9Z_RmBG0LLaZxCA@mail.gmail.com> 2025-06-24 14:14 ` [PATCH] xfs: report a writeback error on a read() call Christoph Hellwig 2025-06-24 18:26 ` Jeff Layton 2025-06-24 19:56 ` Matthew Wilcox 2025-06-24 20:25 ` Jeff Layton 2025-06-25 2:44 ` Yafang Shao 2025-06-25 7:01 ` Christoph Hellwig 2025-06-25 10:40 ` Jeff Layton 2025-06-25 11:21 ` Christoph Hellwig 2025-06-25 11:49 ` Jeff Layton 2025-06-25 11:56 ` Christoph Hellwig 2025-06-25 14:06 ` Jeff Layton 2025-06-26 2:41 ` Yafang Shao 2025-06-26 3:57 ` Dave Chinner 2025-06-26 10:25 ` Christoph Hellwig 2025-06-26 22:22 ` Dave Chinner 2025-06-27 21:19 ` Matthew Wilcox 2025-06-26 10:23 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).