* [PATCH] nfsd: sample writeback error cursor before async COPY loop
From: Chuck Lever @ 2026-05-22 21:45 UTC
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever, sashiko-bot
From: Chuck Lever <chuck.lever@oracle.com>

_nfsd_copy_file_range() samples dst->f_wb_err into "since"
after the copy loop, then uses it to detect writeback errors
via filemap_check_wb_err() once vfs_fsync_range() returns.
Because the nfsd_file cache reuses a single struct file
across requests targeting the same inode, a concurrent
COMMIT or stable WRITE on dst advances dst->f_wb_err to the
current mapping->wb_err via file_check_and_advance_wb_err()
during its own vfs_fsync_range(). If that advancement lands
between the writeback error appearing in mapping->wb_err
and the COPY worker sampling "since", the worker captures
the already-advanced cursor, errseq_check() sees cur ==
since and returns zero, and NFSD4_COPY_F_COMMITTED is set
even though writeback failed. CB_OFFLOAD then encodes
wr_stable_how = FILE_SYNC4, the client treats the copied
data as durable, and the failure becomes silent data loss.

Sample "since" once, before the copy loop begins. The cursor
then reflects state in effect before this COPY issues any
writes, and filemap_check_wb_err() detects any error that
occurs during the copy regardless of which thread first
observes it. This matches the pattern used by
nfsd_vfs_write() and nfsd4_clone_file_range().

Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260522194441.436065-1-cel@kernel.org?part=1
Fixes: 555dbf1a9aac ("nfsd: Replace use of rwsem with errseq_t")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 93fcaf90d6ae..3024d51d6fb7 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1950,6 +1950,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy,
 	/* See RFC 7862 p.67: */
 	if (bytes_total == 0)
 		bytes_total = ULLONG_MAX;
+	since = READ_ONCE(dst->f_wb_err);
 	do {
 		/* Only async copies can be stopped here */
 		if (kthread_should_stop())
@@ -1965,7 +1966,6 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy,
 	} while (bytes_total > 0 && nfsd4_copy_is_async(copy));
 	/* for a non-zero asynchronous copy do a commit of data */
 	if (nfsd4_copy_is_async(copy) && copy->cp_res.wr_bytes_written > 0) {
-		since = READ_ONCE(dst->f_wb_err);
 		end = copy->cp_dst_pos + copy->cp_res.wr_bytes_written - 1;
 		status = vfs_fsync_range(dst, copy->cp_dst_pos, end, 0);
 		if (!status)
--
2.54.0
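
The ordering problem described in the commit message can be reproduced in a small userspace model. What follows is only a sketch under stated assumptions: struct wb_err_model and the model_*() helpers are hypothetical stand-ins, not the kernel's errseq_t API, and the model drops the "seen" flag that the real type carries. run_copy() is likewise a hypothetical name that replays the interleaving from the commit message with the cursor sampled either before the loop or just before the final check.

```c
#include <assert.h>
#include <errno.h>

/*
 * Toy stand-in for the kernel's errseq_t (assumption: the real type packs
 * an errno, a "seen" flag and a counter into one word; this model keeps
 * only a counter and the most recent errno).
 */
struct wb_err_model {
	unsigned int seq;	/* bumped each time a new error is recorded */
	int err;		/* most recent negative errno, 0 if none */
};

/* Record a writeback failure in the mapping-wide state. */
static void model_set_err(struct wb_err_model *m, int err)
{
	assert(err < 0);
	m->err = err;
	m->seq++;
}

/* Report an error only if one was recorded after "since" was sampled. */
static int model_check(const struct wb_err_model *m, unsigned int since)
{
	return m->seq == since ? 0 : m->err;
}

/* Report as above, then move the caller's cursor up to the current state. */
static int model_check_and_advance(const struct wb_err_model *m,
				   unsigned int *cursor)
{
	int err = model_check(m, *cursor);

	*cursor = m->seq;
	return err;
}

/*
 * Replay the interleaving: the copy writes data, writeback fails, and a
 * concurrent COMMIT on the same (shared) file advances the file's cursor
 * before the COPY worker performs its final check.
 */
int run_copy(int sample_before_loop)
{
	struct wb_err_model mapping = { 0, 0 };
	unsigned int f_wb_err = mapping.seq;	/* cursor in the shared file */
	unsigned int since = 0;

	if (sample_before_loop)
		since = f_wb_err;		/* ordering after the patch */

	/* Writeback of data written by the copy loop fails. */
	model_set_err(&mapping, -EIO);

	/* Concurrent COMMIT consumes the error, advancing the shared cursor. */
	(void)model_check_and_advance(&mapping, &f_wb_err);

	if (!sample_before_loop)
		since = f_wb_err;		/* ordering before the patch */

	/* What the COPY worker's final check would report. */
	return model_check(&mapping, since);
}
```

In this model run_copy(0) returns 0, so the failure goes unnoticed, while run_copy(1) returns -EIO because the cursor predates the error.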
* Re: [PATCH] nfsd: sample writeback error cursor before async COPY loop
From: Jeff Layton @ 2026-05-22 21:49 UTC
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever, sashiko-bot
On Fri, 2026-05-22 at 17:45 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> _nfsd_copy_file_range() samples dst->f_wb_err into "since"
> after the copy loop, then uses it to detect writeback errors
> via filemap_check_wb_err() once vfs_fsync_range() returns.
> Because the nfsd_file cache reuses a single struct file
> across requests targeting the same inode, a concurrent
> COMMIT or stable WRITE on dst advances dst->f_wb_err to the
> current mapping->wb_err via file_check_and_advance_wb_err()
> during its own vfs_fsync_range(). If that advancement lands
> between the writeback error appearing in mapping->wb_err
> and the COPY worker sampling "since", the worker captures
> the already-advanced cursor, errseq_check() sees cur ==
> since and returns zero, and NFSD4_COPY_F_COMMITTED is set
> even though writeback failed. CB_OFFLOAD then encodes
> wr_stable_how = FILE_SYNC4, the client treats the copied
> data as durable, and the failure becomes silent data loss.
>
> Sample "since" once, before the copy loop begins. The cursor
> then reflects state in effect before this COPY issues any
> writes, and filemap_check_wb_err() detects any error that
> occurs during the copy regardless of which thread first
> observes it. This matches the pattern used by
> nfsd_vfs_write() and nfsd4_clone_file_range().
>
> Reported-by: sashiko-bot <sashiko-bot@kernel.org>
> Closes: https://sashiko.dev/#/patchset/20260522194441.436065-1-cel@kernel.org?part=1
> Fixes: 555dbf1a9aac ("nfsd: Replace use of rwsem with errseq_t")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4proc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 93fcaf90d6ae..3024d51d6fb7 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1950,6 +1950,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy,
>  	/* See RFC 7862 p.67: */
>  	if (bytes_total == 0)
>  		bytes_total = ULLONG_MAX;
> +	since = READ_ONCE(dst->f_wb_err);
>  	do {
>  		/* Only async copies can be stopped here */
>  		if (kthread_should_stop())
> @@ -1965,7 +1966,6 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy,
>  	} while (bytes_total > 0 && nfsd4_copy_is_async(copy));
>  	/* for a non-zero asynchronous copy do a commit of data */
>  	if (nfsd4_copy_is_async(copy) && copy->cp_res.wr_bytes_written > 0) {
> -		since = READ_ONCE(dst->f_wb_err);
>  		end = copy->cp_dst_pos + copy->cp_res.wr_bytes_written - 1;
>  		status = vfs_fsync_range(dst, copy->cp_dst_pos, end, 0);
>  		if (!status)

That makes a lot more sense.

Reviewed-by: Jeff Layton <jlayton@kernel.org>