* [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec
@ 2025-10-22 16:22 Chuck Lever
2025-10-22 17:22 ` Tom Talpey
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Chuck Lever @ 2025-10-22 16:22 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever, Mike Snitzer
From: Chuck Lever <chuck.lever@oracle.com>
Mike noted that when NFSD responds to an NFS_FILE_SYNC WRITE, it
does not also persist file time stamps. To wit, Section 18.32.3
of RFC 8881 mandates:
> The client specifies with the stable parameter the method of how
> the data is to be processed by the server. If stable is
> FILE_SYNC4, the server MUST commit the data written plus all file
> system metadata to stable storage before returning results. This
> corresponds to the NFSv2 protocol semantics. Any other behavior
> constitutes a protocol violation. If stable is DATA_SYNC4, then
> the server MUST commit all of the data to stable storage and
> enough of the metadata to retrieve the data before returning.
For many years, NFSD has used a "data sync only" optimization for
FILE_SYNC WRITEs, so file time stamps haven't been persisted as the
mandate above requires.
Reported-by: Mike Snitzer <snitzer@kernel.org>
Closes: https://lore.kernel.org/linux-nfs/20251018005431.3403-1-cel@kernel.org/T/#t
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/vfs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
This would need to be applied to nfsd-testing before the DIRECT
WRITE patches. I'm guessing a Cc: stable would be needed as well.
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index f537a7b4ee01..2c5d38f38454 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1315,7 +1315,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
init_sync_kiocb(&kiocb, file);
kiocb.ki_pos = offset;
if (stable && !fhp->fh_use_wgather)
- kiocb.ki_flags |= IOCB_DSYNC;
+ kiocb.ki_flags |=
+ (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC);
nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
--
2.51.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 16:22 [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec Chuck Lever @ 2025-10-22 17:22 ` Tom Talpey 2025-10-22 17:46 ` Mike Snitzer 2025-10-23 6:39 ` Christoph Hellwig 2025-10-22 17:27 ` Mike Snitzer 2025-10-23 6:38 ` Christoph Hellwig 2 siblings, 2 replies; 11+ messages in thread From: Tom Talpey @ 2025-10-22 17:22 UTC (permalink / raw) To: Chuck Lever, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo Cc: linux-nfs, Chuck Lever, Mike Snitzer There needs to be some statement of the performance consequences of removing this "optimization". I'm going to predict it's significant on rotating media, and should not go unmeasured. The commit message needs to more bluntly state the fact that the server has been violating the quoted MUST. See previous paragraph. Tom. On 10/22/2025 12:22 PM, Chuck Lever wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > Mike noted that when NFSD responds to an NFS_FILE_SYNC WRITE, it > does not also persist file time stamps. To wit, Section 18.32.3 > of RFC 8881 mandates: > >> The client specifies with the stable parameter the method of how >> the data is to be processed by the server. If stable is >> FILE_SYNC4, the server MUST commit the data written plus all file >> system metadata to stable storage before returning results. This >> corresponds to the NFSv2 protocol semantics. Any other behavior >> constitutes a protocol violation. If stable is DATA_SYNC4, then >> the server MUST commit all of the data to stable storage and >> enough of the metadata to retrieve the data before returning. > > For many years, NFSD has used a "data sync only" optimization for > FILE_SYNC WRITEs, so file time stamps haven't been persisted as the > mandate above requires. > > Reported-by: Mike Snitzer <snitzer@kernel.org> > Closes: https://lore.kernel.org/linux-nfs/20251018005431.3403-1-cel@kernel.org/T/#t > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/nfsd/vfs.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > This would need to be applied to nfsd-testing before the DIRECT > WRITE patches. I'm guessing a Cc: stable would be needed as well. > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index f537a7b4ee01..2c5d38f38454 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1315,7 +1315,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, > init_sync_kiocb(&kiocb, file); > kiocb.ki_pos = offset; > if (stable && !fhp->fh_use_wgather) > - kiocb.ki_flags |= IOCB_DSYNC; > + kiocb.ki_flags |= > + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); > > nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload); > iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 17:22 ` Tom Talpey @ 2025-10-22 17:46 ` Mike Snitzer 2025-10-22 18:25 ` Tom Talpey 2025-10-23 6:39 ` Christoph Hellwig 1 sibling, 1 reply; 11+ messages in thread From: Mike Snitzer @ 2025-10-22 17:46 UTC (permalink / raw) To: Tom Talpey Cc: Chuck Lever, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, linux-nfs, Chuck Lever On Wed, Oct 22, 2025 at 01:22:52PM -0400, Tom Talpey wrote: > There needs to be some statement of the performance consequences of > removing this "optimization". I'm going to predict it's significant > on rotating media, and should not go unmeasured. I agree that rotational storage will likely see an impact. But Linux's more recent (now like 14 years ago) FLUSH+FUA rework really helped improve things -- that was a major undertaking where Christoph, Jens and others really did a fantastic job of breathing new life into Linux performance on modern rotational storage. Related blast from the past: https://lwn.net/Articles/457667/ My point: may not be as grim as we think... but there is a difference between SATA rotational storage and "enterprise" rotational storage (e.g. NetApp or EMC fronted by elaborate caching that is configured to report wbc=0 because they have battery backed cache) Mike ps: I don't have any rotational storage to test, not it! > The commit message needs to more bluntly state the fact that the > server has been violating the quoted MUST. See previous paragraph. > > Tom. > > On 10/22/2025 12:22 PM, Chuck Lever wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > > > Mike noted that when NFSD responds to an NFS_FILE_SYNC WRITE, it > > does not also persist file time stamps. To wit, Section 18.32.3 > > of RFC 8881 mandates: > > > > > The client specifies with the stable parameter the method of how > > > the data is to be processed by the server. If stable is > > > FILE_SYNC4, the server MUST commit the data written plus all file > > > system metadata to stable storage before returning results. This > > > corresponds to the NFSv2 protocol semantics. Any other behavior > > > constitutes a protocol violation. If stable is DATA_SYNC4, then > > > the server MUST commit all of the data to stable storage and > > > enough of the metadata to retrieve the data before returning. > > > > For many years, NFSD has used a "data sync only" optimization for > > FILE_SYNC WRITEs, so file time stamps haven't been persisted as the > > mandate above requires. > > > > Reported-by: Mike Snitzer <snitzer@kernel.org> > > Closes: https://lore.kernel.org/linux-nfs/20251018005431.3403-1-cel@kernel.org/T/#t > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > --- > > fs/nfsd/vfs.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > This would need to be applied to nfsd-testing before the DIRECT > > WRITE patches. I'm guessing a Cc: stable would be needed as well. > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > index f537a7b4ee01..2c5d38f38454 100644 > > --- a/fs/nfsd/vfs.c > > +++ b/fs/nfsd/vfs.c > > @@ -1315,7 +1315,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, > > init_sync_kiocb(&kiocb, file); > > kiocb.ki_pos = offset; > > if (stable && !fhp->fh_use_wgather) > > - kiocb.ki_flags |= IOCB_DSYNC; > > + kiocb.ki_flags |= > > + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); > > nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload); > > iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 17:46 ` Mike Snitzer @ 2025-10-22 18:25 ` Tom Talpey 2025-10-22 18:39 ` Mike Snitzer 0 siblings, 1 reply; 11+ messages in thread From: Tom Talpey @ 2025-10-22 18:25 UTC (permalink / raw) To: Mike Snitzer Cc: Chuck Lever, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, linux-nfs, Chuck Lever On 10/22/2025 1:46 PM, Mike Snitzer wrote: > On Wed, Oct 22, 2025 at 01:22:52PM -0400, Tom Talpey wrote: >> There needs to be some statement of the performance consequences of >> removing this "optimization". I'm going to predict it's significant >> on rotating media, and should not go unmeasured. > > I agree that rotational storage will likely see an impact. But > Linux's more recent (now like 14 years ago) FLUSH+FUA rework really > helped improve things -- that was a major undertaking where Christoph, > Jens and others really did a fantastic job of breathing new life into > Linux performance on modern rotational storage. I agree, but I think honesty requires it to be measured. Writing the metadata has to wait for the data to make it to the same storage, this can roughly double the total latency. If it's small (NVMe), great. This kind of "optimization" was tried by pretty much every server vendor in the early NFSv3 days, I recall many amusing scenes at old Connectathons where the schemes, or completely unaware servers were exposed. There was no place to hide when operation latencies were measured in tens of milliseconds. Same game today, just shifted. > Related blast from the past: https://lwn.net/Articles/457667/ > > My point: may not be as grim as we think... > but there is a difference between SATA rotational storage and > "enterprise" rotational storage (e.g. NetApp or EMC fronted by > elaborate caching that is configured to report wbc=0 because they have > battery backed cache) Yep. But this is upstream, right? It can't assume. To be clear - I totally agree with the change. Only concern is stating why it's so important, in the face of performance. Tom. > > Mike > > ps: I don't have any rotational storage to test, not it! > >> The commit message needs to more bluntly state the fact that the >> server has been violating the quoted MUST. See previous paragraph. >> >> Tom. >> >> On 10/22/2025 12:22 PM, Chuck Lever wrote: >>> From: Chuck Lever <chuck.lever@oracle.com> >>> >>> Mike noted that when NFSD responds to an NFS_FILE_SYNC WRITE, it >>> does not also persist file time stamps. To wit, Section 18.32.3 >>> of RFC 8881 mandates: >>> >>>> The client specifies with the stable parameter the method of how >>>> the data is to be processed by the server. If stable is >>>> FILE_SYNC4, the server MUST commit the data written plus all file >>>> system metadata to stable storage before returning results. This >>>> corresponds to the NFSv2 protocol semantics. Any other behavior >>>> constitutes a protocol violation. If stable is DATA_SYNC4, then >>>> the server MUST commit all of the data to stable storage and >>>> enough of the metadata to retrieve the data before returning. >>> >>> For many years, NFSD has used a "data sync only" optimization for >>> FILE_SYNC WRITEs, so file time stamps haven't been persisted as the >>> mandate above requires. >>> >>> Reported-by: Mike Snitzer <snitzer@kernel.org> >>> Closes: https://lore.kernel.org/linux-nfs/20251018005431.3403-1-cel@kernel.org/T/#t >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>> --- >>> fs/nfsd/vfs.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> This would need to be applied to nfsd-testing before the DIRECT >>> WRITE patches. I'm guessing a Cc: stable would be needed as well. >>> >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >>> index f537a7b4ee01..2c5d38f38454 100644 >>> --- a/fs/nfsd/vfs.c >>> +++ b/fs/nfsd/vfs.c >>> @@ -1315,7 +1315,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, >>> init_sync_kiocb(&kiocb, file); >>> kiocb.ki_pos = offset; >>> if (stable && !fhp->fh_use_wgather) >>> - kiocb.ki_flags |= IOCB_DSYNC; >>> + kiocb.ki_flags |= >>> + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); >>> nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload); >>> iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 18:25 ` Tom Talpey @ 2025-10-22 18:39 ` Mike Snitzer 0 siblings, 0 replies; 11+ messages in thread From: Mike Snitzer @ 2025-10-22 18:39 UTC (permalink / raw) To: Tom Talpey Cc: Chuck Lever, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, linux-nfs, Chuck Lever On Wed, Oct 22, 2025 at 02:25:42PM -0400, Tom Talpey wrote: > On 10/22/2025 1:46 PM, Mike Snitzer wrote: > > On Wed, Oct 22, 2025 at 01:22:52PM -0400, Tom Talpey wrote: > > > There needs to be some statement of the performance consequences of > > > removing this "optimization". I'm going to predict it's significant > > > on rotating media, and should not go unmeasured. > > > > I agree that rotational storage will likely see an impact. But > > Linux's more recent (now like 14 years ago) FLUSH+FUA rework really > > helped improve things -- that was a major undertaking where Christoph, > > Jens and others really did a fantastic job of breathing new life into > > Linux performance on modern rotational storage. > > I agree, but I think honesty requires it to be measured. Writing > the metadata has to wait for the data to make it to the same storage, > this can roughly double the total latency. If it's small (NVMe), great. > > This kind of "optimization" was tried by pretty much every server > vendor in the early NFSv3 days, I recall many amusing scenes at > old Connectathons where the schemes, or completely unaware servers > were exposed. There was no place to hide when operation latencies > were measured in tens of milliseconds. Same game today, just shifted. > > > Related blast from the past: https://lwn.net/Articles/457667/ > > > > My point: may not be as grim as we think... > > but there is a difference between SATA rotational storage and > > "enterprise" rotational storage (e.g. NetApp or EMC fronted by > > elaborate caching that is configured to report wbc=0 because they have > > battery backed cache) > > Yep. But this is upstream, right? It can't assume. Correct, but catering to really old and slow rotational storage isn't (or shouldn't be) a priority. > To be clear - I totally agree with the change. Only concern is > stating why it's so important, in the face of performance. Yes. In the end, data safety is priority 1, eeking out best performance is secondary. I think all of us can agree on that (even those who have really old/slow rotational storage that will get whiplash from this change). Mike ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 17:22 ` Tom Talpey 2025-10-22 17:46 ` Mike Snitzer @ 2025-10-23 6:39 ` Christoph Hellwig 1 sibling, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2025-10-23 6:39 UTC (permalink / raw) To: Tom Talpey Cc: Chuck Lever, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, linux-nfs, Chuck Lever, Mike Snitzer On Wed, Oct 22, 2025 at 01:22:52PM -0400, Tom Talpey wrote: > There needs to be some statement of the performance consequences of > removing this "optimization". I'm going to predict it's significant > on rotating media, and should not go unmeasured. This has absolutely nothing to do with rotating media. The only difference is for pure overwrites. If you have e.g. a database workload that zeroes once and then overwrites you will see a difference on SSD and HDD. For anything that extends files, fills holes etc you won't see any difference. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 16:22 [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec Chuck Lever 2025-10-22 17:22 ` Tom Talpey @ 2025-10-22 17:27 ` Mike Snitzer 2025-10-28 22:28 ` Dave Chinner 2025-10-23 6:38 ` Christoph Hellwig 2 siblings, 1 reply; 11+ messages in thread From: Mike Snitzer @ 2025-10-22 17:27 UTC (permalink / raw) To: Chuck Lever Cc: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever, linux-xfs, Dave Chinner, honza On Wed, Oct 22, 2025 at 12:22:37PM -0400, Chuck Lever wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > Mike noted that when NFSD responds to an NFS_FILE_SYNC WRITE, it > does not also persist file time stamps. To wit, Section 18.32.3 > of RFC 8881 mandates: > > > The client specifies with the stable parameter the method of how > > the data is to be processed by the server. If stable is > > FILE_SYNC4, the server MUST commit the data written plus all file > > system metadata to stable storage before returning results. This > > corresponds to the NFSv2 protocol semantics. Any other behavior > > constitutes a protocol violation. If stable is DATA_SYNC4, then > > the server MUST commit all of the data to stable storage and > > enough of the metadata to retrieve the data before returning. > > For many years, NFSD has used a "data sync only" optimization for > FILE_SYNC WRITEs, so file time stamps haven't been persisted as the > mandate above requires. > > Reported-by: Mike Snitzer <snitzer@kernel.org> > Closes: https://lore.kernel.org/linux-nfs/20251018005431.3403-1-cel@kernel.org/T/#t > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/nfsd/vfs.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > This would need to be applied to nfsd-testing before the DIRECT > WRITE patches. I'm guessing a Cc: stable would be needed as well. > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index f537a7b4ee01..2c5d38f38454 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1315,7 +1315,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, > init_sync_kiocb(&kiocb, file); > kiocb.ki_pos = offset; > if (stable && !fhp->fh_use_wgather) > - kiocb.ki_flags |= IOCB_DSYNC; > + kiocb.ki_flags |= > + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); > > nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload); > iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); > -- > 2.51.0 > I agree with this change. And as I just replied elsewhere, IOCB_SYNC doesn't cause a performance drop (at least not on modern systems with NVMe): https://lore.kernel.org/linux-nfs/aPkNvmXsgdNJtK_7@kernel.org/ Only question I have: does IOCB_SYNC _always_ imply IOCB_DSYNC (for VFS and all filesystems)? Or should we be setting IOCB_DSYNC|IOCB_SYNC ? (I took to setting both for NFSD Direct, and NFS LOCALIO sets both.. that was done by original LOCALIO author) Basis for my question, is that there was a recent XFS performance improvement made by Dave Chinner, reported by Jan Kara, for DIO+DSYNC, see commit c91d38b57f2c4 ("xfs: rework datasync tracking and execution"). Will DIO + IOCB_SYNC get the benefit of DIO + IOCB_DSYNC relative to this XFS improvement? I tried to review that but wasn't able to spend enough time on it to be convinced that to be the case. Mike ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 17:27 ` Mike Snitzer @ 2025-10-28 22:28 ` Dave Chinner 0 siblings, 0 replies; 11+ messages in thread From: Dave Chinner @ 2025-10-28 22:28 UTC (permalink / raw) To: Mike Snitzer Cc: Chuck Lever, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever, linux-xfs, honza On Wed, Oct 22, 2025 at 01:27:54PM -0400, Mike Snitzer wrote: > On Wed, Oct 22, 2025 at 12:22:37PM -0400, Chuck Lever wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > > > Mike noted that when NFSD responds to an NFS_FILE_SYNC WRITE, it > > does not also persist file time stamps. To wit, Section 18.32.3 > > of RFC 8881 mandates: > > > > > The client specifies with the stable parameter the method of how > > > the data is to be processed by the server. If stable is > > > FILE_SYNC4, the server MUST commit the data written plus all file > > > system metadata to stable storage before returning results. This > > > corresponds to the NFSv2 protocol semantics. Any other behavior > > > constitutes a protocol violation. If stable is DATA_SYNC4, then > > > the server MUST commit all of the data to stable storage and > > > enough of the metadata to retrieve the data before returning. > > > > For many years, NFSD has used a "data sync only" optimization for > > FILE_SYNC WRITEs, so file time stamps haven't been persisted as the > > mandate above requires. > > > > Reported-by: Mike Snitzer <snitzer@kernel.org> > > Closes: https://lore.kernel.org/linux-nfs/20251018005431.3403-1-cel@kernel.org/T/#t > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > --- > > fs/nfsd/vfs.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > This would need to be applied to nfsd-testing before the DIRECT > > WRITE patches. I'm guessing a Cc: stable would be needed as well. > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > index f537a7b4ee01..2c5d38f38454 100644 > > --- a/fs/nfsd/vfs.c > > +++ b/fs/nfsd/vfs.c > > @@ -1315,7 +1315,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, > > init_sync_kiocb(&kiocb, file); > > kiocb.ki_pos = offset; > > if (stable && !fhp->fh_use_wgather) > > - kiocb.ki_flags |= IOCB_DSYNC; > > + kiocb.ki_flags |= > > + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); > > > > nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload); > > iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt); > > -- > > 2.51.0 > > > > I agree with this change. And as I just replied elsewhere, IOCB_SYNC > doesn't cause a performance drop (at least not on modern systems with > NVMe): https://lore.kernel.org/linux-nfs/aPkNvmXsgdNJtK_7@kernel.org/ Well, that depends on the underlying file layout. If the test was doing IO that required allocation or unwritten extent modification, the IOCB_SYNC performance is identical to IOCB_DSYNC because they both have to stabilise metadata needed to access the file data. However, if it is a pure overwrite (i.e. writing into already written space) then the only filesystem metadata update that occurs is timestamps. At this point, IOCB_DSYNC is *much* faster than IOCB_SYNC because it does not require a journal flush to stabilise metadata. So if you didn't see any change in performance between DSYNC and SYNC writes, then is likely that the the tests did not exercise the pure overwrite path which many high performance applications optimise for (e.g. databases). IOWs, I'd definitely expect performance regressions to be reported by users from this change further down the line... > Only question I have: > does IOCB_SYNC _always_ imply IOCB_DSYNC (for VFS and all > filesystems)? Or should we be setting IOCB_DSYNC|IOCB_SYNC ? Yes - "all metadata" (_SYNC) has always been a super set of "enough metadata to retreive the data" (_DSYNC)... > (I took to setting both for NFSD Direct, and NFS LOCALIO sets > both.. that was done by original LOCALIO author) That's a (harmless) bug. > Basis for my question, is that there was a recent XFS performance > improvement made by Dave Chinner, reported by Jan Kara, for > DIO+DSYNC, see commit c91d38b57f2c4 ("xfs: rework datasync tracking > and execution"). Will DIO + IOCB_SYNC get the benefit of DIO + > IOCB_DSYNC relative to this XFS improvement? Yes, it will. But IOCB_SYNC will still be much slower than IOCB_DSYNC for pure overwrites.... -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-22 16:22 [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec Chuck Lever 2025-10-22 17:22 ` Tom Talpey 2025-10-22 17:27 ` Mike Snitzer @ 2025-10-23 6:38 ` Christoph Hellwig 2025-10-23 12:46 ` Chuck Lever 2 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2025-10-23 6:38 UTC (permalink / raw) To: Chuck Lever Cc: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever, Mike Snitzer On Wed, Oct 22, 2025 at 12:22:37PM -0400, Chuck Lever wrote: > - kiocb.ki_flags |= IOCB_DSYNC; > + kiocb.ki_flags |= > + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); IOCB_SYNC without IOCB_DSYNC is always wrong, see my reply to the previous thread. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-23 6:38 ` Christoph Hellwig @ 2025-10-23 12:46 ` Chuck Lever 2025-10-23 13:34 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Chuck Lever @ 2025-10-23 12:46 UTC (permalink / raw) To: Christoph Hellwig Cc: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever, Mike Snitzer On 10/23/25 2:38 AM, Christoph Hellwig wrote: > On Wed, Oct 22, 2025 at 12:22:37PM -0400, Chuck Lever wrote: >> - kiocb.ki_flags |= IOCB_DSYNC; >> + kiocb.ki_flags |= >> + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); > > IOCB_SYNC without IOCB_DSYNC is always wrong, see my reply to the > previous thread. > I see only one API consumer that sets both: NFS LOCALIO. There are not enough examples for me to draw any kind of conclusion, and I haven't found documentation for these flags, fwiw. But I'll fix this up in the next revision. -- Chuck Lever ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec 2025-10-23 12:46 ` Chuck Lever @ 2025-10-23 13:34 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2025-10-23 13:34 UTC (permalink / raw) To: Chuck Lever Cc: Christoph Hellwig, NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever, Mike Snitzer On Thu, Oct 23, 2025 at 08:46:02AM -0400, Chuck Lever wrote: > On 10/23/25 2:38 AM, Christoph Hellwig wrote: > > On Wed, Oct 22, 2025 at 12:22:37PM -0400, Chuck Lever wrote: > >> - kiocb.ki_flags |= IOCB_DSYNC; > >> + kiocb.ki_flags |= > >> + (stable == NFS_FILE_SYNC ? IOCB_SYNC : IOCB_DSYNC); > > > > IOCB_SYNC without IOCB_DSYNC is always wrong, see my reply to the > > previous thread. > > > > I see only one API consumer that sets both: NFS LOCALIO. There are not > enough examples for me to draw any kind of conclusion, and I haven't > found documentation for these flags, fwiw. The primary way to set IOCB_DSYNC and IOCB_SYNC is iocb_flags() in include/linux/fs.h, which sets IOCB_DSYNC for O_DSYNC, and IOCB_SYNC for __O_SYNC. O_SYNC is defined as O_DSYNC | __O_SYNC for the historic reasons explained in the other thread. The main consumer of IOCB_DSYNC and IOCB_SYNC is generic_write_sync in include/linux/fs.h, which calls into vfs_fsync_range if IOCB_DSYNC is set (or IS_SYNC() is true on the inode, but that's irrelevant here), and only uses IOCB_SYNC to clear the datasync flag to vfs_fsync_range if it is set, i.e. no syncing is done without IOCB_DSYNC. As for other uses of these flags: fuse_write_flags should be assigning __O_SYNC instead of O_SYNC for clarify, but the code is not incorrect. __iomap_dio_rw does the right thing in checking IOCB_SYNC for controlling the REQ_FUA use. netfs_perform_write should just be checking IOCB_DYSNC, but that whole area looks bogus. I'll follow up with Dave. nfs_do_local_write does the right thing assuming nfs stable writes want O_SYNC. I don't understand what ovl_write_iter tries to do from just reading the code. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-10-28 22:28 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-22 16:22 [RFC PATCH] NFSD: Make FILE_SYNC WRITEs comply with spec Chuck Lever 2025-10-22 17:22 ` Tom Talpey 2025-10-22 17:46 ` Mike Snitzer 2025-10-22 18:25 ` Tom Talpey 2025-10-22 18:39 ` Mike Snitzer 2025-10-23 6:39 ` Christoph Hellwig 2025-10-22 17:27 ` Mike Snitzer 2025-10-28 22:28 ` Dave Chinner 2025-10-23 6:38 ` Christoph Hellwig 2025-10-23 12:46 ` Chuck Lever 2025-10-23 13:34 ` Christoph Hellwig
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.