From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <cel@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>,
jonathan.flynn@hammerspace.com
Subject: Re: [PATCH v5 0/4] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
Date: Wed, 22 Oct 2025 13:00:46 -0400 [thread overview]
Message-ID: <aPkNvmXsgdNJtK_7@kernel.org> (raw)
In-Reply-To: <4a2ab6a7-9af5-4d86-9b54-34a4f4a9682d@kernel.org>
On Tue, Oct 21, 2025 at 09:35:32AM -0400, Chuck Lever wrote:
> On 10/21/25 7:12 AM, Jeff Layton wrote:
> > On Mon, 2025-10-20 at 12:44 -0400, Chuck Lever wrote:
> >> On 10/20/25 12:33 PM, Mike Snitzer wrote:
> >>> On Mon, Oct 20, 2025 at 12:25:42PM -0400, Chuck Lever wrote:
> >>> Just a bit concerned about removing IOCB_SYNC in that
> >>> we're altering stable_how to be NFS_FILE_SYNC.
> >> Commit 3f3503adb332 ("NFSD: Use vfs_iocb_iter_write()") introduces
> >> the first use of IOCB_ flags in NFSD's write path, and it uses
> >> IOCB_DSYNC. The patch has Reviewed-by's from Christoph, Neil, and
> >> Jeff.
> >>
> >> Should we be concerned that IOCB_DSYNC does not persist time stamp
> >> changes that might be lost during an unplanned server boot?
> >>
> >> As a reminder to the thread, Section 3.3.7 of RFC 1813 says:
> >>
> >> If stable is FILE_SYNC, the server must commit the data
> >> written plus all file system metadata to stable storage
> >> before returning results.
> >>
> >> The text is a bit blurry about whether "file system metadata" means
> >> all of the outstanding metadata changes for every file, or just the
> >> metadata changes for the target file handle.
> >>
> >> NFSD has historically treated DATA_SYNC and FILE_SYNC identically,
> >> as the Linux NFS client does not use DATA_SYNC (IIRC).
> >>
> >
> > Surely it just meant for the one file.
>
> Well yes that is the traditional understanding. I'm merely pointing out
> that the actual text is not quite as specific as what we've come to
> understand.
>
>
> > FILE_SYNC is only applicable to
> > WRITE/COMMIT operations and those only deal with a single file at a
> > time.
>
> True but you may recall that NFSD's COMMIT used to ignore the range
> arguments and flush the whole file. Some file systems used to flush
> all dirty data in this case, IIRC.
>
> There's always been a bit of a mismatch between the spec and what NFSD
> has implemented.
>
>
> > If the client gets back FILE_SYNC on a write, it should _not_
> > assume that all outstanding dirty data to all files has been sync'ed.
>
> Agreed.
>
> But back to Mike's point.
>
> - The spec says NFS_DATA_SYNC means persist file data.
>
> - The spec says NFS_FILE_SYNC means persist file data and file
> attributes.
>
> - After consulting with the section describing COMMIT, I think that
> COMMIT is supposed to persist both file data and attributes.
>
> And my reading of the code in fs/nfsd/vfs.c is that NFSD does the
> equivalent of NFS_DATA_SYNC in all of these cases, and has done for
> as long as I cared to chase the commit log.
>
> Moveover, commit 3f3503adb332 did not introduce this behavior.
>
> Previous to that commit, nfsd_vfs_write() passed RWF_SYNC to
> vfs_iter_write(). This API uses kiocb_set_rw_flags() to convert the RWF
> flag into an IOCB flag. kiocb_set_rw_flags does this:
>
> kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);
> if (flags & RWF_SYNC)
> kiocb_flags |= IOCB_DSYNC;
>
> And that's where I copied IOCB_DSYNC from. The use of RWF_SYNC was
> introduced in 2016 by commit 24368aad47dc ("nfsd: use RWF_SYNC").
>
> So we've tacitly agreed to let NFSD fall short of the specs in this
> regard for some time. However I don't believe this is documented
> anywhere.
>
> Based on this reasoning, IOCB_DSYNC is historically correct for the
> DIRECT WRITE path and its fallbacks. I'm guessing that an O_DIRECT WRITE
> is going to persist the written data but won't persist file attribute
> changes either.
>
> I'm open to making NFSD adhere more strictly to the spec language, but
> I bet there will be a performance impact. Maybe that impact will be
> unnoticeable on modern storage devices.
I was able to get NFSD Direct's use of IOCB_DSYNC|IOCB_SYNC tested
with Jonathan Flynn last week, there was no performance difference on
his modern test cluster (8 "enterprise" NVMe devices in the server).
Was very comforting to learn IOCB_SYNC didn't cause any performance
loss.
Mike
prev parent reply other threads:[~2025-10-22 17:00 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-20 16:25 [PATCH v5 0/4] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-10-20 16:25 ` [PATCH v5 1/4] NFSD: Enable return of an updated stable_how to NFS clients Chuck Lever
2025-10-20 16:25 ` [PATCH v5 2/4] NFSD: Refactor nfsd_vfs_write() Chuck Lever
2025-10-20 16:32 ` Mike Snitzer
2025-10-21 10:51 ` Jeff Layton
2025-10-20 16:25 ` [PATCH v5 3/4] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-10-25 0:10 ` kernel test robot
2025-10-20 16:25 ` [PATCH v5 4/4] svcrdma: Mark Read chunks Chuck Lever
2025-10-21 11:07 ` Jeff Layton
2025-10-20 16:33 ` [PATCH v5 0/4] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Mike Snitzer
2025-10-20 16:44 ` Chuck Lever
2025-10-20 18:20 ` Chuck Lever
2025-10-21 11:12 ` Jeff Layton
2025-10-21 13:35 ` Chuck Lever
2025-10-21 13:45 ` Jeff Layton
2025-10-21 13:53 ` Chuck Lever
2025-10-22 17:00 ` Mike Snitzer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPkNvmXsgdNJtK_7@kernel.org \
--to=snitzer@kernel.org \
--cc=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=jlayton@kernel.org \
--cc=jonathan.flynn@hammerspace.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.