From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <cel@kernel.org>, Mike Snitzer <snitzer@kernel.org>
Cc: NeilBrown <neil@brown.name>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH v1] NFSD: Enable return of an updated stable_how to NFS clients
Date: Wed, 15 Oct 2025 14:11:56 -0400 [thread overview]
Message-ID: <138caf9b98325952919d37119c1d2938a8f4f950.camel@kernel.org> (raw)
In-Reply-To: <c95b46b6-5db4-4588-ac79-4f6d38df0ae2@kernel.org>
On Wed, 2025-10-15 at 14:00 -0400, Chuck Lever wrote:
> On 10/15/25 12:53 PM, Mike Snitzer wrote:
> > On Mon, Oct 13, 2025 at 03:01:13PM -0400, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > >
> > > In a subsequent patch, nfsd_vfs_write() will promote an UNSTABLE
> > > WRITE to be a FILE_SYNC WRITE. This indicates that the client does
> > > not need a subsequent COMMIT operation, saving a round trip and
> > > allowing the client to dispense with cached dirty data as soon as
> > > it receives the server's WRITE response.
> > >
> > > This patch refactors nfsd_vfs_write() to return a possibly modified
> > > stable_how value to its callers. No behavior change is expected.
> > >
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > > fs/nfsd/nfs3proc.c | 2 +-
> > > fs/nfsd/nfs4proc.c | 2 +-
> > > fs/nfsd/nfsproc.c | 3 ++-
> > > fs/nfsd/vfs.c | 11 ++++++-----
> > > fs/nfsd/vfs.h | 6 ++++--
> > > fs/nfsd/xdr3.h | 2 +-
> > > 6 files changed, 15 insertions(+), 11 deletions(-)
> > >
> > > Here's what I had in mind, based on a patch I did a year ago but
> > > never posted.
> > >
> > > If nfsd_vfs_write() promotes an NFS_UNSTABLE write to NFS_FILE_SYNC,
> > > it can now set *stable_how and the WRITE encoders will return the
> > > updated value to the client.
> > >
> > >
> > > diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> > > index b6d03e1ef5f7..ad14b34583bb 100644
> > > --- a/fs/nfsd/nfs3proc.c
> > > +++ b/fs/nfsd/nfs3proc.c
> > > @@ -236,7 +236,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp)
> > > resp->committed = argp->stable;
> > > resp->status = nfsd_write(rqstp, &resp->fh, argp->offset,
> > > &argp->payload, &cnt,
> > > - resp->committed, resp->verf);
> > > + &resp->committed, resp->verf);
> > > resp->count = cnt;
> > > resp->status = nfsd3_map_status(resp->status);
> > > return rpc_success;
> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index 7f7e6bb23a90..2222bb283baf 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -1285,7 +1285,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > write->wr_how_written = write->wr_stable_how;
> > > status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf,
> > > write->wr_offset, &write->wr_payload,
> > > - &cnt, write->wr_how_written,
> > > + &cnt, &write->wr_how_written,
> > > (__be32 *)write->wr_verifier.data);
> > > nfsd_file_put(nf);
> > >
> > > diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> > > index 8f71f5748c75..706401ed6f8d 100644
> > > --- a/fs/nfsd/nfsproc.c
> > > +++ b/fs/nfsd/nfsproc.c
> > > @@ -251,6 +251,7 @@ nfsd_proc_write(struct svc_rqst *rqstp)
> > > struct nfsd_writeargs *argp = rqstp->rq_argp;
> > > struct nfsd_attrstat *resp = rqstp->rq_resp;
> > > unsigned long cnt = argp->len;
> > > + u32 committed = NFS_DATA_SYNC;
> > >
> > > dprintk("nfsd: WRITE %s %u bytes at %d\n",
> > > SVCFH_fmt(&argp->fh),
> > > @@ -258,7 +259,7 @@ nfsd_proc_write(struct svc_rqst *rqstp)
> > >
> > > fh_copy(&resp->fh, &argp->fh);
> > > resp->status = nfsd_write(rqstp, &resp->fh, argp->offset,
> > > - &argp->payload, &cnt, NFS_DATA_SYNC, NULL);
> > > + &argp->payload, &cnt, &committed, NULL);
> > > if (resp->status == nfs_ok)
> > > resp->status = fh_getattr(&resp->fh, &resp->stat);
> > > else if (resp->status == nfserr_jukebox)
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index f537a7b4ee01..8b2dc7a88aab 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -1262,7 +1262,7 @@ static int wait_for_concurrent_writes(struct file *file)
> > > * @offset: Byte offset of start
> > > * @payload: xdr_buf containing the write payload
> > > * @cnt: IN: number of bytes to write, OUT: number of bytes actually written
> > > - * @stable: An NFS stable_how value
> > > + * @stable_how: IN: Client's requested stable_how, OUT: Actual stable_how
> > > * @verf: NFS WRITE verifier
> > > *
> > > * Upon return, caller must invoke fh_put on @fhp.
> > > @@ -1274,11 +1274,12 @@ __be32
> > > nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > struct nfsd_file *nf, loff_t offset,
> > > const struct xdr_buf *payload, unsigned long *cnt,
> > > - int stable, __be32 *verf)
> > > + u32 *stable_how, __be32 *verf)
> > > {
> > > struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > > struct file *file = nf->nf_file;
> > > struct super_block *sb = file_inode(file)->i_sb;
> > > + u32 stable = *stable_how;
> > > struct kiocb kiocb;
> > > struct svc_export *exp;
> > > struct iov_iter iter;
> >
> > Seems we need this instead here?
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 5e5e5187d2e5..d0c89f8fdb57 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1479,7 +1479,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > struct file *file = nf->nf_file;
> > struct super_block *sb = file_inode(file)->i_sb;
> > - u32 stable = *stable_how;
> > + u32 stable;
> > struct kiocb kiocb;
> > struct svc_export *exp;
> > errseq_t since;
> > @@ -1511,7 +1511,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > exp = fhp->fh_export;
> >
> > if (!EX_ISSYNC(exp))
> > - stable = NFS_UNSTABLE;
> > + *stable_how = NFS_UNSTABLE;
> > + stable = *stable_how;
> > init_sync_kiocb(&kiocb, file);
> > kiocb.ki_pos = offset;
> > if (stable && !fhp->fh_use_wgather)
> >
> > Otherwise, isn't there potential for nfsd_vfs_write's NFS_UNSTABLE
> > override to cause a client set stable_how, that was set to something
> > other than NFS_UNSTABLE, to silently _not_ be respected by NFSD? But
> > client could assume COMMIT not needed? And does this then elevate this
> > patch to be a stable@ fix? (no pun intended).
> >
> > If not, please help me understand why.
>
> The protocol permits an NFS server to change the stable_how field in a
> WRITE response as follows:
>
> UNSTABLE -> DATA_SYNC
> UNSTABLE -> FILE_SYNC
> DATA_SYNC -> FILE_SYNC
>
> It forbids the reverse direction. A client that asks for a FILE_SYNC
> WRITE MUST get a FILE_SYNC reply.
>
> What the EX_ISSYNC test is doing is looking for the "async" export
> option. When that option is specified, internally NFSD converts all
> WRITEs, including FILE_SYNC WRITEs, to UNSTABLE. It then complies with
> the protocol by not telling the client about this invalid change and
> defies the protocol by not ensuring FILE_SYNC WRITEs are persisted
> before replying. See exports(5).
>
> In these cases, each WRITE response still reflects what the client
> requested, but it does not reflect what the server actually did.
>
> We tell anyone who will listen (and even those who won't) never to use
> the "async" export option because of the silent data corruption risk it
> introduces. But it goes faster than using the fully protocol-compliant
> "sync" export option, so people use it anyway.
>
>
Somewhat related question:
Since we're on track to deprecate NFSv2 support soon, should we be
looking at deprecating the "async" export option too? v2 was the main
reason for it in the first place, after all.
> > Thanks!
> >
> > > @@ -1434,7 +1435,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > * @offset: Byte offset of start
> > > * @payload: xdr_buf containing the write payload
> > > * @cnt: IN: number of bytes to write, OUT: number of bytes actually written
> > > - * @stable: An NFS stable_how value
> > > + * @stable_how: IN: Client's requested stable_how, OUT: Actual stable_how
> > > * @verf: NFS WRITE verifier
> > > *
> > > * Upon return, caller must invoke fh_put on @fhp.
> > > @@ -1444,7 +1445,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > */
> > > __be32
> > > nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
> > > - const struct xdr_buf *payload, unsigned long *cnt, int stable,
> > > + const struct xdr_buf *payload, unsigned long *cnt, u32 *stable_how,
> > > __be32 *verf)
> > > {
> > > struct nfsd_file *nf;
> > > @@ -1457,7 +1458,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
> > > goto out;
> > >
> > > err = nfsd_vfs_write(rqstp, fhp, nf, offset, payload, cnt,
> > > - stable, verf);
> > > + stable_how, verf);
> > > nfsd_file_put(nf);
> > > out:
> > > trace_nfsd_write_done(rqstp, fhp, offset, *cnt);
> > > diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> > > index fa46f8b5f132..c713ed0b04e0 100644
> > > --- a/fs/nfsd/vfs.h
> > > +++ b/fs/nfsd/vfs.h
> > > @@ -130,11 +130,13 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > u32 *eof);
> > > __be32 nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > loff_t offset, const struct xdr_buf *payload,
> > > - unsigned long *cnt, int stable, __be32 *verf);
> > > + unsigned long *cnt, u32 *stable_how,
> > > + __be32 *verf);
> > > __be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > struct nfsd_file *nf, loff_t offset,
> > > const struct xdr_buf *payload,
> > > - unsigned long *cnt, int stable, __be32 *verf);
> > > + unsigned long *cnt, u32 *stable_how,
> > > + __be32 *verf);
> > > __be32 nfsd_readlink(struct svc_rqst *, struct svc_fh *,
> > > char *, int *);
> > > __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *,
> > > diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
> > > index 522067b7fd75..c0e443ef3a6b 100644
> > > --- a/fs/nfsd/xdr3.h
> > > +++ b/fs/nfsd/xdr3.h
> > > @@ -152,7 +152,7 @@ struct nfsd3_writeres {
> > > __be32 status;
> > > struct svc_fh fh;
> > > unsigned long count;
> > > - int committed;
> > > + u32 committed;
> > > __be32 verf[2];
> > > };
> > >
> > > --
> > > 2.51.0
> > >
>
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2025-10-15 18:11 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-13 19:01 [PATCH v1] NFSD: Enable return of an updated stable_how to NFS clients Chuck Lever
2025-10-13 19:34 ` Jeff Layton
2025-10-14 5:24 ` NeilBrown
2025-10-15 16:53 ` Mike Snitzer
2025-10-15 18:00 ` Chuck Lever
2025-10-15 18:11 ` Jeff Layton [this message]
2025-10-15 18:16 ` Chuck Lever
2025-10-17 6:22 ` NeilBrown
2025-10-17 11:09 ` Jeff Layton
2025-10-17 23:15 ` NeilBrown
2025-10-17 13:43 ` Chuck Lever
2025-10-17 23:19 ` NeilBrown
2025-10-15 18:40 ` Mike Snitzer
2025-10-15 22:13 ` [PATCH v3] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Mike Snitzer
2025-10-16 16:10 ` Jeff Layton
2025-10-17 14:13 ` Chuck Lever
2025-10-17 21:03 ` Mike Snitzer
2025-10-17 21:54 ` Chuck Lever
2025-10-17 22:49 ` Mike Snitzer
2025-10-17 23:23 ` NeilBrown
2025-10-17 4:08 ` [PATCH v1] NFSD: Enable return of an updated stable_how to NFS clients Christoph Hellwig
2025-10-17 13:27 ` Chuck Lever
2025-10-20 6:59 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=138caf9b98325952919d37119c1d2938a8f4f950.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=snitzer@kernel.org \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).