From: Chuck Lever <cel@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH v1] NFSD: Enable return of an updated stable_how to NFS clients
Date: Wed, 15 Oct 2025 14:00:12 -0400 [thread overview]
Message-ID: <c95b46b6-5db4-4588-ac79-4f6d38df0ae2@kernel.org> (raw)
In-Reply-To: <aO_RmCNR8rg9EtlP@kernel.org>
On 10/15/25 12:53 PM, Mike Snitzer wrote:
> On Mon, Oct 13, 2025 at 03:01:13PM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> In a subsequent patch, nfsd_vfs_write() will promote an UNSTABLE
>> WRITE to be a FILE_SYNC WRITE. This indicates that the client does
>> not need a subsequent COMMIT operation, saving a round trip and
>> allowing the client to dispense with cached dirty data as soon as
>> it receives the server's WRITE response.
>>
>> This patch refactors nfsd_vfs_write() to return a possibly modified
>> stable_how value to its callers. No behavior change is expected.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/nfs3proc.c | 2 +-
>> fs/nfsd/nfs4proc.c | 2 +-
>> fs/nfsd/nfsproc.c | 3 ++-
>> fs/nfsd/vfs.c | 11 ++++++-----
>> fs/nfsd/vfs.h | 6 ++++--
>> fs/nfsd/xdr3.h | 2 +-
>> 6 files changed, 15 insertions(+), 11 deletions(-)
>>
>> Here's what I had in mind, based on a patch I did a year ago but
>> never posted.
>>
>> If nfsd_vfs_write() promotes an NFS_UNSTABLE write to NFS_FILE_SYNC,
>> it can now set *stable_how and the WRITE encoders will return the
>> updated value to the client.
>>
>>
>> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
>> index b6d03e1ef5f7..ad14b34583bb 100644
>> --- a/fs/nfsd/nfs3proc.c
>> +++ b/fs/nfsd/nfs3proc.c
>> @@ -236,7 +236,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp)
>> resp->committed = argp->stable;
>> resp->status = nfsd_write(rqstp, &resp->fh, argp->offset,
>> &argp->payload, &cnt,
>> - resp->committed, resp->verf);
>> + &resp->committed, resp->verf);
>> resp->count = cnt;
>> resp->status = nfsd3_map_status(resp->status);
>> return rpc_success;
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 7f7e6bb23a90..2222bb283baf 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -1285,7 +1285,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> write->wr_how_written = write->wr_stable_how;
>> status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf,
>> write->wr_offset, &write->wr_payload,
>> - &cnt, write->wr_how_written,
>> + &cnt, &write->wr_how_written,
>> (__be32 *)write->wr_verifier.data);
>> nfsd_file_put(nf);
>>
>> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
>> index 8f71f5748c75..706401ed6f8d 100644
>> --- a/fs/nfsd/nfsproc.c
>> +++ b/fs/nfsd/nfsproc.c
>> @@ -251,6 +251,7 @@ nfsd_proc_write(struct svc_rqst *rqstp)
>> struct nfsd_writeargs *argp = rqstp->rq_argp;
>> struct nfsd_attrstat *resp = rqstp->rq_resp;
>> unsigned long cnt = argp->len;
>> + u32 committed = NFS_DATA_SYNC;
>>
>> dprintk("nfsd: WRITE %s %u bytes at %d\n",
>> SVCFH_fmt(&argp->fh),
>> @@ -258,7 +259,7 @@ nfsd_proc_write(struct svc_rqst *rqstp)
>>
>> fh_copy(&resp->fh, &argp->fh);
>> resp->status = nfsd_write(rqstp, &resp->fh, argp->offset,
>> - &argp->payload, &cnt, NFS_DATA_SYNC, NULL);
>> + &argp->payload, &cnt, &committed, NULL);
>> if (resp->status == nfs_ok)
>> resp->status = fh_getattr(&resp->fh, &resp->stat);
>> else if (resp->status == nfserr_jukebox)
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index f537a7b4ee01..8b2dc7a88aab 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1262,7 +1262,7 @@ static int wait_for_concurrent_writes(struct file *file)
>> * @offset: Byte offset of start
>> * @payload: xdr_buf containing the write payload
>> * @cnt: IN: number of bytes to write, OUT: number of bytes actually written
>> - * @stable: An NFS stable_how value
>> + * @stable_how: IN: Client's requested stable_how, OUT: Actual stable_how
>> * @verf: NFS WRITE verifier
>> *
>> * Upon return, caller must invoke fh_put on @fhp.
>> @@ -1274,11 +1274,12 @@ __be32
>> nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> struct nfsd_file *nf, loff_t offset,
>> const struct xdr_buf *payload, unsigned long *cnt,
>> - int stable, __be32 *verf)
>> + u32 *stable_how, __be32 *verf)
>> {
>> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
>> struct file *file = nf->nf_file;
>> struct super_block *sb = file_inode(file)->i_sb;
>> + u32 stable = *stable_how;
>> struct kiocb kiocb;
>> struct svc_export *exp;
>> struct iov_iter iter;
>
> Seems we need this instead here?
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 5e5e5187d2e5..d0c89f8fdb57 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1479,7 +1479,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> struct file *file = nf->nf_file;
> struct super_block *sb = file_inode(file)->i_sb;
> - u32 stable = *stable_how;
> + u32 stable;
> struct kiocb kiocb;
> struct svc_export *exp;
> errseq_t since;
> @@ -1511,7 +1511,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> exp = fhp->fh_export;
>
> if (!EX_ISSYNC(exp))
> - stable = NFS_UNSTABLE;
> + *stable_how = NFS_UNSTABLE;
> + stable = *stable_how;
> init_sync_kiocb(&kiocb, file);
> kiocb.ki_pos = offset;
> if (stable && !fhp->fh_use_wgather)
>
> Otherwise, isn't there potential for nfsd_vfs_write's NFS_UNSTABLE
> override to cause a client set stable_how, that was set to something
> other than NFS_UNSTABLE, to silently _not_ be respected by NFSD? But
> client could assume COMMIT not needed? And does this then elevate this
> patch to be a stable@ fix? (no pun intended).
>
> If not, please help me understand why.
The protocol permits an NFS server to change the stable_how field in a
WRITE response as follows:
UNSTABLE -> DATA_SYNC
UNSTABLE -> FILE_SYNC
DATA_SYNC -> FILE_SYNC
It forbids the reverse direction. A client that asks for a FILE_SYNC
WRITE MUST get a FILE_SYNC reply.
What the EX_ISSYNC test is doing is looking for the "async" export
option. When that option is specified, internally NFSD converts all
WRITEs, including FILE_SYNC WRITEs, to UNSTABLE. It then complies with
the protocol by not telling the client about this invalid change and
defies the protocol by not ensuring FILE_SYNC WRITEs are persisted
before replying. See exports(5).
In these cases, each WRITE response still reflects what the client
requested, but it does not reflect what the server actually did.
We tell anyone who will listen (and even those who won't) never to use
the "async" export option because of the silent data corruption risk it
introduces. But it goes faster than using the fully protocol-compliant
"sync" export option, so people use it anyway.
> Thanks!
>
>> @@ -1434,7 +1435,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> * @offset: Byte offset of start
>> * @payload: xdr_buf containing the write payload
>> * @cnt: IN: number of bytes to write, OUT: number of bytes actually written
>> - * @stable: An NFS stable_how value
>> + * @stable_how: IN: Client's requested stable_how, OUT: Actual stable_how
>> * @verf: NFS WRITE verifier
>> *
>> * Upon return, caller must invoke fh_put on @fhp.
>> @@ -1444,7 +1445,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> */
>> __be32
>> nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
>> - const struct xdr_buf *payload, unsigned long *cnt, int stable,
>> + const struct xdr_buf *payload, unsigned long *cnt, u32 *stable_how,
>> __be32 *verf)
>> {
>> struct nfsd_file *nf;
>> @@ -1457,7 +1458,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
>> goto out;
>>
>> err = nfsd_vfs_write(rqstp, fhp, nf, offset, payload, cnt,
>> - stable, verf);
>> + stable_how, verf);
>> nfsd_file_put(nf);
>> out:
>> trace_nfsd_write_done(rqstp, fhp, offset, *cnt);
>> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
>> index fa46f8b5f132..c713ed0b04e0 100644
>> --- a/fs/nfsd/vfs.h
>> +++ b/fs/nfsd/vfs.h
>> @@ -130,11 +130,13 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> u32 *eof);
>> __be32 nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> loff_t offset, const struct xdr_buf *payload,
>> - unsigned long *cnt, int stable, __be32 *verf);
>> + unsigned long *cnt, u32 *stable_how,
>> + __be32 *verf);
>> __be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> struct nfsd_file *nf, loff_t offset,
>> const struct xdr_buf *payload,
>> - unsigned long *cnt, int stable, __be32 *verf);
>> + unsigned long *cnt, u32 *stable_how,
>> + __be32 *verf);
>> __be32 nfsd_readlink(struct svc_rqst *, struct svc_fh *,
>> char *, int *);
>> __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *,
>> diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
>> index 522067b7fd75..c0e443ef3a6b 100644
>> --- a/fs/nfsd/xdr3.h
>> +++ b/fs/nfsd/xdr3.h
>> @@ -152,7 +152,7 @@ struct nfsd3_writeres {
>> __be32 status;
>> struct svc_fh fh;
>> unsigned long count;
>> - int committed;
>> + u32 committed;
>> __be32 verf[2];
>> };
>>
>> --
>> 2.51.0
>>
--
Chuck Lever
next prev parent reply other threads:[~2025-10-15 18:00 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-13 19:01 [PATCH v1] NFSD: Enable return of an updated stable_how to NFS clients Chuck Lever
2025-10-13 19:34 ` Jeff Layton
2025-10-14 5:24 ` NeilBrown
2025-10-15 16:53 ` Mike Snitzer
2025-10-15 18:00 ` Chuck Lever [this message]
2025-10-15 18:11 ` Jeff Layton
2025-10-15 18:16 ` Chuck Lever
2025-10-17 6:22 ` NeilBrown
2025-10-17 11:09 ` Jeff Layton
2025-10-17 23:15 ` NeilBrown
2025-10-17 13:43 ` Chuck Lever
2025-10-17 23:19 ` NeilBrown
2025-10-15 18:40 ` Mike Snitzer
2025-10-15 22:13 ` [PATCH v3] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Mike Snitzer
2025-10-16 16:10 ` Jeff Layton
2025-10-17 14:13 ` Chuck Lever
2025-10-17 21:03 ` Mike Snitzer
2025-10-17 21:54 ` Chuck Lever
2025-10-17 22:49 ` Mike Snitzer
2025-10-17 23:23 ` NeilBrown
2025-10-17 4:08 ` [PATCH v1] NFSD: Enable return of an updated stable_how to NFS clients Christoph Hellwig
2025-10-17 13:27 ` Chuck Lever
2025-10-20 6:59 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c95b46b6-5db4-4588-ac79-4f6d38df0ae2@kernel.org \
--to=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dai.ngo@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=snitzer@kernel.org \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).