public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only
@ 2025-03-06 19:31 Dai Ngo
  2025-03-06 19:31 ` [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE Dai Ngo
  2025-03-07 14:13 ` [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only cel
  0 siblings, 2 replies; 11+ messages in thread
From: Dai Ngo @ 2025-03-06 19:31 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom; +Cc: linux-nfs, sagi

From RFC8881 does not explicitly state that server must grant write
delegation to OPEN with OPEN4_SHARE_ACCESS_WRITE only. However there
are text in the RFC that implies it is up to the server implementation
to offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE only.

Section 9.1.2:

  "In the case of READ, the server may perform the corresponding
   check on the access mode, or it may choose to allow READ for
   OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
   implementation may unavoidably do reads (e.g., due to buffer
   cache constraints)."

Also in section 10.4.1

  "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
   OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
   is in effect"

When a write delegation is granted for OPEN with OPEN4_SHARE_ACCESS_WRITE,
a new pair of nfsd_file and struct file are allocated with read access
and added to nfs4_file's fi_fds[]. This is to allow the client to use
the delegation stateid to do reads.

No additional actions is needed when the delegation is returned. The
nfsd_file for read remains attached to the nfs4_file and is freed when
the open stateid is closed.

The use of separate nfsd_file for read was suggested by Chuck.

I also tried the suggestion by Jeff which is to "acquire a R/W open from
the get-go instead when you intend to grant a delegation". To implement
this approach, in nfs4_get_vfs_file OPEN4_SHARE_ACCESS_READ was added to
op_share_access before computing the 'oflag' and 'access' flag. This
forces the rest of the code to process the OPEN as if it was sent with
access mode OPEN4_SHARE_ACCESS_WRITE|OPEN4_SHARE_ACCESS_READ.

This mostly works except a case where the file was created with 0222
permission and the user credential of OPEN is not the same as the owner
of the file. In this case the OPEN fails with NFS4ERR_ACCESS. This is
because nfsd_permission was called with (MAY_READ | MAY_WRITE) mask
and the file permission is 0222.

I don't see any simple fix for this nfsd_permission issue. Basically
the access mode has to be rdwr when creating the nfsd_file but the access
mode passed to nfsd_permission to check should be rdonly (the original
OPEN's op_share_access).

v4: reacted to Jeff's comment
v5: merged both patches into one to avoid potential bisect problem.
    fixed some nits from Jeff's comments
v6: Fix deny_mode issue by not setting the read access in st_access_bmap
    when adding read access to write delegation.

 fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 28 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-03-06 19:31 [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only Dai Ngo
@ 2025-03-06 19:31 ` Dai Ngo
  2025-05-09 19:32   ` Jeff Layton
  2025-03-07 14:13 ` [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only cel
  1 sibling, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-03-06 19:31 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom; +Cc: linux-nfs, sagi

RFC8881, section 9.1.2 says:

  "In the case of READ, the server may perform the corresponding
   check on the access mode, or it may choose to allow READ for
   OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
   implementation may unavoidably do reads (e.g., due to buffer cache
   constraints)."

and in section 10.4.1:
   "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
   OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
   is in effect"

This patch allow READ using write delegation stateid granted on OPENs
with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
implementation may unavoidably do (e.g., due to buffer cache
constraints).

For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
a new nfsd_file and a struct file are allocated to use for reads.
The nfsd_file is freed when the file is closed by release_all_access.

Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 28 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 0f97f2c62b3a..295415fda985 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
 	return ret;
 }
 
-static struct nfsd_file *
-find_rw_file(struct nfs4_file *f)
-{
-	struct nfsd_file *ret;
-
-	spin_lock(&f->fi_lock);
-	ret = nfsd_file_get(f->fi_fds[O_RDWR]);
-	spin_unlock(&f->fi_lock);
-
-	return ret;
-}
-
 struct nfsd_file *
 find_any_file(struct nfs4_file *f)
 {
@@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
 	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
 	 *   on its own, all opens."
 	 *
-	 * Furthermore the client can use a write delegation for most READ
-	 * operations as well, so we require a O_RDWR file here.
+	 * Furthermore, section 9.1.2 says:
+	 *
+	 *  "In the case of READ, the server may perform the corresponding
+	 *  check on the access mode, or it may choose to allow READ for
+	 *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
+	 *  implementation may unavoidably do reads (e.g., due to buffer
+	 *  cache constraints)."
 	 *
-	 * Offer a write delegation in the case of a BOTH open, and ensure
-	 * we get the O_RDWR descriptor.
+	 *  We choose to offer a write delegation for OPEN with the
+	 *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such clients.
 	 */
-	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
-		nf = find_rw_file(fp);
+	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
+		nf = find_writeable_file(fp);
 		dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : OPEN_DELEGATE_WRITE;
 	}
 
@@ -6116,7 +6109,7 @@ static bool
 nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
 		     struct kstat *stat)
 {
-	struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
+	struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
 	struct path path;
 	int rc;
 
@@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
 	return rc == 0;
 }
 
+/*
+ * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
+ * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
+ * struct file to be used for read with delegation stateid.
+ *
+ */
+static bool
+nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
+			      struct svc_fh *fh, struct nfs4_ol_stateid *stp)
+{
+	struct nfs4_file *fp;
+	struct nfsd_file *nf = NULL;
+
+	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
+			NFS4_SHARE_ACCESS_WRITE) {
+		if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, NULL, &nf))
+			return (false);
+		fp = stp->st_stid.sc_file;
+		spin_lock(&fp->fi_lock);
+		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
+		fp = stp->st_stid.sc_file;
+		fp->fi_fds[O_RDONLY] = nf;
+		spin_unlock(&fp->fi_lock);
+	}
+	return true;
+}
+
 /*
  * The Linux NFS server does not offer write delegations to NFSv4.0
  * clients in order to avoid conflicts between write delegations and
@@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
  * open or lock state.
  */
 static void
-nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
-		     struct svc_fh *currentfh)
+nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
+		     struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
+		     struct svc_fh *fh)
 {
 	bool deleg_ts = open->op_deleg_want & OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
 	struct nfs4_openowner *oo = openowner(stp->st_stateowner);
@@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
 	memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
 
 	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
-		if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
+		if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
+				!nfs4_delegation_stat(dp, currentfh, &stat)) {
 			nfs4_put_stid(&dp->dl_stid);
 			destroy_delegation(dp);
 			goto out_no_deleg;
@@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	* Attempt to hand out a delegation. No error return, because the
 	* OPEN succeeds even if we fail.
 	*/
-	nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
+	nfs4_open_delegation(rqstp, open, stp,
+		&resp->cstate.current_fh, current_fh);
 
 	/*
 	 * If there is an existing open stateid, it must be updated and
@@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		return_revoked = true;
 	if (typemask & SC_TYPE_DELEG)
 		/* Always allow REVOKED for DELEG so we can
-		 * retturn the appropriate error.
+		 * return the appropriate error.
 		 */
 		statusmask |= SC_STATUS_REVOKED;
 
@@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
 
 	switch (s->sc_type) {
 	case SC_TYPE_DELEG:
-		spin_lock(&s->sc_file->fi_lock);
-		ret = nfsd_file_get(s->sc_file->fi_deleg_file);
-		spin_unlock(&s->sc_file->fi_lock);
-		break;
 	case SC_TYPE_OPEN:
 	case SC_TYPE_LOCK:
 		if (flags & RD_STATE)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only
  2025-03-06 19:31 [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only Dai Ngo
  2025-03-06 19:31 ` [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE Dai Ngo
@ 2025-03-07 14:13 ` cel
  1 sibling, 0 replies; 11+ messages in thread
From: cel @ 2025-03-07 14:13 UTC (permalink / raw)
  To: jlayton, neilb, okorniev, tom, Dai Ngo; +Cc: Chuck Lever, linux-nfs, sagi

From: Chuck Lever <chuck.lever@oracle.com>

On Thu, 06 Mar 2025 11:31:32 -0800, Dai Ngo wrote:
> >From RFC8881 does not explicitly state that server must grant write
> delegation to OPEN with OPEN4_SHARE_ACCESS_WRITE only. However there
> are text in the RFC that implies it is up to the server implementation
> to offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE only.
> 
> Section 9.1.2:
> 
> [...]

Applied to nfsd-testing, thanks!

V5 reverted, v6 applied.

[1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
      commit: ae73ee9fefa9e1d6c0e274c74d2ccf9d5e6ed2e5

--
Chuck Lever


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-03-06 19:31 ` [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE Dai Ngo
@ 2025-05-09 19:32   ` Jeff Layton
  2025-05-09 21:48     ` Chuck Lever
  2025-05-10 19:23     ` Dai Ngo
  0 siblings, 2 replies; 11+ messages in thread
From: Jeff Layton @ 2025-05-09 19:32 UTC (permalink / raw)
  To: Dai Ngo, chuck.lever, neilb, okorniev, tom; +Cc: linux-nfs, sagi

On Thu, 2025-03-06 at 11:31 -0800, Dai Ngo wrote:
> RFC8881, section 9.1.2 says:
> 
>   "In the case of READ, the server may perform the corresponding
>    check on the access mode, or it may choose to allow READ for
>    OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>    implementation may unavoidably do reads (e.g., due to buffer cache
>    constraints)."
> 
> and in section 10.4.1:
>    "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
>    OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
>    is in effect"
> 
> This patch allow READ using write delegation stateid granted on OPENs
> with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
> implementation may unavoidably do (e.g., due to buffer cache
> constraints).
> 
> For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
> a new nfsd_file and a struct file are allocated to use for reads.
> The nfsd_file is freed when the file is closed by release_all_access.
> 
> Suggested-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++-----------------
>  1 file changed, 47 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 0f97f2c62b3a..295415fda985 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
>  	return ret;
>  }
>  
> -static struct nfsd_file *
> -find_rw_file(struct nfs4_file *f)
> -{
> -	struct nfsd_file *ret;
> -
> -	spin_lock(&f->fi_lock);
> -	ret = nfsd_file_get(f->fi_fds[O_RDWR]);
> -	spin_unlock(&f->fi_lock);
> -
> -	return ret;
> -}
> -
>  struct nfsd_file *
>  find_any_file(struct nfs4_file *f)
>  {
> @@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>  	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
>  	 *   on its own, all opens."
>  	 *
> -	 * Furthermore the client can use a write delegation for most READ
> -	 * operations as well, so we require a O_RDWR file here.
> +	 * Furthermore, section 9.1.2 says:
> +	 *
> +	 *  "In the case of READ, the server may perform the corresponding
> +	 *  check on the access mode, or it may choose to allow READ for
> +	 *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
> +	 *  implementation may unavoidably do reads (e.g., due to buffer
> +	 *  cache constraints)."
>  	 *
> -	 * Offer a write delegation in the case of a BOTH open, and ensure
> -	 * we get the O_RDWR descriptor.
> +	 *  We choose to offer a write delegation for OPEN with the
> +	 *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such clients.
>  	 */
> -	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
> -		nf = find_rw_file(fp);
> +	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
> +		nf = find_writeable_file(fp);
>  		dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : OPEN_DELEGATE_WRITE;
>  	}
>  
> @@ -6116,7 +6109,7 @@ static bool
>  nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>  		     struct kstat *stat)
>  {
> -	struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
> +	struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
>  	struct path path;
>  	int rc;
>  
> @@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>  	return rc == 0;
>  }
>  
> +/*
> + * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
> + * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
> + * struct file to be used for read with delegation stateid.
> + *
> + */
> +static bool
> +nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
> +			      struct svc_fh *fh, struct nfs4_ol_stateid *stp)
> +{
> +	struct nfs4_file *fp;
> +	struct nfsd_file *nf = NULL;
> +
> +	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
> +			NFS4_SHARE_ACCESS_WRITE) {
> +		if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, NULL, &nf))
> +			return (false);
> +		fp = stp->st_stid.sc_file;
> +		spin_lock(&fp->fi_lock);
> +		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
> +		fp = stp->st_stid.sc_file;
> +		fp->fi_fds[O_RDONLY] = nf;
> +		spin_unlock(&fp->fi_lock);
> +	}
> +	return true;
> +}
> +
>  /*
>   * The Linux NFS server does not offer write delegations to NFSv4.0
>   * clients in order to avoid conflicts between write delegations and
> @@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>   * open or lock state.
>   */
>  static void
> -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> -		     struct svc_fh *currentfh)
> +nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
> +		     struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
> +		     struct svc_fh *fh)
>  {
>  	bool deleg_ts = open->op_deleg_want & OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
>  	struct nfs4_openowner *oo = openowner(stp->st_stateowner);
> @@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>  	memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
>  
>  	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
> -		if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
> +		if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
> +				!nfs4_delegation_stat(dp, currentfh, &stat)) {
>  			nfs4_put_stid(&dp->dl_stid);
>  			destroy_delegation(dp);
>  			goto out_no_deleg;
> @@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>  	* Attempt to hand out a delegation. No error return, because the
>  	* OPEN succeeds even if we fail.
>  	*/
> -	nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
> +	nfs4_open_delegation(rqstp, open, stp,
> +		&resp->cstate.current_fh, current_fh);
>  
>  	/*
>  	 * If there is an existing open stateid, it must be updated and
> @@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
>  		return_revoked = true;
>  	if (typemask & SC_TYPE_DELEG)
>  		/* Always allow REVOKED for DELEG so we can
> -		 * retturn the appropriate error.
> +		 * return the appropriate error.
>  		 */
>  		statusmask |= SC_STATUS_REVOKED;
>  
> @@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>  
>  	switch (s->sc_type) {
>  	case SC_TYPE_DELEG:
> -		spin_lock(&s->sc_file->fi_lock);
> -		ret = nfsd_file_get(s->sc_file->fi_deleg_file);
> -		spin_unlock(&s->sc_file->fi_lock);
> -		break;
>  	case SC_TYPE_OPEN:
>  	case SC_TYPE_LOCK:
>  		if (flags & RD_STATE)

I'm seeing a nfsd_file leak in chuck's nfsd-next tree and a bisect
landed on this patch. The reproducer is:

1/ set up an exported fs on a server (I used xfs, but it probably
doesn't matter).

2/ mount up the export on a client using v4.2

3/ Run this fio script in the directory:

----------------8<---------------------
[global]
name=fio-seq-write
filename=fio-seq-write
rw=write
bs=1M
direct=0
numjobs=1
time_based
runtime=60

[file1]
size=50G
ioengine=io_uring
iodepth=16
----------------8<---------------------

When it completes, shut down the nfs server. You'll see warnings like
this:

    BUG nfsd_file (Tainted: G    B   W          ): Objects remaining on __kmem_cache_shutdown()

Dai, can you take a look?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-05-09 19:32   ` Jeff Layton
@ 2025-05-09 21:48     ` Chuck Lever
  2025-05-10  0:10       ` Chuck Lever
  2025-05-10 19:23     ` Dai Ngo
  1 sibling, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2025-05-09 21:48 UTC (permalink / raw)
  To: Jeff Layton, Dai Ngo, neilb, okorniev, tom; +Cc: linux-nfs, sagi

On 5/9/25 3:32 PM, Jeff Layton wrote:
> On Thu, 2025-03-06 at 11:31 -0800, Dai Ngo wrote:
>> RFC8881, section 9.1.2 says:
>>
>>   "In the case of READ, the server may perform the corresponding
>>    check on the access mode, or it may choose to allow READ for
>>    OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>    implementation may unavoidably do reads (e.g., due to buffer cache
>>    constraints)."
>>
>> and in section 10.4.1:
>>    "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
>>    OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
>>    is in effect"
>>
>> This patch allow READ using write delegation stateid granted on OPENs
>> with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
>> implementation may unavoidably do (e.g., due to buffer cache
>> constraints).
>>
>> For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
>> a new nfsd_file and a struct file are allocated to use for reads.
>> The nfsd_file is freed when the file is closed by release_all_access.
>>
>> Suggested-by: Chuck Lever <chuck.lever@oracle.com>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>> ---
>>  fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++-----------------
>>  1 file changed, 47 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 0f97f2c62b3a..295415fda985 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
>>  	return ret;
>>  }
>>  
>> -static struct nfsd_file *
>> -find_rw_file(struct nfs4_file *f)
>> -{
>> -	struct nfsd_file *ret;
>> -
>> -	spin_lock(&f->fi_lock);
>> -	ret = nfsd_file_get(f->fi_fds[O_RDWR]);
>> -	spin_unlock(&f->fi_lock);
>> -
>> -	return ret;
>> -}
>> -
>>  struct nfsd_file *
>>  find_any_file(struct nfs4_file *f)
>>  {
>> @@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>  	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
>>  	 *   on its own, all opens."
>>  	 *
>> -	 * Furthermore the client can use a write delegation for most READ
>> -	 * operations as well, so we require a O_RDWR file here.
>> +	 * Furthermore, section 9.1.2 says:
>> +	 *
>> +	 *  "In the case of READ, the server may perform the corresponding
>> +	 *  check on the access mode, or it may choose to allow READ for
>> +	 *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>> +	 *  implementation may unavoidably do reads (e.g., due to buffer
>> +	 *  cache constraints)."
>>  	 *
>> -	 * Offer a write delegation in the case of a BOTH open, and ensure
>> -	 * we get the O_RDWR descriptor.
>> +	 *  We choose to offer a write delegation for OPEN with the
>> +	 *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such clients.
>>  	 */
>> -	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
>> -		nf = find_rw_file(fp);
>> +	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>> +		nf = find_writeable_file(fp);
>>  		dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : OPEN_DELEGATE_WRITE;
>>  	}
>>  
>> @@ -6116,7 +6109,7 @@ static bool
>>  nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>  		     struct kstat *stat)
>>  {
>> -	struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
>> +	struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
>>  	struct path path;
>>  	int rc;
>>  
>> @@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>  	return rc == 0;
>>  }
>>  
>> +/*
>> + * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
>> + * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
>> + * struct file to be used for read with delegation stateid.
>> + *
>> + */
>> +static bool
>> +nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>> +			      struct svc_fh *fh, struct nfs4_ol_stateid *stp)
>> +{
>> +	struct nfs4_file *fp;
>> +	struct nfsd_file *nf = NULL;
>> +
>> +	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
>> +			NFS4_SHARE_ACCESS_WRITE) {
>> +		if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, NULL, &nf))
>> +			return (false);
>> +		fp = stp->st_stid.sc_file;
>> +		spin_lock(&fp->fi_lock);
>> +		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>> +		fp = stp->st_stid.sc_file;
>> +		fp->fi_fds[O_RDONLY] = nf;
>> +		spin_unlock(&fp->fi_lock);
>> +	}
>> +	return true;
>> +}
>> +
>>  /*
>>   * The Linux NFS server does not offer write delegations to NFSv4.0
>>   * clients in order to avoid conflicts between write delegations and
>> @@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>   * open or lock state.
>>   */
>>  static void
>> -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>> -		     struct svc_fh *currentfh)
>> +nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
>> +		     struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
>> +		     struct svc_fh *fh)
>>  {
>>  	bool deleg_ts = open->op_deleg_want & OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
>>  	struct nfs4_openowner *oo = openowner(stp->st_stateowner);
>> @@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>  	memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
>>  
>>  	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>> -		if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
>> +		if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
>> +				!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>  			nfs4_put_stid(&dp->dl_stid);
>>  			destroy_delegation(dp);
>>  			goto out_no_deleg;
>> @@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>>  	* Attempt to hand out a delegation. No error return, because the
>>  	* OPEN succeeds even if we fail.
>>  	*/
>> -	nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
>> +	nfs4_open_delegation(rqstp, open, stp,
>> +		&resp->cstate.current_fh, current_fh);
>>  
>>  	/*
>>  	 * If there is an existing open stateid, it must be updated and
>> @@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
>>  		return_revoked = true;
>>  	if (typemask & SC_TYPE_DELEG)
>>  		/* Always allow REVOKED for DELEG so we can
>> -		 * retturn the appropriate error.
>> +		 * return the appropriate error.
>>  		 */
>>  		statusmask |= SC_STATUS_REVOKED;
>>  
>> @@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>>  
>>  	switch (s->sc_type) {
>>  	case SC_TYPE_DELEG:
>> -		spin_lock(&s->sc_file->fi_lock);
>> -		ret = nfsd_file_get(s->sc_file->fi_deleg_file);
>> -		spin_unlock(&s->sc_file->fi_lock);
>> -		break;
>>  	case SC_TYPE_OPEN:
>>  	case SC_TYPE_LOCK:
>>  		if (flags & RD_STATE)
> 
> I'm seeing a nfsd_file leak in chuck's nfsd-next tree and a bisect
> landed on this patch. The reproducer is:
> 
> 1/ set up an exported fs on a server (I used xfs, but it probably
> doesn't matter).
> 
> 2/ mount up the export on a client using v4.2
> 
> 3/ Run this fio script in the directory:
> 
> ----------------8<---------------------
> [global]
> name=fio-seq-write
> filename=fio-seq-write
> rw=write
> bs=1M
> direct=0
> numjobs=1
> time_based
> runtime=60
> 
> [file1]
> size=50G
> ioengine=io_uring
> iodepth=16
> ----------------8<---------------------
> 
> When it completes, shut down the nfs server. You'll see warnings like
> this:
> 
>     BUG nfsd_file (Tainted: G    B   W          ): Objects remaining on __kmem_cache_shutdown()
> 
> Dai, can you take a look?

After any substantial NFSv4 workload on my nfsd-testing server followed
by a clean unmount from the client, /proc/fs/nfsd/filecache still has
junk in it:

total inodes:  281105
hash buckets:  524288
lru entries:   0
cache hits:    307264
acquisitions:  2092346
allocations:   1785084
releases:      1503979
evictions:     0
mean age (ms): 400

Looks like a leak to me.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-05-09 21:48     ` Chuck Lever
@ 2025-05-10  0:10       ` Chuck Lever
  2025-05-11 21:59         ` Sagi Grimberg
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2025-05-10  0:10 UTC (permalink / raw)
  To: Jeff Layton, Dai Ngo, neilb, okorniev, tom; +Cc: linux-nfs, sagi

On 5/9/25 5:48 PM, Chuck Lever wrote:
> On 5/9/25 3:32 PM, Jeff Layton wrote:
>> On Thu, 2025-03-06 at 11:31 -0800, Dai Ngo wrote:
>>> RFC8881, section 9.1.2 says:
>>>
>>>   "In the case of READ, the server may perform the corresponding
>>>    check on the access mode, or it may choose to allow READ for
>>>    OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>>    implementation may unavoidably do reads (e.g., due to buffer cache
>>>    constraints)."
>>>
>>> and in section 10.4.1:
>>>    "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
>>>    OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
>>>    is in effect"
>>>
>>> This patch allow READ using write delegation stateid granted on OPENs
>>> with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
>>> implementation may unavoidably do (e.g., due to buffer cache
>>> constraints).
>>>
>>> For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
>>> a new nfsd_file and a struct file are allocated to use for reads.
>>> The nfsd_file is freed when the file is closed by release_all_access.
>>>
>>> Suggested-by: Chuck Lever <chuck.lever@oracle.com>
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>  fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++-----------------
>>>  1 file changed, 47 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index 0f97f2c62b3a..295415fda985 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
>>>  	return ret;
>>>  }
>>>  
>>> -static struct nfsd_file *
>>> -find_rw_file(struct nfs4_file *f)
>>> -{
>>> -	struct nfsd_file *ret;
>>> -
>>> -	spin_lock(&f->fi_lock);
>>> -	ret = nfsd_file_get(f->fi_fds[O_RDWR]);
>>> -	spin_unlock(&f->fi_lock);
>>> -
>>> -	return ret;
>>> -}
>>> -
>>>  struct nfsd_file *
>>>  find_any_file(struct nfs4_file *f)
>>>  {
>>> @@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>>  	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
>>>  	 *   on its own, all opens."
>>>  	 *
>>> -	 * Furthermore the client can use a write delegation for most READ
>>> -	 * operations as well, so we require a O_RDWR file here.
>>> +	 * Furthermore, section 9.1.2 says:
>>> +	 *
>>> +	 *  "In the case of READ, the server may perform the corresponding
>>> +	 *  check on the access mode, or it may choose to allow READ for
>>> +	 *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>> +	 *  implementation may unavoidably do reads (e.g., due to buffer
>>> +	 *  cache constraints)."
>>>  	 *
>>> -	 * Offer a write delegation in the case of a BOTH open, and ensure
>>> -	 * we get the O_RDWR descriptor.
>>> +	 *  We choose to offer a write delegation for OPEN with the
>>> +	 *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such clients.
>>>  	 */
>>> -	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
>>> -		nf = find_rw_file(fp);
>>> +	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>> +		nf = find_writeable_file(fp);
>>>  		dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : OPEN_DELEGATE_WRITE;
>>>  	}
>>>  
>>> @@ -6116,7 +6109,7 @@ static bool
>>>  nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>>  		     struct kstat *stat)
>>>  {
>>> -	struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
>>> +	struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
>>>  	struct path path;
>>>  	int rc;
>>>  
>>> @@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>>  	return rc == 0;
>>>  }
>>>  
>>> +/*
>>> + * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
>>> + * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
>>> + * struct file to be used for read with delegation stateid.
>>> + *
>>> + */
>>> +static bool
>>> +nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>>> +			      struct svc_fh *fh, struct nfs4_ol_stateid *stp)
>>> +{
>>> +	struct nfs4_file *fp;
>>> +	struct nfsd_file *nf = NULL;
>>> +
>>> +	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
>>> +			NFS4_SHARE_ACCESS_WRITE) {
>>> +		if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, NULL, &nf))
>>> +			return (false);
>>> +		fp = stp->st_stid.sc_file;
>>> +		spin_lock(&fp->fi_lock);
>>> +		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>>> +		fp = stp->st_stid.sc_file;
>>> +		fp->fi_fds[O_RDONLY] = nf;
>>> +		spin_unlock(&fp->fi_lock);
>>> +	}
>>> +	return true;
>>> +}
>>> +
>>>  /*
>>>   * The Linux NFS server does not offer write delegations to NFSv4.0
>>>   * clients in order to avoid conflicts between write delegations and
>>> @@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>>   * open or lock state.
>>>   */
>>>  static void
>>> -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>> -		     struct svc_fh *currentfh)
>>> +nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
>>> +		     struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
>>> +		     struct svc_fh *fh)
>>>  {
>>>  	bool deleg_ts = open->op_deleg_want & OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
>>>  	struct nfs4_openowner *oo = openowner(stp->st_stateowner);
>>> @@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>>  	memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
>>>  
>>>  	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>> -		if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>> +		if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
>>> +				!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>>  			nfs4_put_stid(&dp->dl_stid);
>>>  			destroy_delegation(dp);
>>>  			goto out_no_deleg;
>>> @@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>>>  	* Attempt to hand out a delegation. No error return, because the
>>>  	* OPEN succeeds even if we fail.
>>>  	*/
>>> -	nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
>>> +	nfs4_open_delegation(rqstp, open, stp,
>>> +		&resp->cstate.current_fh, current_fh);
>>>  
>>>  	/*
>>>  	 * If there is an existing open stateid, it must be updated and
>>> @@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
>>>  		return_revoked = true;
>>>  	if (typemask & SC_TYPE_DELEG)
>>>  		/* Always allow REVOKED for DELEG so we can
>>> -		 * retturn the appropriate error.
>>> +		 * return the appropriate error.
>>>  		 */
>>>  		statusmask |= SC_STATUS_REVOKED;
>>>  
>>> @@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>>>  
>>>  	switch (s->sc_type) {
>>>  	case SC_TYPE_DELEG:
>>> -		spin_lock(&s->sc_file->fi_lock);
>>> -		ret = nfsd_file_get(s->sc_file->fi_deleg_file);
>>> -		spin_unlock(&s->sc_file->fi_lock);
>>> -		break;
>>>  	case SC_TYPE_OPEN:
>>>  	case SC_TYPE_LOCK:
>>>  		if (flags & RD_STATE)
>>
>> I'm seeing a nfsd_file leak in chuck's nfsd-next tree and a bisect
>> landed on this patch. The reproducer is:
>>
>> 1/ set up an exported fs on a server (I used xfs, but it probably
>> doesn't matter).
>>
>> 2/ mount up the export on a client using v4.2
>>
>> 3/ Run this fio script in the directory:
>>
>> ----------------8<---------------------
>> [global]
>> name=fio-seq-write
>> filename=fio-seq-write
>> rw=write
>> bs=1M
>> direct=0
>> numjobs=1
>> time_based
>> runtime=60
>>
>> [file1]
>> size=50G
>> ioengine=io_uring
>> iodepth=16
>> ----------------8<---------------------
>>
>> When it completes, shut down the nfs server. You'll see warnings like
>> this:
>>
>>     BUG nfsd_file (Tainted: G    B   W          ): Objects remaining on __kmem_cache_shutdown()
>>
>> Dai, can you take a look?
> 
> After any substantial NFSv4 workload on my nfsd-testing server followed
> by a clean unmount from the client, /proc/fs/nfsd/filecache still has
> junk in it:
> 
> total inodes:  281105
> hash buckets:  524288
> lru entries:   0
> cache hits:    307264
> acquisitions:  2092346
> allocations:   1785084
> releases:      1503979
> evictions:     0
> mean age (ms): 400
> 
> Looks like a leak to me.
> 
> 

An additional symptom: At shutdown I see the unmount of exports fail
with EBUSY. I don't see the crash that Jeff reports above, but I'm
guessing he's got some extra slub debugging enabled.

I've dropped "NFSD: Offer write delegation for OPEN with
OPEN4_SHARE_ACCESS_WRITE" from nfsd-next for the moment.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-05-09 19:32   ` Jeff Layton
  2025-05-09 21:48     ` Chuck Lever
@ 2025-05-10 19:23     ` Dai Ngo
  2025-05-11 21:13       ` Dai Ngo
  1 sibling, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-05-10 19:23 UTC (permalink / raw)
  To: Jeff Layton, chuck.lever, neilb, okorniev, tom; +Cc: linux-nfs, sagi


On 5/9/25 12:32 PM, Jeff Layton wrote:
> On Thu, 2025-03-06 at 11:31 -0800, Dai Ngo wrote:
>> RFC8881, section 9.1.2 says:
>>
>>    "In the case of READ, the server may perform the corresponding
>>     check on the access mode, or it may choose to allow READ for
>>     OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>     implementation may unavoidably do reads (e.g., due to buffer cache
>>     constraints)."
>>
>> and in section 10.4.1:
>>     "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
>>     OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
>>     is in effect"
>>
>> This patch allow READ using write delegation stateid granted on OPENs
>> with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
>> implementation may unavoidably do (e.g., due to buffer cache
>> constraints).
>>
>> For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
>> a new nfsd_file and a struct file are allocated to use for reads.
>> The nfsd_file is freed when the file is closed by release_all_access.
>>
>> Suggested-by: Chuck Lever <chuck.lever@oracle.com>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>> ---
>>   fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++-----------------
>>   1 file changed, 47 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 0f97f2c62b3a..295415fda985 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
>>   	return ret;
>>   }
>>   
>> -static struct nfsd_file *
>> -find_rw_file(struct nfs4_file *f)
>> -{
>> -	struct nfsd_file *ret;
>> -
>> -	spin_lock(&f->fi_lock);
>> -	ret = nfsd_file_get(f->fi_fds[O_RDWR]);
>> -	spin_unlock(&f->fi_lock);
>> -
>> -	return ret;
>> -}
>> -
>>   struct nfsd_file *
>>   find_any_file(struct nfs4_file *f)
>>   {
>> @@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>   	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
>>   	 *   on its own, all opens."
>>   	 *
>> -	 * Furthermore the client can use a write delegation for most READ
>> -	 * operations as well, so we require a O_RDWR file here.
>> +	 * Furthermore, section 9.1.2 says:
>> +	 *
>> +	 *  "In the case of READ, the server may perform the corresponding
>> +	 *  check on the access mode, or it may choose to allow READ for
>> +	 *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>> +	 *  implementation may unavoidably do reads (e.g., due to buffer
>> +	 *  cache constraints)."
>>   	 *
>> -	 * Offer a write delegation in the case of a BOTH open, and ensure
>> -	 * we get the O_RDWR descriptor.
>> +	 *  We choose to offer a write delegation for OPEN with the
>> +	 *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such clients.
>>   	 */
>> -	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
>> -		nf = find_rw_file(fp);
>> +	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>> +		nf = find_writeable_file(fp);
>>   		dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : OPEN_DELEGATE_WRITE;
>>   	}
>>   
>> @@ -6116,7 +6109,7 @@ static bool
>>   nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>   		     struct kstat *stat)
>>   {
>> -	struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
>> +	struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
>>   	struct path path;
>>   	int rc;
>>   
>> @@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>   	return rc == 0;
>>   }
>>   
>> +/*
>> + * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
>> + * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
>> + * struct file to be used for read with delegation stateid.
>> + *
>> + */
>> +static bool
>> +nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>> +			      struct svc_fh *fh, struct nfs4_ol_stateid *stp)
>> +{
>> +	struct nfs4_file *fp;
>> +	struct nfsd_file *nf = NULL;
>> +
>> +	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
>> +			NFS4_SHARE_ACCESS_WRITE) {
>> +		if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, NULL, &nf))
>> +			return (false);
>> +		fp = stp->st_stid.sc_file;
>> +		spin_lock(&fp->fi_lock);
>> +		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>> +		fp = stp->st_stid.sc_file;
>> +		fp->fi_fds[O_RDONLY] = nf;
>> +		spin_unlock(&fp->fi_lock);
>> +	}
>> +	return true;
>> +}
>> +
>>   /*
>>    * The Linux NFS server does not offer write delegations to NFSv4.0
>>    * clients in order to avoid conflicts between write delegations and
>> @@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>    * open or lock state.
>>    */
>>   static void
>> -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>> -		     struct svc_fh *currentfh)
>> +nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
>> +		     struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
>> +		     struct svc_fh *fh)
>>   {
>>   	bool deleg_ts = open->op_deleg_want & OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
>>   	struct nfs4_openowner *oo = openowner(stp->st_stateowner);
>> @@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>   	memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
>>   
>>   	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>> -		if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
>> +		if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
>> +				!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>   			nfs4_put_stid(&dp->dl_stid);
>>   			destroy_delegation(dp);
>>   			goto out_no_deleg;
>> @@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>>   	* Attempt to hand out a delegation. No error return, because the
>>   	* OPEN succeeds even if we fail.
>>   	*/
>> -	nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
>> +	nfs4_open_delegation(rqstp, open, stp,
>> +		&resp->cstate.current_fh, current_fh);
>>   
>>   	/*
>>   	 * If there is an existing open stateid, it must be updated and
>> @@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
>>   		return_revoked = true;
>>   	if (typemask & SC_TYPE_DELEG)
>>   		/* Always allow REVOKED for DELEG so we can
>> -		 * retturn the appropriate error.
>> +		 * return the appropriate error.
>>   		 */
>>   		statusmask |= SC_STATUS_REVOKED;
>>   
>> @@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>>   
>>   	switch (s->sc_type) {
>>   	case SC_TYPE_DELEG:
>> -		spin_lock(&s->sc_file->fi_lock);
>> -		ret = nfsd_file_get(s->sc_file->fi_deleg_file);
>> -		spin_unlock(&s->sc_file->fi_lock);
>> -		break;
>>   	case SC_TYPE_OPEN:
>>   	case SC_TYPE_LOCK:
>>   		if (flags & RD_STATE)
> I'm seeing a nfsd_file leak in chuck's nfsd-next tree and a bisect
> landed on this patch. The reproducer is:
>
> 1/ set up an exported fs on a server (I used xfs, but it probably
> doesn't matter).
>
> 2/ mount up the export on a client using v4.2
>
> 3/ Run this fio script in the directory:
>
> ----------------8<---------------------
> [global]
> name=fio-seq-write
> filename=fio-seq-write
> rw=write
> bs=1M
> direct=0
> numjobs=1
> time_based
> runtime=60
>
> [file1]
> size=50G
> ioengine=io_uring
> iodepth=16
> ----------------8<---------------------
>
> When it completes, shut down the nfs server. You'll see warnings like
> this:
>
>      BUG nfsd_file (Tainted: G    B   W          ): Objects remaining on __kmem_cache_shutdown()
>
> Dai, can you take a look?

Will do,

Thanks,

-Dai

>
> Thanks,

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-05-10 19:23     ` Dai Ngo
@ 2025-05-11 21:13       ` Dai Ngo
  2025-05-11 23:46         ` Chuck Lever
  0 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-05-11 21:13 UTC (permalink / raw)
  To: Jeff Layton, chuck.lever, neilb, okorniev, tom; +Cc: linux-nfs, sagi


On 5/10/25 12:23 PM, Dai Ngo wrote:
>
> On 5/9/25 12:32 PM, Jeff Layton wrote:
>> On Thu, 2025-03-06 at 11:31 -0800, Dai Ngo wrote:
>>> RFC8881, section 9.1.2 says:
>>>
>>>    "In the case of READ, the server may perform the corresponding
>>>     check on the access mode, or it may choose to allow READ for
>>>     OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>>     implementation may unavoidably do reads (e.g., due to buffer cache
>>>     constraints)."
>>>
>>> and in section 10.4.1:
>>>     "Similarly, when closing a file opened for 
>>> OPEN4_SHARE_ACCESS_WRITE/
>>>     OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
>>>     is in effect"
>>>
>>> This patch allow READ using write delegation stateid granted on OPENs
>>> with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
>>> implementation may unavoidably do (e.g., due to buffer cache
>>> constraints).
>>>
>>> For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
>>> a new nfsd_file and a struct file are allocated to use for reads.
>>> The nfsd_file is freed when the file is closed by release_all_access.
>>>
>>> Suggested-by: Chuck Lever <chuck.lever@oracle.com>
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> ---
>>>   fs/nfsd/nfs4state.c | 75 
>>> ++++++++++++++++++++++++++++-----------------
>>>   1 file changed, 47 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index 0f97f2c62b3a..295415fda985 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
>>>       return ret;
>>>   }
>>>   -static struct nfsd_file *
>>> -find_rw_file(struct nfs4_file *f)
>>> -{
>>> -    struct nfsd_file *ret;
>>> -
>>> -    spin_lock(&f->fi_lock);
>>> -    ret = nfsd_file_get(f->fi_fds[O_RDWR]);
>>> -    spin_unlock(&f->fi_lock);
>>> -
>>> -    return ret;
>>> -}
>>> -
>>>   struct nfsd_file *
>>>   find_any_file(struct nfs4_file *f)
>>>   {
>>> @@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, 
>>> struct nfs4_ol_stateid *stp,
>>>        *  "An OPEN_DELEGATE_WRITE delegation allows the client to 
>>> handle,
>>>        *   on its own, all opens."
>>>        *
>>> -     * Furthermore the client can use a write delegation for most READ
>>> -     * operations as well, so we require a O_RDWR file here.
>>> +     * Furthermore, section 9.1.2 says:
>>> +     *
>>> +     *  "In the case of READ, the server may perform the corresponding
>>> +     *  check on the access mode, or it may choose to allow READ for
>>> +     *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>> +     *  implementation may unavoidably do reads (e.g., due to buffer
>>> +     *  cache constraints)."
>>>        *
>>> -     * Offer a write delegation in the case of a BOTH open, and ensure
>>> -     * we get the O_RDWR descriptor.
>>> +     *  We choose to offer a write delegation for OPEN with the
>>> +     *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such 
>>> clients.
>>>        */
>>> -    if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == 
>>> NFS4_SHARE_ACCESS_BOTH) {
>>> -        nf = find_rw_file(fp);
>>> +    if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>> +        nf = find_writeable_file(fp);
>>>           dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : 
>>> OPEN_DELEGATE_WRITE;
>>>       }
>>>   @@ -6116,7 +6109,7 @@ static bool
>>>   nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh 
>>> *currentfh,
>>>                struct kstat *stat)
>>>   {
>>> -    struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
>>> +    struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
>>>       struct path path;
>>>       int rc;
>>>   @@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation 
>>> *dp, struct svc_fh *currentfh,
>>>       return rc == 0;
>>>   }
>>>   +/*
>>> + * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
>>> + * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
>>> + * struct file to be used for read with delegation stateid.
>>> + *
>>> + */
>>> +static bool
>>> +nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct 
>>> nfsd4_open *open,
>>> +                  struct svc_fh *fh, struct nfs4_ol_stateid *stp)
>>> +{
>>> +    struct nfs4_file *fp;
>>> +    struct nfsd_file *nf = NULL;
>>> +
>>> +    if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
>>> +            NFS4_SHARE_ACCESS_WRITE) {
>>> +        if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, 
>>> NULL, &nf))
>>> +            return (false);
>>> +        fp = stp->st_stid.sc_file;
>>> +        spin_lock(&fp->fi_lock);
>>> +        __nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>>> +        fp = stp->st_stid.sc_file;
>>> +        fp->fi_fds[O_RDONLY] = nf;
>>> +        spin_unlock(&fp->fi_lock);
>>> +    }
>>> +    return true;
>>> +}
>>> +
>>>   /*
>>>    * The Linux NFS server does not offer write delegations to NFSv4.0
>>>    * clients in order to avoid conflicts between write delegations and
>>> @@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation 
>>> *dp, struct svc_fh *currentfh,
>>>    * open or lock state.
>>>    */
>>>   static void
>>> -nfs4_open_delegation(struct nfsd4_open *open, struct 
>>> nfs4_ol_stateid *stp,
>>> -             struct svc_fh *currentfh)
>>> +nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
>>> +             struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
>>> +             struct svc_fh *fh)
>>>   {
>>>       bool deleg_ts = open->op_deleg_want & 
>>> OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
>>>       struct nfs4_openowner *oo = openowner(stp->st_stateowner);
>>> @@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, 
>>> struct nfs4_ol_stateid *stp,
>>>       memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, 
>>> sizeof(dp->dl_stid.sc_stateid));
>>>         if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>> -        if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>> +        if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
>>> +                !nfs4_delegation_stat(dp, currentfh, &stat)) {
>>>               nfs4_put_stid(&dp->dl_stid);
>>>               destroy_delegation(dp);
>>>               goto out_no_deleg;
>>> @@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, 
>>> struct svc_fh *current_fh, struct nf
>>>       * Attempt to hand out a delegation. No error return, because the
>>>       * OPEN succeeds even if we fail.
>>>       */
>>> -    nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
>>> +    nfs4_open_delegation(rqstp, open, stp,
>>> +        &resp->cstate.current_fh, current_fh);
>>>         /*
>>>        * If there is an existing open stateid, it must be updated and
>>> @@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct 
>>> nfsd4_compound_state *cstate,
>>>           return_revoked = true;
>>>       if (typemask & SC_TYPE_DELEG)
>>>           /* Always allow REVOKED for DELEG so we can
>>> -         * retturn the appropriate error.
>>> +         * return the appropriate error.
>>>            */
>>>           statusmask |= SC_STATUS_REVOKED;
>>>   @@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>>>         switch (s->sc_type) {
>>>       case SC_TYPE_DELEG:
>>> -        spin_lock(&s->sc_file->fi_lock);
>>> -        ret = nfsd_file_get(s->sc_file->fi_deleg_file);
>>> -        spin_unlock(&s->sc_file->fi_lock);
>>> -        break;
>>>       case SC_TYPE_OPEN:
>>>       case SC_TYPE_LOCK:
>>>           if (flags & RD_STATE)
>> I'm seeing a nfsd_file leak in chuck's nfsd-next tree and a bisect
>> landed on this patch. The reproducer is:
>>
>> 1/ set up an exported fs on a server (I used xfs, but it probably
>> doesn't matter).
>>
>> 2/ mount up the export on a client using v4.2
>>
>> 3/ Run this fio script in the directory:
>>
>> ----------------8<---------------------
>> [global]
>> name=fio-seq-write
>> filename=fio-seq-write
>> rw=write
>> bs=1M
>> direct=0
>> numjobs=1
>> time_based
>> runtime=60
>>
>> [file1]
>> size=50G
>> ioengine=io_uring
>> iodepth=16
>> ----------------8<---------------------
>>
>> When it completes, shut down the nfs server. You'll see warnings like
>> this:
>>
>>      BUG nfsd_file (Tainted: G    B   W          ): Objects remaining 
>> on __kmem_cache_shutdown()
>>
>> Dai, can you take a look?
>
> Will do,

I found the problem, the nfsd_file added for READ did not get freed when 
the delegation is returned.

Chuck, do you want me to resend the whole patch plus the fix or just the 
fix? I'd prefer just the fix to make it easy for review but it's up to you.

-Dai


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-05-10  0:10       ` Chuck Lever
@ 2025-05-11 21:59         ` Sagi Grimberg
  0 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2025-05-11 21:59 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton, Dai Ngo, neilb, okorniev, tom; +Cc: linux-nfs



On 10/05/2025 3:10, Chuck Lever wrote:
> On 5/9/25 5:48 PM, Chuck Lever wrote:
>> On 5/9/25 3:32 PM, Jeff Layton wrote:
>>> On Thu, 2025-03-06 at 11:31 -0800, Dai Ngo wrote:
>>>> RFC8881, section 9.1.2 says:
>>>>
>>>>    "In the case of READ, the server may perform the corresponding
>>>>     check on the access mode, or it may choose to allow READ for
>>>>     OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>>>     implementation may unavoidably do reads (e.g., due to buffer cache
>>>>     constraints)."
>>>>
>>>> and in section 10.4.1:
>>>>     "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
>>>>     OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
>>>>     is in effect"
>>>>
>>>> This patch allow READ using write delegation stateid granted on OPENs
>>>> with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
>>>> implementation may unavoidably do (e.g., due to buffer cache
>>>> constraints).
>>>>
>>>> For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
>>>> a new nfsd_file and a struct file are allocated to use for reads.
>>>> The nfsd_file is freed when the file is closed by release_all_access.
>>>>
>>>> Suggested-by: Chuck Lever <chuck.lever@oracle.com>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>>   fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++-----------------
>>>>   1 file changed, 47 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index 0f97f2c62b3a..295415fda985 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
>>>>   	return ret;
>>>>   }
>>>>   
>>>> -static struct nfsd_file *
>>>> -find_rw_file(struct nfs4_file *f)
>>>> -{
>>>> -	struct nfsd_file *ret;
>>>> -
>>>> -	spin_lock(&f->fi_lock);
>>>> -	ret = nfsd_file_get(f->fi_fds[O_RDWR]);
>>>> -	spin_unlock(&f->fi_lock);
>>>> -
>>>> -	return ret;
>>>> -}
>>>> -
>>>>   struct nfsd_file *
>>>>   find_any_file(struct nfs4_file *f)
>>>>   {
>>>> @@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>>>   	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
>>>>   	 *   on its own, all opens."
>>>>   	 *
>>>> -	 * Furthermore the client can use a write delegation for most READ
>>>> -	 * operations as well, so we require a O_RDWR file here.
>>>> +	 * Furthermore, section 9.1.2 says:
>>>> +	 *
>>>> +	 *  "In the case of READ, the server may perform the corresponding
>>>> +	 *  check on the access mode, or it may choose to allow READ for
>>>> +	 *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>>> +	 *  implementation may unavoidably do reads (e.g., due to buffer
>>>> +	 *  cache constraints)."
>>>>   	 *
>>>> -	 * Offer a write delegation in the case of a BOTH open, and ensure
>>>> -	 * we get the O_RDWR descriptor.
>>>> +	 *  We choose to offer a write delegation for OPEN with the
>>>> +	 *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such clients.
>>>>   	 */
>>>> -	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
>>>> -		nf = find_rw_file(fp);
>>>> +	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>>> +		nf = find_writeable_file(fp);
>>>>   		dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : OPEN_DELEGATE_WRITE;
>>>>   	}
>>>>   
>>>> @@ -6116,7 +6109,7 @@ static bool
>>>>   nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>>>   		     struct kstat *stat)
>>>>   {
>>>> -	struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
>>>> +	struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
>>>>   	struct path path;
>>>>   	int rc;
>>>>   
>>>> @@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>>>   	return rc == 0;
>>>>   }
>>>>   
>>>> +/*
>>>> + * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
>>>> + * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
>>>> + * struct file to be used for read with delegation stateid.
>>>> + *
>>>> + */
>>>> +static bool
>>>> +nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>>>> +			      struct svc_fh *fh, struct nfs4_ol_stateid *stp)
>>>> +{
>>>> +	struct nfs4_file *fp;
>>>> +	struct nfsd_file *nf = NULL;
>>>> +
>>>> +	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
>>>> +			NFS4_SHARE_ACCESS_WRITE) {
>>>> +		if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, NULL, &nf))
>>>> +			return (false);
>>>> +		fp = stp->st_stid.sc_file;
>>>> +		spin_lock(&fp->fi_lock);
>>>> +		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>>>> +		fp = stp->st_stid.sc_file;
>>>> +		fp->fi_fds[O_RDONLY] = nf;
>>>> +		spin_unlock(&fp->fi_lock);
>>>> +	}
>>>> +	return true;
>>>> +}
>>>> +
>>>>   /*
>>>>    * The Linux NFS server does not offer write delegations to NFSv4.0
>>>>    * clients in order to avoid conflicts between write delegations and
>>>> @@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
>>>>    * open or lock state.
>>>>    */
>>>>   static void
>>>> -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>>> -		     struct svc_fh *currentfh)
>>>> +nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
>>>> +		     struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
>>>> +		     struct svc_fh *fh)
>>>>   {
>>>>   	bool deleg_ts = open->op_deleg_want & OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
>>>>   	struct nfs4_openowner *oo = openowner(stp->st_stateowner);
>>>> @@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
>>>>   	memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
>>>>   
>>>>   	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>>> -		if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>>> +		if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
>>>> +				!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>>>   			nfs4_put_stid(&dp->dl_stid);
>>>>   			destroy_delegation(dp);
>>>>   			goto out_no_deleg;
>>>> @@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>>>>   	* Attempt to hand out a delegation. No error return, because the
>>>>   	* OPEN succeeds even if we fail.
>>>>   	*/
>>>> -	nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
>>>> +	nfs4_open_delegation(rqstp, open, stp,
>>>> +		&resp->cstate.current_fh, current_fh);
>>>>   
>>>>   	/*
>>>>   	 * If there is an existing open stateid, it must be updated and
>>>> @@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
>>>>   		return_revoked = true;
>>>>   	if (typemask & SC_TYPE_DELEG)
>>>>   		/* Always allow REVOKED for DELEG so we can
>>>> -		 * retturn the appropriate error.
>>>> +		 * return the appropriate error.
>>>>   		 */
>>>>   		statusmask |= SC_STATUS_REVOKED;
>>>>   
>>>> @@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>>>>   
>>>>   	switch (s->sc_type) {
>>>>   	case SC_TYPE_DELEG:
>>>> -		spin_lock(&s->sc_file->fi_lock);
>>>> -		ret = nfsd_file_get(s->sc_file->fi_deleg_file);
>>>> -		spin_unlock(&s->sc_file->fi_lock);
>>>> -		break;
>>>>   	case SC_TYPE_OPEN:
>>>>   	case SC_TYPE_LOCK:
>>>>   		if (flags & RD_STATE)
>>> I'm seeing a nfsd_file leak in chuck's nfsd-next tree and a bisect
>>> landed on this patch. The reproducer is:
>>>
>>> 1/ set up an exported fs on a server (I used xfs, but it probably
>>> doesn't matter).
>>>
>>> 2/ mount up the export on a client using v4.2
>>>
>>> 3/ Run this fio script in the directory:
>>>
>>> ----------------8<---------------------
>>> [global]
>>> name=fio-seq-write
>>> filename=fio-seq-write
>>> rw=write
>>> bs=1M
>>> direct=0
>>> numjobs=1
>>> time_based
>>> runtime=60
>>>
>>> [file1]
>>> size=50G
>>> ioengine=io_uring
>>> iodepth=16
>>> ----------------8<---------------------
>>>
>>> When it completes, shut down the nfs server. You'll see warnings like
>>> this:
>>>
>>>      BUG nfsd_file (Tainted: G    B   W          ): Objects remaining on __kmem_cache_shutdown()
>>>
>>> Dai, can you take a look?
>> After any substantial NFSv4 workload on my nfsd-testing server followed
>> by a clean unmount from the client, /proc/fs/nfsd/filecache still has
>> junk in it:
>>
>> total inodes:  281105
>> hash buckets:  524288
>> lru entries:   0
>> cache hits:    307264
>> acquisitions:  2092346
>> allocations:   1785084
>> releases:      1503979
>> evictions:     0
>> mean age (ms): 400
>>
>> Looks like a leak to me.
>>
>>
> An additional symptom: At shutdown I see the unmount of exports fail
> with EBUSY. I don't see the crash that Jeff reports above, but I'm
> guessing he's got some extra slub debugging enabled.
>
> I've dropped "NFSD: Offer write delegation for OPEN with
> OPEN4_SHARE_ACCESS_WRITE" from nfsd-next for the moment.
>

Consistent with the kmemleak complaint that I see a bunch of (running
simple untar):
--
unreferenced object 0xffff8c778fd93e80 (size 128):
   comm "nfsd", pid 920, jiffies 4300669441
   hex dump (first 32 bytes):
     80 1d 87 8d 77 8c ff ff 00 00 00 00 00 00 00 00 ....w...........
     28 e4 b7 c9 77 8c ff ff 00 b3 20 8f 77 8c ff ff  (...w..... .w...
   backtrace (crc 3c1e761e):
     kmemleak_alloc+0x4a/0x90
     kmem_cache_alloc_noprof+0x344/0x410
     nfsd_file_do_acquire+0x22e/0xd70 [nfsd]
     nfsd_file_acquire_opened+0x30/0x50 [nfsd]
     nfs4_open_delegation+0xfd9/0x12d0 [nfsd]
     nfsd4_process_open2+0x4a1/0xa30 [nfsd]
     nfsd4_open+0x833/0xf50 [nfsd]
     nfsd4_proc_compound+0x3b0/0x7d0 [nfsd]
     nfsd_dispatch+0xd3/0x210 [nfsd]
     svc_process_common+0x465/0x760 [sunrpc]
     svc_process+0x136/0x1f0 [sunrpc]
     svc_recv+0x925/0xb20 [sunrpc]
     nfsd+0x90/0xf0 [nfsd]
     kthread+0x119/0x240
     ret_from_fork+0x44/0x70
     ret_from_fork_asm+0x1a/0x30


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-05-11 21:13       ` Dai Ngo
@ 2025-05-11 23:46         ` Chuck Lever
  0 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2025-05-11 23:46 UTC (permalink / raw)
  To: Dai Ngo, Jeff Layton, neilb, okorniev, tom; +Cc: linux-nfs, sagi

On 5/11/25 5:13 PM, Dai Ngo wrote:
> 
> On 5/10/25 12:23 PM, Dai Ngo wrote:
>>
>> On 5/9/25 12:32 PM, Jeff Layton wrote:
>>> On Thu, 2025-03-06 at 11:31 -0800, Dai Ngo wrote:
>>>> RFC8881, section 9.1.2 says:
>>>>
>>>>    "In the case of READ, the server may perform the corresponding
>>>>     check on the access mode, or it may choose to allow READ for
>>>>     OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>>>     implementation may unavoidably do reads (e.g., due to buffer cache
>>>>     constraints)."
>>>>
>>>> and in section 10.4.1:
>>>>     "Similarly, when closing a file opened for
>>>> OPEN4_SHARE_ACCESS_WRITE/
>>>>     OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
>>>>     is in effect"
>>>>
>>>> This patch allow READ using write delegation stateid granted on OPENs
>>>> with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
>>>> implementation may unavoidably do (e.g., due to buffer cache
>>>> constraints).
>>>>
>>>> For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
>>>> a new nfsd_file and a struct file are allocated to use for reads.
>>>> The nfsd_file is freed when the file is closed by release_all_access.
>>>>
>>>> Suggested-by: Chuck Lever <chuck.lever@oracle.com>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>>   fs/nfsd/nfs4state.c | 75 +++++++++++++++++++++++++++
>>>> +-----------------
>>>>   1 file changed, 47 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index 0f97f2c62b3a..295415fda985 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
>>>>       return ret;
>>>>   }
>>>>   -static struct nfsd_file *
>>>> -find_rw_file(struct nfs4_file *f)
>>>> -{
>>>> -    struct nfsd_file *ret;
>>>> -
>>>> -    spin_lock(&f->fi_lock);
>>>> -    ret = nfsd_file_get(f->fi_fds[O_RDWR]);
>>>> -    spin_unlock(&f->fi_lock);
>>>> -
>>>> -    return ret;
>>>> -}
>>>> -
>>>>   struct nfsd_file *
>>>>   find_any_file(struct nfs4_file *f)
>>>>   {
>>>> @@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open,
>>>> struct nfs4_ol_stateid *stp,
>>>>        *  "An OPEN_DELEGATE_WRITE delegation allows the client to
>>>> handle,
>>>>        *   on its own, all opens."
>>>>        *
>>>> -     * Furthermore the client can use a write delegation for most READ
>>>> -     * operations as well, so we require a O_RDWR file here.
>>>> +     * Furthermore, section 9.1.2 says:
>>>> +     *
>>>> +     *  "In the case of READ, the server may perform the corresponding
>>>> +     *  check on the access mode, or it may choose to allow READ for
>>>> +     *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
>>>> +     *  implementation may unavoidably do reads (e.g., due to buffer
>>>> +     *  cache constraints)."
>>>>        *
>>>> -     * Offer a write delegation in the case of a BOTH open, and ensure
>>>> -     * we get the O_RDWR descriptor.
>>>> +     *  We choose to offer a write delegation for OPEN with the
>>>> +     *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such
>>>> clients.
>>>>        */
>>>> -    if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
>>>> NFS4_SHARE_ACCESS_BOTH) {
>>>> -        nf = find_rw_file(fp);
>>>> +    if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>>> +        nf = find_writeable_file(fp);
>>>>           dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG :
>>>> OPEN_DELEGATE_WRITE;
>>>>       }
>>>>   @@ -6116,7 +6109,7 @@ static bool
>>>>   nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh
>>>> *currentfh,
>>>>                struct kstat *stat)
>>>>   {
>>>> -    struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
>>>> +    struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
>>>>       struct path path;
>>>>       int rc;
>>>>   @@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation
>>>> *dp, struct svc_fh *currentfh,
>>>>       return rc == 0;
>>>>   }
>>>>   +/*
>>>> + * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
>>>> + * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
>>>> + * struct file to be used for read with delegation stateid.
>>>> + *
>>>> + */
>>>> +static bool
>>>> +nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct
>>>> nfsd4_open *open,
>>>> +                  struct svc_fh *fh, struct nfs4_ol_stateid *stp)
>>>> +{
>>>> +    struct nfs4_file *fp;
>>>> +    struct nfsd_file *nf = NULL;
>>>> +
>>>> +    if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
>>>> +            NFS4_SHARE_ACCESS_WRITE) {
>>>> +        if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ,
>>>> NULL, &nf))
>>>> +            return (false);
>>>> +        fp = stp->st_stid.sc_file;
>>>> +        spin_lock(&fp->fi_lock);
>>>> +        __nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>>>> +        fp = stp->st_stid.sc_file;
>>>> +        fp->fi_fds[O_RDONLY] = nf;
>>>> +        spin_unlock(&fp->fi_lock);
>>>> +    }
>>>> +    return true;
>>>> +}
>>>> +
>>>>   /*
>>>>    * The Linux NFS server does not offer write delegations to NFSv4.0
>>>>    * clients in order to avoid conflicts between write delegations and
>>>> @@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation
>>>> *dp, struct svc_fh *currentfh,
>>>>    * open or lock state.
>>>>    */
>>>>   static void
>>>> -nfs4_open_delegation(struct nfsd4_open *open, struct
>>>> nfs4_ol_stateid *stp,
>>>> -             struct svc_fh *currentfh)
>>>> +nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
>>>> +             struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
>>>> +             struct svc_fh *fh)
>>>>   {
>>>>       bool deleg_ts = open->op_deleg_want &
>>>> OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
>>>>       struct nfs4_openowner *oo = openowner(stp->st_stateowner);
>>>> @@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open,
>>>> struct nfs4_ol_stateid *stp,
>>>>       memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid,
>>>> sizeof(dp->dl_stid.sc_stateid));
>>>>         if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
>>>> -        if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
>>>> +        if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
>>>> +                !nfs4_delegation_stat(dp, currentfh, &stat)) {
>>>>               nfs4_put_stid(&dp->dl_stid);
>>>>               destroy_delegation(dp);
>>>>               goto out_no_deleg;
>>>> @@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp,
>>>> struct svc_fh *current_fh, struct nf
>>>>       * Attempt to hand out a delegation. No error return, because the
>>>>       * OPEN succeeds even if we fail.
>>>>       */
>>>> -    nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
>>>> +    nfs4_open_delegation(rqstp, open, stp,
>>>> +        &resp->cstate.current_fh, current_fh);
>>>>         /*
>>>>        * If there is an existing open stateid, it must be updated and
>>>> @@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct
>>>> nfsd4_compound_state *cstate,
>>>>           return_revoked = true;
>>>>       if (typemask & SC_TYPE_DELEG)
>>>>           /* Always allow REVOKED for DELEG so we can
>>>> -         * retturn the appropriate error.
>>>> +         * return the appropriate error.
>>>>            */
>>>>           statusmask |= SC_STATUS_REVOKED;
>>>>   @@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>>>>         switch (s->sc_type) {
>>>>       case SC_TYPE_DELEG:
>>>> -        spin_lock(&s->sc_file->fi_lock);
>>>> -        ret = nfsd_file_get(s->sc_file->fi_deleg_file);
>>>> -        spin_unlock(&s->sc_file->fi_lock);
>>>> -        break;
>>>>       case SC_TYPE_OPEN:
>>>>       case SC_TYPE_LOCK:
>>>>           if (flags & RD_STATE)
>>> I'm seeing a nfsd_file leak in chuck's nfsd-next tree and a bisect
>>> landed on this patch. The reproducer is:
>>>
>>> 1/ set up an exported fs on a server (I used xfs, but it probably
>>> doesn't matter).
>>>
>>> 2/ mount up the export on a client using v4.2
>>>
>>> 3/ Run this fio script in the directory:
>>>
>>> ----------------8<---------------------
>>> [global]
>>> name=fio-seq-write
>>> filename=fio-seq-write
>>> rw=write
>>> bs=1M
>>> direct=0
>>> numjobs=1
>>> time_based
>>> runtime=60
>>>
>>> [file1]
>>> size=50G
>>> ioengine=io_uring
>>> iodepth=16
>>> ----------------8<---------------------
>>>
>>> When it completes, shut down the nfs server. You'll see warnings like
>>> this:
>>>
>>>      BUG nfsd_file (Tainted: G    B   W          ): Objects remaining
>>> on __kmem_cache_shutdown()
>>>
>>> Dai, can you take a look?
>>
>> Will do,
> 
> I found the problem, the nfsd_file added for READ did not get freed when
> the delegation is returned.
> 
> Chuck, do you want me to resend the whole patch plus the fix or just the
> fix? I'd prefer just the fix to make it easy for review but it's up to you.

You can send the patch and the fix separately. I'd like to squash them
together when applying them to nfsd-testing.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE
  2025-05-13 16:08 Dai Ngo
@ 2025-05-13 16:08 ` Dai Ngo
  0 siblings, 0 replies; 11+ messages in thread
From: Dai Ngo @ 2025-05-13 16:08 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom; +Cc: linux-nfs, sagi

RFC8881, section 9.1.2 says:

  "In the case of READ, the server may perform the corresponding
   check on the access mode, or it may choose to allow READ for
   OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
   implementation may unavoidably do reads (e.g., due to buffer cache
   constraints)."

and in section 10.4.1:
   "Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/
   OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation
   is in effect"

This patch allow READ using write delegation stateid granted on OPENs
with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE
implementation may unavoidably do (e.g., due to buffer cache
constraints).

For write delegation granted for OPEN with OPEN4_SHARE_ACCESS_WRITE
a new nfsd_file and a struct file are allocated to use for reads.
The nfsd_file is freed when the file is closed by release_all_access.

Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4state.c | 75 ++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 28 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 0f97f2c62b3a..295415fda985 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -633,18 +633,6 @@ find_readable_file(struct nfs4_file *f)
 	return ret;
 }
 
-static struct nfsd_file *
-find_rw_file(struct nfs4_file *f)
-{
-	struct nfsd_file *ret;
-
-	spin_lock(&f->fi_lock);
-	ret = nfsd_file_get(f->fi_fds[O_RDWR]);
-	spin_unlock(&f->fi_lock);
-
-	return ret;
-}
-
 struct nfsd_file *
 find_any_file(struct nfs4_file *f)
 {
@@ -5987,14 +5975,19 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
 	 *  "An OPEN_DELEGATE_WRITE delegation allows the client to handle,
 	 *   on its own, all opens."
 	 *
-	 * Furthermore the client can use a write delegation for most READ
-	 * operations as well, so we require a O_RDWR file here.
+	 * Furthermore, section 9.1.2 says:
+	 *
+	 *  "In the case of READ, the server may perform the corresponding
+	 *  check on the access mode, or it may choose to allow READ for
+	 *  OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE
+	 *  implementation may unavoidably do reads (e.g., due to buffer
+	 *  cache constraints)."
 	 *
-	 * Offer a write delegation in the case of a BOTH open, and ensure
-	 * we get the O_RDWR descriptor.
+	 *  We choose to offer a write delegation for OPEN with the
+	 *  OPEN4_SHARE_ACCESS_WRITE access mode to accommodate such clients.
 	 */
-	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) == NFS4_SHARE_ACCESS_BOTH) {
-		nf = find_rw_file(fp);
+	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
+		nf = find_writeable_file(fp);
 		dl_type = deleg_ts ? OPEN_DELEGATE_WRITE_ATTRS_DELEG : OPEN_DELEGATE_WRITE;
 	}
 
@@ -6116,7 +6109,7 @@ static bool
 nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
 		     struct kstat *stat)
 {
-	struct nfsd_file *nf = find_rw_file(dp->dl_stid.sc_file);
+	struct nfsd_file *nf = find_writeable_file(dp->dl_stid.sc_file);
 	struct path path;
 	int rc;
 
@@ -6134,6 +6127,33 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
 	return rc == 0;
 }
 
+/*
+ * Add NFS4_SHARE_ACCESS_READ to the write delegation granted on OPEN
+ * with NFS4_SHARE_ACCESS_WRITE by allocating separate nfsd_file and
+ * struct file to be used for read with delegation stateid.
+ *
+ */
+static bool
+nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
+			      struct svc_fh *fh, struct nfs4_ol_stateid *stp)
+{
+	struct nfs4_file *fp;
+	struct nfsd_file *nf = NULL;
+
+	if ((open->op_share_access & NFS4_SHARE_ACCESS_BOTH) ==
+			NFS4_SHARE_ACCESS_WRITE) {
+		if (nfsd_file_acquire_opened(rqstp, fh, NFSD_MAY_READ, NULL, &nf))
+			return (false);
+		fp = stp->st_stid.sc_file;
+		spin_lock(&fp->fi_lock);
+		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
+		fp = stp->st_stid.sc_file;
+		fp->fi_fds[O_RDONLY] = nf;
+		spin_unlock(&fp->fi_lock);
+	}
+	return true;
+}
+
 /*
  * The Linux NFS server does not offer write delegations to NFSv4.0
  * clients in order to avoid conflicts between write delegations and
@@ -6159,8 +6179,9 @@ nfs4_delegation_stat(struct nfs4_delegation *dp, struct svc_fh *currentfh,
  * open or lock state.
  */
 static void
-nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
-		     struct svc_fh *currentfh)
+nfs4_open_delegation(struct svc_rqst *rqstp, struct nfsd4_open *open,
+		     struct nfs4_ol_stateid *stp, struct svc_fh *currentfh,
+		     struct svc_fh *fh)
 {
 	bool deleg_ts = open->op_deleg_want & OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS;
 	struct nfs4_openowner *oo = openowner(stp->st_stateowner);
@@ -6205,7 +6226,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
 	memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
 
 	if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) {
-		if (!nfs4_delegation_stat(dp, currentfh, &stat)) {
+		if (!nfsd4_add_rdaccess_to_wrdeleg(rqstp, open, fh, stp) ||
+				!nfs4_delegation_stat(dp, currentfh, &stat)) {
 			nfs4_put_stid(&dp->dl_stid);
 			destroy_delegation(dp);
 			goto out_no_deleg;
@@ -6361,7 +6383,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	* Attempt to hand out a delegation. No error return, because the
 	* OPEN succeeds even if we fail.
 	*/
-	nfs4_open_delegation(open, stp, &resp->cstate.current_fh);
+	nfs4_open_delegation(rqstp, open, stp,
+		&resp->cstate.current_fh, current_fh);
 
 	/*
 	 * If there is an existing open stateid, it must be updated and
@@ -7063,7 +7086,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		return_revoked = true;
 	if (typemask & SC_TYPE_DELEG)
 		/* Always allow REVOKED for DELEG so we can
-		 * retturn the appropriate error.
+		 * return the appropriate error.
 		 */
 		statusmask |= SC_STATUS_REVOKED;
 
@@ -7106,10 +7129,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
 
 	switch (s->sc_type) {
 	case SC_TYPE_DELEG:
-		spin_lock(&s->sc_file->fi_lock);
-		ret = nfsd_file_get(s->sc_file->fi_deleg_file);
-		spin_unlock(&s->sc_file->fi_lock);
-		break;
 	case SC_TYPE_OPEN:
 	case SC_TYPE_LOCK:
 		if (flags & RD_STATE)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-05-13 16:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-06 19:31 [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only Dai Ngo
2025-03-06 19:31 ` [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE Dai Ngo
2025-05-09 19:32   ` Jeff Layton
2025-05-09 21:48     ` Chuck Lever
2025-05-10  0:10       ` Chuck Lever
2025-05-11 21:59         ` Sagi Grimberg
2025-05-10 19:23     ` Dai Ngo
2025-05-11 21:13       ` Dai Ngo
2025-05-11 23:46         ` Chuck Lever
2025-03-07 14:13 ` [PATCH V6 0/1] NFSD: offer write delegation for OPEN with OPEN4_SHARE_ACCESS only cel
  -- strict thread matches above, loose matches on Subject: below --
2025-05-13 16:08 Dai Ngo
2025-05-13 16:08 ` [PATCH V6 1/1] NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE Dai Ngo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox