Linux NFS development
 help / color / mirror / Atom feed
* [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
@ 2025-12-01 22:09 Chuck Lever
  2025-12-02 12:08 ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-12-01 22:09 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
the file already has a READ open from a previous OPEN operation,
this overwrites the existing pointer without releasing its reference,
orphaning the previous reference.

Additionally, the function originally stored the same nfsd_file
pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
only a single reference. When put_deleg_file() runs, it clears
fi_rdeleg_file and calls nfs4_file_put_access() to release the file.

However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
the fi_access[O_RDONLY] counter drops to zero. If another READ open
exists on the file, the counter remains elevated and the nfsd_file
reference from the delegation is never released. This potentially
causes open conflicts on that file.

But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
to encounter files with an elevated reference count that cannot be
cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
because there are still nfsd_file objects allocated in that cache.

Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
X-Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4state.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 35004568d43e..11877b96dc4c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
 
 	if (nf)
 		nfsd_file_put(nf);
-	if (rnf)
+	if (rnf) {
+		nfsd_file_put(rnf);
 		nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
+	}
 }
 
 static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
@@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
 		fp = stp->st_stid.sc_file;
 		spin_lock(&fp->fi_lock);
 		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
-		fp = stp->st_stid.sc_file;
-		fp->fi_fds[O_RDONLY] = nf;
-		fp->fi_rdeleg_file = nf;
+		if (!fp->fi_fds[O_RDONLY]) {
+			fp->fi_fds[O_RDONLY] = nf;
+			nf = NULL;
+		}
+		fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
 		spin_unlock(&fp->fi_lock);
+		if (nf)
+			nfsd_file_put(nf);
 	}
 	return true;
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
  2025-12-01 22:09 [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg() Chuck Lever
@ 2025-12-02 12:08 ` Jeff Layton
  2025-12-02 13:34   ` Chuck Lever
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2025-12-02 12:08 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
> fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
> the file already has a READ open from a previous OPEN operation,
> this overwrites the existing pointer without releasing its reference,
> orphaning the previous reference.
> 
> Additionally, the function originally stored the same nfsd_file
> pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
> only a single reference. When put_deleg_file() runs, it clears
> fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
> 
> However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
> the fi_access[O_RDONLY] counter drops to zero. If another READ open
> exists on the file, the counter remains elevated and the nfsd_file
> reference from the delegation is never released. This potentially
> causes open conflicts on that file.
> 
> But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
> to encounter files with an elevated reference count that cannot be
> cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
> because there are still nfsd_file objects allocated in that cache.
> 
> Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
> X-Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4state.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 35004568d43e..11877b96dc4c 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
>  
>  	if (nf)
>  		nfsd_file_put(nf);
> -	if (rnf)
> +	if (rnf) {
> +		nfsd_file_put(rnf);
>  		nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
> +	}
>  }
>  
>  static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
> @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>  		fp = stp->st_stid.sc_file;
>  		spin_lock(&fp->fi_lock);
>  		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
> -		fp = stp->st_stid.sc_file;

Weird. Just noticed the double assignment here.

> -		fp->fi_fds[O_RDONLY] = nf;
> -		fp->fi_rdeleg_file = nf;
> +		if (!fp->fi_fds[O_RDONLY]) {
> +			fp->fi_fds[O_RDONLY] = nf;
> +			nf = NULL;
> +		}
> +		fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
>  		spin_unlock(&fp->fi_lock);
> +		if (nf)
> +			nfsd_file_put(nf);
>  	}
>  	return true;
>  }

I do so wish this refcounting were easier to get right, but I don't
have any great ideas around it yet.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
  2025-12-02 12:08 ` Jeff Layton
@ 2025-12-02 13:34   ` Chuck Lever
  2025-12-02 14:53     ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-12-02 13:34 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On 12/2/25 7:08 AM, Jeff Layton wrote:
> On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
>> fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
>> the file already has a READ open from a previous OPEN operation,
>> this overwrites the existing pointer without releasing its reference,
>> orphaning the previous reference.
>>
>> Additionally, the function originally stored the same nfsd_file
>> pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
>> only a single reference. When put_deleg_file() runs, it clears
>> fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
>>
>> However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
>> the fi_access[O_RDONLY] counter drops to zero. If another READ open
>> exists on the file, the counter remains elevated and the nfsd_file
>> reference from the delegation is never released. This potentially
>> causes open conflicts on that file.
>>
>> But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
>> to encounter files with an elevated reference count that cannot be
>> cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
>> because there are still nfsd_file objects allocated in that cache.
>>
>> Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
>> X-Cc: stable@vger.kernel.org
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  fs/nfsd/nfs4state.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 35004568d43e..11877b96dc4c 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
>>  
>>  	if (nf)
>>  		nfsd_file_put(nf);
>> -	if (rnf)
>> +	if (rnf) {
>> +		nfsd_file_put(rnf);
>>  		nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
>> +	}
>>  }
>>  
>>  static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
>> @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>>  		fp = stp->st_stid.sc_file;
>>  		spin_lock(&fp->fi_lock);
>>  		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>> -		fp = stp->st_stid.sc_file;
> 
> Weird. Just noticed the double assignment here.
> 
>> -		fp->fi_fds[O_RDONLY] = nf;
>> -		fp->fi_rdeleg_file = nf;
>> +		if (!fp->fi_fds[O_RDONLY]) {
>> +			fp->fi_fds[O_RDONLY] = nf;
>> +			nf = NULL;
>> +		}
>> +		fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
>>  		spin_unlock(&fp->fi_lock);
>> +		if (nf)
>> +			nfsd_file_put(nf);
>>  	}
>>  	return true;
>>  }
> 
> I do so wish this refcounting were easier to get right, but I don't
> have any great ideas around it yet.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>

Thanks for the R-b. Chris's review-prompts do generic navigation of
reference counting, so we have a little more of a back-stop now. I ran
the review-prompts against e7a8ebc305f2 on a lark, and they indeed found
this problem.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
  2025-12-02 13:34   ` Chuck Lever
@ 2025-12-02 14:53     ` Jeff Layton
  2025-12-02 19:13       ` Chuck Lever
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2025-12-02 14:53 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever

On Tue, 2025-12-02 at 08:34 -0500, Chuck Lever wrote:
> On 12/2/25 7:08 AM, Jeff Layton wrote:
> > On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > > 
> > > nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
> > > fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
> > > the file already has a READ open from a previous OPEN operation,
> > > this overwrites the existing pointer without releasing its reference,
> > > orphaning the previous reference.
> > > 
> > > Additionally, the function originally stored the same nfsd_file
> > > pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
> > > only a single reference. When put_deleg_file() runs, it clears
> > > fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
> > > 
> > > However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
> > > the fi_access[O_RDONLY] counter drops to zero. If another READ open
> > > exists on the file, the counter remains elevated and the nfsd_file
> > > reference from the delegation is never released. This potentially
> > > causes open conflicts on that file.
> > > 
> > > But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
> > > to encounter files with an elevated reference count that cannot be
> > > cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
> > > because there are still nfsd_file objects allocated in that cache.
> > > 
> > > Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
> > > X-Cc: stable@vger.kernel.org
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > >  fs/nfsd/nfs4state.c | 14 ++++++++++----
> > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 35004568d43e..11877b96dc4c 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
> > >  
> > >  	if (nf)
> > >  		nfsd_file_put(nf);
> > > -	if (rnf)
> > > +	if (rnf) {
> > > +		nfsd_file_put(rnf);
> > >  		nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
> > > +	}
> > >  }
> > >  
> > >  static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
> > > @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
> > >  		fp = stp->st_stid.sc_file;
> > >  		spin_lock(&fp->fi_lock);
> > >  		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
> > > -		fp = stp->st_stid.sc_file;
> > 
> > Weird. Just noticed the double assignment here.
> > 
> > > -		fp->fi_fds[O_RDONLY] = nf;
> > > -		fp->fi_rdeleg_file = nf;
> > > +		if (!fp->fi_fds[O_RDONLY]) {
> > > +			fp->fi_fds[O_RDONLY] = nf;
> > > +			nf = NULL;
> > > +		}
> > > +		fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
> > >  		spin_unlock(&fp->fi_lock);
> > > +		if (nf)
> > > +			nfsd_file_put(nf);
> > >  	}
> > >  	return true;
> > >  }
> > 
> > I do so wish this refcounting were easier to get right, but I don't
> > have any great ideas around it yet.
> > 
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 
> Thanks for the R-b. Chris's review-prompts do generic navigation of
> reference counting, so we have a little more of a back-stop now. I ran
> the review-prompts against e7a8ebc305f2 on a lark, and they indeed found
> this problem.
> 

Now that I look again, I'm wondering -- is this problem possible?

nfsd4_add_rdaccess_to_wrdeleg() is called after we have set a write
lease on the file. There should be no other open files at that point,
so fi_fds[O_RDONLY] must be NULL already.

Instead of or in addition to doing this, maybe we should be doing a
WARN_ON_ONCE() if fp->fi_fds[O_RDONLY] is non-NULL here?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
  2025-12-02 14:53     ` Jeff Layton
@ 2025-12-02 19:13       ` Chuck Lever
  0 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2025-12-02 19:13 UTC (permalink / raw)
  To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, Chuck Lever



On Tue, Dec 2, 2025, at 9:53 AM, Jeff Layton wrote:
> On Tue, 2025-12-02 at 08:34 -0500, Chuck Lever wrote:
>> On 12/2/25 7:08 AM, Jeff Layton wrote:
>> > On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
>> > > From: Chuck Lever <chuck.lever@oracle.com>
>> > > 
>> > > nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
>> > > fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
>> > > the file already has a READ open from a previous OPEN operation,
>> > > this overwrites the existing pointer without releasing its reference,
>> > > orphaning the previous reference.
>> > > 
>> > > Additionally, the function originally stored the same nfsd_file
>> > > pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
>> > > only a single reference. When put_deleg_file() runs, it clears
>> > > fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
>> > > 
>> > > However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
>> > > the fi_access[O_RDONLY] counter drops to zero. If another READ open
>> > > exists on the file, the counter remains elevated and the nfsd_file
>> > > reference from the delegation is never released. This potentially
>> > > causes open conflicts on that file.
>> > > 
>> > > But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
>> > > to encounter files with an elevated reference count that cannot be
>> > > cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
>> > > because there are still nfsd_file objects allocated in that cache.
>> > > 
>> > > Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
>> > > X-Cc: stable@vger.kernel.org
>> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> > > ---
>> > >  fs/nfsd/nfs4state.c | 14 ++++++++++----
>> > >  1 file changed, 10 insertions(+), 4 deletions(-)
>> > > 
>> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> > > index 35004568d43e..11877b96dc4c 100644
>> > > --- a/fs/nfsd/nfs4state.c
>> > > +++ b/fs/nfsd/nfs4state.c
>> > > @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
>> > >  
>> > >  	if (nf)
>> > >  		nfsd_file_put(nf);
>> > > -	if (rnf)
>> > > +	if (rnf) {
>> > > +		nfsd_file_put(rnf);
>> > >  		nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
>> > > +	}
>> > >  }
>> > >  
>> > >  static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
>> > > @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>> > >  		fp = stp->st_stid.sc_file;
>> > >  		spin_lock(&fp->fi_lock);
>> > >  		__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>> > > -		fp = stp->st_stid.sc_file;
>> > 
>> > Weird. Just noticed the double assignment here.
>> > 
>> > > -		fp->fi_fds[O_RDONLY] = nf;
>> > > -		fp->fi_rdeleg_file = nf;
>> > > +		if (!fp->fi_fds[O_RDONLY]) {
>> > > +			fp->fi_fds[O_RDONLY] = nf;
>> > > +			nf = NULL;
>> > > +		}
>> > > +		fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
>> > >  		spin_unlock(&fp->fi_lock);
>> > > +		if (nf)
>> > > +			nfsd_file_put(nf);
>> > >  	}
>> > >  	return true;
>> > >  }
>> > 
>> > I do so wish this refcounting were easier to get right, but I don't
>> > have any great ideas around it yet.
>> > 
>> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
>> 
>> Thanks for the R-b. Chris's review-prompts do generic navigation of
>> reference counting, so we have a little more of a back-stop now. I ran
>> the review-prompts against e7a8ebc305f2 on a lark, and they indeed found
>> this problem.
>> 
>
> Now that I look again, I'm wondering -- is this problem possible?

It is. The server does crash during shutdown as described above,
and this patch makes the crashes stop.


> nfsd4_add_rdaccess_to_wrdeleg() is called after we have set a write
> lease on the file. There should be no other open files at that point,
> so fi_fds[O_RDONLY] must be NULL already.
>
> Instead of or in addition to doing this, maybe we should be doing a
> WARN_ON_ONCE() if fp->fi_fds[O_RDONLY] is non-NULL here?

A WARN_ON(1) will help me nail down exactly what part of xfstests is
triggering the problem. Stay tuned.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-12-02 19:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-01 22:09 [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg() Chuck Lever
2025-12-02 12:08 ` Jeff Layton
2025-12-02 13:34   ` Chuck Lever
2025-12-02 14:53     ` Jeff Layton
2025-12-02 19:13       ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox