* [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
@ 2025-12-01 22:09 Chuck Lever
2025-12-02 12:08 ` Jeff Layton
0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-12-01 22:09 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
the file already has a READ open from a previous OPEN operation,
this overwrites the existing pointer without releasing its reference,
orphaning the previous reference.
Additionally, the function originally stored the same nfsd_file
pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
only a single reference. When put_deleg_file() runs, it clears
fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
the fi_access[O_RDONLY] counter drops to zero. If another READ open
exists on the file, the counter remains elevated and the nfsd_file
reference from the delegation is never released. This potentially
causes open conflicts on that file.
But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
to encounter files with an elevated reference count that cannot be
cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
because there are still nfsd_file objects allocated in that cache.
Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
X-Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4state.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 35004568d43e..11877b96dc4c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
if (nf)
nfsd_file_put(nf);
- if (rnf)
+ if (rnf) {
+ nfsd_file_put(rnf);
nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
+ }
}
static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
@@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
fp = stp->st_stid.sc_file;
spin_lock(&fp->fi_lock);
__nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
- fp = stp->st_stid.sc_file;
- fp->fi_fds[O_RDONLY] = nf;
- fp->fi_rdeleg_file = nf;
+ if (!fp->fi_fds[O_RDONLY]) {
+ fp->fi_fds[O_RDONLY] = nf;
+ nf = NULL;
+ }
+ fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
spin_unlock(&fp->fi_lock);
+ if (nf)
+ nfsd_file_put(nf);
}
return true;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
2025-12-01 22:09 [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg() Chuck Lever
@ 2025-12-02 12:08 ` Jeff Layton
2025-12-02 13:34 ` Chuck Lever
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2025-12-02 12:08 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
> fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
> the file already has a READ open from a previous OPEN operation,
> this overwrites the existing pointer without releasing its reference,
> orphaning the previous reference.
>
> Additionally, the function originally stored the same nfsd_file
> pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
> only a single reference. When put_deleg_file() runs, it clears
> fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
>
> However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
> the fi_access[O_RDONLY] counter drops to zero. If another READ open
> exists on the file, the counter remains elevated and the nfsd_file
> reference from the delegation is never released. This potentially
> causes open conflicts on that file.
>
> But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
> to encounter files with an elevated reference count that cannot be
> cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
> because there are still nfsd_file objects allocated in that cache.
>
> Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
> X-Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4state.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 35004568d43e..11877b96dc4c 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
>
> if (nf)
> nfsd_file_put(nf);
> - if (rnf)
> + if (rnf) {
> + nfsd_file_put(rnf);
> nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
> + }
> }
>
> static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
> @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
> fp = stp->st_stid.sc_file;
> spin_lock(&fp->fi_lock);
> __nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
> - fp = stp->st_stid.sc_file;
Weird. Just noticed the double assignment here.
> - fp->fi_fds[O_RDONLY] = nf;
> - fp->fi_rdeleg_file = nf;
> + if (!fp->fi_fds[O_RDONLY]) {
> + fp->fi_fds[O_RDONLY] = nf;
> + nf = NULL;
> + }
> + fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
> spin_unlock(&fp->fi_lock);
> + if (nf)
> + nfsd_file_put(nf);
> }
> return true;
> }
I do so wish this refcounting were easier to get right, but I don't
have any great ideas around it yet.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
2025-12-02 12:08 ` Jeff Layton
@ 2025-12-02 13:34 ` Chuck Lever
2025-12-02 14:53 ` Jeff Layton
0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2025-12-02 13:34 UTC (permalink / raw)
To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On 12/2/25 7:08 AM, Jeff Layton wrote:
> On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
>> fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
>> the file already has a READ open from a previous OPEN operation,
>> this overwrites the existing pointer without releasing its reference,
>> orphaning the previous reference.
>>
>> Additionally, the function originally stored the same nfsd_file
>> pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
>> only a single reference. When put_deleg_file() runs, it clears
>> fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
>>
>> However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
>> the fi_access[O_RDONLY] counter drops to zero. If another READ open
>> exists on the file, the counter remains elevated and the nfsd_file
>> reference from the delegation is never released. This potentially
>> causes open conflicts on that file.
>>
>> But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
>> to encounter files with an elevated reference count that cannot be
>> cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
>> because there are still nfsd_file objects allocated in that cache.
>>
>> Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
>> X-Cc: stable@vger.kernel.org
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/nfs4state.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 35004568d43e..11877b96dc4c 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
>>
>> if (nf)
>> nfsd_file_put(nf);
>> - if (rnf)
>> + if (rnf) {
>> + nfsd_file_put(rnf);
>> nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
>> + }
>> }
>>
>> static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
>> @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>> fp = stp->st_stid.sc_file;
>> spin_lock(&fp->fi_lock);
>> __nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>> - fp = stp->st_stid.sc_file;
>
> Weird. Just noticed the double assignment here.
>
>> - fp->fi_fds[O_RDONLY] = nf;
>> - fp->fi_rdeleg_file = nf;
>> + if (!fp->fi_fds[O_RDONLY]) {
>> + fp->fi_fds[O_RDONLY] = nf;
>> + nf = NULL;
>> + }
>> + fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
>> spin_unlock(&fp->fi_lock);
>> + if (nf)
>> + nfsd_file_put(nf);
>> }
>> return true;
>> }
>
> I do so wish this refcounting were easier to get right, but I don't
> have any great ideas around it yet.
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Thanks for the R-b. Chris's review-prompts do generic navigation of
reference counting, so we have a little more of a back-stop now. I ran
the review-prompts against e7a8ebc305f2 on a lark, and they indeed found
this problem.
--
Chuck Lever
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
2025-12-02 13:34 ` Chuck Lever
@ 2025-12-02 14:53 ` Jeff Layton
2025-12-02 19:13 ` Chuck Lever
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2025-12-02 14:53 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On Tue, 2025-12-02 at 08:34 -0500, Chuck Lever wrote:
> On 12/2/25 7:08 AM, Jeff Layton wrote:
> > On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > >
> > > nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
> > > fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
> > > the file already has a READ open from a previous OPEN operation,
> > > this overwrites the existing pointer without releasing its reference,
> > > orphaning the previous reference.
> > >
> > > Additionally, the function originally stored the same nfsd_file
> > > pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
> > > only a single reference. When put_deleg_file() runs, it clears
> > > fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
> > >
> > > However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
> > > the fi_access[O_RDONLY] counter drops to zero. If another READ open
> > > exists on the file, the counter remains elevated and the nfsd_file
> > > reference from the delegation is never released. This potentially
> > > causes open conflicts on that file.
> > >
> > > But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
> > > to encounter files with an elevated reference count that cannot be
> > > cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
> > > because there are still nfsd_file objects allocated in that cache.
> > >
> > > Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
> > > X-Cc: stable@vger.kernel.org
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > > fs/nfsd/nfs4state.c | 14 ++++++++++----
> > > 1 file changed, 10 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 35004568d43e..11877b96dc4c 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
> > >
> > > if (nf)
> > > nfsd_file_put(nf);
> > > - if (rnf)
> > > + if (rnf) {
> > > + nfsd_file_put(rnf);
> > > nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
> > > + }
> > > }
> > >
> > > static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
> > > @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
> > > fp = stp->st_stid.sc_file;
> > > spin_lock(&fp->fi_lock);
> > > __nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
> > > - fp = stp->st_stid.sc_file;
> >
> > Weird. Just noticed the double assignment here.
> >
> > > - fp->fi_fds[O_RDONLY] = nf;
> > > - fp->fi_rdeleg_file = nf;
> > > + if (!fp->fi_fds[O_RDONLY]) {
> > > + fp->fi_fds[O_RDONLY] = nf;
> > > + nf = NULL;
> > > + }
> > > + fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
> > > spin_unlock(&fp->fi_lock);
> > > + if (nf)
> > > + nfsd_file_put(nf);
> > > }
> > > return true;
> > > }
> >
> > I do so wish this refcounting were easier to get right, but I don't
> > have any great ideas around it yet.
> >
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
>
> Thanks for the R-b. Chris's review-prompts do generic navigation of
> reference counting, so we have a little more of a back-stop now. I ran
> the review-prompts against e7a8ebc305f2 on a lark, and they indeed found
> this problem.
>
Now that I look again, I'm wondering -- is this problem possible?
nfsd4_add_rdaccess_to_wrdeleg() is called after we have set a write
lease on the file. There should be no other open files at that point,
so fi_fds[O_RDONLY] must be NULL already.
Instead of or in addition to doing this, maybe we should be doing a
WARN_ON_ONCE() if fp->fi_fds[O_RDONLY] is non-NULL here?
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
2025-12-02 14:53 ` Jeff Layton
@ 2025-12-02 19:13 ` Chuck Lever
0 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2025-12-02 19:13 UTC (permalink / raw)
To: Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On Tue, Dec 2, 2025, at 9:53 AM, Jeff Layton wrote:
> On Tue, 2025-12-02 at 08:34 -0500, Chuck Lever wrote:
>> On 12/2/25 7:08 AM, Jeff Layton wrote:
>> > On Mon, 2025-12-01 at 17:09 -0500, Chuck Lever wrote:
>> > > From: Chuck Lever <chuck.lever@oracle.com>
>> > >
>> > > nfsd4_add_rdaccess_to_wrdeleg() unconditionally overwrites
>> > > fp->fi_fds[O_RDONLY] with a newly acquired nfsd_file. However, if
>> > > the file already has a READ open from a previous OPEN operation,
>> > > this overwrites the existing pointer without releasing its reference,
>> > > orphaning the previous reference.
>> > >
>> > > Additionally, the function originally stored the same nfsd_file
>> > > pointer in both fp->fi_fds[O_RDONLY] and fp->fi_rdeleg_file with
>> > > only a single reference. When put_deleg_file() runs, it clears
>> > > fi_rdeleg_file and calls nfs4_file_put_access() to release the file.
>> > >
>> > > However, nfs4_file_put_access() only releases fi_fds[O_RDONLY] when
>> > > the fi_access[O_RDONLY] counter drops to zero. If another READ open
>> > > exists on the file, the counter remains elevated and the nfsd_file
>> > > reference from the delegation is never released. This potentially
>> > > causes open conflicts on that file.
>> > >
>> > > But, on server shutdown, these leaks cause __nfsd_file_cache_purge()
>> > > to encounter files with an elevated reference count that cannot be
>> > > cleaned up, ultimately triggering a BUG() in kmem_cache_destroy()
>> > > because there are still nfsd_file objects allocated in that cache.
>> > >
>> > > Fixes: e7a8ebc305f2 ("NFSD: Offer write deleg for OPEN4_SHARE_ACCESS_WRITE")
>> > > X-Cc: stable@vger.kernel.org
>> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> > > ---
>> > > fs/nfsd/nfs4state.c | 14 ++++++++++----
>> > > 1 file changed, 10 insertions(+), 4 deletions(-)
>> > >
>> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> > > index 35004568d43e..11877b96dc4c 100644
>> > > --- a/fs/nfsd/nfs4state.c
>> > > +++ b/fs/nfsd/nfs4state.c
>> > > @@ -1218,8 +1218,10 @@ static void put_deleg_file(struct nfs4_file *fp)
>> > >
>> > > if (nf)
>> > > nfsd_file_put(nf);
>> > > - if (rnf)
>> > > + if (rnf) {
>> > > + nfsd_file_put(rnf);
>> > > nfs4_file_put_access(fp, NFS4_SHARE_ACCESS_READ);
>> > > + }
>> > > }
>> > >
>> > > static void nfsd4_finalize_deleg_timestamps(struct nfs4_delegation *dp, struct file *f)
>> > > @@ -6231,10 +6233,14 @@ nfsd4_add_rdaccess_to_wrdeleg(struct svc_rqst *rqstp, struct nfsd4_open *open,
>> > > fp = stp->st_stid.sc_file;
>> > > spin_lock(&fp->fi_lock);
>> > > __nfs4_file_get_access(fp, NFS4_SHARE_ACCESS_READ);
>> > > - fp = stp->st_stid.sc_file;
>> >
>> > Weird. Just noticed the double assignment here.
>> >
>> > > - fp->fi_fds[O_RDONLY] = nf;
>> > > - fp->fi_rdeleg_file = nf;
>> > > + if (!fp->fi_fds[O_RDONLY]) {
>> > > + fp->fi_fds[O_RDONLY] = nf;
>> > > + nf = NULL;
>> > > + }
>> > > + fp->fi_rdeleg_file = nfsd_file_get(fp->fi_fds[O_RDONLY]);
>> > > spin_unlock(&fp->fi_lock);
>> > > + if (nf)
>> > > + nfsd_file_put(nf);
>> > > }
>> > > return true;
>> > > }
>> >
>> > I do so wish this refcounting were easier to get right, but I don't
>> > have any great ideas around it yet.
>> >
>> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>
>> Thanks for the R-b. Chris's review-prompts do generic navigation of
>> reference counting, so we have a little more of a back-stop now. I ran
>> the review-prompts against e7a8ebc305f2 on a lark, and they indeed found
>> this problem.
>>
>
> Now that I look again, I'm wondering -- is this problem possible?
It is. The server does crash during shutdown as described above,
and this patch makes the crashes stop.
> nfsd4_add_rdaccess_to_wrdeleg() is called after we have set a write
> lease on the file. There should be no other open files at that point,
> so fi_fds[O_RDONLY] must be NULL already.
>
> Instead of or in addition to doing this, maybe we should be doing a
> WARN_ON_ONCE() if fp->fi_fds[O_RDONLY] is non-NULL here?
A WARN_ON(1) will help me nail down exactly what part of xfstests is
triggering the problem. Stay tuned.
--
Chuck Lever
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-12-02 19:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-01 22:09 [PATCH v1] nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg() Chuck Lever
2025-12-02 12:08 ` Jeff Layton
2025-12-02 13:34 ` Chuck Lever
2025-12-02 14:53 ` Jeff Layton
2025-12-02 19:13 ` Chuck Lever
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox