* [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
@ 2025-09-30 14:05 Chuck Lever
2025-09-30 14:45 ` Jeff Layton
2025-10-01 0:53 ` NeilBrown
0 siblings, 2 replies; 9+ messages in thread
From: Chuck Lever @ 2025-09-30 14:05 UTC (permalink / raw)
To: NeilBrown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
When tracing is enabled, the trace_nfsd_read_done trace point
crashes during the pynfs read.testNoFh test.
Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4proc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index e466cf52d7d7..f9aeefc0da73 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
static void
nfsd4_read_release(union nfsd4_op_u *u)
{
- if (u->read.rd_nf)
+ if (u->read.rd_nf) {
+ trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
+ u->read.rd_offset, u->read.rd_length);
nfsd_file_put(u->read.rd_nf);
- trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
- u->read.rd_offset, u->read.rd_length);
+ }
}
static __be32
--
2.51.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-09-30 14:05 [PATCH v1] NFSD: Fix crash in nfsd4_read_release() Chuck Lever
@ 2025-09-30 14:45 ` Jeff Layton
2025-10-01 0:53 ` NeilBrown
1 sibling, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2025-09-30 14:45 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Chuck Lever
On Tue, 2025-09-30 at 10:05 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> When tracing is enabled, the trace_nfsd_read_done trace point
> crashes during the pynfs read.testNoFh test.
>
> Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4proc.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index e466cf52d7d7..f9aeefc0da73 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> static void
> nfsd4_read_release(union nfsd4_op_u *u)
> {
> - if (u->read.rd_nf)
> + if (u->read.rd_nf) {
> + trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> + u->read.rd_offset, u->read.rd_length);
> nfsd_file_put(u->read.rd_nf);
> - trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> - u->read.rd_offset, u->read.rd_length);
> + }
> }
>
> static __be32
Not sending a filehandle is pretty rare so I guess we won't miss the
lack of a tracepoint in that case.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-09-30 14:05 [PATCH v1] NFSD: Fix crash in nfsd4_read_release() Chuck Lever
2025-09-30 14:45 ` Jeff Layton
@ 2025-10-01 0:53 ` NeilBrown
2025-10-01 13:16 ` Chuck Lever
1 sibling, 1 reply; 9+ messages in thread
From: NeilBrown @ 2025-10-01 0:53 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
On Wed, 01 Oct 2025, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> When tracing is enabled, the trace_nfsd_read_done trace point
> crashes during the pynfs read.testNoFh test.
>
> Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4proc.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index e466cf52d7d7..f9aeefc0da73 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> static void
> nfsd4_read_release(union nfsd4_op_u *u)
> {
> - if (u->read.rd_nf)
> + if (u->read.rd_nf) {
> + trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> + u->read.rd_offset, u->read.rd_length);
> nfsd_file_put(u->read.rd_nf);
> - trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> - u->read.rd_offset, u->read.rd_length);
> + }
> }
I must say this looks a bit weird. rd_nf isn't used in the trace but if
it isn't set, you say the trace crashes...
That is because rd_fhp being NULL (because there is no current_fh) is
one thing that results in rd_nf being NULL. Seems a bit indirect.
Did you consider
if (u->read.rd_fhp)
trace .....
if (u->read.rd_nf)
nfsd_file_put....
??
Or maybe
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -444,7 +444,7 @@ DECLARE_EVENT_CLASS(nfsd_io_class,
),
TP_fast_assign(
__entry->xid = be32_to_cpu(rqstp->rq_xid);
- __entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
+ __entry->fh_hash = fhp ? 0 : knfsd_fh_hash(&fhp->fh_handle);
__entry->offset = offset;
__entry->len = len;
),
That would make all IO event trace points robust.
I'm almost tempted to suggest knfsd_fh_hash() be changed to receive a
svc_fh and test for NULL, but there are 2 places where there isn't a
svc_fh handy.
NeilBrown
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-10-01 0:53 ` NeilBrown
@ 2025-10-01 13:16 ` Chuck Lever
2025-10-03 7:08 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2025-10-01 13:16 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
On 9/30/25 8:53 PM, NeilBrown wrote:
> On Wed, 01 Oct 2025, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> When tracing is enabled, the trace_nfsd_read_done trace point
>> crashes during the pynfs read.testNoFh test.
>>
>> Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/nfs4proc.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index e466cf52d7d7..f9aeefc0da73 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> static void
>> nfsd4_read_release(union nfsd4_op_u *u)
>> {
>> - if (u->read.rd_nf)
>> + if (u->read.rd_nf) {
>> + trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
>> + u->read.rd_offset, u->read.rd_length);
>> nfsd_file_put(u->read.rd_nf);
>> - trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
>> - u->read.rd_offset, u->read.rd_length);
>> + }
>> }
>
> I must say this looks a bit weird. rd_nf isn't used in the trace but if
> it isn't set, you say the trace crashes...
>
> That is because rd_fhp being NULL (because there is no current_fh) is
> one thing that results in rd_nf being NULL. Seems a bit indirect.
When rd_nf is NULL, no file or file handle is present. That's really all
there is to it. You can read it as "when rd_nf is NULL, no processing is
needed".
The other read trace points are already skipped in this case as well, so
there is no need to protect them.
> Did you consider
> if (u->read.rd_fhp)
> trace .....
> if (u->read.rd_nf)
> nfsd_file_put....
>
> ??
I can't think of a legitimate case where fhp is not NULL but nf is
NULL. Or vice versa.
> Or maybe
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -444,7 +444,7 @@ DECLARE_EVENT_CLASS(nfsd_io_class,
> ),
> TP_fast_assign(
> __entry->xid = be32_to_cpu(rqstp->rq_xid);
> - __entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
> + __entry->fh_hash = fhp ? 0 : knfsd_fh_hash(&fhp->fh_handle);
> __entry->offset = offset;
> __entry->len = len;
> ),
>
>
> That would make all IO event trace points robust.
> I'm almost tempted to suggest knfsd_fh_hash() be changed to receive a
> svc_fh and test for NULL, but there are 2 places where there isn't a
> svc_fh handy.
>
> NeilBrown
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-10-01 13:16 ` Chuck Lever
@ 2025-10-03 7:08 ` NeilBrown
2025-10-03 13:57 ` Chuck Lever
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2025-10-03 7:08 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
On Wed, 01 Oct 2025, Chuck Lever wrote:
> On 9/30/25 8:53 PM, NeilBrown wrote:
> > On Wed, 01 Oct 2025, Chuck Lever wrote:
> >> From: Chuck Lever <chuck.lever@oracle.com>
> >>
> >> When tracing is enabled, the trace_nfsd_read_done trace point
> >> crashes during the pynfs read.testNoFh test.
> >>
> >> Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> ---
> >> fs/nfsd/nfs4proc.c | 7 ++++---
> >> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index e466cf52d7d7..f9aeefc0da73 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >> static void
> >> nfsd4_read_release(union nfsd4_op_u *u)
> >> {
> >> - if (u->read.rd_nf)
> >> + if (u->read.rd_nf) {
> >> + trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> >> + u->read.rd_offset, u->read.rd_length);
> >> nfsd_file_put(u->read.rd_nf);
> >> - trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> >> - u->read.rd_offset, u->read.rd_length);
> >> + }
> >> }
> >
> > I must say this looks a bit weird. rd_nf isn't used in the trace but if
> > it isn't set, you say the trace crashes...
> >
> > That is because rd_fhp being NULL (because there is no current_fh) is
> > one thing that results in rd_nf being NULL. Seems a bit indirect.
>
> When rd_nf is NULL, no file or file handle is present. That's really all
> there is to it. You can read it as "when rd_nf is NULL, no processing is
> needed".
>
> The other read trace points are already skipped in this case as well, so
> there is no need to protect them.
They are skipped because ALLOWED_WITHOUT_FH is not set so ->op_func
isn't called. But ->op_release *is* called. That seems inconsistent.
I wonder if we could move the ->op_release call out of
nfsd4_encode_operation() and only call it in cases were ->op_func
was called.
Something like the following? It isn't as elegant as I would have
liked, but I think it is better than hiding the ->op_release cxall in
nfsd4_encode_replay().
Thanks,
NeilBrown
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 71b428efcbb5..4ed823d1e6be 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2837,6 +2837,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
trace_nfsd_compound(rqstp, args->tag, args->taglen, args->opcnt);
while (!status && resp->opcnt < args->opcnt) {
+ bool called_func = false;
+
op = &args->ops[resp->opcnt++];
/*
@@ -2886,6 +2888,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
if (op->opdesc->op_get_currentstateid)
op->opdesc->op_get_currentstateid(cstate, &op->u);
op->status = op->opdesc->op_func(rqstp, cstate, &op->u);
+ called_func = true;
trace_nfsd_compound_op_err(rqstp, op->opnum, op->status);
/* Only from SEQUENCE */
@@ -2914,6 +2917,9 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
nfsd4_encode_operation(resp, op);
status = op->status;
}
+ if (called_func && op->opdesc->op_release)
+ op->opdesc->op_release(&op->u);
+
trace_nfsd_compound_status(args->opcnt, resp->opcnt,
status, nfsd4_op_name(op->opnum));
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index ea91bad4eee2..e7b4363aadb0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -5885,10 +5885,10 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
nfsd4_enc encoder;
if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
- goto release;
+ goto out;
op_status_offset = xdr->buf->len;
if (!xdr_reserve_space(xdr, XDR_UNIT))
- goto release;
+ goto out;
if (op->opnum == OP_ILLEGAL)
goto status;
@@ -5944,10 +5944,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
resp->cstate.minorversion);
write_bytes_to_xdr_buf(xdr->buf, op_status_offset,
&op->status, XDR_UNIT);
-release:
- if (opdesc && opdesc->op_release)
- opdesc->op_release(&op->u);
-
+out:
/*
* Account for pages consumed while encoding this operation.
* The xdr_stream primitives don't manage rq_next_page.
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-10-03 7:08 ` NeilBrown
@ 2025-10-03 13:57 ` Chuck Lever
2025-10-04 6:34 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2025-10-03 13:57 UTC (permalink / raw)
To: NeilBrown, Jeff Layton
Cc: Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever
On 10/3/25 3:08 AM, NeilBrown wrote:
> On Wed, 01 Oct 2025, Chuck Lever wrote:
>> On 9/30/25 8:53 PM, NeilBrown wrote:
>>> On Wed, 01 Oct 2025, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> When tracing is enabled, the trace_nfsd_read_done trace point
>>>> crashes during the pynfs read.testNoFh test.
>>>>
>>>> Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>> fs/nfsd/nfs4proc.c | 7 ++++---
>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index e466cf52d7d7..f9aeefc0da73 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>> static void
>>>> nfsd4_read_release(union nfsd4_op_u *u)
>>>> {
>>>> - if (u->read.rd_nf)
>>>> + if (u->read.rd_nf) {
>>>> + trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
>>>> + u->read.rd_offset, u->read.rd_length);
>>>> nfsd_file_put(u->read.rd_nf);
>>>> - trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
>>>> - u->read.rd_offset, u->read.rd_length);
>>>> + }
>>>> }
>>>
>>> I must say this looks a bit weird. rd_nf isn't used in the trace but if
>>> it isn't set, you say the trace crashes...
>>>
>>> That is because rd_fhp being NULL (because there is no current_fh) is
>>> one thing that results in rd_nf being NULL. Seems a bit indirect.
>>
>> When rd_nf is NULL, no file or file handle is present. That's really all
>> there is to it. You can read it as "when rd_nf is NULL, no processing is
>> needed".
>>
>> The other read trace points are already skipped in this case as well, so
>> there is no need to protect them.
>
> They are skipped because ALLOWED_WITHOUT_FH is not set so ->op_func
> isn't called. But ->op_release *is* called. That seems inconsistent.
> I wonder if we could move the ->op_release call out of
> nfsd4_encode_operation() and only call it in cases were ->op_func
> was called.
>
> Something like the following? It isn't as elegant as I would have
> liked, but I think it is better than hiding the ->op_release cxall in
> nfsd4_encode_replay().
>
> Thanks,
> NeilBrown
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 71b428efcbb5..4ed823d1e6be 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -2837,6 +2837,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
>
> trace_nfsd_compound(rqstp, args->tag, args->taglen, args->opcnt);
> while (!status && resp->opcnt < args->opcnt) {
> + bool called_func = false;
> +
Or call it "release_needed" ?
This change is not something I'd care to have backported into stable
kernels... So maybe keep the modification of nfsd4_read_release for
backport, and then subsequently do this change too?
I was reminded of a release-related change that Jeff did recently, maybe
commit 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns
an error")... might be relevant, might not.
> op = &args->ops[resp->opcnt++];
>
> /*
> @@ -2886,6 +2888,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
> if (op->opdesc->op_get_currentstateid)
> op->opdesc->op_get_currentstateid(cstate, &op->u);
> op->status = op->opdesc->op_func(rqstp, cstate, &op->u);
> + called_func = true;
> trace_nfsd_compound_op_err(rqstp, op->opnum, op->status);
>
> /* Only from SEQUENCE */
> @@ -2914,6 +2917,9 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
> nfsd4_encode_operation(resp, op);
> status = op->status;
> }
> + if (called_func && op->opdesc->op_release)
> + op->opdesc->op_release(&op->u);
> +
>
> trace_nfsd_compound_status(args->opcnt, resp->opcnt,
> status, nfsd4_op_name(op->opnum));
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index ea91bad4eee2..e7b4363aadb0 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5885,10 +5885,10 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> nfsd4_enc encoder;
>
> if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
> - goto release;
> + goto out;
> op_status_offset = xdr->buf->len;
> if (!xdr_reserve_space(xdr, XDR_UNIT))
> - goto release;
> + goto out;
>
> if (op->opnum == OP_ILLEGAL)
> goto status;
> @@ -5944,10 +5944,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> resp->cstate.minorversion);
> write_bytes_to_xdr_buf(xdr->buf, op_status_offset,
> &op->status, XDR_UNIT);
> -release:
> - if (opdesc && opdesc->op_release)
> - opdesc->op_release(&op->u);
> -
> +out:
> /*
> * Account for pages consumed while encoding this operation.
> * The xdr_stream primitives don't manage rq_next_page.
>
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-10-03 13:57 ` Chuck Lever
@ 2025-10-04 6:34 ` NeilBrown
2025-10-04 15:51 ` Chuck Lever
2025-10-06 16:23 ` Jeff Layton
0 siblings, 2 replies; 9+ messages in thread
From: NeilBrown @ 2025-10-04 6:34 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
On Fri, 03 Oct 2025, Chuck Lever wrote:
> On 10/3/25 3:08 AM, NeilBrown wrote:
> > On Wed, 01 Oct 2025, Chuck Lever wrote:
> >> On 9/30/25 8:53 PM, NeilBrown wrote:
> >>> On Wed, 01 Oct 2025, Chuck Lever wrote:
> >>>> From: Chuck Lever <chuck.lever@oracle.com>
> >>>>
> >>>> When tracing is enabled, the trace_nfsd_read_done trace point
> >>>> crashes during the pynfs read.testNoFh test.
> >>>>
> >>>> Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>> ---
> >>>> fs/nfsd/nfs4proc.c | 7 ++++---
> >>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>> index e466cf52d7d7..f9aeefc0da73 100644
> >>>> --- a/fs/nfsd/nfs4proc.c
> >>>> +++ b/fs/nfsd/nfs4proc.c
> >>>> @@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>> static void
> >>>> nfsd4_read_release(union nfsd4_op_u *u)
> >>>> {
> >>>> - if (u->read.rd_nf)
> >>>> + if (u->read.rd_nf) {
> >>>> + trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> >>>> + u->read.rd_offset, u->read.rd_length);
> >>>> nfsd_file_put(u->read.rd_nf);
> >>>> - trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> >>>> - u->read.rd_offset, u->read.rd_length);
> >>>> + }
> >>>> }
> >>>
> >>> I must say this looks a bit weird. rd_nf isn't used in the trace but if
> >>> it isn't set, you say the trace crashes...
> >>>
> >>> That is because rd_fhp being NULL (because there is no current_fh) is
> >>> one thing that results in rd_nf being NULL. Seems a bit indirect.
> >>
> >> When rd_nf is NULL, no file or file handle is present. That's really all
> >> there is to it. You can read it as "when rd_nf is NULL, no processing is
> >> needed".
> >>
> >> The other read trace points are already skipped in this case as well, so
> >> there is no need to protect them.
> >
> > They are skipped because ALLOWED_WITHOUT_FH is not set so ->op_func
> > isn't called. But ->op_release *is* called. That seems inconsistent.
> > I wonder if we could move the ->op_release call out of
> > nfsd4_encode_operation() and only call it in cases were ->op_func
> > was called.
> >
> > Something like the following? It isn't as elegant as I would have
> > liked, but I think it is better than hiding the ->op_release cxall in
> > nfsd4_encode_replay().
> >
> > Thanks,
> > NeilBrown
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 71b428efcbb5..4ed823d1e6be 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -2837,6 +2837,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
> >
> > trace_nfsd_compound(rqstp, args->tag, args->taglen, args->opcnt);
> > while (!status && resp->opcnt < args->opcnt) {
> > + bool called_func = false;
> > +
>
> Or call it "release_needed" ?
That might be a bit better - yes.
>
> This change is not something I'd care to have backported into stable
> kernels... So maybe keep the modification of nfsd4_read_release for
> backport, and then subsequently do this change too?
That seems like a sensible safe approach.
>
> I was reminded of a release-related change that Jeff did recently, maybe
> commit 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns
> an error")... might be relevant, might not.
Definitely relevant. I think this is the commit to mention in the
Fixes: tag. Prior to this commit nfsd4_read_release wouldn't have been
called if there was no filehandle, so the trace point was perfectly
safe.
I wonder if this fix by Jeff was the best fix for the problem.
An alternative would be to require ->op_func to release any resources
before returning an error..... except that wouldn't work for
OP_NONTRIVIAL_ERROR_ENCODE functions like OP_LOCK as the ->op_release
frees the "deny" info which is needed to encode the reply....
But wait ...nfs4_encode_operation - which currently calls ->op_release -
is also called from nfsd4_enc_sequence_replay(). The status is set to
an error so ->op_release previously would not have been called, but now
it is. Could that mean nfsd4_lock_release() can get called there even
tough nfsd4_lock wasn't called. Could that be a problem? It isn't
clear to me that is *isn't* a problem, should maybe there is a subtlety
that saves it.
I'm beginning to think that we don't want an op_release at all.
Anything allocated in ->op_func should either be freed in ->op_func when
an error is detected, or in the corresponding nfsd4_enc_ops function
after the allocated value is used for encoding.
Looking further...
->op_get_currentstateid is only called immediately before the one time
that ->op_func is called. So ->op_func could do that
get_currentstateid work.
and ->op_set_currentstateid looks like it could be folded in to
->op_func as well.
Does anyone have any thoughts about that?
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-10-04 6:34 ` NeilBrown
@ 2025-10-04 15:51 ` Chuck Lever
2025-10-06 16:23 ` Jeff Layton
1 sibling, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2025-10-04 15:51 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
On 10/4/25 2:34 AM, NeilBrown wrote:
> On Fri, 03 Oct 2025, Chuck Lever wrote:
>> I was reminded of a release-related change that Jeff did recently, maybe
>> commit 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns
>> an error")... might be relevant, might not.
>
> Definitely relevant. I think this is the commit to mention in the
> Fixes: tag. Prior to this commit nfsd4_read_release wouldn't have been
> called if there was no filehandle, so the trace point was perfectly
> safe.
I'll replace the Fixes: tag with that one.
> I wonder if this fix by Jeff was the best fix for the problem.
> An alternative would be to require ->op_func to release any resources
> before returning an error..... except that wouldn't work for
> OP_NONTRIVIAL_ERROR_ENCODE functions like OP_LOCK as the ->op_release
> frees the "deny" info which is needed to encode the reply....
>
> But wait ...nfs4_encode_operation - which currently calls ->op_release -
> is also called from nfsd4_enc_sequence_replay(). The status is set to
> an error so ->op_release previously would not have been called, but now
> it is. Could that mean nfsd4_lock_release() can get called there even
> tough nfsd4_lock wasn't called. Could that be a problem? It isn't
> clear to me that is *isn't* a problem, should maybe there is a subtlety
> that saves it.
One approach to getting experimental data would be to hack together a
pynfs test for this case. (If it proves useful, please contribute it).
> I'm beginning to think that we don't want an op_release at all.
> Anything allocated in ->op_func should either be freed in ->op_func when
> an error is detected, or in the corresponding nfsd4_enc_ops function
> after the allocated value is used for encoding.
>
> Looking further...
> ->op_get_currentstateid is only called immediately before the one time
> that ->op_func is called. So ->op_func could do that
> get_currentstateid work.
> and ->op_set_currentstateid looks like it could be folded in to
> ->op_func as well.
>
> Does anyone have any thoughts about that?
Careful review needed, I think. We are getting ready for a week of
bake-a-thon next week... that might delay any deep thinking on my
part for a few days.
--
Chuck Lever
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] NFSD: Fix crash in nfsd4_read_release()
2025-10-04 6:34 ` NeilBrown
2025-10-04 15:51 ` Chuck Lever
@ 2025-10-06 16:23 ` Jeff Layton
1 sibling, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2025-10-06 16:23 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever
On Sat, 2025-10-04 at 16:34 +1000, NeilBrown wrote:
> On Fri, 03 Oct 2025, Chuck Lever wrote:
> > On 10/3/25 3:08 AM, NeilBrown wrote:
> > > On Wed, 01 Oct 2025, Chuck Lever wrote:
> > > > On 9/30/25 8:53 PM, NeilBrown wrote:
> > > > > On Wed, 01 Oct 2025, Chuck Lever wrote:
> > > > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > > >
> > > > > > When tracing is enabled, the trace_nfsd_read_done trace point
> > > > > > crashes during the pynfs read.testNoFh test.
> > > > > >
> > > > > > Fixes: 87c5942e8fae ("nfsd: Add I/O trace points in the NFSv4 read proc")
> > > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > > > ---
> > > > > > fs/nfsd/nfs4proc.c | 7 ++++---
> > > > > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > index e466cf52d7d7..f9aeefc0da73 100644
> > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > @@ -988,10 +988,11 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > > static void
> > > > > > nfsd4_read_release(union nfsd4_op_u *u)
> > > > > > {
> > > > > > - if (u->read.rd_nf)
> > > > > > + if (u->read.rd_nf) {
> > > > > > + trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> > > > > > + u->read.rd_offset, u->read.rd_length);
> > > > > > nfsd_file_put(u->read.rd_nf);
> > > > > > - trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> > > > > > - u->read.rd_offset, u->read.rd_length);
> > > > > > + }
> > > > > > }
> > > > >
> > > > > I must say this looks a bit weird. rd_nf isn't used in the trace but if
> > > > > it isn't set, you say the trace crashes...
> > > > >
> > > > > That is because rd_fhp being NULL (because there is no current_fh) is
> > > > > one thing that results in rd_nf being NULL. Seems a bit indirect.
> > > >
> > > > When rd_nf is NULL, no file or file handle is present. That's really all
> > > > there is to it. You can read it as "when rd_nf is NULL, no processing is
> > > > needed".
> > > >
> > > > The other read trace points are already skipped in this case as well, so
> > > > there is no need to protect them.
> > >
> > > They are skipped because ALLOWED_WITHOUT_FH is not set so ->op_func
> > > isn't called. But ->op_release *is* called. That seems inconsistent.
> > > I wonder if we could move the ->op_release call out of
> > > nfsd4_encode_operation() and only call it in cases were ->op_func
> > > was called.
> > >
> > > Something like the following? It isn't as elegant as I would have
> > > liked, but I think it is better than hiding the ->op_release cxall in
> > > nfsd4_encode_replay().
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index 71b428efcbb5..4ed823d1e6be 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -2837,6 +2837,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
> > >
> > > trace_nfsd_compound(rqstp, args->tag, args->taglen, args->opcnt);
> > > while (!status && resp->opcnt < args->opcnt) {
> > > + bool called_func = false;
> > > +
> >
> > Or call it "release_needed" ?
>
> That might be a bit better - yes.
>
> >
> > This change is not something I'd care to have backported into stable
> > kernels... So maybe keep the modification of nfsd4_read_release for
> > backport, and then subsequently do this change too?
>
> That seems like a sensible safe approach.
>
> >
> > I was reminded of a release-related change that Jeff did recently, maybe
> > commit 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns
> > an error")... might be relevant, might not.
>
> Definitely relevant. I think this is the commit to mention in the
> Fixes: tag. Prior to this commit nfsd4_read_release wouldn't have been
> called if there was no filehandle, so the trace point was perfectly
> safe.
>
> I wonder if this fix by Jeff was the best fix for the problem.
> An alternative would be to require ->op_func to release any resources
> before returning an error..... except that wouldn't work for
> OP_NONTRIVIAL_ERROR_ENCODE functions like OP_LOCK as the ->op_release
> frees the "deny" info which is needed to encode the reply....
>
> But wait ...nfs4_encode_operation - which currently calls ->op_release -
> is also called from nfsd4_enc_sequence_replay(). The status is set to
> an error so ->op_release previously would not have been called, but now
> it is. Could that mean nfsd4_lock_release() can get called there even
> tough nfsd4_lock wasn't called. Could that be a problem? It isn't
> clear to me that is *isn't* a problem, should maybe there is a subtlety
> that saves it.
>
> I'm beginning to think that we don't want an op_release at all.
> Anything allocated in ->op_func should either be freed in ->op_func when
> an error is detected, or in the corresponding nfsd4_enc_ops function
> after the allocated value is used for encoding.
>
> Looking further...
> ->op_get_currentstateid is only called immediately before the one time
> that ->op_func is called. So ->op_func could do that
> get_currentstateid work.
> and ->op_set_currentstateid looks like it could be folded in to
> ->op_func as well.
>
> Does anyone have any thoughts about that?
I think that sounds reasonable. I don't see anything that would
preclude that anyway, at least with a cursory look. All of the
references that get released in the op_release calls are acquired
during op_func, AFAICT.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-10-06 16:23 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-30 14:05 [PATCH v1] NFSD: Fix crash in nfsd4_read_release() Chuck Lever
2025-09-30 14:45 ` Jeff Layton
2025-10-01 0:53 ` NeilBrown
2025-10-01 13:16 ` Chuck Lever
2025-10-03 7:08 ` NeilBrown
2025-10-03 13:57 ` Chuck Lever
2025-10-04 6:34 ` NeilBrown
2025-10-04 15:51 ` Chuck Lever
2025-10-06 16:23 ` Jeff Layton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox