* [PATCH 0/2] nfsd: fix handling of timed out idmap lookups
@ 2025-11-27 17:57 Anthony Iliopoulos
2025-11-27 17:57 ` [PATCH 1/2] nfsd: never defer requests during idmap lookup Anthony Iliopoulos
2025-11-27 17:57 ` [PATCH 2/2] nfsd: fix return error code for nfsd_map_name_to_[ug]id Anthony Iliopoulos
0 siblings, 2 replies; 9+ messages in thread
From: Anthony Iliopoulos @ 2025-11-27 17:57 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
This series addresses an issue occurring during v4 compound decoding, when
idmap lookup upcall responses are delayed in userspace. This causes the
related request to be marked for deferral and to be dropped, preventing
nfs4svc_encode_compoundres from being executed and thus never clearing the
session slot flag NFSD4_SLOT_INUSE. Subsequent requests will fail with
NFSERR_JUKEBOX, given that the slot will be marked as in use.
The first patch fixes this by making sure that no deferrals can happen
during decoding.
The second patch fixes the return code of delayed idmap lookups, so that
clients will retry the operation instead of aborting with an error.
Anthony Iliopoulos (2):
nfsd: never defer requests during idmap lookup
nfsd: fix return error code for nfsd_map_name_to_[ug]id
fs/nfsd/nfs4idmap.c | 4 ++++
fs/nfsd/nfs4xdr.c | 7 ++++++-
2 files changed, 10 insertions(+), 1 deletion(-)
--
2.52.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] nfsd: never defer requests during idmap lookup
2025-11-27 17:57 [PATCH 0/2] nfsd: fix handling of timed out idmap lookups Anthony Iliopoulos
@ 2025-11-27 17:57 ` Anthony Iliopoulos
2025-11-28 0:55 ` NeilBrown
2025-11-28 16:09 ` Chuck Lever
2025-11-27 17:57 ` [PATCH 2/2] nfsd: fix return error code for nfsd_map_name_to_[ug]id Anthony Iliopoulos
1 sibling, 2 replies; 9+ messages in thread
From: Anthony Iliopoulos @ 2025-11-27 17:57 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
During v4 request compound arg decoding, some ops (e.g. SETATTR) can
trigger idmap lookup upcalls. When those upcall responses get delayed
beyond the allowed time limit, cache_check() will mark the request for
deferral and cause it to be dropped.
This prevents nfs4svc_encode_compoundres from being executed, and thus the
session slot flag NFSD4_SLOT_INUSE never gets cleared. Subsequent client
requests will fail with NFSERR_JUKEBOX, given that the slot will be marked
as in-use, making the SEQUENCE op fail.
Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during
nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred.
Fixes: 2f425878b6a7 ("nfsd: don't use the deferral service, return NFS4ERR_DELAY")
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
---
fs/nfsd/nfs4xdr.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6040a6145dad..0a1a46b750ef 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -5989,6 +5989,7 @@ bool
nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
{
struct nfsd4_compoundargs *args = rqstp->rq_argp;
+ bool ret = false;
/* svcxdr_tmp_alloc */
args->to_free = NULL;
@@ -5997,7 +5998,11 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
args->ops = args->iops;
args->rqstp = rqstp;
- return nfsd4_decode_compound(args);
+ clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
+ ret = nfsd4_decode_compound(args);
+ set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
+
+ return ret;
}
bool
--
2.52.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] nfsd: fix return error code for nfsd_map_name_to_[ug]id
2025-11-27 17:57 [PATCH 0/2] nfsd: fix handling of timed out idmap lookups Anthony Iliopoulos
2025-11-27 17:57 ` [PATCH 1/2] nfsd: never defer requests during idmap lookup Anthony Iliopoulos
@ 2025-11-27 17:57 ` Anthony Iliopoulos
2025-11-28 0:46 ` NeilBrown
1 sibling, 1 reply; 9+ messages in thread
From: Anthony Iliopoulos @ 2025-11-27 17:57 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
idmap lookups can time out while the cache is waiting for a userspace
upcall reply. In that case cache_check() returns -ETIMEDOUT to callers.
The nfsd_map_name_to_[ug]id functions currently proceed with attempting
to map the id to a kuid despite a potentially temporary failure to
perform the idmap lookup. This results in the code returning the error
NFSERR_BADOWNER which can cause client operations to return to userspace
with failure.
Fix this by returning the failure status before attempting kuid mapping.
This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients
can retry the operation instead of aborting it.
Fixes: 65e10f6d0ab0 ("nfsd: Convert idmap to use kuids and kgids")
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
---
fs/nfsd/nfs4idmap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c
index 8cca1329f348..123ac45b512e 100644
--- a/fs/nfsd/nfs4idmap.c
+++ b/fs/nfsd/nfs4idmap.c
@@ -654,6 +654,8 @@ nfsd_map_name_to_uid(struct svc_rqst *rqstp, const char *name, size_t namelen,
return nfserr_inval;
status = do_name_to_id(rqstp, IDMAP_TYPE_USER, name, namelen, &id);
+ if (status)
+ return status;
*uid = make_kuid(nfsd_user_namespace(rqstp), id);
if (!uid_valid(*uid))
status = nfserr_badowner;
@@ -671,6 +673,8 @@ nfsd_map_name_to_gid(struct svc_rqst *rqstp, const char *name, size_t namelen,
return nfserr_inval;
status = do_name_to_id(rqstp, IDMAP_TYPE_GROUP, name, namelen, &id);
+ if (status)
+ return status;
*gid = make_kgid(nfsd_user_namespace(rqstp), id);
if (!gid_valid(*gid))
status = nfserr_badowner;
--
2.52.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] nfsd: fix return error code for nfsd_map_name_to_[ug]id
2025-11-27 17:57 ` [PATCH 2/2] nfsd: fix return error code for nfsd_map_name_to_[ug]id Anthony Iliopoulos
@ 2025-11-28 0:46 ` NeilBrown
0 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2025-11-28 0:46 UTC (permalink / raw)
To: Anthony Iliopoulos; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Fri, 28 Nov 2025, Anthony Iliopoulos wrote:
> idmap lookups can time out while the cache is waiting for a userspace
> upcall reply. In that case cache_check() returns -ETIMEDOUT to callers.
>
> The nfsd_map_name_to_[ug]id functions currently proceed with attempting
> to map the id to a kuid despite a potentially temporary failure to
> perform the idmap lookup. This results in the code returning the error
> NFSERR_BADOWNER which can cause client operations to return to userspace
> with failure.
>
> Fix this by returning the failure status before attempting kuid mapping.
>
> This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients
> can retry the operation instead of aborting it.
>
> Fixes: 65e10f6d0ab0 ("nfsd: Convert idmap to use kuids and kgids")
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
> fs/nfsd/nfs4idmap.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c
> index 8cca1329f348..123ac45b512e 100644
> --- a/fs/nfsd/nfs4idmap.c
> +++ b/fs/nfsd/nfs4idmap.c
> @@ -654,6 +654,8 @@ nfsd_map_name_to_uid(struct svc_rqst *rqstp, const char *name, size_t namelen,
> return nfserr_inval;
>
> status = do_name_to_id(rqstp, IDMAP_TYPE_USER, name, namelen, &id);
> + if (status)
> + return status;
> *uid = make_kuid(nfsd_user_namespace(rqstp), id);
Ignoring the state and using the id anyway is clearly wrong.
Reviewed-by: NeilBrown <neil@brown.name>
Thanks,
NeilBrown
> if (!uid_valid(*uid))
> status = nfserr_badowner;
> @@ -671,6 +673,8 @@ nfsd_map_name_to_gid(struct svc_rqst *rqstp, const char *name, size_t namelen,
> return nfserr_inval;
>
> status = do_name_to_id(rqstp, IDMAP_TYPE_GROUP, name, namelen, &id);
> + if (status)
> + return status;
> *gid = make_kgid(nfsd_user_namespace(rqstp), id);
> if (!gid_valid(*gid))
> status = nfserr_badowner;
> --
> 2.52.0
>
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] nfsd: never defer requests during idmap lookup
2025-11-27 17:57 ` [PATCH 1/2] nfsd: never defer requests during idmap lookup Anthony Iliopoulos
@ 2025-11-28 0:55 ` NeilBrown
2025-11-28 16:09 ` Chuck Lever
1 sibling, 0 replies; 9+ messages in thread
From: NeilBrown @ 2025-11-28 0:55 UTC (permalink / raw)
To: Anthony Iliopoulos; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Fri, 28 Nov 2025, Anthony Iliopoulos wrote:
> During v4 request compound arg decoding, some ops (e.g. SETATTR) can
> trigger idmap lookup upcalls. When those upcall responses get delayed
> beyond the allowed time limit, cache_check() will mark the request for
> deferral and cause it to be dropped.
>
> This prevents nfs4svc_encode_compoundres from being executed, and thus the
> session slot flag NFSD4_SLOT_INUSE never gets cleared. Subsequent client
> requests will fail with NFSERR_JUKEBOX, given that the slot will be marked
> as in-use, making the SEQUENCE op fail.
>
> Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during
> nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred.
>
> Fixes: 2f425878b6a7 ("nfsd: don't use the deferral service, return NFS4ERR_DELAY")
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
> fs/nfsd/nfs4xdr.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 6040a6145dad..0a1a46b750ef 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5989,6 +5989,7 @@ bool
> nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> {
> struct nfsd4_compoundargs *args = rqstp->rq_argp;
> + bool ret = false;
This initialisation is unnecessary. Just
bool ret;
please.
>
> /* svcxdr_tmp_alloc */
> args->to_free = NULL;
> @@ -5997,7 +5998,11 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> args->ops = args->iops;
> args->rqstp = rqstp;
>
> - return nfsd4_decode_compound(args);
> + clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
> + ret = nfsd4_decode_compound(args);
> + set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
> +
> + return ret;
> }
nfs4svc_encode_compoundres doesn't need this because the encoding
actually happens from nfsd4_proc_compound on an op-by-op basis.
So only decode_compoundargs and proc_compound need to clear RQ_USEDEFERRAL.
proc_compound already does, this patch handles decode_compoundargs.
Reviewed-By: NeilBrown <neil@brown.name>
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] nfsd: never defer requests during idmap lookup
2025-11-27 17:57 ` [PATCH 1/2] nfsd: never defer requests during idmap lookup Anthony Iliopoulos
2025-11-28 0:55 ` NeilBrown
@ 2025-11-28 16:09 ` Chuck Lever
2025-11-28 21:20 ` Anthony Iliopoulos
1 sibling, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2025-11-28 16:09 UTC (permalink / raw)
To: Anthony Iliopoulos, Chuck Lever, Jeff Layton; +Cc: linux-nfs
On Thu, Nov 27, 2025, at 12:57 PM, Anthony Iliopoulos wrote:
> During v4 request compound arg decoding, some ops (e.g. SETATTR) can
> trigger idmap lookup upcalls. When those upcall responses get delayed
> beyond the allowed time limit, cache_check() will mark the request for
> deferral and cause it to be dropped.
The RFCs mandate that NFSv4 servers MUST NOT drop requests. What
nfsd_dispatch() does in your case is return RPC_GARBAGE_ARGS to
the client, which is distinct behavior from "dropping" a request.
Please update the patch description (if I've understood correctly).
But agreed, RPC_GARBAGE_ARGS is still the wrong server response
for a timed-out idmap upcall.
> This prevents nfs4svc_encode_compoundres from being executed, and thus the
> session slot flag NFSD4_SLOT_INUSE never gets cleared. Subsequent client
> requests will fail with NFSERR_JUKEBOX, given that the slot will be marked
> as in-use, making the SEQUENCE op fail.
Yes, this is pure badness.
> Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during
> nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred.
Help me understand how the upcall failure during XDR decoding is
handled later? What server response is returned? Is it possible
for the proc function to execute anyway with incorrect uid and
gid values?
> Fixes: 2f425878b6a7 ("nfsd: don't use the deferral service, return
> NFS4ERR_DELAY")
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
> fs/nfsd/nfs4xdr.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 6040a6145dad..0a1a46b750ef 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5989,6 +5989,7 @@ bool
> nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> {
> struct nfsd4_compoundargs *args = rqstp->rq_argp;
> + bool ret = false;
>
> /* svcxdr_tmp_alloc */
> args->to_free = NULL;
> @@ -5997,7 +5998,11 @@ nfs4svc_decode_compoundargs(struct svc_rqst
> *rqstp, struct xdr_stream *xdr)
> args->ops = args->iops;
> args->rqstp = rqstp;
>
> - return nfsd4_decode_compound(args);
> + clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
> + ret = nfsd4_decode_compound(args);
> + set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags);
> +
> + return ret;
> }
>
> bool
> --
> 2.52.0
--
Chuck Lever
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] nfsd: never defer requests during idmap lookup
2025-11-28 16:09 ` Chuck Lever
@ 2025-11-28 21:20 ` Anthony Iliopoulos
2025-11-29 16:54 ` Chuck Lever
0 siblings, 1 reply; 9+ messages in thread
From: Anthony Iliopoulos @ 2025-11-28 21:20 UTC (permalink / raw)
To: Chuck Lever; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Fri, Nov 28, 2025 at 11:09:52AM -0500, Chuck Lever wrote:
>
>
> On Thu, Nov 27, 2025, at 12:57 PM, Anthony Iliopoulos wrote:
> > During v4 request compound arg decoding, some ops (e.g. SETATTR) can
> > trigger idmap lookup upcalls. When those upcall responses get delayed
> > beyond the allowed time limit, cache_check() will mark the request for
> > deferral and cause it to be dropped.
>
> The RFCs mandate that NFSv4 servers MUST NOT drop requests. What
> nfsd_dispatch() does in your case is return RPC_GARBAGE_ARGS to
> the client, which is distinct behavior from "dropping" a request.
It actually does drop the request, as pc_decode doesn't fail when this
happens.
For example in one instance of this issue which occurs while decoding a
SETATTR op that has FATTR4_WORD1_OWNER/GROUP set, nfsd4_decode_setattr
returns with status set to nfserr_badowner. This is set in op->status in
nfsd4_decode_compound, which will stop decoding further ops, but stil
returns true.
During nfsd4_decode_setattr, nfsd_map_name_to_[ug]id will end up calling
cache_check in idmap_lookup. What that does is basically:
- issue the upcall
- wait for completion with a short timeout
- attempt to defer the request if the upcall hasn't updated the cache entry
in the meantime
That happens by calling svc_defer which will set RQ_DROPME on the
rqstp->rq_flags, causing nfsd_dispatch to return through the
out_update_drop, and in turn there will be no response sent out by
svc_process.
> > Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during
> > nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred.
>
> Help me understand how the upcall failure during XDR decoding is
> handled later? What server response is returned? Is it possible
> for the proc function to execute anyway with incorrect uid and
> gid values?
Without the next patch in this series, if the request isn't deferred it
will send back the NFS4ERR_BADOWNER status, which the nfs client will
map to -EINVAL and return to userspace.
With the next patch, it will return NFS4ERR_DELAY so that the client
will keep retrying the request until the id mapping completes.
In either case, even when the request is being dropped, the proc
function is never executed. While nfsd_dispatch will proceed to call the
pc_func, nfsd4_proc_compound checks for the op->status which was set in
the decoding stage, and jumps to nfsd4_encode_operation without calling
the op_func. In the particular instance of the SETATTR op, the encoder
will encode an empty attrsset bitmap to indicate the inability to set
any attributes.
Regards,
Anthony
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] nfsd: never defer requests during idmap lookup
2025-11-28 21:20 ` Anthony Iliopoulos
@ 2025-11-29 16:54 ` Chuck Lever
2025-11-30 19:25 ` Chuck Lever
0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2025-11-29 16:54 UTC (permalink / raw)
To: Anthony Iliopoulos; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Fri, Nov 28, 2025, at 4:20 PM, Anthony Iliopoulos wrote:
> On Fri, Nov 28, 2025 at 11:09:52AM -0500, Chuck Lever wrote:
>>
>>
>> On Thu, Nov 27, 2025, at 12:57 PM, Anthony Iliopoulos wrote:
>> > During v4 request compound arg decoding, some ops (e.g. SETATTR) can
>> > trigger idmap lookup upcalls. When those upcall responses get delayed
>> > beyond the allowed time limit, cache_check() will mark the request for
>> > deferral and cause it to be dropped.
>>
>> The RFCs mandate that NFSv4 servers MUST NOT drop requests. What
>> nfsd_dispatch() does in your case is return RPC_GARBAGE_ARGS to
>> the client, which is distinct behavior from "dropping" a request.
>
> It actually does drop the request, as pc_decode doesn't fail when this
> happens.
Anthony, thanks for clarifying. So, pc_decode returns true, but
it has set op->status to something other than nfs_ok.
> For example in one instance of this issue which occurs while decoding a
> SETATTR op that has FATTR4_WORD1_OWNER/GROUP set, nfsd4_decode_setattr
> returns with status set to nfserr_badowner. This is set in op->status in
> nfsd4_decode_compound, which will stop decoding further ops, but stil
> returns true.
>
> During nfsd4_decode_setattr, nfsd_map_name_to_[ug]id will end up calling
> cache_check in idmap_lookup. What that does is basically:
>
> - issue the upcall
> - wait for completion with a short timeout
> - attempt to defer the request if the upcall hasn't updated the cache entry
> in the meantime
>
> That happens by calling svc_defer which will set RQ_DROPME on the
> rqstp->rq_flags, causing nfsd_dispatch to return through the
> out_update_drop, and in turn there will be no response sent out by
> svc_process.
>
>> > Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during
>> > nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred.
>>
>> Help me understand how the upcall failure during XDR decoding is
>> handled later? What server response is returned? Is it possible
>> for the proc function to execute anyway with incorrect uid and
>> gid values?
>
> Without the next patch in this series, if the request isn't deferred it
> will send back the NFS4ERR_BADOWNER status, which the nfs client will
> map to -EINVAL and return to userspace.
>
> With the next patch, it will return NFS4ERR_DELAY so that the client
> will keep retrying the request until the id mapping completes.
I agree that nfserr_delay is a much better server response than
nfserr_badowner when idmapping experiences a temporary failure.
Clearing USEDEFERRAL during pc_decode seems a bit expedient,
though. Probably this is a good initial fix because it can be
backported cleanly. But it's a brittle fix IMHO and leaves a
lot of technical debt.
[ Review action: I'm thinking of taking this fix because it can
be applied to LTS kernels -- I see that the idmapper calls have
been in the pc_decode path since day zero. But please add more
of this nice detail to the patch description. ]
In the longer term, one thing that might improve matters is to
move all idmapping calls out of the pc_decode path. That would
include both nfsd4_decode_fattr4 and nfsd4_decode_nfsace4.
> In either case, even when the request is being dropped, the proc
> function is never executed. While nfsd_dispatch will proceed to call the
> pc_func, nfsd4_proc_compound checks for the op->status which was set in
> the decoding stage, and jumps to nfsd4_encode_operation without calling
> the op_func.
OK, but how does that improve the situation with the in-use
session slot? Are there other cases where COMPOUND decoding
sets a non-nfs_ok status and leaves an in-use slot?
This is what worries me in the longer run.
> In the particular instance of the SETATTR op, the encoder
> will encode an empty attrsset bitmap to indicate the inability to set
> any attributes.
--
Chuck Lever
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] nfsd: never defer requests during idmap lookup
2025-11-29 16:54 ` Chuck Lever
@ 2025-11-30 19:25 ` Chuck Lever
0 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2025-11-30 19:25 UTC (permalink / raw)
To: Anthony Iliopoulos; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Sat, Nov 29, 2025, at 11:54 AM, Chuck Lever wrote:
> On Fri, Nov 28, 2025, at 4:20 PM, Anthony Iliopoulos wrote:
>> On Fri, Nov 28, 2025 at 11:09:52AM -0500, Chuck Lever wrote:
>>>
>>>
>>> On Thu, Nov 27, 2025, at 12:57 PM, Anthony Iliopoulos wrote:
>>> > During v4 request compound arg decoding, some ops (e.g. SETATTR) can
>>> > trigger idmap lookup upcalls. When those upcall responses get delayed
>>> > beyond the allowed time limit, cache_check() will mark the request for
>>> > deferral and cause it to be dropped.
>>>
>>> The RFCs mandate that NFSv4 servers MUST NOT drop requests. What
>>> nfsd_dispatch() does in your case is return RPC_GARBAGE_ARGS to
>>> the client, which is distinct behavior from "dropping" a request.
>>
>> It actually does drop the request, as pc_decode doesn't fail when this
>> happens.
>
> Anthony, thanks for clarifying. So, pc_decode returns true, but
> it has set op->status to something other than nfs_ok.
>
>
>> For example in one instance of this issue which occurs while decoding a
>> SETATTR op that has FATTR4_WORD1_OWNER/GROUP set, nfsd4_decode_setattr
>> returns with status set to nfserr_badowner. This is set in op->status in
>> nfsd4_decode_compound, which will stop decoding further ops, but stil
>> returns true.
>>
>> During nfsd4_decode_setattr, nfsd_map_name_to_[ug]id will end up calling
>> cache_check in idmap_lookup. What that does is basically:
>>
>> - issue the upcall
>> - wait for completion with a short timeout
>> - attempt to defer the request if the upcall hasn't updated the cache entry
>> in the meantime
>>
>> That happens by calling svc_defer which will set RQ_DROPME on the
>> rqstp->rq_flags, causing nfsd_dispatch to return through the
>> out_update_drop, and in turn there will be no response sent out by
>> svc_process.
>>
>>> > Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during
>>> > nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred.
>>>
>>> Help me understand how the upcall failure during XDR decoding is
>>> handled later? What server response is returned? Is it possible
>>> for the proc function to execute anyway with incorrect uid and
>>> gid values?
>>
>> Without the next patch in this series, if the request isn't deferred it
>> will send back the NFS4ERR_BADOWNER status, which the nfs client will
>> map to -EINVAL and return to userspace.
>>
>> With the next patch, it will return NFS4ERR_DELAY so that the client
>> will keep retrying the request until the id mapping completes.
>
> I agree that nfserr_delay is a much better server response than
> nfserr_badowner when idmapping experiences a temporary failure.
>
> Clearing USEDEFERRAL during pc_decode seems a bit expedient,
> though. Probably this is a good initial fix because it can be
> backported cleanly. But it's a brittle fix IMHO and leaves a
> lot of technical debt.
>
> [ Review action: I'm thinking of taking this fix because it can
> be applied to LTS kernels -- I see that the idmapper calls have
> been in the pc_decode path since day zero. But please add more
> of this nice detail to the patch description. ]
>
> In the longer term, one thing that might improve matters is to
> move all idmapping calls out of the pc_decode path. That would
> include both nfsd4_decode_fattr4 and nfsd4_decode_nfsace4.
After more analysis, I can see that the idmapping calls in the
NFSv4 operation decoders are the only risks here. Clearing
RQ_USEDEFERRAL seems like a good fix, and I will look into
moving the idmapping calls themselves sometime down the road.
I would ask for a few changes to your patches:
1. Clear RQ_USEDEFERRAL in nfs4svc_decode_compoundargs(), and
precede the clear_bit() with a loud comment that explains
why this is done. Something like:
* NFSv4 operation decoders can invoke svc cache lookups
* that trigger svc_defer() when RQ_USEDEFERRAL is set,
* setting RQ_DROPME. This creates two problems:
*
* 1. Non-idempotency: Compounds make it too hard to avoid
* problems if a request is deferred and replayed.
*
* 2. Session slot leakage (NFSv4.1+): If RQ_DROPME is set
* during decode but SEQUENCE executes successfully, the
* session slot will be marked INUSE. The request is then
* dropped before encoding, so the slot is never released,
* rendering it permanently unusable by the client.
2. Remove subsequent set_bit(RQ_USEDEFERRAL) calls in NFSv4. There
doesn't seem to be any reason to re-enable that bit again during
NFSv4 COMPOUND processing. (Feel free to do your own analysis
here and check me if I'm wrong).
3. Add full kdoc comments in front of nfsd_map_name_to_[ug]id()
that note the deferral risk. Something like:
* This function may be called during NFSv4 COMPOUND decoding while
* processing SETATTR, CREATE, or OPEN operations that set owner or
* ACL attributes. The idmap lookup below triggers an upcall that
* invokes cache_check().
*
* The caller must ensure RQ_USEDEFERRAL is cleared before calling to
* prevent cache_check() from setting RQ_DROPME via svc_defer(). If
* RQ_DROPME is set during decode, NFSv4.1 session slots can be
* orphaned. See nfs4svc_decode_compoundargs().
Now, we might also put a test_bit() in here with a
WARN_ON_ONCE, but that is probably overkill.
--
Chuck Lever
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-11-30 19:26 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-27 17:57 [PATCH 0/2] nfsd: fix handling of timed out idmap lookups Anthony Iliopoulos
2025-11-27 17:57 ` [PATCH 1/2] nfsd: never defer requests during idmap lookup Anthony Iliopoulos
2025-11-28 0:55 ` NeilBrown
2025-11-28 16:09 ` Chuck Lever
2025-11-28 21:20 ` Anthony Iliopoulos
2025-11-29 16:54 ` Chuck Lever
2025-11-30 19:25 ` Chuck Lever
2025-11-27 17:57 ` [PATCH 2/2] nfsd: fix return error code for nfsd_map_name_to_[ug]id Anthony Iliopoulos
2025-11-28 0:46 ` NeilBrown
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.