public inbox for linux-nfs@vger.kernel.org
From: Chuck Lever <cel@kernel.org>
To: Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [RFC PATCH 3/3] NFSD: Remove the cap on number of operations per NFSv4 COMPOUND
Date: Tue, 10 Jun 2025 13:07:19 -0400
Message-ID: <f566f15d-5656-4e82-bf11-9da029d43d0e@kernel.org>
In-Reply-To: <2155635c72f3bf440d25f74fd7924694389fb378.camel@kernel.org>

On 6/10/25 1:01 PM, Jeff Layton wrote:
> On Tue, 2025-06-10 at 12:05 -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> This limit has always been a sanity check; in nearly all cases a
>> large COMPOUND is a sign of a malfunctioning client. The only real
>> limit on COMPOUND size and complexity is the size of NFSD's send
>> and receive buffers.
>>
>> However, there are a few cases where a large COMPOUND is sane. For
>> example, when a client implementation wants to walk down a long file
>> pathname in a single round trip.
>>
>> A small risk is that now a client can construct a COMPOUND request
>> that can keep a single nfsd thread busy for quite some time.
>>
> 
> You're right about the risk there. I wonder what we could do to
> mitigate that?
> 
> Maybe get a timestamp at the start of the compound and then check vs.
> that after every operation? If the compound is taking longer than
> some timeout, give up and return an error on the next operation?

I'm open to thinking about additional guard rails.

The problem with a timeout is that any single operation can take a long
time -- if the underlying media is malfunctioning or if the remote NFS
server for a re-export is unreachable, for example.
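
To make that concrete, here is a rough userspace sketch, not NFSD code.
The names run_compound() and struct fake_op are made up for
illustration, and the clock is simulated. It only shows that a deadline
checked between operations cannot bound the time spent inside any one
operation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one COMPOUND operation: only its cost matters. */
struct fake_op {
	uint64_t cost_ms;	/* simulated time the operation takes */
};

/*
 * Run ops in order, checking the time budget only between operations.
 * Returns the number of operations executed and stores the simulated
 * total elapsed time in *elapsed_ms.
 */
static unsigned int run_compound(const struct fake_op *ops, unsigned int nops,
				 uint64_t budget_ms, uint64_t *elapsed_ms)
{
	uint64_t now = 0;	/* simulated clock, starts at COMPOUND start */
	unsigned int done = 0;

	assert(elapsed_ms);
	while (done < nops) {
		/* The only place a timeout can be noticed. */
		if (now > budget_ms)
			break;
		/* Once started, the operation runs to completion. */
		now += ops[done].cost_ms;
		done++;
	}
	*elapsed_ms = now;
	return done;
}
```

With a 100 ms budget and ops costing 1 ms, 60000 ms, and 1 ms, the
second op still runs to completion. The check fires only before the
third op, after roughly a minute has already passed.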


> Also, while I did suggest it, we should consider not removing this
> limit altogether, and rather just increase it to something like a max
> practical limit:
> 
> For instance, we have limits in the channel_attrs for ca_maxrequestsize
> and ca_maxresponsesize. What's the smallest operation? If we had a
> compound comprised of just those operations, how many would fit?
> 
> That would at least act as a sanity check against compounds that are
> clearly nonsensical.

Relying on the size of the COMPOUND itself should be sufficient. If the
whole COMPOUND can't fit in ca_maxrequestsize, that's effectively the
same thing as limiting the number of ops based on the maxrequestsize
value.
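
As a back-of-the-envelope sketch of that equivalence (again not NFSD
code: max_ops_for_request() is a hypothetical helper, and the fixed
overhead for the RPC header, tag, minorversion, and op count is left to
the caller as an assumption), the request size already implies an upper
bound on the op count. An operation with void arguments, such as
PUTROOTFH, encodes as a single 4-byte opcode, so 4 bytes is the smallest
per-op cost on the wire:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on the number of operations that fit in a request of at
 * most maxreq_sz bytes. fixed_overhead is the caller's estimate of the
 * bytes consumed before the first operation; min_op_sz is the encoded
 * size of the smallest operation (4 for an op with void arguments).
 */
static uint32_t max_ops_for_request(uint32_t maxreq_sz,
				    uint32_t fixed_overhead,
				    uint32_t min_op_sz)
{
	assert(min_op_sz > 0);
	if (maxreq_sz <= fixed_overhead)
		return 0;
	return (maxreq_sz - fixed_overhead) / min_op_sz;
}
```

For example, a 4096-byte maxrequestsize with an assumed 96 bytes of
fixed overhead leaves room for at most 1000 such operations. Any
COMPOUND with more ops than that cannot fit in the request in the first
place.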


>> Suggested-by: Jeff Layton <jlayton@kernel.org>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  fs/nfsd/nfs4proc.c  | 14 ++------------
>>  fs/nfsd/nfs4state.c |  1 -
>>  fs/nfsd/nfs4xdr.c   |  4 +---
>>  fs/nfsd/nfsd.h      |  3 ---
>>  fs/nfsd/xdr4.h      |  1 -
>>  5 files changed, 3 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index f13abbb13b38..f4edf222e00e 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -2842,20 +2842,10 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
>>  
>>  	rqstp->rq_lease_breaker = (void **)&cstate->clp;
>>  
>> -	trace_nfsd_compound(rqstp, args->tag, args->taglen, args->client_opcnt);
>> +	trace_nfsd_compound(rqstp, args->tag, args->taglen, args->opcnt);
>>  	while (!status && resp->opcnt < args->opcnt) {
>>  		op = &args->ops[resp->opcnt++];
>>  
>> -		if (unlikely(resp->opcnt == NFSD_MAX_OPS_PER_COMPOUND)) {
>> -			/* If there are still more operations to process,
>> -			 * stop here and report NFS4ERR_RESOURCE. */
>> -			if (cstate->minorversion == 0 &&
>> -			    args->client_opcnt > resp->opcnt) {
>> -				op->status = nfserr_resource;
>> -				goto encode_op;
>> -			}
>> -		}
>> -
>>  		/*
>>  		 * The XDR decode routines may have pre-set op->status;
>>  		 * for example, if there is a miscellaneous XDR error
>> @@ -2932,7 +2922,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
>>  			status = op->status;
>>  		}
>>  
>> -		trace_nfsd_compound_status(args->client_opcnt, resp->opcnt,
>> +		trace_nfsd_compound_status(args->opcnt, resp->opcnt,
>>  					   status, nfsd4_op_name(op->opnum));
>>  
>>  		nfsd4_cstate_clear_replay(cstate);
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index d5694987f86f..4b6ae8e54cd2 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -3872,7 +3872,6 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
>>  	ca->headerpadsz = 0;
>>  	ca->maxreq_sz = min_t(u32, ca->maxreq_sz, maxrpc);
>>  	ca->maxresp_sz = min_t(u32, ca->maxresp_sz, maxrpc);
>> -	ca->maxops = min_t(u32, ca->maxops, NFSD_MAX_OPS_PER_COMPOUND);
>>  	ca->maxresp_cached = min_t(u32, ca->maxresp_cached,
>>  			NFSD_SLOT_CACHE_SIZE + NFSD_MIN_HDR_SEQ_SZ);
>>  	ca->maxreqs = min_t(u32, ca->maxreqs, NFSD_MAX_SLOTS_PER_SESSION);
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 3afcdbed6e14..ea91bad4eee2 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -2500,10 +2500,8 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
>>  
>>  	if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0)
>>  		return false;
>> -	if (xdr_stream_decode_u32(argp->xdr, &argp->client_opcnt) < 0)
>> +	if (xdr_stream_decode_u32(argp->xdr, &argp->opcnt) < 0)
>>  		return false;
>> -	argp->opcnt = min_t(u32, argp->client_opcnt,
>> -			    NFSD_MAX_OPS_PER_COMPOUND);
>>  
>>  	if (argp->opcnt > ARRAY_SIZE(argp->iops)) {
>>  		argp->ops = vcalloc(argp->opcnt, sizeof(*argp->ops));
>> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
>> index 570065285e67..54a96042f5ac 100644
>> --- a/fs/nfsd/nfsd.h
>> +++ b/fs/nfsd/nfsd.h
>> @@ -57,9 +57,6 @@ struct readdir_cd {
>>  	__be32			err;	/* 0, nfserr, or nfserr_eof */
>>  };
>>  
>> -/* Maximum number of operations per session compound */
>> -#define NFSD_MAX_OPS_PER_COMPOUND	50
>> -
>>  struct nfsd_genl_rqstp {
>>  	struct sockaddr		rq_daddr;
>>  	struct sockaddr		rq_saddr;
>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
>> index aa2a356da784..a23bc56051ca 100644
>> --- a/fs/nfsd/xdr4.h
>> +++ b/fs/nfsd/xdr4.h
>> @@ -870,7 +870,6 @@ struct nfsd4_compoundargs {
>>  	char *				tag;
>>  	u32				taglen;
>>  	u32				minorversion;
>> -	u32				client_opcnt;
>>  	u32				opcnt;
>>  	bool				splice_ok;
>>  	struct nfsd4_op			*ops;
> 


-- 
Chuck Lever

Thread overview: 7+ messages
2025-06-10 16:05 [RFC PATCH 0/3] Remove the max-ops-per-compound-limit Chuck Lever
2025-06-10 16:05 ` [RFC PATCH 1/3] NFSD: Rename a function parameter Chuck Lever
2025-06-10 16:05 ` [RFC PATCH 2/3] NFSD: Make nfsd_genl_rqstp::rq_ops array best-effort Chuck Lever
2025-06-10 16:05 ` [RFC PATCH 3/3] NFSD: Remove the cap on number of operations per NFSv4 COMPOUND Chuck Lever
2025-06-10 17:01   ` Jeff Layton
2025-06-10 17:07     ` Chuck Lever [this message]
2025-06-10 17:28 ` [RFC PATCH 0/3] Remove the max-ops-per-compound-limit Jeff Layton
