[PATCH 0/6] prepare for dynamic server thread management

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/6] prepare for dynamic server thread management
@ 2024-10-23  2:37 NeilBrown
  2024-10-23  2:37 ` [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: NeilBrown @ 2024-10-23  2:37 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

These patches prepare the way for demand-based adjustment of the number
of server threads.  They primarily remove some places there the
configured thread count is used to configure other things.

With these in place only two more patches are needed to have demand
based thread count.  The details of how to configure this need to be
discussed to ensure we have considered all perspectives, and I would
rather than happen in the context of two patches, not in the context of
8.  So I'm sending these first in the hope that will land with minimal
fuss.  Once they do land I'll send the remainder (which you have already
seen) and will look forward to a fruitful discussion.

Thanks,
NeilBrown

 [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads.
 [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there
 [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
 [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting
 [PATCH 5/6] sunrpc: remove all connection limit configuration
 [PATCH 6/6] sunrpc: introduce possibility that requested number of

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads.
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
@ 2024-10-23  2:37 ` NeilBrown
  2024-10-23  2:37 ` [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2024-10-23  2:37 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

sp_nrthreads and sv_nrthreads are the number of threads that have been
explicitly requested.  Future patches will allow extra threads to be
created as needed.

So move the updating of these fields to code which is for updating
configuration rather that code that is for starting/stopping threads.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/svc.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 79879b7d39cb..bd4f02b34f44 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -713,9 +713,6 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
 
 	rqstp->rq_err = -EAGAIN; /* No error yet */
 
-	serv->sv_nrthreads += 1;
-	pool->sp_nrthreads += 1;
-
 	/* Protected by whatever lock the service uses when calling
 	 * svc_set_num_threads()
 	 */
@@ -815,6 +812,8 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
 			svc_exit_thread(rqstp);
 			return PTR_ERR(task);
 		}
+		serv->sv_nrthreads += 1;
+		chosen_pool->sp_nrthreads += 1;
 
 		rqstp->rq_task = task;
 		if (serv->sv_nrpools > 1)
@@ -844,6 +843,8 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
 		victim = svc_pool_victim(serv, pool, &state);
 		if (!victim)
 			break;
+		victim->sp_nrthreads -= 1;
+		serv->sv_nrthreads -= 1;
 		svc_pool_wake_idle_thread(victim);
 		wait_on_bit(&victim->sp_flags, SP_VICTIM_REMAINS,
 			    TASK_IDLE);
@@ -959,8 +960,6 @@ svc_exit_thread(struct svc_rqst *rqstp)
 
 	list_del_rcu(&rqstp->rq_all);
 
-	pool->sp_nrthreads -= 1;
-	serv->sv_nrthreads -= 1;
 	svc_sock_update_bufs(serv);
 
 	svc_rqst_free(rqstp);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients.
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
  2024-10-23  2:37 ` [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
@ 2024-10-23  2:37 ` NeilBrown
  2024-10-23 13:42   ` Chuck Lever
  2024-10-23  2:37 ` [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits NeilBrown
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2024-10-23  2:37 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

If there are more non-courteous clients than the calculated limit, we
should fail the request rather than report a soft failure and
encouraging the client to retry indefinitely.

The only hard failure allowed for EXCHANGE_ID that doesn't clearly have
some other meaning is NFS4ERR_SERVERFAULT.  So use that, but explain why
in a comment at each place that it is returned.

If there are courteous clients which push us over the limit, then expedite
their removal.

This is not known to have caused a problem is production use, but
testing of lots of clients reports repeated NFS4ERR_DELAY responses
which doesn't seem helpful.

Also remove an outdated comment - we do use a slab cache.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 56b261608af4..ca6b5b52f77d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2212,21 +2212,20 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
 	return 1;
 }
 
-/* 
- * XXX Should we use a slab cache ?
- * This type of memory management is somewhat inefficient, but we use it
- * anyway since SETCLIENTID is not a common operation.
- */
 static struct nfs4_client *alloc_client(struct xdr_netobj name,
 				struct nfsd_net *nn)
 {
 	struct nfs4_client *clp;
 	int i;
 
-	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
+	if (atomic_read(&nn->nfs4_client_count) -
+	    atomic_read(&nn->nfsd_courtesy_clients) >= nn->nfs4_max_clients)
+		return ERR_PTR(-EUSERS);
+
+	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
+	    atomic_read(&nn->nfsd_courtesy_clients) > 0)
 		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
-		return NULL;
-	}
+
 	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
 	if (clp == NULL)
 		return NULL;
@@ -3121,8 +3120,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
 	struct dentry *dentries[ARRAY_SIZE(client_files)];
 
 	clp = alloc_client(name, nn);
-	if (clp == NULL)
-		return NULL;
+	if (IS_ERR_OR_NULL(clp))
+		return clp;
 
 	ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred);
 	if (ret) {
@@ -3504,6 +3503,11 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	new = create_client(exid->clname, rqstp, &verf);
 	if (new == NULL)
 		return nfserr_jukebox;
+	if (IS_ERR(new))
+		/* Protocol has no specific error for "client limit reached".
+		 * NFS4ERR_RESOURCE is not permitted for EXCHANGE_ID
+		 */
+		return nfserr_serverfault;
 	status = copy_impl_id(new, exid);
 	if (status)
 		goto out_nolock;
@@ -4422,6 +4426,12 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	new = create_client(clname, rqstp, &clverifier);
 	if (new == NULL)
 		return nfserr_jukebox;
+	if (IS_ERR(new))
+		/* Protocol has no specific error for "client limit reached".
+		 * NFS4ERR_RESOURCE, while allowed for SETCLIENTID, implies
+		 * that a smaller COMPOUND might be successful.
+		 */
+		return nfserr_serverfault;
 	spin_lock(&nn->client_lock);
 	conf = find_confirmed_client_by_name(&clname, nn);
 	if (conf && client_has_state(conf)) {
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
  2024-10-23  2:37 ` [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
  2024-10-23  2:37 ` [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
@ 2024-10-23  2:37 ` NeilBrown
  2024-10-23 11:48   ` Jeff Layton
  2024-10-23 13:55   ` Chuck Lever
  2024-10-23  2:37 ` [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: NeilBrown @ 2024-10-23  2:37 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

Currently per-client DRC slot limits (for v4.1+) are calculated when the
client connects, and they are left unchanged.  So earlier clients can
get a larger share when memory is tight.

The heuristic for choosing a number includes the number of configured
server threads.  This is problematic for 2 reasons.
1/ sv_nrthreads is different in different net namespaces, but the
   memory allocation is global across all namespaces.  So different
   namespaces get treated differently without good reason.
2/ a future patch will auto-configure number of threads based on
   load so that there will be no need to preconfigure a number.  This will
   make the current heuristic even more arbitrary.

NFSv4.1 allows the number of slots to be varied dynamically - in the
reply to each SEQUENCE op.  With this patch we provide a provisional
upper limit in the EXCHANGE_ID reply which might end up being too big,
and adjust it with each SEQUENCE reply.

The goal for when memory is tight is to allow each client to have a
similar number of slots.  So clients that ask for larger slots get more
memory.   This may not be ideal.  It could be changed later.

So we track the sum of the slot sizes of all active clients, and share
memory out based on the ratio of the slot size for a given client with
the sum of slot sizes.  We never allow more in a SEQUENCE reply than we
did in the EXCHANGE_ID reply.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++---------------------
 fs/nfsd/nfs4xdr.c   |  2 +-
 fs/nfsd/nfsd.h      |  6 +++-
 fs/nfsd/nfssvc.c    |  7 ++--
 fs/nfsd/state.h     |  2 +-
 fs/nfsd/xdr4.h      |  2 --
 6 files changed, 56 insertions(+), 44 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ca6b5b52f77d..834e9aa12b82 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1909,44 +1909,26 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
 }
 
 /*
- * XXX: If we run out of reserved DRC memory we could (up to a point)
- * re-negotiate active sessions and reduce their slot usage to make
- * room for new connections. For now we just fail the create session.
+ * When a client connects it gets a max_requests number that would allow
+ * it to use 1/8 of the memory we think can reasonably be used for the DRC.
+ * Later in response to SEQUENCE operations we further limit that when there
+ * are more than 8 concurrent clients.
  */
-static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
+static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
 {
 	u32 slotsize = slot_bytes(ca);
 	u32 num = ca->maxreqs;
-	unsigned long avail, total_avail;
-	unsigned int scale_factor;
+	unsigned long avail;
 
 	spin_lock(&nfsd_drc_lock);
-	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
-		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
-	else
-		/* We have handed out more space than we chose in
-		 * set_max_drc() to allow.  That isn't really a
-		 * problem as long as that doesn't make us think we
-		 * have lots more due to integer overflow.
-		 */
-		total_avail = 0;
-	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
-	/*
-	 * Never use more than a fraction of the remaining memory,
-	 * unless it's the only way to give this client a slot.
-	 * The chosen fraction is either 1/8 or 1/number of threads,
-	 * whichever is smaller.  This ensures there are adequate
-	 * slots to support multiple clients per thread.
-	 * Give the client one slot even if that would require
-	 * over-allocation--it is better than failure.
-	 */
-	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
 
-	avail = clamp_t(unsigned long, avail, slotsize,
-			total_avail/scale_factor);
-	num = min_t(int, num, avail / slotsize);
-	num = max_t(int, num, 1);
-	nfsd_drc_mem_used += num * slotsize;
+	avail = min(NFSD_MAX_MEM_PER_SESSION,
+		    nfsd_drc_max_mem / 8);
+
+	num = clamp_t(int, num, 1, avail / slotsize);
+
+	nfsd_drc_slotsize_sum += slotsize;
+
 	spin_unlock(&nfsd_drc_lock);
 
 	return num;
@@ -1957,10 +1939,33 @@ static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
 	int slotsize = slot_bytes(ca);
 
 	spin_lock(&nfsd_drc_lock);
-	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
+	nfsd_drc_slotsize_sum -= slotsize;
 	spin_unlock(&nfsd_drc_lock);
 }
 
+/*
+ * Report the number of slots that we would like the client to limit
+ * itself to.  When the number of clients is large, we start sharing
+ * memory so all clients get the same number of slots.
+ */
+static unsigned int nfsd4_get_drc_slots(struct nfsd4_session *session)
+{
+	u32 slotsize = slot_bytes(&session->se_fchannel);
+	unsigned long avail;
+	unsigned long slotsize_sum = READ_ONCE(nfsd_drc_slotsize_sum);
+
+	if (slotsize_sum < slotsize)
+		slotsize_sum = slotsize;
+
+	/* Find our share of avail mem across all active clients,
+	 * then limit to 1/8 of total, and configured max
+	 */
+	avail = min3(nfsd_drc_max_mem * slotsize / slotsize_sum,
+		     nfsd_drc_max_mem / 8,
+		     NFSD_MAX_MEM_PER_SESSION);
+	return max3(1UL, avail / slotsize, session->se_fchannel.maxreqs);
+}
+
 static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
 					   struct nfsd4_channel_attrs *battrs)
 {
@@ -3735,7 +3740,7 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
 	 * Note that we always allow at least one slot, because our
 	 * accounting is soft and provides no guarantees either way.
 	 */
-	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
+	ca->maxreqs = nfsd4_get_drc_mem(ca);
 
 	return nfs_ok;
 }
@@ -4229,10 +4234,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	slot = session->se_slots[seq->slotid];
 	dprintk("%s: slotid %d\n", __func__, seq->slotid);
 
-	/* We do not negotiate the number of slots yet, so set the
-	 * maxslots to the session maxreqs which is used to encode
-	 * sr_highest_slotid and the sr_target_slot id to maxslots */
-	seq->maxslots = session->se_fchannel.maxreqs;
+	/* Negotiate number of slots: set the target, and use the
+	 * same for max as long as it doesn't decrease the max
+	 * (that isn't allowed).
+	 */
+	seq->target_maxslots = nfsd4_get_drc_slots(session);
+	seq->maxslots = max(seq->maxslots, seq->target_maxslots);
 
 	trace_nfsd_slot_seqid_sequence(clp, seq, slot);
 	status = check_slot_seqid(seq->seqid, slot->sl_seqid,
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f118921250c3..e4e706872e54 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4955,7 +4955,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
 	if (nfserr != nfs_ok)
 		return nfserr;
 	/* sr_target_highest_slotid */
-	nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
+	nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
 	if (nfserr != nfs_ok)
 		return nfserr;
 	/* sr_status_flags */
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 4b56ba1e8e48..33c9db3ee8b6 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -90,7 +90,11 @@ extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
 extern struct mutex		nfsd_mutex;
 extern spinlock_t		nfsd_drc_lock;
 extern unsigned long		nfsd_drc_max_mem;
-extern unsigned long		nfsd_drc_mem_used;
+/* Storing the sum of the slot sizes for all active clients (across
+ * all net-namespaces) allows a share of total available memory to
+ * be allocaed to each client based on its slot size.
+ */
+extern unsigned long		nfsd_drc_slotsize_sum;
 extern atomic_t			nfsd_th_cnt;		/* number of available threads */
 
 extern const struct seq_operations nfs_exports_op;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 49e2f32102ab..e596eb5d10db 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -78,7 +78,7 @@ DEFINE_MUTEX(nfsd_mutex);
  */
 DEFINE_SPINLOCK(nfsd_drc_lock);
 unsigned long	nfsd_drc_max_mem;
-unsigned long	nfsd_drc_mem_used;
+unsigned long	nfsd_drc_slotsize_sum;
 
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 static const struct svc_version *localio_versions[] = {
@@ -589,10 +589,13 @@ void nfsd_reset_versions(struct nfsd_net *nn)
  */
 static void set_max_drc(void)
 {
+	if (nfsd_drc_max_mem)
+		return;
+
 	#define NFSD_DRC_SIZE_SHIFT	7
 	nfsd_drc_max_mem = (nr_free_buffer_pages()
 					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
-	nfsd_drc_mem_used = 0;
+	nfsd_drc_slotsize_sum = 0;
 	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 79c743c01a47..fe71ae3c577b 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -214,7 +214,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
 /* Maximum number of slots per session. 160 is useful for long haul TCP */
 #define NFSD_MAX_SLOTS_PER_SESSION     160
 /* Maximum  session per slot cache size */
-#define NFSD_SLOT_CACHE_SIZE		2048
+#define NFSD_SLOT_CACHE_SIZE		2048UL
 /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
 #define NFSD_CACHE_SIZE_SLOTS_PER_SESSION	32
 #define NFSD_MAX_MEM_PER_SESSION  \
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 2a21a7662e03..71b87190a4a6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -575,9 +575,7 @@ struct nfsd4_sequence {
 	u32			slotid;			/* request/response */
 	u32			maxslots;		/* request/response */
 	u32			cachethis;		/* request */
-#if 0
 	u32			target_maxslots;	/* response */
-#endif /* not yet */
 	u32			status_flags;		/* response */
 };
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations.
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
                   ` (2 preceding siblings ...)
  2024-10-23  2:37 ` [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits NeilBrown
@ 2024-10-23  2:37 ` NeilBrown
  2024-10-23 12:08   ` Jeff Layton
  2024-10-23  2:37 ` [PATCH 5/6] sunrpc: remove all connection limit configuration NeilBrown
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2024-10-23  2:37 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

The heuristic for limiting the number of incoming connections to nfsd
currently uses sv_nrthreads - allowing more connections if more threads
were configured.

A future patch will allow number of threads to grow dynamically so that
there will be no need to configure sv_nrthreads.  So we need a different
solution for limiting connections.

It isn't clear what problem is solved by limiting connections (as
mentioned in a code comment) but the most likely problem is a connection
storm - many connections that are not doing productive work.  These will
be closed after about 6 minutes already but it might help to slow down a
storm.

This patch adds a per-connection flag XPT_PEER_VALID which indicates
that the peer has presented a filehandle for which it has some sort of
access.  i.e the peer is known to be trusted in some way.  We now only
count connections which have NOT been determined to be valid.  There
should be relative few of these at any given time.

If the number of non-validated peer exceed a limit - currently 64 - we
close the oldest non-validated peer to avoid having too many of these
useless connections.

Note that this patch significantly changes the meaning of the various
configuration parameters for "max connections".  The next patch will
remove all of these.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/callback.c               |  4 ----
 fs/nfs/callback_xdr.c           |  1 +
 fs/nfsd/netns.h                 |  4 ++--
 fs/nfsd/nfsfh.c                 |  2 ++
 include/linux/sunrpc/svc.h      |  2 +-
 include/linux/sunrpc/svc_xprt.h | 15 +++++++++++++++
 net/sunrpc/svc_xprt.c           | 33 +++++++++++++++++----------------
 7 files changed, 38 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 6cf92498a5ac..86bdc7d23fb9 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -211,10 +211,6 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
 		return ERR_PTR(-ENOMEM);
 	}
 	cb_info->serv = serv;
-	/* As there is only one thread we need to over-ride the
-	 * default maximum of 80 connections
-	 */
-	serv->sv_maxconn = 1024;
 	dprintk("nfs_callback_create_svc: service created\n");
 	return serv;
 }
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index fdeb0b34a3d3..4254ba3ee7c5 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -984,6 +984,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp)
 			nfs_put_client(cps.clp);
 			goto out_invalidcred;
 		}
+		svc_xprt_set_valid(rqstp->rq_xprt);
 	}
 
 	cps.minorversion = hdr_arg.minorversion;
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 26f7b34d1a03..a05a45bb1978 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -129,8 +129,8 @@ struct nfsd_net {
 	unsigned char writeverf[8];
 
 	/*
-	 * Max number of connections this nfsd container will allow. Defaults
-	 * to '0' which is means that it bases this on the number of threads.
+	 * Max number of non-validated connections this nfsd container
+	 * will allow.  Defaults to '0' gets mapped to 64.
 	 */
 	unsigned int max_connections;
 
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 40ad58a6a036..2f44de99f709 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -383,6 +383,8 @@ __fh_verify(struct svc_rqst *rqstp,
 		goto out;
 
 skip_pseudoflavor_check:
+	svc_xprt_set_valid(rqstp->rq_xprt);
+
 	/* Finally, check access permissions. */
 	error = nfsd_permission(cred, exp, dentry, access);
 out:
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index e68fecf6eab5..617ebfff2f30 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -81,7 +81,7 @@ struct svc_serv {
 	unsigned int		sv_xdrsize;	/* XDR buffer size */
 	struct list_head	sv_permsocks;	/* all permanent sockets */
 	struct list_head	sv_tempsocks;	/* all temporary sockets */
-	int			sv_tmpcnt;	/* count of temporary sockets */
+	int			sv_tmpcnt;	/* count of temporary "valid" sockets */
 	struct timer_list	sv_temptimer;	/* timer for aging temporary sockets */
 
 	char *			sv_name;	/* service name */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 0981e35a9fed..35929a7727c7 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -99,8 +99,23 @@ enum {
 	XPT_HANDSHAKE,		/* xprt requests a handshake */
 	XPT_TLS_SESSION,	/* transport-layer security established */
 	XPT_PEER_AUTH,		/* peer has been authenticated */
+	XPT_PEER_VALID,		/* peer has presented a filehandle that
+				 * it has access to.  It is NOT counted
+				 * in ->sv_tmpcnt.
+				 */
 };
 
+static inline void svc_xprt_set_valid(struct svc_xprt *xpt)
+{
+	if (test_bit(XPT_TEMP, &xpt->xpt_flags) &&
+	    !test_and_set_bit(XPT_PEER_VALID, &xpt->xpt_flags)) {
+		struct svc_serv *serv = xpt->xpt_server;
+		spin_lock(&serv->sv_lock);
+		serv->sv_tmpcnt -= 1;
+		spin_unlock(&serv->sv_lock);
+	}
+}
+
 static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
 {
 	spin_lock(&xpt->xpt_lock);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 43c57124de52..ff5b8bb8a88f 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -606,7 +606,8 @@ int svc_port_is_privileged(struct sockaddr *sin)
 }
 
 /*
- * Make sure that we don't have too many active connections. If we have,
+ * Make sure that we don't have too many connections that have not yet
+ * demonstrated that they have access the the NFS server. If we have,
  * something must be dropped. It's not clear what will happen if we allow
  * "too many" connections, but when dealing with network-facing software,
  * we have to code defensively. Here we do that by imposing hard limits.
@@ -625,27 +626,26 @@ int svc_port_is_privileged(struct sockaddr *sin)
  */
 static void svc_check_conn_limits(struct svc_serv *serv)
 {
-	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn :
-				(serv->sv_nrthreads+3) * 20;
+	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn : 64;
 
 	if (serv->sv_tmpcnt > limit) {
-		struct svc_xprt *xprt = NULL;
+		struct svc_xprt *xprt = NULL, *xprti;
 		spin_lock_bh(&serv->sv_lock);
 		if (!list_empty(&serv->sv_tempsocks)) {
-			/* Try to help the admin */
-			net_notice_ratelimited("%s: too many open connections, consider increasing the %s\n",
-					       serv->sv_name, serv->sv_maxconn ?
-					       "max number of connections" :
-					       "number of threads");
 			/*
 			 * Always select the oldest connection. It's not fair,
-			 * but so is life
+			 * but nor is life.
 			 */
-			xprt = list_entry(serv->sv_tempsocks.prev,
-					  struct svc_xprt,
-					  xpt_list);
-			set_bit(XPT_CLOSE, &xprt->xpt_flags);
-			svc_xprt_get(xprt);
+			list_for_each_entry_reverse(xprti, &serv->sv_tempsocks,
+						    xpt_list)
+			{
+				if (!test_bit(XPT_PEER_VALID, &xprti->xpt_flags)) {
+					xprt = xprti;
+					set_bit(XPT_CLOSE, &xprt->xpt_flags);
+					svc_xprt_get(xprt);
+					break;
+				}
+			}
 		}
 		spin_unlock_bh(&serv->sv_lock);
 
@@ -1039,7 +1039,8 @@ static void svc_delete_xprt(struct svc_xprt *xprt)
 
 	spin_lock_bh(&serv->sv_lock);
 	list_del_init(&xprt->xpt_list);
-	if (test_bit(XPT_TEMP, &xprt->xpt_flags))
+	if (test_bit(XPT_TEMP, &xprt->xpt_flags) &&
+	    !test_bit(XPT_PEER_VALID, &xprt->xpt_flags))
 		serv->sv_tmpcnt--;
 	spin_unlock_bh(&serv->sv_lock);
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/6] sunrpc: remove all connection limit configuration
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
                   ` (3 preceding siblings ...)
  2024-10-23  2:37 ` [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
@ 2024-10-23  2:37 ` NeilBrown
  2024-10-23 12:50   ` Jeff Layton
  2024-10-23  2:37 ` [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2024-10-23  2:37 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

Now that the connection limit only apply to unconfirmed connections,
there is no need to configure it.  So remove all the configuration and
fix the number of unconfirmed connections as always 64 - which is
now given a name: XPT_MAX_TMP_CONN

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/lockd/svc.c                  |  8 -------
 fs/nfsd/netns.h                 |  6 -----
 fs/nfsd/nfsctl.c                | 42 ---------------------------------
 fs/nfsd/nfssvc.c                |  5 ----
 include/linux/sunrpc/svc.h      |  4 ----
 include/linux/sunrpc/svc_xprt.h |  6 +++++
 net/sunrpc/svc_xprt.c           |  8 +------
 7 files changed, 7 insertions(+), 72 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 4ec22c2f2ea3..7ded57ec3a60 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -70,9 +70,6 @@ static unsigned long		nlm_grace_period;
 unsigned long			nlm_timeout = LOCKD_DFLT_TIMEO;
 static int			nlm_udpport, nlm_tcpport;
 
-/* RLIM_NOFILE defaults to 1024. That seems like a reasonable default here. */
-static unsigned int		nlm_max_connections = 1024;
-
 /*
  * Constants needed for the sysctl interface.
  */
@@ -136,9 +133,6 @@ lockd(void *vrqstp)
 	 * NFS mount or NFS daemon has gone away.
 	 */
 	while (!svc_thread_should_stop(rqstp)) {
-		/* update sv_maxconn if it has changed */
-		rqstp->rq_server->sv_maxconn = nlm_max_connections;
-
 		nlmsvc_retry_blocked(rqstp);
 		svc_recv(rqstp);
 	}
@@ -340,7 +334,6 @@ static int lockd_get(void)
 		return -ENOMEM;
 	}
 
-	serv->sv_maxconn = nlm_max_connections;
 	error = svc_set_num_threads(serv, NULL, 1);
 	if (error < 0) {
 		svc_destroy(&serv);
@@ -542,7 +535,6 @@ module_param_call(nlm_udpport, param_set_port, param_get_int,
 module_param_call(nlm_tcpport, param_set_port, param_get_int,
 		  &nlm_tcpport, 0644);
 module_param(nsm_use_hostnames, bool, 0644);
-module_param(nlm_max_connections, uint, 0644);
 
 static int lockd_init_net(struct net *net)
 {
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index a05a45bb1978..4a07b8d0837b 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -128,12 +128,6 @@ struct nfsd_net {
 	seqlock_t writeverf_lock;
 	unsigned char writeverf[8];
 
-	/*
-	 * Max number of non-validated connections this nfsd container
-	 * will allow.  Defaults to '0' gets mapped to 64.
-	 */
-	unsigned int max_connections;
-
 	u32 clientid_base;
 	u32 clientid_counter;
 	u32 clverifier_counter;
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 3adbc05ebaac..95ea4393305b 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -48,7 +48,6 @@ enum {
 	NFSD_Versions,
 	NFSD_Ports,
 	NFSD_MaxBlkSize,
-	NFSD_MaxConnections,
 	NFSD_Filecache,
 	NFSD_Leasetime,
 	NFSD_Gracetime,
@@ -68,7 +67,6 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
 static ssize_t write_versions(struct file *file, char *buf, size_t size);
 static ssize_t write_ports(struct file *file, char *buf, size_t size);
 static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
-static ssize_t write_maxconn(struct file *file, char *buf, size_t size);
 #ifdef CONFIG_NFSD_V4
 static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
 static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
@@ -87,7 +85,6 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
 	[NFSD_Versions] = write_versions,
 	[NFSD_Ports] = write_ports,
 	[NFSD_MaxBlkSize] = write_maxblksize,
-	[NFSD_MaxConnections] = write_maxconn,
 #ifdef CONFIG_NFSD_V4
 	[NFSD_Leasetime] = write_leasetime,
 	[NFSD_Gracetime] = write_gracetime,
@@ -902,44 +899,6 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
 							nfsd_max_blksize);
 }
 
-/*
- * write_maxconn - Set or report the current max number of connections
- *
- * Input:
- *			buf:		ignored
- *			size:		zero
- * OR
- *
- * Input:
- *			buf:		C string containing an unsigned
- *					integer value representing the new
- *					number of max connections
- *			size:		non-zero length of C string in @buf
- * Output:
- *	On success:	passed-in buffer filled with '\n'-terminated C string
- *			containing numeric value of max_connections setting
- *			for this net namespace;
- *			return code is the size in bytes of the string
- *	On error:	return code is zero or a negative errno value
- */
-static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
-{
-	char *mesg = buf;
-	struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
-	unsigned int maxconn = nn->max_connections;
-
-	if (size > 0) {
-		int rv = get_uint(&mesg, &maxconn);
-
-		if (rv)
-			return rv;
-		trace_nfsd_ctl_maxconn(netns(file), maxconn);
-		nn->max_connections = maxconn;
-	}
-
-	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxconn);
-}
-
 #ifdef CONFIG_NFSD_V4
 static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
 				  time64_t *time, struct nfsd_net *nn)
@@ -1372,7 +1331,6 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
 		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
 		[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
-		[NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO},
 		[NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
 #ifdef CONFIG_NFSD_V4
 		[NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index e596eb5d10db..1a172a7e9e0c 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -671,7 +671,6 @@ int nfsd_create_serv(struct net *net)
 	if (serv == NULL)
 		return -ENOMEM;
 
-	serv->sv_maxconn = nn->max_connections;
 	error = svc_bind(serv, net);
 	if (error < 0) {
 		svc_destroy(&serv);
@@ -957,11 +956,7 @@ nfsd(void *vrqstp)
 	 * The main request loop
 	 */
 	while (!svc_thread_should_stop(rqstp)) {
-		/* Update sv_maxconn if it has changed */
-		rqstp->rq_server->sv_maxconn = nn->max_connections;
-
 		svc_recv(rqstp);
-
 		nfsd_file_net_dispose(nn);
 	}
 
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 617ebfff2f30..9d288a673705 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -72,10 +72,6 @@ struct svc_serv {
 	spinlock_t		sv_lock;
 	unsigned int		sv_nprogs;	/* Number of sv_programs */
 	unsigned int		sv_nrthreads;	/* # of server threads */
-	unsigned int		sv_maxconn;	/* max connections allowed or
-						 * '0' causing max to be based
-						 * on number of threads. */
-
 	unsigned int		sv_max_payload;	/* datagram payload size */
 	unsigned int		sv_max_mesg;	/* max_payload + 1 page for overheads */
 	unsigned int		sv_xdrsize;	/* XDR buffer size */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 35929a7727c7..114051ad985a 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -105,6 +105,12 @@ enum {
 				 */
 };
 
+/*
+ * Maximum number of "tmp" connections - those without XPT_PEER_VALID -
+ * permitted on any service.
+ */
+#define XPT_MAX_TMP_CONN	64
+
 static inline void svc_xprt_set_valid(struct svc_xprt *xpt)
 {
 	if (test_bit(XPT_TEMP, &xpt->xpt_flags) &&
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index ff5b8bb8a88f..070bdeb50496 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -619,16 +619,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
  * The only somewhat efficient mechanism would be if drop old
  * connections from the same IP first. But right now we don't even
  * record the client IP in svc_sock.
- *
- * single-threaded services that expect a lot of clients will probably
- * need to set sv_maxconn to override the default value which is based
- * on the number of threads
  */
 static void svc_check_conn_limits(struct svc_serv *serv)
 {
-	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn : 64;
-
-	if (serv->sv_tmpcnt > limit) {
+	if (serv->sv_tmpcnt > XPT_MAX_TMP_CONN) {
 		struct svc_xprt *xprt = NULL, *xprti;
 		spin_lock_bh(&serv->sv_lock);
 		if (!list_empty(&serv->sv_tempsocks)) {
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
                   ` (4 preceding siblings ...)
  2024-10-23  2:37 ` [PATCH 5/6] sunrpc: remove all connection limit configuration NeilBrown
@ 2024-10-23  2:37 ` NeilBrown
  2024-10-23 13:32   ` Jeff Layton
  2024-10-30  6:35   ` kernel test robot
  2024-10-23 14:00 ` [PATCH 0/6] prepare for dynamic server thread management Chuck Lever
  2025-10-28 15:47 ` Jeff Layton
  7 siblings, 2 replies; 21+ messages in thread
From: NeilBrown @ 2024-10-23  2:37 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

New fields sp_nractual and sv_nractual track how many actual threads are
running.  sp_nrhtreads and sv_nrthreads will be the number that were
explicitly request.  Currently nractually == nrthreads.

sv_nractual is used for sizing UDP incoming socket space - in the rare
case that UDP is used.  This is because each thread might need to keep a
request in the skbs.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/sunrpc/svc.h |  6 ++++--
 net/sunrpc/svc.c           | 22 +++++++++++++++-------
 net/sunrpc/svcsock.c       |  2 +-
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 9d288a673705..3f2c90061b4a 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -36,7 +36,8 @@
 struct svc_pool {
 	unsigned int		sp_id;		/* pool id; also node id on NUMA */
 	struct lwq		sp_xprts;	/* pending transports */
-	unsigned int		sp_nrthreads;	/* # of threads in pool */
+	unsigned int		sp_nrthreads;	/* # of threads requested for pool */
+	unsigned int		sp_nractual;	/* # of threads running */
 	struct list_head	sp_all_threads;	/* all server threads */
 	struct llist_head	sp_idle_threads; /* idle server threads */
 
@@ -71,7 +72,8 @@ struct svc_serv {
 	struct svc_stat *	sv_stats;	/* RPC statistics */
 	spinlock_t		sv_lock;
 	unsigned int		sv_nprogs;	/* Number of sv_programs */
-	unsigned int		sv_nrthreads;	/* # of server threads */
+	unsigned int		sv_nrthreads;	/* # of server threads requested*/
+	unsigned int		sv_nractual;	/* # of running threads */
 	unsigned int		sv_max_payload;	/* datagram payload size */
 	unsigned int		sv_max_mesg;	/* max_payload + 1 page for overheads */
 	unsigned int		sv_xdrsize;	/* XDR buffer size */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index bd4f02b34f44..d332f9d3d875 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -781,8 +781,12 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
 	}
 
 	if (pool && pool->sp_nrthreads) {
-		set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
-		set_bit(SP_NEED_VICTIM, &pool->sp_flags);
+		if (pool->sp_nrthreads <= pool->sp_nractual) {
+			set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
+			set_bit(SP_NEED_VICTIM, &pool->sp_flags);
+			pool->sp_nractual -= 1;
+			serv->sv_nractual -= 1;
+		}
 		return pool;
 	}
 	return NULL;
@@ -803,6 +807,12 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
 		chosen_pool = svc_pool_next(serv, pool, &state);
 		node = svc_pool_map_get_node(chosen_pool->sp_id);
 
+		serv->sv_nrthreads += 1;
+		chosen_pool->sp_nrthreads += 1;
+
+		if (chosen_pool->sp_nrthreads <= chosen_pool->sp_nractual)
+			continue;
+
 		rqstp = svc_prepare_thread(serv, chosen_pool, node);
 		if (!rqstp)
 			return -ENOMEM;
@@ -812,8 +822,8 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
 			svc_exit_thread(rqstp);
 			return PTR_ERR(task);
 		}
-		serv->sv_nrthreads += 1;
-		chosen_pool->sp_nrthreads += 1;
+		serv->sv_nractual += 1;
+		chosen_pool->sp_nractual += 1;
 
 		rqstp->rq_task = task;
 		if (serv->sv_nrpools > 1)
@@ -850,6 +860,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
 			    TASK_IDLE);
 		nrservs++;
 	} while (nrservs < 0);
+	svc_sock_update_bufs(serv);
 	return 0;
 }
 
@@ -955,13 +966,10 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp)
 void
 svc_exit_thread(struct svc_rqst *rqstp)
 {
-	struct svc_serv	*serv = rqstp->rq_server;
 	struct svc_pool	*pool = rqstp->rq_pool;
 
 	list_del_rcu(&rqstp->rq_all);
 
-	svc_sock_update_bufs(serv);
-
 	svc_rqst_free(rqstp);
 
 	clear_and_wake_up_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 825ec5357691..191dbc648bd0 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -588,7 +588,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
 	     * provides an upper bound on the number of threads
 	     * which will access the socket.
 	     */
-	    svc_sock_setbufsize(svsk, serv->sv_nrthreads + 3);
+	    svc_sock_setbufsize(svsk, serv->sv_nractual + 3);
 
 	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 	err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
  2024-10-23  2:37 ` [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits NeilBrown
@ 2024-10-23 11:48   ` Jeff Layton
  2024-10-23 13:55   ` Chuck Lever
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff Layton @ 2024-10-23 11:48 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 2024-10-23 at 13:37 +1100, NeilBrown wrote:
> Currently per-client DRC slot limits (for v4.1+) are calculated when the
> client connects, and they are left unchanged.  So earlier clients can
> get a larger share when memory is tight.
> 
> The heuristic for choosing a number includes the number of configured
> server threads.  This is problematic for 2 reasons.
> 1/ sv_nrthreads is different in different net namespaces, but the
>    memory allocation is global across all namespaces.  So different
>    namespaces get treated differently without good reason.
> 2/ a future patch will auto-configure number of threads based on
>    load so that there will be no need to preconfigure a number.  This will
>    make the current heuristic even more arbitrary.
> 
> NFSv4.1 allows the number of slots to be varied dynamically - in the
> reply to each SEQUENCE op.  With this patch we provide a provisional
> upper limit in the EXCHANGE_ID reply which might end up being too big,
> and adjust it with each SEQUENCE reply.
> 
> The goal for when memory is tight is to allow each client to have a
> similar number of slots.  So clients that ask for larger slots get more
> memory.   This may not be ideal.  It could be changed later.
> 
> So we track the sum of the slot sizes of all active clients, and share
> memory out based on the ratio of the slot size for a given client with
> the sum of slot sizes.  We never allow more in a SEQUENCE reply than we
> did in the EXCHANGE_ID reply.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++---------------------
>  fs/nfsd/nfs4xdr.c   |  2 +-
>  fs/nfsd/nfsd.h      |  6 +++-
>  fs/nfsd/nfssvc.c    |  7 ++--
>  fs/nfsd/state.h     |  2 +-
>  fs/nfsd/xdr4.h      |  2 --
>  6 files changed, 56 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index ca6b5b52f77d..834e9aa12b82 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1909,44 +1909,26 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
>  }
>  
>  /*
> - * XXX: If we run out of reserved DRC memory we could (up to a point)
> - * re-negotiate active sessions and reduce their slot usage to make
> - * room for new connections. For now we just fail the create session.
> + * When a client connects it gets a max_requests number that would allow
> + * it to use 1/8 of the memory we think can reasonably be used for the DRC.
> + * Later in response to SEQUENCE operations we further limit that when there
> + * are more than 8 concurrent clients.
>   */
> -static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
> +static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
>  {
>  	u32 slotsize = slot_bytes(ca);
>  	u32 num = ca->maxreqs;
> -	unsigned long avail, total_avail;
> -	unsigned int scale_factor;
> +	unsigned long avail;
>  
>  	spin_lock(&nfsd_drc_lock);
> -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> -		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> -	else
> -		/* We have handed out more space than we chose in
> -		 * set_max_drc() to allow.  That isn't really a
> -		 * problem as long as that doesn't make us think we
> -		 * have lots more due to integer overflow.
> -		 */
> -		total_avail = 0;
> -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> -	/*
> -	 * Never use more than a fraction of the remaining memory,
> -	 * unless it's the only way to give this client a slot.
> -	 * The chosen fraction is either 1/8 or 1/number of threads,
> -	 * whichever is smaller.  This ensures there are adequate
> -	 * slots to support multiple clients per thread.
> -	 * Give the client one slot even if that would require
> -	 * over-allocation--it is better than failure.
> -	 */
> -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
>  
> -	avail = clamp_t(unsigned long, avail, slotsize,
> -			total_avail/scale_factor);
> -	num = min_t(int, num, avail / slotsize);
> -	num = max_t(int, num, 1);
> -	nfsd_drc_mem_used += num * slotsize;
> +	avail = min(NFSD_MAX_MEM_PER_SESSION,
> +		    nfsd_drc_max_mem / 8);
> +
> +	num = clamp_t(int, num, 1, avail / slotsize);
> +
> +	nfsd_drc_slotsize_sum += slotsize;
> +
>  	spin_unlock(&nfsd_drc_lock);
>  
>  	return num;
> @@ -1957,10 +1939,33 @@ static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
>  	int slotsize = slot_bytes(ca);
>  
>  	spin_lock(&nfsd_drc_lock);
> -	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
> +	nfsd_drc_slotsize_sum -= slotsize;
>  	spin_unlock(&nfsd_drc_lock);
>  }
>  
> +/*
> + * Report the number of slots that we would like the client to limit
> + * itself to.  When the number of clients is large, we start sharing
> + * memory so all clients get the same number of slots.
> + */
> +static unsigned int nfsd4_get_drc_slots(struct nfsd4_session *session)
> +{
> +	u32 slotsize = slot_bytes(&session->se_fchannel);
> +	unsigned long avail;
> +	unsigned long slotsize_sum = READ_ONCE(nfsd_drc_slotsize_sum);
> +
> +	if (slotsize_sum < slotsize)
> +		slotsize_sum = slotsize;
> +
> +	/* Find our share of avail mem across all active clients,
> +	 * then limit to 1/8 of total, and configured max
> +	 */
> +	avail = min3(nfsd_drc_max_mem * slotsize / slotsize_sum,
> +		     nfsd_drc_max_mem / 8,
> +		     NFSD_MAX_MEM_PER_SESSION);
> +	return max3(1UL, avail / slotsize, session->se_fchannel.maxreqs);
> +}
> +
>  static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
>  					   struct nfsd4_channel_attrs *battrs)
>  {
> @@ -3735,7 +3740,7 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
>  	 * Note that we always allow at least one slot, because our
>  	 * accounting is soft and provides no guarantees either way.
>  	 */
> -	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
> +	ca->maxreqs = nfsd4_get_drc_mem(ca);
>  
>  	return nfs_ok;
>  }
> @@ -4229,10 +4234,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	slot = session->se_slots[seq->slotid];
>  	dprintk("%s: slotid %d\n", __func__, seq->slotid);
>  
> -	/* We do not negotiate the number of slots yet, so set the
> -	 * maxslots to the session maxreqs which is used to encode
> -	 * sr_highest_slotid and the sr_target_slot id to maxslots */
> -	seq->maxslots = session->se_fchannel.maxreqs;
> +	/* Negotiate number of slots: set the target, and use the
> +	 * same for max as long as it doesn't decrease the max
> +	 * (that isn't allowed).
> +	 */
> +	seq->target_maxslots = nfsd4_get_drc_slots(session);
> +	seq->maxslots = max(seq->maxslots, seq->target_maxslots);
>  
>  	trace_nfsd_slot_seqid_sequence(clp, seq, slot);
>  	status = check_slot_seqid(seq->seqid, slot->sl_seqid,
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index f118921250c3..e4e706872e54 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4955,7 +4955,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
>  	if (nfserr != nfs_ok)
>  		return nfserr;
>  	/* sr_target_highest_slotid */
> -	nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
> +	nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
>  	if (nfserr != nfs_ok)
>  		return nfserr;
>  	/* sr_status_flags */
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 4b56ba1e8e48..33c9db3ee8b6 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -90,7 +90,11 @@ extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
>  extern struct mutex		nfsd_mutex;
>  extern spinlock_t		nfsd_drc_lock;
>  extern unsigned long		nfsd_drc_max_mem;
> -extern unsigned long		nfsd_drc_mem_used;
> +/* Storing the sum of the slot sizes for all active clients (across
> + * all net-namespaces) allows a share of total available memory to
> + * be allocaed to each client based on its slot size.

nit: "allocated"

> + */
> +extern unsigned long		nfsd_drc_slotsize_sum;
>  extern atomic_t			nfsd_th_cnt;		/* number of available threads */
>  
>  extern const struct seq_operations nfs_exports_op;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 49e2f32102ab..e596eb5d10db 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -78,7 +78,7 @@ DEFINE_MUTEX(nfsd_mutex);
>   */
>  DEFINE_SPINLOCK(nfsd_drc_lock);
>  unsigned long	nfsd_drc_max_mem;
> -unsigned long	nfsd_drc_mem_used;
> +unsigned long	nfsd_drc_slotsize_sum;
>  
>  #if IS_ENABLED(CONFIG_NFS_LOCALIO)
>  static const struct svc_version *localio_versions[] = {
> @@ -589,10 +589,13 @@ void nfsd_reset_versions(struct nfsd_net *nn)
>   */
>  static void set_max_drc(void)
>  {
> +	if (nfsd_drc_max_mem)
> +		return;
> +
>  	#define NFSD_DRC_SIZE_SHIFT	7

Not a comment on your patch, per-se, but, it would be nice to document
the above constant. 44d8660d3bb0a just says:

    To prevent a few CREATE_SESSIONs from consuming all of memory we set an
    upper limit based on nr_free_buffer_pages().  1/2^10 has been too
    limiting in practice; 1/2^7 is still less than one percent.

So I guess that's 1/128 of the available free pages? Unfortunately,
that commit doesn't spell out why that's the magical ratio.

>  	nfsd_drc_max_mem = (nr_free_buffer_pages()
>  					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
> -	nfsd_drc_mem_used = 0;
> +	nfsd_drc_slotsize_sum = 0;
>  	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
>  }
>  
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 79c743c01a47..fe71ae3c577b 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -214,7 +214,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
>  /* Maximum number of slots per session. 160 is useful for long haul TCP */
>  #define NFSD_MAX_SLOTS_PER_SESSION     160
>  /* Maximum  session per slot cache size */
> -#define NFSD_SLOT_CACHE_SIZE		2048
> +#define NFSD_SLOT_CACHE_SIZE		2048UL
>  /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
>  #define NFSD_CACHE_SIZE_SLOTS_PER_SESSION	32
>  #define NFSD_MAX_MEM_PER_SESSION  \
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 2a21a7662e03..71b87190a4a6 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -575,9 +575,7 @@ struct nfsd4_sequence {
>  	u32			slotid;			/* request/response */
>  	u32			maxslots;		/* request/response */
>  	u32			cachethis;		/* request */
> -#if 0
>  	u32			target_maxslots;	/* response */
> -#endif /* not yet */
>  	u32			status_flags;		/* response */
>  };
>  

Your rationale and new algorithm for this seems sound.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations.
  2024-10-23  2:37 ` [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
@ 2024-10-23 12:08   ` Jeff Layton
  2024-10-23 21:18     ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Layton @ 2024-10-23 12:08 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 2024-10-23 at 13:37 +1100, NeilBrown wrote:
> The heuristic for limiting the number of incoming connections to nfsd
> currently uses sv_nrthreads - allowing more connections if more threads
> were configured.
> 
> A future patch will allow number of threads to grow dynamically so that
> there will be no need to configure sv_nrthreads.  So we need a different
> solution for limiting connections.
> 
> It isn't clear what problem is solved by limiting connections (as
> mentioned in a code comment) but the most likely problem is a connection
> storm - many connections that are not doing productive work.  These will
> be closed after about 6 minutes already but it might help to slow down a
> storm.
> 
> This patch adds a per-connection flag XPT_PEER_VALID which indicates
> that the peer has presented a filehandle for which it has some sort of
> access.  i.e the peer is known to be trusted in some way.  We now only
> count connections which have NOT been determined to be valid.  There
> should be relative few of these at any given time.
> 
> If the number of non-validated peer exceed a limit - currently 64 - we
> close the oldest non-validated peer to avoid having too many of these
> useless connections.
> 
> Note that this patch significantly changes the meaning of the various
> configuration parameters for "max connections".  The next patch will
> remove all of these.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfs/callback.c               |  4 ----
>  fs/nfs/callback_xdr.c           |  1 +
>  fs/nfsd/netns.h                 |  4 ++--
>  fs/nfsd/nfsfh.c                 |  2 ++
>  include/linux/sunrpc/svc.h      |  2 +-
>  include/linux/sunrpc/svc_xprt.h | 15 +++++++++++++++
>  net/sunrpc/svc_xprt.c           | 33 +++++++++++++++++----------------
>  7 files changed, 38 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index 6cf92498a5ac..86bdc7d23fb9 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -211,10 +211,6 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
>  		return ERR_PTR(-ENOMEM);
>  	}
>  	cb_info->serv = serv;
> -	/* As there is only one thread we need to over-ride the
> -	 * default maximum of 80 connections
> -	 */
> -	serv->sv_maxconn = 1024;
>  	dprintk("nfs_callback_create_svc: service created\n");
>  	return serv;
>  }
> diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
> index fdeb0b34a3d3..4254ba3ee7c5 100644
> --- a/fs/nfs/callback_xdr.c
> +++ b/fs/nfs/callback_xdr.c
> @@ -984,6 +984,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp)
>  			nfs_put_client(cps.clp);
>  			goto out_invalidcred;
>  		}
> +		svc_xprt_set_valid(rqstp->rq_xprt);
>  	}
>  
>  	cps.minorversion = hdr_arg.minorversion;
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 26f7b34d1a03..a05a45bb1978 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -129,8 +129,8 @@ struct nfsd_net {
>  	unsigned char writeverf[8];
>  
>  	/*
> -	 * Max number of connections this nfsd container will allow. Defaults
> -	 * to '0' which is means that it bases this on the number of threads.
> +	 * Max number of non-validated connections this nfsd container
> +	 * will allow.  Defaults to '0' gets mapped to 64.
>  	 */
>  	unsigned int max_connections;
>  
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 40ad58a6a036..2f44de99f709 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -383,6 +383,8 @@ __fh_verify(struct svc_rqst *rqstp,
>  		goto out;
>  
>  skip_pseudoflavor_check:
> +	svc_xprt_set_valid(rqstp->rq_xprt);
> +

This makes a lot of sense, but I don't see where lockd sets
XPT_PEER_VALID with this patch. Does it need a call in
nlm_lookup_file() or someplace similar?

>  	/* Finally, check access permissions. */
>  	error = nfsd_permission(cred, exp, dentry, access);
>  out:
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index e68fecf6eab5..617ebfff2f30 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -81,7 +81,7 @@ struct svc_serv {
>  	unsigned int		sv_xdrsize;	/* XDR buffer size */
>  	struct list_head	sv_permsocks;	/* all permanent sockets */
>  	struct list_head	sv_tempsocks;	/* all temporary sockets */
> -	int			sv_tmpcnt;	/* count of temporary sockets */
> +	int			sv_tmpcnt;	/* count of temporary "valid" sockets */
>  	struct timer_list	sv_temptimer;	/* timer for aging temporary sockets */
>  
>  	char *			sv_name;	/* service name */
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index 0981e35a9fed..35929a7727c7 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -99,8 +99,23 @@ enum {
>  	XPT_HANDSHAKE,		/* xprt requests a handshake */
>  	XPT_TLS_SESSION,	/* transport-layer security established */
>  	XPT_PEER_AUTH,		/* peer has been authenticated */
> +	XPT_PEER_VALID,		/* peer has presented a filehandle that
> +				 * it has access to.  It is NOT counted
> +				 * in ->sv_tmpcnt.
> +				 */
>  };
>  
> +static inline void svc_xprt_set_valid(struct svc_xprt *xpt)
> +{
> +	if (test_bit(XPT_TEMP, &xpt->xpt_flags) &&
> +	    !test_and_set_bit(XPT_PEER_VALID, &xpt->xpt_flags)) {
> +		struct svc_serv *serv = xpt->xpt_server;
> +		spin_lock(&serv->sv_lock);
> +		serv->sv_tmpcnt -= 1;
> +		spin_unlock(&serv->sv_lock);
> +	}
> +}
> +
>  static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
>  {
>  	spin_lock(&xpt->xpt_lock);
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 43c57124de52..ff5b8bb8a88f 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -606,7 +606,8 @@ int svc_port_is_privileged(struct sockaddr *sin)
>  }
>  
>  /*
> - * Make sure that we don't have too many active connections. If we have,
> + * Make sure that we don't have too many connections that have not yet
> + * demonstrated that they have access the the NFS server. If we have,
>   * something must be dropped. It's not clear what will happen if we allow
>   * "too many" connections, but when dealing with network-facing software,
>   * we have to code defensively. Here we do that by imposing hard limits.
> @@ -625,27 +626,26 @@ int svc_port_is_privileged(struct sockaddr *sin)
>   */
>  static void svc_check_conn_limits(struct svc_serv *serv)
>  {
> -	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn :
> -				(serv->sv_nrthreads+3) * 20;
> +	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn : 64;
>  
>  	if (serv->sv_tmpcnt > limit) {
> -		struct svc_xprt *xprt = NULL;
> +		struct svc_xprt *xprt = NULL, *xprti;
>  		spin_lock_bh(&serv->sv_lock);
>  		if (!list_empty(&serv->sv_tempsocks)) {
> -			/* Try to help the admin */
> -			net_notice_ratelimited("%s: too many open connections, consider increasing the %s\n",
> -					       serv->sv_name, serv->sv_maxconn ?
> -					       "max number of connections" :
> -					       "number of threads");
>  			/*
>  			 * Always select the oldest connection. It's not fair,
> -			 * but so is life
> +			 * but nor is life.
>  			 */
> -			xprt = list_entry(serv->sv_tempsocks.prev,
> -					  struct svc_xprt,
> -					  xpt_list);
> -			set_bit(XPT_CLOSE, &xprt->xpt_flags);
> -			svc_xprt_get(xprt);
> +			list_for_each_entry_reverse(xprti, &serv->sv_tempsocks,
> +						    xpt_list)
> +			{
> +				if (!test_bit(XPT_PEER_VALID, &xprti->xpt_flags)) {
> +					xprt = xprti;
> +					set_bit(XPT_CLOSE, &xprt->xpt_flags);
> +					svc_xprt_get(xprt);
> +					break;
> +				}
> +			}
>  		}
>  		spin_unlock_bh(&serv->sv_lock);
>  
> @@ -1039,7 +1039,8 @@ static void svc_delete_xprt(struct svc_xprt *xprt)
>  
>  	spin_lock_bh(&serv->sv_lock);
>  	list_del_init(&xprt->xpt_list);
> -	if (test_bit(XPT_TEMP, &xprt->xpt_flags))
> +	if (test_bit(XPT_TEMP, &xprt->xpt_flags) &&
> +	    !test_bit(XPT_PEER_VALID, &xprt->xpt_flags))
>  		serv->sv_tmpcnt--;
>  	spin_unlock_bh(&serv->sv_lock);
>  

Other than the comment about lockd, I like this:

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] sunrpc: remove all connection limit configuration
  2024-10-23  2:37 ` [PATCH 5/6] sunrpc: remove all connection limit configuration NeilBrown
@ 2024-10-23 12:50   ` Jeff Layton
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Layton @ 2024-10-23 12:50 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 2024-10-23 at 13:37 +1100, NeilBrown wrote:
> Now that the connection limit only apply to unconfirmed connections,
> there is no need to configure it.  So remove all the configuration and
> fix the number of unconfirmed connections as always 64 - which is
> now given a name: XPT_MAX_TMP_CONN
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/lockd/svc.c                  |  8 -------
>  fs/nfsd/netns.h                 |  6 -----
>  fs/nfsd/nfsctl.c                | 42 ---------------------------------
>  fs/nfsd/nfssvc.c                |  5 ----
>  include/linux/sunrpc/svc.h      |  4 ----
>  include/linux/sunrpc/svc_xprt.h |  6 +++++
>  net/sunrpc/svc_xprt.c           |  8 +------
>  7 files changed, 7 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 4ec22c2f2ea3..7ded57ec3a60 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -70,9 +70,6 @@ static unsigned long		nlm_grace_period;
>  unsigned long			nlm_timeout = LOCKD_DFLT_TIMEO;
>  static int			nlm_udpport, nlm_tcpport;
>  
> -/* RLIM_NOFILE defaults to 1024. That seems like a reasonable default here. */
> -static unsigned int		nlm_max_connections = 1024;
> -
>  /*
>   * Constants needed for the sysctl interface.
>   */
> @@ -136,9 +133,6 @@ lockd(void *vrqstp)
>  	 * NFS mount or NFS daemon has gone away.
>  	 */
>  	while (!svc_thread_should_stop(rqstp)) {
> -		/* update sv_maxconn if it has changed */
> -		rqstp->rq_server->sv_maxconn = nlm_max_connections;
> -
>  		nlmsvc_retry_blocked(rqstp);
>  		svc_recv(rqstp);
>  	}
> @@ -340,7 +334,6 @@ static int lockd_get(void)
>  		return -ENOMEM;
>  	}
>  
> -	serv->sv_maxconn = nlm_max_connections;
>  	error = svc_set_num_threads(serv, NULL, 1);
>  	if (error < 0) {
>  		svc_destroy(&serv);
> @@ -542,7 +535,6 @@ module_param_call(nlm_udpport, param_set_port, param_get_int,
>  module_param_call(nlm_tcpport, param_set_port, param_get_int,
>  		  &nlm_tcpport, 0644);
>  module_param(nsm_use_hostnames, bool, 0644);
> -module_param(nlm_max_connections, uint, 0644);
>  
>  static int lockd_init_net(struct net *net)
>  {
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index a05a45bb1978..4a07b8d0837b 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -128,12 +128,6 @@ struct nfsd_net {
>  	seqlock_t writeverf_lock;
>  	unsigned char writeverf[8];
>  
> -	/*
> -	 * Max number of non-validated connections this nfsd container
> -	 * will allow.  Defaults to '0' gets mapped to 64.
> -	 */
> -	unsigned int max_connections;
> -
>  	u32 clientid_base;
>  	u32 clientid_counter;
>  	u32 clverifier_counter;
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 3adbc05ebaac..95ea4393305b 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -48,7 +48,6 @@ enum {
>  	NFSD_Versions,
>  	NFSD_Ports,
>  	NFSD_MaxBlkSize,
> -	NFSD_MaxConnections,
>  	NFSD_Filecache,
>  	NFSD_Leasetime,
>  	NFSD_Gracetime,
> @@ -68,7 +67,6 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
>  static ssize_t write_versions(struct file *file, char *buf, size_t size);
>  static ssize_t write_ports(struct file *file, char *buf, size_t size);
>  static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
> -static ssize_t write_maxconn(struct file *file, char *buf, size_t size);
>  #ifdef CONFIG_NFSD_V4
>  static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
>  static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
> @@ -87,7 +85,6 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
>  	[NFSD_Versions] = write_versions,
>  	[NFSD_Ports] = write_ports,
>  	[NFSD_MaxBlkSize] = write_maxblksize,
> -	[NFSD_MaxConnections] = write_maxconn,
>  #ifdef CONFIG_NFSD_V4
>  	[NFSD_Leasetime] = write_leasetime,
>  	[NFSD_Gracetime] = write_gracetime,
> @@ -902,44 +899,6 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
>  							nfsd_max_blksize);
>  }
>  
> -/*
> - * write_maxconn - Set or report the current max number of connections
> - *
> - * Input:
> - *			buf:		ignored
> - *			size:		zero
> - * OR
> - *
> - * Input:
> - *			buf:		C string containing an unsigned
> - *					integer value representing the new
> - *					number of max connections
> - *			size:		non-zero length of C string in @buf
> - * Output:
> - *	On success:	passed-in buffer filled with '\n'-terminated C string
> - *			containing numeric value of max_connections setting
> - *			for this net namespace;
> - *			return code is the size in bytes of the string
> - *	On error:	return code is zero or a negative errno value
> - */
> -static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
> -{
> -	char *mesg = buf;
> -	struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
> -	unsigned int maxconn = nn->max_connections;
> -
> -	if (size > 0) {
> -		int rv = get_uint(&mesg, &maxconn);
> -
> -		if (rv)
> -			return rv;
> -		trace_nfsd_ctl_maxconn(netns(file), maxconn);
> -		nn->max_connections = maxconn;
> -	}
> -
> -	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxconn);
> -}
> -
>  #ifdef CONFIG_NFSD_V4
>  static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
>  				  time64_t *time, struct nfsd_net *nn)
> @@ -1372,7 +1331,6 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
>  		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
>  		[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
> -		[NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO},
>  		[NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
>  #ifdef CONFIG_NFSD_V4
>  		[NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index e596eb5d10db..1a172a7e9e0c 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -671,7 +671,6 @@ int nfsd_create_serv(struct net *net)
>  	if (serv == NULL)
>  		return -ENOMEM;
>  
> -	serv->sv_maxconn = nn->max_connections;
>  	error = svc_bind(serv, net);
>  	if (error < 0) {
>  		svc_destroy(&serv);
> @@ -957,11 +956,7 @@ nfsd(void *vrqstp)
>  	 * The main request loop
>  	 */
>  	while (!svc_thread_should_stop(rqstp)) {
> -		/* Update sv_maxconn if it has changed */
> -		rqstp->rq_server->sv_maxconn = nn->max_connections;
> -
>  		svc_recv(rqstp);
> -
>  		nfsd_file_net_dispose(nn);
>  	}
>  
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 617ebfff2f30..9d288a673705 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -72,10 +72,6 @@ struct svc_serv {
>  	spinlock_t		sv_lock;
>  	unsigned int		sv_nprogs;	/* Number of sv_programs */
>  	unsigned int		sv_nrthreads;	/* # of server threads */
> -	unsigned int		sv_maxconn;	/* max connections allowed or
> -						 * '0' causing max to be based
> -						 * on number of threads. */
> -
>  	unsigned int		sv_max_payload;	/* datagram payload size */
>  	unsigned int		sv_max_mesg;	/* max_payload + 1 page for overheads */
>  	unsigned int		sv_xdrsize;	/* XDR buffer size */
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index 35929a7727c7..114051ad985a 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -105,6 +105,12 @@ enum {
>  				 */
>  };
>  
> +/*
> + * Maximum number of "tmp" connections - those without XPT_PEER_VALID -
> + * permitted on any service.
> + */
> +#define XPT_MAX_TMP_CONN	64
> +
>  static inline void svc_xprt_set_valid(struct svc_xprt *xpt)
>  {
>  	if (test_bit(XPT_TEMP, &xpt->xpt_flags) &&
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index ff5b8bb8a88f..070bdeb50496 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -619,16 +619,10 @@ int svc_port_is_privileged(struct sockaddr *sin)
>   * The only somewhat efficient mechanism would be if drop old
>   * connections from the same IP first. But right now we don't even
>   * record the client IP in svc_sock.
> - *
> - * single-threaded services that expect a lot of clients will probably
> - * need to set sv_maxconn to override the default value which is based
> - * on the number of threads
>   */
>  static void svc_check_conn_limits(struct svc_serv *serv)
>  {
> -	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn : 64;
> -
> -	if (serv->sv_tmpcnt > limit) {
> +	if (serv->sv_tmpcnt > XPT_MAX_TMP_CONN) {
>  		struct svc_xprt *xprt = NULL, *xprti;
>  		spin_lock_bh(&serv->sv_lock);
>  		if (!list_empty(&serv->sv_tempsocks)) {

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual
  2024-10-23  2:37 ` [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
@ 2024-10-23 13:32   ` Jeff Layton
  2024-10-30  6:35   ` kernel test robot
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff Layton @ 2024-10-23 13:32 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 2024-10-23 at 13:37 +1100, NeilBrown wrote:
> New fields sp_nractual and sv_nractual track how many actual threads are
> running.  sp_nrhtreads and sv_nrthreads will be the number that were
> explicitly request.  Currently nractually == nrthreads.
> 
> sv_nractual is used for sizing UDP incoming socket space - in the rare
> case that UDP is used.  This is because each thread might need to keep a
> request in the skbs.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  include/linux/sunrpc/svc.h |  6 ++++--
>  net/sunrpc/svc.c           | 22 +++++++++++++++-------
>  net/sunrpc/svcsock.c       |  2 +-
>  3 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 9d288a673705..3f2c90061b4a 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -36,7 +36,8 @@
>  struct svc_pool {
>  	unsigned int		sp_id;		/* pool id; also node id on NUMA */
>  	struct lwq		sp_xprts;	/* pending transports */
> -	unsigned int		sp_nrthreads;	/* # of threads in pool */
> +	unsigned int		sp_nrthreads;	/* # of threads requested for pool */
> +	unsigned int		sp_nractual;	/* # of threads running */
>  	struct list_head	sp_all_threads;	/* all server threads */
>  	struct llist_head	sp_idle_threads; /* idle server threads */
>  
> @@ -71,7 +72,8 @@ struct svc_serv {
>  	struct svc_stat *	sv_stats;	/* RPC statistics */
>  	spinlock_t		sv_lock;
>  	unsigned int		sv_nprogs;	/* Number of sv_programs */
> -	unsigned int		sv_nrthreads;	/* # of server threads */
> +	unsigned int		sv_nrthreads;	/* # of server threads requested*/
> +	unsigned int		sv_nractual;	/* # of running threads */
>  	unsigned int		sv_max_payload;	/* datagram payload size */
>  	unsigned int		sv_max_mesg;	/* max_payload + 1 page for overheads */
>  	unsigned int		sv_xdrsize;	/* XDR buffer size */
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index bd4f02b34f44..d332f9d3d875 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -781,8 +781,12 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
>  	}
>  
>  	if (pool && pool->sp_nrthreads) {
> -		set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> -		set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> +		if (pool->sp_nrthreads <= pool->sp_nractual) {
> +			set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> +			set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> +			pool->sp_nractual -= 1;
> +			serv->sv_nractual -= 1;
> +		}

This is a little strange. It decrements the thread counts before the
threads actually exit. So, these counts are really just aspirational at
this point. Not a bug, just a note to myself while I am going over
this.

>  		return pool;
>  	}
>  	return NULL;
> @@ -803,6 +807,12 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
>  		chosen_pool = svc_pool_next(serv, pool, &state);
>  		node = svc_pool_map_get_node(chosen_pool->sp_id);
>  
> +		serv->sv_nrthreads += 1;
> +		chosen_pool->sp_nrthreads += 1;
> +
> +		if (chosen_pool->sp_nrthreads <= chosen_pool->sp_nractual)
> +			continue;
> +
>  		rqstp = svc_prepare_thread(serv, chosen_pool, node);
>  		if (!rqstp)
>  			return -ENOMEM;
> @@ -812,8 +822,8 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
>  			svc_exit_thread(rqstp);
>  			return PTR_ERR(task);
>  		}
> -		serv->sv_nrthreads += 1;
> -		chosen_pool->sp_nrthreads += 1;
> +		serv->sv_nractual += 1;
> +		chosen_pool->sp_nractual += 1;
>  
>  		rqstp->rq_task = task;
>  		if (serv->sv_nrpools > 1)
> @@ -850,6 +860,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
>  			    TASK_IDLE);
>  		nrservs++;
>  	} while (nrservs < 0);
> +	svc_sock_update_bufs(serv);

Nice little change here. No need to do this on every thread exit --
just do it once when they're all done. This change probably should be
split into a separate patch.

>  	return 0;
>  }
>  
> @@ -955,13 +966,10 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp)
>  void
>  svc_exit_thread(struct svc_rqst *rqstp)
>  {
> -	struct svc_serv	*serv = rqstp->rq_server;
>  	struct svc_pool	*pool = rqstp->rq_pool;
>  
>  	list_del_rcu(&rqstp->rq_all);
>  
> -	svc_sock_update_bufs(serv);
> -
>  	svc_rqst_free(rqstp);
>  
>  	clear_and_wake_up_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 825ec5357691..191dbc648bd0 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -588,7 +588,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
>  	     * provides an upper bound on the number of threads
>  	     * which will access the socket.
>  	     */
> -	    svc_sock_setbufsize(svsk, serv->sv_nrthreads + 3);
> +	    svc_sock_setbufsize(svsk, serv->sv_nractual + 3);
>  
>  	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>  	err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,

Other than maybe splitting that one change into a separate patch, this
looks good to me.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients.
  2024-10-23  2:37 ` [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
@ 2024-10-23 13:42   ` Chuck Lever
  2024-10-23 21:47     ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Chuck Lever @ 2024-10-23 13:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, Oct 23, 2024 at 01:37:02PM +1100, NeilBrown wrote:
> If there are more non-courteous clients than the calculated limit, we
> should fail the request rather than report a soft failure and
> encouraging the client to retry indefinitely.

Discussion:

This change has the potential to cause behavior regressions. I'm not
sure how clients will behave (eg, what error is reported to
administrators) if EXCH_ID / SETCLIENTID returns SERVERFAULT.

I can't find a more suitable status code than SERVERFAULT, however.

There is also the question of whether CREATE_SESSION, which also
might fail when server resources are over-extended, could return a
similar hard failure. (CREATE_SESSION has other spec-mandated
restrictions around using NFS4ERR_DELAY, however).


> The only hard failure allowed for EXCHANGE_ID that doesn't clearly have
> some other meaning is NFS4ERR_SERVERFAULT.  So use that, but explain why
> in a comment at each place that it is returned.
> 
> If there are courteous clients which push us over the limit, then expedite
> their removal.
> 
> This is not known to have caused a problem is production use, but

The current DELAY behavior is known to trigger an (interruptible)
infinite loop when a small-memory server can't create a new session.
Thus I believe the infinite loop behavior is a real issue that has
been observed and reported.


> testing of lots of clients reports repeated NFS4ERR_DELAY responses
> which doesn't seem helpful.

No argument from me. NFSD takes the current approach for exactly the
reason you mention above: there isn't a good choice of status code
to return in this case.

Nit: the description might better explain how this change is related
to or required by on-demand thread allocation. It seems a little
orthogonal to me right now. NBD.


> Also remove an outdated comment - we do use a slab cache.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 56b261608af4..ca6b5b52f77d 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2212,21 +2212,20 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
>  	return 1;
>  }
>  
> -/* 
> - * XXX Should we use a slab cache ?
> - * This type of memory management is somewhat inefficient, but we use it
> - * anyway since SETCLIENTID is not a common operation.
> - */
>  static struct nfs4_client *alloc_client(struct xdr_netobj name,
>  				struct nfsd_net *nn)
>  {
>  	struct nfs4_client *clp;
>  	int i;
>  
> -	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
> +	if (atomic_read(&nn->nfs4_client_count) -
> +	    atomic_read(&nn->nfsd_courtesy_clients) >= nn->nfs4_max_clients)
> +		return ERR_PTR(-EUSERS);
> +
> +	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
> +	    atomic_read(&nn->nfsd_courtesy_clients) > 0)
>  		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> -		return NULL;
> -	}
> +
>  	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
>  	if (clp == NULL)
>  		return NULL;
> @@ -3121,8 +3120,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
>  	struct dentry *dentries[ARRAY_SIZE(client_files)];
>  
>  	clp = alloc_client(name, nn);
> -	if (clp == NULL)
> -		return NULL;
> +	if (IS_ERR_OR_NULL(clp))
> +		return clp;
>  
>  	ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred);
>  	if (ret) {
> @@ -3504,6 +3503,11 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	new = create_client(exid->clname, rqstp, &verf);
>  	if (new == NULL)
>  		return nfserr_jukebox;
> +	if (IS_ERR(new))
> +		/* Protocol has no specific error for "client limit reached".
> +		 * NFS4ERR_RESOURCE is not permitted for EXCHANGE_ID
> +		 */
> +		return nfserr_serverfault;
>  	status = copy_impl_id(new, exid);
>  	if (status)
>  		goto out_nolock;
> @@ -4422,6 +4426,12 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	new = create_client(clname, rqstp, &clverifier);
>  	if (new == NULL)
>  		return nfserr_jukebox;
> +	if (IS_ERR(new))
> +		/* Protocol has no specific error for "client limit reached".
> +		 * NFS4ERR_RESOURCE, while allowed for SETCLIENTID, implies
> +		 * that a smaller COMPOUND might be successful.
> +		 */
> +		return nfserr_serverfault;
>  	spin_lock(&nn->client_lock);
>  	conf = find_confirmed_client_by_name(&clname, nn);
>  	if (conf && client_has_state(conf)) {
> -- 
> 2.46.0
> 

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
  2024-10-23  2:37 ` [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits NeilBrown
  2024-10-23 11:48   ` Jeff Layton
@ 2024-10-23 13:55   ` Chuck Lever
  2024-10-23 16:34     ` Tom Talpey
  1 sibling, 1 reply; 21+ messages in thread
From: Chuck Lever @ 2024-10-23 13:55 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, Oct 23, 2024 at 01:37:03PM +1100, NeilBrown wrote:
> Currently per-client DRC slot limits (for v4.1+) are calculated when the
> client connects, and they are left unchanged.  So earlier clients can
> get a larger share when memory is tight.
> 
> The heuristic for choosing a number includes the number of configured
> server threads.  This is problematic for 2 reasons.
> 1/ sv_nrthreads is different in different net namespaces, but the
>    memory allocation is global across all namespaces.  So different
>    namespaces get treated differently without good reason.
> 2/ a future patch will auto-configure number of threads based on
>    load so that there will be no need to preconfigure a number.  This will
>    make the current heuristic even more arbitrary.
> 
> NFSv4.1 allows the number of slots to be varied dynamically - in the
> reply to each SEQUENCE op.  With this patch we provide a provisional
> upper limit in the EXCHANGE_ID reply which might end up being too big,
> and adjust it with each SEQUENCE reply.
> 
> The goal for when memory is tight is to allow each client to have a
> similar number of slots.  So clients that ask for larger slots get more
> memory.   This may not be ideal.  It could be changed later.
> 
> So we track the sum of the slot sizes of all active clients, and share
> memory out based on the ratio of the slot size for a given client with
> the sum of slot sizes.  We never allow more in a SEQUENCE reply than we
> did in the EXCHANGE_ID reply.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>

Dynamic session slot table sizing is one of our major "to-do" items
for NFSD, and it seems to have potential for breaking things if we
don't get it right. I'd prefer to prioritize getting this one merged
into nfsd-next first, it seems like an important change to have.


> ---
>  fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++---------------------
>  fs/nfsd/nfs4xdr.c   |  2 +-
>  fs/nfsd/nfsd.h      |  6 +++-
>  fs/nfsd/nfssvc.c    |  7 ++--
>  fs/nfsd/state.h     |  2 +-
>  fs/nfsd/xdr4.h      |  2 --
>  6 files changed, 56 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index ca6b5b52f77d..834e9aa12b82 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1909,44 +1909,26 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
>  }
>  
>  /*
> - * XXX: If we run out of reserved DRC memory we could (up to a point)
> - * re-negotiate active sessions and reduce their slot usage to make
> - * room for new connections. For now we just fail the create session.
> + * When a client connects it gets a max_requests number that would allow
> + * it to use 1/8 of the memory we think can reasonably be used for the DRC.
> + * Later in response to SEQUENCE operations we further limit that when there
> + * are more than 8 concurrent clients.
>   */
> -static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
> +static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
>  {
>  	u32 slotsize = slot_bytes(ca);
>  	u32 num = ca->maxreqs;
> -	unsigned long avail, total_avail;
> -	unsigned int scale_factor;
> +	unsigned long avail;
>  
>  	spin_lock(&nfsd_drc_lock);
> -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> -		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> -	else
> -		/* We have handed out more space than we chose in
> -		 * set_max_drc() to allow.  That isn't really a
> -		 * problem as long as that doesn't make us think we
> -		 * have lots more due to integer overflow.
> -		 */
> -		total_avail = 0;
> -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> -	/*
> -	 * Never use more than a fraction of the remaining memory,
> -	 * unless it's the only way to give this client a slot.
> -	 * The chosen fraction is either 1/8 or 1/number of threads,
> -	 * whichever is smaller.  This ensures there are adequate
> -	 * slots to support multiple clients per thread.
> -	 * Give the client one slot even if that would require
> -	 * over-allocation--it is better than failure.
> -	 */
> -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
>  
> -	avail = clamp_t(unsigned long, avail, slotsize,
> -			total_avail/scale_factor);
> -	num = min_t(int, num, avail / slotsize);
> -	num = max_t(int, num, 1);
> -	nfsd_drc_mem_used += num * slotsize;
> +	avail = min(NFSD_MAX_MEM_PER_SESSION,
> +		    nfsd_drc_max_mem / 8);
> +
> +	num = clamp_t(int, num, 1, avail / slotsize);
> +
> +	nfsd_drc_slotsize_sum += slotsize;
> +
>  	spin_unlock(&nfsd_drc_lock);
>  
>  	return num;
> @@ -1957,10 +1939,33 @@ static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
>  	int slotsize = slot_bytes(ca);
>  
>  	spin_lock(&nfsd_drc_lock);
> -	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
> +	nfsd_drc_slotsize_sum -= slotsize;
>  	spin_unlock(&nfsd_drc_lock);
>  }
>  
> +/*
> + * Report the number of slots that we would like the client to limit
> + * itself to.  When the number of clients is large, we start sharing
> + * memory so all clients get the same number of slots.
> + */
> +static unsigned int nfsd4_get_drc_slots(struct nfsd4_session *session)
> +{
> +	u32 slotsize = slot_bytes(&session->se_fchannel);
> +	unsigned long avail;
> +	unsigned long slotsize_sum = READ_ONCE(nfsd_drc_slotsize_sum);
> +
> +	if (slotsize_sum < slotsize)
> +		slotsize_sum = slotsize;
> +
> +	/* Find our share of avail mem across all active clients,
> +	 * then limit to 1/8 of total, and configured max
> +	 */
> +	avail = min3(nfsd_drc_max_mem * slotsize / slotsize_sum,
> +		     nfsd_drc_max_mem / 8,
> +		     NFSD_MAX_MEM_PER_SESSION);
> +	return max3(1UL, avail / slotsize, session->se_fchannel.maxreqs);
> +}
> +
>  static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
>  					   struct nfsd4_channel_attrs *battrs)
>  {
> @@ -3735,7 +3740,7 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
>  	 * Note that we always allow at least one slot, because our
>  	 * accounting is soft and provides no guarantees either way.
>  	 */
> -	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
> +	ca->maxreqs = nfsd4_get_drc_mem(ca);
>  
>  	return nfs_ok;
>  }
> @@ -4229,10 +4234,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	slot = session->se_slots[seq->slotid];
>  	dprintk("%s: slotid %d\n", __func__, seq->slotid);
>  
> -	/* We do not negotiate the number of slots yet, so set the
> -	 * maxslots to the session maxreqs which is used to encode
> -	 * sr_highest_slotid and the sr_target_slot id to maxslots */
> -	seq->maxslots = session->se_fchannel.maxreqs;
> +	/* Negotiate number of slots: set the target, and use the
> +	 * same for max as long as it doesn't decrease the max
> +	 * (that isn't allowed).
> +	 */
> +	seq->target_maxslots = nfsd4_get_drc_slots(session);
> +	seq->maxslots = max(seq->maxslots, seq->target_maxslots);
>  
>  	trace_nfsd_slot_seqid_sequence(clp, seq, slot);
>  	status = check_slot_seqid(seq->seqid, slot->sl_seqid,
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index f118921250c3..e4e706872e54 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4955,7 +4955,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
>  	if (nfserr != nfs_ok)
>  		return nfserr;
>  	/* sr_target_highest_slotid */
> -	nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
> +	nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
>  	if (nfserr != nfs_ok)
>  		return nfserr;
>  	/* sr_status_flags */
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 4b56ba1e8e48..33c9db3ee8b6 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -90,7 +90,11 @@ extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
>  extern struct mutex		nfsd_mutex;
>  extern spinlock_t		nfsd_drc_lock;
>  extern unsigned long		nfsd_drc_max_mem;
> -extern unsigned long		nfsd_drc_mem_used;
> +/* Storing the sum of the slot sizes for all active clients (across
> + * all net-namespaces) allows a share of total available memory to
> + * be allocaed to each client based on its slot size.
> + */
> +extern unsigned long		nfsd_drc_slotsize_sum;
>  extern atomic_t			nfsd_th_cnt;		/* number of available threads */
>  
>  extern const struct seq_operations nfs_exports_op;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 49e2f32102ab..e596eb5d10db 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -78,7 +78,7 @@ DEFINE_MUTEX(nfsd_mutex);
>   */
>  DEFINE_SPINLOCK(nfsd_drc_lock);
>  unsigned long	nfsd_drc_max_mem;
> -unsigned long	nfsd_drc_mem_used;
> +unsigned long	nfsd_drc_slotsize_sum;
>  
>  #if IS_ENABLED(CONFIG_NFS_LOCALIO)
>  static const struct svc_version *localio_versions[] = {
> @@ -589,10 +589,13 @@ void nfsd_reset_versions(struct nfsd_net *nn)
>   */
>  static void set_max_drc(void)
>  {
> +	if (nfsd_drc_max_mem)
> +		return;
> +
>  	#define NFSD_DRC_SIZE_SHIFT	7
>  	nfsd_drc_max_mem = (nr_free_buffer_pages()
>  					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
> -	nfsd_drc_mem_used = 0;
> +	nfsd_drc_slotsize_sum = 0;
>  	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
>  }
>  
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 79c743c01a47..fe71ae3c577b 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -214,7 +214,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
>  /* Maximum number of slots per session. 160 is useful for long haul TCP */
>  #define NFSD_MAX_SLOTS_PER_SESSION     160
>  /* Maximum  session per slot cache size */
> -#define NFSD_SLOT_CACHE_SIZE		2048
> +#define NFSD_SLOT_CACHE_SIZE		2048UL
>  /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
>  #define NFSD_CACHE_SIZE_SLOTS_PER_SESSION	32
>  #define NFSD_MAX_MEM_PER_SESSION  \
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 2a21a7662e03..71b87190a4a6 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -575,9 +575,7 @@ struct nfsd4_sequence {
>  	u32			slotid;			/* request/response */
>  	u32			maxslots;		/* request/response */
>  	u32			cachethis;		/* request */
> -#if 0
>  	u32			target_maxslots;	/* response */
> -#endif /* not yet */
>  	u32			status_flags;		/* response */
>  };
>  
> -- 
> 2.46.0
> 

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/6] prepare for dynamic server thread management
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
                   ` (5 preceding siblings ...)
  2024-10-23  2:37 ` [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
@ 2024-10-23 14:00 ` Chuck Lever
  2025-10-28 15:47 ` Jeff Layton
  7 siblings, 0 replies; 21+ messages in thread
From: Chuck Lever @ 2024-10-23 14:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jeff Layton, linux-nfs, okorniev, Dai Ngo, Tom Talpey

On Wed, Oct 23, 2024 at 01:37:00PM +1100, NeilBrown wrote:
> These patches prepare the way for demand-based adjustment of the number
> of server threads.  They primarily remove some places there the
> configured thread count is used to configure other things.
> 
> With these in place only two more patches are needed to have demand
> based thread count.  The details of how to configure this need to be
> discussed to ensure we have considered all perspectives, and I would
> rather than happen in the context of two patches, not in the context of
> 8.  So I'm sending these first in the hope that will land with minimal
> fuss.  Once they do land I'll send the remainder (which you have already
> seen) and will look forward to a fruitful discussion.

3/6 seems like the star of this show, at least it's the change that
is most important to me, and could have the most potential for
introducing behavior regressions.

I know this will seem like I'm introducing further arbitrary delay,
but I've found that testing and soak time is still valuable for
reducing risk and disruption.

So, for the moment, can we focus on the details of getting 3/6 into
nfsd-next?


> Thanks,
> NeilBrown
> 
>  [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads.
>  [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there
>  [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
>  [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting
>  [PATCH 5/6] sunrpc: remove all connection limit configuration
>  [PATCH 6/6] sunrpc: introduce possibility that requested number of

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
  2024-10-23 13:55   ` Chuck Lever
@ 2024-10-23 16:34     ` Tom Talpey
  2024-10-23 21:53       ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Talpey @ 2024-10-23 16:34 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown; +Cc: Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo

On 10/23/2024 9:55 AM, Chuck Lever wrote:
> On Wed, Oct 23, 2024 at 01:37:03PM +1100, NeilBrown wrote:
>> Currently per-client DRC slot limits (for v4.1+) are calculated when the
>> client connects, and they are left unchanged.  So earlier clients can
>> get a larger share when memory is tight.
>>
>> The heuristic for choosing a number includes the number of configured
>> server threads.  This is problematic for 2 reasons.
>> 1/ sv_nrthreads is different in different net namespaces, but the
>>     memory allocation is global across all namespaces.  So different
>>     namespaces get treated differently without good reason.
>> 2/ a future patch will auto-configure number of threads based on
>>     load so that there will be no need to preconfigure a number.  This will
>>     make the current heuristic even more arbitrary.
>>
>> NFSv4.1 allows the number of slots to be varied dynamically - in the
>> reply to each SEQUENCE op.  With this patch we provide a provisional
>> upper limit in the EXCHANGE_ID reply which might end up being too big,
>> and adjust it with each SEQUENCE reply.
>>
>> The goal for when memory is tight is to allow each client to have a
>> similar number of slots.  So clients that ask for larger slots get more
>> memory.   This may not be ideal.  It could be changed later.
>>
>> So we track the sum of the slot sizes of all active clients, and share
>> memory out based on the ratio of the slot size for a given client with
>> the sum of slot sizes.  We never allow more in a SEQUENCE reply than we
>> did in the EXCHANGE_ID reply.
>>
>> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> Dynamic session slot table sizing is one of our major "to-do" items
> for NFSD, and it seems to have potential for breaking things if we
> don't get it right. I'd prefer to prioritize getting this one merged
> into nfsd-next first, it seems like an important change to have.

I agree but I do have one comment:

+ * Report the number of slots that we would like the client to limit
+ * itself to.  When the number of clients is large, we start sharing
+ * memory so all clients get the same number of slots.

That second sentence is a policy that is undoubtedly going to change
someday. Also, the "number of clients is large" is not exactly what's
being checked here, it's only looking at the current demand on whatever
the shared pool it. I'd just delete the second sentence.

Don't forget that the v4.1+ DRC doesn't have to be preallocated. By
design it's dynamic, for example if the client never performs any
non-idempotent operations, it can be completely null. Personally
I'd love to see that implemented someday.

Tom.

> 
> 
>> ---
>>   fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++---------------------
>>   fs/nfsd/nfs4xdr.c   |  2 +-
>>   fs/nfsd/nfsd.h      |  6 +++-
>>   fs/nfsd/nfssvc.c    |  7 ++--
>>   fs/nfsd/state.h     |  2 +-
>>   fs/nfsd/xdr4.h      |  2 --
>>   6 files changed, 56 insertions(+), 44 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index ca6b5b52f77d..834e9aa12b82 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1909,44 +1909,26 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
>>   }
>>   
>>   /*
>> - * XXX: If we run out of reserved DRC memory we could (up to a point)
>> - * re-negotiate active sessions and reduce their slot usage to make
>> - * room for new connections. For now we just fail the create session.
>> + * When a client connects it gets a max_requests number that would allow
>> + * it to use 1/8 of the memory we think can reasonably be used for the DRC.
>> + * Later in response to SEQUENCE operations we further limit that when there
>> + * are more than 8 concurrent clients.
>>    */
>> -static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
>> +static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
>>   {
>>   	u32 slotsize = slot_bytes(ca);
>>   	u32 num = ca->maxreqs;
>> -	unsigned long avail, total_avail;
>> -	unsigned int scale_factor;
>> +	unsigned long avail;
>>   
>>   	spin_lock(&nfsd_drc_lock);
>> -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
>> -		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
>> -	else
>> -		/* We have handed out more space than we chose in
>> -		 * set_max_drc() to allow.  That isn't really a
>> -		 * problem as long as that doesn't make us think we
>> -		 * have lots more due to integer overflow.
>> -		 */
>> -		total_avail = 0;
>> -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
>> -	/*
>> -	 * Never use more than a fraction of the remaining memory,
>> -	 * unless it's the only way to give this client a slot.
>> -	 * The chosen fraction is either 1/8 or 1/number of threads,
>> -	 * whichever is smaller.  This ensures there are adequate
>> -	 * slots to support multiple clients per thread.
>> -	 * Give the client one slot even if that would require
>> -	 * over-allocation--it is better than failure.
>> -	 */
>> -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
>>   
>> -	avail = clamp_t(unsigned long, avail, slotsize,
>> -			total_avail/scale_factor);
>> -	num = min_t(int, num, avail / slotsize);
>> -	num = max_t(int, num, 1);
>> -	nfsd_drc_mem_used += num * slotsize;
>> +	avail = min(NFSD_MAX_MEM_PER_SESSION,
>> +		    nfsd_drc_max_mem / 8);
>> +
>> +	num = clamp_t(int, num, 1, avail / slotsize);
>> +
>> +	nfsd_drc_slotsize_sum += slotsize;
>> +
>>   	spin_unlock(&nfsd_drc_lock);
>>   
>>   	return num;
>> @@ -1957,10 +1939,33 @@ static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
>>   	int slotsize = slot_bytes(ca);
>>   
>>   	spin_lock(&nfsd_drc_lock);
>> -	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
>> +	nfsd_drc_slotsize_sum -= slotsize;
>>   	spin_unlock(&nfsd_drc_lock);
>>   }
>>   
>> +/*
>> + * Report the number of slots that we would like the client to limit
>> + * itself to.  When the number of clients is large, we start sharing
>> + * memory so all clients get the same number of slots.
>> + */
>> +static unsigned int nfsd4_get_drc_slots(struct nfsd4_session *session)
>> +{
>> +	u32 slotsize = slot_bytes(&session->se_fchannel);
>> +	unsigned long avail;
>> +	unsigned long slotsize_sum = READ_ONCE(nfsd_drc_slotsize_sum);
>> +
>> +	if (slotsize_sum < slotsize)
>> +		slotsize_sum = slotsize;
>> +
>> +	/* Find our share of avail mem across all active clients,
>> +	 * then limit to 1/8 of total, and configured max
>> +	 */
>> +	avail = min3(nfsd_drc_max_mem * slotsize / slotsize_sum,
>> +		     nfsd_drc_max_mem / 8,
>> +		     NFSD_MAX_MEM_PER_SESSION);
>> +	return max3(1UL, avail / slotsize, session->se_fchannel.maxreqs);
>> +}
>> +
>>   static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
>>   					   struct nfsd4_channel_attrs *battrs)
>>   {
>> @@ -3735,7 +3740,7 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
>>   	 * Note that we always allow at least one slot, because our
>>   	 * accounting is soft and provides no guarantees either way.
>>   	 */
>> -	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
>> +	ca->maxreqs = nfsd4_get_drc_mem(ca);
>>   
>>   	return nfs_ok;
>>   }
>> @@ -4229,10 +4234,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>   	slot = session->se_slots[seq->slotid];
>>   	dprintk("%s: slotid %d\n", __func__, seq->slotid);
>>   
>> -	/* We do not negotiate the number of slots yet, so set the
>> -	 * maxslots to the session maxreqs which is used to encode
>> -	 * sr_highest_slotid and the sr_target_slot id to maxslots */
>> -	seq->maxslots = session->se_fchannel.maxreqs;
>> +	/* Negotiate number of slots: set the target, and use the
>> +	 * same for max as long as it doesn't decrease the max
>> +	 * (that isn't allowed).
>> +	 */
>> +	seq->target_maxslots = nfsd4_get_drc_slots(session);
>> +	seq->maxslots = max(seq->maxslots, seq->target_maxslots);
>>   
>>   	trace_nfsd_slot_seqid_sequence(clp, seq, slot);
>>   	status = check_slot_seqid(seq->seqid, slot->sl_seqid,
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index f118921250c3..e4e706872e54 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -4955,7 +4955,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
>>   	if (nfserr != nfs_ok)
>>   		return nfserr;
>>   	/* sr_target_highest_slotid */
>> -	nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
>> +	nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
>>   	if (nfserr != nfs_ok)
>>   		return nfserr;
>>   	/* sr_status_flags */
>> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
>> index 4b56ba1e8e48..33c9db3ee8b6 100644
>> --- a/fs/nfsd/nfsd.h
>> +++ b/fs/nfsd/nfsd.h
>> @@ -90,7 +90,11 @@ extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
>>   extern struct mutex		nfsd_mutex;
>>   extern spinlock_t		nfsd_drc_lock;
>>   extern unsigned long		nfsd_drc_max_mem;
>> -extern unsigned long		nfsd_drc_mem_used;
>> +/* Storing the sum of the slot sizes for all active clients (across
>> + * all net-namespaces) allows a share of total available memory to
>> + * be allocaed to each client based on its slot size.
>> + */
>> +extern unsigned long		nfsd_drc_slotsize_sum;
>>   extern atomic_t			nfsd_th_cnt;		/* number of available threads */
>>   
>>   extern const struct seq_operations nfs_exports_op;
>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>> index 49e2f32102ab..e596eb5d10db 100644
>> --- a/fs/nfsd/nfssvc.c
>> +++ b/fs/nfsd/nfssvc.c
>> @@ -78,7 +78,7 @@ DEFINE_MUTEX(nfsd_mutex);
>>    */
>>   DEFINE_SPINLOCK(nfsd_drc_lock);
>>   unsigned long	nfsd_drc_max_mem;
>> -unsigned long	nfsd_drc_mem_used;
>> +unsigned long	nfsd_drc_slotsize_sum;
>>   
>>   #if IS_ENABLED(CONFIG_NFS_LOCALIO)
>>   static const struct svc_version *localio_versions[] = {
>> @@ -589,10 +589,13 @@ void nfsd_reset_versions(struct nfsd_net *nn)
>>    */
>>   static void set_max_drc(void)
>>   {
>> +	if (nfsd_drc_max_mem)
>> +		return;
>> +
>>   	#define NFSD_DRC_SIZE_SHIFT	7
>>   	nfsd_drc_max_mem = (nr_free_buffer_pages()
>>   					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
>> -	nfsd_drc_mem_used = 0;
>> +	nfsd_drc_slotsize_sum = 0;
>>   	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
>>   }
>>   
>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>> index 79c743c01a47..fe71ae3c577b 100644
>> --- a/fs/nfsd/state.h
>> +++ b/fs/nfsd/state.h
>> @@ -214,7 +214,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
>>   /* Maximum number of slots per session. 160 is useful for long haul TCP */
>>   #define NFSD_MAX_SLOTS_PER_SESSION     160
>>   /* Maximum  session per slot cache size */
>> -#define NFSD_SLOT_CACHE_SIZE		2048
>> +#define NFSD_SLOT_CACHE_SIZE		2048UL
>>   /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
>>   #define NFSD_CACHE_SIZE_SLOTS_PER_SESSION	32
>>   #define NFSD_MAX_MEM_PER_SESSION  \
>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
>> index 2a21a7662e03..71b87190a4a6 100644
>> --- a/fs/nfsd/xdr4.h
>> +++ b/fs/nfsd/xdr4.h
>> @@ -575,9 +575,7 @@ struct nfsd4_sequence {
>>   	u32			slotid;			/* request/response */
>>   	u32			maxslots;		/* request/response */
>>   	u32			cachethis;		/* request */
>> -#if 0
>>   	u32			target_maxslots;	/* response */
>> -#endif /* not yet */
>>   	u32			status_flags;		/* response */
>>   };
>>   
>> -- 
>> 2.46.0
>>
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations.
  2024-10-23 12:08   ` Jeff Layton
@ 2024-10-23 21:18     ` NeilBrown
  0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2024-10-23 21:18 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 23 Oct 2024, Jeff Layton wrote:
> On Wed, 2024-10-23 at 13:37 +1100, NeilBrown wrote:
> > The heuristic for limiting the number of incoming connections to nfsd
> > currently uses sv_nrthreads - allowing more connections if more threads
> > were configured.
> > 
> > A future patch will allow number of threads to grow dynamically so that
> > there will be no need to configure sv_nrthreads.  So we need a different
> > solution for limiting connections.
> > 
> > It isn't clear what problem is solved by limiting connections (as
> > mentioned in a code comment) but the most likely problem is a connection
> > storm - many connections that are not doing productive work.  These will
> > be closed after about 6 minutes already but it might help to slow down a
> > storm.
> > 
> > This patch adds a per-connection flag XPT_PEER_VALID which indicates
> > that the peer has presented a filehandle for which it has some sort of
> > access.  i.e the peer is known to be trusted in some way.  We now only
> > count connections which have NOT been determined to be valid.  There
> > should be relative few of these at any given time.
> > 
> > If the number of non-validated peer exceed a limit - currently 64 - we
> > close the oldest non-validated peer to avoid having too many of these
> > useless connections.
> > 
> > Note that this patch significantly changes the meaning of the various
> > configuration parameters for "max connections".  The next patch will
> > remove all of these.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  fs/nfs/callback.c               |  4 ----
> >  fs/nfs/callback_xdr.c           |  1 +
> >  fs/nfsd/netns.h                 |  4 ++--
> >  fs/nfsd/nfsfh.c                 |  2 ++
> >  include/linux/sunrpc/svc.h      |  2 +-
> >  include/linux/sunrpc/svc_xprt.h | 15 +++++++++++++++
> >  net/sunrpc/svc_xprt.c           | 33 +++++++++++++++++----------------
> >  7 files changed, 38 insertions(+), 23 deletions(-)
> > 
> > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > index 6cf92498a5ac..86bdc7d23fb9 100644
> > --- a/fs/nfs/callback.c
> > +++ b/fs/nfs/callback.c
> > @@ -211,10 +211,6 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> >  	cb_info->serv = serv;
> > -	/* As there is only one thread we need to over-ride the
> > -	 * default maximum of 80 connections
> > -	 */
> > -	serv->sv_maxconn = 1024;
> >  	dprintk("nfs_callback_create_svc: service created\n");
> >  	return serv;
> >  }
> > diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
> > index fdeb0b34a3d3..4254ba3ee7c5 100644
> > --- a/fs/nfs/callback_xdr.c
> > +++ b/fs/nfs/callback_xdr.c
> > @@ -984,6 +984,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp)
> >  			nfs_put_client(cps.clp);
> >  			goto out_invalidcred;
> >  		}
> > +		svc_xprt_set_valid(rqstp->rq_xprt);
> >  	}
> >  
> >  	cps.minorversion = hdr_arg.minorversion;
> > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > index 26f7b34d1a03..a05a45bb1978 100644
> > --- a/fs/nfsd/netns.h
> > +++ b/fs/nfsd/netns.h
> > @@ -129,8 +129,8 @@ struct nfsd_net {
> >  	unsigned char writeverf[8];
> >  
> >  	/*
> > -	 * Max number of connections this nfsd container will allow. Defaults
> > -	 * to '0' which is means that it bases this on the number of threads.
> > +	 * Max number of non-validated connections this nfsd container
> > +	 * will allow.  Defaults to '0' gets mapped to 64.
> >  	 */
> >  	unsigned int max_connections;
> >  
> > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > index 40ad58a6a036..2f44de99f709 100644
> > --- a/fs/nfsd/nfsfh.c
> > +++ b/fs/nfsd/nfsfh.c
> > @@ -383,6 +383,8 @@ __fh_verify(struct svc_rqst *rqstp,
> >  		goto out;
> >  
> >  skip_pseudoflavor_check:
> > +	svc_xprt_set_valid(rqstp->rq_xprt);
> > +
> 
> This makes a lot of sense, but I don't see where lockd sets
> XPT_PEER_VALID with this patch. Does it need a call in
> nlm_lookup_file() or someplace similar?

Lockd calls nlm_svc_binding.fopen which is nlm_fopen() which calls
nfsd_open() which calls fh_verify() which calls svc_xprt_set_valid().


> 
> >  	/* Finally, check access permissions. */
> >  	error = nfsd_permission(cred, exp, dentry, access);
> >  out:
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index e68fecf6eab5..617ebfff2f30 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -81,7 +81,7 @@ struct svc_serv {
> >  	unsigned int		sv_xdrsize;	/* XDR buffer size */
> >  	struct list_head	sv_permsocks;	/* all permanent sockets */
> >  	struct list_head	sv_tempsocks;	/* all temporary sockets */
> > -	int			sv_tmpcnt;	/* count of temporary sockets */
> > +	int			sv_tmpcnt;	/* count of temporary "valid" sockets */
> >  	struct timer_list	sv_temptimer;	/* timer for aging temporary sockets */
> >  
> >  	char *			sv_name;	/* service name */
> > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> > index 0981e35a9fed..35929a7727c7 100644
> > --- a/include/linux/sunrpc/svc_xprt.h
> > +++ b/include/linux/sunrpc/svc_xprt.h
> > @@ -99,8 +99,23 @@ enum {
> >  	XPT_HANDSHAKE,		/* xprt requests a handshake */
> >  	XPT_TLS_SESSION,	/* transport-layer security established */
> >  	XPT_PEER_AUTH,		/* peer has been authenticated */
> > +	XPT_PEER_VALID,		/* peer has presented a filehandle that
> > +				 * it has access to.  It is NOT counted
> > +				 * in ->sv_tmpcnt.
> > +				 */
> >  };
> >  
> > +static inline void svc_xprt_set_valid(struct svc_xprt *xpt)
> > +{
> > +	if (test_bit(XPT_TEMP, &xpt->xpt_flags) &&
> > +	    !test_and_set_bit(XPT_PEER_VALID, &xpt->xpt_flags)) {
> > +		struct svc_serv *serv = xpt->xpt_server;
> > +		spin_lock(&serv->sv_lock);
> > +		serv->sv_tmpcnt -= 1;
> > +		spin_unlock(&serv->sv_lock);
> > +	}
> > +}
> > +
> >  static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
> >  {
> >  	spin_lock(&xpt->xpt_lock);
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index 43c57124de52..ff5b8bb8a88f 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -606,7 +606,8 @@ int svc_port_is_privileged(struct sockaddr *sin)
> >  }
> >  
> >  /*
> > - * Make sure that we don't have too many active connections. If we have,
> > + * Make sure that we don't have too many connections that have not yet
> > + * demonstrated that they have access the the NFS server. If we have,
> >   * something must be dropped. It's not clear what will happen if we allow
> >   * "too many" connections, but when dealing with network-facing software,
> >   * we have to code defensively. Here we do that by imposing hard limits.
> > @@ -625,27 +626,26 @@ int svc_port_is_privileged(struct sockaddr *sin)
> >   */
> >  static void svc_check_conn_limits(struct svc_serv *serv)
> >  {
> > -	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn :
> > -				(serv->sv_nrthreads+3) * 20;
> > +	unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn : 64;
> >  
> >  	if (serv->sv_tmpcnt > limit) {
> > -		struct svc_xprt *xprt = NULL;
> > +		struct svc_xprt *xprt = NULL, *xprti;
> >  		spin_lock_bh(&serv->sv_lock);
> >  		if (!list_empty(&serv->sv_tempsocks)) {
> > -			/* Try to help the admin */
> > -			net_notice_ratelimited("%s: too many open connections, consider increasing the %s\n",
> > -					       serv->sv_name, serv->sv_maxconn ?
> > -					       "max number of connections" :
> > -					       "number of threads");
> >  			/*
> >  			 * Always select the oldest connection. It's not fair,
> > -			 * but so is life
> > +			 * but nor is life.
> >  			 */
> > -			xprt = list_entry(serv->sv_tempsocks.prev,
> > -					  struct svc_xprt,
> > -					  xpt_list);
> > -			set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > -			svc_xprt_get(xprt);
> > +			list_for_each_entry_reverse(xprti, &serv->sv_tempsocks,
> > +						    xpt_list)
> > +			{
> > +				if (!test_bit(XPT_PEER_VALID, &xprti->xpt_flags)) {
> > +					xprt = xprti;
> > +					set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > +					svc_xprt_get(xprt);
> > +					break;
> > +				}
> > +			}
> >  		}
> >  		spin_unlock_bh(&serv->sv_lock);
> >  
> > @@ -1039,7 +1039,8 @@ static void svc_delete_xprt(struct svc_xprt *xprt)
> >  
> >  	spin_lock_bh(&serv->sv_lock);
> >  	list_del_init(&xprt->xpt_list);
> > -	if (test_bit(XPT_TEMP, &xprt->xpt_flags))
> > +	if (test_bit(XPT_TEMP, &xprt->xpt_flags) &&
> > +	    !test_bit(XPT_PEER_VALID, &xprt->xpt_flags))
> >  		serv->sv_tmpcnt--;
> >  	spin_unlock_bh(&serv->sv_lock);
> >  
> 
> Other than the comment about lockd, I like this:
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients.
  2024-10-23 13:42   ` Chuck Lever
@ 2024-10-23 21:47     ` NeilBrown
  0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2024-10-23 21:47 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Jeff Layton, linux-nfs, Dai Ngo, Tom Talpey

On Thu, 24 Oct 2024, Chuck Lever wrote:
> On Wed, Oct 23, 2024 at 01:37:02PM +1100, NeilBrown wrote:
> > If there are more non-courteous clients than the calculated limit, we
> > should fail the request rather than report a soft failure and
> > encouraging the client to retry indefinitely.
> 
> Discussion:
> 
> This change has the potential to cause behavior regressions. I'm not
> sure how clients will behave (eg, what error is reported to
> administrators) if EXCH_ID / SETCLIENTID returns SERVERFAULT.

Linux client will issues a pr_warn() and return EIO for mount.
If lease recovery is needed (e.g.  server boot) then I think the client
will retry indefinitely.  There isn't much else it can do.

> 
> I can't find a more suitable status code than SERVERFAULT, however.

Maybe we should never fail.  Always allow at least 1 slot ??

> 
> There is also the question of whether CREATE_SESSION, which also
> might fail when server resources are over-extended, could return a
> similar hard failure. (CREATE_SESSION has other spec-mandated
> restrictions around using NFS4ERR_DELAY, however).

Would that only be in the case of kmalloc() failure?  I think that is a
significantly different circumstance.  kmalloc() for small values never
fails in practice (citation needed).  Caches get cleans and pages are
written to swap and big processes are killed until something can be
freed.

This contrasts with that code in exchange_id which tries to guess an
amount of memory that shouldn't put too much burden on the server and so
should always be safe to allocate without risking OOM killing.

> 
> 
> > The only hard failure allowed for EXCHANGE_ID that doesn't clearly have
> > some other meaning is NFS4ERR_SERVERFAULT.  So use that, but explain why
> > in a comment at each place that it is returned.
> > 
> > If there are courteous clients which push us over the limit, then expedite
> > their removal.
> > 
> > This is not known to have caused a problem is production use, but
> 
> The current DELAY behavior is known to trigger an (interruptible)
> infinite loop when a small-memory server can't create a new session.
> Thus I believe the infinite loop behavior is a real issue that has
> been observed and reported.

I knew it had been observed with test code.  I didn't know it had be
reported for a genuine use-case.

> 
> 
> > testing of lots of clients reports repeated NFS4ERR_DELAY responses
> > which doesn't seem helpful.
> 
> No argument from me. NFSD takes the current approach for exactly the
> reason you mention above: there isn't a good choice of status code
> to return in this case.
> 
> Nit: the description might better explain how this change is related
> to or required by on-demand thread allocation. It seems a little
> orthogonal to me right now. NBD.

Yes, it is a bit of a tangent.  It might be seen as prep for the next
patch, but maybe it is completely independent..

Thanks,
NeilBrown


> 
> 
> > Also remove an outdated comment - we do use a slab cache.
> > 
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  fs/nfsd/nfs4state.c | 30 ++++++++++++++++++++----------
> >  1 file changed, 20 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 56b261608af4..ca6b5b52f77d 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -2212,21 +2212,20 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
> >  	return 1;
> >  }
> >  
> > -/* 
> > - * XXX Should we use a slab cache ?
> > - * This type of memory management is somewhat inefficient, but we use it
> > - * anyway since SETCLIENTID is not a common operation.
> > - */
> >  static struct nfs4_client *alloc_client(struct xdr_netobj name,
> >  				struct nfsd_net *nn)
> >  {
> >  	struct nfs4_client *clp;
> >  	int i;
> >  
> > -	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
> > +	if (atomic_read(&nn->nfs4_client_count) -
> > +	    atomic_read(&nn->nfsd_courtesy_clients) >= nn->nfs4_max_clients)
> > +		return ERR_PTR(-EUSERS);
> > +
> > +	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
> > +	    atomic_read(&nn->nfsd_courtesy_clients) > 0)
> >  		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> > -		return NULL;
> > -	}
> > +
> >  	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
> >  	if (clp == NULL)
> >  		return NULL;
> > @@ -3121,8 +3120,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
> >  	struct dentry *dentries[ARRAY_SIZE(client_files)];
> >  
> >  	clp = alloc_client(name, nn);
> > -	if (clp == NULL)
> > -		return NULL;
> > +	if (IS_ERR_OR_NULL(clp))
> > +		return clp;
> >  
> >  	ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred);
> >  	if (ret) {
> > @@ -3504,6 +3503,11 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	new = create_client(exid->clname, rqstp, &verf);
> >  	if (new == NULL)
> >  		return nfserr_jukebox;
> > +	if (IS_ERR(new))
> > +		/* Protocol has no specific error for "client limit reached".
> > +		 * NFS4ERR_RESOURCE is not permitted for EXCHANGE_ID
> > +		 */
> > +		return nfserr_serverfault;
> >  	status = copy_impl_id(new, exid);
> >  	if (status)
> >  		goto out_nolock;
> > @@ -4422,6 +4426,12 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	new = create_client(clname, rqstp, &clverifier);
> >  	if (new == NULL)
> >  		return nfserr_jukebox;
> > +	if (IS_ERR(new))
> > +		/* Protocol has no specific error for "client limit reached".
> > +		 * NFS4ERR_RESOURCE, while allowed for SETCLIENTID, implies
> > +		 * that a smaller COMPOUND might be successful.
> > +		 */
> > +		return nfserr_serverfault;
> >  	spin_lock(&nn->client_lock);
> >  	conf = find_confirmed_client_by_name(&clname, nn);
> >  	if (conf && client_has_state(conf)) {
> > -- 
> > 2.46.0
> > 
> 
> -- 
> Chuck Lever
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
  2024-10-23 16:34     ` Tom Talpey
@ 2024-10-23 21:53       ` NeilBrown
  0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2024-10-23 21:53 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Chuck Lever, Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo

On Thu, 24 Oct 2024, Tom Talpey wrote:
> On 10/23/2024 9:55 AM, Chuck Lever wrote:
> > On Wed, Oct 23, 2024 at 01:37:03PM +1100, NeilBrown wrote:
> >> Currently per-client DRC slot limits (for v4.1+) are calculated when the
> >> client connects, and they are left unchanged.  So earlier clients can
> >> get a larger share when memory is tight.
> >>
> >> The heuristic for choosing a number includes the number of configured
> >> server threads.  This is problematic for 2 reasons.
> >> 1/ sv_nrthreads is different in different net namespaces, but the
> >>     memory allocation is global across all namespaces.  So different
> >>     namespaces get treated differently without good reason.
> >> 2/ a future patch will auto-configure number of threads based on
> >>     load so that there will be no need to preconfigure a number.  This will
> >>     make the current heuristic even more arbitrary.
> >>
> >> NFSv4.1 allows the number of slots to be varied dynamically - in the
> >> reply to each SEQUENCE op.  With this patch we provide a provisional
> >> upper limit in the EXCHANGE_ID reply which might end up being too big,
> >> and adjust it with each SEQUENCE reply.
> >>
> >> The goal for when memory is tight is to allow each client to have a
> >> similar number of slots.  So clients that ask for larger slots get more
> >> memory.   This may not be ideal.  It could be changed later.
> >>
> >> So we track the sum of the slot sizes of all active clients, and share
> >> memory out based on the ratio of the slot size for a given client with
> >> the sum of slot sizes.  We never allow more in a SEQUENCE reply than we
> >> did in the EXCHANGE_ID reply.
> >>
> >> Signed-off-by: NeilBrown <neilb@suse.de>
> > 
> > Dynamic session slot table sizing is one of our major "to-do" items
> > for NFSD, and it seems to have potential for breaking things if we
> > don't get it right. I'd prefer to prioritize getting this one merged
> > into nfsd-next first, it seems like an important change to have.
> 
> I agree but I do have one comment:
> 
> + * Report the number of slots that we would like the client to limit
> + * itself to.  When the number of clients is large, we start sharing
> + * memory so all clients get the same number of slots.
> 
> That second sentence is a policy that is undoubtedly going to change
> someday. Also, the "number of clients is large" is not exactly what's
> being checked here, it's only looking at the current demand on whatever
> the shared pool it. I'd just delete the second sentence.

I think the calculations are slightly complex and deserve a comment
explaining the high-level goal.  So "sharing memory so all clients get
the same number of slots" is the bit I particularly want to keep.  I
agree that the code might be changed and we would need to ensure that
comment is changed too.  Maybe put the comment closer to the code?
I don't need to keep "number of clients is large".

> 
> Don't forget that the v4.1+ DRC doesn't have to be preallocated. By
> design it's dynamic, for example if the client never performs any
> non-idempotent operations, it can be completely null. Personally
> I'd love to see that implemented someday.

We don't exactly preallocate it.  Rather we prevent clients from asking
for more than we guess that we can comfortably allocate on demand.

The more I think about this, the more I think we should never ever fail.
IF there really are an enormous number of clients, they each get one
slot.

Thanks,
NeilBrown


> 
> Tom.
> 
> > 
> > 
> >> ---
> >>   fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++---------------------
> >>   fs/nfsd/nfs4xdr.c   |  2 +-
> >>   fs/nfsd/nfsd.h      |  6 +++-
> >>   fs/nfsd/nfssvc.c    |  7 ++--
> >>   fs/nfsd/state.h     |  2 +-
> >>   fs/nfsd/xdr4.h      |  2 --
> >>   6 files changed, 56 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> >> index ca6b5b52f77d..834e9aa12b82 100644
> >> --- a/fs/nfsd/nfs4state.c
> >> +++ b/fs/nfsd/nfs4state.c
> >> @@ -1909,44 +1909,26 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
> >>   }
> >>   
> >>   /*
> >> - * XXX: If we run out of reserved DRC memory we could (up to a point)
> >> - * re-negotiate active sessions and reduce their slot usage to make
> >> - * room for new connections. For now we just fail the create session.
> >> + * When a client connects it gets a max_requests number that would allow
> >> + * it to use 1/8 of the memory we think can reasonably be used for the DRC.
> >> + * Later in response to SEQUENCE operations we further limit that when there
> >> + * are more than 8 concurrent clients.
> >>    */
> >> -static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
> >> +static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
> >>   {
> >>   	u32 slotsize = slot_bytes(ca);
> >>   	u32 num = ca->maxreqs;
> >> -	unsigned long avail, total_avail;
> >> -	unsigned int scale_factor;
> >> +	unsigned long avail;
> >>   
> >>   	spin_lock(&nfsd_drc_lock);
> >> -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> >> -		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> >> -	else
> >> -		/* We have handed out more space than we chose in
> >> -		 * set_max_drc() to allow.  That isn't really a
> >> -		 * problem as long as that doesn't make us think we
> >> -		 * have lots more due to integer overflow.
> >> -		 */
> >> -		total_avail = 0;
> >> -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> >> -	/*
> >> -	 * Never use more than a fraction of the remaining memory,
> >> -	 * unless it's the only way to give this client a slot.
> >> -	 * The chosen fraction is either 1/8 or 1/number of threads,
> >> -	 * whichever is smaller.  This ensures there are adequate
> >> -	 * slots to support multiple clients per thread.
> >> -	 * Give the client one slot even if that would require
> >> -	 * over-allocation--it is better than failure.
> >> -	 */
> >> -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
> >>   
> >> -	avail = clamp_t(unsigned long, avail, slotsize,
> >> -			total_avail/scale_factor);
> >> -	num = min_t(int, num, avail / slotsize);
> >> -	num = max_t(int, num, 1);
> >> -	nfsd_drc_mem_used += num * slotsize;
> >> +	avail = min(NFSD_MAX_MEM_PER_SESSION,
> >> +		    nfsd_drc_max_mem / 8);
> >> +
> >> +	num = clamp_t(int, num, 1, avail / slotsize);
> >> +
> >> +	nfsd_drc_slotsize_sum += slotsize;
> >> +
> >>   	spin_unlock(&nfsd_drc_lock);
> >>   
> >>   	return num;
> >> @@ -1957,10 +1939,33 @@ static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
> >>   	int slotsize = slot_bytes(ca);
> >>   
> >>   	spin_lock(&nfsd_drc_lock);
> >> -	nfsd_drc_mem_used -= slotsize * ca->maxreqs;
> >> +	nfsd_drc_slotsize_sum -= slotsize;
> >>   	spin_unlock(&nfsd_drc_lock);
> >>   }
> >>   
> >> +/*
> >> + * Report the number of slots that we would like the client to limit
> >> + * itself to.  When the number of clients is large, we start sharing
> >> + * memory so all clients get the same number of slots.
> >> + */
> >> +static unsigned int nfsd4_get_drc_slots(struct nfsd4_session *session)
> >> +{
> >> +	u32 slotsize = slot_bytes(&session->se_fchannel);
> >> +	unsigned long avail;
> >> +	unsigned long slotsize_sum = READ_ONCE(nfsd_drc_slotsize_sum);
> >> +
> >> +	if (slotsize_sum < slotsize)
> >> +		slotsize_sum = slotsize;
> >> +
> >> +	/* Find our share of avail mem across all active clients,
> >> +	 * then limit to 1/8 of total, and configured max
> >> +	 */
> >> +	avail = min3(nfsd_drc_max_mem * slotsize / slotsize_sum,
> >> +		     nfsd_drc_max_mem / 8,
> >> +		     NFSD_MAX_MEM_PER_SESSION);
> >> +	return max3(1UL, avail / slotsize, session->se_fchannel.maxreqs);
> >> +}
> >> +
> >>   static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
> >>   					   struct nfsd4_channel_attrs *battrs)
> >>   {
> >> @@ -3735,7 +3740,7 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
> >>   	 * Note that we always allow at least one slot, because our
> >>   	 * accounting is soft and provides no guarantees either way.
> >>   	 */
> >> -	ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
> >> +	ca->maxreqs = nfsd4_get_drc_mem(ca);
> >>   
> >>   	return nfs_ok;
> >>   }
> >> @@ -4229,10 +4234,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>   	slot = session->se_slots[seq->slotid];
> >>   	dprintk("%s: slotid %d\n", __func__, seq->slotid);
> >>   
> >> -	/* We do not negotiate the number of slots yet, so set the
> >> -	 * maxslots to the session maxreqs which is used to encode
> >> -	 * sr_highest_slotid and the sr_target_slot id to maxslots */
> >> -	seq->maxslots = session->se_fchannel.maxreqs;
> >> +	/* Negotiate number of slots: set the target, and use the
> >> +	 * same for max as long as it doesn't decrease the max
> >> +	 * (that isn't allowed).
> >> +	 */
> >> +	seq->target_maxslots = nfsd4_get_drc_slots(session);
> >> +	seq->maxslots = max(seq->maxslots, seq->target_maxslots);
> >>   
> >>   	trace_nfsd_slot_seqid_sequence(clp, seq, slot);
> >>   	status = check_slot_seqid(seq->seqid, slot->sl_seqid,
> >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >> index f118921250c3..e4e706872e54 100644
> >> --- a/fs/nfsd/nfs4xdr.c
> >> +++ b/fs/nfsd/nfs4xdr.c
> >> @@ -4955,7 +4955,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
> >>   	if (nfserr != nfs_ok)
> >>   		return nfserr;
> >>   	/* sr_target_highest_slotid */
> >> -	nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
> >> +	nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
> >>   	if (nfserr != nfs_ok)
> >>   		return nfserr;
> >>   	/* sr_status_flags */
> >> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> >> index 4b56ba1e8e48..33c9db3ee8b6 100644
> >> --- a/fs/nfsd/nfsd.h
> >> +++ b/fs/nfsd/nfsd.h
> >> @@ -90,7 +90,11 @@ extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
> >>   extern struct mutex		nfsd_mutex;
> >>   extern spinlock_t		nfsd_drc_lock;
> >>   extern unsigned long		nfsd_drc_max_mem;
> >> -extern unsigned long		nfsd_drc_mem_used;
> >> +/* Storing the sum of the slot sizes for all active clients (across
> >> + * all net-namespaces) allows a share of total available memory to
> >> + * be allocaed to each client based on its slot size.
> >> + */
> >> +extern unsigned long		nfsd_drc_slotsize_sum;
> >>   extern atomic_t			nfsd_th_cnt;		/* number of available threads */
> >>   
> >>   extern const struct seq_operations nfs_exports_op;
> >> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> >> index 49e2f32102ab..e596eb5d10db 100644
> >> --- a/fs/nfsd/nfssvc.c
> >> +++ b/fs/nfsd/nfssvc.c
> >> @@ -78,7 +78,7 @@ DEFINE_MUTEX(nfsd_mutex);
> >>    */
> >>   DEFINE_SPINLOCK(nfsd_drc_lock);
> >>   unsigned long	nfsd_drc_max_mem;
> >> -unsigned long	nfsd_drc_mem_used;
> >> +unsigned long	nfsd_drc_slotsize_sum;
> >>   
> >>   #if IS_ENABLED(CONFIG_NFS_LOCALIO)
> >>   static const struct svc_version *localio_versions[] = {
> >> @@ -589,10 +589,13 @@ void nfsd_reset_versions(struct nfsd_net *nn)
> >>    */
> >>   static void set_max_drc(void)
> >>   {
> >> +	if (nfsd_drc_max_mem)
> >> +		return;
> >> +
> >>   	#define NFSD_DRC_SIZE_SHIFT	7
> >>   	nfsd_drc_max_mem = (nr_free_buffer_pages()
> >>   					>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
> >> -	nfsd_drc_mem_used = 0;
> >> +	nfsd_drc_slotsize_sum = 0;
> >>   	dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
> >>   }
> >>   
> >> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> >> index 79c743c01a47..fe71ae3c577b 100644
> >> --- a/fs/nfsd/state.h
> >> +++ b/fs/nfsd/state.h
> >> @@ -214,7 +214,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
> >>   /* Maximum number of slots per session. 160 is useful for long haul TCP */
> >>   #define NFSD_MAX_SLOTS_PER_SESSION     160
> >>   /* Maximum  session per slot cache size */
> >> -#define NFSD_SLOT_CACHE_SIZE		2048
> >> +#define NFSD_SLOT_CACHE_SIZE		2048UL
> >>   /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
> >>   #define NFSD_CACHE_SIZE_SLOTS_PER_SESSION	32
> >>   #define NFSD_MAX_MEM_PER_SESSION  \
> >> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> >> index 2a21a7662e03..71b87190a4a6 100644
> >> --- a/fs/nfsd/xdr4.h
> >> +++ b/fs/nfsd/xdr4.h
> >> @@ -575,9 +575,7 @@ struct nfsd4_sequence {
> >>   	u32			slotid;			/* request/response */
> >>   	u32			maxslots;		/* request/response */
> >>   	u32			cachethis;		/* request */
> >> -#if 0
> >>   	u32			target_maxslots;	/* response */
> >> -#endif /* not yet */
> >>   	u32			status_flags;		/* response */
> >>   };
> >>   
> >> -- 
> >> 2.46.0
> >>
> > 
> 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual
  2024-10-23  2:37 ` [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
  2024-10-23 13:32   ` Jeff Layton
@ 2024-10-30  6:35   ` kernel test robot
  1 sibling, 0 replies; 21+ messages in thread
From: kernel test robot @ 2024-10-30  6:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: oe-lkp, lkp, linux-nfs, ying.huang, feng.tang, fengwei.yin,
	Chuck Lever, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	oliver.sang



Hello,

kernel test robot noticed a 10.2% regression of fsmark.files_per_sec on:


commit: d7f6562adeebe62458eb11437b260d3f470849cd ("[PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual")
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/SUNRPC-move-nrthreads-counting-to-start-stop-threads/20241023-104539
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link: https://lore.kernel.org/all/20241023024222.691745-7-neilb@suse.de/
patch subject: [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual

testcase: fsmark
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:

	iterations: 1x
	nr_threads: 32t
	disk: 1SSD
	fs: btrfs
	fs2: nfsv4
	filesize: 8K
	test_size: 400M
	sync_method: fsyncBeforeClose
	nr_directories: 16d
	nr_files_per_directory: 256fpd
	cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+------------------------------------------------------------------------------------------------+
| testcase: change | fsmark: fsmark.files_per_sec 10.2% regression                                                  |
| test machine     | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                                                   |
|                  | disk=1SSD                                                                                      |
|                  | filesize=9B                                                                                    |
|                  | fs2=nfsv4                                                                                      |
|                  | fs=btrfs                                                                                       |
|                  | iterations=1x                                                                                  |
|                  | nr_directories=16d                                                                             |
|                  | nr_files_per_directory=256fpd                                                                  |
|                  | nr_threads=32t                                                                                 |
|                  | sync_method=fsyncBeforeClose                                                                   |
|                  | test_size=400M                                                                                 |
+------------------+------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202410301321.d8aebe67-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241030/202410301321.d8aebe67-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/8K/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/400M/fsmark

commit: 
  4e9c43765c ("sunrpc: remove all connection limit configuration")
  d7f6562ade ("sunrpc: introduce possibility that requested number of threads is different from actual")

4e9c43765c3fd361 d7f6562adeebe62458eb11437b2 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.09 ± 78%    +173.9%       0.24 ± 29%  perf-stat.i.major-faults
      0.07 ± 66%     -61.1%       0.03 ± 23%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.02 ± 88%   +1806.2%       0.46 ±196%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    -17.50           -31.4%     -12.00        sched_debug.cpu.nr_uninterruptible.min
      5.32 ± 10%     -11.9%       4.69 ±  8%  sched_debug.cpu.nr_uninterruptible.stddev
  76584407 ±  5%     +40.6%  1.077e+08 ±  4%  fsmark.app_overhead
      3016 ±  2%     -10.2%       2707        fsmark.files_per_sec
     36.83            -7.7%      34.00        fsmark.time.percent_of_cpu_this_job_got
    285398 ±  4%     +10.5%     315320        meminfo.Active
    282671 ±  4%     +10.6%     312596        meminfo.Active(file)
     15709 ± 11%     -28.8%      11189 ±  2%  meminfo.Dirty
     70764 ±  4%     +10.6%      78241        proc-vmstat.nr_active_file
    468998            +5.4%     494463        proc-vmstat.nr_dirtied
      3931 ± 11%     -28.8%       2797 ±  2%  proc-vmstat.nr_dirty
    462247            +6.7%     493402        proc-vmstat.nr_written
     70764 ±  4%     +10.6%      78241        proc-vmstat.nr_zone_active_file
      3620 ± 10%     -30.7%       2507 ±  5%  proc-vmstat.nr_zone_write_pending
   1135721            +2.6%    1165013        proc-vmstat.numa_hit
   1086049            +2.7%    1115353        proc-vmstat.numa_local
     56063 ±  2%     +18.0%      66177        proc-vmstat.pgactivate
   1521044            +2.3%    1555275        proc-vmstat.pgalloc_normal
   2522626           +10.2%    2778923        proc-vmstat.pgpgout
      3.75 ± 26%      -1.9        1.80 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.75 ± 26%      -1.9        1.80 ±  7%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.54 ± 39%      -1.7        0.89 ±100%  perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
      2.54 ± 39%      -1.7        0.89 ±100%  perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write.writen.record__pushfn.perf_mmap__push
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write.writen
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.51 ±141%      +1.6        2.09 ± 29%  perf-profile.calltrace.cycles-pp.evlist_cpu_iterator__next.__evlist__disable.__cmd_record.cmd_record.run_builtin
      0.81 ±100%      +2.2        3.01 ± 44%  perf-profile.calltrace.cycles-pp.__evlist__disable.__cmd_record.cmd_record.run_builtin.handle_internal_command
      3.11 ± 27%      +2.6        5.74 ± 24%  perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.handle_internal_command.main
      3.11 ± 27%      +2.6        5.74 ± 24%  perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.handle_internal_command.main
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.generic_perform_write
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.ksys_write
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.61 ±141%      +1.5        2.08 ± 25%  perf-profile.children.cycles-pp.vfs_write
      0.81 ±100%      +2.2        3.01 ± 44%  perf-profile.children.cycles-pp.__evlist__disable


***************************************************************************************************
lkp-ivb-2ep2: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/9B/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep2/400M/fsmark

commit: 
  4e9c43765c ("sunrpc: remove all connection limit configuration")
  d7f6562ade ("sunrpc: introduce possibility that requested number of threads is different from actual")

4e9c43765c3fd361 d7f6562adeebe62458eb11437b2 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.35            -2.0%       2.30        iostat.cpu.user
    137839 ± 27%     +41.1%     194529 ± 17%  numa-meminfo.node0.Active
    136284 ± 27%     +41.7%     193101 ± 17%  numa-meminfo.node0.Active(file)
      2.80 ± 32%      -1.6        1.16 ± 46%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      2.80 ± 32%      -1.6        1.16 ± 46%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
 1.397e+08 ±  2%     +14.5%  1.599e+08        fsmark.app_overhead
      4135           -10.2%       3712        fsmark.files_per_sec
     48.67            -8.2%      44.67        fsmark.time.percent_of_cpu_this_job_got
     34164 ± 27%     +41.6%      48381 ± 17%  numa-vmstat.node0.nr_active_file
    276516 ± 17%     +26.7%     350296 ± 10%  numa-vmstat.node0.nr_dirtied
    276215 ± 17%     +26.5%     349400 ± 10%  numa-vmstat.node0.nr_written
     34164 ± 27%     +41.6%      48381 ± 17%  numa-vmstat.node0.nr_zone_active_file
     80.71 ± 30%     +38.5%     111.75 ± 20%  sched_debug.cfs_rq:/.removed.load_avg.avg
    270.60 ± 12%     +16.0%     313.94 ±  9%  sched_debug.cfs_rq:/.removed.load_avg.stddev
     23.59 ± 44%     +49.9%      35.35 ± 24%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     23.58 ± 44%     +49.8%      35.33 ± 24%  sched_debug.cfs_rq:/.removed.util_avg.avg
      3281 ± 11%   +1047.0%      37635 ±193%  sched_debug.cpu.avg_idle.min
    598666            +7.0%     640842        proc-vmstat.nr_dirtied
   1074454            +1.1%    1086656        proc-vmstat.nr_file_pages
    197321            +3.7%     204605        proc-vmstat.nr_inactive_file
    598082            +6.8%     638786        proc-vmstat.nr_written
    197321            +3.7%     204605        proc-vmstat.nr_zone_inactive_file
   1716397            +2.3%    1755672        proc-vmstat.numa_hit
   1666723            +2.4%    1705951        proc-vmstat.numa_local
    145474            +7.4%     156311        proc-vmstat.pgactivate
   2183965            +2.0%    2226921        proc-vmstat.pgalloc_normal
   3250818           +10.5%    3591588        proc-vmstat.pgpgout
      0.00 ±223%    +633.3%       0.02 ± 53%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.01 ±223%    +729.3%       0.06 ± 29%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ±223%    +466.7%       0.03 ± 22%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
      0.00 ±223%    +973.3%       0.03 ± 61%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ±223%    +480.6%       0.09 ± 25%  perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.67 ±223%    +506.5%       4.04        perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      3.74 ± 52%    +190.1%      10.85 ± 36%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.68 ±223%    +526.0%       4.25 ±  8%  perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     18.67 ±173%    +375.2%      88.72 ± 23%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.01 ±143%    +212.5%       0.03 ± 47%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.66 ±223%    +504.3%       3.99        perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      3.69 ± 53%    +192.6%      10.79 ± 36%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.67 ±223%    +525.6%       4.17 ±  8%  perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     18.65 ±173%    +375.4%      88.66 ± 23%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/6] prepare for dynamic server thread management
  2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
                   ` (6 preceding siblings ...)
  2024-10-23 14:00 ` [PATCH 0/6] prepare for dynamic server thread management Chuck Lever
@ 2025-10-28 15:47 ` Jeff Layton
  2025-10-28 22:36   ` NeilBrown
  7 siblings, 1 reply; 21+ messages in thread
From: Jeff Layton @ 2025-10-28 15:47 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 2024-10-23 at 13:37 +1100, NeilBrown wrote:
> These patches prepare the way for demand-based adjustment of the number
> of server threads.  They primarily remove some places there the
> configured thread count is used to configure other things.
> 
> With these in place only two more patches are needed to have demand
> based thread count.  The details of how to configure this need to be
> discussed to ensure we have considered all perspectives, and I would
> rather than happen in the context of two patches, not in the context of
> 8.  So I'm sending these first in the hope that will land with minimal
> fuss.  Once they do land I'll send the remainder (which you have already
> seen) and will look forward to a fruitful discussion.
> 
> Thanks,
> NeilBrown
> 
>  [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads.
>  [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there
>  [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
>  [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting
>  [PATCH 5/6] sunrpc: remove all connection limit configuration
>  [PATCH 6/6] sunrpc: introduce possibility that requested number of

Hi Neil,

You sent this just over a year ago. It looks like this set didn't get
merged for some reason. Were you still pursuing this work? I have some
interest in seeing a dynamic thread count work, so if not I'd be
interested in picking up this work.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/6] prepare for dynamic server thread management
  2025-10-28 15:47 ` Jeff Layton
@ 2025-10-28 22:36   ` NeilBrown
  0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2025-10-28 22:36 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey

On Wed, 29 Oct 2025, Jeff Layton wrote:
> On Wed, 2024-10-23 at 13:37 +1100, NeilBrown wrote:
> > These patches prepare the way for demand-based adjustment of the number
> > of server threads.  They primarily remove some places there the
> > configured thread count is used to configure other things.
> > 
> > With these in place only two more patches are needed to have demand
> > based thread count.  The details of how to configure this need to be
> > discussed to ensure we have considered all perspectives, and I would
> > rather than happen in the context of two patches, not in the context of
> > 8.  So I'm sending these first in the hope that will land with minimal
> > fuss.  Once they do land I'll send the remainder (which you have already
> > seen) and will look forward to a fruitful discussion.
> > 
> > Thanks,
> > NeilBrown
> > 
> >  [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads.
> >  [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there
> >  [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits.
> >  [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting
> >  [PATCH 5/6] sunrpc: remove all connection limit configuration
> >  [PATCH 6/6] sunrpc: introduce possibility that requested number of
> 
> Hi Neil,
> 
> You sent this just over a year ago. It looks like this set didn't get
> merged for some reason. Were you still pursuing this work? I have some
> interest in seeing a dynamic thread count work, so if not I'd be
> interested in picking up this work.

Hi Jeff,
 thanks for asking.  This is still on my list of things that I want to
 pursue, but it clearly isn't getting any attention at present and is
 unlikely to any time this year.
 So if you are keen to push it that would be most welcome.

 My branch which I think was my most recent version of this has just
 been pushed to github.com/neilbrown/linux branch nfsd-dynathread
 in case there is anything useful there.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-10-28 22:36 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-23  2:37 [PATCH 0/6] prepare for dynamic server thread management NeilBrown
2024-10-23  2:37 ` [PATCH 1/6] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
2024-10-23  2:37 ` [PATCH 2/6] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
2024-10-23 13:42   ` Chuck Lever
2024-10-23 21:47     ` NeilBrown
2024-10-23  2:37 ` [PATCH 3/6] nfs: dynamically adjust per-client DRC slot limits NeilBrown
2024-10-23 11:48   ` Jeff Layton
2024-10-23 13:55   ` Chuck Lever
2024-10-23 16:34     ` Tom Talpey
2024-10-23 21:53       ` NeilBrown
2024-10-23  2:37 ` [PATCH 4/6] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
2024-10-23 12:08   ` Jeff Layton
2024-10-23 21:18     ` NeilBrown
2024-10-23  2:37 ` [PATCH 5/6] sunrpc: remove all connection limit configuration NeilBrown
2024-10-23 12:50   ` Jeff Layton
2024-10-23  2:37 ` [PATCH 6/6] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
2024-10-23 13:32   ` Jeff Layton
2024-10-30  6:35   ` kernel test robot
2024-10-23 14:00 ` [PATCH 0/6] prepare for dynamic server thread management Chuck Lever
2025-10-28 15:47 ` Jeff Layton
2025-10-28 22:36   ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox