* [PATCH 00/14 RFC] support automatic changes to nfsd thread count
@ 2024-07-15 7:14 NeilBrown
2024-07-15 7:14 ` [PATCH 01/14] lockd: discard nlmsvc_timeout NeilBrown
` (15 more replies)
0 siblings, 16 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
This patch set (against nfsd-next) enables automatic adjustment of the
number of nfsd threads. The number can increase under high load and
decrease again after idle periods.
The first few patches (1-6) are cleanups that may not be entirely
relevant to the current series. They could safely land any time and
only need minimal review.
Patches 9, 10 and 11 remove some places where sv_nrthreads is used for
things other than counting threads: it was being used to adjust other
limits. At the time that seemed like an easy and sensible solution. I
now have to repent of that short-cut and find a better way to impose
reasonable limits.
These and the other sundry patches (7, 8, 12) can, I think, safely land
whenever they get sufficient review. I think they are sensible even if
we don't end up adjusting threads dynamically.
Patches 13 and 14 build on all this to provide the desired
functionality. Patch 13 allows the maximum to be configured, and patch
14 starts or stops threads based on some simple triggers.
For patch 13 I decided that if the user/admin makes no explicit
configuration, then the currently requested number of threads becomes a
minimum, and a maximum is determined based on the amount of memory.
This should make the patch set immediately useful without unduly
impacting existing configurations.
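A minimal userspace sketch of how a memory-derived maximum might look. The scaling constant (one thread per 8MB) and the function name are illustrative assumptions for this sketch, not the values the patch actually uses:

```c
#include <assert.h>

/* Hypothetical sketch: derive a default maximum thread count from the
 * amount of system memory, floored at the configured thread count so
 * the admin's explicit setting always remains reachable.  The 8MB-per-
 * thread ratio is an illustrative assumption. */
static unsigned int default_max_threads(unsigned long total_ram_mb,
					unsigned int configured_min)
{
	unsigned int max = total_ram_mb / 8;

	if (max < configured_min)
		max = configured_min;
	return max;
}
```

With this shape, a small machine is clamped to its configured count while a large machine gets room to ramp up.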
For patch 14 I only implemented starting a thread when there is work to
do but no threads to do it, and stopping a thread when it has been idle
for 5 seconds. The start-up is deliberately serialised so at least one
NFS request is serviced between the decision to start a thread and the
action of starting it. This hopefully encourages a ramping up of thread
count rather than a sudden jump.
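The serialised start-up described above can be modelled in a few lines of userspace C. All names and structure here are illustrative, not the actual patch code:

```c
#include <assert.h>
#include <stdbool.h>

/* Model: at most one thread start may be pending, and the next start
 * is only considered after a request has been serviced. */
struct pool_model {
	unsigned int nrthreads;
	bool start_pending;
};

/* Work arrived with no idle thread: request a start unless one is
 * already pending. */
static void pool_work_queued(struct pool_model *p)
{
	if (!p->start_pending) {
		p->start_pending = true;
		p->nrthreads += 1;	/* thread creation begins */
	}
}

/* A request was serviced; the pending start is considered complete,
 * which permits the next start decision. */
static void pool_request_serviced(struct pool_model *p)
{
	p->start_pending = false;
}

/* A burst of queued work with nothing serviced in between can grow
 * the pool by at most one thread. */
static unsigned int threads_after_burst(unsigned int arrivals)
{
	struct pool_model p = { .nrthreads = 1, .start_pending = false };

	for (unsigned int i = 0; i < arrivals; i++)
		pool_work_queued(&p);
	return p.nrthreads;
}

/* Interleaving service with arrivals allows a gradual ramp instead. */
static unsigned int threads_after_ramp(unsigned int rounds)
{
	struct pool_model p = { .nrthreads = 1, .start_pending = false };

	for (unsigned int i = 0; i < rounds; i++) {
		pool_work_queued(&p);
		pool_request_serviced(&p);
	}
	return p.nrthreads;
}
```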
There is certainly room for discussion around the wisdom of these
heuristics, and what other heuristics are needed - we probably want a
shrinker to impose memory pressure on the number of threads. We
probably want a thread to exit rather than retry when a memory
allocation in svc_alloc_arg() fails.
I certainly wouldn't recommend patch 14 landing in any hurry at all.
I'd love to hear what y'all think, and what experiences you have when
testing it.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 01/14] lockd: discard nlmsvc_timeout
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 7:14 ` [PATCH 02/14] SUNRPC: make various functions static, or not exported NeilBrown
` (14 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use
that in the one place that nlmsvc_timeout is used.
In truth it *might* not always be the same, as nlmsvc_timeout is only
set when lockd is started, while nlm_timeout can be set at any time via
sysctl. I think this difference is not helpful, so removing it is good.
Also remove the test for nlm_timeout being 0. This is not possible
unless a module parameter is used to set the minimum timeout to 0, and
if that happens then it probably should be honoured.
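For illustration, a userspace sketch of the computation this patch consolidates - deriving the RPC retry parameters directly from nlm_timeout (in seconds) scaled by HZ. HZ is modelled here as a fixed constant, and the struct name is an approximation of the kernel's rpc_timeout:

```c
#include <assert.h>

#define MODEL_HZ 100	/* illustrative; the kernel value is config dependent */

struct rpc_timeout_model {
	unsigned long to_initval;	/* initial timeout, in ticks */
	unsigned long to_increment;	/* added per retry, in ticks */
};

/* Compute the retry parameters from nlm_timeout each time they are
 * needed, instead of caching a separate nlmsvc_timeout at lockd start. */
static struct rpc_timeout_model nlm_timeparms(unsigned long nlm_timeout)
{
	unsigned long increment = nlm_timeout * MODEL_HZ;

	return (struct rpc_timeout_model){
		.to_initval = increment,
		.to_increment = increment,
	};
}
```

Because the value is computed at use time, a sysctl change to nlm_timeout takes effect on the next binding rather than being frozen at lockd start-up.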
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/lockd/host.c | 2 +-
fs/lockd/svc.c | 7 +------
include/linux/lockd/lockd.h | 2 +-
3 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index c11516801784..5e6877c37f73 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -440,7 +440,7 @@ nlm_bind_host(struct nlm_host *host)
if ((clnt = host->h_rpcclnt) != NULL) {
nlm_rebind_host(host);
} else {
- unsigned long increment = nlmsvc_timeout;
+ unsigned long increment = nlm_timeout * HZ;
struct rpc_timeout timeparms = {
.to_initval = increment,
.to_increment = increment,
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index ab8042a5b895..71713309967d 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -53,7 +53,6 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops);
static DEFINE_MUTEX(nlmsvc_mutex);
static unsigned int nlmsvc_users;
static struct svc_serv *nlmsvc_serv;
-unsigned long nlmsvc_timeout;
static void nlmsvc_request_retry(struct timer_list *tl)
{
@@ -68,7 +67,7 @@ unsigned int lockd_net_id;
* and also changed through the sysctl interface. -- Jamie Lokier, Aug 2003
*/
static unsigned long nlm_grace_period;
-static unsigned long nlm_timeout = LOCKD_DFLT_TIMEO;
+unsigned long nlm_timeout = LOCKD_DFLT_TIMEO;
static int nlm_udpport, nlm_tcpport;
/* RLIM_NOFILE defaults to 1024. That seems like a reasonable default here. */
@@ -333,10 +332,6 @@ static int lockd_get(void)
printk(KERN_WARNING
"lockd_up: no pid, %d users??\n", nlmsvc_users);
- if (!nlm_timeout)
- nlm_timeout = LOCKD_DFLT_TIMEO;
- nlmsvc_timeout = nlm_timeout * HZ;
-
serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, lockd);
if (!serv) {
printk(KERN_WARNING "lockd_up: create service failed\n");
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index 1b95fe31051f..61c4b9c41904 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -200,7 +200,7 @@ extern const struct svc_procedure nlmsvc_procedures[24];
extern const struct svc_procedure nlmsvc_procedures4[24];
#endif
extern int nlmsvc_grace_period;
-extern unsigned long nlmsvc_timeout;
+extern unsigned long nlm_timeout;
extern bool nsm_use_hostnames;
extern u32 nsm_local_state;
--
2.44.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 02/14] SUNRPC: make various functions static, or not exported.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
2024-07-15 7:14 ` [PATCH 01/14] lockd: discard nlmsvc_timeout NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 7:14 ` [PATCH 03/14] nfsd: move nfsd_pool_stats_open into nfsctl.c NeilBrown
` (13 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
Various functions are only used within the sunrpc module, and several
are only used in one file. So clean up:
These are marked static, and any EXPORT is removed.
svc_rcpb_setup()
svc_rqst_alloc()
svc_rqst_free() - also moved before first use
svc_rpcbind_set_version()
svc_drop() - also moved to svc.c
These are now not EXPORTed, but are not static.
svc_authenticate()
svc_sock_update_bufs()
Signed-off-by: NeilBrown <neilb@suse.de>
---
include/linux/sunrpc/svc.h | 9 -------
include/linux/sunrpc/svcauth.h | 1 -
include/linux/sunrpc/svcsock.h | 2 --
net/sunrpc/sunrpc.h | 4 +++
net/sunrpc/svc.c | 48 ++++++++++++++++++----------------
net/sunrpc/svc_xprt.c | 9 -------
net/sunrpc/svcauth.c | 1 -
net/sunrpc/svcsock.c | 1 -
8 files changed, 29 insertions(+), 46 deletions(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index a7d0406b9ef5..e4fa25fafa97 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -401,17 +401,13 @@ struct svc_procedure {
*/
int sunrpc_set_pool_mode(const char *val);
int sunrpc_get_pool_mode(char *val, size_t size);
-int svc_rpcb_setup(struct svc_serv *serv, struct net *net);
void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net);
int svc_bind(struct svc_serv *serv, struct net *net);
struct svc_serv *svc_create(struct svc_program *, unsigned int,
int (*threadfn)(void *data));
-struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
- struct svc_pool *pool, int node);
bool svc_rqst_replace_page(struct svc_rqst *rqstp,
struct page *page);
void svc_rqst_release_pages(struct svc_rqst *rqstp);
-void svc_rqst_free(struct svc_rqst *);
void svc_exit_thread(struct svc_rqst *);
struct svc_serv * svc_create_pooled(struct svc_program *prog,
struct svc_stat *stats,
@@ -446,11 +442,6 @@ int svc_generic_rpcbind_set(struct net *net,
u32 version, int family,
unsigned short proto,
unsigned short port);
-int svc_rpcbind_set_version(struct net *net,
- const struct svc_program *progp,
- u32 version, int family,
- unsigned short proto,
- unsigned short port);
#define RPC_MAX_ADDRBUFLEN (63U)
diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index 61c455f1e1f5..63cf6fb26dcc 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -151,7 +151,6 @@ struct auth_ops {
struct svc_xprt;
-extern enum svc_auth_status svc_authenticate(struct svc_rqst *rqstp);
extern rpc_authflavor_t svc_auth_flavor(struct svc_rqst *rqstp);
extern int svc_authorise(struct svc_rqst *rqstp);
extern enum svc_auth_status svc_set_client(struct svc_rqst *rqstp);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 7c78ec6356b9..bf45d9e8492a 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -58,8 +58,6 @@ static inline u32 svc_sock_final_rec(struct svc_sock *svsk)
*/
void svc_recv(struct svc_rqst *rqstp);
void svc_send(struct svc_rqst *rqstp);
-void svc_drop(struct svc_rqst *);
-void svc_sock_update_bufs(struct svc_serv *serv);
int svc_addsock(struct svc_serv *serv, struct net *net,
const int fd, char *name_return, const size_t len,
const struct cred *cred);
diff --git a/net/sunrpc/sunrpc.h b/net/sunrpc/sunrpc.h
index d4a362c9e4b3..e3c6e3b63f0b 100644
--- a/net/sunrpc/sunrpc.h
+++ b/net/sunrpc/sunrpc.h
@@ -36,7 +36,11 @@ static inline int sock_is_loopback(struct sock *sk)
return loopback;
}
+struct svc_serv;
+struct svc_rqst;
int rpc_clients_notifier_register(void);
void rpc_clients_notifier_unregister(void);
void auth_domain_cleanup(void);
+void svc_sock_update_bufs(struct svc_serv *serv);
+enum svc_auth_status svc_authenticate(struct svc_rqst *rqstp);
#endif /* _NET_SUNRPC_SUNRPC_H */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index e03f14024e47..072ad115ae3d 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -32,6 +32,7 @@
#include <trace/events/sunrpc.h>
#include "fail.h"
+#include "sunrpc.h"
#define RPCDBG_FACILITY RPCDBG_SVCDSP
@@ -417,7 +418,7 @@ struct svc_pool *svc_pool_for_cpu(struct svc_serv *serv)
return &serv->sv_pools[pidx % serv->sv_nrpools];
}
-int svc_rpcb_setup(struct svc_serv *serv, struct net *net)
+static int svc_rpcb_setup(struct svc_serv *serv, struct net *net)
{
int err;
@@ -429,7 +430,6 @@ int svc_rpcb_setup(struct svc_serv *serv, struct net *net)
svc_unregister(serv, net);
return 0;
}
-EXPORT_SYMBOL_GPL(svc_rpcb_setup);
void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net)
{
@@ -664,7 +664,20 @@ svc_release_buffer(struct svc_rqst *rqstp)
put_page(rqstp->rq_pages[i]);
}
-struct svc_rqst *
+static void
+svc_rqst_free(struct svc_rqst *rqstp)
+{
+ folio_batch_release(&rqstp->rq_fbatch);
+ svc_release_buffer(rqstp);
+ if (rqstp->rq_scratch_page)
+ put_page(rqstp->rq_scratch_page);
+ kfree(rqstp->rq_resp);
+ kfree(rqstp->rq_argp);
+ kfree(rqstp->rq_auth_data);
+ kfree_rcu(rqstp, rq_rcu_head);
+}
+
+static struct svc_rqst *
svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node)
{
struct svc_rqst *rqstp;
@@ -698,7 +711,6 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node)
svc_rqst_free(rqstp);
return NULL;
}
-EXPORT_SYMBOL_GPL(svc_rqst_alloc);
static struct svc_rqst *
svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
@@ -933,24 +945,6 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp)
}
}
-/*
- * Called from a server thread as it's exiting. Caller must hold the "service
- * mutex" for the service.
- */
-void
-svc_rqst_free(struct svc_rqst *rqstp)
-{
- folio_batch_release(&rqstp->rq_fbatch);
- svc_release_buffer(rqstp);
- if (rqstp->rq_scratch_page)
- put_page(rqstp->rq_scratch_page);
- kfree(rqstp->rq_resp);
- kfree(rqstp->rq_argp);
- kfree(rqstp->rq_auth_data);
- kfree_rcu(rqstp, rq_rcu_head);
-}
-EXPORT_SYMBOL_GPL(svc_rqst_free);
-
void
svc_exit_thread(struct svc_rqst *rqstp)
{
@@ -1098,6 +1092,7 @@ static int __svc_register(struct net *net, const char *progname,
return error;
}
+static
int svc_rpcbind_set_version(struct net *net,
const struct svc_program *progp,
u32 version, int family,
@@ -1108,7 +1103,6 @@ int svc_rpcbind_set_version(struct net *net,
version, family, proto, port);
}
-EXPORT_SYMBOL_GPL(svc_rpcbind_set_version);
int svc_generic_rpcbind_set(struct net *net,
const struct svc_program *progp,
@@ -1526,6 +1520,14 @@ svc_process_common(struct svc_rqst *rqstp)
goto sendit;
}
+/*
+ * Drop request
+ */
+static void svc_drop(struct svc_rqst *rqstp)
+{
+ trace_svc_drop(rqstp);
+}
+
/**
* svc_process - Execute one RPC transaction
* @rqstp: RPC transaction context
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d3735ab3e6d1..53ebc719ff5a 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -905,15 +905,6 @@ void svc_recv(struct svc_rqst *rqstp)
}
EXPORT_SYMBOL_GPL(svc_recv);
-/*
- * Drop request
- */
-void svc_drop(struct svc_rqst *rqstp)
-{
- trace_svc_drop(rqstp);
-}
-EXPORT_SYMBOL_GPL(svc_drop);
-
/**
* svc_send - Return reply to client
* @rqstp: RPC transaction context
diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c
index 1619211f0960..93d9e949e265 100644
--- a/net/sunrpc/svcauth.c
+++ b/net/sunrpc/svcauth.c
@@ -98,7 +98,6 @@ enum svc_auth_status svc_authenticate(struct svc_rqst *rqstp)
rqstp->rq_authop = aops;
return aops->accept(rqstp);
}
-EXPORT_SYMBOL_GPL(svc_authenticate);
/**
* svc_set_client - Assign an appropriate 'auth_domain' as the client
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 6b3f01beb294..825ec5357691 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1378,7 +1378,6 @@ void svc_sock_update_bufs(struct svc_serv *serv)
set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
spin_unlock_bh(&serv->sv_lock);
}
-EXPORT_SYMBOL_GPL(svc_sock_update_bufs);
/*
* Initialize socket for RPC use and create svc_sock struct
--
2.44.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 03/14] nfsd: move nfsd_pool_stats_open into nfsctl.c
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
2024-07-15 7:14 ` [PATCH 01/14] lockd: discard nlmsvc_timeout NeilBrown
2024-07-15 7:14 ` [PATCH 02/14] SUNRPC: make various functions static, or not exported NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 7:14 ` [PATCH 04/14] nfsd: don't allocate the versions array NeilBrown
` (12 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
nfsd_pool_stats_open() is used in nfsctl.c, so move it there.
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfsd/nfsctl.c | 7 +++++++
fs/nfsd/nfsd.h | 2 --
fs/nfsd/nfssvc.c | 7 -------
3 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9e0ea6fc2aa3..9b47723fc110 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -174,6 +174,13 @@ static int export_features_show(struct seq_file *m, void *v)
DEFINE_SHOW_ATTRIBUTE(export_features);
+static int nfsd_pool_stats_open(struct inode *inode, struct file *file)
+{
+ struct nfsd_net *nn = net_generic(inode->i_sb->s_fs_info, nfsd_net_id);
+
+ return svc_pool_stats_open(&nn->nfsd_info, file);
+}
+
static const struct file_operations pool_stats_operations = {
.open = nfsd_pool_stats_open,
.read = seq_read,
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index cec8697b1cd6..39e109a7d56d 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -111,8 +111,6 @@ int nfsd_nrthreads(struct net *);
int nfsd_nrpools(struct net *);
int nfsd_get_nrthreads(int n, int *, struct net *);
int nfsd_set_nrthreads(int n, int *, struct net *);
-int nfsd_pool_stats_open(struct inode *, struct file *);
-int nfsd_pool_stats_release(struct inode *, struct file *);
void nfsd_shutdown_threads(struct net *net);
bool i_am_nfsd(void);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 0bc8eaa5e009..f25b26bc5670 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -1084,10 +1084,3 @@ bool nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
{
return true;
}
-
-int nfsd_pool_stats_open(struct inode *inode, struct file *file)
-{
- struct nfsd_net *nn = net_generic(inode->i_sb->s_fs_info, nfsd_net_id);
-
- return svc_pool_stats_open(&nn->nfsd_info, file);
-}
--
2.44.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 04/14] nfsd: don't allocate the versions array.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (2 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 03/14] nfsd: move nfsd_pool_stats_open into nfsctl.c NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-08-02 21:34 ` Mike Snitzer
2024-07-15 7:14 ` [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int NeilBrown
` (11 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
Instead of using kmalloc to allocate an array for storing active version
info, just declare an array of the maximum size - it is only 5 or so
entries.
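A userspace sketch of the resulting scheme - a fixed-size bool array initialised from the compiled-in version table at net init, with no allocation to fail. The MODEL_ names are illustrative stand-ins for the kernel's NFSD_ constants:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MINVERS 2
#define MODEL_MAXVERS 4

struct net_model {
	/* Small enough to embed directly in the per-net structure. */
	bool versions[MODEL_MAXVERS + 1];
};

static bool model_support_version(int vers)
{
	return vers >= MODEL_MINVERS && vers <= MODEL_MAXVERS;
}

/* All compiled-in versions are enabled by default at net init. */
static void model_net_init(struct net_model *nn)
{
	for (int i = 0; i <= MODEL_MAXVERS; i++)
		nn->versions[i] = model_support_version(i);
}

/* Helper for inspection: default enabled state of one version. */
static bool default_enabled(int vers)
{
	struct net_model nn;

	model_net_init(&nn);
	return nn.versions[vers];
}
```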
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfs/nfs4state.c | 2 +
fs/nfsd/cache.h | 2 +-
fs/nfsd/netns.h | 6 +--
fs/nfsd/nfsctl.c | 10 +++--
fs/nfsd/nfsd.h | 9 +++-
fs/nfsd/nfssvc.c | 100 ++++++++-------------------------------------
6 files changed, 36 insertions(+), 93 deletions(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5b452411e8fd..68c663626480 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1953,6 +1953,8 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
if (lost_locks)
pr_warn("NFS: %s: lost %d locks\n",
clp->cl_hostname, lost_locks);
+ nfs4_free_state_owners(&freeme);
+
set_bit(ops->owner_flag_bit, &sp->so_flags);
nfs4_put_state_owner(sp);
status = nfs4_recovery_handle_error(clp, status);
diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
index 66a05fefae98..bb7addef4a31 100644
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -10,7 +10,7 @@
#define NFSCACHE_H
#include <linux/sunrpc/svc.h>
-#include "netns.h"
+#include "nfsd.h"
/*
* Representation of a reply cache entry.
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 14ec15656320..238fc4e56e53 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -152,8 +152,8 @@ struct nfsd_net {
/*
* Version information
*/
- bool *nfsd_versions;
- bool *nfsd4_minorversions;
+ bool nfsd_versions[NFSD_MAXVERS + 1];
+ bool nfsd4_minorversions[NFSD_SUPPORTED_MINOR_VERSION + 1];
/*
* Duplicate reply cache
@@ -219,8 +219,6 @@ struct nfsd_net {
#define nfsd_netns_ready(nn) ((nn)->sessionid_hashtbl)
extern bool nfsd_support_version(int vers);
-extern void nfsd_netns_free_versions(struct nfsd_net *nn);
-
extern unsigned int nfsd_net_id;
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9b47723fc110..5b0f2e0d7ccf 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2232,8 +2232,9 @@ int nfsd_nl_pool_mode_get_doit(struct sk_buff *skb, struct genl_info *info)
*/
static __net_init int nfsd_net_init(struct net *net)
{
- int retval;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ int retval;
+ int i;
retval = nfsd_export_init(net);
if (retval)
@@ -2247,8 +2248,10 @@ static __net_init int nfsd_net_init(struct net *net)
goto out_repcache_error;
memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
nn->nfsd_svcstats.program = &nfsd_program;
- nn->nfsd_versions = NULL;
- nn->nfsd4_minorversions = NULL;
+ for (i = 0; i < sizeof(nn->nfsd_versions); i++)
+ nn->nfsd_versions[i] = nfsd_support_version(i);
+ for (i = 0; i < sizeof(nn->nfsd4_minorversions); i++)
+ nn->nfsd4_minorversions[i] = nfsd_support_version(4);
nn->nfsd_info.mutex = &nfsd_mutex;
nn->nfsd_serv = NULL;
nfsd4_init_leases_net(nn);
@@ -2279,7 +2282,6 @@ static __net_exit void nfsd_net_exit(struct net *net)
percpu_counter_destroy_many(nn->counter, NFSD_STATS_COUNTERS_NUM);
nfsd_idmap_shutdown(net);
nfsd_export_shutdown(net);
- nfsd_netns_free_versions(nn);
}
static struct pernet_operations nfsd_net_ops = {
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 39e109a7d56d..369c3b3ce53e 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -23,9 +23,7 @@
#include <uapi/linux/nfsd/debug.h>
-#include "netns.h"
#include "export.h"
-#include "stats.h"
#undef ifdebug
#ifdef CONFIG_SUNRPC_DEBUG
@@ -37,7 +35,14 @@
/*
* nfsd version
*/
+#define NFSD_MINVERS 2
+#define NFSD_MAXVERS 4
#define NFSD_SUPPORTED_MINOR_VERSION 2
+bool nfsd_support_version(int vers);
+
+#include "netns.h"
+#include "stats.h"
+
/*
* Maximum blocksizes supported by daemon under various circumstances.
*/
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index f25b26bc5670..4438cdcd4873 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -116,15 +116,12 @@ static const struct svc_version *nfsd_version[] = {
#endif
};
-#define NFSD_MINVERS 2
-#define NFSD_NRVERS ARRAY_SIZE(nfsd_version)
-
struct svc_program nfsd_program = {
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
.pg_next = &nfsd_acl_program,
#endif
.pg_prog = NFS_PROGRAM, /* program number */
- .pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
+ .pg_nvers = NFSD_MAXVERS+1, /* nr of entries in nfsd_version */
.pg_vers = nfsd_version, /* version table */
.pg_name = "nfsd", /* program name */
.pg_class = "nfsd", /* authentication class */
@@ -135,78 +132,24 @@ struct svc_program nfsd_program = {
bool nfsd_support_version(int vers)
{
- if (vers >= NFSD_MINVERS && vers < NFSD_NRVERS)
+ if (vers >= NFSD_MINVERS && vers <= NFSD_MAXVERS)
return nfsd_version[vers] != NULL;
return false;
}
-static bool *
-nfsd_alloc_versions(void)
-{
- bool *vers = kmalloc_array(NFSD_NRVERS, sizeof(bool), GFP_KERNEL);
- unsigned i;
-
- if (vers) {
- /* All compiled versions are enabled by default */
- for (i = 0; i < NFSD_NRVERS; i++)
- vers[i] = nfsd_support_version(i);
- }
- return vers;
-}
-
-static bool *
-nfsd_alloc_minorversions(void)
-{
- bool *vers = kmalloc_array(NFSD_SUPPORTED_MINOR_VERSION + 1,
- sizeof(bool), GFP_KERNEL);
- unsigned i;
-
- if (vers) {
- /* All minor versions are enabled by default */
- for (i = 0; i <= NFSD_SUPPORTED_MINOR_VERSION; i++)
- vers[i] = nfsd_support_version(4);
- }
- return vers;
-}
-
-void
-nfsd_netns_free_versions(struct nfsd_net *nn)
-{
- kfree(nn->nfsd_versions);
- kfree(nn->nfsd4_minorversions);
- nn->nfsd_versions = NULL;
- nn->nfsd4_minorversions = NULL;
-}
-
-static void
-nfsd_netns_init_versions(struct nfsd_net *nn)
-{
- if (!nn->nfsd_versions) {
- nn->nfsd_versions = nfsd_alloc_versions();
- nn->nfsd4_minorversions = nfsd_alloc_minorversions();
- if (!nn->nfsd_versions || !nn->nfsd4_minorversions)
- nfsd_netns_free_versions(nn);
- }
-}
-
int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change)
{
- if (vers < NFSD_MINVERS || vers >= NFSD_NRVERS)
+ if (vers < NFSD_MINVERS || vers > NFSD_MAXVERS)
return 0;
switch(change) {
case NFSD_SET:
- if (nn->nfsd_versions)
- nn->nfsd_versions[vers] = nfsd_support_version(vers);
+ nn->nfsd_versions[vers] = nfsd_support_version(vers);
break;
case NFSD_CLEAR:
- nfsd_netns_init_versions(nn);
- if (nn->nfsd_versions)
- nn->nfsd_versions[vers] = false;
+ nn->nfsd_versions[vers] = false;
break;
case NFSD_TEST:
- if (nn->nfsd_versions)
- return nn->nfsd_versions[vers];
- fallthrough;
+ return nn->nfsd_versions[vers];
case NFSD_AVAIL:
return nfsd_support_version(vers);
}
@@ -233,23 +176,16 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
switch(change) {
case NFSD_SET:
- if (nn->nfsd4_minorversions) {
- nfsd_vers(nn, 4, NFSD_SET);
- nn->nfsd4_minorversions[minorversion] =
- nfsd_vers(nn, 4, NFSD_TEST);
- }
+ nfsd_vers(nn, 4, NFSD_SET);
+ nn->nfsd4_minorversions[minorversion] =
+ nfsd_vers(nn, 4, NFSD_TEST);
break;
case NFSD_CLEAR:
- nfsd_netns_init_versions(nn);
- if (nn->nfsd4_minorversions) {
- nn->nfsd4_minorversions[minorversion] = false;
- nfsd_adjust_nfsd_versions4(nn);
- }
+ nn->nfsd4_minorversions[minorversion] = false;
+ nfsd_adjust_nfsd_versions4(nn);
break;
case NFSD_TEST:
- if (nn->nfsd4_minorversions)
- return nn->nfsd4_minorversions[minorversion];
- return nfsd_vers(nn, 4, NFSD_TEST);
+ return nn->nfsd4_minorversions[minorversion];
case NFSD_AVAIL:
return minorversion <= NFSD_SUPPORTED_MINOR_VERSION &&
nfsd_vers(nn, 4, NFSD_AVAIL);
@@ -568,11 +504,11 @@ void nfsd_reset_versions(struct nfsd_net *nn)
{
int i;
- for (i = 0; i < NFSD_NRVERS; i++)
+ for (i = 0; i <= NFSD_MAXVERS; i++)
if (nfsd_vers(nn, i, NFSD_TEST))
return;
- for (i = 0; i < NFSD_NRVERS; i++)
+ for (i = 0; i <= NFSD_MAXVERS; i++)
if (i != 4)
nfsd_vers(nn, i, NFSD_SET);
else {
@@ -905,17 +841,17 @@ nfsd_init_request(struct svc_rqst *rqstp,
if (likely(nfsd_vers(nn, rqstp->rq_vers, NFSD_TEST)))
return svc_generic_init_request(rqstp, progp, ret);
- ret->mismatch.lovers = NFSD_NRVERS;
- for (i = NFSD_MINVERS; i < NFSD_NRVERS; i++) {
+ ret->mismatch.lovers = NFSD_MAXVERS + 1;
+ for (i = NFSD_MINVERS; i <= NFSD_MAXVERS; i++) {
if (nfsd_vers(nn, i, NFSD_TEST)) {
ret->mismatch.lovers = i;
break;
}
}
- if (ret->mismatch.lovers == NFSD_NRVERS)
+ if (ret->mismatch.lovers > NFSD_MAXVERS)
return rpc_prog_unavail;
ret->mismatch.hivers = NFSD_MINVERS;
- for (i = NFSD_NRVERS - 1; i >= NFSD_MINVERS; i--) {
+ for (i = NFSD_MAXVERS; i >= NFSD_MINVERS; i--) {
if (nfsd_vers(nn, i, NFSD_TEST)) {
ret->mismatch.hivers = i;
break;
--
2.44.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (3 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 04/14] nfsd: don't allocate the versions array NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 14:12 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 06/14] sunrpc: don't take ->sv_lock when updating ->sv_nrthreads NeilBrown
` (10 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
sp_nrthreads is only ever accessed under the service mutex
nlmsvc_mutex nfs_callback_mutex nfsd_mutex
so there is no need for it to be an atomic_t.
The fact that all code using it is single-threaded means that we can
simplify svc_pool_victim and remove the temporary elevation of
sp_nrthreads.
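With a plain counter protected by the service mutex, the simplified victim selection becomes a simple scan for a pool that still has threads, with no atomic inc/dec dance. A userspace sketch (structure and names illustrative):

```c
#include <assert.h>
#include <stddef.h>

struct pool_m {
	unsigned int sp_nrthreads;	/* protected by the service mutex */
};

/* Rotate through the pools (via *state) and return the first one that
 * still has a thread to give up, or NULL if none do. */
static struct pool_m *pick_victim_pool(struct pool_m *pools,
				       unsigned int npools,
				       unsigned int *state)
{
	for (unsigned int i = 0; i < npools; i++) {
		struct pool_m *p = &pools[--(*state) % npools];

		if (p->sp_nrthreads)
			return p;
	}
	return NULL;
}

/* Scenario helper: three pools, only the middle one populated. */
static unsigned int victim_index(void)
{
	struct pool_m pools[3] = { {0}, {2}, {0} };
	unsigned int state = 3;
	struct pool_m *v = pick_victim_pool(pools, 3, &state);

	return v ? (unsigned int)(v - pools) : 3;
}
```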
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfsd/nfsctl.c | 2 +-
fs/nfsd/nfssvc.c | 2 +-
include/linux/sunrpc/svc.h | 4 ++--
net/sunrpc/svc.c | 31 +++++++++++--------------------
4 files changed, 15 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 5b0f2e0d7ccf..d85b6d1fa31f 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1769,7 +1769,7 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
struct svc_pool *sp = &nn->nfsd_serv->sv_pools[i];
err = nla_put_u32(skb, NFSD_A_SERVER_THREADS,
- atomic_read(&sp->sp_nrthreads));
+ sp->sp_nrthreads);
if (err)
goto err_unlock;
}
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 4438cdcd4873..7377422a34df 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -641,7 +641,7 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net)
if (serv)
for (i = 0; i < serv->sv_nrpools && i < n; i++)
- nthreads[i] = atomic_read(&serv->sv_pools[i].sp_nrthreads);
+ nthreads[i] = serv->sv_pools[i].sp_nrthreads;
return 0;
}
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index e4fa25fafa97..99e9345d829e 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -33,9 +33,9 @@
* node traffic on multi-node NUMA NFS servers.
*/
struct svc_pool {
- unsigned int sp_id; /* pool id; also node id on NUMA */
+ unsigned int sp_id; /* pool id; also node id on NUMA */
struct lwq sp_xprts; /* pending transports */
- atomic_t sp_nrthreads; /* # of threads in pool */
+ unsigned int sp_nrthreads; /* # of threads in pool */
struct list_head sp_all_threads; /* all server threads */
struct llist_head sp_idle_threads; /* idle server threads */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 072ad115ae3d..0d8588bc693c 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -725,7 +725,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
serv->sv_nrthreads += 1;
spin_unlock_bh(&serv->sv_lock);
- atomic_inc(&pool->sp_nrthreads);
+ pool->sp_nrthreads += 1;
/* Protected by whatever lock the service uses when calling
* svc_set_num_threads()
@@ -780,31 +780,22 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
struct svc_pool *pool;
unsigned int i;
-retry:
pool = target_pool;
- if (pool != NULL) {
- if (atomic_inc_not_zero(&pool->sp_nrthreads))
- goto found_pool;
- return NULL;
- } else {
+ if (!pool) {
for (i = 0; i < serv->sv_nrpools; i++) {
pool = &serv->sv_pools[--(*state) % serv->sv_nrpools];
- if (atomic_inc_not_zero(&pool->sp_nrthreads))
- goto found_pool;
+ if (pool->sp_nrthreads)
+ break;
}
- return NULL;
}
-found_pool:
- set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
- set_bit(SP_NEED_VICTIM, &pool->sp_flags);
- if (!atomic_dec_and_test(&pool->sp_nrthreads))
+ if (pool && pool->sp_nrthreads) {
+ set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
+ set_bit(SP_NEED_VICTIM, &pool->sp_flags);
return pool;
- /* Nothing left in this pool any more */
- clear_bit(SP_NEED_VICTIM, &pool->sp_flags);
- clear_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
- goto retry;
+ }
+ return NULL;
}
static int
@@ -883,7 +874,7 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
if (!pool)
nrservs -= serv->sv_nrthreads;
else
- nrservs -= atomic_read(&pool->sp_nrthreads);
+ nrservs -= pool->sp_nrthreads;
if (nrservs > 0)
return svc_start_kthreads(serv, pool, nrservs);
@@ -953,7 +944,7 @@ svc_exit_thread(struct svc_rqst *rqstp)
list_del_rcu(&rqstp->rq_all);
- atomic_dec(&pool->sp_nrthreads);
+ pool->sp_nrthreads -= 1;
spin_lock_bh(&serv->sv_lock);
serv->sv_nrthreads -= 1;
--
2.44.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 06/14] sunrpc: don't take ->sv_lock when updating ->sv_nrthreads.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (4 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 7:14 ` [PATCH 07/14] Change unshare_fs_struct() to never fail NeilBrown
` (9 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
As documented in svc_xprt.c, sv_nrthreads is protected by the service
mutex, and it does not need ->sv_lock.
(->sv_lock is needed only for sv_permsocks, sv_tempsocks, and
sv_tmpcnt).
So remove the unnecessary locking.
Signed-off-by: NeilBrown <neilb@suse.de>
---
net/sunrpc/svc.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 0d8588bc693c..f4fc3d82e2bb 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -721,10 +721,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
if (!rqstp)
return ERR_PTR(-ENOMEM);
- spin_lock_bh(&serv->sv_lock);
serv->sv_nrthreads += 1;
- spin_unlock_bh(&serv->sv_lock);
-
pool->sp_nrthreads += 1;
/* Protected by whatever lock the service uses when calling
@@ -945,10 +942,7 @@ svc_exit_thread(struct svc_rqst *rqstp)
list_del_rcu(&rqstp->rq_all);
pool->sp_nrthreads -= 1;
-
- spin_lock_bh(&serv->sv_lock);
serv->sv_nrthreads -= 1;
- spin_unlock_bh(&serv->sv_lock);
svc_sock_update_bufs(serv);
svc_rqst_free(rqstp);
--
2.44.0
* [PATCH 07/14] Change unshare_fs_struct() to never fail.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (5 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 06/14] sunrpc: don't take ->sv_lock when updating ->sv_nrthreads NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 14:39 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 08/14] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
` (8 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
nfsd threads must not share the init process's fs_struct, as they need
to manipulate the umask independently. So they call unshare_fs_struct()
and are the only user of that function.
In the unlikely event that unshare_fs_struct() fails, the thread will
exit calling svc_exit_thread() BEFORE svc_thread_should_stop() reports
'true'.
This is a problem because svc_exit_thread() assumes that
svc_stop_threads() is running and consequently (in the nfsd case)
nfsd_mutex is held. This ensures that the list_del_rcu() call in
svc_exit_thread() cannot race with any other manipulation of
->sp_all_threads.
While it would be possible to add some other exclusion, doing so would
introduce unnecessary complexity. unshare_fs_struct() does not fail in
practice, so the simplest solution is to make this explicit, i.e. use
__GFP_NOFAIL, which is safe for such a small allocation - about 64 bytes.
Change unshare_fs_struct() to not return any error, and remove the error
handling from nfsd().
An alternate approach would be to create a variant of
kthread_create_on_node() which didn't set CLONE_FS.
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/fs_struct.c | 42 ++++++++++++++++++++-------------------
fs/nfsd/nfssvc.c | 9 +++------
include/linux/fs_struct.h | 2 +-
3 files changed, 26 insertions(+), 27 deletions(-)
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index 64c2d0814ed6..49fba862e408 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -109,35 +109,39 @@ void exit_fs(struct task_struct *tsk)
}
}
+static void init_fs_struct(struct fs_struct *fs, struct fs_struct *old)
+{
+ fs->users = 1;
+ fs->in_exec = 0;
+ spin_lock_init(&fs->lock);
+ seqcount_spinlock_init(&fs->seq, &fs->lock);
+ fs->umask = old->umask;
+
+ spin_lock(&old->lock);
+ fs->root = old->root;
+ path_get(&fs->root);
+ fs->pwd = old->pwd;
+ path_get(&fs->pwd);
+ spin_unlock(&old->lock);
+}
+
struct fs_struct *copy_fs_struct(struct fs_struct *old)
{
struct fs_struct *fs = kmem_cache_alloc(fs_cachep, GFP_KERNEL);
/* We don't need to lock fs - think why ;-) */
- if (fs) {
- fs->users = 1;
- fs->in_exec = 0;
- spin_lock_init(&fs->lock);
- seqcount_spinlock_init(&fs->seq, &fs->lock);
- fs->umask = old->umask;
-
- spin_lock(&old->lock);
- fs->root = old->root;
- path_get(&fs->root);
- fs->pwd = old->pwd;
- path_get(&fs->pwd);
- spin_unlock(&old->lock);
- }
+ if (fs)
+ init_fs_struct(fs, old);
return fs;
}
-int unshare_fs_struct(void)
+void unshare_fs_struct(void)
{
struct fs_struct *fs = current->fs;
- struct fs_struct *new_fs = copy_fs_struct(fs);
+ struct fs_struct *new_fs = kmem_cache_alloc(fs_cachep,
+ GFP_KERNEL| __GFP_NOFAIL);
int kill;
- if (!new_fs)
- return -ENOMEM;
+ init_fs_struct(new_fs, fs);
task_lock(current);
spin_lock(&fs->lock);
@@ -148,8 +152,6 @@ int unshare_fs_struct(void)
if (kill)
free_fs_struct(fs);
-
- return 0;
}
EXPORT_SYMBOL_GPL(unshare_fs_struct);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 7377422a34df..f5de04a63c6f 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -873,11 +873,9 @@ nfsd(void *vrqstp)
/* At this point, the thread shares current->fs
* with the init process. We need to create files with the
- * umask as defined by the client instead of init's umask. */
- if (unshare_fs_struct() < 0) {
- printk("Unable to start nfsd thread: out of memory\n");
- goto out;
- }
+ * umask as defined by the client instead of init's umask.
+ */
+ unshare_fs_struct();
current->fs->umask = 0;
@@ -899,7 +897,6 @@ nfsd(void *vrqstp)
atomic_dec(&nfsd_th_cnt);
-out:
/* Release the thread */
svc_exit_thread(rqstp);
return 0;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index 783b48dedb72..8282e6c7ff29 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -22,7 +22,7 @@ extern void set_fs_root(struct fs_struct *, const struct path *);
extern void set_fs_pwd(struct fs_struct *, const struct path *);
extern struct fs_struct *copy_fs_struct(struct fs_struct *);
extern void free_fs_struct(struct fs_struct *);
-extern int unshare_fs_struct(void);
+extern void unshare_fs_struct(void);
static inline void get_fs_root(struct fs_struct *fs, struct path *root)
{
--
2.44.0
* [PATCH 08/14] SUNRPC: move nrthreads counting to start/stop threads.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (6 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 07/14] Change unshare_fs_struct() to never fail NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 7:14 ` [PATCH 09/14] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
` (7 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
sp_nrthreads and sv_nrthreads are the number of threads that have been
explicitly requested. Future patches will allow extra threads to be
created as needed.
So move the updating of these fields to code which is for updating
configuration rather than code that is for starting/stopping threads.
Signed-off-by: NeilBrown <neilb@suse.de>
---
net/sunrpc/svc.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index f4fc3d82e2bb..d814b2cfa84f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -721,9 +721,6 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
if (!rqstp)
return ERR_PTR(-ENOMEM);
- serv->sv_nrthreads += 1;
- pool->sp_nrthreads += 1;
-
/* Protected by whatever lock the service uses when calling
* svc_set_num_threads()
*/
@@ -818,6 +815,8 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
svc_exit_thread(rqstp);
return PTR_ERR(task);
}
+ serv->sv_nrthreads += 1;
+ chosen_pool->sp_nrthreads += 1;
rqstp->rq_task = task;
if (serv->sv_nrpools > 1)
@@ -840,6 +839,8 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
victim = svc_pool_victim(serv, pool, &state);
if (!victim)
break;
+ victim->sp_nrthreads -= 1;
+ serv->sv_nrthreads -= 1;
svc_pool_wake_idle_thread(victim);
wait_on_bit(&victim->sp_flags, SP_VICTIM_REMAINS,
TASK_IDLE);
@@ -941,8 +942,6 @@ svc_exit_thread(struct svc_rqst *rqstp)
list_del_rcu(&rqstp->rq_all);
- pool->sp_nrthreads -= 1;
- serv->sv_nrthreads -= 1;
svc_sock_update_bufs(serv);
svc_rqst_free(rqstp);
--
2.44.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 09/14] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (7 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 08/14] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 15:21 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 10/14] nfs: dynamically adjust per-client DRC slot limits NeilBrown
` (6 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
If there are more non-courteous clients than the calculated limit, we
should fail the request rather than report a soft failure that
encourages the client to retry indefinitely.
If there are courteous clients pushing us over the limit, then expedite
their removal.
This is not known to have caused a problem in production use, but
testing with lots of clients reports repeated NFS4ERR_DELAY responses,
which doesn't seem helpful.
Also remove an outdated comment - we do use a slab cache.
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfsd/nfs4state.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a20c2c9d7d45..88936f3189e1 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2212,21 +2212,20 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
return 1;
}
-/*
- * XXX Should we use a slab cache ?
- * This type of memory management is somewhat inefficient, but we use it
- * anyway since SETCLIENTID is not a common operation.
- */
static struct nfs4_client *alloc_client(struct xdr_netobj name,
struct nfsd_net *nn)
{
struct nfs4_client *clp;
int i;
- if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
+ if (atomic_read(&nn->nfs4_client_count) -
+ atomic_read(&nn->nfsd_courtesy_clients) >= nn->nfs4_max_clients)
+ return ERR_PTR(-EREMOTEIO);
+
+ if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
+ atomic_read(&nn->nfsd_courtesy_clients) > 0)
mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
- return NULL;
- }
+
clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
if (clp == NULL)
return NULL;
@@ -3115,8 +3114,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
struct dentry *dentries[ARRAY_SIZE(client_files)];
clp = alloc_client(name, nn);
- if (clp == NULL)
- return NULL;
+ if (IS_ERR_OR_NULL(clp))
+ return clp;
ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred);
if (ret) {
@@ -3498,6 +3497,8 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
new = create_client(exid->clname, rqstp, &verf);
if (new == NULL)
return nfserr_jukebox;
+ if (IS_ERR(new))
+ return nfserr_resource;
status = copy_impl_id(new, exid);
if (status)
goto out_nolock;
@@ -4416,6 +4417,8 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
new = create_client(clname, rqstp, &clverifier);
if (new == NULL)
return nfserr_jukebox;
+ if (IS_ERR(new))
+ return nfserr_resource;
spin_lock(&nn->client_lock);
conf = find_confirmed_client_by_name(&clname, nn);
if (conf && client_has_state(conf)) {
--
2.44.0
* [PATCH 10/14] nfs: dynamically adjust per-client DRC slot limits.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (8 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 09/14] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 7:14 ` [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
` (5 subsequent siblings)
15 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
Currently per-client DRC slot limits (for v4.1+) are calculated when the
client connects and are left unchanged. So earlier clients can get a
larger share when memory is tight.
The heuristic for choosing a number includes the number of configured
server threads. This is problematic for 2 reasons.
1/ sv_nrthreads is different in different net namespaces, but the
memory allocation is global across all namespaces. So different
namespaces get treated differently without good reason.
2/ a future patch will auto-configure the number of threads based on
load so that there is no need to preconfigure a number. This will
make the current heuristic even more arbitrary.
NFSv4.1 allows the number of slots to be varied dynamically - in the
reply to each SEQUENCE op. With this patch we provide a provisional
upper limit in the EXCHANGE_ID reply which might end up being too big,
and adjust it with each SEQUENCE reply.
When memory is tight, the goal is to allow each client a similar
number of slots. So clients that ask for larger slots get more
memory. This may not be ideal. It could be changed later.
So we track the sum of the slot sizes of all active clients, and share
memory out based on the ratio of the slot size for a given client with
the total slot size. We never allow more in a SEQUENCE reply than we
did in the EXCHANGE_ID reply.
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfsd/nfs4state.c | 81 ++++++++++++++++++++++++---------------------
fs/nfsd/nfs4xdr.c | 2 +-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 7 ++--
fs/nfsd/state.h | 2 +-
fs/nfsd/xdr4.h | 2 --
6 files changed, 52 insertions(+), 44 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 88936f3189e1..4dd619e6010f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1909,44 +1909,26 @@ static inline u32 slot_bytes(struct nfsd4_channel_attrs *ca)
}
/*
- * XXX: If we run out of reserved DRC memory we could (up to a point)
- * re-negotiate active sessions and reduce their slot usage to make
- * room for new connections. For now we just fail the create session.
+ * When a client connects it gets a max_requests number that would allow
+ * it to use 1/8 of the memory we think can reasonably be used for the DRC.
+ * Later in response to SEQUENCE operations we further limit that when there
+ * are more than 8 concurrent clients.
*/
-static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
+static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
{
u32 slotsize = slot_bytes(ca);
u32 num = ca->maxreqs;
- unsigned long avail, total_avail;
- unsigned int scale_factor;
+ unsigned long avail;
spin_lock(&nfsd_drc_lock);
- if (nfsd_drc_max_mem > nfsd_drc_mem_used)
- total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
- else
- /* We have handed out more space than we chose in
- * set_max_drc() to allow. That isn't really a
- * problem as long as that doesn't make us think we
- * have lots more due to integer overflow.
- */
- total_avail = 0;
- avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
- /*
- * Never use more than a fraction of the remaining memory,
- * unless it's the only way to give this client a slot.
- * The chosen fraction is either 1/8 or 1/number of threads,
- * whichever is smaller. This ensures there are adequate
- * slots to support multiple clients per thread.
- * Give the client one slot even if that would require
- * over-allocation--it is better than failure.
- */
- scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
- avail = clamp_t(unsigned long, avail, slotsize,
- total_avail/scale_factor);
- num = min_t(int, num, avail / slotsize);
- num = max_t(int, num, 1);
- nfsd_drc_mem_used += num * slotsize;
+ avail = min(NFSD_MAX_MEM_PER_SESSION,
+ nfsd_drc_max_mem / 8);
+
+ num = clamp_t(int, num, 1, avail / slotsize);
+
+ nfsd_drc_slotsize_sum += slotsize;
+
spin_unlock(&nfsd_drc_lock);
return num;
@@ -1957,10 +1939,33 @@ static void nfsd4_put_drc_mem(struct nfsd4_channel_attrs *ca)
int slotsize = slot_bytes(ca);
spin_lock(&nfsd_drc_lock);
- nfsd_drc_mem_used -= slotsize * ca->maxreqs;
+ nfsd_drc_slotsize_sum -= slotsize;
spin_unlock(&nfsd_drc_lock);
}
+/*
+ * Report the number of slots that we would like the client to limit
+ * itself to. When the number of clients is large, we start sharing
+ * memory so all clients get the same number of slots.
+ */
+static unsigned int nfsd4_get_drc_slots(struct nfsd4_session *session)
+{
+ u32 slotsize = slot_bytes(&session->se_fchannel);
+ unsigned long avail;
+ unsigned long slotsize_sum = READ_ONCE(nfsd_drc_slotsize_sum);
+
+ if (slotsize_sum < slotsize)
+ slotsize_sum = slotsize;
+
+ /* Find our share of avail mem across all active clients,
+ * then limit to 1/8 of total, and configured max
+ */
+ avail = min3(nfsd_drc_max_mem * slotsize / nfsd_drc_slotsize_sum,
+ nfsd_drc_max_mem / 8,
+ NFSD_MAX_MEM_PER_SESSION);
+ return max3(1UL, avail / slotsize, session->se_fchannel.maxreqs);
+}
+
static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
struct nfsd4_channel_attrs *battrs)
{
@@ -3726,7 +3731,7 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
* Note that we always allow at least one slot, because our
* accounting is soft and provides no guarantees either way.
*/
- ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
+ ca->maxreqs = nfsd4_get_drc_mem(ca);
return nfs_ok;
}
@@ -4220,10 +4225,12 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
slot = session->se_slots[seq->slotid];
dprintk("%s: slotid %d\n", __func__, seq->slotid);
- /* We do not negotiate the number of slots yet, so set the
- * maxslots to the session maxreqs which is used to encode
- * sr_highest_slotid and the sr_target_slot id to maxslots */
- seq->maxslots = session->se_fchannel.maxreqs;
+ /* Negotiate number of slots: set the target, and use the
+ * same for max as long as it doesn't decrease the max
+ * (that isn't allowed).
+ */
+ seq->target_maxslots = nfsd4_get_drc_slots(session);
+ seq->maxslots = max(seq->maxslots, seq->target_maxslots);
trace_nfsd_slot_seqid_sequence(clp, seq, slot);
status = check_slot_seqid(seq->seqid, slot->sl_seqid,
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 42b41d55d4ed..a65812fcdae0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4961,7 +4961,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
if (nfserr != nfs_ok)
return nfserr;
/* sr_target_highest_slotid */
- nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1);
+ nfserr = nfsd4_encode_slotid4(xdr, seq->target_maxslots - 1);
if (nfserr != nfs_ok)
return nfserr;
/* sr_status_flags */
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 369c3b3ce53e..e4c643255dc9 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -90,7 +90,7 @@ extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4;
extern struct mutex nfsd_mutex;
extern spinlock_t nfsd_drc_lock;
extern unsigned long nfsd_drc_max_mem;
-extern unsigned long nfsd_drc_mem_used;
+extern unsigned long nfsd_drc_slotsize_sum;
extern atomic_t nfsd_th_cnt; /* number of available threads */
extern const struct seq_operations nfs_exports_op;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index f5de04a63c6f..b005b2e2e6ad 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -78,7 +78,7 @@ DEFINE_MUTEX(nfsd_mutex);
*/
DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
-unsigned long nfsd_drc_mem_used;
+unsigned long nfsd_drc_slotsize_sum;
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
@@ -532,10 +532,13 @@ void nfsd_reset_versions(struct nfsd_net *nn)
*/
static void set_max_drc(void)
{
+ if (nfsd_drc_max_mem)
+ return;
+
#define NFSD_DRC_SIZE_SHIFT 7
nfsd_drc_max_mem = (nr_free_buffer_pages()
>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
- nfsd_drc_mem_used = 0;
+ nfsd_drc_slotsize_sum = 0;
dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
}
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index ffc217099d19..2b1d619bc00f 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -213,7 +213,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
/* Maximum number of slots per session. 160 is useful for long haul TCP */
#define NFSD_MAX_SLOTS_PER_SESSION 160
/* Maximum session per slot cache size */
-#define NFSD_SLOT_CACHE_SIZE 2048
+#define NFSD_SLOT_CACHE_SIZE 2048UL
/* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
#define NFSD_CACHE_SIZE_SLOTS_PER_SESSION 32
#define NFSD_MAX_MEM_PER_SESSION \
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index fbdd42cde1fa..1c78a09bf63f 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -575,9 +575,7 @@ struct nfsd4_sequence {
u32 slotid; /* request/response */
u32 maxslots; /* request/response */
u32 cachethis; /* request */
-#if 0
u32 target_maxslots; /* response */
-#endif /* not yet */
u32 status_flags; /* response */
};
--
2.44.0
* [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (9 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 10/14] nfs: dynamically adjust per-client DRC slot limits NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 15:52 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 12/14] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
` (4 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
The heuristic for limiting the number of incoming connections to nfsd
currently uses sv_nrthreads - allowing more connections if more threads
were configured.
A future patch will allow the number of threads to grow dynamically so
that there is no need to configure sv_nrthreads. So we need a different
solution for limiting connections.
It isn't clear what problem is solved by limiting connections (as
mentioned in a code comment) but the most likely problem is a connection
storm - many connections that are not doing productive work. These will
be closed after about 6 minutes already but it might help to slow down a
storm.
This patch adds a per-connection flag, XPT_PEER_VALID, which indicates
that the peer has presented a filehandle to which it has some sort of
access, i.e. the peer is known to be trusted in some way. We now only
count connections which have NOT been determined to be valid. There
should be relatively few of these at any given time.
If the number of non-validated peers exceeds a limit - currently 64 -
we close the oldest non-validated peer to avoid having too many of
these useless connections.
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfsd/netns.h | 4 ++--
fs/nfsd/nfsfh.c | 8 ++++++++
include/linux/sunrpc/svc.h | 2 +-
include/linux/sunrpc/svc_xprt.h | 4 ++++
net/sunrpc/svc_xprt.c | 33 +++++++++++++++++----------------
5 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 238fc4e56e53..0d2ac15a5003 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -128,8 +128,8 @@ struct nfsd_net {
unsigned char writeverf[8];
/*
- * Max number of connections this nfsd container will allow. Defaults
- * to '0' which is means that it bases this on the number of threads.
+ * Max number of non-validated connections this nfsd container
+ * will allow. Defaults to '0', which gets mapped to 64.
*/
unsigned int max_connections;
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 0b75305fb5f5..08742bf8de02 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -391,6 +391,14 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
goto out;
skip_pseudoflavor_check:
+ if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags) &&
+ !test_and_set_bit(XPT_PEER_VALID, &rqstp->rq_xprt->xpt_flags)) {
+ struct svc_serv *serv = rqstp->rq_server;
+ spin_lock(&serv->sv_lock);
+ serv->sv_tmpcnt -= 1;
+ spin_unlock(&serv->sv_lock);
+ }
+
/* Finally, check access permissions. */
error = nfsd_permission(rqstp, exp, dentry, access);
out:
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 99e9345d829e..0b414af448e0 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -79,7 +79,7 @@ struct svc_serv {
unsigned int sv_xdrsize; /* XDR buffer size */
struct list_head sv_permsocks; /* all permanent sockets */
struct list_head sv_tempsocks; /* all temporary sockets */
- int sv_tmpcnt; /* count of temporary sockets */
+ int sv_tmpcnt; /* count of temporary "valid" sockets */
struct timer_list sv_temptimer; /* timer for aging temporary sockets */
char * sv_name; /* service name */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 0981e35a9fed..92565133b3b6 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -99,6 +99,10 @@ enum {
XPT_HANDSHAKE, /* xprt requests a handshake */
XPT_TLS_SESSION, /* transport-layer security established */
XPT_PEER_AUTH, /* peer has been authenticated */
+ XPT_PEER_VALID, /* peer has presented a filehandle that
+ * it has access to. It is NOT counted
+ * in ->sv_tmpcnt.
+ */
};
static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 53ebc719ff5a..a9215e1a2f38 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -606,7 +606,8 @@ int svc_port_is_privileged(struct sockaddr *sin)
}
/*
- * Make sure that we don't have too many active connections. If we have,
+ * Make sure that we don't have too many connections that have not yet
+ * demonstrated that they have access to the NFS server. If we have,
* something must be dropped. It's not clear what will happen if we allow
* "too many" connections, but when dealing with network-facing software,
* we have to code defensively. Here we do that by imposing hard limits.
@@ -625,27 +626,26 @@ int svc_port_is_privileged(struct sockaddr *sin)
*/
static void svc_check_conn_limits(struct svc_serv *serv)
{
- unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn :
- (serv->sv_nrthreads+3) * 20;
+ unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn : 64;
if (serv->sv_tmpcnt > limit) {
- struct svc_xprt *xprt = NULL;
+ struct svc_xprt *xprt = NULL, *xprti;
spin_lock_bh(&serv->sv_lock);
if (!list_empty(&serv->sv_tempsocks)) {
- /* Try to help the admin */
- net_notice_ratelimited("%s: too many open connections, consider increasing the %s\n",
- serv->sv_name, serv->sv_maxconn ?
- "max number of connections" :
- "number of threads");
/*
* Always select the oldest connection. It's not fair,
- * but so is life
+ * but nor is life.
*/
- xprt = list_entry(serv->sv_tempsocks.prev,
- struct svc_xprt,
- xpt_list);
- set_bit(XPT_CLOSE, &xprt->xpt_flags);
- svc_xprt_get(xprt);
+ list_for_each_entry_reverse(xprti, &serv->sv_tempsocks,
+ xpt_list)
+ {
+ if (!test_bit(XPT_PEER_VALID, &xprti->xpt_flags)) {
+ xprt = xprti;
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ svc_xprt_get(xprt);
+ break;
+ }
+ }
}
spin_unlock_bh(&serv->sv_lock);
@@ -1039,7 +1039,8 @@ static void svc_delete_xprt(struct svc_xprt *xprt)
spin_lock_bh(&serv->sv_lock);
list_del_init(&xprt->xpt_list);
- if (test_bit(XPT_TEMP, &xprt->xpt_flags))
+ if (test_bit(XPT_TEMP, &xprt->xpt_flags) &&
+ !test_bit(XPT_PEER_VALID, &xprt->xpt_flags))
serv->sv_tmpcnt--;
spin_unlock_bh(&serv->sv_lock);
--
2.44.0
* [PATCH 12/14] sunrpc: introduce possibility that requested number of threads is different from actual
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (10 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 16:00 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 13/14] nfsd: introduce concept of a maximum number of threads NeilBrown
` (3 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
New fields sp_nractual and sv_nractual track how many threads are
actually running. sp_nrthreads and sv_nrthreads will be the number that
were explicitly requested. Currently nractual == nrthreads.
sv_nractual is used for sizing UDP incoming socket space - in the rare
case that UDP is used - because each thread might need to keep a
request in the skbs.
Signed-off-by: NeilBrown <neilb@suse.de>
---
include/linux/sunrpc/svc.h | 4 +++-
net/sunrpc/svc.c | 22 +++++++++++++++-------
net/sunrpc/svcsock.c | 2 +-
3 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 0b414af448e0..363105fc6326 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -36,6 +36,7 @@ struct svc_pool {
unsigned int sp_id; /* pool id; also node id on NUMA */
struct lwq sp_xprts; /* pending transports */
unsigned int sp_nrthreads; /* # of threads in pool */
+ unsigned int sp_nractual; /* # of threads running */
struct list_head sp_all_threads; /* all server threads */
struct llist_head sp_idle_threads; /* idle server threads */
@@ -69,7 +70,8 @@ struct svc_serv {
struct svc_program * sv_program; /* RPC program */
struct svc_stat * sv_stats; /* RPC statistics */
spinlock_t sv_lock;
- unsigned int sv_nrthreads; /* # of server threads */
+ unsigned int sv_nrthreads; /* # of server threads requested*/
+ unsigned int sv_nractual; /* # of running threads */
unsigned int sv_maxconn; /* max connections allowed or
* '0' causing max to be based
* on number of threads. */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index d814b2cfa84f..33c1a7793f63 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -785,8 +785,12 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
}
if (pool && pool->sp_nrthreads) {
- set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
- set_bit(SP_NEED_VICTIM, &pool->sp_flags);
+ if (pool->sp_nrthreads <= pool->sp_nractual) {
+ set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
+ set_bit(SP_NEED_VICTIM, &pool->sp_flags);
+ pool->sp_nractual -= 1;
+ serv->sv_nractual -= 1;
+ }
return pool;
}
return NULL;
@@ -806,6 +810,12 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
chosen_pool = svc_pool_next(serv, pool, &state);
node = svc_pool_map_get_node(chosen_pool->sp_id);
+ serv->sv_nrthreads += 1;
+ chosen_pool->sp_nrthreads += 1;
+
+ if (chosen_pool->sp_nrthreads <= chosen_pool->sp_nractual)
+ continue;
+
rqstp = svc_prepare_thread(serv, chosen_pool, node);
if (IS_ERR(rqstp))
return PTR_ERR(rqstp);
@@ -815,8 +825,8 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
svc_exit_thread(rqstp);
return PTR_ERR(task);
}
- serv->sv_nrthreads += 1;
- chosen_pool->sp_nrthreads += 1;
+ serv->sv_nractual += 1;
+ chosen_pool->sp_nractual += 1;
rqstp->rq_task = task;
if (serv->sv_nrpools > 1)
@@ -846,6 +856,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
TASK_IDLE);
nrservs++;
} while (nrservs < 0);
+ svc_sock_update_bufs(serv);
return 0;
}
@@ -937,13 +948,10 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp)
void
svc_exit_thread(struct svc_rqst *rqstp)
{
- struct svc_serv *serv = rqstp->rq_server;
struct svc_pool *pool = rqstp->rq_pool;
list_del_rcu(&rqstp->rq_all);
- svc_sock_update_bufs(serv);
-
svc_rqst_free(rqstp);
clear_and_wake_up_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 825ec5357691..191dbc648bd0 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -588,7 +588,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
* provides an upper bound on the number of threads
* which will access the socket.
*/
- svc_sock_setbufsize(svsk, serv->sv_nrthreads + 3);
+ svc_sock_setbufsize(svsk, serv->sv_nractual + 3);
clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
--
2.44.0
* [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (11 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 12/14] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 17:06 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 14/14] nfsd: adjust number of running nfsd threads NeilBrown
` (2 subsequent siblings)
15 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
A future patch will allow the number of threads in each nfsd pool to
vary dynamically.
The lower bound will be the number explicitly requested via
/proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads.
The upper bound can be set in each net namespace by writing
/proc/fs/nfsd/max_threads. This upper bound applies across all pools;
there is no per-pool upper limit.
If no upper bound is set, then one is calculated. A global upper limit
is chosen based on the amount of memory. This limit only affects dynamic
changes. Static configuration can always override it.
We track how many threads are configured in each net namespace, using
the max where one is set and the min otherwise. We also track how many
net namespaces have nfsd configured with only a min, not a max.
The difference between the calculated max and the total allocation is
available to be shared among those namespaces which don't have a maximum
configured. Within a namespace, the available share is distributed
equally across all pools.
In the common case there is one namespace and one pool. A small number
of threads are configured as a minimum and no maximum is set. In this
case the effective maximum will be based directly on total memory:
approximately 8 threads per gigabyte.
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfsd/netns.h | 6 +++++
fs/nfsd/nfsctl.c | 45 +++++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 4 ++++
fs/nfsd/nfssvc.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/trace.h | 19 +++++++++++++++
5 files changed, 135 insertions(+)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 0d2ac15a5003..329484696a42 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -133,6 +133,12 @@ struct nfsd_net {
*/
unsigned int max_connections;
+ /*
+ * Maximum number of threads to auto-adjust up to. If 0 then a
+ * share of nfsd_max_threads will be used.
+ */
+ unsigned int max_threads;
+
u32 clientid_base;
u32 clientid_counter;
u32 clverifier_counter;
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d85b6d1fa31f..37e9936567e9 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -48,6 +48,7 @@ enum {
NFSD_Ports,
NFSD_MaxBlkSize,
NFSD_MaxConnections,
+ NFSD_MaxThreads,
NFSD_Filecache,
NFSD_Leasetime,
NFSD_Gracetime,
@@ -68,6 +69,7 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size);
static ssize_t write_ports(struct file *file, char *buf, size_t size);
static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
static ssize_t write_maxconn(struct file *file, char *buf, size_t size);
+static ssize_t write_maxthreads(struct file *file, char *buf, size_t size);
#ifdef CONFIG_NFSD_V4
static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
@@ -87,6 +89,7 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
[NFSD_Ports] = write_ports,
[NFSD_MaxBlkSize] = write_maxblksize,
[NFSD_MaxConnections] = write_maxconn,
+ [NFSD_MaxThreads] = write_maxthreads,
#ifdef CONFIG_NFSD_V4
[NFSD_Leasetime] = write_leasetime,
[NFSD_Gracetime] = write_gracetime,
@@ -939,6 +942,47 @@ static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxconn);
}
+/*
+ * write_maxthreads - Set or report the current max number of threads
+ *
+ * Input:
+ * buf: ignored
+ * size: zero
+ * OR
+ *
+ * Input:
+ * buf: C string containing an unsigned
+ * integer value representing the new
+ * max number of threads
+ * size: non-zero length of C string in @buf
+ * Output:
+ * On success: passed-in buffer filled with '\n'-terminated C string
+ * containing numeric value of max_threads setting
+ * for this net namespace;
+ * return code is the size in bytes of the string
+ * On error: return code is zero or a negative errno value
+ */
+static ssize_t write_maxthreads(struct file *file, char *buf, size_t size)
+{
+ char *mesg = buf;
+ struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
+ unsigned int maxthreads = nn->max_threads;
+
+ if (size > 0) {
+ int rv = get_uint(&mesg, &maxthreads);
+
+ if (rv)
+ return rv;
+ trace_nfsd_ctl_maxthreads(netns(file), maxthreads);
+ mutex_lock(&nfsd_mutex);
+ nn->max_threads = maxthreads;
+ nfsd_update_nets();
+ mutex_unlock(&nfsd_mutex);
+ }
+
+ return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxthreads);
+}
+
#ifdef CONFIG_NFSD_V4
static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
time64_t *time, struct nfsd_net *nn)
@@ -1372,6 +1416,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
[NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO},
+ [NFSD_MaxThreads] = {"max_threads", &transaction_ops, S_IWUSR|S_IRUGO},
[NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
#ifdef CONFIG_NFSD_V4
[NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index e4c643255dc9..6874c2de670b 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -156,6 +156,10 @@ int nfsd_create_serv(struct net *net);
void nfsd_destroy_serv(struct net *net);
extern int nfsd_max_blksize;
+void nfsd_update_nets(void);
+extern unsigned int nfsd_max_threads;
+extern unsigned long nfsd_net_used;
+extern unsigned int nfsd_net_cnt;
static inline int nfsd_v4client(struct svc_rqst *rq)
{
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index b005b2e2e6ad..75d78c17756f 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -80,6 +80,21 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_slotsize_sum;
+/*
+ * nfsd_max_threads is auto-configured based on available ram.
+ * Each network namespace can configure a minimum number of threads
+ * and optionally a maximum.
+ * nfsd_net_used is the sum of the max or min from each net namespace.
+ * nfsd_net_cnt is the number of net namespaces with a configured minimum
+ * but no configured maximum.
+ * When nfsd_max_threads exceeds nfsd_net_used, the difference is divided
+ * by nfsd_net_cnt and this number gives the excess above the configured minimum
+ * for all net namespaces without a configured maximum.
+ */
+unsigned int nfsd_max_threads;
+unsigned long nfsd_net_used;
+unsigned int nfsd_net_cnt;
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
# if defined(CONFIG_NFSD_V2_ACL)
@@ -130,6 +145,47 @@ struct svc_program nfsd_program = {
.pg_rpcbind_set = nfsd_rpcbind_set,
};
+void nfsd_update_nets(void)
+{
+ struct net *net;
+
+ if (nfsd_max_threads == 0) {
+ nfsd_max_threads = (nr_free_buffer_pages() >> 7) /
+ (NFSSVC_MAXBLKSIZE >> PAGE_SHIFT);
+ }
+ nfsd_net_used = 0;
+ nfsd_net_cnt = 0;
+ down_read(&net_rwsem);
+ for_each_net(net) {
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+ if (!nn->nfsd_serv)
+ continue;
+ if (nn->max_threads > 0) {
+ nfsd_net_used += nn->max_threads;
+ } else {
+ nfsd_net_used += nn->nfsd_serv->sv_nrthreads;
+ nfsd_net_cnt += 1;
+ }
+ }
+ up_read(&net_rwsem);
+}
+
+static inline int nfsd_max_pool_threads(struct svc_pool *p, struct nfsd_net *nn)
+{
+ int svthreads = nn->nfsd_serv->sv_nrthreads;
+
+ if (nn->max_threads > 0)
+ return nn->max_threads;
+ if (nfsd_net_cnt == 0 || svthreads == 0)
+ return 0;
+ if (nfsd_max_threads < nfsd_net_cnt)
+ return p->sp_nrthreads;
+ /* Share nfsd_max_threads among all net, then among pools in this net. */
+ return p->sp_nrthreads +
+ nfsd_max_threads / nfsd_net_cnt * p->sp_nrthreads / svthreads;
+}
+
bool nfsd_support_version(int vers)
{
if (vers >= NFSD_MINVERS && vers <= NFSD_MAXVERS)
@@ -474,6 +530,7 @@ void nfsd_destroy_serv(struct net *net)
spin_lock(&nfsd_notifier_lock);
nn->nfsd_serv = NULL;
spin_unlock(&nfsd_notifier_lock);
+ nfsd_update_nets();
/* check if the notifier still has clients */
if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
@@ -614,6 +671,8 @@ int nfsd_create_serv(struct net *net)
nn->nfsd_serv = serv;
spin_unlock(&nfsd_notifier_lock);
+ nfsd_update_nets();
+
set_max_drc();
/* check if the notifier is already set */
if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
@@ -720,6 +779,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
goto out;
}
out:
+ nfsd_update_nets();
return err;
}
@@ -759,6 +819,7 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
error = nfsd_set_nrthreads(n, nthreads, net);
if (error)
goto out_put;
+ nfsd_update_nets();
error = serv->sv_nrthreads;
out_put:
if (serv->sv_nrthreads == 0)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 77bbd23aa150..92b888e178e8 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -2054,6 +2054,25 @@ TRACE_EVENT(nfsd_ctl_maxconn,
)
);
+TRACE_EVENT(nfsd_ctl_maxthreads,
+ TP_PROTO(
+ const struct net *net,
+ int maxthreads
+ ),
+ TP_ARGS(net, maxthreads),
+ TP_STRUCT__entry(
+ __field(unsigned int, netns_ino)
+ __field(int, maxthreads)
+ ),
+ TP_fast_assign(
+ __entry->netns_ino = net->ns.inum;
+ __entry->maxthreads = maxthreads;
+ ),
+ TP_printk("maxthreads=%d",
+ __entry->maxthreads
+ )
+);
+
TRACE_EVENT(nfsd_ctl_time,
TP_PROTO(
const struct net *net,
--
2.44.0
* [PATCH 14/14] nfsd: adjust number of running nfsd threads
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (12 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 13/14] nfsd: introduce concept of a maximum number of threads NeilBrown
@ 2024-07-15 7:14 ` NeilBrown
2024-07-15 17:29 ` [PATCH 00/14 RFC] support automatic changes to nfsd thread count Jeff Layton
2024-07-24 19:43 ` Chuck Lever III
15 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-15 7:14 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
svc_recv() is changed to return a status. This can be:
-ETIMEDOUT - waited for 5 seconds and found nothing to do. This is
boring. Also there are more actual threads than really
needed.
-EBUSY - I did something, but there is more stuff to do and no one
idle who I can wake up to do it.
BTW I successfully set a flag: SP_TASK_STARTING. You'd
better clear it.
0 - just minding my own business, nothing to see here.
nfsd() is changed to pay attention to this status.
In the case of -ETIMEDOUT, if the service mutex can be taken (trylock),
the thread becomes an RQ_VICTIM so that it will exit.
In the case of -EBUSY, if the actual number of threads is below
the calculated maximum, a new thread is started. SP_TASK_STARTING
is cleared.
To support the above, some code is split out of svc_start_kthreads()
into svc_new_thread().
I think we want memory pressure to be able to push a thread into
returning -ETIMEDOUT. That can come later.
There are printk's in here. They can be discarded or turned into trace
points once we are sure about what we want.
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfsd/nfssvc.c | 32 ++++++++++++++++-
fs/nfsd/vfs.c | 1 +
include/linux/sunrpc/svc.h | 2 ++
include/linux/sunrpc/svcsock.h | 2 +-
net/sunrpc/svc.c | 66 +++++++++++++++++++---------------
net/sunrpc/svc_xprt.c | 46 +++++++++++++++++++-----
6 files changed, 110 insertions(+), 39 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 75d78c17756f..1c8a7dcbfc42 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -931,9 +931,11 @@ static int
nfsd(void *vrqstp)
{
struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
+ struct svc_pool *pool = rqstp->rq_pool;
struct svc_xprt *perm_sock = list_entry(rqstp->rq_server->sv_permsocks.next, typeof(struct svc_xprt), xpt_list);
struct net *net = perm_sock->xpt_net;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ bool have_mutex = false;
/* At this point, the thread shares current->fs
* with the init process. We need to create files with the
@@ -954,7 +956,33 @@ nfsd(void *vrqstp)
/* Update sv_maxconn if it has changed */
rqstp->rq_server->sv_maxconn = nn->max_connections;
- svc_recv(rqstp);
+ switch (svc_recv(rqstp)) {
+ case -ETIMEDOUT: /* Nothing to do */
+ if (mutex_trylock(&nfsd_mutex)) {
+ if (pool->sp_nractual > pool->sp_nrthreads) {
+ set_bit(RQ_VICTIM, &rqstp->rq_flags);
+ pool->sp_nractual -= 1;
+ printk("Kill a victim\n");
+ have_mutex = true;
+ } else
+ mutex_unlock(&nfsd_mutex);
+ } else printk("trylock failed\n");
+ break;
+ case -EBUSY: /* Too much to do */
+ if (pool->sp_nractual < nfsd_max_pool_threads(pool, nn) &&
+ mutex_trylock(&nfsd_mutex)) {
+ // check no idle threads?
+ if (pool->sp_nractual < nfsd_max_pool_threads(pool,nn)) {
+ printk("start new thread\n");
+ svc_new_thread(rqstp->rq_server, pool);
+ }
+ mutex_unlock(&nfsd_mutex);
+ }
+ clear_bit(SP_TASK_STARTING, &pool->sp_flags);
+ break;
+ default:
+ break;
+ }
nfsd_file_net_dispose(nn);
}
@@ -963,6 +991,8 @@ nfsd(void *vrqstp)
/* Release the thread */
svc_exit_thread(rqstp);
+ if (have_mutex)
+ mutex_unlock(&nfsd_mutex);
return 0;
}
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 29b1f3613800..92bc7c778411 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1203,6 +1203,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
commit_reset_write_verifier(nn, rqstp, host_err);
goto out_nfserr;
}
+ msleep(40);
*cnt = host_err;
nfsd_stats_io_write_add(nn, exp, *cnt);
fsnotify_modify(file);
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 363105fc6326..6c9d0e42f5d5 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -53,6 +53,7 @@ enum {
SP_TASK_PENDING, /* still work to do even if no xprt is queued */
SP_NEED_VICTIM, /* One thread needs to agree to exit */
SP_VICTIM_REMAINS, /* One thread needs to actually exit */
+ SP_TASK_STARTING, /* Task has started but not added to idle yet */
};
@@ -410,6 +411,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
bool svc_rqst_replace_page(struct svc_rqst *rqstp,
struct page *page);
void svc_rqst_release_pages(struct svc_rqst *rqstp);
+int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool);
void svc_exit_thread(struct svc_rqst *);
struct svc_serv * svc_create_pooled(struct svc_program *prog,
struct svc_stat *stats,
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index bf45d9e8492a..11d43600eabb 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -56,7 +56,7 @@ static inline u32 svc_sock_final_rec(struct svc_sock *svsk)
/*
* Function prototypes.
*/
-void svc_recv(struct svc_rqst *rqstp);
+int svc_recv(struct svc_rqst *rqstp);
void svc_send(struct svc_rqst *rqstp);
int svc_addsock(struct svc_serv *serv, struct net *net,
const int fd, char *name_return, const size_t len,
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 33c1a7793f63..26b6e73fc0de 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -796,19 +796,46 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
return NULL;
}
-static int
-svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
+int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool)
{
struct svc_rqst *rqstp;
struct task_struct *task;
- struct svc_pool *chosen_pool;
- unsigned int state = serv->sv_nrthreads-1;
int node;
- do {
- nrservs--;
- chosen_pool = svc_pool_next(serv, pool, &state);
- node = svc_pool_map_get_node(chosen_pool->sp_id);
+ node = svc_pool_map_get_node(pool->sp_id);
+
+ rqstp = svc_prepare_thread(serv, pool, node);
+ if (IS_ERR(rqstp))
+ return PTR_ERR(rqstp);
+ set_bit(SP_TASK_STARTING, &pool->sp_flags);
+ task = kthread_create_on_node(serv->sv_threadfn, rqstp,
+ node, "%s", serv->sv_name);
+ if (IS_ERR(task)) {
+ clear_bit(SP_TASK_STARTING, &pool->sp_flags);
+ svc_exit_thread(rqstp);
+ return PTR_ERR(task);
+ }
+ serv->sv_nractual += 1;
+ pool->sp_nractual += 1;
+
+ rqstp->rq_task = task;
+ if (serv->sv_nrpools > 1)
+ svc_pool_map_set_cpumask(task, pool->sp_id);
+
+ svc_sock_update_bufs(serv);
+ wake_up_process(task);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(svc_new_thread);
+
+static int
+svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
+{
+ unsigned int state = serv->sv_nrthreads-1;
+ int err = 0;
+
+ while (!err && nrservs--) {
+ struct svc_pool *chosen_pool = svc_pool_next(serv, pool, &state);
serv->sv_nrthreads += 1;
chosen_pool->sp_nrthreads += 1;
@@ -816,27 +843,10 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
if (chosen_pool->sp_nrthreads <= chosen_pool->sp_nractual)
continue;
- rqstp = svc_prepare_thread(serv, chosen_pool, node);
- if (IS_ERR(rqstp))
- return PTR_ERR(rqstp);
- task = kthread_create_on_node(serv->sv_threadfn, rqstp,
- node, "%s", serv->sv_name);
- if (IS_ERR(task)) {
- svc_exit_thread(rqstp);
- return PTR_ERR(task);
- }
- serv->sv_nractual += 1;
- chosen_pool->sp_nractual += 1;
-
- rqstp->rq_task = task;
- if (serv->sv_nrpools > 1)
- svc_pool_map_set_cpumask(task, chosen_pool->sp_id);
-
- svc_sock_update_bufs(serv);
- wake_up_process(task);
- } while (nrservs > 0);
+ err = svc_new_thread(serv, chosen_pool);
+ }
- return 0;
+ return err;
}
static int
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index a9215e1a2f38..b382bc690670 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -729,15 +729,19 @@ svc_thread_should_sleep(struct svc_rqst *rqstp)
return true;
}
-static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
+static bool svc_thread_wait_for_work(struct svc_rqst *rqstp)
{
struct svc_pool *pool = rqstp->rq_pool;
+ bool did_wait = false;
if (svc_thread_should_sleep(rqstp)) {
set_current_state(TASK_IDLE | TASK_FREEZABLE);
llist_add(&rqstp->rq_idle, &pool->sp_idle_threads);
- if (likely(svc_thread_should_sleep(rqstp)))
- schedule();
+ clear_bit(SP_TASK_STARTING, &pool->sp_flags);
+ if (likely(svc_thread_should_sleep(rqstp))) {
+ schedule_timeout(5*HZ);
+ did_wait = true;
+ }
while (!llist_del_first_this(&pool->sp_idle_threads,
&rqstp->rq_idle)) {
@@ -749,7 +753,12 @@ static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
* for this new work. This thread can safely sleep
* until woken again.
*/
- schedule();
+ if (did_wait) {
+ schedule_timeout(HZ);
+ } else {
+ schedule_timeout(5*HZ);
+ did_wait = true;
+ }
set_current_state(TASK_IDLE | TASK_FREEZABLE);
}
__set_current_state(TASK_RUNNING);
@@ -757,6 +766,7 @@ static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
cond_resched();
}
try_to_freeze();
+ return did_wait;
}
static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt)
@@ -840,6 +850,8 @@ static void svc_handle_xprt(struct svc_rqst *rqstp, struct svc_xprt *xprt)
static void svc_thread_wake_next(struct svc_rqst *rqstp)
{
+ clear_bit(SP_TASK_STARTING, &rqstp->rq_pool->sp_flags);
+
if (!svc_thread_should_sleep(rqstp))
/* More work pending after I dequeued some,
* wake another worker
@@ -854,21 +866,31 @@ static void svc_thread_wake_next(struct svc_rqst *rqstp)
* This code is carefully organised not to touch any cachelines in
* the shared svc_serv structure, only cachelines in the local
* svc_pool.
+ *
+ * Returns -ETIMEDOUT if idle for an extended period
+ * -EBUSY if there is more work to do than available threads
+ * 0 otherwise.
*/
-void svc_recv(struct svc_rqst *rqstp)
+int svc_recv(struct svc_rqst *rqstp)
{
struct svc_pool *pool = rqstp->rq_pool;
+ bool did_wait;
+ int ret = 0;
if (!svc_alloc_arg(rqstp))
- return;
+ return ret;
- svc_thread_wait_for_work(rqstp);
+ did_wait = svc_thread_wait_for_work(rqstp);
+
+ if (did_wait && svc_thread_should_sleep(rqstp) &&
+ pool->sp_nractual > pool->sp_nrthreads)
+ ret = -ETIMEDOUT;
clear_bit(SP_TASK_PENDING, &pool->sp_flags);
if (svc_thread_should_stop(rqstp)) {
svc_thread_wake_next(rqstp);
- return;
+ return ret;
}
rqstp->rq_xprt = svc_xprt_dequeue(pool);
@@ -882,8 +904,13 @@ void svc_recv(struct svc_rqst *rqstp)
*/
if (pool->sp_idle_threads.first)
rqstp->rq_chandle.thread_wait = 5 * HZ;
- else
+ else {
rqstp->rq_chandle.thread_wait = 1 * HZ;
+ if (!did_wait &&
+ !test_and_set_bit(SP_TASK_STARTING,
+ &pool->sp_flags))
+ ret = -EBUSY;
+ }
trace_svc_xprt_dequeue(rqstp);
svc_handle_xprt(rqstp, xprt);
@@ -902,6 +929,7 @@ void svc_recv(struct svc_rqst *rqstp)
}
}
#endif
+ return ret;
}
EXPORT_SYMBOL_GPL(svc_recv);
--
2.44.0
* Re: [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int.
2024-07-15 7:14 ` [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int NeilBrown
@ 2024-07-15 14:12 ` Jeff Layton
2024-07-15 14:33 ` Jeff Layton
2024-07-16 1:33 ` NeilBrown
0 siblings, 2 replies; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 14:12 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> sp_nrthreads is only ever accessed under the service mutex
> nlmsvc_mutex nfs_callback_mutex nfsd_mutex
> so these is no need for it to be an atomic_t.
>
> The fact that all code using it is single-threaded means that we can
> simplify svc_pool_victim and remove the temporary elevation of
> sp_nrthreads.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/nfsctl.c | 2 +-
> fs/nfsd/nfssvc.c | 2 +-
> include/linux/sunrpc/svc.h | 4 ++--
> net/sunrpc/svc.c | 31 +++++++++++--------------------
> 4 files changed, 15 insertions(+), 24 deletions(-)
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 5b0f2e0d7ccf..d85b6d1fa31f 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1769,7 +1769,7 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
> struct svc_pool *sp = &nn->nfsd_serv->sv_pools[i];
>
> err = nla_put_u32(skb, NFSD_A_SERVER_THREADS,
> - atomic_read(&sp->sp_nrthreads));
> + sp->sp_nrthreads);
> if (err)
> goto err_unlock;
> }
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 4438cdcd4873..7377422a34df 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -641,7 +641,7 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net)
>
> if (serv)
> for (i = 0; i < serv->sv_nrpools && i < n; i++)
> - nthreads[i] = atomic_read(&serv->sv_pools[i].sp_nrthreads);
> + nthreads[i] = serv->sv_pools[i].sp_nrthreads;
> return 0;
> }
>
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index e4fa25fafa97..99e9345d829e 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -33,9 +33,9 @@
> * node traffic on multi-node NUMA NFS servers.
> */
> struct svc_pool {
> - unsigned int sp_id; /* pool id; also node id on NUMA */
> + unsigned int sp_id; /* pool id; also node id on NUMA */
> struct lwq sp_xprts; /* pending transports */
> - atomic_t sp_nrthreads; /* # of threads in pool */
> + unsigned int sp_nrthreads; /* # of threads in pool */
> struct list_head sp_all_threads; /* all server threads */
> struct llist_head sp_idle_threads; /* idle server threads */
>
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 072ad115ae3d..0d8588bc693c 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -725,7 +725,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
> serv->sv_nrthreads += 1;
> spin_unlock_bh(&serv->sv_lock);
>
> - atomic_inc(&pool->sp_nrthreads);
> + pool->sp_nrthreads += 1;
>
> /* Protected by whatever lock the service uses when calling
> * svc_set_num_threads()
> @@ -780,31 +780,22 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
> struct svc_pool *pool;
> unsigned int i;
>
> -retry:
> pool = target_pool;
>
> - if (pool != NULL) {
> - if (atomic_inc_not_zero(&pool->sp_nrthreads))
> - goto found_pool;
> - return NULL;
> - } else {
> + if (!pool) {
> for (i = 0; i < serv->sv_nrpools; i++) {
> pool = &serv->sv_pools[--(*state) % serv->sv_nrpools];
> - if (atomic_inc_not_zero(&pool->sp_nrthreads))
> - goto found_pool;
> + if (pool->sp_nrthreads)
> + break;
> }
> - return NULL;
> }
>
> -found_pool:
> - set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> - set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> - if (!atomic_dec_and_test(&pool->sp_nrthreads))
> + if (pool && pool->sp_nrthreads) {
> + set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> + set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> return pool;
> - /* Nothing left in this pool any more */
> - clear_bit(SP_NEED_VICTIM, &pool->sp_flags);
> - clear_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> - goto retry;
> + }
> + return NULL;
> }
>
> static int
> @@ -883,7 +874,7 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> if (!pool)
> nrservs -= serv->sv_nrthreads;
> else
> - nrservs -= atomic_read(&pool->sp_nrthreads);
> + nrservs -= pool->sp_nrthreads;
>
> if (nrservs > 0)
> return svc_start_kthreads(serv, pool, nrservs);
> @@ -953,7 +944,7 @@ svc_exit_thread(struct svc_rqst *rqstp)
>
> list_del_rcu(&rqstp->rq_all);
>
> - atomic_dec(&pool->sp_nrthreads);
> + pool->sp_nrthreads -= 1;
>
> spin_lock_bh(&serv->sv_lock);
> serv->sv_nrthreads -= 1;
I don't think svc_exit_thread is called with the nfsd_mutex held, so if
several threads were exiting at the same time, they could race here.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int.
2024-07-15 14:12 ` Jeff Layton
@ 2024-07-15 14:33 ` Jeff Layton
2024-07-16 1:33 ` NeilBrown
1 sibling, 0 replies; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 14:33 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 10:12 -0400, Jeff Layton wrote:
> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> > sp_nrthreads is only ever accessed under the service mutex
> > nlmsvc_mutex nfs_callback_mutex nfsd_mutex
> > so these is no need for it to be an atomic_t.
> >
> > The fact that all code using it is single-threaded means that we
> > can
> > simplify svc_pool_victim and remove the temporary elevation of
> > sp_nrthreads.
> >
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > fs/nfsd/nfsctl.c | 2 +-
> > fs/nfsd/nfssvc.c | 2 +-
> > include/linux/sunrpc/svc.h | 4 ++--
> > net/sunrpc/svc.c | 31 +++++++++++--------------------
> > 4 files changed, 15 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index 5b0f2e0d7ccf..d85b6d1fa31f 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -1769,7 +1769,7 @@ int nfsd_nl_threads_get_doit(struct sk_buff
> > *skb, struct genl_info *info)
> > struct svc_pool *sp = &nn->nfsd_serv-
> > >sv_pools[i];
> >
> > err = nla_put_u32(skb,
> > NFSD_A_SERVER_THREADS,
> > - atomic_read(&sp-
> > >sp_nrthreads));
> > + sp->sp_nrthreads);
> > if (err)
> > goto err_unlock;
> > }
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 4438cdcd4873..7377422a34df 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -641,7 +641,7 @@ int nfsd_get_nrthreads(int n, int *nthreads,
> > struct net *net)
> >
> > if (serv)
> > for (i = 0; i < serv->sv_nrpools && i < n; i++)
> > - nthreads[i] = atomic_read(&serv-
> > >sv_pools[i].sp_nrthreads);
> > + nthreads[i] = serv-
> > >sv_pools[i].sp_nrthreads;
> > return 0;
> > }
> >
> > diff --git a/include/linux/sunrpc/svc.h
> > b/include/linux/sunrpc/svc.h
> > index e4fa25fafa97..99e9345d829e 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -33,9 +33,9 @@
> > * node traffic on multi-node NUMA NFS servers.
> > */
> > struct svc_pool {
> > - unsigned int sp_id; /* pool id; also
> > node id on NUMA */
> > + unsigned int sp_id; /* pool id; also
> > node id on NUMA */
> > struct lwq sp_xprts; /* pending
> > transports */
> > - atomic_t sp_nrthreads; /* # of threads in
> > pool */
> > + unsigned int sp_nrthreads; /* # of threads in
> > pool */
> > struct list_head sp_all_threads; /* all
> > server threads */
> > struct llist_head sp_idle_threads; /* idle server
> > threads */
> >
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 072ad115ae3d..0d8588bc693c 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -725,7 +725,7 @@ svc_prepare_thread(struct svc_serv *serv,
> > struct svc_pool *pool, int node)
> > serv->sv_nrthreads += 1;
> > spin_unlock_bh(&serv->sv_lock);
> >
> > - atomic_inc(&pool->sp_nrthreads);
> > + pool->sp_nrthreads += 1;
> >
> > /* Protected by whatever lock the service uses when
> > calling
> > * svc_set_num_threads()
> > @@ -780,31 +780,22 @@ svc_pool_victim(struct svc_serv *serv, struct
> > svc_pool *target_pool,
> > struct svc_pool *pool;
> > unsigned int i;
> >
> > -retry:
> > pool = target_pool;
> >
> > - if (pool != NULL) {
> > - if (atomic_inc_not_zero(&pool->sp_nrthreads))
> > - goto found_pool;
> > - return NULL;
> > - } else {
> > + if (!pool) {
> > for (i = 0; i < serv->sv_nrpools; i++) {
> > pool = &serv->sv_pools[--(*state) % serv-
> > >sv_nrpools];
> > - if (atomic_inc_not_zero(&pool-
> > >sp_nrthreads))
> > - goto found_pool;
> > + if (pool->sp_nrthreads)
> > + break;
> > }
> > - return NULL;
> > }
> >
> > -found_pool:
> > - set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > - set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > - if (!atomic_dec_and_test(&pool->sp_nrthreads))
> > + if (pool && pool->sp_nrthreads) {
> > + set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > + set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > return pool;
> > - /* Nothing left in this pool any more */
> > - clear_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > - clear_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > - goto retry;
> > + }
> > + return NULL;
> > }
> >
> > static int
> > @@ -883,7 +874,7 @@ svc_set_num_threads(struct svc_serv *serv,
> > struct svc_pool *pool, int nrservs)
> > if (!pool)
> > nrservs -= serv->sv_nrthreads;
> > else
> > - nrservs -= atomic_read(&pool->sp_nrthreads);
> > + nrservs -= pool->sp_nrthreads;
> >
> > if (nrservs > 0)
> > return svc_start_kthreads(serv, pool, nrservs);
> > @@ -953,7 +944,7 @@ svc_exit_thread(struct svc_rqst *rqstp)
> >
> > list_del_rcu(&rqstp->rq_all);
> >
> > - atomic_dec(&pool->sp_nrthreads);
> > + pool->sp_nrthreads -= 1;
> >
> > spin_lock_bh(&serv->sv_lock);
> > serv->sv_nrthreads -= 1;
>
> I don't think svc_exit_thread is called with the nfsd_mutex held, so
> if
> several threads were exiting at the same time, they could race here.
>
Ok, the changelog on #7 might point out why I'm wrong here.
nfsd() calls svc_exit_thread when exiting, but I missed that that would
imply that svc_stop_kthreads() is running in another task (and that the
nfsd_mutex would actually be held). It also looks like they do have to
be torn down serially as well, so there should be no race there after
all.
Either way, could I trouble you to add a comment about this above
svc_exit_thread? That's a really subtle interaction and it would be
good to document it.
Thanks,
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 07/14] Change unshare_fs_struct() to never fail.
2024-07-15 7:14 ` [PATCH 07/14] Change unshare_fs_struct() to never fail NeilBrown
@ 2024-07-15 14:39 ` Jeff Layton
2024-07-16 1:48 ` NeilBrown
0 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 14:39 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> nfsd threads need to not share the init fs_struct as they need to
> manipulate umask independently. So they call unshare_fs_struct() and
> are the only user of that function.
>
> In the unlikely event that unshare_fs_struct() fails, the thread will
> exit calling svc_exit_thread() BEFORE svc_thread_should_stop() reports
> 'true'.
>
> This is a problem because svc_exit_thread() assumes that
> svc_stop_threads() is running and consequently (in the nfsd case)
> nfsd_mutex is held. This ensures that the list_del_rcu() call in
> svc_exit_thread() cannot race with any other manipulation of
> ->sp_all_threads.
>
> While it would be possible to add some other exclusion, doing so would
> introduce unnecessary complexity. unshare_fs_struct() does not fail in
> practice. So the simplest solution is to make this explicit. i.e. use
> __GFP_NOFAIL which is safe on such a small allocation - about 64 bytes.
>
I know some folks are trying hard to get rid of (or minimize the use
of) __GFP_NOFAIL. This might not be a long term solution.
> Change unshare_fs_struct() to not return any error, and remove the error
> handling from nfsd().
>
> An alternate approach would be to create a variant of
> kthread_create_on_node() which didn't set CLONE_FS.
>
This sounds like it might be the better approach. I guess you could
just add a set of CLONE_* flags to struct kthread_create_info and fix
up the callers to set that appropriately?
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/fs_struct.c | 42 ++++++++++++++++++++-------------------
> fs/nfsd/nfssvc.c | 9 +++------
> include/linux/fs_struct.h | 2 +-
> 3 files changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/fs/fs_struct.c b/fs/fs_struct.c
> index 64c2d0814ed6..49fba862e408 100644
> --- a/fs/fs_struct.c
> +++ b/fs/fs_struct.c
> @@ -109,35 +109,39 @@ void exit_fs(struct task_struct *tsk)
> }
> }
>
> +static void init_fs_struct(struct fs_struct *fs, struct fs_struct *old)
> +{
> + fs->users = 1;
> + fs->in_exec = 0;
> + spin_lock_init(&fs->lock);
> + seqcount_spinlock_init(&fs->seq, &fs->lock);
> + fs->umask = old->umask;
> +
> + spin_lock(&old->lock);
> + fs->root = old->root;
> + path_get(&fs->root);
> + fs->pwd = old->pwd;
> + path_get(&fs->pwd);
> + spin_unlock(&old->lock);
> +}
> +
> struct fs_struct *copy_fs_struct(struct fs_struct *old)
> {
> struct fs_struct *fs = kmem_cache_alloc(fs_cachep, GFP_KERNEL);
> /* We don't need to lock fs - think why ;-) */
> - if (fs) {
> - fs->users = 1;
> - fs->in_exec = 0;
> - spin_lock_init(&fs->lock);
> - seqcount_spinlock_init(&fs->seq, &fs->lock);
> - fs->umask = old->umask;
> -
> - spin_lock(&old->lock);
> - fs->root = old->root;
> - path_get(&fs->root);
> - fs->pwd = old->pwd;
> - path_get(&fs->pwd);
> - spin_unlock(&old->lock);
> - }
> + if (fs)
> + init_fs_struct(fs, old);
> return fs;
> }
>
> -int unshare_fs_struct(void)
> +void unshare_fs_struct(void)
> {
> struct fs_struct *fs = current->fs;
> - struct fs_struct *new_fs = copy_fs_struct(fs);
> + struct fs_struct *new_fs = kmem_cache_alloc(fs_cachep,
> + GFP_KERNEL| __GFP_NOFAIL);
> int kill;
>
> - if (!new_fs)
> - return -ENOMEM;
> + init_fs_struct(new_fs, fs);
>
> task_lock(current);
> spin_lock(&fs->lock);
> @@ -148,8 +152,6 @@ int unshare_fs_struct(void)
>
> if (kill)
> free_fs_struct(fs);
> -
> - return 0;
> }
> EXPORT_SYMBOL_GPL(unshare_fs_struct);
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 7377422a34df..f5de04a63c6f 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -873,11 +873,9 @@ nfsd(void *vrqstp)
>
> /* At this point, the thread shares current->fs
> * with the init process. We need to create files with the
> - * umask as defined by the client instead of init's umask. */
> - if (unshare_fs_struct() < 0) {
> - printk("Unable to start nfsd thread: out of memory\n");
> - goto out;
> - }
> + * umask as defined by the client instead of init's umask.
> + */
> + unshare_fs_struct();
>
> current->fs->umask = 0;
>
> @@ -899,7 +897,6 @@ nfsd(void *vrqstp)
>
> atomic_dec(&nfsd_th_cnt);
>
> -out:
> /* Release the thread */
> svc_exit_thread(rqstp);
> return 0;
> diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
> index 783b48dedb72..8282e6c7ff29 100644
> --- a/include/linux/fs_struct.h
> +++ b/include/linux/fs_struct.h
> @@ -22,7 +22,7 @@ extern void set_fs_root(struct fs_struct *, const struct path *);
> extern void set_fs_pwd(struct fs_struct *, const struct path *);
> extern struct fs_struct *copy_fs_struct(struct fs_struct *);
> extern void free_fs_struct(struct fs_struct *);
> -extern int unshare_fs_struct(void);
> +extern void unshare_fs_struct(void);
>
> static inline void get_fs_root(struct fs_struct *fs, struct path *root)
> {
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 09/14] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients.
2024-07-15 7:14 ` [PATCH 09/14] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
@ 2024-07-15 15:21 ` Jeff Layton
0 siblings, 0 replies; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 15:21 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> If there are more non-courteous clients than the calculated limit, we
> should fail the request rather than report a soft failure and
> encourage the client to retry indefinitely.
>
> If there are courteous clients which push us over the limit, then expedite
> their removal.
>
> This is not known to have caused a problem in production use, but
> testing with lots of clients reports repeated NFS4ERR_DELAY responses,
> which doesn't seem helpful.
>
> Also remove an outdated comment - we do use a slab cache.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/nfs4state.c | 23 +++++++++++++----------
> 1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index a20c2c9d7d45..88936f3189e1 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2212,21 +2212,20 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
> return 1;
> }
>
> -/*
> - * XXX Should we use a slab cache ?
> - * This type of memory management is somewhat inefficient, but we use it
> - * anyway since SETCLIENTID is not a common operation.
> - */
> static struct nfs4_client *alloc_client(struct xdr_netobj name,
> struct nfsd_net *nn)
> {
> struct nfs4_client *clp;
> int i;
>
> - if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
> + if (atomic_read(&nn->nfs4_client_count) -
> + atomic_read(&nn->nfsd_courtesy_clients) >= nn->nfs4_max_clients)
> + return ERR_PTR(-EREMOTEIO);
> +
nit: I know it gets remapped, but why EREMOTEIO? From nfsd's standpoint
this would seem to imply a problem on the client. Maybe:
#define EUSERS 87 /* Too many users */
...instead?
> + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
> + atomic_read(&nn->nfsd_courtesy_clients) > 0)
> mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> - return NULL;
> - }
> +
> clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
> if (clp == NULL)
> return NULL;
> @@ -3115,8 +3114,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
> struct dentry *dentries[ARRAY_SIZE(client_files)];
>
> clp = alloc_client(name, nn);
> - if (clp == NULL)
> - return NULL;
> + if (IS_ERR_OR_NULL(clp))
> + return clp;
>
> ret = copy_cred(&clp->cl_cred, &rqstp->rq_cred);
> if (ret) {
> @@ -3498,6 +3497,8 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> new = create_client(exid->clname, rqstp, &verf);
> if (new == NULL)
> return nfserr_jukebox;
> + if (IS_ERR(new))
> + return nfserr_resource;
> status = copy_impl_id(new, exid);
> if (status)
> goto out_nolock;
> @@ -4416,6 +4417,8 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> new = create_client(clname, rqstp, &clverifier);
> if (new == NULL)
> return nfserr_jukebox;
> + if (IS_ERR(new))
> + return nfserr_resource;
> spin_lock(&nn->client_lock);
> conf = find_confirmed_client_by_name(&clname, nn);
> if (conf && client_has_state(conf)) {
Patch looks fine otherwise though.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations.
2024-07-15 7:14 ` [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
@ 2024-07-15 15:52 ` Jeff Layton
2024-07-16 2:04 ` NeilBrown
0 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 15:52 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> The heuristic for limiting the number of incoming connections to nfsd
> currently uses sv_nrthreads - allowing more connections if more threads
> were configured.
>
> A future patch will allow number of threads to grow dynamically so that
> there is no need to configure sv_nrthreads. So we need a different
> solution for limiting connections.
>
> It isn't clear what problem is solved by limiting connections (as
> mentioned in a code comment) but the most likely problem is a connection
> storm - many connections that are not doing productive work. These will
> be closed after about 6 minutes already but it might help to slow down a
> storm.
>
> This patch adds a per-connection flag XPT_PEER_VALID which indicates
> that the peer has presented a filehandle for which it has some sort of
> access, i.e. the peer is known to be trusted in some way. We now only
> count connections which have NOT been determined to be valid. There
> should be relatively few of these at any given time.
>
> If the number of non-validated peers exceeds the limit - currently 64 -
> we close the oldest non-validated peer to avoid having too many of
> these useless connections.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/netns.h | 4 ++--
> fs/nfsd/nfsfh.c | 8 ++++++++
> include/linux/sunrpc/svc.h | 2 +-
> include/linux/sunrpc/svc_xprt.h | 4 ++++
> net/sunrpc/svc_xprt.c | 33 +++++++++++++++++----------------
> 5 files changed, 32 insertions(+), 19 deletions(-)
>
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 238fc4e56e53..0d2ac15a5003 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -128,8 +128,8 @@ struct nfsd_net {
> unsigned char writeverf[8];
>
> /*
> - * Max number of connections this nfsd container will allow. Defaults
> - * to '0' which is means that it bases this on the number of threads.
> + * Max number of non-validated connections this nfsd container
> > + * will allow. Defaults to '0', which gets mapped to 64.
> */
> unsigned int max_connections;
>
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 0b75305fb5f5..08742bf8de02 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -391,6 +391,14 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> goto out;
>
> skip_pseudoflavor_check:
> + if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags) &&
> + !test_and_set_bit(XPT_PEER_VALID, &rqstp->rq_xprt->xpt_flags)) {
> + struct svc_serv *serv = rqstp->rq_server;
> + spin_lock(&serv->sv_lock);
> + serv->sv_tmpcnt -= 1;
> + spin_unlock(&serv->sv_lock);
> + }
> +
This is the only place you set XPT_PEER_VALID, but this change affects
more services than just nfsd. What about lockd? Do we need a similar
change there?
> /* Finally, check access permissions. */
> error = nfsd_permission(rqstp, exp, dentry, access);
> out:
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 99e9345d829e..0b414af448e0 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -79,7 +79,7 @@ struct svc_serv {
> unsigned int sv_xdrsize; /* XDR buffer size */
> struct list_head sv_permsocks; /* all permanent sockets */
> struct list_head sv_tempsocks; /* all temporary sockets */
> - int sv_tmpcnt; /* count of temporary sockets */
> + int sv_tmpcnt; /* count of temporary "valid" sockets */
> struct timer_list sv_temptimer; /* timer for aging temporary sockets */
>
> char * sv_name; /* service name */
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index 0981e35a9fed..92565133b3b6 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -99,6 +99,10 @@ enum {
> XPT_HANDSHAKE, /* xprt requests a handshake */
> XPT_TLS_SESSION, /* transport-layer security established */
> XPT_PEER_AUTH, /* peer has been authenticated */
> + XPT_PEER_VALID, /* peer has presented a filehandle that
> + * it has access to. It is NOT counted
> + * in ->sv_tmpcnt.
> + */
> };
>
> static inline void unregister_xpt_user(struct svc_xprt *xpt, struct svc_xpt_user *u)
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 53ebc719ff5a..a9215e1a2f38 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -606,7 +606,8 @@ int svc_port_is_privileged(struct sockaddr *sin)
> }
>
> /*
> - * Make sure that we don't have too many active connections. If we have,
> + * Make sure that we don't have too many connections that have not yet
> + * demonstrated that they have access to the NFS server. If we have,
> * something must be dropped. It's not clear what will happen if we allow
> * "too many" connections, but when dealing with network-facing software,
> * we have to code defensively. Here we do that by imposing hard limits.
> @@ -625,27 +626,26 @@ int svc_port_is_privileged(struct sockaddr *sin)
> */
> static void svc_check_conn_limits(struct svc_serv *serv)
> {
> - unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn :
> - (serv->sv_nrthreads+3) * 20;
> + unsigned int limit = serv->sv_maxconn ? serv->sv_maxconn : 64;
>
> if (serv->sv_tmpcnt > limit) {
> - struct svc_xprt *xprt = NULL;
> + struct svc_xprt *xprt = NULL, *xprti;
> spin_lock_bh(&serv->sv_lock);
> if (!list_empty(&serv->sv_tempsocks)) {
> - /* Try to help the admin */
> - net_notice_ratelimited("%s: too many open connections, consider increasing the %s\n",
> - serv->sv_name, serv->sv_maxconn ?
> - "max number of connections" :
> - "number of threads");
> /*
> * Always select the oldest connection. It's not fair,
> - * but so is life
> + * but nor is life.
> */
> - xprt = list_entry(serv->sv_tempsocks.prev,
> - struct svc_xprt,
> - xpt_list);
> - set_bit(XPT_CLOSE, &xprt->xpt_flags);
> - svc_xprt_get(xprt);
> + list_for_each_entry_reverse(xprti, &serv->sv_tempsocks,
> + xpt_list)
> + {
> + if (!test_bit(XPT_PEER_VALID, &xprti->xpt_flags)) {
> + xprt = xprti;
> + set_bit(XPT_CLOSE, &xprt->xpt_flags);
> + svc_xprt_get(xprt);
> + break;
> + }
> + }
> }
> spin_unlock_bh(&serv->sv_lock);
>
> @@ -1039,7 +1039,8 @@ static void svc_delete_xprt(struct svc_xprt *xprt)
>
> spin_lock_bh(&serv->sv_lock);
> list_del_init(&xprt->xpt_list);
> - if (test_bit(XPT_TEMP, &xprt->xpt_flags))
> + if (test_bit(XPT_TEMP, &xprt->xpt_flags) &&
> + !test_bit(XPT_PEER_VALID, &xprt->xpt_flags))
> serv->sv_tmpcnt--;
> spin_unlock_bh(&serv->sv_lock);
>
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 12/14] sunrpc: introduce possibility that requested number of threads is different from actual
2024-07-15 7:14 ` [PATCH 12/14] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
@ 2024-07-15 16:00 ` Jeff Layton
0 siblings, 0 replies; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 16:00 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> New fields sp_nractual and sv_nractual track how many actual threads are
> running. sp_nrthreads and sv_nrthreads will be the number that was
> explicitly requested. Currently nractual == nrthreads.
>
> sv_nractual is used for sizing UDP incoming socket space - in the rare
> case that UDP is used. This is because each thread might need to keep a
> request in the skbs.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> include/linux/sunrpc/svc.h | 4 +++-
> net/sunrpc/svc.c | 22 +++++++++++++++-------
> net/sunrpc/svcsock.c | 2 +-
> 3 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 0b414af448e0..363105fc6326 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -36,6 +36,7 @@ struct svc_pool {
> unsigned int sp_id; /* pool id; also node id on NUMA */
> struct lwq sp_xprts; /* pending transports */
> unsigned int sp_nrthreads; /* # of threads in pool */
> + unsigned int sp_nractual; /* # of threads running */
> struct list_head sp_all_threads; /* all server threads */
> struct llist_head sp_idle_threads; /* idle server threads */
>
> @@ -69,7 +70,8 @@ struct svc_serv {
> struct svc_program * sv_program; /* RPC program */
> struct svc_stat * sv_stats; /* RPC statistics */
> spinlock_t sv_lock;
> - unsigned int sv_nrthreads; /* # of server threads */
> + unsigned int sv_nrthreads; /* # of server threads requested*/
> + unsigned int sv_nractual; /* # of running threads */
> unsigned int sv_maxconn; /* max connections allowed or
> * '0' causing max to be based
> * on number of threads. */
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index d814b2cfa84f..33c1a7793f63 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -785,8 +785,12 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
> }
>
> if (pool && pool->sp_nrthreads) {
> - set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> - set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> + if (pool->sp_nrthreads <= pool->sp_nractual) {
> + set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> + set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> + pool->sp_nractual -= 1;
> + serv->sv_nractual -= 1;
> + }
> return pool;
> }
> return NULL;
> @@ -806,6 +810,12 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> chosen_pool = svc_pool_next(serv, pool, &state);
> node = svc_pool_map_get_node(chosen_pool->sp_id);
>
> + serv->sv_nrthreads += 1;
> + chosen_pool->sp_nrthreads += 1;
> +
> + if (chosen_pool->sp_nrthreads <= chosen_pool->sp_nractual)
> + continue;
> +
> rqstp = svc_prepare_thread(serv, chosen_pool, node);
> if (IS_ERR(rqstp))
> return PTR_ERR(rqstp);
> @@ -815,8 +825,8 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> svc_exit_thread(rqstp);
> return PTR_ERR(task);
> }
> - serv->sv_nrthreads += 1;
> - chosen_pool->sp_nrthreads += 1;
> + serv->sv_nractual += 1;
> + chosen_pool->sp_nractual += 1;
>
> rqstp->rq_task = task;
> if (serv->sv_nrpools > 1)
> @@ -846,6 +856,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> TASK_IDLE);
> nrservs++;
> } while (nrservs < 0);
> + svc_sock_update_bufs(serv);
> return 0;
> }
>
> @@ -937,13 +948,10 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp)
> void
> svc_exit_thread(struct svc_rqst *rqstp)
> {
> - struct svc_serv *serv = rqstp->rq_server;
> struct svc_pool *pool = rqstp->rq_pool;
>
> list_del_rcu(&rqstp->rq_all);
>
> - svc_sock_update_bufs(serv);
> -
I like that you're now only doing this once after all of the threads
are stopped. That might be worth mentioning in the changelog.
> svc_rqst_free(rqstp);
>
> clear_and_wake_up_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 825ec5357691..191dbc648bd0 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -588,7 +588,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
> * provides an upper bound on the number of threads
> * which will access the socket.
> */
> - svc_sock_setbufsize(svsk, serv->sv_nrthreads + 3);
> + svc_sock_setbufsize(svsk, serv->sv_nractual + 3);
>
> clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
2024-07-15 7:14 ` [PATCH 13/14] nfsd: introduce concept of a maximum number of threads NeilBrown
@ 2024-07-15 17:06 ` Jeff Layton
2024-07-16 3:21 ` NeilBrown
0 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 17:06 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> A future patch will allow the number of threads in each nfsd pool to
> vary dynamically.
> The lower bound will be the number explicitly requested via
> /proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads
>
> The upper bound can be set in each net-namespace by writing
> /proc/fs/nfsd/max_threads. This upper bound applies across all pools,
> there is no per-pool upper limit.
>
> If no upper bound is set, then one is calculated. A global upper limit
> is chosen based on the amount of memory. This limit only affects dynamic
> changes. Static configuration can always override it.
>
> We track how many threads are configured in each net namespace, using
> the max where one is set and the min otherwise. We also track how many
> net namespaces have nfsd configured with only a min, not a max.
>
> The difference between the calculated max and the total allocation is
> available to be shared among those namespaces which don't have a maximum
> configured. Within a namespace, the available share is distributed
> equally across all pools.
>
> In the common case there is one namespace and one pool. A small number
> of threads are configured as a minimum and no maximum is set. In this
> case the effective maximum will be directly based on total memory.
> Approximately 8 per gigabyte.
>
Some of this may come across as bikeshedding, but I'd probably prefer
that this work a bit differently:
1/ I don't think we should enable this universally -- at least not
initially. What I'd prefer to see is a new pool_mode for the dynamic
threadpools (maybe call it "dynamic"). That gives us a clear opt-in
mechanism. Later once we're convinced it's safe, we can make "dynamic"
the default instead of "global".
2/ Rather than specifying a max_threads value separately, why not allow
the old threads/pool_threads interface to set the max and just have a
reasonable minimum setting (like the current default of 8). Since we're
growing the threadpool dynamically, I don't see why we need to have a
real configurable minimum.
3/ the dynamic pool-mode should probably be layered on top of the
pernode pool mode. IOW, in a NUMA configuration, we should split the
threads across NUMA nodes.
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/netns.h | 6 +++++
> fs/nfsd/nfsctl.c | 45 +++++++++++++++++++++++++++++++++++
> fs/nfsd/nfsd.h | 4 ++++
> fs/nfsd/nfssvc.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/trace.h | 19 +++++++++++++++
> 5 files changed, 135 insertions(+)
>
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 0d2ac15a5003..329484696a42 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -133,6 +133,12 @@ struct nfsd_net {
> */
> unsigned int max_connections;
>
> + /*
> + * Maximum number of threads to auto-adjust up to. If 0 then a
> + * share of nfsd_max_threads will be used.
> + */
> + unsigned int max_threads;
> +
> u32 clientid_base;
> u32 clientid_counter;
> u32 clverifier_counter;
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index d85b6d1fa31f..37e9936567e9 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -48,6 +48,7 @@ enum {
> NFSD_Ports,
> NFSD_MaxBlkSize,
> NFSD_MaxConnections,
> + NFSD_MaxThreads,
> NFSD_Filecache,
> NFSD_Leasetime,
> NFSD_Gracetime,
> @@ -68,6 +69,7 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size);
> static ssize_t write_ports(struct file *file, char *buf, size_t size);
> static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
> static ssize_t write_maxconn(struct file *file, char *buf, size_t size);
> +static ssize_t write_maxthreads(struct file *file, char *buf, size_t size);
> #ifdef CONFIG_NFSD_V4
> static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
> static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
> @@ -87,6 +89,7 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
> [NFSD_Ports] = write_ports,
> [NFSD_MaxBlkSize] = write_maxblksize,
> [NFSD_MaxConnections] = write_maxconn,
> + [NFSD_MaxThreads] = write_maxthreads,
> #ifdef CONFIG_NFSD_V4
> [NFSD_Leasetime] = write_leasetime,
> [NFSD_Gracetime] = write_gracetime,
> @@ -939,6 +942,47 @@ static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
> return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxconn);
> }
>
> +/*
> + * write_maxthreads - Set or report the current max number threads
> + *
> + * Input:
> + * buf: ignored
> + * size: zero
> + * OR
> + *
> + * Input:
> + * buf: C string containing an unsigned
> + * integer value representing the new
> + * max number of threads
> + * size: non-zero length of C string in @buf
> + * Output:
> + * On success: passed-in buffer filled with '\n'-terminated C string
> + * containing numeric value of max_threads setting
> + * for this net namespace;
> + * return code is the size in bytes of the string
> + * On error: return code is zero or a negative errno value
> + */
> +static ssize_t write_maxthreads(struct file *file, char *buf, size_t size)
> +{
> + char *mesg = buf;
> + struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
> + unsigned int maxthreads = nn->max_threads;
> +
> + if (size > 0) {
> + int rv = get_uint(&mesg, &maxthreads);
> +
> + if (rv)
> + return rv;
> + trace_nfsd_ctl_maxthreads(netns(file), maxthreads);
> + mutex_lock(&nfsd_mutex);
> + nn->max_threads = maxthreads;
> + nfsd_update_nets();
> + mutex_unlock(&nfsd_mutex);
> + }
> +
> + return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxthreads);
> +}
> +
> #ifdef CONFIG_NFSD_V4
> static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
> time64_t *time, struct nfsd_net *nn)
> @@ -1372,6 +1416,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
> [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
> [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
> [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO},
> + [NFSD_MaxThreads] = {"max_threads", &transaction_ops, S_IWUSR|S_IRUGO},
> [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
> #ifdef CONFIG_NFSD_V4
> [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index e4c643255dc9..6874c2de670b 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -156,6 +156,10 @@ int nfsd_create_serv(struct net *net);
> void nfsd_destroy_serv(struct net *net);
>
> extern int nfsd_max_blksize;
> +void nfsd_update_nets(void);
> +extern unsigned int nfsd_max_threads;
> +extern unsigned long nfsd_net_used;
> +extern unsigned int nfsd_net_cnt;
>
> static inline int nfsd_v4client(struct svc_rqst *rq)
> {
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index b005b2e2e6ad..75d78c17756f 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -80,6 +80,21 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
> unsigned long nfsd_drc_max_mem;
> unsigned long nfsd_drc_slotsize_sum;
>
> +/*
> + * nfsd_max_threads is auto-configured based on available ram.
> + * Each network namespace can configure a minimum number of threads
> + * and optionally a maximum.
> + * nfsd_net_used is the sum of the max or min from each net namespace.
> + * nfsd_net_cnt is the number of net namespaces with a configured minimum
> + * but no configured maximum.
> + * When nfsd_max_threads exceeds nfsd_net_used, the difference is divided
> + * by nfsd_net_cnt and this number gives the excess above the configured minimum
> + * for all net namespaces without a configured maximum.
> + */
> +unsigned int nfsd_max_threads;
> +unsigned long nfsd_net_used;
> +unsigned int nfsd_net_cnt;
> +
> #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> static const struct svc_version *nfsd_acl_version[] = {
> # if defined(CONFIG_NFSD_V2_ACL)
> @@ -130,6 +145,47 @@ struct svc_program nfsd_program = {
> .pg_rpcbind_set = nfsd_rpcbind_set,
> };
>
> +void nfsd_update_nets(void)
> +{
> + struct net *net;
> +
> + if (nfsd_max_threads == 0) {
> + nfsd_max_threads = (nr_free_buffer_pages() >> 7) /
> + (NFSSVC_MAXBLKSIZE >> PAGE_SHIFT);
> + }
> + nfsd_net_used = 0;
> + nfsd_net_cnt = 0;
> + down_read(&net_rwsem);
> + for_each_net(net) {
> + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +
> + if (!nn->nfsd_serv)
> + continue;
> + if (nn->max_threads > 0) {
> + nfsd_net_used += nn->max_threads;
> + } else {
> + nfsd_net_used += nn->nfsd_serv->sv_nrthreads;
> + nfsd_net_cnt += 1;
> + }
> + }
> + up_read(&net_rwsem);
> +}
> +
> +static inline int nfsd_max_pool_threads(struct svc_pool *p, struct nfsd_net *nn)
> +{
> + int svthreads = nn->nfsd_serv->sv_nrthreads;
> +
> + if (nn->max_threads > 0)
> + return nn->max_threads;
> + if (nfsd_net_cnt == 0 || svthreads == 0)
> + return 0;
> + if (nfsd_max_threads < nfsd_net_cnt)
> + return p->sp_nrthreads;
> + /* Share nfsd_max_threads among all net, then among pools in this net. */
> + return p->sp_nrthreads +
> + nfsd_max_threads / nfsd_net_cnt * p->sp_nrthreads / svthreads;
> +}
> +
> bool nfsd_support_version(int vers)
> {
> if (vers >= NFSD_MINVERS && vers <= NFSD_MAXVERS)
> @@ -474,6 +530,7 @@ void nfsd_destroy_serv(struct net *net)
> spin_lock(&nfsd_notifier_lock);
> nn->nfsd_serv = NULL;
> spin_unlock(&nfsd_notifier_lock);
> + nfsd_update_nets();
>
> /* check if the notifier still has clients */
> if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
> @@ -614,6 +671,8 @@ int nfsd_create_serv(struct net *net)
> nn->nfsd_serv = serv;
> spin_unlock(&nfsd_notifier_lock);
>
> + nfsd_update_nets();
> +
> set_max_drc();
> /* check if the notifier is already set */
> if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
> @@ -720,6 +779,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> goto out;
> }
> out:
> + nfsd_update_nets();
> return err;
> }
>
> @@ -759,6 +819,7 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
> error = nfsd_set_nrthreads(n, nthreads, net);
> if (error)
> goto out_put;
> + nfsd_update_nets();
> error = serv->sv_nrthreads;
> out_put:
> if (serv->sv_nrthreads == 0)
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 77bbd23aa150..92b888e178e8 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -2054,6 +2054,25 @@ TRACE_EVENT(nfsd_ctl_maxconn,
> )
> );
>
> +TRACE_EVENT(nfsd_ctl_maxthreads,
> + TP_PROTO(
> + const struct net *net,
> + int maxthreads
> + ),
> + TP_ARGS(net, maxthreads),
> + TP_STRUCT__entry(
> + __field(unsigned int, netns_ino)
> + __field(int, maxthreads)
> + ),
> + TP_fast_assign(
> + __entry->netns_ino = net->ns.inum;
> + __entry->maxthreads = maxthreads
> + ),
> + TP_printk("maxthreads=%d",
> + __entry->maxthreads
> + )
> +);
> +
> TRACE_EVENT(nfsd_ctl_time,
> TP_PROTO(
> const struct net *net,
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 00/14 RFC] support automatic changes to nfsd thread count
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (13 preceding siblings ...)
2024-07-15 7:14 ` [PATCH 14/14] nfsd: adjust number of running nfsd threads NeilBrown
@ 2024-07-15 17:29 ` Jeff Layton
2024-07-24 19:43 ` Chuck Lever III
15 siblings, 0 replies; 37+ messages in thread
From: Jeff Layton @ 2024-07-15 17:29 UTC (permalink / raw)
To: NeilBrown, Chuck Lever
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> This patch set (against nfsd-next) enables automatic adjustment of the
> number of nfsd threads. The number can increase under high load, and
> reduce after idle periods.
>
> The first few patches (1-6) are cleanups that may not be entirely
> relevant to the current series. They could safely land any time and
> only need minimal review.
>
> Patches 9, 10 and 11 remove some places where sv_nrthreads is used for
> things other than counting threads. It is used to adjust other limits.
> At the time this seemed like an easy and sensible solution. I now have
> to repent of that short-cut and find a better way to impose reasonable
> limits.
>
> These and the other sundry patches (7,8,12) can, I think, safely land
> whenever they get sufficient review. I think they are sensible even if
> we don't end up adjusting threads dynamically.
>
> Patches 13 and 14 build on all this to provide the desired
> functionality. Patch 13 allows the maximum to be configured, and patch
> 14 starts or stops threads based on some simple triggers.
>
> For 13 I decided that if the user/admin makes no explicit configuration,
> then the currently requested number of threads becomes a minimum, and a
> maximum is determined based on the amount of memory. This will make
> the patch set immediately useful but shouldn't unduly impact existing
> configurations.
>
> For patch 14 I only implemented starting a thread when there is work to
> do but no threads to do it, and stopping a thread when it has been idle
> for 5 seconds. The start-up is deliberately serialised so at least one
> NFS request is serviced between the decision to start a thread and the
> action of starting it. This hopefully encourages a ramping up of thread
> count rather than a sudden jump.
>
> There is certainly room for discussion around the wisdom of these
> heuristics, and what other heuristics are needed - we probably want a
> shrinker to impose memory pressure on the number of threads. We
> probably want a thread to exit rather than retry when a memory
> allocation in svc_alloc_arg() fails.
>
> I certainly wouldn't recommend patch 14 landing in any hurry at all.
>
> I'd love to hear what y'all think, and what experiences you have when
> testing it.
>
>
This looks mostly reasonable, modulo a few nits on the later patches.
You can add my Reviewed-by to 1-9. Patches 10-12 look tentatively OK
too, but I'm less familiar with the slot handling code, and it sounds
like you're going to rework that part anyway.
For 13 I have some ideas about how we should present this from a user
interface standpoint that I wrote in my reply. The heuristics you came
up with in 14 look like a fine place to start.
Cheers!
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int.
2024-07-15 14:12 ` Jeff Layton
2024-07-15 14:33 ` Jeff Layton
@ 2024-07-16 1:33 ` NeilBrown
2024-07-24 19:36 ` Chuck Lever
1 sibling, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-16 1:33 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steve Dickson
On Tue, 16 Jul 2024, Jeff Layton wrote:
> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> > sp_nrthreads is only ever accessed under the service mutex
> > (nlmsvc_mutex, nfs_callback_mutex, or nfsd_mutex),
> > so there is no need for it to be an atomic_t.
> >
> > The fact that all code using it is single-threaded means that we can
> > simplify svc_pool_victim and remove the temporary elevation of
> > sp_nrthreads.
> >
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > fs/nfsd/nfsctl.c | 2 +-
> > fs/nfsd/nfssvc.c | 2 +-
> > include/linux/sunrpc/svc.h | 4 ++--
> > net/sunrpc/svc.c | 31 +++++++++++--------------------
> > 4 files changed, 15 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index 5b0f2e0d7ccf..d85b6d1fa31f 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -1769,7 +1769,7 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
> > struct svc_pool *sp = &nn->nfsd_serv->sv_pools[i];
> >
> > err = nla_put_u32(skb, NFSD_A_SERVER_THREADS,
> > - atomic_read(&sp->sp_nrthreads));
> > + sp->sp_nrthreads);
> > if (err)
> > goto err_unlock;
> > }
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 4438cdcd4873..7377422a34df 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -641,7 +641,7 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net)
> >
> > if (serv)
> > for (i = 0; i < serv->sv_nrpools && i < n; i++)
> > - nthreads[i] = atomic_read(&serv->sv_pools[i].sp_nrthreads);
> > + nthreads[i] = serv->sv_pools[i].sp_nrthreads;
> > return 0;
> > }
> >
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index e4fa25fafa97..99e9345d829e 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -33,9 +33,9 @@
> > * node traffic on multi-node NUMA NFS servers.
> > */
> > struct svc_pool {
> > - unsigned int sp_id; /* pool id; also node id on NUMA */
> > + unsigned int sp_id; /* pool id; also node id on NUMA */
> > struct lwq sp_xprts; /* pending transports */
> > - atomic_t sp_nrthreads; /* # of threads in pool */
> > + unsigned int sp_nrthreads; /* # of threads in pool */
> > struct list_head sp_all_threads; /* all server threads */
> > struct llist_head sp_idle_threads; /* idle server threads */
> >
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 072ad115ae3d..0d8588bc693c 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -725,7 +725,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
> > serv->sv_nrthreads += 1;
> > spin_unlock_bh(&serv->sv_lock);
> >
> > - atomic_inc(&pool->sp_nrthreads);
> > + pool->sp_nrthreads += 1;
> >
> > /* Protected by whatever lock the service uses when calling
> > * svc_set_num_threads()
> > @@ -780,31 +780,22 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
> > struct svc_pool *pool;
> > unsigned int i;
> >
> > -retry:
> > pool = target_pool;
> >
> > - if (pool != NULL) {
> > - if (atomic_inc_not_zero(&pool->sp_nrthreads))
> > - goto found_pool;
> > - return NULL;
> > - } else {
> > + if (!pool) {
> > for (i = 0; i < serv->sv_nrpools; i++) {
> > pool = &serv->sv_pools[--(*state) % serv->sv_nrpools];
> > - if (atomic_inc_not_zero(&pool->sp_nrthreads))
> > - goto found_pool;
> > + if (pool->sp_nrthreads)
> > + break;
> > }
> > - return NULL;
> > }
> >
> > -found_pool:
> > - set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > - set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > - if (!atomic_dec_and_test(&pool->sp_nrthreads))
> > + if (pool && pool->sp_nrthreads) {
> > + set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > + set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > return pool;
> > - /* Nothing left in this pool any more */
> > - clear_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > - clear_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > - goto retry;
> > + }
> > + return NULL;
> > }
> >
> > static int
> > @@ -883,7 +874,7 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > if (!pool)
> > nrservs -= serv->sv_nrthreads;
> > else
> > - nrservs -= atomic_read(&pool->sp_nrthreads);
> > + nrservs -= pool->sp_nrthreads;
> >
> > if (nrservs > 0)
> > return svc_start_kthreads(serv, pool, nrservs);
> > @@ -953,7 +944,7 @@ svc_exit_thread(struct svc_rqst *rqstp)
> >
> > list_del_rcu(&rqstp->rq_all);
> >
> > - atomic_dec(&pool->sp_nrthreads);
> > + pool->sp_nrthreads -= 1;
> >
> > spin_lock_bh(&serv->sv_lock);
> > serv->sv_nrthreads -= 1;
>
> I don't think svc_exit_thread is called with the nfsd_mutex held, so if
> several threads were exiting at the same time, they could race here.
This is subtle and deserves explanation in the commit.
svc_exit_thread() is called in a thread *after* svc_thread_should_stop()
has returned true. That means RQ_VICTIM is set and most likely
SP_NEED_VICTIM was set.
SP_NEED_VICTIM is set in svc_pool_victim() which is called from
svc_stop_kthreads() which requires that the mutex is held.
svc_stop_kthreads() waits for SP_VICTIM_REMAINS to be cleared which is
the last thing that svc_exit_thread() does.
So when svc_exit_thread() is called, the mutex is held by some other
thread that is calling svc_set_num_threads().
This is also why the list_del_rcu() in svc_exit_thread() is safe.
The case where svc_exit_thread() is called but SP_NEED_VICTIM wasn't set
(only RQ_VICTIM) is in the ETIMEDOUT case of nfsd(), in which case
nfsd() ensures that the mutex is held.
This was why
[PATCH 07/14] Change unshare_fs_struct() to never fail.
was needed. If that fails in the current code, svc_exit_thread() can be
called without the mutex - which is already a theoretical problem for
the list_del_rcu().
Thanks,
NeilBrown
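The ordering described above can be condensed into a small single-threaded sketch. The flag names follow sunrpc; everything else (the globals, the function names) is a simplification for illustration, with the locking and wakeups elided:

```c
#include <assert.h>

/* Simplified model of the victim handshake described above.
 * Flag names follow net/sunrpc; the mutex and wakeups are elided,
 * so this demonstrates only the ordering, not the concurrency. */
enum {
	SP_NEED_VICTIM    = 1 << 0,	/* controller asks one thread to exit */
	SP_VICTIM_REMAINS = 1 << 1,	/* cleared last by the exiting thread */
};

static unsigned int sp_flags;
static unsigned int sp_nrthreads = 4;

/* Controller side (holds the service mutex in the real code). */
static void stop_one_thread(void)
{
	sp_flags |= SP_VICTIM_REMAINS;
	sp_flags |= SP_NEED_VICTIM;
	/* the real code now wakes a thread and waits for
	 * SP_VICTIM_REMAINS to be cleared before returning */
}

/* Thread side: svc_thread_should_stop() analogue. */
static int thread_should_stop(void)
{
	if (sp_flags & SP_NEED_VICTIM) {
		sp_flags &= ~SP_NEED_VICTIM;	/* become the victim */
		return 1;
	}
	return 0;
}

/* Thread side: svc_exit_thread() analogue.  Decrementing
 * sp_nrthreads without the mutex is safe because the controller
 * still holds the mutex until SP_VICTIM_REMAINS is cleared below. */
static void exit_thread(void)
{
	sp_nrthreads -= 1;
	sp_flags &= ~SP_VICTIM_REMAINS;		/* releases the controller */
}
```

The point of the sketch is that clearing SP_VICTIM_REMAINS is the very last step, so the thread count is only ever modified while some other task holds the service mutex.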
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 07/14] Change unshare_fs_struct() to never fail.
2024-07-15 14:39 ` Jeff Layton
@ 2024-07-16 1:48 ` NeilBrown
0 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-16 1:48 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steve Dickson
On Tue, 16 Jul 2024, Jeff Layton wrote:
> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> > nfsd threads need to not share the init fs_struct as they need to
> > manipulate umask independently. So they call unshare_fs_struct() and
> > are the only user of that function.
> >
> > In the unlikely event that unshare_fs_struct() fails, the thread will
> > exit calling svc_exit_thread() BEFORE svc_thread_should_stop() reports
> > 'true'.
> >
> > This is a problem because svc_exit_thread() assumes that
> > svc_stop_threads() is running and consequently (in the nfsd case)
> > nfsd_mutex is held. This ensures that the list_del_rcu() call in
> > svc_exit_thread() cannot race with any other manipulation of
> > ->sp_all_threads.
> >
> > While it would be possible to add some other exclusion, doing so would
> > introduce unnecessary complexity. unshare_fs_struct() does not fail in
> > practice. So the simplest solution is to make this explicit. i.e. use
> > __GFP_NOFAIL which is safe on such a small allocation - about 64 bytes.
> >
>
> I know some folks are trying hard to get rid of (or minimize the use
> of) __GFP_NOFAIL. This might not be a long term solution.
Other folk are trying to make NOFAIL a standard option.
See
https://lore.kernel.org/all/22363d0a-71db-4ba7-b5e1-8bb515811d1c@moroto.mountain/
and surrounding. In that email Dan suggests GFP_SMALL as a standard
option that is used for smallish allocations and never fails (and warns
if the allocation is bigger than X).
Also
https://lwn.net/Articles/964793/
>
> > Change unshare_fs_struct() to not return any error, and remove the error
> > handling from nfsd().
> >
> > An alternate approach would be to create a variant of
> > kthread_create_on_node() which didn't set CLONE_FS.
> >
>
> This sounds like it might be the better approach. I guess you could
> just add a set of CLONE_* flags to struct kthread_create_info and fix
> up the callers to set that appropriately?
I tried that first. I didn't like it. Lots of effort for little gain,
where __GFP_NOFAIL fixed the same problem more cleanly.
For reference (in case I do need it eventually) below is a patch from my
'git stash' history.
NeilBrown
fs/fs_struct.c | 23 -----------------------
fs/nfsd/nfssvc.c | 14 +++++---------
include/linux/fs_struct.h | 1 -
include/linux/kthread.h | 8 ++++++++
include/linux/sunrpc/svc.h | 1 +
kernel/kthread.c | 33 +++++++++++++++++++--------------
net/sunrpc/svc.c | 6 ++++--
7 files changed, 37 insertions(+), 49 deletions(-)
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index 64c2d0814ed6..a94764084c8c 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -130,29 +130,6 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
return fs;
}
-int unshare_fs_struct(void)
-{
- struct fs_struct *fs = current->fs;
- struct fs_struct *new_fs = copy_fs_struct(fs);
- int kill;
-
- if (!new_fs)
- return -ENOMEM;
-
- task_lock(current);
- spin_lock(&fs->lock);
- kill = !--fs->users;
- current->fs = new_fs;
- spin_unlock(&fs->lock);
- task_unlock(current);
-
- if (kill)
- free_fs_struct(fs);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(unshare_fs_struct);
-
int current_umask(void)
{
return current->fs->umask;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c0d17b92b249..d37b9cbbc250 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -666,6 +666,7 @@ int nfsd_create_serv(struct net *net)
if (serv == NULL)
return -ENOMEM;
+ serv->sv_unshare_fs = true;
serv->sv_maxconn = nn->max_connections;
error = svc_bind(serv, net);
if (error < 0) {
@@ -915,14 +916,10 @@ nfsd(void *vrqstp)
struct net *net = perm_sock->xpt_net;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- /* At this point, the thread shares current->fs
- * with the init process. We need to create files with the
- * umask as defined by the client instead of init's umask. */
- if (unshare_fs_struct() < 0) {
- printk("Unable to start nfsd thread: out of memory\n");
- goto out;
- }
-
+ /* Thread was created with CLONE_FS disabled so we have
+ * a private current->fs in which we can control umask
+ * for file creation.
+ */
current->fs->umask = 0;
atomic_inc(&nfsd_th_cnt);
@@ -943,7 +940,6 @@ nfsd(void *vrqstp)
atomic_dec(&nfsd_th_cnt);
-out:
/* Release the thread */
svc_exit_thread(rqstp);
return 0;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index 783b48dedb72..a854bfa4708c 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -22,7 +22,6 @@ extern void set_fs_root(struct fs_struct *, const struct path *);
extern void set_fs_pwd(struct fs_struct *, const struct path *);
extern struct fs_struct *copy_fs_struct(struct fs_struct *);
extern void free_fs_struct(struct fs_struct *);
-extern int unshare_fs_struct(void);
static inline void get_fs_root(struct fs_struct *fs, struct path *root)
{
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..222779a40389 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -24,6 +24,8 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
* the stopped state. This is just a helper for kthread_create_on_node();
* see the documentation there for more details.
*/
+#define kthread_create_on_node(threadfn, data, node, namefmt, arg...) \
+ kthread_create_on_node_flags(threadfn, data, node, CLONE_FS, namefmt, ##arg)
#define kthread_create(threadfn, data, namefmt, arg...) \
kthread_create_on_node(threadfn, data, NUMA_NO_NODE, namefmt, ##arg)
@@ -33,6 +35,12 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
unsigned int cpu,
const char *namefmt);
+struct task_struct *kthread_create_on_node_flags(int (*threadfn)(void *data),
+ void *data,
+ int node,
+ int flags,
+ const char *namefmt, ...);
+
void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk);
bool set_kthread_struct(struct task_struct *p);
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 23617da0e565..405f8ec8a505 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -87,6 +87,7 @@ struct svc_serv {
unsigned int sv_nrpools; /* number of thread pools */
struct svc_pool * sv_pools; /* array of thread pools */
int (*sv_threadfn)(void *data);
+ bool sv_unshare_fs; /* Does serv need umask? */
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
struct lwq sv_cb_list; /* queue for callback requests
diff --git a/kernel/kthread.c b/kernel/kthread.c
index c5e40830c1f2..e97cbab40034 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -42,6 +42,7 @@ struct kthread_create_info
int (*threadfn)(void *data);
void *data;
int node;
+ int clone_flags;
/* Result passed back to kthread_create() from kthreadd. */
struct task_struct *result;
@@ -409,7 +410,7 @@ static void create_kthread(struct kthread_create_info *create)
#endif
/* We want our own signal handler (we take no signals by default). */
pid = kernel_thread(kthread, create, create->full_name,
- CLONE_FS | CLONE_FILES | SIGCHLD);
+ create->clone_flags | CLONE_FILES | SIGCHLD);
if (pid < 0) {
/* Release the structure when caller killed by a fatal signal. */
struct completion *done = xchg(&create->done, NULL);
@@ -424,11 +425,12 @@ static void create_kthread(struct kthread_create_info *create)
}
}
-static __printf(4, 0)
-struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
- void *data, int node,
- const char namefmt[],
- va_list args)
+static __printf(5, 0)
+struct task_struct *__kthread_create_on_node_flags(int (*threadfn)(void *data),
+ void *data,
+ int node, int clone_flags,
+ const char namefmt[],
+ va_list args)
{
DECLARE_COMPLETION_ONSTACK(done);
struct task_struct *task;
@@ -440,6 +442,7 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
create->threadfn = threadfn;
create->data = data;
create->node = node;
+ create->clone_flags = clone_flags;
create->done = &done;
create->full_name = kvasprintf(GFP_KERNEL, namefmt, args);
if (!create->full_name) {
@@ -500,21 +503,23 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
*
* Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
*/
-struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
- void *data, int node,
- const char namefmt[],
- ...)
+struct task_struct *kthread_create_on_node_flags(int (*threadfn)(void *data),
+ void *data, int node,
+ int clone_flags,
+ const char namefmt[],
+ ...)
{
struct task_struct *task;
va_list args;
va_start(args, namefmt);
- task = __kthread_create_on_node(threadfn, data, node, namefmt, args);
+ task = __kthread_create_on_node_flags(threadfn, data, node, clone_flags,
+ namefmt, args);
va_end(args);
return task;
}
-EXPORT_SYMBOL(kthread_create_on_node);
+EXPORT_SYMBOL(kthread_create_on_node_flags);
static void __kthread_bind_mask(struct task_struct *p, const struct cpumask *mask, unsigned int state)
{
@@ -870,8 +875,8 @@ __kthread_create_worker(int cpu, unsigned int flags,
if (cpu >= 0)
node = cpu_to_node(cpu);
- task = __kthread_create_on_node(kthread_worker_fn, worker,
- node, namefmt, args);
+ task = __kthread_create_on_node_flags(kthread_worker_fn, worker,
+ node, CLONE_FS, namefmt, args);
if (IS_ERR(task))
goto fail_task;
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 2b4b1276d4e8..a3c94778b547 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -781,8 +781,10 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
rqstp = svc_prepare_thread(serv, chosen_pool, node);
if (IS_ERR(rqstp))
return PTR_ERR(rqstp);
- task = kthread_create_on_node(serv->sv_threadfn, rqstp,
- node, "%s", serv->sv_name);
+ task = kthread_create_on_node_flags(serv->sv_threadfn, rqstp,
+ node,
+ serv->sv_unshare_fs ? 0 : CLONE_FS,
+ "%s", serv->sv_name);
if (IS_ERR(task)) {
svc_exit_thread(rqstp);
return PTR_ERR(task);
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations.
2024-07-15 15:52 ` Jeff Layton
@ 2024-07-16 2:04 ` NeilBrown
0 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-16 2:04 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steve Dickson
On Tue, 16 Jul 2024, Jeff Layton wrote:
> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> > The heuristic for limiting the number of incoming connections to nfsd
> > currently uses sv_nrthreads - allowing more connections if more threads
> > were configured.
> >
> > A future patch will allow number of threads to grow dynamically so that
> > there is no need to configure sv_nrthreads. So we need a different
> > solution for limiting connections.
> >
> > It isn't clear what problem is solved by limiting connections (as
> > mentioned in a code comment) but the most likely problem is a connection
> > storm - many connections that are not doing productive work. These will
> > be closed after about 6 minutes already but it might help to slow down a
> > storm.
> >
> > This patch adds a per-connection flag XPT_PEER_VALID which indicates
> > that the peer has presented a filehandle for which it has some sort of
> > access, i.e. the peer is known to be trusted in some way. We now only
> > count connections which have NOT been determined to be valid. There
> > should be relatively few of these at any given time.
> >
> > If the number of non-validated peers exceeds a limit - currently 64 - we
> > close the oldest non-validated peer to avoid having too many of these
> > useless connections.
> >
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > fs/nfsd/netns.h | 4 ++--
> > fs/nfsd/nfsfh.c | 8 ++++++++
> > include/linux/sunrpc/svc.h | 2 +-
> > include/linux/sunrpc/svc_xprt.h | 4 ++++
> > net/sunrpc/svc_xprt.c | 33 +++++++++++++++++----------------
> > 5 files changed, 32 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > index 238fc4e56e53..0d2ac15a5003 100644
> > --- a/fs/nfsd/netns.h
> > +++ b/fs/nfsd/netns.h
> > @@ -128,8 +128,8 @@ struct nfsd_net {
> > unsigned char writeverf[8];
> >
> > /*
> > - * Max number of connections this nfsd container will allow. Defaults
> > - * to '0' which is means that it bases this on the number of threads.
> > + * Max number of non-validated connections this nfsd container
> > + * will allow. Defaults to '0', which gets mapped to 64.
> > */
> > unsigned int max_connections;
> >
> > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > index 0b75305fb5f5..08742bf8de02 100644
> > --- a/fs/nfsd/nfsfh.c
> > +++ b/fs/nfsd/nfsfh.c
> > @@ -391,6 +391,14 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> > goto out;
> >
> > skip_pseudoflavor_check:
> > + if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags) &&
> > + !test_and_set_bit(XPT_PEER_VALID, &rqstp->rq_xprt->xpt_flags)) {
> > + struct svc_serv *serv = rqstp->rq_server;
> > + spin_lock(&serv->sv_lock);
> > + serv->sv_tmpcnt -= 1;
> > + spin_unlock(&serv->sv_lock);
> > + }
> > +
>
> This is the only place you set XPT_PEER_VALID, but this change affects
> more services than just nfsd. What about lockd? Do we need a similar
> change there?
Lockd calls nlmsvc_ops->fopen which is nlm_fopen() which calls
nfsd_open() which calls fh_verify(). So lockd is safe.
The nfs callback handler might need help, but it sets ->sv_maxconn=1024,
so I think it is safe for now.
(lockd defaults nlm_max_connections to 1024, so it is also safe without
calling fh_verify. Maybe I should clean that up.)
Thanks,
NeilBrown
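The accounting the patch describes - count only connections that have never passed fh_verify(), and close the oldest unvalidated one when the count would exceed the limit - can be modelled in a userspace sketch. The names and structures here (accept_conn, peer_validated, the fixed array, the limit of 3) are illustrative only; the real code keys off xpt_flags bits under sv_lock:

```c
#include <assert.h>

/* Toy model of the XPT_PEER_VALID accounting described above.
 * Everything here is a simplification of svc_xprt.c / nfsfh.c. */
#define MAX_UNVALIDATED 3		/* the patch defaults to 64 */

struct conn {
	int open;
	int peer_valid;			/* stands in for XPT_PEER_VALID */
};

static struct conn conns[16];
static int nconns;
static int tmpcnt;			/* stands in for serv->sv_tmpcnt */

/* New (XPT_TEMP) connection: counted as unvalidated; if the limit is
 * already reached, close the oldest connection that never validated. */
static void accept_conn(void)
{
	if (tmpcnt >= MAX_UNVALIDATED) {
		for (int i = 0; i < nconns; i++) {
			if (conns[i].open && !conns[i].peer_valid) {
				conns[i].open = 0;
				tmpcnt -= 1;
				break;
			}
		}
	}
	conns[nconns].open = 1;
	conns[nconns].peer_valid = 0;
	nconns += 1;
	tmpcnt += 1;
}

/* First successful fh_verify() on the connection: the peer has shown
 * a filehandle it can access, so stop counting it against the limit. */
static void peer_validated(int i)
{
	if (conns[i].open && !conns[i].peer_valid) {
		conns[i].peer_valid = 1;
		tmpcnt -= 1;
	}
}
```

A validated connection is never a candidate for eviction, so a connection storm can only displace other members of the storm, not established clients.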
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
2024-07-15 17:06 ` Jeff Layton
@ 2024-07-16 3:21 ` NeilBrown
2024-07-16 11:00 ` Jeff Layton
0 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-07-16 3:21 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steve Dickson
On Tue, 16 Jul 2024, Jeff Layton wrote:
> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> > A future patch will allow the number of threads in each nfsd pool to
> > vary dynamically.
> > The lower bound will be the number explicit requested via
> > /proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads
> >
> > The upper bound can be set in each net-namespace by writing
> > /proc/fs/nfsd/max_threads. This upper bound applies across all pools,
> > there is no per-pool upper limit.
> >
> > If no upper bound is set, then one is calculated. A global upper limit
> > is chosen based on amount of memory. This limit only affects dynamic
> > changes. Static configuration can always over-ride it.
> >
> > We track how many threads are configured in each net namespace, using
> > the max if one is set, else the min. We also track how many net
> > namespaces have nfsd configured with only a min, not a max.
> >
> > The difference between the calculated max and the total allocation is
> > available to be shared among those namespaces which don't have a maximum
> > configured. Within a namespace, the available share is distributed
> > equally across all pools.
> >
> > In the common case there is one namespace and one pool. A small number
> > of threads are configured as a minimum and no maximum is set. In this
> > case the effective maximum will be directly based on total memory.
> > Approximately 8 per gigabyte.
> >
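The "8 per gigabyte" figure falls out of the calculation added to nfsd_update_nets() in the patch. The constants below are assumptions for the sketch (4 KiB pages, i.e. PAGE_SHIFT of 12, and the usual 1 MiB NFSSVC_MAXBLKSIZE), not values read from a running kernel:

```c
#include <assert.h>

/* Reproduce the estimate used in this patch's nfsd_update_nets():
 *   (nr_free_buffer_pages() >> 7) / (NFSSVC_MAXBLKSIZE >> PAGE_SHIFT)
 * under the assumptions of 4 KiB pages and a 1 MiB maximum payload. */
#define PAGE_SHIFT		12
#define NFSSVC_MAXBLKSIZE	(1024UL * 1024)

static unsigned long est_max_threads(unsigned long free_pages)
{
	return (free_pages >> 7) / (NFSSVC_MAXBLKSIZE >> PAGE_SHIFT);
}
```

One GiB of free page cache is 262144 4 KiB pages; shifted right by 7 that is 2048, and dividing by the 256 pages of a maximal request gives 8 threads per GiB, matching the cover letter.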
>
>
> Some of this may come across as bikeshedding, but I'd probably prefer
> that this work a bit differently:
>
> 1/ I don't think we should enable this universally -- at least not
> initially. What I'd prefer to see is a new pool_mode for the dynamic
> threadpools (maybe call it "dynamic"). That gives us a clear opt-in
> mechanism. Later once we're convinced it's safe, we can make "dynamic"
> the default instead of "global".
>
> 2/ Rather than specifying a max_threads value separately, why not allow
> the old threads/pool_threads interface to set the max and just have a
> reasonable minimum setting (like the current default of 8). Since we're
> growing the threadpool dynamically, I don't see why we need to have a
> real configurable minimum.
>
> 3/ the dynamic pool-mode should probably be layered on top of the
> pernode pool mode. IOW, in a NUMA configuration, we should split the
> threads across NUMA nodes.
Maybe we should start by discussing the goal. What do we want
configuration to look like when we finish?
I think we want it to be transparent. Sysadmin does nothing, and it all
works perfectly. Or as close to that as we can get.
So I think the "nproc" option to rpc.nfsd should eventually be ignored
except for whether it is zero or non-zero. If it is non-zero we start 1
thread on each NUMA node and let that grow with limits imposed primarily
by memory pressure.
We probably should have changed
#define SVC_POOL_DEFAULT SVC_POOL_GLOBAL
to
#define SVC_POOL_DEFAULT SVC_POOL_AUTO
about 10 years ago, but failing that, maybe change it tomorrow?
I'm not sure how
/proc/fs/nfsd/{threads,pool_threads}
should be handled. Like you I don't think it is really useful to have
a configured minimum but I don't want them to be imposed as a maximum
because I want servers with the current default (of 8) to be able to
start more new threads if necessary without needing a config change.
Maybe that outcome can be delayed until rpc.nfsd is updated?
I don't really like the idea of a dynamic pool mode. I want the pool to
*always* be dynamic.
Thanks,
NeilBrown
>
>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > fs/nfsd/netns.h | 6 +++++
> > fs/nfsd/nfsctl.c | 45 +++++++++++++++++++++++++++++++++++
> > fs/nfsd/nfsd.h | 4 ++++
> > fs/nfsd/nfssvc.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
> > fs/nfsd/trace.h | 19 +++++++++++++++
> > 5 files changed, 135 insertions(+)
> >
> > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > index 0d2ac15a5003..329484696a42 100644
> > --- a/fs/nfsd/netns.h
> > +++ b/fs/nfsd/netns.h
> > @@ -133,6 +133,12 @@ struct nfsd_net {
> > */
> > unsigned int max_connections;
> >
> > + /*
> > + * Maximum number of threads to auto-adjust up to. If 0 then a
> > + * share of nfsd_max_threads will be used.
> > + */
> > + unsigned int max_threads;
> > +
> > u32 clientid_base;
> > u32 clientid_counter;
> > u32 clverifier_counter;
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index d85b6d1fa31f..37e9936567e9 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -48,6 +48,7 @@ enum {
> > NFSD_Ports,
> > NFSD_MaxBlkSize,
> > NFSD_MaxConnections,
> > + NFSD_MaxThreads,
> > NFSD_Filecache,
> > NFSD_Leasetime,
> > NFSD_Gracetime,
> > @@ -68,6 +69,7 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size);
> > static ssize_t write_ports(struct file *file, char *buf, size_t size);
> > static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
> > static ssize_t write_maxconn(struct file *file, char *buf, size_t size);
> > +static ssize_t write_maxthreads(struct file *file, char *buf, size_t size);
> > #ifdef CONFIG_NFSD_V4
> > static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
> > static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
> > @@ -87,6 +89,7 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
> > [NFSD_Ports] = write_ports,
> > [NFSD_MaxBlkSize] = write_maxblksize,
> > [NFSD_MaxConnections] = write_maxconn,
> > + [NFSD_MaxThreads] = write_maxthreads,
> > #ifdef CONFIG_NFSD_V4
> > [NFSD_Leasetime] = write_leasetime,
> > [NFSD_Gracetime] = write_gracetime,
> > @@ -939,6 +942,47 @@ static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
> > return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxconn);
> > }
> >
> > +/*
> > + * write_maxthreads - Set or report the current max number threads
> > + *
> > + * Input:
> > + * buf: ignored
> > + * size: zero
> > + * OR
> > + *
> > + * Input:
> > + * buf: C string containing an unsigned
> > + * integer value representing the new
> > + * max number of threads
> > + * size: non-zero length of C string in @buf
> > + * Output:
> > + * On success: passed-in buffer filled with '\n'-terminated C string
> > + * containing numeric value of max_threads setting
> > + * for this net namespace;
> > + * return code is the size in bytes of the string
> > + * On error: return code is zero or a negative errno value
> > + */
> > +static ssize_t write_maxthreads(struct file *file, char *buf, size_t size)
> > +{
> > + char *mesg = buf;
> > + struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
> > + unsigned int maxthreads = nn->max_threads;
> > +
> > + if (size > 0) {
> > + int rv = get_uint(&mesg, &maxthreads);
> > +
> > + if (rv)
> > + return rv;
> > + trace_nfsd_ctl_maxthreads(netns(file), maxthreads);
> > + mutex_lock(&nfsd_mutex);
> > + nn->max_threads = maxthreads;
> > + nfsd_update_nets();
> > + mutex_unlock(&nfsd_mutex);
> > + }
> > +
> > + return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxthreads);
> > +}
> > +
> > #ifdef CONFIG_NFSD_V4
> > static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
> > time64_t *time, struct nfsd_net *nn)
> > @@ -1372,6 +1416,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
> > [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
> > [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
> > [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO},
> > + [NFSD_MaxThreads] = {"max_threads", &transaction_ops, S_IWUSR|S_IRUGO},
> > [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
> > #ifdef CONFIG_NFSD_V4
> > [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index e4c643255dc9..6874c2de670b 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -156,6 +156,10 @@ int nfsd_create_serv(struct net *net);
> > void nfsd_destroy_serv(struct net *net);
> >
> > extern int nfsd_max_blksize;
> > +void nfsd_update_nets(void);
> > +extern unsigned int nfsd_max_threads;
> > +extern unsigned long nfsd_net_used;
> > +extern unsigned int nfsd_net_cnt;
> >
> > static inline int nfsd_v4client(struct svc_rqst *rq)
> > {
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index b005b2e2e6ad..75d78c17756f 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -80,6 +80,21 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
> > unsigned long nfsd_drc_max_mem;
> > unsigned long nfsd_drc_slotsize_sum;
> >
> > +/*
> > + * nfsd_max_threads is auto-configured based on available RAM.
> > + * Each network namespace can configure a minimum number of threads
> > + * and optionally a maximum.
> > + * nfsd_net_used is the sum of the max or min from each net namespace.
> > + * nfsd_net_cnt is the number of net namespaces with a configured minimum
> > + * but no configured maximum.
> > + * When nfsd_max_threads exceeds nfsd_net_used, the difference is divided
> > + * by nfsd_net_cnt and this number gives the excess above the configured
> > + * minimum for all net namespaces without a configured maximum.
> > + */
> > +unsigned int nfsd_max_threads;
> > +unsigned long nfsd_net_used;
> > +unsigned int nfsd_net_cnt;
> > +
> > #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > static const struct svc_version *nfsd_acl_version[] = {
> > # if defined(CONFIG_NFSD_V2_ACL)
> > @@ -130,6 +145,47 @@ struct svc_program nfsd_program = {
> > .pg_rpcbind_set = nfsd_rpcbind_set,
> > };
> >
> > +void nfsd_update_nets(void)
> > +{
> > + struct net *net;
> > +
> > + if (nfsd_max_threads == 0) {
> > + nfsd_max_threads = (nr_free_buffer_pages() >> 7) /
> > + (NFSSVC_MAXBLKSIZE >> PAGE_SHIFT);
> > + }
> > + nfsd_net_used = 0;
> > + nfsd_net_cnt = 0;
> > + down_read(&net_rwsem);
> > + for_each_net(net) {
> > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > +
> > + if (!nn->nfsd_serv)
> > + continue;
> > + if (nn->max_threads > 0) {
> > + nfsd_net_used += nn->max_threads;
> > + } else {
> > + nfsd_net_used += nn->nfsd_serv->sv_nrthreads;
> > + nfsd_net_cnt += 1;
> > + }
> > + }
> > + up_read(&net_rwsem);
> > +}
> > +
> > +static inline int nfsd_max_pool_threads(struct svc_pool *p, struct nfsd_net *nn)
> > +{
> > + int svthreads = nn->nfsd_serv->sv_nrthreads;
> > +
> > + if (nn->max_threads > 0)
> > + return nn->max_threads;
> > + if (nfsd_net_cnt == 0 || svthreads == 0)
> > + return 0;
> > + if (nfsd_max_threads < nfsd_net_cnt)
> > + return p->sp_nrthreads;
> > +	/* Share nfsd_max_threads among all nets, then among pools in this net. */
> > + return p->sp_nrthreads +
> > + nfsd_max_threads / nfsd_net_cnt * p->sp_nrthreads / svthreads;
> > +}
> > +
> > bool nfsd_support_version(int vers)
> > {
> > if (vers >= NFSD_MINVERS && vers <= NFSD_MAXVERS)
> > @@ -474,6 +530,7 @@ void nfsd_destroy_serv(struct net *net)
> > spin_lock(&nfsd_notifier_lock);
> > nn->nfsd_serv = NULL;
> > spin_unlock(&nfsd_notifier_lock);
> > + nfsd_update_nets();
> >
> > /* check if the notifier still has clients */
> > if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
> > @@ -614,6 +671,8 @@ int nfsd_create_serv(struct net *net)
> > nn->nfsd_serv = serv;
> > spin_unlock(&nfsd_notifier_lock);
> >
> > + nfsd_update_nets();
> > +
> > set_max_drc();
> > /* check if the notifier is already set */
> > if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
> > @@ -720,6 +779,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> > goto out;
> > }
> > out:
> > + nfsd_update_nets();
> > return err;
> > }
> >
> > @@ -759,6 +819,7 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
> > error = nfsd_set_nrthreads(n, nthreads, net);
> > if (error)
> > goto out_put;
> > + nfsd_update_nets();
> > error = serv->sv_nrthreads;
> > out_put:
> > if (serv->sv_nrthreads == 0)
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index 77bbd23aa150..92b888e178e8 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -2054,6 +2054,25 @@ TRACE_EVENT(nfsd_ctl_maxconn,
> > )
> > );
> >
> > +TRACE_EVENT(nfsd_ctl_maxthreads,
> > + TP_PROTO(
> > + const struct net *net,
> > + int maxthreads
> > + ),
> > + TP_ARGS(net, maxthreads),
> > + TP_STRUCT__entry(
> > + __field(unsigned int, netns_ino)
> > + __field(int, maxthreads)
> > + ),
> > + TP_fast_assign(
> > + __entry->netns_ino = net->ns.inum;
> > +		__entry->maxthreads = maxthreads;
> > + ),
> > + TP_printk("maxthreads=%d",
> > + __entry->maxthreads
> > + )
> > +);
> > +
> > TRACE_EVENT(nfsd_ctl_time,
> > TP_PROTO(
> > const struct net *net,
>
> --
> Jeff Layton <jlayton@kernel.org>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
2024-07-16 3:21 ` NeilBrown
@ 2024-07-16 11:00 ` Jeff Layton
2024-07-16 13:31 ` Chuck Lever III
0 siblings, 1 reply; 37+ messages in thread
From: Jeff Layton @ 2024-07-16 11:00 UTC (permalink / raw)
To: NeilBrown
Cc: Chuck Lever, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steve Dickson
On Tue, 2024-07-16 at 13:21 +1000, NeilBrown wrote:
> On Tue, 16 Jul 2024, Jeff Layton wrote:
> > On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> > > A future patch will allow the number of threads in each nfsd pool to
> > > vary dynamically.
> > > The lower bound will be the number explicitly requested via
> > > /proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads
> > >
> > > The upper bound can be set in each net-namespace by writing
> > > /proc/fs/nfsd/max_threads. This upper bound applies across all pools,
> > > there is no per-pool upper limit.
> > >
> > > If no upper bound is set, then one is calculated. A global upper limit
> > > is chosen based on amount of memory. This limit only affects dynamic
> > > changes. Static configuration can always over-ride it.
> > >
> > > We track how many threads are configured in each net namespace, with the
> > > max or the min. We also track how many net namespaces have nfsd
> > > configured with only a min, not a max.
> > >
> > > The difference between the calculated max and the total allocation is
> > > available to be shared among those namespaces which don't have a maximum
> > > configured. Within a namespace, the available share is distributed
> > > equally across all pools.
> > >
> > > In the common case there is one namespace and one pool. A small number
> > > of threads are configured as a minimum and no maximum is set. In this
> > > case the effective maximum will be directly based on total memory.
> > > Approximately 8 per gigabyte.
> > >
> >
> >
> > Some of this may come across as bikeshedding, but I'd probably prefer
> > that this work a bit differently:
> >
> > 1/ I don't think we should enable this universally -- at least not
> > initially. What I'd prefer to see is a new pool_mode for the dynamic
> > threadpools (maybe call it "dynamic"). That gives us a clear opt-in
> > mechanism. Later once we're convinced it's safe, we can make "dynamic"
> > the default instead of "global".
> >
> > 2/ Rather than specifying a max_threads value separately, why not allow
> > the old threads/pool_threads interface to set the max and just have a
> > reasonable minimum setting (like the current default of 8). Since we're
> > growing the threadpool dynamically, I don't see why we need to have a
> > real configurable minimum.
> >
> > 3/ the dynamic pool-mode should probably be layered on top of the
> > pernode pool mode. IOW, in a NUMA configuration, we should split the
> > threads across NUMA nodes.
>
> Maybe we should start by discussing the goal. What do we want
> configuration to look like when we finish?
>
> I think we want it to be transparent. Sysadmin does nothing, and it all
> works perfectly. Or as close to that as we can get.
>
That's a nice eventual goal, but what do we do if we make this change
and it's not behaving for them? We need some way for them to revert to
traditional behavior if the new mode isn't working well.
> So I think the "nproc" option to rpc.nfsd should eventually be ignored
> except for whether it is zero or non-zero. If it is non-zero we start 1
> thread on each NUMA node and let that grow with limits imposed primarily
> by memory pressure.
>
> We should probably have changed
>
> #define SVC_POOL_DEFAULT SVC_POOL_GLOBAL
>
> to
>
> #define SVC_POOL_DEFAULT SVC_POOL_AUTO
>
> about 10 years ago, but failing that, maybe change it tomorrow?
>
At this point, I wouldn't change the defaults until we're ready to make
dynamic mode the default.
> I'm not sure how
> /proc/fs/nfsd/{threads,pool_threads}
> should be handled.
>
Technically, I'm fine with _not_ handling them here. We do have the new
netlink interfaces that are better suited for this. We could make the
opt-in for dynamic mode contingent on using that somehow.
> Like you I don't think it is really useful to have
> a configured minimum, but I don't want it to be imposed as a maximum
> because I want servers with the current default (of 8) to be able to
> start more new threads if necessary without needing a config change.
> Maybe that outcome can be delayed until rpc.nfsd is updated?
>
That's a bit too aggressive for my tastes. I really do think we need to
allow people to opt-in for this at first. Once we've grown comfortable
with how it all works, we can consider changing the default then.
> I don't really like the idea of a dynamic pool mode. I want the pool to
> *always* be dynamic.
>
>
I think that's a good eventual goal, but I think we need to proceed
with caution. Given that this is all based around heuristics, we'll
need a way for people to revert to more traditional behavior if it's
not working well for them. Making this into a pool-mode and allowing
people to opt-in initially seems like a simple way to do that.
I am fine with eventually discarding the pool-mode settings altogether
if we get dynamic mode working well enough. I'd just prefer a more
incremental approach to getting there.
>
> >
> >
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > > fs/nfsd/netns.h | 6 +++++
> > > fs/nfsd/nfsctl.c | 45 +++++++++++++++++++++++++++++++++++
> > > fs/nfsd/nfsd.h | 4 ++++
> > > fs/nfsd/nfssvc.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > fs/nfsd/trace.h | 19 +++++++++++++++
> > > 5 files changed, 135 insertions(+)
> > >
> > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > > index 0d2ac15a5003..329484696a42 100644
> > > --- a/fs/nfsd/netns.h
> > > +++ b/fs/nfsd/netns.h
> > > @@ -133,6 +133,12 @@ struct nfsd_net {
> > > */
> > > unsigned int max_connections;
> > >
> > > + /*
> > > + * Maximum number of threads to auto-adjust up to. If 0 then a
> > > + * share of nfsd_max_threads will be used.
> > > + */
> > > + unsigned int max_threads;
> > > +
> > > u32 clientid_base;
> > > u32 clientid_counter;
> > > u32 clverifier_counter;
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index d85b6d1fa31f..37e9936567e9 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -48,6 +48,7 @@ enum {
> > > NFSD_Ports,
> > > NFSD_MaxBlkSize,
> > > NFSD_MaxConnections,
> > > + NFSD_MaxThreads,
> > > NFSD_Filecache,
> > > NFSD_Leasetime,
> > > NFSD_Gracetime,
> > > @@ -68,6 +69,7 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size);
> > > static ssize_t write_ports(struct file *file, char *buf, size_t size);
> > > static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
> > > static ssize_t write_maxconn(struct file *file, char *buf, size_t size);
> > > +static ssize_t write_maxthreads(struct file *file, char *buf, size_t size);
> > > #ifdef CONFIG_NFSD_V4
> > > static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
> > > static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
> > > @@ -87,6 +89,7 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
> > > [NFSD_Ports] = write_ports,
> > > [NFSD_MaxBlkSize] = write_maxblksize,
> > > [NFSD_MaxConnections] = write_maxconn,
> > > + [NFSD_MaxThreads] = write_maxthreads,
> > > #ifdef CONFIG_NFSD_V4
> > > [NFSD_Leasetime] = write_leasetime,
> > > [NFSD_Gracetime] = write_gracetime,
> > > @@ -939,6 +942,47 @@ static ssize_t write_maxconn(struct file *file, char *buf, size_t size)
> > > return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxconn);
> > > }
> > >
> > > +/*
> > > + * write_maxthreads - Set or report the current max number of threads
> > > + *
> > > + * Input:
> > > + * buf: ignored
> > > + * size: zero
> > > + * OR
> > > + *
> > > + * Input:
> > > + * buf: C string containing an unsigned
> > > + * integer value representing the new
> > > + * max number of threads
> > > + * size: non-zero length of C string in @buf
> > > + * Output:
> > > + * On success: passed-in buffer filled with '\n'-terminated C string
> > > + * containing numeric value of max_threads setting
> > > + * for this net namespace;
> > > + * return code is the size in bytes of the string
> > > + * On error: return code is zero or a negative errno value
> > > + */
> > > +static ssize_t write_maxthreads(struct file *file, char *buf, size_t size)
> > > +{
> > > + char *mesg = buf;
> > > + struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
> > > + unsigned int maxthreads = nn->max_threads;
> > > +
> > > + if (size > 0) {
> > > + int rv = get_uint(&mesg, &maxthreads);
> > > +
> > > + if (rv)
> > > + return rv;
> > > + trace_nfsd_ctl_maxthreads(netns(file), maxthreads);
> > > + mutex_lock(&nfsd_mutex);
> > > + nn->max_threads = maxthreads;
> > > + nfsd_update_nets();
> > > + mutex_unlock(&nfsd_mutex);
> > > + }
> > > +
> > > + return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", maxthreads);
> > > +}
> > > +
> > > #ifdef CONFIG_NFSD_V4
> > > static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
> > > time64_t *time, struct nfsd_net *nn)
> > > @@ -1372,6 +1416,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
> > > [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
> > > [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
> > > [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO},
> > > + [NFSD_MaxThreads] = {"max_threads", &transaction_ops, S_IWUSR|S_IRUGO},
> > > [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
> > > #ifdef CONFIG_NFSD_V4
> > > [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
> > > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > > index e4c643255dc9..6874c2de670b 100644
> > > --- a/fs/nfsd/nfsd.h
> > > +++ b/fs/nfsd/nfsd.h
> > > @@ -156,6 +156,10 @@ int nfsd_create_serv(struct net *net);
> > > void nfsd_destroy_serv(struct net *net);
> > >
> > > extern int nfsd_max_blksize;
> > > +void nfsd_update_nets(void);
> > > +extern unsigned int nfsd_max_threads;
> > > +extern unsigned long nfsd_net_used;
> > > +extern unsigned int nfsd_net_cnt;
> > >
> > > static inline int nfsd_v4client(struct svc_rqst *rq)
> > > {
> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index b005b2e2e6ad..75d78c17756f 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -80,6 +80,21 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
> > > unsigned long nfsd_drc_max_mem;
> > > unsigned long nfsd_drc_slotsize_sum;
> > >
> > > +/*
> > > + * nfsd_max_threads is auto-configured based on available RAM.
> > > + * Each network namespace can configure a minimum number of threads
> > > + * and optionally a maximum.
> > > + * nfsd_net_used is the sum of the max or min from each net namespace.
> > > + * nfsd_net_cnt is the number of net namespaces with a configured minimum
> > > + * but no configured maximum.
> > > + * When nfsd_max_threads exceeds nfsd_net_used, the difference is divided
> > > + * by nfsd_net_cnt and this number gives the excess above the configured
> > > + * minimum for all net namespaces without a configured maximum.
> > > + */
> > > +unsigned int nfsd_max_threads;
> > > +unsigned long nfsd_net_used;
> > > +unsigned int nfsd_net_cnt;
> > > +
> > > #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > > static const struct svc_version *nfsd_acl_version[] = {
> > > # if defined(CONFIG_NFSD_V2_ACL)
> > > @@ -130,6 +145,47 @@ struct svc_program nfsd_program = {
> > > .pg_rpcbind_set = nfsd_rpcbind_set,
> > > };
> > >
> > > +void nfsd_update_nets(void)
> > > +{
> > > + struct net *net;
> > > +
> > > + if (nfsd_max_threads == 0) {
> > > + nfsd_max_threads = (nr_free_buffer_pages() >> 7) /
> > > + (NFSSVC_MAXBLKSIZE >> PAGE_SHIFT);
> > > + }
> > > + nfsd_net_used = 0;
> > > + nfsd_net_cnt = 0;
> > > + down_read(&net_rwsem);
> > > + for_each_net(net) {
> > > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > +
> > > + if (!nn->nfsd_serv)
> > > + continue;
> > > + if (nn->max_threads > 0) {
> > > + nfsd_net_used += nn->max_threads;
> > > + } else {
> > > + nfsd_net_used += nn->nfsd_serv->sv_nrthreads;
> > > + nfsd_net_cnt += 1;
> > > + }
> > > + }
> > > + up_read(&net_rwsem);
> > > +}
> > > +
> > > +static inline int nfsd_max_pool_threads(struct svc_pool *p, struct nfsd_net *nn)
> > > +{
> > > + int svthreads = nn->nfsd_serv->sv_nrthreads;
> > > +
> > > + if (nn->max_threads > 0)
> > > + return nn->max_threads;
> > > + if (nfsd_net_cnt == 0 || svthreads == 0)
> > > + return 0;
> > > + if (nfsd_max_threads < nfsd_net_cnt)
> > > + return p->sp_nrthreads;
> > > +	/* Share nfsd_max_threads among all nets, then among pools in this net. */
> > > + return p->sp_nrthreads +
> > > + nfsd_max_threads / nfsd_net_cnt * p->sp_nrthreads / svthreads;
> > > +}
> > > +
> > > bool nfsd_support_version(int vers)
> > > {
> > > if (vers >= NFSD_MINVERS && vers <= NFSD_MAXVERS)
> > > @@ -474,6 +530,7 @@ void nfsd_destroy_serv(struct net *net)
> > > spin_lock(&nfsd_notifier_lock);
> > > nn->nfsd_serv = NULL;
> > > spin_unlock(&nfsd_notifier_lock);
> > > + nfsd_update_nets();
> > >
> > > /* check if the notifier still has clients */
> > > if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
> > > @@ -614,6 +671,8 @@ int nfsd_create_serv(struct net *net)
> > > nn->nfsd_serv = serv;
> > > spin_unlock(&nfsd_notifier_lock);
> > >
> > > + nfsd_update_nets();
> > > +
> > > set_max_drc();
> > > /* check if the notifier is already set */
> > > if (atomic_inc_return(&nfsd_notifier_refcount) == 1) {
> > > @@ -720,6 +779,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> > > goto out;
> > > }
> > > out:
> > > + nfsd_update_nets();
> > > return err;
> > > }
> > >
> > > @@ -759,6 +819,7 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
> > > error = nfsd_set_nrthreads(n, nthreads, net);
> > > if (error)
> > > goto out_put;
> > > + nfsd_update_nets();
> > > error = serv->sv_nrthreads;
> > > out_put:
> > > if (serv->sv_nrthreads == 0)
> > > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > > index 77bbd23aa150..92b888e178e8 100644
> > > --- a/fs/nfsd/trace.h
> > > +++ b/fs/nfsd/trace.h
> > > @@ -2054,6 +2054,25 @@ TRACE_EVENT(nfsd_ctl_maxconn,
> > > )
> > > );
> > >
> > > +TRACE_EVENT(nfsd_ctl_maxthreads,
> > > + TP_PROTO(
> > > + const struct net *net,
> > > + int maxthreads
> > > + ),
> > > + TP_ARGS(net, maxthreads),
> > > + TP_STRUCT__entry(
> > > + __field(unsigned int, netns_ino)
> > > + __field(int, maxthreads)
> > > + ),
> > > + TP_fast_assign(
> > > + __entry->netns_ino = net->ns.inum;
> > > +		__entry->maxthreads = maxthreads;
> > > + ),
> > > + TP_printk("maxthreads=%d",
> > > + __entry->maxthreads
> > > + )
> > > +);
> > > +
> > > TRACE_EVENT(nfsd_ctl_time,
> > > TP_PROTO(
> > > const struct net *net,
> >
> > --
> > Jeff Layton <jlayton@kernel.org>
> >
>
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
2024-07-16 11:00 ` Jeff Layton
@ 2024-07-16 13:31 ` Chuck Lever III
2024-07-16 18:49 ` Tom Talpey
0 siblings, 1 reply; 37+ messages in thread
From: Chuck Lever III @ 2024-07-16 13:31 UTC (permalink / raw)
To: Jeff Layton, Neil Brown
Cc: Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steve Dickson
> On Jul 16, 2024, at 7:00 AM, Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2024-07-16 at 13:21 +1000, NeilBrown wrote:
>> On Tue, 16 Jul 2024, Jeff Layton wrote:
>>> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
>>>> A future patch will allow the number of threads in each nfsd pool to
>>>> vary dynamically.
>>>> The lower bound will be the number explicitly requested via
>>>> /proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads
>>>>
>>>> The upper bound can be set in each net-namespace by writing
>>>> /proc/fs/nfsd/max_threads. This upper bound applies across all pools,
>>>> there is no per-pool upper limit.
>>>>
>>>> If no upper bound is set, then one is calculated. A global upper limit
>>>> is chosen based on amount of memory. This limit only affects dynamic
>>>> changes. Static configuration can always over-ride it.
>>>>
>>>> We track how many threads are configured in each net namespace, with the
>>>> max or the min. We also track how many net namespaces have nfsd
>>>> configured with only a min, not a max.
>>>>
>>>> The difference between the calculated max and the total allocation is
>>>> available to be shared among those namespaces which don't have a maximum
>>>> configured. Within a namespace, the available share is distributed
>>>> equally across all pools.
>>>>
>>>> In the common case there is one namespace and one pool. A small number
>>>> of threads are configured as a minimum and no maximum is set. In this
>>>> case the effective maximum will be directly based on total memory.
>>>> Approximately 8 per gigabyte.
>>>>
>>>
>>>
>>> Some of this may come across as bikeshedding, but I'd probably prefer
>>> that this work a bit differently:
>>>
>>> 1/ I don't think we should enable this universally -- at least not
>>> initially. What I'd prefer to see is a new pool_mode for the dynamic
>>> threadpools (maybe call it "dynamic"). That gives us a clear opt-in
>>> mechanism. Later once we're convinced it's safe, we can make "dynamic"
>>> the default instead of "global".
>>>
>>> 2/ Rather than specifying a max_threads value separately, why not allow
>>> the old threads/pool_threads interface to set the max and just have a
>>> reasonable minimum setting (like the current default of 8). Since we're
>>> growing the threadpool dynamically, I don't see why we need to have a
>>> real configurable minimum.
>>>
>>> 3/ the dynamic pool-mode should probably be layered on top of the
>>> pernode pool mode. IOW, in a NUMA configuration, we should split the
>>> threads across NUMA nodes.
>>
>> Maybe we should start by discussing the goal. What do we want
>> configuration to look like when we finish?
>>
>> I think we want it to be transparent. Sysadmin does nothing, and it all
>> works perfectly. Or as close to that as we can get.
>>
>
> That's a nice eventual goal, but what do we do if we make this change
> and it's not behaving for them? We need some way for them to revert to
> traditional behavior if the new mode isn't working well.
As Steve pointed out (privately) there are likely to be cases
where the dynamic thread count adjustment creates too many
threads or somehow triggers a DoS. Admins want the ability to
disable new features that cause trouble, and it is impossible
for us to say truthfully that we have predicted every
misbehavior.
So +1 for having a mechanism for getting back the traditional
behavior, at least until we have confidence it is not going
to have troubling side-effects.
Yes, in a perfect world, fully autonomous thread count
adjustment would be amazing. Let's aim for that, but take
baby steps to get there.
--
Chuck Lever
* Re: [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
2024-07-16 13:31 ` Chuck Lever III
@ 2024-07-16 18:49 ` Tom Talpey
2024-07-17 15:24 ` Chuck Lever III
0 siblings, 1 reply; 37+ messages in thread
From: Tom Talpey @ 2024-07-16 18:49 UTC (permalink / raw)
To: Chuck Lever III, Jeff Layton, Neil Brown
Cc: Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo, Steve Dickson
On 7/16/2024 9:31 AM, Chuck Lever III wrote:
>
>
>> On Jul 16, 2024, at 7:00 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>
>> On Tue, 2024-07-16 at 13:21 +1000, NeilBrown wrote:
>>> On Tue, 16 Jul 2024, Jeff Layton wrote:
>>>> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
>>>>> A future patch will allow the number of threads in each nfsd pool to
>>>>> vary dynamically.
>>>>> The lower bound will be the number explicitly requested via
>>>>> /proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads
>>>>>
>>>>> The upper bound can be set in each net-namespace by writing
>>>>> /proc/fs/nfsd/max_threads. This upper bound applies across all pools,
>>>>> there is no per-pool upper limit.
>>>>>
>>>>> If no upper bound is set, then one is calculated. A global upper limit
>>>>> is chosen based on amount of memory. This limit only affects dynamic
>>>>> changes. Static configuration can always over-ride it.
>>>>>
>>>>> We track how many threads are configured in each net namespace, with the
>>>>> max or the min. We also track how many net namespaces have nfsd
>>>>> configured with only a min, not a max.
>>>>>
>>>>> The difference between the calculated max and the total allocation is
>>>>> available to be shared among those namespaces which don't have a maximum
>>>>> configured. Within a namespace, the available share is distributed
>>>>> equally across all pools.
>>>>>
>>>>> In the common case there is one namespace and one pool. A small number
>>>>> of threads are configured as a minimum and no maximum is set. In this
>>>>> case the effective maximum will be directly based on total memory.
>>>>> Approximately 8 per gigabyte.
>>>>>
>>>>
>>>>
>>>> Some of this may come across as bikeshedding, but I'd probably prefer
>>>> that this work a bit differently:
>>>>
>>>> 1/ I don't think we should enable this universally -- at least not
>>>> initially. What I'd prefer to see is a new pool_mode for the dynamic
>>>> threadpools (maybe call it "dynamic"). That gives us a clear opt-in
>>>> mechanism. Later once we're convinced it's safe, we can make "dynamic"
>>>> the default instead of "global".
>>>>
>>>> 2/ Rather than specifying a max_threads value separately, why not allow
>>>> the old threads/pool_threads interface to set the max and just have a
>>>> reasonable minimum setting (like the current default of 8). Since we're
>>>> growing the threadpool dynamically, I don't see why we need to have a
>>>> real configurable minimum.
>>>>
>>>> 3/ the dynamic pool-mode should probably be layered on top of the
>>>> pernode pool mode. IOW, in a NUMA configuration, we should split the
>>>> threads across NUMA nodes.
>>>
>>> Maybe we should start by discussing the goal. What do we want
>>> configuration to look like when we finish?
>>>
>>> I think we want it to be transparent. Sysadmin does nothing, and it all
>>> works perfectly. Or as close to that as we can get.
>>>
>>
>> That's a nice eventual goal, but what do we do if we make this change
>> and it's not behaving for them? We need some way for them to revert to
>> traditional behavior if the new mode isn't working well.
>
> As Steve pointed out (privately) there are likely to be cases
> where the dynamic thread count adjustment creates too many
> threads or somehow triggers a DoS. Admins want the ability to
> disable new features that cause trouble, and it is impossible
> for us to say truthfully that we have predicted every
> misbehavior.
>
> So +1 for having a mechanism for getting back the traditional
> behavior, at least until we have confidence it is not going
> to have troubling side-effects.
+1 on a configurable maximum as well, but I'll add a concern about
the NUMA node thing.
Not all CPU cores are created equal any more, there are "performance"
and "efficiency" (Atom) cores and there can be a big difference. Also
there are NUMA nodes with no CPUs at all, memory-only for example.
Then, CXL scrambles the topology again.
Let's not forget that these nfsd threads call into the filesystems,
which may desire very different NUMA affinities, for example the nfsd
protocol side may prefer to be near the network adapter, while the
filesystem side, the storage. And RDMA can bypass memory copy costs.
Thread count only addresses a fraction of these.
> Yes, in a perfect world, fully autonomous thread count
> adjustment would be amazing. Let's aim for that, but take
> baby steps to get there.
Amazing indeed, and just as unlikely to be perfect. Caution is good.
Tom.
* Re: [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
2024-07-16 18:49 ` Tom Talpey
@ 2024-07-17 15:24 ` Chuck Lever III
0 siblings, 0 replies; 37+ messages in thread
From: Chuck Lever III @ 2024-07-17 15:24 UTC (permalink / raw)
To: Tom Talpey
Cc: Jeff Layton, Neil Brown, Linux NFS Mailing List,
Olga Kornievskaia, Dai Ngo, Steve Dickson
> On Jul 16, 2024, at 2:49 PM, Tom Talpey <tom@talpey.com> wrote:
>
> On 7/16/2024 9:31 AM, Chuck Lever III wrote:
>>> On Jul 16, 2024, at 7:00 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>>
>>> On Tue, 2024-07-16 at 13:21 +1000, NeilBrown wrote:
>>>> On Tue, 16 Jul 2024, Jeff Layton wrote:
>>>>> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
>>>>>> A future patch will allow the number of threads in each nfsd pool to
>>>>>> vary dynamically.
>>>>>> The lower bound will be the number explicitly requested via
>>>>>> /proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads
>>>>>>
>>>>>> The upper bound can be set in each net-namespace by writing
>>>>>> /proc/fs/nfsd/max_threads. This upper bound applies across all pools,
>>>>>> there is no per-pool upper limit.
>>>>>>
>>>>>> If no upper bound is set, then one is calculated. A global upper limit
>>>>>> is chosen based on amount of memory. This limit only affects dynamic
>>>>>> changes. Static configuration can always over-ride it.
>>>>>>
>>>>>> We track how many threads are configured in each net namespace, with the
>>>>>> max or the min. We also track how many net namespaces have nfsd
>>>>>> configured with only a min, not a max.
>>>>>>
>>>>>> The difference between the calculated max and the total allocation is
>>>>>> available to be shared among those namespaces which don't have a maximum
>>>>>> configured. Within a namespace, the available share is distributed
>>>>>> equally across all pools.
>>>>>>
>>>>>> In the common case there is one namespace and one pool. A small number
>>>>>> of threads are configured as a minimum and no maximum is set. In this
>>>>>> case the effective maximum will be directly based on total memory.
>>>>>> Approximately 8 per gigabyte.
>>>>>>
>>>>>
>>>>>
>>>>> Some of this may come across as bikeshedding, but I'd probably prefer
>>>>> that this work a bit differently:
>>>>>
>>>>> 1/ I don't think we should enable this universally -- at least not
>>>>> initially. What I'd prefer to see is a new pool_mode for the dynamic
>>>>> threadpools (maybe call it "dynamic"). That gives us a clear opt-in
>>>>> mechanism. Later once we're convinced it's safe, we can make "dynamic"
>>>>> the default instead of "global".
>>>>>
>>>>> 2/ Rather than specifying a max_threads value separately, why not allow
>>>>> the old threads/pool_threads interface to set the max and just have a
>>>>> reasonable minimum setting (like the current default of 8). Since we're
>>>>> growing the threadpool dynamically, I don't see why we need to have a
>>>>> real configurable minimum.
>>>>>
>>>>> 3/ the dynamic pool-mode should probably be layered on top of the
>>>>> pernode pool mode. IOW, in a NUMA configuration, we should split the
>>>>> threads across NUMA nodes.
>>>>
>>>> Maybe we should start by discussing the goal. What do we want
>>>> configuration to look like when we finish?
>>>>
>>>> I think we want it to be transparent. Sysadmin does nothing, and it all
>>>> works perfectly. Or as close to that as we can get.
>>>>
>>>
>>> That's a nice eventual goal, but what do we do if we make this change
>>> and it's not behaving for them? We need some way for them to revert to
>>> traditional behavior if the new mode isn't working well.
>> As Steve pointed out (privately) there are likely to be cases
>> where the dynamic thread count adjustment creates too many
>> threads or somehow triggers a DoS. Admins want the ability to
>> disable new features that cause trouble, and it is impossible
>> for us to say truthfully that we have predicted every
>> misbehavior.
>> So +1 for having a mechanism for getting back the traditional
>> behavior, at least until we have confidence it is not going
>> to have troubling side-effects.
>
> +1 on a configurable maximum as well, but I'll add a concern about
> the NUMA node thing.
>
> Not all CPU cores are created equal any more, there are "performance"
> and "efficiency" (Atom) cores and there can be a big difference. Also
> there are NUMA nodes with no CPUs at all, memory-only for example.
> Then, CXL scrambles the topology again.
I think it wouldn't be difficult to make the svc_pool_map skip
creating svc thread pools on NUMA nodes with no CPUs. And perhaps
the min/max settings need to be per pool?
But the idea with dynamic thread pool sizing is that if a pool
(or node) is not getting NFS traffic, then its thread pool will
not grow.
> Let's not forget that these nfsd threads call into the filesystems,
> which may desire very different NUMA affinities, for example the nfsd
> protocol side may prefer to be near the network adapter, while the
> filesystem side, the storage. And RDMA can bypass memory copy costs.
Agreed, these issues still require administrator attention when
configuring a high performance system.
> Thread count only addresses a fraction of these.
>
>> Yes, in a perfect world, fully autonomous thread count
>> adjustment would be amazing. Let's aim for that, but take
>> baby steps to get there.
>
> Amazing indeed, and just as unlikely to be perfect. Caution is good.
>
> Tom.
--
Chuck Lever
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int.
2024-07-16 1:33 ` NeilBrown
@ 2024-07-24 19:36 ` Chuck Lever
0 siblings, 0 replies; 37+ messages in thread
From: Chuck Lever @ 2024-07-24 19:36 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Steve Dickson
On Tue, Jul 16, 2024 at 11:33:40AM +1000, NeilBrown wrote:
> On Tue, 16 Jul 2024, Jeff Layton wrote:
> > On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
> > > sp_nrthreads is only ever accessed under the service mutex
> > > nlmsvc_mutex nfs_callback_mutex nfsd_mutex
> > > so there is no need for it to be an atomic_t.
> > >
> > > The fact that all code using it is single-threaded means that we can
> > > simplify svc_pool_victim and remove the temporary elevation of
> > > sp_nrthreads.
> > >
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > > fs/nfsd/nfsctl.c | 2 +-
> > > fs/nfsd/nfssvc.c | 2 +-
> > > include/linux/sunrpc/svc.h | 4 ++--
> > > net/sunrpc/svc.c | 31 +++++++++++--------------------
> > > 4 files changed, 15 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index 5b0f2e0d7ccf..d85b6d1fa31f 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -1769,7 +1769,7 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
> > > struct svc_pool *sp = &nn->nfsd_serv->sv_pools[i];
> > >
> > > err = nla_put_u32(skb, NFSD_A_SERVER_THREADS,
> > > - atomic_read(&sp->sp_nrthreads));
> > > + sp->sp_nrthreads);
> > > if (err)
> > > goto err_unlock;
> > > }
> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index 4438cdcd4873..7377422a34df 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -641,7 +641,7 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net)
> > >
> > > if (serv)
> > > for (i = 0; i < serv->sv_nrpools && i < n; i++)
> > > - nthreads[i] = atomic_read(&serv->sv_pools[i].sp_nrthreads);
> > > + nthreads[i] = serv->sv_pools[i].sp_nrthreads;
> > > return 0;
> > > }
> > >
> > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > > index e4fa25fafa97..99e9345d829e 100644
> > > --- a/include/linux/sunrpc/svc.h
> > > +++ b/include/linux/sunrpc/svc.h
> > > @@ -33,9 +33,9 @@
> > > * node traffic on multi-node NUMA NFS servers.
> > > */
> > > struct svc_pool {
> > > - unsigned int sp_id; /* pool id; also node id on NUMA */
> > > + unsigned int sp_id; /* pool id; also node id on NUMA */
> > > struct lwq sp_xprts; /* pending transports */
> > > - atomic_t sp_nrthreads; /* # of threads in pool */
> > > + unsigned int sp_nrthreads; /* # of threads in pool */
> > > struct list_head sp_all_threads; /* all server threads */
> > > struct llist_head sp_idle_threads; /* idle server threads */
> > >
> > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > > index 072ad115ae3d..0d8588bc693c 100644
> > > --- a/net/sunrpc/svc.c
> > > +++ b/net/sunrpc/svc.c
> > > @@ -725,7 +725,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
> > > serv->sv_nrthreads += 1;
> > > spin_unlock_bh(&serv->sv_lock);
> > >
> > > - atomic_inc(&pool->sp_nrthreads);
> > > + pool->sp_nrthreads += 1;
> > >
> > > /* Protected by whatever lock the service uses when calling
> > > * svc_set_num_threads()
> > > @@ -780,31 +780,22 @@ svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
> > > struct svc_pool *pool;
> > > unsigned int i;
> > >
> > > -retry:
> > > pool = target_pool;
> > >
> > > - if (pool != NULL) {
> > > - if (atomic_inc_not_zero(&pool->sp_nrthreads))
> > > - goto found_pool;
> > > - return NULL;
> > > - } else {
> > > + if (!pool) {
> > > for (i = 0; i < serv->sv_nrpools; i++) {
> > > pool = &serv->sv_pools[--(*state) % serv->sv_nrpools];
> > > - if (atomic_inc_not_zero(&pool->sp_nrthreads))
> > > - goto found_pool;
> > > + if (pool->sp_nrthreads)
> > > + break;
> > > }
> > > - return NULL;
> > > }
> > >
> > > -found_pool:
> > > - set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > > - set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > > - if (!atomic_dec_and_test(&pool->sp_nrthreads))
> > > + if (pool && pool->sp_nrthreads) {
> > > + set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > > + set_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > > return pool;
> > > - /* Nothing left in this pool any more */
> > > - clear_bit(SP_NEED_VICTIM, &pool->sp_flags);
> > > - clear_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
> > > - goto retry;
> > > + }
> > > + return NULL;
> > > }
> > >
> > > static int
> > > @@ -883,7 +874,7 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > > if (!pool)
> > > nrservs -= serv->sv_nrthreads;
> > > else
> > > - nrservs -= atomic_read(&pool->sp_nrthreads);
> > > + nrservs -= pool->sp_nrthreads;
> > >
> > > if (nrservs > 0)
> > > return svc_start_kthreads(serv, pool, nrservs);
> > > @@ -953,7 +944,7 @@ svc_exit_thread(struct svc_rqst *rqstp)
> > >
> > > list_del_rcu(&rqstp->rq_all);
> > >
> > > - atomic_dec(&pool->sp_nrthreads);
> > > + pool->sp_nrthreads -= 1;
> > >
> > > spin_lock_bh(&serv->sv_lock);
> > > serv->sv_nrthreads -= 1;
> >
> > I don't think svc_exit_thread is called with the nfsd_mutex held, so if
> > several threads were exiting at the same time, they could race here.
>
> This is subtle and deserves explanation in the commit.
Hi Neil, assuming you mean "commit message" here, are you planning
to resend 5/14 with this update?
> svc_exit_thread() is called in a thread *after* svc_thread_should_stop()
> has returned true. That means RQ_VICTIM is set and most likely
> SP_NEED_VICTIM was set
>
> SP_NEED_VICTIM is set in svc_pool_victim() which is called from
> svc_stop_kthreads() which requires that the mutex is held.
> svc_stop_kthreads() waits for SP_VICTIM_REMAINS to be cleared which is
> the last thing that svc_exit_thread() does.
> So when svc_exit_thread() is called, the mutex is held by some other
> thread that is calling svc_set_num_threads().
>
> This is also why the list_del_rcu() in svc_exit_thread() is safe.
>
> The case where svc_exit_thread() is called but SP_NEED_VICTIM wasn't set
> (only RQ_VICTIM) is in the ETIMEDOUT case of nfsd(), in which case
> nfsd() ensures that the mutex is held.
>
> This was why
> [PATCH 07/14] Change unshare_fs_struct() to never fail.
> was needed. If that fails in the current code, svc_exit_thread() can be
> called without the mutex - which is already a theoretical problem for
> the list_del_rcu().
>
> Thanks,
> NeilBrown
--
Chuck Lever
* Re: [PATCH 00/14 RFC] support automatic changes to nfsd thread count
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
` (14 preceding siblings ...)
2024-07-15 17:29 ` [PATCH 00/14 RFC] support automatic changes to nfsd thread count Jeff Layton
@ 2024-07-24 19:43 ` Chuck Lever III
2024-07-24 21:25 ` NeilBrown
15 siblings, 1 reply; 37+ messages in thread
From: Chuck Lever III @ 2024-07-24 19:43 UTC (permalink / raw)
To: Neil Brown
Cc: Jeff Layton, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Steve Dickson
> On Jul 15, 2024, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
>
> This patch set (against nfsd-next) enables automatic adjustment of the
> number of nfsd threads. The number can increase under high load, and
> reduce after idle periods.
>
> The first few patches (1-6) are cleanups that may not be entirely
> relevant to the current series. They could safely land any time and
> only need minimal review.
I'm trying to get moving on this series. So, I've reviewed 1-6,
with one minor comment (posted previously). If you plan to
repost 5/14, let me know, or you can send me a set of edits for
its patch description and I can apply what's already been posted
to nfsd-next now.
I stopped at 7/14 because we should resolve whether to continue
adding NOFAIL in new code. My impression, from attending the
various LSF sessions on this topic, was that community consensus
is "NOFAIL is NO BUENO". If we feel the community discussion is
ongoing rather than concluded, then we'll have to sort this out
ourselves.
--
Chuck Lever
* Re: [PATCH 00/14 RFC] support automatic changes to nfsd thread count
2024-07-24 19:43 ` Chuck Lever III
@ 2024-07-24 21:25 ` NeilBrown
0 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-07-24 21:25 UTC (permalink / raw)
To: Chuck Lever III
Cc: Jeff Layton, Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Steve Dickson
On Thu, 25 Jul 2024, Chuck Lever III wrote:
>
>
> > On Jul 15, 2024, at 3:14 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > This patch set (against nfsd-next) enables automatic adjustment of the
> > number of nfsd threads. The number can increase under high load, and
> > reduce after idle periods.
> >
> > The first few patches (1-6) are cleanups that may not be entirely
> > relevant to the current series. They could safely land any time and
> > only need minimal review.
>
> I'm trying to get moving on this series. So, I've reviewed 1-6,
> with one minor comment (posted previously). If you plan to
> repost 5/14, let me know, or you can send me a set of edits for
> its patch description and I can apply what's already been posted
> to nfsd-next now.
I do have plans - for code comments as well as commit comments. I'll
try to send something on Friday.
>
> I stopped at 7/14 because we should resolve whether to continue
> adding NOFAIL in new code. My impression, from attending the
> various LSF sessions on this topic, was that community consensus
> is "NOFAIL is NO BUENO". If we feel the community discussion is
> ongoing rather than concluded, then we'll have to sort this out
> ourselves.
I will post that NOFAIL patch more broadly - including fs-devel and
linux-mm and see if any consensus emerges.
Thanks,
NeilBrown
>
>
> --
> Chuck Lever
>
>
>
* Re: [PATCH 04/14] nfsd: don't allocate the versions array.
2024-07-15 7:14 ` [PATCH 04/14] nfsd: don't allocate the versions array NeilBrown
@ 2024-08-02 21:34 ` Mike Snitzer
2024-08-02 23:04 ` NeilBrown
0 siblings, 1 reply; 37+ messages in thread
From: Mike Snitzer @ 2024-08-02 21:34 UTC (permalink / raw)
To: NeilBrown
Cc: Chuck Lever, Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Steve Dickson
On Mon, Jul 15, 2024 at 05:14:17PM +1000, NeilBrown wrote:
> Instead of using kmalloc to allocate an array for storing active version
> info, just declare an array at the max size - it is only 5 or so.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfs/nfs4state.c | 2 +
> fs/nfsd/cache.h | 2 +-
> fs/nfsd/netns.h | 6 +--
> fs/nfsd/nfsctl.c | 10 +++--
> fs/nfsd/nfsd.h | 9 +++-
> fs/nfsd/nfssvc.c | 100 ++++++++-------------------------------------
> 6 files changed, 36 insertions(+), 93 deletions(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5b452411e8fd..68c663626480 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1953,6 +1953,8 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
> if (lost_locks)
> pr_warn("NFS: %s: lost %d locks\n",
> clp->cl_hostname, lost_locks);
> + nfs4_free_state_owners(&freeme);
> +
> set_bit(ops->owner_flag_bit, &sp->so_flags);
> nfs4_put_state_owner(sp);
> status = nfs4_recovery_handle_error(clp, status);
Hey Neil,
This call to nfs4_free_state_owners() feels out of place given the
rest of this patch. Was it meant to be folded into a different
patch?
Thanks,
Mike
> diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> index 66a05fefae98..bb7addef4a31 100644
> --- a/fs/nfsd/cache.h
> +++ b/fs/nfsd/cache.h
> @@ -10,7 +10,7 @@
> #define NFSCACHE_H
>
> #include <linux/sunrpc/svc.h>
> -#include "netns.h"
> +#include "nfsd.h"
>
> /*
> * Representation of a reply cache entry.
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 14ec15656320..238fc4e56e53 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -152,8 +152,8 @@ struct nfsd_net {
> /*
> * Version information
> */
> - bool *nfsd_versions;
> - bool *nfsd4_minorversions;
> + bool nfsd_versions[NFSD_MAXVERS + 1];
> + bool nfsd4_minorversions[NFSD_SUPPORTED_MINOR_VERSION + 1];
>
> /*
> * Duplicate reply cache
> @@ -219,8 +219,6 @@ struct nfsd_net {
> #define nfsd_netns_ready(nn) ((nn)->sessionid_hashtbl)
>
> extern bool nfsd_support_version(int vers);
> -extern void nfsd_netns_free_versions(struct nfsd_net *nn);
> -
> extern unsigned int nfsd_net_id;
>
> void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 9b47723fc110..5b0f2e0d7ccf 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -2232,8 +2232,9 @@ int nfsd_nl_pool_mode_get_doit(struct sk_buff *skb, struct genl_info *info)
> */
> static __net_init int nfsd_net_init(struct net *net)
> {
> - int retval;
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> + int retval;
> + int i;
>
> retval = nfsd_export_init(net);
> if (retval)
> @@ -2247,8 +2248,10 @@ static __net_init int nfsd_net_init(struct net *net)
> goto out_repcache_error;
> memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
> nn->nfsd_svcstats.program = &nfsd_program;
> - nn->nfsd_versions = NULL;
> - nn->nfsd4_minorversions = NULL;
> + for (i = 0; i < sizeof(nn->nfsd_versions); i++)
> + nn->nfsd_versions[i] = nfsd_support_version(i);
> + for (i = 0; i < sizeof(nn->nfsd4_minorversions); i++)
> + nn->nfsd4_minorversions[i] = nfsd_support_version(4);
> nn->nfsd_info.mutex = &nfsd_mutex;
> nn->nfsd_serv = NULL;
> nfsd4_init_leases_net(nn);
> @@ -2279,7 +2282,6 @@ static __net_exit void nfsd_net_exit(struct net *net)
> percpu_counter_destroy_many(nn->counter, NFSD_STATS_COUNTERS_NUM);
> nfsd_idmap_shutdown(net);
> nfsd_export_shutdown(net);
> - nfsd_netns_free_versions(nn);
> }
>
> static struct pernet_operations nfsd_net_ops = {
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 39e109a7d56d..369c3b3ce53e 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -23,9 +23,7 @@
>
> #include <uapi/linux/nfsd/debug.h>
>
> -#include "netns.h"
> #include "export.h"
> -#include "stats.h"
>
> #undef ifdebug
> #ifdef CONFIG_SUNRPC_DEBUG
> @@ -37,7 +35,14 @@
> /*
> * nfsd version
> */
> +#define NFSD_MINVERS 2
> +#define NFSD_MAXVERS 4
> #define NFSD_SUPPORTED_MINOR_VERSION 2
> +bool nfsd_support_version(int vers);
> +
> +#include "netns.h"
> +#include "stats.h"
> +
> /*
> * Maximum blocksizes supported by daemon under various circumstances.
> */
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index f25b26bc5670..4438cdcd4873 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -116,15 +116,12 @@ static const struct svc_version *nfsd_version[] = {
> #endif
> };
>
> -#define NFSD_MINVERS 2
> -#define NFSD_NRVERS ARRAY_SIZE(nfsd_version)
> -
> struct svc_program nfsd_program = {
> #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> .pg_next = &nfsd_acl_program,
> #endif
> .pg_prog = NFS_PROGRAM, /* program number */
> - .pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
> + .pg_nvers = NFSD_MAXVERS+1, /* nr of entries in nfsd_version */
> .pg_vers = nfsd_version, /* version table */
> .pg_name = "nfsd", /* program name */
> .pg_class = "nfsd", /* authentication class */
> @@ -135,78 +132,24 @@ struct svc_program nfsd_program = {
>
> bool nfsd_support_version(int vers)
> {
> - if (vers >= NFSD_MINVERS && vers < NFSD_NRVERS)
> + if (vers >= NFSD_MINVERS && vers <= NFSD_MAXVERS)
> return nfsd_version[vers] != NULL;
> return false;
> }
>
> -static bool *
> -nfsd_alloc_versions(void)
> -{
> - bool *vers = kmalloc_array(NFSD_NRVERS, sizeof(bool), GFP_KERNEL);
> - unsigned i;
> -
> - if (vers) {
> - /* All compiled versions are enabled by default */
> - for (i = 0; i < NFSD_NRVERS; i++)
> - vers[i] = nfsd_support_version(i);
> - }
> - return vers;
> -}
> -
> -static bool *
> -nfsd_alloc_minorversions(void)
> -{
> - bool *vers = kmalloc_array(NFSD_SUPPORTED_MINOR_VERSION + 1,
> - sizeof(bool), GFP_KERNEL);
> - unsigned i;
> -
> - if (vers) {
> - /* All minor versions are enabled by default */
> - for (i = 0; i <= NFSD_SUPPORTED_MINOR_VERSION; i++)
> - vers[i] = nfsd_support_version(4);
> - }
> - return vers;
> -}
> -
> -void
> -nfsd_netns_free_versions(struct nfsd_net *nn)
> -{
> - kfree(nn->nfsd_versions);
> - kfree(nn->nfsd4_minorversions);
> - nn->nfsd_versions = NULL;
> - nn->nfsd4_minorversions = NULL;
> -}
> -
> -static void
> -nfsd_netns_init_versions(struct nfsd_net *nn)
> -{
> - if (!nn->nfsd_versions) {
> - nn->nfsd_versions = nfsd_alloc_versions();
> - nn->nfsd4_minorversions = nfsd_alloc_minorversions();
> - if (!nn->nfsd_versions || !nn->nfsd4_minorversions)
> - nfsd_netns_free_versions(nn);
> - }
> -}
> -
> int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change)
> {
> - if (vers < NFSD_MINVERS || vers >= NFSD_NRVERS)
> + if (vers < NFSD_MINVERS || vers > NFSD_MAXVERS)
> return 0;
> switch(change) {
> case NFSD_SET:
> - if (nn->nfsd_versions)
> - nn->nfsd_versions[vers] = nfsd_support_version(vers);
> + nn->nfsd_versions[vers] = nfsd_support_version(vers);
> break;
> case NFSD_CLEAR:
> - nfsd_netns_init_versions(nn);
> - if (nn->nfsd_versions)
> - nn->nfsd_versions[vers] = false;
> + nn->nfsd_versions[vers] = false;
> break;
> case NFSD_TEST:
> - if (nn->nfsd_versions)
> - return nn->nfsd_versions[vers];
> - fallthrough;
> + return nn->nfsd_versions[vers];
> case NFSD_AVAIL:
> return nfsd_support_version(vers);
> }
> @@ -233,23 +176,16 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
>
> switch(change) {
> case NFSD_SET:
> - if (nn->nfsd4_minorversions) {
> - nfsd_vers(nn, 4, NFSD_SET);
> - nn->nfsd4_minorversions[minorversion] =
> - nfsd_vers(nn, 4, NFSD_TEST);
> - }
> + nfsd_vers(nn, 4, NFSD_SET);
> + nn->nfsd4_minorversions[minorversion] =
> + nfsd_vers(nn, 4, NFSD_TEST);
> break;
> case NFSD_CLEAR:
> - nfsd_netns_init_versions(nn);
> - if (nn->nfsd4_minorversions) {
> - nn->nfsd4_minorversions[minorversion] = false;
> - nfsd_adjust_nfsd_versions4(nn);
> - }
> + nn->nfsd4_minorversions[minorversion] = false;
> + nfsd_adjust_nfsd_versions4(nn);
> break;
> case NFSD_TEST:
> - if (nn->nfsd4_minorversions)
> - return nn->nfsd4_minorversions[minorversion];
> - return nfsd_vers(nn, 4, NFSD_TEST);
> + return nn->nfsd4_minorversions[minorversion];
> case NFSD_AVAIL:
> return minorversion <= NFSD_SUPPORTED_MINOR_VERSION &&
> nfsd_vers(nn, 4, NFSD_AVAIL);
> @@ -568,11 +504,11 @@ void nfsd_reset_versions(struct nfsd_net *nn)
> {
> int i;
>
> - for (i = 0; i < NFSD_NRVERS; i++)
> + for (i = 0; i <= NFSD_MAXVERS; i++)
> if (nfsd_vers(nn, i, NFSD_TEST))
> return;
>
> - for (i = 0; i < NFSD_NRVERS; i++)
> + for (i = 0; i <= NFSD_MAXVERS; i++)
> if (i != 4)
> nfsd_vers(nn, i, NFSD_SET);
> else {
> @@ -905,17 +841,17 @@ nfsd_init_request(struct svc_rqst *rqstp,
> if (likely(nfsd_vers(nn, rqstp->rq_vers, NFSD_TEST)))
> return svc_generic_init_request(rqstp, progp, ret);
>
> - ret->mismatch.lovers = NFSD_NRVERS;
> - for (i = NFSD_MINVERS; i < NFSD_NRVERS; i++) {
> + ret->mismatch.lovers = NFSD_MAXVERS + 1;
> + for (i = NFSD_MINVERS; i <= NFSD_MAXVERS; i++) {
> if (nfsd_vers(nn, i, NFSD_TEST)) {
> ret->mismatch.lovers = i;
> break;
> }
> }
> - if (ret->mismatch.lovers == NFSD_NRVERS)
> + if (ret->mismatch.lovers > NFSD_MAXVERS)
> return rpc_prog_unavail;
> ret->mismatch.hivers = NFSD_MINVERS;
> - for (i = NFSD_NRVERS - 1; i >= NFSD_MINVERS; i--) {
> + for (i = NFSD_MAXVERS; i >= NFSD_MINVERS; i--) {
> if (nfsd_vers(nn, i, NFSD_TEST)) {
> ret->mismatch.hivers = i;
> break;
> --
> 2.44.0
>
>
* Re: [PATCH 04/14] nfsd: don't allocate the versions array.
2024-08-02 21:34 ` Mike Snitzer
@ 2024-08-02 23:04 ` NeilBrown
2024-08-05 4:55 ` NeilBrown
0 siblings, 1 reply; 37+ messages in thread
From: NeilBrown @ 2024-08-02 23:04 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever, Jeff Layton, linux-nfs, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Steve Dickson
On Sat, 03 Aug 2024, Mike Snitzer wrote:
> On Mon, Jul 15, 2024 at 05:14:17PM +1000, NeilBrown wrote:
> > Instead of using kmalloc to allocate an array for storing active version
> > info, just declare an array at the max size - it is only 5 or so.
> >
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > fs/nfs/nfs4state.c | 2 +
> > fs/nfsd/cache.h | 2 +-
> > fs/nfsd/netns.h | 6 +--
> > fs/nfsd/nfsctl.c | 10 +++--
> > fs/nfsd/nfsd.h | 9 +++-
> > fs/nfsd/nfssvc.c | 100 ++++++++-------------------------------------
> > 6 files changed, 36 insertions(+), 93 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 5b452411e8fd..68c663626480 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -1953,6 +1953,8 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
> > if (lost_locks)
> > pr_warn("NFS: %s: lost %d locks\n",
> > clp->cl_hostname, lost_locks);
> > + nfs4_free_state_owners(&freeme);
> > +
> > set_bit(ops->owner_flag_bit, &sp->so_flags);
> > nfs4_put_state_owner(sp);
> > status = nfs4_recovery_handle_error(clp, status);
>
> Hey Neil,
>
> This call to nfs4_free_state_owners() feels out of place given the
> rest of this patch. Was it meant to be folded into a different
> patch?
I think I was writing a different patch (there is a case where the
lost-locks warning can be incorrect) and I got half way and stopped for
some reason. Then later I wanted to do this patch and did "git stash"
but I hadn't saved all my work. So when I later did a "save all
buffers" that change went into the wrong patch.
I'll follow up on Monday - thanks for noticing and letting me know!
NeilBrown
>
> Thanks,
> Mike
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 04/14] nfsd: don't allocate the versions array.
2024-08-02 23:04 ` NeilBrown
@ 2024-08-05 4:55 ` NeilBrown
0 siblings, 0 replies; 37+ messages in thread
From: NeilBrown @ 2024-08-05 4:55 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton, Mike Snitzer
Cc: linux-nfs, Olga Kornievskaia, Dai Ngo, Tom Talpey, Steve Dickson
On Sat, 03 Aug 2024, NeilBrown wrote:
> On Sat, 03 Aug 2024, Mike Snitzer wrote:
> > On Mon, Jul 15, 2024 at 05:14:17PM +1000, NeilBrown wrote:
> > > Instead of using kmalloc to allocate an array for storing active version
> > > info, just declare an array of the max size - it is only 5 or so.
> > >
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > > fs/nfs/nfs4state.c | 2 +
> > > fs/nfsd/cache.h | 2 +-
> > > fs/nfsd/netns.h | 6 +--
> > > fs/nfsd/nfsctl.c | 10 +++--
> > > fs/nfsd/nfsd.h | 9 +++-
> > > fs/nfsd/nfssvc.c | 100 ++++++++-------------------------------------
> > > 6 files changed, 36 insertions(+), 93 deletions(-)
> > >
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 5b452411e8fd..68c663626480 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -1953,6 +1953,8 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
> > > if (lost_locks)
> > > pr_warn("NFS: %s: lost %d locks\n",
> > > clp->cl_hostname, lost_locks);
> > > + nfs4_free_state_owners(&freeme);
> > > +
> > > set_bit(ops->owner_flag_bit, &sp->so_flags);
> > > nfs4_put_state_owner(sp);
> > > status = nfs4_recovery_handle_error(clp, status);
> >
> > Hey Neil,
> >
> > This call to nfs4_free_state_owners() feels out of place given the
> > rest of this patch. Was it meant to be folded into a different
> > patch?
>
> I think I was writing a different patch (there is a case where the
> lost-locks warning can be incorrect) and I got halfway and stopped for
> some reason. Then later I wanted to do this patch and did "git stash"
> but I hadn't saved all my work. So when I later did a "save all
> buffers" that change went into the wrong patch.
>
> I'll follow up on Monday - thanks for noticing and letting me know!
Chuck / Jeff - could you please just remove the change to fs/nfs/ from
this patch - that is probably easier than me sending an incremental
patch or a revised version.
Thanks,
NeilBrown
>
> NeilBrown
>
>
> >
> > Thanks,
> > Mike
> >
> >
> > > diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
> > > index 66a05fefae98..bb7addef4a31 100644
> > > --- a/fs/nfsd/cache.h
> > > +++ b/fs/nfsd/cache.h
> > > @@ -10,7 +10,7 @@
> > > #define NFSCACHE_H
> > >
> > > #include <linux/sunrpc/svc.h>
> > > -#include "netns.h"
> > > +#include "nfsd.h"
> > >
> > > /*
> > > * Representation of a reply cache entry.
> > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > > index 14ec15656320..238fc4e56e53 100644
> > > --- a/fs/nfsd/netns.h
> > > +++ b/fs/nfsd/netns.h
> > > @@ -152,8 +152,8 @@ struct nfsd_net {
> > > /*
> > > * Version information
> > > */
> > > - bool *nfsd_versions;
> > > - bool *nfsd4_minorversions;
> > > + bool nfsd_versions[NFSD_MAXVERS + 1];
> > > + bool nfsd4_minorversions[NFSD_SUPPORTED_MINOR_VERSION + 1];
> > >
> > > /*
> > > * Duplicate reply cache
> > > @@ -219,8 +219,6 @@ struct nfsd_net {
> > > #define nfsd_netns_ready(nn) ((nn)->sessionid_hashtbl)
> > >
> > > extern bool nfsd_support_version(int vers);
> > > -extern void nfsd_netns_free_versions(struct nfsd_net *nn);
> > > -
> > > extern unsigned int nfsd_net_id;
> > >
> > > void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index 9b47723fc110..5b0f2e0d7ccf 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -2232,8 +2232,9 @@ int nfsd_nl_pool_mode_get_doit(struct sk_buff *skb, struct genl_info *info)
> > > */
> > > static __net_init int nfsd_net_init(struct net *net)
> > > {
> > > - int retval;
> > > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > + int retval;
> > > + int i;
> > >
> > > retval = nfsd_export_init(net);
> > > if (retval)
> > > @@ -2247,8 +2248,10 @@ static __net_init int nfsd_net_init(struct net *net)
> > > goto out_repcache_error;
> > > memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
> > > nn->nfsd_svcstats.program = &nfsd_program;
> > > - nn->nfsd_versions = NULL;
> > > - nn->nfsd4_minorversions = NULL;
> > > + for (i = 0; i < sizeof(nn->nfsd_versions); i++)
> > > + nn->nfsd_versions[i] = nfsd_support_version(i);
> > > + for (i = 0; i < sizeof(nn->nfsd4_minorversions); i++)
> > > + nn->nfsd4_minorversions[i] = nfsd_support_version(4);
> > > nn->nfsd_info.mutex = &nfsd_mutex;
> > > nn->nfsd_serv = NULL;
> > > nfsd4_init_leases_net(nn);
> > > @@ -2279,7 +2282,6 @@ static __net_exit void nfsd_net_exit(struct net *net)
> > > percpu_counter_destroy_many(nn->counter, NFSD_STATS_COUNTERS_NUM);
> > > nfsd_idmap_shutdown(net);
> > > nfsd_export_shutdown(net);
> > > - nfsd_netns_free_versions(nn);
> > > }
> > >
> > > static struct pernet_operations nfsd_net_ops = {
> > > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > > index 39e109a7d56d..369c3b3ce53e 100644
> > > --- a/fs/nfsd/nfsd.h
> > > +++ b/fs/nfsd/nfsd.h
> > > @@ -23,9 +23,7 @@
> > >
> > > #include <uapi/linux/nfsd/debug.h>
> > >
> > > -#include "netns.h"
> > > #include "export.h"
> > > -#include "stats.h"
> > >
> > > #undef ifdebug
> > > #ifdef CONFIG_SUNRPC_DEBUG
> > > @@ -37,7 +35,14 @@
> > > /*
> > > * nfsd version
> > > */
> > > +#define NFSD_MINVERS 2
> > > +#define NFSD_MAXVERS 4
> > > #define NFSD_SUPPORTED_MINOR_VERSION 2
> > > +bool nfsd_support_version(int vers);
> > > +
> > > +#include "netns.h"
> > > +#include "stats.h"
> > > +
> > > /*
> > > * Maximum blocksizes supported by daemon under various circumstances.
> > > */
> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index f25b26bc5670..4438cdcd4873 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -116,15 +116,12 @@ static const struct svc_version *nfsd_version[] = {
> > > #endif
> > > };
> > >
> > > -#define NFSD_MINVERS 2
> > > -#define NFSD_NRVERS ARRAY_SIZE(nfsd_version)
> > > -
> > > struct svc_program nfsd_program = {
> > > #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > > .pg_next = &nfsd_acl_program,
> > > #endif
> > > .pg_prog = NFS_PROGRAM, /* program number */
> > > - .pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
> > > + .pg_nvers = NFSD_MAXVERS+1, /* nr of entries in nfsd_version */
> > > .pg_vers = nfsd_version, /* version table */
> > > .pg_name = "nfsd", /* program name */
> > > .pg_class = "nfsd", /* authentication class */
> > > @@ -135,78 +132,24 @@ struct svc_program nfsd_program = {
> > >
> > > bool nfsd_support_version(int vers)
> > > {
> > > - if (vers >= NFSD_MINVERS && vers < NFSD_NRVERS)
> > > + if (vers >= NFSD_MINVERS && vers <= NFSD_MAXVERS)
> > > return nfsd_version[vers] != NULL;
> > > return false;
> > > }
> > >
> > > -static bool *
> > > -nfsd_alloc_versions(void)
> > > -{
> > > - bool *vers = kmalloc_array(NFSD_NRVERS, sizeof(bool), GFP_KERNEL);
> > > - unsigned i;
> > > -
> > > - if (vers) {
> > > - /* All compiled versions are enabled by default */
> > > - for (i = 0; i < NFSD_NRVERS; i++)
> > > - vers[i] = nfsd_support_version(i);
> > > - }
> > > - return vers;
> > > -}
> > > -
> > > -static bool *
> > > -nfsd_alloc_minorversions(void)
> > > -{
> > > - bool *vers = kmalloc_array(NFSD_SUPPORTED_MINOR_VERSION + 1,
> > > - sizeof(bool), GFP_KERNEL);
> > > - unsigned i;
> > > -
> > > - if (vers) {
> > > - /* All minor versions are enabled by default */
> > > - for (i = 0; i <= NFSD_SUPPORTED_MINOR_VERSION; i++)
> > > - vers[i] = nfsd_support_version(4);
> > > - }
> > > - return vers;
> > > -}
> > > -
> > > -void
> > > -nfsd_netns_free_versions(struct nfsd_net *nn)
> > > -{
> > > - kfree(nn->nfsd_versions);
> > > - kfree(nn->nfsd4_minorversions);
> > > - nn->nfsd_versions = NULL;
> > > - nn->nfsd4_minorversions = NULL;
> > > -}
> > > -
> > > -static void
> > > -nfsd_netns_init_versions(struct nfsd_net *nn)
> > > -{
> > > - if (!nn->nfsd_versions) {
> > > - nn->nfsd_versions = nfsd_alloc_versions();
> > > - nn->nfsd4_minorversions = nfsd_alloc_minorversions();
> > > - if (!nn->nfsd_versions || !nn->nfsd4_minorversions)
> > > - nfsd_netns_free_versions(nn);
> > > - }
> > > -}
> > > -
> > > int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change)
> > > {
> > > - if (vers < NFSD_MINVERS || vers >= NFSD_NRVERS)
> > > + if (vers < NFSD_MINVERS || vers > NFSD_MAXVERS)
> > > return 0;
> > > switch(change) {
> > > case NFSD_SET:
> > > - if (nn->nfsd_versions)
> > > - nn->nfsd_versions[vers] = nfsd_support_version(vers);
> > > + nn->nfsd_versions[vers] = nfsd_support_version(vers);
> > > break;
> > > case NFSD_CLEAR:
> > > - nfsd_netns_init_versions(nn);
> > > - if (nn->nfsd_versions)
> > > - nn->nfsd_versions[vers] = false;
> > > + nn->nfsd_versions[vers] = false;
> > > break;
> > > case NFSD_TEST:
> > > - if (nn->nfsd_versions)
> > > - return nn->nfsd_versions[vers];
> > > - fallthrough;
> > > + return nn->nfsd_versions[vers];
> > > case NFSD_AVAIL:
> > > return nfsd_support_version(vers);
> > > }
> > > @@ -233,23 +176,16 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
> > >
> > > switch(change) {
> > > case NFSD_SET:
> > > - if (nn->nfsd4_minorversions) {
> > > - nfsd_vers(nn, 4, NFSD_SET);
> > > - nn->nfsd4_minorversions[minorversion] =
> > > - nfsd_vers(nn, 4, NFSD_TEST);
> > > - }
> > > + nfsd_vers(nn, 4, NFSD_SET);
> > > + nn->nfsd4_minorversions[minorversion] =
> > > + nfsd_vers(nn, 4, NFSD_TEST);
> > > break;
> > > case NFSD_CLEAR:
> > > - nfsd_netns_init_versions(nn);
> > > - if (nn->nfsd4_minorversions) {
> > > - nn->nfsd4_minorversions[minorversion] = false;
> > > - nfsd_adjust_nfsd_versions4(nn);
> > > - }
> > > + nn->nfsd4_minorversions[minorversion] = false;
> > > + nfsd_adjust_nfsd_versions4(nn);
> > > break;
> > > case NFSD_TEST:
> > > - if (nn->nfsd4_minorversions)
> > > - return nn->nfsd4_minorversions[minorversion];
> > > - return nfsd_vers(nn, 4, NFSD_TEST);
> > > + return nn->nfsd4_minorversions[minorversion];
> > > case NFSD_AVAIL:
> > > return minorversion <= NFSD_SUPPORTED_MINOR_VERSION &&
> > > nfsd_vers(nn, 4, NFSD_AVAIL);
> > > @@ -568,11 +504,11 @@ void nfsd_reset_versions(struct nfsd_net *nn)
> > > {
> > > int i;
> > >
> > > - for (i = 0; i < NFSD_NRVERS; i++)
> > > + for (i = 0; i <= NFSD_MAXVERS; i++)
> > > if (nfsd_vers(nn, i, NFSD_TEST))
> > > return;
> > >
> > > - for (i = 0; i < NFSD_NRVERS; i++)
> > > + for (i = 0; i <= NFSD_MAXVERS; i++)
> > > if (i != 4)
> > > nfsd_vers(nn, i, NFSD_SET);
> > > else {
> > > @@ -905,17 +841,17 @@ nfsd_init_request(struct svc_rqst *rqstp,
> > > if (likely(nfsd_vers(nn, rqstp->rq_vers, NFSD_TEST)))
> > > return svc_generic_init_request(rqstp, progp, ret);
> > >
> > > - ret->mismatch.lovers = NFSD_NRVERS;
> > > - for (i = NFSD_MINVERS; i < NFSD_NRVERS; i++) {
> > > + ret->mismatch.lovers = NFSD_MAXVERS + 1;
> > > + for (i = NFSD_MINVERS; i <= NFSD_MAXVERS; i++) {
> > > if (nfsd_vers(nn, i, NFSD_TEST)) {
> > > ret->mismatch.lovers = i;
> > > break;
> > > }
> > > }
> > > - if (ret->mismatch.lovers == NFSD_NRVERS)
> > > + if (ret->mismatch.lovers > NFSD_MAXVERS)
> > > return rpc_prog_unavail;
> > > ret->mismatch.hivers = NFSD_MINVERS;
> > > - for (i = NFSD_NRVERS - 1; i >= NFSD_MINVERS; i--) {
> > > + for (i = NFSD_MAXVERS; i >= NFSD_MINVERS; i--) {
> > > if (nfsd_vers(nn, i, NFSD_TEST)) {
> > > ret->mismatch.hivers = i;
> > > break;
> > > --
> > > 2.44.0
> > >
> > >
> >
>
>
>
end of thread
Thread overview: 37+ messages
2024-07-15 7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
2024-07-15 7:14 ` [PATCH 01/14] lockd: discard nlmsvc_timeout NeilBrown
2024-07-15 7:14 ` [PATCH 02/14] SUNRPC: make various functions static, or not exported NeilBrown
2024-07-15 7:14 ` [PATCH 03/14] nfsd: move nfsd_pool_stats_open into nfsctl.c NeilBrown
2024-07-15 7:14 ` [PATCH 04/14] nfsd: don't allocate the versions array NeilBrown
2024-08-02 21:34 ` Mike Snitzer
2024-08-02 23:04 ` NeilBrown
2024-08-05 4:55 ` NeilBrown
2024-07-15 7:14 ` [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int NeilBrown
2024-07-15 14:12 ` Jeff Layton
2024-07-15 14:33 ` Jeff Layton
2024-07-16 1:33 ` NeilBrown
2024-07-24 19:36 ` Chuck Lever
2024-07-15 7:14 ` [PATCH 06/14] sunrpc: don't take ->sv_lock when updating ->sv_nrthreads NeilBrown
2024-07-15 7:14 ` [PATCH 07/14] Change unshare_fs_struct() to never fail NeilBrown
2024-07-15 14:39 ` Jeff Layton
2024-07-16 1:48 ` NeilBrown
2024-07-15 7:14 ` [PATCH 08/14] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
2024-07-15 7:14 ` [PATCH 09/14] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
2024-07-15 15:21 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 10/14] nfs: dynamically adjust per-client DRC slot limits NeilBrown
2024-07-15 7:14 ` [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
2024-07-15 15:52 ` Jeff Layton
2024-07-16 2:04 ` NeilBrown
2024-07-15 7:14 ` [PATCH 12/14] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
2024-07-15 16:00 ` Jeff Layton
2024-07-15 7:14 ` [PATCH 13/14] nfsd: introduce concept of a maximum number of threads NeilBrown
2024-07-15 17:06 ` Jeff Layton
2024-07-16 3:21 ` NeilBrown
2024-07-16 11:00 ` Jeff Layton
2024-07-16 13:31 ` Chuck Lever III
2024-07-16 18:49 ` Tom Talpey
2024-07-17 15:24 ` Chuck Lever III
2024-07-15 7:14 ` [PATCH 14/14] nfsd: adjust number of running nfsd threads NeilBrown
2024-07-15 17:29 ` [PATCH 00/14 RFC] support automatic changes to nfsd thread count Jeff Layton
2024-07-24 19:43 ` Chuck Lever III
2024-07-24 21:25 ` NeilBrown