Linux NFS development
* [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns
@ 2024-01-25 19:53 Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 01/13] sunrpc: don't change ->sv_stats if it doesn't exist Josef Bacik
                   ` (13 more replies)
  0 siblings, 14 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

v1: https://lore.kernel.org/linux-nfs/cover.1706124811.git.josef@toxicpanda.com/

v1->v2:
- rework the sunrpc service creation to take a pointer to the sv_stats.
- dropped ->pg_stats from the svc_program.
- converted all of the nfsd global stats to per-network namespace.
- added the ability to point at a specific rpc_stat for rpc program creation.
- converted the rpc stats for nfs to per-network namespace.

-- Original email --
Hello,

We're currently deploying NFS internally and have run into some oddities with
our usage of containers.  All of the services that mount and export NFS volumes
run inside of containers that unshare all namespaces, including the network
namespace.  Our monitoring is done on a per-container basis, so we need access
to the nfs and nfsd stats that are under /proc/net/sunrpc.  However, these are
only tied to the init_net, which makes them invisible to containers in a
different network namespace.

Fix this by tying these files to the network namespace.  This allows us
to avoid the hack of bind mounting the host's /proc into the container in
order to do proper monitoring.  Thanks,

Josef

Josef Bacik (13):
  sunrpc: don't change ->sv_stats if it doesn't exist
  nfs: stop setting ->pg_stats for unused stats
  sunrpc: pass in the sv_stats struct through svc_create*
  sunrpc: remove ->pg_stats from svc_program
  sunrpc: add a struct rpc_stats arg to rpc_create_args
  sunrpc: use the struct net as the svc proc private
  nfsd: rename NFSD_NET_* to NFSD_STATS_*
  nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
  nfsd: make all of the nfsd stats per-network namespace
  nfsd: move th_cnt into nfsd_net
  nfsd: make svc_stat per-network namespace instead of global
  nfs: expose /proc/net/sunrpc/nfs in net namespaces
  nfs: make the rpc_stat per net namespace

 fs/lockd/svc.c              |  5 +--
 fs/nfs/callback.c           |  5 +--
 fs/nfs/client.c             |  5 ++-
 fs/nfs/inode.c              |  8 ++---
 fs/nfs/internal.h           |  2 --
 fs/nfs/netns.h              |  2 ++
 fs/nfsd/cache.h             |  2 --
 fs/nfsd/netns.h             | 28 ++++++++++++---
 fs/nfsd/nfs4proc.c          |  6 ++--
 fs/nfsd/nfs4state.c         |  3 +-
 fs/nfsd/nfscache.c          | 40 +++++----------------
 fs/nfsd/nfsctl.c            | 16 ++++-----
 fs/nfsd/nfsfh.c             |  3 +-
 fs/nfsd/nfssvc.c            | 13 +++----
 fs/nfsd/stats.c             | 53 ++++++++++++----------------
 fs/nfsd/stats.h             | 70 +++++++++++++------------------------
 fs/nfsd/vfs.c               |  5 +--
 include/linux/sunrpc/clnt.h |  1 +
 include/linux/sunrpc/svc.h  |  9 +++--
 net/sunrpc/clnt.c           |  2 +-
 net/sunrpc/stats.c          |  2 +-
 net/sunrpc/svc.c            | 44 ++++++++++++++---------
 22 files changed, 147 insertions(+), 177 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 01/13] sunrpc: don't change ->sv_stats if it doesn't exist
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 02/13] nfs: stop setting ->pg_stats for unused stats Josef Bacik
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

We check for the existence of ->sv_stats elsewhere except in the core
processing code.  It appears that only nfsd actually exports these values
anywhere; everybody else just has a write-only copy of sv_stats in their
svc_program.  Add a check for ->sv_stats before every adjustment to
allow us to eliminate the stats struct from all the users who don't
report the stats.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 net/sunrpc/svc.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index f60c93e5a25d..d2e6f3d59218 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1375,7 +1375,8 @@ svc_process_common(struct svc_rqst *rqstp)
 		goto err_bad_proc;
 
 	/* Syntactic check complete */
-	serv->sv_stats->rpccnt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpccnt++;
 	trace_svc_process(rqstp, progp->pg_name);
 
 	aoffset = xdr_stream_pos(xdr);
@@ -1427,7 +1428,8 @@ svc_process_common(struct svc_rqst *rqstp)
 	goto close_xprt;
 
 err_bad_rpc:
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	xdr_stream_encode_u32(xdr, RPC_MSG_DENIED);
 	xdr_stream_encode_u32(xdr, RPC_MISMATCH);
 	/* Only RPCv2 supported */
@@ -1438,7 +1440,8 @@ svc_process_common(struct svc_rqst *rqstp)
 err_bad_auth:
 	dprintk("svc: authentication failed (%d)\n",
 		be32_to_cpu(rqstp->rq_auth_stat));
-	serv->sv_stats->rpcbadauth++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadauth++;
 	/* Restore write pointer to location of reply status: */
 	xdr_truncate_encode(xdr, XDR_UNIT * 2);
 	xdr_stream_encode_u32(xdr, RPC_MSG_DENIED);
@@ -1448,7 +1451,8 @@ svc_process_common(struct svc_rqst *rqstp)
 
 err_bad_prog:
 	dprintk("svc: unknown program %d\n", rqstp->rq_prog);
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	*rqstp->rq_accept_statp = rpc_prog_unavail;
 	goto sendit;
 
@@ -1456,7 +1460,8 @@ svc_process_common(struct svc_rqst *rqstp)
 	svc_printk(rqstp, "unknown version (%d for prog %d, %s)\n",
 		       rqstp->rq_vers, rqstp->rq_prog, progp->pg_name);
 
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	*rqstp->rq_accept_statp = rpc_prog_mismatch;
 
 	/*
@@ -1470,19 +1475,22 @@ svc_process_common(struct svc_rqst *rqstp)
 err_bad_proc:
 	svc_printk(rqstp, "unknown procedure (%d)\n", rqstp->rq_proc);
 
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	*rqstp->rq_accept_statp = rpc_proc_unavail;
 	goto sendit;
 
 err_garbage_args:
 	svc_printk(rqstp, "failed to decode RPC header\n");
 
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	*rqstp->rq_accept_statp = rpc_garbage_args;
 	goto sendit;
 
 err_system_err:
-	serv->sv_stats->rpcbadfmt++;
+	if (serv->sv_stats)
+		serv->sv_stats->rpcbadfmt++;
 	*rqstp->rq_accept_statp = rpc_system_err;
 	goto sendit;
 }
@@ -1534,7 +1542,8 @@ void svc_process(struct svc_rqst *rqstp)
 out_baddir:
 	svc_printk(rqstp, "bad direction 0x%08x, dropping request\n",
 		   be32_to_cpu(*p));
-	rqstp->rq_server->sv_stats->rpcbadfmt++;
+	if (rqstp->rq_server->sv_stats)
+		rqstp->rq_server->sv_stats->rpcbadfmt++;
 out_drop:
 	svc_drop(rqstp);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 02/13] nfs: stop setting ->pg_stats for unused stats
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 01/13] sunrpc: don't change ->sv_stats if it doesn't exist Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create* Josef Bacik
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

A lot of places set a blank svc_stat in ->pg_stats and never utilize
those stats.  Remove all of these extra structs, as we're not reporting
these stats anywhere.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/lockd/svc.c    | 3 ---
 fs/nfs/callback.c | 3 ---
 fs/nfsd/nfssvc.c  | 5 -----
 3 files changed, 11 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index ce5862482097..ab8042a5b895 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -710,8 +710,6 @@ static const struct svc_version *nlmsvc_version[] = {
 #endif
 };
 
-static struct svc_stat		nlmsvc_stats;
-
 #define NLM_NRVERS	ARRAY_SIZE(nlmsvc_version)
 static struct svc_program	nlmsvc_program = {
 	.pg_prog		= NLM_PROGRAM,		/* program number */
@@ -719,7 +717,6 @@ static struct svc_program	nlmsvc_program = {
 	.pg_vers		= nlmsvc_version,	/* version table */
 	.pg_name		= "lockd",		/* service name */
 	.pg_class		= "nfsd",		/* share authentication with nfsd */
-	.pg_stats		= &nlmsvc_stats,	/* stats table */
 	.pg_authenticate	= &lockd_authenticate,	/* export authentication */
 	.pg_init_request	= svc_generic_init_request,
 	.pg_rpcbind_set		= svc_generic_rpcbind_set,
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 760d27dd7225..8adfcd4c8c1a 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -356,15 +356,12 @@ static const struct svc_version *nfs4_callback_version[] = {
 	[4] = &nfs4_callback_version4,
 };
 
-static struct svc_stat nfs4_callback_stats;
-
 static struct svc_program nfs4_callback_program = {
 	.pg_prog = NFS4_CALLBACK,			/* RPC service number */
 	.pg_nvers = ARRAY_SIZE(nfs4_callback_version),	/* Number of entries */
 	.pg_vers = nfs4_callback_version,		/* version table */
 	.pg_name = "NFSv4 callback",			/* service name */
 	.pg_class = "nfs",				/* authentication class */
-	.pg_stats = &nfs4_callback_stats,
 	.pg_authenticate = nfs_callback_authenticate,
 	.pg_init_request = svc_generic_init_request,
 	.pg_rpcbind_set	= svc_generic_rpcbind_set,
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 9a894c3511ba..a0b117107e86 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -80,7 +80,6 @@ unsigned long	nfsd_drc_max_mem;
 unsigned long	nfsd_drc_mem_used;
 
 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
-static struct svc_stat	nfsd_acl_svcstats;
 static const struct svc_version *nfsd_acl_version[] = {
 # if defined(CONFIG_NFSD_V2_ACL)
 	[2] = &nfsd_acl_version2,
@@ -99,15 +98,11 @@ static struct svc_program	nfsd_acl_program = {
 	.pg_vers		= nfsd_acl_version,
 	.pg_name		= "nfsacl",
 	.pg_class		= "nfsd",
-	.pg_stats		= &nfsd_acl_svcstats,
 	.pg_authenticate	= &svc_set_client,
 	.pg_init_request	= nfsd_acl_init_request,
 	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
 };
 
-static struct svc_stat	nfsd_acl_svcstats = {
-	.program	= &nfsd_acl_program,
-};
 #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
 
 static const struct svc_version *nfsd_version[] = {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create*
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 01/13] sunrpc: don't change ->sv_stats if it doesn't exist Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 02/13] nfs: stop setting ->pg_stats for unused stats Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 20:56   ` Chuck Lever
  2024-01-25 19:53 ` [PATCH v2 04/13] sunrpc: remove ->pg_stats from svc_program Josef Bacik
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

Since only one service actually reports the rpc stats, there's not much
reason to have a pointer to it in the svc_program struct.  Adjust the
svc_create* functions to take the sv_stats as an argument and pass the
struct through there as desired instead of getting it from
svc_program->pg_stats.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/lockd/svc.c             |  2 +-
 fs/nfs/callback.c          |  2 +-
 fs/nfsd/nfssvc.c           |  3 ++-
 include/linux/sunrpc/svc.h |  8 ++++----
 net/sunrpc/svc.c           | 17 ++++++++++-------
 5 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index ab8042a5b895..8fbbfc9aad69 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -337,7 +337,7 @@ static int lockd_get(void)
 		nlm_timeout = LOCKD_DFLT_TIMEO;
 	nlmsvc_timeout = nlm_timeout * HZ;
 
-	serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, lockd);
+	serv = svc_create(&nlmsvc_program, NULL, LOCKD_BUFSIZE, lockd);
 	if (!serv) {
 		printk(KERN_WARNING "lockd_up: create service failed\n");
 		return -ENOMEM;
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 8adfcd4c8c1a..4d56b4e73525 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -202,7 +202,7 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
 	if (minorversion)
 		return ERR_PTR(-ENOTSUPP);
 #endif
-	serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE,
+	serv = svc_create(&nfs4_callback_program, NULL, NFS4_CALLBACK_BUFSIZE,
 			  threadfn);
 	if (!serv) {
 		printk(KERN_ERR "nfs_callback_create_svc: create service failed\n");
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index a0b117107e86..d640f893021a 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -661,7 +661,8 @@ int nfsd_create_serv(struct net *net)
 	if (nfsd_max_blksize == 0)
 		nfsd_max_blksize = nfsd_get_default_max_blksize();
 	nfsd_reset_versions(nn);
-	serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd);
+	serv = svc_create_pooled(&nfsd_program, &nfsd_svcstats,
+				 nfsd_max_blksize, nfsd);
 	if (serv == NULL)
 		return -ENOMEM;
 
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 67cf1c9efd80..2a1447fa5ef2 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -402,8 +402,8 @@ struct svc_procedure {
 int svc_rpcb_setup(struct svc_serv *serv, struct net *net);
 void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net);
 int svc_bind(struct svc_serv *serv, struct net *net);
-struct svc_serv *svc_create(struct svc_program *, unsigned int,
-			    int (*threadfn)(void *data));
+struct svc_serv *svc_create(struct svc_program *, struct svc_stat *,
+			    unsigned int, int (*threadfn)(void *data));
 struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
 					struct svc_pool *pool, int node);
 bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
@@ -411,8 +411,8 @@ bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
 void		   svc_rqst_release_pages(struct svc_rqst *rqstp);
 void		   svc_rqst_free(struct svc_rqst *);
 void		   svc_exit_thread(struct svc_rqst *);
-struct svc_serv *  svc_create_pooled(struct svc_program *, unsigned int,
-				     int (*threadfn)(void *data));
+struct svc_serv *  svc_create_pooled(struct svc_program *, struct svc_stat *,
+				     unsigned int, int (*threadfn)(void *data));
 int		   svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
 int		   svc_pool_stats_open(struct svc_info *si, struct file *file);
 void		   svc_process(struct svc_rqst *rqstp);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index d2e6f3d59218..f76ef8a3dd43 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -451,8 +451,8 @@ __svc_init_bc(struct svc_serv *serv)
  * Create an RPC service
  */
 static struct svc_serv *
-__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
-	     int (*threadfn)(void *data))
+__svc_create(struct svc_program *prog, struct svc_stat *stats,
+	     unsigned int bufsize, int npools, int (*threadfn)(void *data))
 {
 	struct svc_serv	*serv;
 	unsigned int vers;
@@ -463,7 +463,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
 		return NULL;
 	serv->sv_name      = prog->pg_name;
 	serv->sv_program   = prog;
-	serv->sv_stats     = prog->pg_stats;
+	serv->sv_stats     = stats;
 	if (bufsize > RPCSVC_MAXPAYLOAD)
 		bufsize = RPCSVC_MAXPAYLOAD;
 	serv->sv_max_payload = bufsize? bufsize : 4096;
@@ -521,34 +521,37 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
 /**
  * svc_create - Create an RPC service
  * @prog: the RPC program the new service will handle
+ * @stats: the stats struct if desired
  * @bufsize: maximum message size for @prog
  * @threadfn: a function to service RPC requests for @prog
  *
  * Returns an instantiated struct svc_serv object or NULL.
  */
-struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
-			    int (*threadfn)(void *data))
+struct svc_serv *svc_create(struct svc_program *prog, struct svc_stat *stats,
+			    unsigned int bufsize, int (*threadfn)(void *data))
 {
-	return __svc_create(prog, bufsize, 1, threadfn);
+	return __svc_create(prog, stats, bufsize, 1, threadfn);
 }
 EXPORT_SYMBOL_GPL(svc_create);
 
 /**
  * svc_create_pooled - Create an RPC service with pooled threads
  * @prog: the RPC program the new service will handle
+ * @stats: the stats struct if desired
  * @bufsize: maximum message size for @prog
  * @threadfn: a function to service RPC requests for @prog
  *
  * Returns an instantiated struct svc_serv object or NULL.
  */
 struct svc_serv *svc_create_pooled(struct svc_program *prog,
+				   struct svc_stat *stats,
 				   unsigned int bufsize,
 				   int (*threadfn)(void *data))
 {
 	struct svc_serv *serv;
 	unsigned int npools = svc_pool_map_get();
 
-	serv = __svc_create(prog, bufsize, npools, threadfn);
+	serv = __svc_create(prog, stats, bufsize, npools, threadfn);
 	if (!serv)
 		goto out_err;
 	return serv;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 04/13] sunrpc: remove ->pg_stats from svc_program
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (2 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create* Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args Josef Bacik
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

Now that this isn't used anywhere, remove it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfsd/nfssvc.c           | 1 -
 include/linux/sunrpc/svc.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d640f893021a..d98a6abad990 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -127,7 +127,6 @@ struct svc_program		nfsd_program = {
 	.pg_vers		= nfsd_version,		/* version table */
 	.pg_name		= "nfsd",		/* program name */
 	.pg_class		= "nfsd",		/* authentication class */
-	.pg_stats		= &nfsd_svcstats,	/* version table */
 	.pg_authenticate	= &svc_set_client,	/* export authentication */
 	.pg_init_request	= nfsd_init_request,
 	.pg_rpcbind_set		= nfsd_rpcbind_set,
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 2a1447fa5ef2..9745ac61d83d 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -339,7 +339,6 @@ struct svc_program {
 	const struct svc_version **pg_vers;	/* version array */
 	char *			pg_name;	/* service name */
 	char *			pg_class;	/* class name: services sharing authentication */
-	struct svc_stat *	pg_stats;	/* rpc statistics */
 	enum svc_auth_status	(*pg_authenticate)(struct svc_rqst *rqstp);
 	__be32			(*pg_init_request)(struct svc_rqst *,
 						   const struct svc_program *,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (3 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 04/13] sunrpc: remove ->pg_stats from svc_program Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 20:53   ` Chuck Lever
  2024-01-25 19:53 ` [PATCH v2 06/13] sunrpc: use the struct net as the svc proc private Josef Bacik
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

We want to be able to have our rpc stats handled in a per-network
namespace manner, so add an option to rpc_create_args to specify a
different rpc_stat struct to use instead of the one on the rpc_program.
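
For illustration only (not part of this patch): with the new argument in
place, a client can point its RPC clients at a per-network-namespace
rpc_stat rather than the program-wide one.  A minimal sketch, where
nn->rpcstats is a hypothetical per-netns field (added by a later patch
in this series) and net/version/flavor are placeholders:

    struct nfs_net *nn = net_generic(net, nfs_net_id);
    struct rpc_create_args args = {
            .net            = net,
            .program        = &nfs_program,
            .version        = version,
            .authflavor     = flavor,
            /* assumption: per-netns rpc_stat instead of nfs_program.stats */
            .stats          = &nn->rpcstats,
    };
    struct rpc_clnt *clnt = rpc_create(&args);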

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfs/client.c             | 2 +-
 include/linux/sunrpc/clnt.h | 1 +
 net/sunrpc/clnt.c           | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 44eca51b2808..590be14f182f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -73,7 +73,7 @@ const struct rpc_program nfs_program = {
 	.number			= NFS_PROGRAM,
 	.nrvers			= ARRAY_SIZE(nfs_version),
 	.version		= nfs_version,
-	.stats			= &nfs_rpcstat,
+	.stats                  = &nfs_rpcstat,
 	.pipe_dir_name		= NFS_PIPE_DIRNAME,
 };
 
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 5e9d1469c6fa..5321585c778f 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -139,6 +139,7 @@ struct rpc_create_args {
 	const char		*servername;
 	const char		*nodename;
 	const struct rpc_program *program;
+	struct rpc_stat		*stats;
 	u32			prognumber;	/* overrides program->number */
 	u32			version;
 	rpc_authflavor_t	authflavor;
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index cda0935a68c9..bc8c209fc0c7 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -405,7 +405,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
 	clnt->cl_maxproc  = version->nrprocs;
 	clnt->cl_prog     = args->prognumber ? : program->number;
 	clnt->cl_vers     = version->number;
-	clnt->cl_stats    = program->stats;
+	clnt->cl_stats    = args->stats ? : program->stats;
 	clnt->cl_metrics  = rpc_alloc_iostats(clnt);
 	rpc_init_pipe_dir_head(&clnt->cl_pipedir_objects);
 	err = -ENOMEM;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 06/13] sunrpc: use the struct net as the svc proc private
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (4 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 07/13] nfsd: rename NFSD_NET_* to NFSD_STATS_* Josef Bacik
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

nfsd is the only thing using this helper, and it doesn't currently use
the private data.  When we switch to per-network namespace stats we will
need the struct net * in order to get to the nfsd_net.  Use the net as
the proc private so we can utilize this when we make the switch over.
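
As a rough preview of how that private will be consumed (this mirrors
what a later patch in this series does in nfsd_show(); the seq_printf
line below is just a stand-in for the full stats output):

    static int nfsd_show(struct seq_file *seq, void *v)
    {
            /* svc_proc_register() stored the struct net as the PDE data */
            struct net *net = pde_data(file_inode(seq->file));
            struct nfsd_net *nn = net_generic(net, nfsd_net_id);

            /* per-netns state is now reachable from the seq_file handler */
            seq_printf(seq, "drc entries: %d\n",
                       atomic_read(&nn->num_drc_entries));
            return 0;
    }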

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 net/sunrpc/stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 65fc1297c6df..383860cb1d5b 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -314,7 +314,7 @@ EXPORT_SYMBOL_GPL(rpc_proc_unregister);
 struct proc_dir_entry *
 svc_proc_register(struct net *net, struct svc_stat *statp, const struct proc_ops *proc_ops)
 {
-	return do_register(net, statp->program->pg_name, statp, proc_ops);
+	return do_register(net, statp->program->pg_name, net, proc_ops);
 }
 EXPORT_SYMBOL_GPL(svc_proc_register);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 07/13] nfsd: rename NFSD_NET_* to NFSD_STATS_*
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (5 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 06/13] sunrpc: use the struct net as the svc proc private Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 08/13] nfsd: expose /proc/net/sunrpc/nfsd in net namespaces Josef Bacik
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

We're going to merge all of the stats into the per-network namespace
structure in subsequent patches, so rename these nn counters to be
consistent with the rest of the stats.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfsd/netns.h    | 4 ++--
 fs/nfsd/nfscache.c | 4 ++--
 fs/nfsd/stats.h    | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 74b4360779a1..e3605cb5f044 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -26,9 +26,9 @@ struct nfsd4_client_tracking_ops;
 
 enum {
 	/* cache misses due only to checksum comparison failures */
-	NFSD_NET_PAYLOAD_MISSES,
+	NFSD_STATS_PAYLOAD_MISSES,
 	/* amount of memory (in bytes) currently consumed by the DRC */
-	NFSD_NET_DRC_MEM_USAGE,
+	NFSD_STATS_DRC_MEM_USAGE,
 	NFSD_NET_COUNTERS_NUM
 };
 
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 5c1a4a0aa605..3d4a9d181c43 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -687,7 +687,7 @@ int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
 		   atomic_read(&nn->num_drc_entries));
 	seq_printf(m, "hash buckets:          %u\n", 1 << nn->maskbits);
 	seq_printf(m, "mem usage:             %lld\n",
-		   percpu_counter_sum_positive(&nn->counter[NFSD_NET_DRC_MEM_USAGE]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_DRC_MEM_USAGE]));
 	seq_printf(m, "cache hits:            %lld\n",
 		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]));
 	seq_printf(m, "cache misses:          %lld\n",
@@ -695,7 +695,7 @@ int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "not cached:            %lld\n",
 		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]));
 	seq_printf(m, "payload misses:        %lld\n",
-		   percpu_counter_sum_positive(&nn->counter[NFSD_NET_PAYLOAD_MISSES]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_PAYLOAD_MISSES]));
 	seq_printf(m, "longest chain len:     %u\n", nn->longest_chain);
 	seq_printf(m, "cachesize at longest:  %u\n", nn->longest_chain_cachesize);
 	return 0;
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h
index 14f50c660b61..7ed4325ac691 100644
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -81,17 +81,17 @@ static inline void nfsd_stats_io_write_add(struct svc_export *exp, s64 amount)
 
 static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nn->counter[NFSD_NET_PAYLOAD_MISSES]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_PAYLOAD_MISSES]);
 }
 
 static inline void nfsd_stats_drc_mem_usage_add(struct nfsd_net *nn, s64 amount)
 {
-	percpu_counter_add(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount);
+	percpu_counter_add(&nn->counter[NFSD_STATS_DRC_MEM_USAGE], amount);
 }
 
 static inline void nfsd_stats_drc_mem_usage_sub(struct nfsd_net *nn, s64 amount)
 {
-	percpu_counter_sub(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount);
+	percpu_counter_sub(&nn->counter[NFSD_STATS_DRC_MEM_USAGE], amount);
 }
 
 #ifdef CONFIG_NFSD_V4
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 08/13] nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (6 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 07/13] nfsd: rename NFSD_NET_* to NFSD_STATS_* Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 09/13] nfsd: make all of the nfsd stats per-network namespace Josef Bacik
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

We are running nfsd servers inside of containers with their own network
namespace, and we want to monitor these services using the stats found
in /proc.  However, these are not exposed in /proc inside of the
container, so we have to bind mount the host's /proc into our containers
to get at this information.

Separate out the stat counters init and the proc registration, and move
the proc registration into the pernet operations entry and exit points
so that these stats can be exposed inside of network namespaces.

This is an intermediate step; it just exposes the global counters in
every network namespace.  Subsequent patches will move these counters
into the per-network namespace nfsd_net structure.
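
For context (not changed by this patch): nfsd already has per-network
namespace init/exit hooks registered once at module init, which is what
makes them a natural home for the proc registration.  The wiring looks
roughly like this sketch:

    /* pre-existing pernet wiring in fs/nfsd/nfsctl.c (sketch) */
    static struct pernet_operations nfsd_net_ops = {
            .init = nfsd_net_init,  /* now also calls nfsd_proc_stat_init(net) */
            .exit = nfsd_net_exit,  /* now also calls nfsd_proc_stat_shutdown(net) */
            .id   = &nfsd_net_id,
            .size = sizeof(struct nfsd_net),
    };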

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfsd/nfsctl.c |  8 +++++---
 fs/nfsd/stats.c  | 21 ++++++---------------
 fs/nfsd/stats.h  |  6 ++++--
 3 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index f206ca32e7f5..b57480b50e35 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1679,6 +1679,7 @@ static __net_init int nfsd_net_init(struct net *net)
 	nfsd4_init_leases_net(nn);
 	get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key));
 	seqlock_init(&nn->writeverf_lock);
+	nfsd_proc_stat_init(net);
 
 	return 0;
 
@@ -1699,6 +1700,7 @@ static __net_exit void nfsd_net_exit(struct net *net)
 {
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
+	nfsd_proc_stat_shutdown(net);
 	nfsd_net_reply_cache_destroy(nn);
 	nfsd_idmap_shutdown(net);
 	nfsd_export_shutdown(net);
@@ -1722,7 +1724,7 @@ static int __init init_nfsd(void)
 	retval = nfsd4_init_pnfs();
 	if (retval)
 		goto out_free_slabs;
-	retval = nfsd_stat_init();	/* Statistics */
+	retval = nfsd_stat_counters_init();	/* Statistics */
 	if (retval)
 		goto out_free_pnfs;
 	retval = nfsd_drc_slab_create();
@@ -1762,7 +1764,7 @@ static int __init init_nfsd(void)
 	nfsd_lockd_shutdown();
 	nfsd_drc_slab_free();
 out_free_stat:
-	nfsd_stat_shutdown();
+	nfsd_stat_counters_destroy();
 out_free_pnfs:
 	nfsd4_exit_pnfs();
 out_free_slabs:
@@ -1780,7 +1782,7 @@ static void __exit exit_nfsd(void)
 	nfsd_drc_slab_free();
 	remove_proc_entry("fs/nfs/exports", NULL);
 	remove_proc_entry("fs/nfs", NULL);
-	nfsd_stat_shutdown();
+	nfsd_stat_counters_destroy();
 	nfsd_lockd_shutdown();
 	nfsd4_free_slabs();
 	nfsd4_exit_pnfs();
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c
index 12d79f5d4eb1..394a65a33942 100644
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -108,31 +108,22 @@ void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num)
 		percpu_counter_destroy(&counters[i]);
 }
 
-static int nfsd_stat_counters_init(void)
+int nfsd_stat_counters_init(void)
 {
 	return nfsd_percpu_counters_init(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
 }
 
-static void nfsd_stat_counters_destroy(void)
+void nfsd_stat_counters_destroy(void)
 {
 	nfsd_percpu_counters_destroy(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
 }
 
-int nfsd_stat_init(void)
+void nfsd_proc_stat_init(struct net *net)
 {
-	int err;
-
-	err = nfsd_stat_counters_init();
-	if (err)
-		return err;
-
-	svc_proc_register(&init_net, &nfsd_svcstats, &nfsd_proc_ops);
-
-	return 0;
+	svc_proc_register(net, &nfsd_svcstats, &nfsd_proc_ops);
 }
 
-void nfsd_stat_shutdown(void)
+void nfsd_proc_stat_shutdown(struct net *net)
 {
-	nfsd_stat_counters_destroy();
-	svc_proc_unregister(&init_net, "nfsd");
+	svc_proc_unregister(net, "nfsd");
 }
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h
index 7ed4325ac691..38811aa7d13e 100644
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -40,8 +40,10 @@ extern struct svc_stat		nfsd_svcstats;
 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_reset(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_destroy(struct percpu_counter *counters, int num);
-int nfsd_stat_init(void);
-void nfsd_stat_shutdown(void);
+int nfsd_stat_counters_init(void);
+void nfsd_stat_counters_destroy(void);
+void nfsd_proc_stat_init(struct net *net);
+void nfsd_proc_stat_shutdown(struct net *net);
 
 static inline void nfsd_stats_rc_hits_inc(void)
 {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 09/13] nfsd: make all of the nfsd stats per-network namespace
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (7 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 08/13] nfsd: expose /proc/net/sunrpc/nfsd in net namespaces Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net Josef Bacik
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

We have a global set of counters that we modify for all of the nfsd
operations, but now that we're exposing these stats across all network
namespaces we need to make the stats themselves per-network namespace.
We already have some caching stats that are per-network namespace, so
move these definitions into the same counter array and then adjust all
of the helpers and users of these stats to take the appropriate nfsd_net
struct so that the stats are maintained per network namespace.
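
The existing nfsd_percpu_counters_init()/_destroy() helpers are reused
as-is for the enlarged per-netns array; for readers following along, the
pattern they implement is roughly the following sketch (not a verbatim
copy of the helpers):

    /* sketch: set up an array of per-CPU counters, unwinding on failure */
    int i, err = 0;

    for (i = 0; i < NFSD_STATS_COUNTERS_NUM; i++) {
            err = percpu_counter_init(&nn->counter[i], 0, GFP_KERNEL);
            if (err)
                    break;
    }
    if (err) {
            /* destroy only the counters that were successfully set up */
            while (i-- > 0)
                    percpu_counter_destroy(&nn->counter[i]);
    }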

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfsd/cache.h     |  2 --
 fs/nfsd/netns.h     | 17 ++++++++++++--
 fs/nfsd/nfs4proc.c  |  6 ++---
 fs/nfsd/nfs4state.c |  3 ++-
 fs/nfsd/nfscache.c  | 36 ++++++------------------------
 fs/nfsd/nfsctl.c    | 12 +++-------
 fs/nfsd/nfsfh.c     |  3 ++-
 fs/nfsd/stats.c     | 26 ++++++++++++----------
 fs/nfsd/stats.h     | 54 ++++++++++++++++-----------------------------
 fs/nfsd/vfs.c       |  5 +++--
 10 files changed, 68 insertions(+), 96 deletions(-)

diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
index 4cbe0434cbb8..66a05fefae98 100644
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -80,8 +80,6 @@ enum {
 
 int	nfsd_drc_slab_create(void);
 void	nfsd_drc_slab_free(void);
-int	nfsd_net_reply_cache_init(struct nfsd_net *nn);
-void	nfsd_net_reply_cache_destroy(struct nfsd_net *nn);
 int	nfsd_reply_cache_init(struct nfsd_net *);
 void	nfsd_reply_cache_shutdown(struct nfsd_net *);
 int	nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index e3605cb5f044..0cef4bb407a9 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -11,6 +11,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <linux/filelock.h>
+#include <linux/nfs4.h>
 #include <linux/percpu_counter.h>
 #include <linux/siphash.h>
 
@@ -29,7 +30,19 @@ enum {
 	NFSD_STATS_PAYLOAD_MISSES,
 	/* amount of memory (in bytes) currently consumed by the DRC */
 	NFSD_STATS_DRC_MEM_USAGE,
-	NFSD_NET_COUNTERS_NUM
+	NFSD_STATS_RC_HITS,		/* repcache hits */
+	NFSD_STATS_RC_MISSES,		/* repcache misses */
+	NFSD_STATS_RC_NOCACHE,		/* uncached reqs */
+	NFSD_STATS_FH_STALE,		/* FH stale error */
+	NFSD_STATS_IO_READ,		/* bytes returned to read requests */
+	NFSD_STATS_IO_WRITE,		/* bytes passed in write requests */
+#ifdef CONFIG_NFSD_V4
+	NFSD_STATS_FIRST_NFS4_OP,	/* count of individual nfsv4 operations */
+	NFSD_STATS_LAST_NFS4_OP = NFSD_STATS_FIRST_NFS4_OP + LAST_NFS4_OP,
+#define NFSD_STATS_NFS4_OP(op)	(NFSD_STATS_FIRST_NFS4_OP + (op))
+	NFSD_STATS_WDELEG_GETATTR,	/* count of getattr conflict with wdeleg */
+#endif
+	NFSD_STATS_COUNTERS_NUM
 };
 
 /*
@@ -164,7 +177,7 @@ struct nfsd_net {
 	atomic_t                 num_drc_entries;
 
 	/* Per-netns stats counters */
-	struct percpu_counter    counter[NFSD_NET_COUNTERS_NUM];
+	struct percpu_counter    counter[NFSD_STATS_COUNTERS_NUM];
 
 	/* longest hash chain seen */
 	unsigned int             longest_chain;
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 14712fa08f76..648ff427005e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2490,10 +2490,10 @@ nfsd4_proc_null(struct svc_rqst *rqstp)
 	return rpc_success;
 }
 
-static inline void nfsd4_increment_op_stats(u32 opnum)
+static inline void nfsd4_increment_op_stats(struct nfsd_net *nn, u32 opnum)
 {
 	if (opnum >= FIRST_NFS4_OP && opnum <= LAST_NFS4_OP)
-		percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_NFS4_OP(opnum)]);
+		percpu_counter_inc(&nn->counter[NFSD_STATS_NFS4_OP(opnum)]);
 }
 
 static const struct nfsd4_operation nfsd4_ops[];
@@ -2768,7 +2768,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
 					   status, nfsd4_op_name(op->opnum));
 
 		nfsd4_cstate_clear_replay(cstate);
-		nfsd4_increment_op_stats(op->opnum);
+		nfsd4_increment_op_stats(nn, op->opnum);
 	}
 
 	fh_put(current_fh);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fbf8b426950e..83c260ab485d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -8457,6 +8457,7 @@ __be32
 nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode)
 {
 	__be32 status;
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 	struct file_lock_context *ctx;
 	struct file_lock *fl;
 	struct nfs4_delegation *dp;
@@ -8486,7 +8487,7 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode)
 			}
 break_lease:
 			spin_unlock(&ctx->flc_lock);
-			nfsd_stats_wdeleg_getattr_inc();
+			nfsd_stats_wdeleg_getattr_inc(nn);
 			status = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
 			if (status != nfserr_jukebox ||
 					!nfsd_wait_for_delegreturn(rqstp, inode))
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 3d4a9d181c43..cfcc6ac8f255 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -176,27 +176,6 @@ void nfsd_drc_slab_free(void)
 	kmem_cache_destroy(drc_slab);
 }
 
-/**
- * nfsd_net_reply_cache_init - per net namespace reply cache set-up
- * @nn: nfsd_net being initialized
- *
- * Returns zero on succes; otherwise a negative errno is returned.
- */
-int nfsd_net_reply_cache_init(struct nfsd_net *nn)
-{
-	return nfsd_percpu_counters_init(nn->counter, NFSD_NET_COUNTERS_NUM);
-}
-
-/**
- * nfsd_net_reply_cache_destroy - per net namespace reply cache tear-down
- * @nn: nfsd_net being freed
- *
- */
-void nfsd_net_reply_cache_destroy(struct nfsd_net *nn)
-{
-	nfsd_percpu_counters_destroy(nn->counter, NFSD_NET_COUNTERS_NUM);
-}
-
 int nfsd_reply_cache_init(struct nfsd_net *nn)
 {
 	unsigned int hashsize;
@@ -501,7 +480,7 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct nfsd_cacherep *key,
 int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
 		      unsigned int len, struct nfsd_cacherep **cacherep)
 {
-	struct nfsd_net		*nn;
+	struct nfsd_net		*nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 	struct nfsd_cacherep	*rp, *found;
 	__wsum			csum;
 	struct nfsd_drc_bucket	*b;
@@ -510,7 +489,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
 	int rtn = RC_DOIT;
 
 	if (type == RC_NOCACHE) {
-		nfsd_stats_rc_nocache_inc();
+		nfsd_stats_rc_nocache_inc(nn);
 		goto out;
 	}
 
@@ -520,7 +499,6 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
 	 * Since the common case is a cache miss followed by an insert,
 	 * preallocate an entry.
 	 */
-	nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 	rp = nfsd_cacherep_alloc(rqstp, csum, nn);
 	if (!rp)
 		goto out;
@@ -537,7 +515,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
 
 	nfsd_cacherep_dispose(&dispose);
 
-	nfsd_stats_rc_misses_inc();
+	nfsd_stats_rc_misses_inc(nn);
 	atomic_inc(&nn->num_drc_entries);
 	nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
 	goto out;
@@ -545,7 +523,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
 found_entry:
 	/* We found a matching entry which is either in progress or done. */
 	nfsd_reply_cache_free_locked(NULL, rp, nn);
-	nfsd_stats_rc_hits_inc();
+	nfsd_stats_rc_hits_inc(nn);
 	rtn = RC_DROPIT;
 	rp = found;
 
@@ -689,11 +667,11 @@ int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "mem usage:             %lld\n",
 		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_DRC_MEM_USAGE]));
 	seq_printf(m, "cache hits:            %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_HITS]));
 	seq_printf(m, "cache misses:          %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_MISSES]));
 	seq_printf(m, "not cached:            %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_NOCACHE]));
 	seq_printf(m, "payload misses:        %lld\n",
 		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_PAYLOAD_MISSES]));
 	seq_printf(m, "longest chain len:     %u\n", nn->longest_chain);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index b57480b50e35..ea3c8114245c 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1671,7 +1671,7 @@ static __net_init int nfsd_net_init(struct net *net)
 	retval = nfsd_idmap_init(net);
 	if (retval)
 		goto out_idmap_error;
-	retval = nfsd_net_reply_cache_init(nn);
+	retval = nfsd_stat_counters_init(nn);
 	if (retval)
 		goto out_repcache_error;
 	nn->nfsd_versions = NULL;
@@ -1701,7 +1701,7 @@ static __net_exit void nfsd_net_exit(struct net *net)
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	nfsd_proc_stat_shutdown(net);
-	nfsd_net_reply_cache_destroy(nn);
+	nfsd_stat_counters_destroy(nn);
 	nfsd_idmap_shutdown(net);
 	nfsd_export_shutdown(net);
 	nfsd_netns_free_versions(nn);
@@ -1724,12 +1724,9 @@ static int __init init_nfsd(void)
 	retval = nfsd4_init_pnfs();
 	if (retval)
 		goto out_free_slabs;
-	retval = nfsd_stat_counters_init();	/* Statistics */
-	if (retval)
-		goto out_free_pnfs;
 	retval = nfsd_drc_slab_create();
 	if (retval)
-		goto out_free_stat;
+		goto out_free_pnfs;
 	nfsd_lockd_init();	/* lockd->nfsd callbacks */
 	retval = create_proc_exports_entry();
 	if (retval)
@@ -1763,8 +1760,6 @@ static int __init init_nfsd(void)
 out_free_lockd:
 	nfsd_lockd_shutdown();
 	nfsd_drc_slab_free();
-out_free_stat:
-	nfsd_stat_counters_destroy();
 out_free_pnfs:
 	nfsd4_exit_pnfs();
 out_free_slabs:
@@ -1782,7 +1777,6 @@ static void __exit exit_nfsd(void)
 	nfsd_drc_slab_free();
 	remove_proc_entry("fs/nfs/exports", NULL);
 	remove_proc_entry("fs/nfs", NULL);
-	nfsd_stat_counters_destroy();
 	nfsd_lockd_shutdown();
 	nfsd4_free_slabs();
 	nfsd4_exit_pnfs();
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index dbfa0ac13564..40fecf7b224f 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -327,6 +327,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 __be32
 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 {
+	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
 	struct svc_export *exp = NULL;
 	struct dentry	*dentry;
 	__be32		error;
@@ -395,7 +396,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 out:
 	trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error);
 	if (error == nfserr_stale)
-		nfsd_stats_fh_stale_inc(exp);
+		nfsd_stats_fh_stale_inc(nn, exp);
 	return error;
 }
 
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c
index 394a65a33942..44e275324b06 100644
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -34,15 +34,17 @@ struct svc_stat		nfsd_svcstats = {
 
 static int nfsd_show(struct seq_file *seq, void *v)
 {
+	struct net *net = pde_data(file_inode(seq->file));
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	int i;
 
 	seq_printf(seq, "rc %lld %lld %lld\nfh %lld 0 0 0 0\nio %lld %lld\n",
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_FH_STALE]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_READ]),
-		   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE]));
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_HITS]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_MISSES]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_RC_NOCACHE]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_FH_STALE]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_IO_READ]),
+		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_IO_WRITE]));
 
 	/* thread usage: */
 	seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt));
@@ -63,10 +65,10 @@ static int nfsd_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "proc4ops %u", LAST_NFS4_OP + 1);
 	for (i = 0; i <= LAST_NFS4_OP; i++) {
 		seq_printf(seq, " %lld",
-			   percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_NFS4_OP(i)]));
+			   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_NFS4_OP(i)]));
 	}
 	seq_printf(seq, "\nwdeleg_getattr %lld",
-		percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_WDELEG_GETATTR]));
+		percpu_counter_sum_positive(&nn->counter[NFSD_STATS_WDELEG_GETATTR]));
 
 	seq_putc(seq, '\n');
 #endif
@@ -108,14 +110,14 @@ void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num)
 		percpu_counter_destroy(&counters[i]);
 }
 
-int nfsd_stat_counters_init(void)
+int nfsd_stat_counters_init(struct nfsd_net *nn)
 {
-	return nfsd_percpu_counters_init(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
+	return nfsd_percpu_counters_init(nn->counter, NFSD_STATS_COUNTERS_NUM);
 }
 
-void nfsd_stat_counters_destroy(void)
+void nfsd_stat_counters_destroy(struct nfsd_net *nn)
 {
-	nfsd_percpu_counters_destroy(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM);
+	nfsd_percpu_counters_destroy(nn->counter, NFSD_STATS_COUNTERS_NUM);
 }
 
 void nfsd_proc_stat_init(struct net *net)
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h
index 38811aa7d13e..c24be4ddbe7d 100644
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -10,26 +10,7 @@
 #include <uapi/linux/nfsd/stats.h>
 #include <linux/percpu_counter.h>
 
-
-enum {
-	NFSD_STATS_RC_HITS,		/* repcache hits */
-	NFSD_STATS_RC_MISSES,		/* repcache misses */
-	NFSD_STATS_RC_NOCACHE,		/* uncached reqs */
-	NFSD_STATS_FH_STALE,		/* FH stale error */
-	NFSD_STATS_IO_READ,		/* bytes returned to read requests */
-	NFSD_STATS_IO_WRITE,		/* bytes passed in write requests */
-#ifdef CONFIG_NFSD_V4
-	NFSD_STATS_FIRST_NFS4_OP,	/* count of individual nfsv4 operations */
-	NFSD_STATS_LAST_NFS4_OP = NFSD_STATS_FIRST_NFS4_OP + LAST_NFS4_OP,
-#define NFSD_STATS_NFS4_OP(op)	(NFSD_STATS_FIRST_NFS4_OP + (op))
-	NFSD_STATS_WDELEG_GETATTR,	/* count of getattr conflict with wdeleg */
-#endif
-	NFSD_STATS_COUNTERS_NUM
-};
-
 struct nfsd_stats {
-	struct percpu_counter	counter[NFSD_STATS_COUNTERS_NUM];
-
 	atomic_t	th_cnt;		/* number of available threads */
 };
 
@@ -40,43 +21,46 @@ extern struct svc_stat		nfsd_svcstats;
 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_reset(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_destroy(struct percpu_counter *counters, int num);
-int nfsd_stat_counters_init(void);
-void nfsd_stat_counters_destroy(void);
+int nfsd_stat_counters_init(struct nfsd_net *nn);
+void nfsd_stat_counters_destroy(struct nfsd_net *nn);
 void nfsd_proc_stat_init(struct net *net);
 void nfsd_proc_stat_shutdown(struct net *net);
 
-static inline void nfsd_stats_rc_hits_inc(void)
+static inline void nfsd_stats_rc_hits_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_HITS]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_RC_HITS]);
 }
 
-static inline void nfsd_stats_rc_misses_inc(void)
+static inline void nfsd_stats_rc_misses_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_MISSES]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_RC_MISSES]);
 }
 
-static inline void nfsd_stats_rc_nocache_inc(void)
+static inline void nfsd_stats_rc_nocache_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_RC_NOCACHE]);
 }
 
-static inline void nfsd_stats_fh_stale_inc(struct svc_export *exp)
+static inline void nfsd_stats_fh_stale_inc(struct nfsd_net *nn,
+					   struct svc_export *exp)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_FH_STALE]);
 	if (exp && exp->ex_stats)
 		percpu_counter_inc(&exp->ex_stats->counter[EXP_STATS_FH_STALE]);
 }
 
-static inline void nfsd_stats_io_read_add(struct svc_export *exp, s64 amount)
+static inline void nfsd_stats_io_read_add(struct nfsd_net *nn,
+					  struct svc_export *exp, s64 amount)
 {
-	percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount);
+	percpu_counter_add(&nn->counter[NFSD_STATS_IO_READ], amount);
 	if (exp && exp->ex_stats)
 		percpu_counter_add(&exp->ex_stats->counter[EXP_STATS_IO_READ], amount);
 }
 
-static inline void nfsd_stats_io_write_add(struct svc_export *exp, s64 amount)
+static inline void nfsd_stats_io_write_add(struct nfsd_net *nn,
+					   struct svc_export *exp, s64 amount)
 {
-	percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount);
+	percpu_counter_add(&nn->counter[NFSD_STATS_IO_WRITE], amount);
 	if (exp && exp->ex_stats)
 		percpu_counter_add(&exp->ex_stats->counter[EXP_STATS_IO_WRITE], amount);
 }
@@ -97,9 +81,9 @@ static inline void nfsd_stats_drc_mem_usage_sub(struct nfsd_net *nn, s64 amount)
 }
 
 #ifdef CONFIG_NFSD_V4
-static inline void nfsd_stats_wdeleg_getattr_inc(void)
+static inline void nfsd_stats_wdeleg_getattr_inc(struct nfsd_net *nn)
 {
-	percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_WDELEG_GETATTR]);
+	percpu_counter_inc(&nn->counter[NFSD_STATS_WDELEG_GETATTR]);
 }
 #endif
 #endif /* _NFSD_STATS_H */
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index f57749cd6f0b..33af4bc7f423 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1002,7 +1002,8 @@ static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			       unsigned long *count, u32 *eof, ssize_t host_err)
 {
 	if (host_err >= 0) {
-		nfsd_stats_io_read_add(fhp->fh_export, host_err);
+		struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+		nfsd_stats_io_read_add(nn, fhp->fh_export, host_err);
 		*eof = nfsd_eof_on_read(file, offset, host_err, *count);
 		*count = host_err;
 		fsnotify_access(file);
@@ -1185,7 +1186,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 		goto out_nfserr;
 	}
 	*cnt = host_err;
-	nfsd_stats_io_write_add(exp, *cnt);
+	nfsd_stats_io_write_add(nn, exp, *cnt);
 	fsnotify_modify(file);
 	host_err = filemap_check_wb_err(file->f_mapping, since);
 	if (host_err < 0)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (8 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 09/13] nfsd: make all of the nfsd stats per-network namespace Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 21:01   ` Chuck Lever
  2024-01-25 19:53 ` [PATCH v2 11/13] nfsd: make svc_stat per-network namespace instead of global Josef Bacik
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

This is the last global stat; move it into nfsd_net and adjust all of
the users to use that variant instead of the global one.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfsd/netns.h  | 3 +++
 fs/nfsd/nfssvc.c | 4 ++--
 fs/nfsd/stats.c  | 4 ++--
 fs/nfsd/stats.h  | 6 ------
 4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 0cef4bb407a9..8d3f4cb7cab4 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -179,6 +179,9 @@ struct nfsd_net {
 	/* Per-netns stats counters */
 	struct percpu_counter    counter[NFSD_STATS_COUNTERS_NUM];
 
+	/* number of available threads */
+	atomic_t                 th_cnt;
+
 	/* longest hash chain seen */
 	unsigned int             longest_chain;
 
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d98a6abad990..0961b95dcef6 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -924,7 +924,7 @@ nfsd(void *vrqstp)
 
 	current->fs->umask = 0;
 
-	atomic_inc(&nfsdstats.th_cnt);
+	atomic_inc(&nn->th_cnt);
 
 	set_freezable();
 
@@ -940,7 +940,7 @@ nfsd(void *vrqstp)
 		nfsd_file_net_dispose(nn);
 	}
 
-	atomic_dec(&nfsdstats.th_cnt);
+	atomic_dec(&nn->th_cnt);
 
 out:
 	/* Release the thread */
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c
index 44e275324b06..360e6dbf4e5c 100644
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -27,7 +27,6 @@
 
 #include "nfsd.h"
 
-struct nfsd_stats	nfsdstats;
 struct svc_stat		nfsd_svcstats = {
 	.program	= &nfsd_program,
 };
@@ -47,7 +46,7 @@ static int nfsd_show(struct seq_file *seq, void *v)
 		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_IO_WRITE]));
 
 	/* thread usage: */
-	seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt));
+	seq_printf(seq, "th %u 0", atomic_read(&nn->th_cnt));
 
 	/* deprecated thread usage histogram stats */
 	for (i = 0; i < 10; i++)
@@ -112,6 +111,7 @@ void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num)
 
 int nfsd_stat_counters_init(struct nfsd_net *nn)
 {
+	atomic_set(&nn->th_cnt, 0);
 	return nfsd_percpu_counters_init(nn->counter, NFSD_STATS_COUNTERS_NUM);
 }
 
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h
index c24be4ddbe7d..5675d283a537 100644
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -10,12 +10,6 @@
 #include <uapi/linux/nfsd/stats.h>
 #include <linux/percpu_counter.h>
 
-struct nfsd_stats {
-	atomic_t	th_cnt;		/* number of available threads */
-};
-
-extern struct nfsd_stats	nfsdstats;
-
 extern struct svc_stat		nfsd_svcstats;
 
 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 11/13] nfsd: make svc_stat per-network namespace instead of global
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (9 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 12/13] nfs: expose /proc/net/sunrpc/nfs in net namespaces Josef Bacik
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

The final bit of global stats is the rpc svc_stat.  Move this into the
nfsd_net struct and use that everywhere instead of the global struct.
Remove the now-unused global struct.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfsd/netns.h  |  4 ++++
 fs/nfsd/nfsctl.c |  2 ++
 fs/nfsd/nfssvc.c |  2 +-
 fs/nfsd/stats.c  | 10 ++++------
 fs/nfsd/stats.h  |  2 --
 5 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 8d3f4cb7cab4..8fa23898669d 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -14,6 +14,7 @@
 #include <linux/nfs4.h>
 #include <linux/percpu_counter.h>
 #include <linux/siphash.h>
+#include <linux/sunrpc/stats.h>
 
 /* Hash tables for nfs4_clientid state */
 #define CLIENT_HASH_BITS                 4
@@ -182,6 +183,9 @@ struct nfsd_net {
 	/* number of available threads */
 	atomic_t                 th_cnt;
 
+	/* sunrpc svc stats */
+	struct svc_stat          nfsd_svcstats;
+
 	/* longest hash chain seen */
 	unsigned int             longest_chain;
 
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index ea3c8114245c..5a5547bd6ecf 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1674,6 +1674,8 @@ static __net_init int nfsd_net_init(struct net *net)
 	retval = nfsd_stat_counters_init(nn);
 	if (retval)
 		goto out_repcache_error;
+	memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
+	nn->nfsd_svcstats.program = &nfsd_program;
 	nn->nfsd_versions = NULL;
 	nn->nfsd4_minorversions = NULL;
 	nfsd4_init_leases_net(nn);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 0961b95dcef6..ee39d4702811 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -660,7 +660,7 @@ int nfsd_create_serv(struct net *net)
 	if (nfsd_max_blksize == 0)
 		nfsd_max_blksize = nfsd_get_default_max_blksize();
 	nfsd_reset_versions(nn);
-	serv = svc_create_pooled(&nfsd_program, &nfsd_svcstats,
+	serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
 				 nfsd_max_blksize, nfsd);
 	if (serv == NULL)
 		return -ENOMEM;
diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c
index 360e6dbf4e5c..06f52dcda528 100644
--- a/fs/nfsd/stats.c
+++ b/fs/nfsd/stats.c
@@ -27,10 +27,6 @@
 
 #include "nfsd.h"
 
-struct svc_stat		nfsd_svcstats = {
-	.program	= &nfsd_program,
-};
-
 static int nfsd_show(struct seq_file *seq, void *v)
 {
 	struct net *net = pde_data(file_inode(seq->file));
@@ -56,7 +52,7 @@ static int nfsd_show(struct seq_file *seq, void *v)
 	seq_puts(seq, "\nra 0 0 0 0 0 0 0 0 0 0 0 0\n");
 
 	/* show my rpc info */
-	svc_seq_show(seq, &nfsd_svcstats);
+	svc_seq_show(seq, &nn->nfsd_svcstats);
 
 #ifdef CONFIG_NFSD_V4
 	/* Show count for individual nfsv4 operations */
@@ -122,7 +118,9 @@ void nfsd_stat_counters_destroy(struct nfsd_net *nn)
 
 void nfsd_proc_stat_init(struct net *net)
 {
-	svc_proc_register(net, &nfsd_svcstats, &nfsd_proc_ops);
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+	svc_proc_register(net, &nn->nfsd_svcstats, &nfsd_proc_ops);
 }
 
 void nfsd_proc_stat_shutdown(struct net *net)
diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h
index 5675d283a537..d2753e975dfd 100644
--- a/fs/nfsd/stats.h
+++ b/fs/nfsd/stats.h
@@ -10,8 +10,6 @@
 #include <uapi/linux/nfsd/stats.h>
 #include <linux/percpu_counter.h>
 
-extern struct svc_stat		nfsd_svcstats;
-
 int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_reset(struct percpu_counter *counters, int num);
 void nfsd_percpu_counters_destroy(struct percpu_counter *counters, int num);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 12/13] nfs: expose /proc/net/sunrpc/nfs in net namespaces
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (10 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 11/13] nfsd: make svc_stat per-network namespace instead of global Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-25 19:53 ` [PATCH v2 13/13] nfs: make the rpc_stat per net namespace Josef Bacik
  2024-01-26 13:12 ` [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Jeff Layton
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

We're using nfs mounts inside of containers in production and noticed
that the nfs stats are not exposed in /proc.  This is a problem for us
as we use these stats for monitoring, and have to do an awkward bind
mount from the main host into the container in order to get at these
stats.

Add the rpc_proc_register call to the pernet operations entry and exit
points so these stats can be exposed inside of network namespaces.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfs/inode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index ebb8d60e1152..e11e9c34aa56 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2427,11 +2427,13 @@ EXPORT_SYMBOL_GPL(nfs_net_id);
 static int nfs_net_init(struct net *net)
 {
 	nfs_clients_init(net);
+	rpc_proc_register(net, &nfs_rpcstat);
 	return nfs_fs_proc_net_init(net);
 }
 
 static void nfs_net_exit(struct net *net)
 {
+	rpc_proc_unregister(net, "nfs");
 	nfs_fs_proc_net_exit(net);
 	nfs_clients_exit(net);
 }
@@ -2486,15 +2488,12 @@ static int __init init_nfs_fs(void)
 	if (err)
 		goto out1;
 
-	rpc_proc_register(&init_net, &nfs_rpcstat);
-
 	err = register_nfs_fs();
 	if (err)
 		goto out0;
 
 	return 0;
 out0:
-	rpc_proc_unregister(&init_net, "nfs");
 	nfs_destroy_directcache();
 out1:
 	nfs_destroy_writepagecache();
@@ -2524,7 +2523,6 @@ static void __exit exit_nfs_fs(void)
 	nfs_destroy_inodecache();
 	nfs_destroy_nfspagecache();
 	unregister_pernet_subsys(&nfs_net_ops);
-	rpc_proc_unregister(&init_net, "nfs");
 	unregister_nfs_fs();
 	nfs_fs_proc_exit();
 	nfsiod_stop();
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 13/13] nfs: make the rpc_stat per net namespace
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (11 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 12/13] nfs: expose /proc/net/sunrpc/nfs in net namespaces Josef Bacik
@ 2024-01-25 19:53 ` Josef Bacik
  2024-01-26 13:12 ` [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Jeff Layton
  13 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 19:53 UTC (permalink / raw)
  To: linux-nfs, kernel-team

Now that we're exposing the rpc stats on a per-network namespace basis,
move this struct into struct nfs_net and use that to make sure only the
per-network namespace stats are exposed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/nfs/client.c   | 5 ++++-
 fs/nfs/inode.c    | 4 +++-
 fs/nfs/internal.h | 2 --
 fs/nfs/netns.h    | 2 ++
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 590be14f182f..4d9249c99989 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -73,7 +73,6 @@ const struct rpc_program nfs_program = {
 	.number			= NFS_PROGRAM,
 	.nrvers			= ARRAY_SIZE(nfs_version),
 	.version		= nfs_version,
-	.stats                  = &nfs_rpcstat,
 	.pipe_dir_name		= NFS_PIPE_DIRNAME,
 };
 
@@ -502,6 +501,7 @@ int nfs_create_rpc_client(struct nfs_client *clp,
 			  const struct nfs_client_initdata *cl_init,
 			  rpc_authflavor_t flavor)
 {
+	struct nfs_net		*nn = net_generic(clp->cl_net, nfs_net_id);
 	struct rpc_clnt		*clnt = NULL;
 	struct rpc_create_args args = {
 		.net		= clp->cl_net,
@@ -513,6 +513,7 @@ int nfs_create_rpc_client(struct nfs_client *clp,
 		.servername	= clp->cl_hostname,
 		.nodename	= cl_init->nodename,
 		.program	= &nfs_program,
+		.stats		= &nn->rpcstats,
 		.version	= clp->rpc_ops->version,
 		.authflavor	= flavor,
 		.cred		= cl_init->cred,
@@ -1175,6 +1176,8 @@ void nfs_clients_init(struct net *net)
 #endif
 	spin_lock_init(&nn->nfs_client_lock);
 	nn->boot_time = ktime_get_real();
+	memset(&nn->rpcstats, 0, sizeof(nn->rpcstats));
+	nn->rpcstats.program = &nfs_program;
 
 	nfs_netns_sysfs_setup(nn, net);
 }
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e11e9c34aa56..91b4d811958a 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2426,8 +2426,10 @@ EXPORT_SYMBOL_GPL(nfs_net_id);
 
 static int nfs_net_init(struct net *net)
 {
+	struct nfs_net *nn = net_generic(net, nfs_net_id);
+
 	nfs_clients_init(net);
-	rpc_proc_register(net, &nfs_rpcstat);
+	rpc_proc_register(net, &nn->rpcstats);
 	return nfs_fs_proc_net_init(net);
 }
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index e3722ce6722e..06253695fe53 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -449,8 +449,6 @@ int nfs_try_get_tree(struct fs_context *);
 int nfs_get_tree_common(struct fs_context *);
 void nfs_kill_super(struct super_block *);
 
-extern struct rpc_stat nfs_rpcstat;
-
 extern int __init register_nfs_fs(void);
 extern void __exit unregister_nfs_fs(void);
 extern bool nfs_sb_active(struct super_block *sb);
diff --git a/fs/nfs/netns.h b/fs/nfs/netns.h
index c8374f74dce1..a68b21603ea9 100644
--- a/fs/nfs/netns.h
+++ b/fs/nfs/netns.h
@@ -9,6 +9,7 @@
 #include <linux/nfs4.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#include <linux/sunrpc/stats.h>
 
 struct bl_dev_msg {
 	int32_t status;
@@ -34,6 +35,7 @@ struct nfs_net {
 	struct nfs_netns_client *nfs_client;
 	spinlock_t nfs_client_lock;
 	ktime_t boot_time;
+	struct rpc_stat rpcstats;
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry *proc_nfsfs;
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args
  2024-01-25 19:53 ` [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args Josef Bacik
@ 2024-01-25 20:53   ` Chuck Lever
  2024-01-25 21:54     ` Josef Bacik
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2024-01-25 20:53 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-nfs, kernel-team

On Thu, Jan 25, 2024 at 02:53:15PM -0500, Josef Bacik wrote:
> We want to be able to have our rpc stats handled in a per network
> namespace manner, so add an option to rpc_create_args to specify a
> different rpc_stats struct instead of using the one on the rpc_program.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/nfs/client.c             | 2 +-
>  include/linux/sunrpc/clnt.h | 1 +
>  net/sunrpc/clnt.c           | 2 +-
>  3 files changed, 3 insertions(+), 2 deletions(-)

I know it isn't obvious to an outside observer, but the
maintainership of the NFS server and client is separate.

NFS client patches go To: Trond and Anna, Cc: linux-nfs

NFS server patches go To: Jeff and Chuck, Cc: linux-nfs

and you can Cc: server patches on the reviewers listed in
MAINTAINERS too if you like.


> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 44eca51b2808..590be14f182f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -73,7 +73,7 @@ const struct rpc_program nfs_program = {
>  	.number			= NFS_PROGRAM,
>  	.nrvers			= ARRAY_SIZE(nfs_version),
>  	.version		= nfs_version,
> -	.stats			= &nfs_rpcstat,
> +	.stats                  = &nfs_rpcstat,
>  	.pipe_dir_name		= NFS_PIPE_DIRNAME,
>  };
>  
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index 5e9d1469c6fa..5321585c778f 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -139,6 +139,7 @@ struct rpc_create_args {
>  	const char		*servername;
>  	const char		*nodename;
>  	const struct rpc_program *program;
> +	struct rpc_stat		*stats;
>  	u32			prognumber;	/* overrides program->number */
>  	u32			version;
>  	rpc_authflavor_t	authflavor;
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index cda0935a68c9..bc8c209fc0c7 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -405,7 +405,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
>  	clnt->cl_maxproc  = version->nrprocs;
>  	clnt->cl_prog     = args->prognumber ? : program->number;
>  	clnt->cl_vers     = version->number;
> -	clnt->cl_stats    = program->stats;
> +	clnt->cl_stats    = args->stats ? : program->stats;
>  	clnt->cl_metrics  = rpc_alloc_iostats(clnt);
>  	rpc_init_pipe_dir_head(&clnt->cl_pipedir_objects);
>  	err = -ENOMEM;
> -- 
> 2.43.0
> 
> 

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create*
  2024-01-25 19:53 ` [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create* Josef Bacik
@ 2024-01-25 20:56   ` Chuck Lever
  2024-01-25 21:56     ` Josef Bacik
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2024-01-25 20:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-nfs, kernel-team

On Thu, Jan 25, 2024 at 02:53:13PM -0500, Josef Bacik wrote:
> Since only one service actually reports the rpc stats there's not much
> of a reason to have a pointer to it in the svc_program struct.  Adjust
> the svc_create* functions to take the sv_stats as an argument and pass
> the struct through there as desired instead of getting it from the
> svc_program->pg_stats.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/lockd/svc.c             |  2 +-
>  fs/nfs/callback.c          |  2 +-
>  fs/nfsd/nfssvc.c           |  3 ++-
>  include/linux/sunrpc/svc.h |  8 ++++----
>  net/sunrpc/svc.c           | 17 ++++++++++-------
>  5 files changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index ab8042a5b895..8fbbfc9aad69 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -337,7 +337,7 @@ static int lockd_get(void)
>  		nlm_timeout = LOCKD_DFLT_TIMEO;
>  	nlmsvc_timeout = nlm_timeout * HZ;
>  
> -	serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, lockd);
> +	serv = svc_create(&nlmsvc_program, NULL, LOCKD_BUFSIZE, lockd);
>  	if (!serv) {
>  		printk(KERN_WARNING "lockd_up: create service failed\n");
>  		return -ENOMEM;
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index 8adfcd4c8c1a..4d56b4e73525 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -202,7 +202,7 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
>  	if (minorversion)
>  		return ERR_PTR(-ENOTSUPP);
>  #endif
> -	serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE,
> +	serv = svc_create(&nfs4_callback_program, NULL, NFS4_CALLBACK_BUFSIZE,
>  			  threadfn);
>  	if (!serv) {
>  		printk(KERN_ERR "nfs_callback_create_svc: create service failed\n");
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index a0b117107e86..d640f893021a 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -661,7 +661,8 @@ int nfsd_create_serv(struct net *net)
>  	if (nfsd_max_blksize == 0)
>  		nfsd_max_blksize = nfsd_get_default_max_blksize();
>  	nfsd_reset_versions(nn);
> -	serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd);
> +	serv = svc_create_pooled(&nfsd_program, &nfsd_svcstats,
> +				 nfsd_max_blksize, nfsd);
>  	if (serv == NULL)
>  		return -ENOMEM;
>  
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 67cf1c9efd80..2a1447fa5ef2 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -402,8 +402,8 @@ struct svc_procedure {
>  int svc_rpcb_setup(struct svc_serv *serv, struct net *net);
>  void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net);
>  int svc_bind(struct svc_serv *serv, struct net *net);
> -struct svc_serv *svc_create(struct svc_program *, unsigned int,
> -			    int (*threadfn)(void *data));
> +struct svc_serv *svc_create(struct svc_program *, struct svc_stat *,
> +			    unsigned int, int (*threadfn)(void *data));
>  struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
>  					struct svc_pool *pool, int node);
>  bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
> @@ -411,8 +411,8 @@ bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
>  void		   svc_rqst_release_pages(struct svc_rqst *rqstp);
>  void		   svc_rqst_free(struct svc_rqst *);
>  void		   svc_exit_thread(struct svc_rqst *);
> -struct svc_serv *  svc_create_pooled(struct svc_program *, unsigned int,
> -				     int (*threadfn)(void *data));
> +struct svc_serv *  svc_create_pooled(struct svc_program *, struct svc_stat *,
> +				     unsigned int, int (*threadfn)(void *data));
>  int		   svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
>  int		   svc_pool_stats_open(struct svc_info *si, struct file *file);
>  void		   svc_process(struct svc_rqst *rqstp);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index d2e6f3d59218..f76ef8a3dd43 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -451,8 +451,8 @@ __svc_init_bc(struct svc_serv *serv)
>   * Create an RPC service
>   */
>  static struct svc_serv *
> -__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
> -	     int (*threadfn)(void *data))
> +__svc_create(struct svc_program *prog, struct svc_stat *stats,
> +	     unsigned int bufsize, int npools, int (*threadfn)(void *data))
>  {
>  	struct svc_serv	*serv;
>  	unsigned int vers;
> @@ -463,7 +463,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
>  		return NULL;
>  	serv->sv_name      = prog->pg_name;
>  	serv->sv_program   = prog;
> -	serv->sv_stats     = prog->pg_stats;
> +	serv->sv_stats     = stats;
>  	if (bufsize > RPCSVC_MAXPAYLOAD)
>  		bufsize = RPCSVC_MAXPAYLOAD;
>  	serv->sv_max_payload = bufsize? bufsize : 4096;
> @@ -521,34 +521,37 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
>  /**
>   * svc_create - Create an RPC service
>   * @prog: the RPC program the new service will handle
> + * @stats: the stats struct if desired
>   * @bufsize: maximum message size for @prog
>   * @threadfn: a function to service RPC requests for @prog
>   *
>   * Returns an instantiated struct svc_serv object or NULL.
>   */

Here's the only minor quibble I have so far:

svc_create's callers don't use stats, so maybe you don't need
to add an @stats argument for this API.
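
Untested sketch of what I mean -- svc_create() keeps its current
arguments and just passes a NULL stats pointer down to your new
__svc_create():

/* Sketch only: none of svc_create()'s callers collect per-service stats. */
struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
			    int (*threadfn)(void *data))
{
	return __svc_create(prog, NULL, bufsize, 1, threadfn);
}
EXPORT_SYMBOL_GPL(svc_create);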

Fwiw, I haven't gotten all the way to the end of the series yet.


> -struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
> -			    int (*threadfn)(void *data))
> +struct svc_serv *svc_create(struct svc_program *prog, struct svc_stat *stats,
> +			    unsigned int bufsize, int (*threadfn)(void *data))
>  {
> -	return __svc_create(prog, bufsize, 1, threadfn);
> +	return __svc_create(prog, stats, bufsize, 1, threadfn);
>  }
>  EXPORT_SYMBOL_GPL(svc_create);
>  
>  /**
>   * svc_create_pooled - Create an RPC service with pooled threads
>   * @prog: the RPC program the new service will handle
> + * @stats: the stats struct if desired
>   * @bufsize: maximum message size for @prog
>   * @threadfn: a function to service RPC requests for @prog
>   *
>   * Returns an instantiated struct svc_serv object or NULL.
>   */
>  struct svc_serv *svc_create_pooled(struct svc_program *prog,
> +				   struct svc_stat *stats,
>  				   unsigned int bufsize,
>  				   int (*threadfn)(void *data))
>  {
>  	struct svc_serv *serv;
>  	unsigned int npools = svc_pool_map_get();
>  
> -	serv = __svc_create(prog, bufsize, npools, threadfn);
> +	serv = __svc_create(prog, stats, bufsize, npools, threadfn);
>  	if (!serv)
>  		goto out_err;
>  	return serv;
> -- 
> 2.43.0
> 
> 

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-25 19:53 ` [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net Josef Bacik
@ 2024-01-25 21:01   ` Chuck Lever
  2024-01-25 21:56     ` Josef Bacik
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever @ 2024-01-25 21:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-nfs, kernel-team

On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> This is the last global stat, move it into nfsd_net and adjust all the
> users to use that variant instead of the global one.

Hm. I thought nfsd threads were a global resource -- they service
all network namespaces. So, shouldn't the same thread count be
surfaced to all containers? Won't they all see all of the nfsd
processes?


> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/nfsd/netns.h  | 3 +++
>  fs/nfsd/nfssvc.c | 4 ++--
>  fs/nfsd/stats.c  | 4 ++--
>  fs/nfsd/stats.h  | 6 ------
>  4 files changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 0cef4bb407a9..8d3f4cb7cab4 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -179,6 +179,9 @@ struct nfsd_net {
>  	/* Per-netns stats counters */
>  	struct percpu_counter    counter[NFSD_STATS_COUNTERS_NUM];
>  
> +	/* number of available threads */
> +	atomic_t                 th_cnt;
> +
>  	/* longest hash chain seen */
>  	unsigned int             longest_chain;
>  
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index d98a6abad990..0961b95dcef6 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -924,7 +924,7 @@ nfsd(void *vrqstp)
>  
>  	current->fs->umask = 0;
>  
> -	atomic_inc(&nfsdstats.th_cnt);
> +	atomic_inc(&nn->th_cnt);
>  
>  	set_freezable();
>  
> @@ -940,7 +940,7 @@ nfsd(void *vrqstp)
>  		nfsd_file_net_dispose(nn);
>  	}
>  
> -	atomic_dec(&nfsdstats.th_cnt);
> +	atomic_dec(&nn->th_cnt);
>  
>  out:
>  	/* Release the thread */
> diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c
> index 44e275324b06..360e6dbf4e5c 100644
> --- a/fs/nfsd/stats.c
> +++ b/fs/nfsd/stats.c
> @@ -27,7 +27,6 @@
>  
>  #include "nfsd.h"
>  
> -struct nfsd_stats	nfsdstats;
>  struct svc_stat		nfsd_svcstats = {
>  	.program	= &nfsd_program,
>  };
> @@ -47,7 +46,7 @@ static int nfsd_show(struct seq_file *seq, void *v)
>  		   percpu_counter_sum_positive(&nn->counter[NFSD_STATS_IO_WRITE]));
>  
>  	/* thread usage: */
> -	seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt));
> +	seq_printf(seq, "th %u 0", atomic_read(&nn->th_cnt));
>  
>  	/* deprecated thread usage histogram stats */
>  	for (i = 0; i < 10; i++)
> @@ -112,6 +111,7 @@ void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num)
>  
>  int nfsd_stat_counters_init(struct nfsd_net *nn)
>  {
> +	atomic_set(&nn->th_cnt, 0);
>  	return nfsd_percpu_counters_init(nn->counter, NFSD_STATS_COUNTERS_NUM);
>  }
>  
> diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h
> index c24be4ddbe7d..5675d283a537 100644
> --- a/fs/nfsd/stats.h
> +++ b/fs/nfsd/stats.h
> @@ -10,12 +10,6 @@
>  #include <uapi/linux/nfsd/stats.h>
>  #include <linux/percpu_counter.h>
>  
> -struct nfsd_stats {
> -	atomic_t	th_cnt;		/* number of available threads */
> -};
> -
> -extern struct nfsd_stats	nfsdstats;
> -
>  extern struct svc_stat		nfsd_svcstats;
>  
>  int nfsd_percpu_counters_init(struct percpu_counter *counters, int num);
> -- 
> 2.43.0
> 
> 

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args
  2024-01-25 20:53   ` Chuck Lever
@ 2024-01-25 21:54     ` Josef Bacik
  2024-01-25 22:30       ` Jeff Layton
  2024-01-26 13:49       ` Chuck Lever III
  0 siblings, 2 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 21:54 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, kernel-team

On Thu, Jan 25, 2024 at 03:53:24PM -0500, Chuck Lever wrote:
> On Thu, Jan 25, 2024 at 02:53:15PM -0500, Josef Bacik wrote:
> > We want to be able to have our rpc stats handled in a per network
> > namespace manner, so add an option to rpc_create_args to specify a
> > different rpc_stats struct instead of using the one on the rpc_program.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/nfs/client.c             | 2 +-
> >  include/linux/sunrpc/clnt.h | 1 +
> >  net/sunrpc/clnt.c           | 2 +-
> >  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> I know it isn't obvious to an outside observer, but the
> maintainership of the NFS server and client is separate.
> 
> NFS client patches go To: Trond and Anna, Cc: linux-nfs
> 
> NFS server patches go To: Jeff and Chuck, Cc: linux-nfs
> 
> and you can Cc: server patches on the reviewers listed in
> MAINTAINERS too if you like.

Sounds good, should I split the series up completely?  I think the nfsd and nfs
sunrpc related changes are unique to either stack so I can split it out into two
different series if that would help.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-25 21:01   ` Chuck Lever
@ 2024-01-25 21:56     ` Josef Bacik
  2024-01-26 13:01       ` Jeff Layton
  0 siblings, 1 reply; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 21:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, kernel-team

On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > This is the last global stat, move it into nfsd_net and adjust all the
> > users to use that variant instead of the global one.
> 
> Hm. I thought nfsd threads were a global resource -- they service
> all network namespaces. So, shouldn't the same thread count be
> surfaced to all containers? Won't they all see all of the nfsd
> processes?
> 

I don't think we want the network namespaces seeing how many threads exist in
the entire system right?

Additionally it appears that we can have multiple threads per network namespace,
so it's not like this will just show 1 for each individual nn, it'll show
however many threads have been configured for that nfsd in that network
namespace.

I'm good either way, but it makes sense to me to only surface the network
namespace related thread count.  I could probably have a global counter and only
surface the global counter if net == &init_net.  Let me know what you prefer.
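
E.g. something like this hypothetical variant of the nfsd_show() thread
line (sketch only; the global nfsd_th_cnt counter is assumed and doesn't
exist in this series):

	/* host-wide view in init_net, per-namespace view everywhere else */
	if (net == &init_net)
		seq_printf(seq, "th %u 0", atomic_read(&nfsd_th_cnt));
	else
		seq_printf(seq, "th %u 0", atomic_read(&nn->th_cnt));
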
Thanks,

Josef

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create*
  2024-01-25 20:56   ` Chuck Lever
@ 2024-01-25 21:56     ` Josef Bacik
  0 siblings, 0 replies; 30+ messages in thread
From: Josef Bacik @ 2024-01-25 21:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, kernel-team

On Thu, Jan 25, 2024 at 03:56:00PM -0500, Chuck Lever wrote:
> On Thu, Jan 25, 2024 at 02:53:13PM -0500, Josef Bacik wrote:
> > Since only one service actually reports the rpc stats there's not much
> > of a reason to have a pointer to it in the svc_program struct.  Adjust
> > the svc_create* functions to take the sv_stats as an argument and pass
> > the struct through there as desired instead of getting it from the
> > svc_program->pg_stats.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/lockd/svc.c             |  2 +-
> >  fs/nfs/callback.c          |  2 +-
> >  fs/nfsd/nfssvc.c           |  3 ++-
> >  include/linux/sunrpc/svc.h |  8 ++++----
> >  net/sunrpc/svc.c           | 17 ++++++++++-------
> >  5 files changed, 18 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index ab8042a5b895..8fbbfc9aad69 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -337,7 +337,7 @@ static int lockd_get(void)
> >  		nlm_timeout = LOCKD_DFLT_TIMEO;
> >  	nlmsvc_timeout = nlm_timeout * HZ;
> >  
> > -	serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, lockd);
> > +	serv = svc_create(&nlmsvc_program, NULL, LOCKD_BUFSIZE, lockd);
> >  	if (!serv) {
> >  		printk(KERN_WARNING "lockd_up: create service failed\n");
> >  		return -ENOMEM;
> > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > index 8adfcd4c8c1a..4d56b4e73525 100644
> > --- a/fs/nfs/callback.c
> > +++ b/fs/nfs/callback.c
> > @@ -202,7 +202,7 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
> >  	if (minorversion)
> >  		return ERR_PTR(-ENOTSUPP);
> >  #endif
> > -	serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE,
> > +	serv = svc_create(&nfs4_callback_program, NULL, NFS4_CALLBACK_BUFSIZE,
> >  			  threadfn);
> >  	if (!serv) {
> >  		printk(KERN_ERR "nfs_callback_create_svc: create service failed\n");
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index a0b117107e86..d640f893021a 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -661,7 +661,8 @@ int nfsd_create_serv(struct net *net)
> >  	if (nfsd_max_blksize == 0)
> >  		nfsd_max_blksize = nfsd_get_default_max_blksize();
> >  	nfsd_reset_versions(nn);
> > -	serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd);
> > +	serv = svc_create_pooled(&nfsd_program, &nfsd_svcstats,
> > +				 nfsd_max_blksize, nfsd);
> >  	if (serv == NULL)
> >  		return -ENOMEM;
> >  
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index 67cf1c9efd80..2a1447fa5ef2 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -402,8 +402,8 @@ struct svc_procedure {
> >  int svc_rpcb_setup(struct svc_serv *serv, struct net *net);
> >  void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net);
> >  int svc_bind(struct svc_serv *serv, struct net *net);
> > -struct svc_serv *svc_create(struct svc_program *, unsigned int,
> > -			    int (*threadfn)(void *data));
> > +struct svc_serv *svc_create(struct svc_program *, struct svc_stat *,
> > +			    unsigned int, int (*threadfn)(void *data));
> >  struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv,
> >  					struct svc_pool *pool, int node);
> >  bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
> > @@ -411,8 +411,8 @@ bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
> >  void		   svc_rqst_release_pages(struct svc_rqst *rqstp);
> >  void		   svc_rqst_free(struct svc_rqst *);
> >  void		   svc_exit_thread(struct svc_rqst *);
> > -struct svc_serv *  svc_create_pooled(struct svc_program *, unsigned int,
> > -				     int (*threadfn)(void *data));
> > +struct svc_serv *  svc_create_pooled(struct svc_program *, struct svc_stat *,
> > +				     unsigned int, int (*threadfn)(void *data));
> >  int		   svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
> >  int		   svc_pool_stats_open(struct svc_info *si, struct file *file);
> >  void		   svc_process(struct svc_rqst *rqstp);
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index d2e6f3d59218..f76ef8a3dd43 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -451,8 +451,8 @@ __svc_init_bc(struct svc_serv *serv)
> >   * Create an RPC service
> >   */
> >  static struct svc_serv *
> > -__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
> > -	     int (*threadfn)(void *data))
> > +__svc_create(struct svc_program *prog, struct svc_stat *stats,
> > +	     unsigned int bufsize, int npools, int (*threadfn)(void *data))
> >  {
> >  	struct svc_serv	*serv;
> >  	unsigned int vers;
> > @@ -463,7 +463,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
> >  		return NULL;
> >  	serv->sv_name      = prog->pg_name;
> >  	serv->sv_program   = prog;
> > -	serv->sv_stats     = prog->pg_stats;
> > +	serv->sv_stats     = stats;
> >  	if (bufsize > RPCSVC_MAXPAYLOAD)
> >  		bufsize = RPCSVC_MAXPAYLOAD;
> >  	serv->sv_max_payload = bufsize? bufsize : 4096;
> > @@ -521,34 +521,37 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
> >  /**
> >   * svc_create - Create an RPC service
> >   * @prog: the RPC program the new service will handle
> > + * @stats: the stats struct if desired
> >   * @bufsize: maximum message size for @prog
> >   * @threadfn: a function to service RPC requests for @prog
> >   *
> >   * Returns an instantiated struct svc_serv object or NULL.
> >   */
> 
> Here's the only minor quibble I have so far:
> 
> svc_create's callers don't use stats, so maybe you don't need
> to add an @stats argument for this API.
> 
> Fwiw, I haven't gotten all the way to the end of the series yet.

Yup you're right, I can drop this bit.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args
  2024-01-25 21:54     ` Josef Bacik
@ 2024-01-25 22:30       ` Jeff Layton
  2024-01-26 13:49       ` Chuck Lever III
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2024-01-25 22:30 UTC (permalink / raw)
  To: Josef Bacik, Chuck Lever; +Cc: linux-nfs, kernel-team

On Thu, 2024-01-25 at 16:54 -0500, Josef Bacik wrote:
> On Thu, Jan 25, 2024 at 03:53:24PM -0500, Chuck Lever wrote:
> > On Thu, Jan 25, 2024 at 02:53:15PM -0500, Josef Bacik wrote:
> > > We want to be able to have our rpc stats handled in a per network
> > > namespace manner, so add an option to rpc_create_args to specify a
> > > different rpc_stats struct instead of using the one on the rpc_program.
> > > 
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > ---
> > >  fs/nfs/client.c             | 2 +-
> > >  include/linux/sunrpc/clnt.h | 1 +
> > >  net/sunrpc/clnt.c           | 2 +-
> > >  3 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > I know it isn't obvious to an outside observer, but the
> > maintainership of the NFS server and client is separate.
> > 
> > NFS client patches go To: Trond and Anna, Cc: linux-nfs
> > 
> > NFS server patches go To: Jeff and Chuck, Cc: linux-nfs
> > 
> > and you can Cc: server patches on the reviewers listed in
> > MAINTAINERS too if you like.
> 
> Sounds good, should I split the series up completely?  I think the nfsd and nfs
> sunrpc related changes are unique to either stack so I can split it out into two
> different series if that would help.  Thanks,
> 

Yes, that'd probably be best. The client-side changes will probably go
in via Trond or Anna, and Chuck will (likely) take the server-side
changes.

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-25 21:56     ` Josef Bacik
@ 2024-01-26 13:01       ` Jeff Layton
  2024-01-26 13:48         ` Chuck Lever III
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Layton @ 2024-01-26 13:01 UTC (permalink / raw)
  To: Josef Bacik, Chuck Lever; +Cc: linux-nfs, kernel-team

On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
> On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> > On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > > This is the last global stat, move it into nfsd_net and adjust all the
> > > users to use that variant instead of the global one.
> > 
> > Hm. I thought nfsd threads were a global resource -- they service
> > all network namespaces. So, shouldn't the same thread count be
> > surfaced to all containers? Won't they all see all of the nfsd
> > processes?
> > 
> 

Each container is going to start /proc/fs/nfsd/threads number of threads
regardless. I hadn't actually grokked that they just get tossed onto the
pile of threads that service requests.

Is it possible for one container to start a small number of threads but
have its client load be such that it spills over and ends up stealing
threads from other containers?

> I don't think we want the network namespaces seeing how many threads exist in
> the entire system right?
> 
> Additionally it appears that we can have multiple threads per network namespace,
> so it's not like this will just show 1 for each individual nn, it'll show
> however many threads have been configured for that nfsd in that network
> namespace.
> 
> I'm good either way, but it makes sense to me to only surface the network
> namespace related thread count.  I could probably have a global counter and only
> surface the global counter if net == &init_net.  Let me know what you prefer.
> Thanks,
> 

The old stat meant "number of threads that are currently busy". The new
one will mean "number of threads that are busy servicing requests in
this namespace". The latter seems more useful on a per-ns basis.

In fact, it might be interesting to throw out some kind of warning when
the th_cnt exceeds the number of threads that you've allocated in the
container. That might be an indicator that you didn't spin up enough and
are poaching from other containers.

Or... maybe we should be queueing the RPC when that happens so that you
can't use more than you've allocated for the container?
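
For the warning, a completely untested sketch of what that check could
look like in nfsd(), next to the existing th_cnt accounting (assuming
th_cnt ends up meaning "threads currently busy in this net" and that
nn->nfsd_serv is valid at that point):

	/*
	 * Hypothetical: more of this net's RPCs are in flight than the
	 * threads it configured, i.e. we're poaching threads that some
	 * other container started.
	 */
	if (atomic_read(&nn->th_cnt) > nn->nfsd_serv->sv_nrthreads)
		pr_warn_ratelimited("nfsd: thread usage exceeds configured thread count in this net namespace\n");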
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns
  2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
                   ` (12 preceding siblings ...)
  2024-01-25 19:53 ` [PATCH v2 13/13] nfs: make the rpc_stat per net namespace Josef Bacik
@ 2024-01-26 13:12 ` Jeff Layton
  13 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2024-01-26 13:12 UTC (permalink / raw)
  To: Josef Bacik, linux-nfs, kernel-team

On Thu, 2024-01-25 at 14:53 -0500, Josef Bacik wrote:
> v1: https://lore.kernel.org/linux-nfs/cover.1706124811.git.josef@toxicpanda.com/
> 
> v1->v2:
> - rework the sunrpc service creation to take a pointer to the sv_stats.
> - dropped ->pg_stats from the svc_program.
> - converted all of the nfsd global stats to per-network namespace.
> - added the ability to point at a specific rpc_stat for rpc program creation.
> - converted the rpc stats for nfs to per-network namespace.
>
> -- Original email --
> Hello,
> 
> We're currently deploying NFS internally and have run into some oddities with
> our usage of containers.  All of the services that mount and export NFS volumes
> run inside of containers, specifically all the namespaces including network
> namespaces.  Our monitoring is done on a per-container basis, so we need access
> to the nfs and nfsd stats that are under /proc/net/sunrpc.  However these are
> only tied to the init_net, which makes them invisible to containers in a
> different network namespace.
> 
> Fix this so that these files are tied to the network namespace.  This allows us
> to avoid the hack of bind mounting the hosts /proc into the container in order
> to do proper monitoring.  Thanks,
> 
> Josef
> 
> Josef Bacik (13):
>   sunrpc: don't change ->sv_stats if it doesn't exist
>   nfs: stop setting ->pg_stats for unused stats
>   sunrpc: pass in the sv_stats struct through svc_create*
>   sunrpc: remove ->pg_stats from svc_program
>   sunrpc: add a struct rpc_stats arg to rpc_create_args
>   sunrpc: use the struct net as the svc proc private
>   nfsd: rename NFSD_NET_* to NFSD_STATS_*
>   nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
>   nfsd: make all of the nfsd stats per-network namespace
>   nfsd: move th_cnt into nfsd_net
>   nfsd: make svc_stat per-network namespace instead of global
>   nfs: expose /proc/net/sunrpc/nfs in net namespaces
>   nfs: make the rpc_stat per net namespace
> 
>  fs/lockd/svc.c              |  5 +--
>  fs/nfs/callback.c           |  5 +--
>  fs/nfs/client.c             |  5 ++-
>  fs/nfs/inode.c              |  8 ++---
>  fs/nfs/internal.h           |  2 --
>  fs/nfs/netns.h              |  2 ++
>  fs/nfsd/cache.h             |  2 --
>  fs/nfsd/netns.h             | 28 ++++++++++++---
>  fs/nfsd/nfs4proc.c          |  6 ++--
>  fs/nfsd/nfs4state.c         |  3 +-
>  fs/nfsd/nfscache.c          | 40 +++++----------------
>  fs/nfsd/nfsctl.c            | 16 ++++-----
>  fs/nfsd/nfsfh.c             |  3 +-
>  fs/nfsd/nfssvc.c            | 13 +++----
>  fs/nfsd/stats.c             | 53 ++++++++++++----------------
>  fs/nfsd/stats.h             | 70 +++++++++++++------------------------
>  fs/nfsd/vfs.c               |  5 +--
>  include/linux/sunrpc/clnt.h |  1 +
>  include/linux/sunrpc/svc.h  |  9 +++--
>  net/sunrpc/clnt.c           |  2 +-
>  net/sunrpc/stats.c          |  2 +-
>  net/sunrpc/svc.c            | 44 ++++++++++++++---------
>  22 files changed, 147 insertions(+), 177 deletions(-)
> 

Nice work!

I took a look and this all looks good to me, modulo the nits that Chuck
pointed out yesterday. I'd also plan to split this into client and
server sets since they'll probably go in separately. Otherwise though,
this looks good to me and you can add:

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-26 13:01       ` Jeff Layton
@ 2024-01-26 13:48         ` Chuck Lever III
  2024-01-26 14:08           ` Jeff Layton
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever III @ 2024-01-26 13:48 UTC (permalink / raw)
  To: Jeff Layton, Josef Bacik; +Cc: Linux NFS Mailing List, kernel-team@fb.com



> On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
>> On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
>>> On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
>>>> This is the last global stat, move it into nfsd_net and adjust all the
>>>> users to use that variant instead of the global one.
>>> 
>>> Hm. I thought nfsd threads were a global resource -- they service
>>> all network namespaces. So, shouldn't the same thread count be
>>> surfaced to all containers? Won't they all see all of the nfsd
>>> processes?
> 
> Each container is going to start /proc/fs/nfsd/threads number of threads
> regardless. I hadn't actually grokked that they just get tossed onto the
> pile of threads that service requests.
> 
> Is it possible for one container to start a small number of threads but
> have its client load be such that it spills over and ends up stealing
> threads from other containers?

I haven't seen any code that manages resources based on namespace,
except in filecache.c to restrict writeback per namespace.

My impression is that any nfsd thread can serve any namespace. I'm
not sure it is currently meaningful for a particular net namespace to
"create" more threads.

If someone would like that level of control, we could implement a
cgroup mechanism and have one or more separate svc_pools per net
namespace, maybe? </hand wave>


>> I don't think we want the network namespaces seeing how many threads exist in
>> the entire system right?

If someone in a non-init net namespace does a "pgrep -c nfsd" don't
they see the total nfsd thread count for the host?


>> Additionally it appears that we can have multiple threads per network namespace,
>> so it's not like this will just show 1 for each individual nn, it'll show
>> however many threads have been configured for that nfsd in that network
>> namespace.

I've never tried this, so I'm speculating. But it seems like for
now, because all nfsd threads can serve all namespaces, they should
all see the global thread count stat.

Then later we can refine it.


>> I'm good either way, but it makes sense to me to only surface the network
>> namespace related thread count.  I could probably have a global counter and only
>> surface the global counter if net == &init_net.  Let me know what you prefer.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args
  2024-01-25 21:54     ` Josef Bacik
  2024-01-25 22:30       ` Jeff Layton
@ 2024-01-26 13:49       ` Chuck Lever III
  1 sibling, 0 replies; 30+ messages in thread
From: Chuck Lever III @ 2024-01-26 13:49 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Linux NFS Mailing List, kernel-team@fb.com



> On Jan 25, 2024, at 4:54 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> 
> On Thu, Jan 25, 2024 at 03:53:24PM -0500, Chuck Lever wrote:
>> On Thu, Jan 25, 2024 at 02:53:15PM -0500, Josef Bacik wrote:
>>> We want to be able to have our rpc stats handled in a per network
>>> namespace manner, so add an option to rpc_create_args to specify a
>>> different rpc_stats struct instead of using the one on the rpc_program.
>>> 
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>> fs/nfs/client.c             | 2 +-
>>> include/linux/sunrpc/clnt.h | 1 +
>>> net/sunrpc/clnt.c           | 2 +-
>>> 3 files changed, 3 insertions(+), 2 deletions(-)
>> 
>> I know it isn't obvious to an outside observer, but the
>> maintainership of the NFS server and client is separate.
>> 
>> NFS client patches go To: Trond and Anna, Cc: linux-nfs
>> 
>> NFS server patches go To: Jeff and Chuck, Cc: linux-nfs
>> 
>> and you can Cc: server patches on the reviewers listed in
>> MAINTAINERS too if you like.
> 
> Sounds good, should I split the series up completely?  I think the nfsd and nfs
> sunrpc related changes are unique to either stack

That's what it looks like to me too.


> so I can split it out into two different series if that would help.

Let's do that for v3.

--
Chuck Lever



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-26 13:48         ` Chuck Lever III
@ 2024-01-26 14:08           ` Jeff Layton
  2024-01-26 14:27             ` Chuck Lever III
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Layton @ 2024-01-26 14:08 UTC (permalink / raw)
  To: Chuck Lever III, Josef Bacik; +Cc: Linux NFS Mailing List, kernel-team@fb.com

On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
> 
> > On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
> > > On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> > > > On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > > > > This is the last global stat, move it into nfsd_net and adjust all the
> > > > > users to use that variant instead of the global one.
> > > > 
> > > > Hm. I thought nfsd threads were a global resource -- they service
> > > > all network namespaces. So, shouldn't the same thread count be
> > > > surfaced to all containers? Won't they all see all of the nfsd
> > > > processes?
> > 
> > Each container is going to start /proc/fs/nfsd/threads number of threads
> > regardless. I hadn't actually grokked that they just get tossed onto the
> > pile of threads that service requests.
> > 
> > Is it possible for one container to start a small number of threads but
> > have its client load be such that it spills over and ends up stealing
> > threads from other containers?
> 
> I haven't seen any code that manages resources based on namespace,
> except in filecache.c to restrict writeback per namespace.
> 
> My impression is that any nfsd thread can serve any namespace. I'm
> not sure it is currently meaningful for a particular net namespace to
> "create" more threads.
> 
> If someone would like that level of control, we could implement a
> cgroup mechanism and have one or more separate svc_pools per net
> namespace, maybe? </hand wave>
> 

AFAICT, the total number of threads on the system will be the sum of the
threads started in each of the containers. They do just go into a big
pile, and whichever one wakes up will service the request, so the
threads aren't associated with the netns, per-se. The svc_rqst's however
_are_ per-netns. So, I don't see anything that ensures that a container
doesn't exceed the number of threads it started on its own behalf.

<hand wave>
I'm not sure we'd need to tie this in to cgroups. Now that Josef is
moving some of these key structures to be per-net, it should be fairly
simple to have nfsd() just look at the th_cnt and the thread count in
the current namespace, and just enqueue the RPC rather than doing it?
</hand wave>

OTOH, maybe I'm overly concerned here.


> 
> > > I don't think we want the network namespaces seeing how many threads exist in
> > > the entire system right?
> 
> If someone in a non-init net namespace does a "pgrep -c nfsd" don't
> they see the total nfsd thread count for the host?
> 

Yes, they're kernel threads and they aren't associated with a particular
pid namespace.

> 
> > > Additionally it appears that we can have multiple threads per network namespace,
> > > so it's not like this will just show 1 for each individual nn, it'll show
> > > however many threads have been configured for that nfsd in that network
> > > namespace.
> 
> I've never tried this, so I'm speculating. But it seems like for
> now, because all nfsd threads can serve all namespaces, they should
> all see the global thread count stat.
> 
> Then later we can refine it.
> 

I don't think that info is particularly useful though, and it certainly
breaks expectations wrt container isolation.

Look at it this way:

Suppose I have access to a container and I spin up nfsd with a
particular number of threads. I now want to know "did I spin up enough
threads?" By making this per-namespace as Josef suggests it should be
fairly simple to tell whether my clients are regularly overrunning the
threads I started. With this info as global, I have no idea what netns
the RPCs being counted are against. I can't do anything with that info.

> 
> > > I'm good either way, but it makes sense to me to only surface the network
> > > namespace related thread count.  I could probably have a global counter and only
> > > surface the global counter if net == &init_net.  Let me know what you prefer.
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-26 14:08           ` Jeff Layton
@ 2024-01-26 14:27             ` Chuck Lever III
  2024-01-26 15:03               ` Jeff Layton
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever III @ 2024-01-26 14:27 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Josef Bacik, Linux NFS Mailing List, kernel-team@fb.com



> On Jan 26, 2024, at 9:08 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
>> 
>>> On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>> 
>>> On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
>>>> On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
>>>>> On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
>>>>>> This is the last global stat, move it into nfsd_net and adjust all the
>>>>>> users to use that variant instead of the global one.
>>>>> 
>>>>> Hm. I thought nfsd threads were a global resource -- they service
>>>>> all network namespaces. So, shouldn't the same thread count be
>>>>> surfaced to all containers? Won't they all see all of the nfsd
>>>>> processes?
>>> 
>>> Each container is going to start /proc/fs/nfsd/threads number of threads
>>> regardless. I hadn't actually grokked that they just get tossed onto the
>>> pile of threads that service requests.
>>> 
>>> Is it possible for one container to start a small number of threads but
>>> have its client load be such that it spills over and ends up stealing
>>> threads from other containers?
>> 
>> I haven't seen any code that manages resources based on namespace,
>> except in filecache.c to restrict writeback per namespace.
>> 
>> My impression is that any nfsd thread can serve any namespace. I'm
>> not sure it is currently meaningful for a particular net namespace to
>> "create" more threads.
>> 
>> If someone would like that level of control, we could implement a
>> cgroup mechanism and have one or more separate svc_pools per net
>> namespace, maybe? </hand wave>
>> 
> 
> AFAICT, the total number of threads on the system will be the sum of the
> threads started in each of the containers. They do just go into a big
> pile, and whichever one wakes up will service the request, so the
> threads aren't associated with the netns, per-se. The svc_rqst's however
> _are_ per-netns. So, I don't see anything that ensures that a container
> doesn't exceed the number of threads it started on its own behalf.
> 
> <hand wave>
> I'm not sure we'd need to tie this in to cgroups.

Not a need, but cgroups are typically the way that such
resource constraints are managed, that's all.


> Now that Josef is
> moving some of these key structures to be per-net, it should be fairly
> simple to have nfsd() just look at the th_cnt and the thread count in
> the current namespace, and just enqueue the RPC rather than doing it?
> </hand wave>
> 
> OTOH, maybe I'm overly concerned here.
> 
> 
>> 
>>>> I don't think we want the network namespaces seeing how many threads exist in
>>>> the entire system right?
>> 
>> If someone in a non-init net namespace does a "pgrep -c nfsd" don't
>> they see the total nfsd thread count for the host?
>> 
> 
> Yes, they're kernel threads and they aren't associated with a particular
> pid namespace.
> 
>> 
>>>> Additionally it appears that we can have multiple threads per network namespace,
>>>> so it's not like this will just show 1 for each individual nn, it'll show
>>>> however many threads have been configured for that nfsd in that network
>>>> namespace.
>> 
>> I've never tried this, so I'm speculating. But it seems like for
>> now, because all nfsd threads can serve all namespaces, they should
>> all see the global thread count stat.
>> 
>> Then later we can refine it.
>> 
> 
> I don't think that info is particularly useful though, and it certainly
> breaks expectations wrt container isolation.
> 
> Look at it this way:
> 
> Suppose I have access to a container and I spin up nfsd with a
> particular number of threads. I now want to know "did I spin up enough
> threads?"

It makes sense to me for a container to ask for one or more
NFSD listeners. But I'm not yet clear on what it means for
a container to try to alter the NFSD thread count, which is
currently global.


> By making this per-namespace as Josef suggests it should be
> fairly simple to tell whether my clients are regularly overrunning the
> threads I started. With this info as global, I have no idea what netns
> the RPCs being counted are against. I can't do anything with that info.

That's true. I'm just not sure we need to do anything
significant about it as part of this patch series.

I'm not at all advocating leaving it like this. I agree it
needs updating.

For now, just fill in that thread count stat with the global
thread count, and later when we have a better concept for
what it means to "adjust the nfsd thread count from inside a
container" we can come back and make it make sense.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-26 14:27             ` Chuck Lever III
@ 2024-01-26 15:03               ` Jeff Layton
  2024-01-26 15:16                 ` Chuck Lever III
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Layton @ 2024-01-26 15:03 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Josef Bacik, Linux NFS Mailing List, kernel-team@fb.com

On Fri, 2024-01-26 at 14:27 +0000, Chuck Lever III wrote:
> 
> > On Jan 26, 2024, at 9:08 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
> > > 
> > > > On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
> > > > > On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> > > > > > On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > > > > > > This is the last global stat, move it into nfsd_net and adjust all the
> > > > > > > users to use that variant instead of the global one.
> > > > > > 
> > > > > > Hm. I thought nfsd threads were a global resource -- they service
> > > > > > all network namespaces. So, shouldn't the same thread count be
> > > > > > surfaced to all containers? Won't they all see all of the nfsd
> > > > > > processes?
> > > > 
> > > > Each container is going to start /proc/fs/nfsd/threads number of threads
> > > > regardless. I hadn't actually grokked that they just get tossed onto the
> > > > pile of threads that service requests.
> > > > 
> > > > Is it possible for one container to start a small number of threads but
> > > > have its client load be such that it spills over and ends up stealing
> > > > threads from other containers?
> > > 
> > > I haven't seen any code that manages resources based on namespace,
> > > except in filecache.c to restrict writeback per namespace.
> > > 
> > > My impression is that any nfsd thread can serve any namespace. I'm
> > > not sure it is currently meaningful for a particular net namespace to
> > > "create" more threads.
> > > 
> > > If someone would like that level of control, we could implement a
> > > cgroup mechanism and have one or more separate svc_pools per net
> > > namespace, maybe? </hand wave>
> > > 
> > 
> > AFAICT, the total number of threads on the system will be the sum of the
> > threads started in each of the containers. They do just go into a big
> > pile, and whichever one wakes up will service the request, so the
> > threads aren't associated with the netns, per-se. The svc_rqst's however
> > _are_ per-netns. So, I don't see anything that ensures that a container
> > doesn't exceed the number of threads it started on its own behalf.
> > 
> > <hand wave>
> > I'm not sure we'd need to tie this in to cgroups.
> 
> Not a need, but cgroups are typically the way that such
> resource constraints are managed, that's all.
> 
> 
> > Now that Josef is
> > moving some of these key structures to be per-net, it should be fairly
> > simple to have nfsd() just look at the th_cnt and the thread count in
> > the current namespace, and just enqueue the RPC rather than doing it?
> > </hand wave>
> > 
> > OTOH, maybe I'm overly concerned here.
> > 
> > 
> > > 
> > > > > I don't think we want the network namespaces seeing how many threads exist in
> > > > > the entire system right?
> > > 
> > > If someone in a non-init net namespace does a "pgrep -c nfsd" don't
> > > they see the total nfsd thread count for the host?
> > > 
> > 
> > Yes, they're kernel threads and they aren't associated with a particular
> > pid namespace.
> > 
> > > 
> > > > > Additionally it appears that we can have multiple threads per network namespace,
> > > > > so it's not like this will just show 1 for each individual nn, it'll show
> > > > > however many threads have been configured for that nfsd in that network
> > > > > namespace.
> > > 
> > > I've never tried this, so I'm speculating. But it seems like for
> > > now, because all nfsd threads can serve all namespaces, they should
> > > all see the global thread count stat.
> > > 
> > > Then later we can refine it.
> > > 
> > 
> > I don't think that info is particularly useful though, and it certainly
> > breaks expectations wrt container isolation.
> > 
> > Look at it this way:
> > 
> > Suppose I have access to a container and I spin up nfsd with a
> > particular number of threads. I now want to know "did I spin up enough
> > threads?"
> 
> It makes sense to me for a container to ask for one or more
> NFSD listeners. But I'm not yet clear on what it means for
> a container to try to alter the NFSD thread count, which is
> currently global.
> 

write_threads sets the number of nfsd threads for the svc_serv, which is
a per-net object, so serv->sv_nrthreads is only counting the threads for
its namespace.
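
To make that concrete, the plumbing is roughly the following (a
simplified sketch with approximate names, not the exact code in
fs/nfsd/nfsctl.c):

	/* Simplified: a write to /proc/fs/nfsd/threads only ever touches
	 * the svc_serv belonging to the writer's network namespace. */
	static int set_nfsd_threads(struct net *net, int nrservs)
	{
		struct nfsd_net *nn = net_generic(net, nfsd_net_id);

		/* nn->nfsd_serv is per-net, so sv_nrthreads and the
		 * threads started or stopped here belong only to this
		 * namespace */
		return svc_set_num_threads(nn->nfsd_serv, NULL, nrservs);
	}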

Each container that starts an nfsd will contribute "threads" number of
nfsds to the pile. When you set "threads" to a different number, the
total pile is grown or reduced accordingly. In fact, I'm not even sure
we keep a systemwide total.

What _isn't_ done (unless I'm missing something) is any sort of
limitation on the use of those threads. So you could have a container
that starts one nfsd thread, but its clients can still have many more
RPCs in flight, up to the total number of nfsd's running on the host.
Limiting that might be possible, but I'm not yet convinced that it's a
good idea.

It might be interesting to gather some metrics though. How often does the
number of RPCs in flight exceed the number of threads that we started in
a namespace? It might also be good to gather a high-water mark.
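
Something like the sketch below, purely for illustration (th_cnt is
what this series moves into nfsd_net, but th_overruns and th_highwater
are invented names, and the hook point and locking are hand-waved):

	/*
	 * Hypothetical extra fields in struct nfsd_net, alongside th_cnt:
	 *
	 *	unsigned int	th_overruns;	in-flight exceeded sv_nrthreads
	 *	unsigned int	th_highwater;	max concurrent RPCs observed
	 */
	static void nfsd_note_inflight(struct nfsd_net *nn, struct svc_serv *serv)
	{
		int inflight = atomic_inc_return(&nn->th_cnt);

		if (inflight > serv->sv_nrthreads)
			nn->th_overruns++;
		if (inflight > nn->th_highwater)
			nn->th_highwater = inflight;
	}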

> 
> > By making this per-namespace as Josef suggests it should be
> > fairly simple to tell whether my clients are regularly overrunning the
> > threads I started. With this info as global, I have no idea what netns
> > the RPCs being counted are against. I can't do anything with that info.
> 
> That's true. I'm just not sure we need to do anything
> significant about it as part of this patch series.
> 
> I'm not at all advocating leaving it like this. I agree it
> needs updating.
> 
> For now, just fill in that thread count stat with the global
> thread count, and later when we have a better concept for
> what it means to "adjust the nfsd thread count from inside a
> container" we can come back and make it make sense.

It would be good to step back and decide how we think this should work.
What would this look like if we were designing it today? Then we can
decide how to get there.

Personally, I think keeping the view inside the container as isolated as
possible is the right thing to do.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-26 15:03               ` Jeff Layton
@ 2024-01-26 15:16                 ` Chuck Lever III
  2024-01-26 15:35                   ` Jeff Layton
  0 siblings, 1 reply; 30+ messages in thread
From: Chuck Lever III @ 2024-01-26 15:16 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Josef Bacik, Linux NFS Mailing List, kernel-team@fb.com



> On Jan 26, 2024, at 10:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Fri, 2024-01-26 at 14:27 +0000, Chuck Lever III wrote:
>> 
>>> On Jan 26, 2024, at 9:08 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>> 
>>> On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
>>>> 
>>>>> On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>>>> 
>>>>> On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
>>>>>> On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
>>>>>>> On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
>>>>>>>> This is the last global stat, move it into nfsd_net and adjust all the
>>>>>>>> users to use that variant instead of the global one.
>>>>>>> 
>>>>>>> Hm. I thought nfsd threads were a global resource -- they service
>>>>>>> all network namespaces. So, shouldn't the same thread count be
>>>>>>> surfaced to all containers? Won't they all see all of the nfsd
>>>>>>> processes?
>>>>> 
>>>>> Each container is going to start /proc/fs/nfsd/threads number of threads
>>>>> regardless. I hadn't actually grokked that they just get tossed onto the
>>>>> pile of threads that service requests.
>>>>> 
>>>>> Is it possible for one container to start a small number of threads but
>>>>> have its client load be such that it spills over and ends up stealing
>>>>> threads from other containers?
>>>> 
>>>> I haven't seen any code that manages resources based on namespace,
>>>> except in filecache.c to restrict writeback per namespace.
>>>> 
>>>> My impression is that any nfsd thread can serve any namespace. I'm
>>>> not sure it is currently meaningful for a particular net namespace to
>>>> "create" more threads.
>>>> 
>>>> If someone would like that level of control, we could implement a
>>>> cgroup mechanism and have one or more separate svc_pools per net
>>>> namespace, maybe? </hand wave>
>>>> 
>>> 
>>> AFAICT, the total number of threads on the system will be the sum of the
>>> threads started in each of the containers. They do just go into a big
>>> pile, and whichever one wakes up will service the request, so the
>>> threads aren't associated with the netns, per-se. The svc_rqst's however
>>> _are_ per-netns. So, I don't see anything that ensures that a container
>>> doesn't exceed the number of threads it started on its own behalf.
>>> 
>>> <hand wave>
>>> I'm not sure we'd need to tie this in to cgroups.
>> 
>> Not a need, but cgroups are typically the way that such
>> resource constraints are managed, that's all.
>> 
>> 
>>> Now that Josef is
>>> moving some of these key structures to be per-net, it should be fairly
>>> simple to have nfsd() just look at the th_cnt and the thread count in
>>> the current namespace, and just enqueue the RPC rather than doing it?
>>> </hand wave>
>>> 
>>> OTOH, maybe I'm overly concerned here.
>>> 
>>> 
>>>> 
>>>>>> I don't think we want the network namespaces seeing how many threads exist in
>>>>>> the entire system right?
>>>> 
>>>> If someone in a non-init net namespace does a "pgrep -c nfsd" don't
>>>> they see the total nfsd thread count for the host?
>>>> 
>>> 
>>> Yes, they're kernel threads and they aren't associated with a particular
>>> pid namespace.
>>> 
>>>> 
>>>>>> Additionally it appears that we can have multiple threads per network namespace,
>>>>>> so it's not like this will just show 1 for each individual nn, it'll show
>>>>>> however many threads have been configured for that nfsd in that network
>>>>>> namespace.
>>>> 
>>>> I've never tried this, so I'm speculating. But it seems like for
>>>> now, because all nfsd threads can serve all namespaces, they should
>>>> all see the global thread count stat.
>>>> 
>>>> Then later we can refine it.
>>>> 
>>> 
>>> I don't think that info is particularly useful though, and it certainly
>>> breaks expectations wrt container isolation.
>>> 
>>> Look at it this way:
>>> 
>>> Suppose I have access to a container and I spin up nfsd with a
>>> particular number of threads. I now want to know "did I spin up enough
>>> threads?"
>> 
>> It makes sense to me for a container to ask for one or more
>> NFSD listeners. But I'm not yet clear on what it means for
>> a container to try to alter the NFSD thread count, which is
>> currently global.
>> 
> 
> write_threads sets the number of nfsd threads for the svc_serv, which is
> a per-net object, so serv->sv_nrthreads is only counting the threads for
> its namespace.

OK. I missed the part where struct svc_serv is per-net, and
each svc_serv has its own thread pool. Got it.


> Each container that starts an nfsd will contribute "threads" number of
> nfsds to the pile. When you set "threads" to a different number, the
> total pile is grown or reduced accordingly. In fact, I'm not even sure
> we keep a systemwide total.
> 
> What _isn't_ done (unless I'm missing something) is any sort of
> limitation on the use of those threads. So you could have a container
> that starts one nfsd thread, but its clients can still have many more
> RPCs in flight, up to the total number of nfsd's running on the host.
> Limiting that might be possible, but I'm not yet convinced that it's a
> good idea.
> 
> It might be interesting to gather some metrics though. How often does the
> number of RPCs in flight exceed the number of threads that we started in
> a namespace? It might also be good to gather a high-water mark.

I had a bunch of trace points queued up for this purpose,
and all that got thrown out by Neil's recent work fixing
up our pool thread scheduler.

I'm not sure that queued requests are all that meaningful
or even quantifiable. A more meaningful metric is round
trip time measured on clients.


>>> By making this per-namespace as Josef suggests it should be
>>> fairly simple to tell whether my clients are regularly overrunning the
>>> threads I started. With this info as global, I have no idea what netns
>>> the RPCs being counted are against. I can't do anything with that info.
>> 
>> That's true. I'm just not sure we need to do anything
>> significant about it as part of this patch series.
>> 
>> I'm not at all advocating leaving it like this. I agree it
>> needs updating.
>> 
>> For now, just fill in that thread count stat with the global
>> thread count, and later when we have a better concept for
>> what it means to "adjust the nfsd thread count from inside a
>> container" we can come back and make it make sense.
> 
> It would be good to step back and decide how we think this should work.
> What would this look like if we were designing it today? Then we can
> decide how to get there.
> 
> Personally, I think keeping the view inside the container as isolated as
> possible is the right thing to do.

I agree. I would like to get Josef's patches in quickly too.

Should that metric report svc_serv::sv_nrthreads instead of
the global thread count?


--
Chuck Lever



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
  2024-01-26 15:16                 ` Chuck Lever III
@ 2024-01-26 15:35                   ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2024-01-26 15:35 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Josef Bacik, Linux NFS Mailing List, kernel-team@fb.com

On Fri, 2024-01-26 at 15:16 +0000, Chuck Lever III wrote:
> 
> > On Jan 26, 2024, at 10:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Fri, 2024-01-26 at 14:27 +0000, Chuck Lever III wrote:
> > > 
> > > > On Jan 26, 2024, at 9:08 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
> > > > > 
> > > > > > On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > 
> > > > > > On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
> > > > > > > On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> > > > > > > > On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > > > > > > > > This is the last global stat, move it into nfsd_net and adjust all the
> > > > > > > > > users to use that variant instead of the global one.
> > > > > > > > 
> > > > > > > > Hm. I thought nfsd threads were a global resource -- they service
> > > > > > > > all network namespaces. So, shouldn't the same thread count be
> > > > > > > > surfaced to all containers? Won't they all see all of the nfsd
> > > > > > > > processes?
> > > > > > 
> > > > > > Each container is going to start /proc/fs/nfsd/threads number of threads
> > > > > > regardless. I hadn't actually grokked that they just get tossed onto the
> > > > > > pile of threads that service requests.
> > > > > > 
> > > > > > Is it possible for one container to start a small number of threads but
> > > > > > have its client load be such that it spills over and ends up stealing
> > > > > > threads from other containers?
> > > > > 
> > > > > I haven't seen any code that manages resources based on namespace,
> > > > > except in filecache.c to restrict writeback per namespace.
> > > > > 
> > > > > My impression is that any nfsd thread can serve any namespace. I'm
> > > > > not sure it is currently meaningful for a particular net namespace to
> > > > > "create" more threads.
> > > > > 
> > > > > If someone would like that level of control, we could implement a
> > > > > cgroup mechanism and have one or more separate svc_pools per net
> > > > > namespace, maybe? </hand wave>
> > > > > 
> > > > 
> > > > AFAICT, the total number of threads on the system will be the sum of the
> > > > threads started in each of the containers. They do just go into a big
> > > > pile, and whichever one wakes up will service the request, so the
> > > > threads aren't associated with the netns, per-se. The svc_rqst's however
> > > > _are_ per-netns. So, I don't see anything that ensures that a container
> > > > doesn't exceed the number of threads it started on its own behalf.
> > > > 
> > > > <hand wave>
> > > > I'm not sure we'd need to tie this in to cgroups.
> > > 
> > > Not a need, but cgroups are typically the way that such
> > > resource constraints are managed, that's all.
> > > 
> > > 
> > > > Now that Josef is
> > > > moving some of these key structures to be per-net, it should be fairly
> > > > simple to have nfsd() just look at the th_cnt and the thread count in
> > > > the current namespace, and just enqueue the RPC rather than doing it?
> > > > </hand wave>
> > > > 
> > > > OTOH, maybe I'm overly concerned here.
> > > > 
> > > > 
> > > > > 
> > > > > > > I don't think we want the network namespaces seeing how many threads exist in
> > > > > > > the entire system right?
> > > > > 
> > > > > If someone in a non-init net namespace does a "pgrep -c nfsd" don't
> > > > > they see the total nfsd thread count for the host?
> > > > > 
> > > > 
> > > > Yes, they're kernel threads and they aren't associated with a particular
> > > > pid namespace.
> > > > 
> > > > > 
> > > > > > > Additionally it appears that we can have multiple threads per network namespace,
> > > > > > > so it's not like this will just show 1 for each individual nn, it'll show
> > > > > > > however many threads have been configured for that nfsd in that network
> > > > > > > namespace.
> > > > > 
> > > > > I've never tried this, so I'm speculating. But it seems like for
> > > > > now, because all nfsd threads can serve all namespaces, they should
> > > > > all see the global thread count stat.
> > > > > 
> > > > > Then later we can refine it.
> > > > > 
> > > > 
> > > > I don't think that info is particularly useful though, and it certainly
> > > > breaks expectations wrt container isolation.
> > > > 
> > > > Look at it this way:
> > > > 
> > > > Suppose I have access to a container and I spin up nfsd with a
> > > > particular number of threads. I now want to know "did I spin up enough
> > > > threads?"
> > > 
> > > It makes sense to me for a container to ask for one or more
> > > NFSD listeners. But I'm not yet clear on what it means for
> > > a container to try to alter the NFSD thread count, which is
> > > currently global.
> > > 
> > 
> > write_threads sets the number of nfsd threads for the svc_serv, which is
> > a per-net object, so serv->sv_nrthreads is only counting the threads for
> > its namespace.
> 
> OK. I missed the part where struct svc_serv is per-net, and
> each svc_serv has its own thread pool. Got it.
> 
> 
> > Each container that starts an nfsd will contribute "threads" number of
> > nfsds to the pile. When you set "threads" to a different number, the
> > total pile is grown or reduced accordingly. In fact, I'm not even sure
> > we keep a systemwide total.
> > 
> > What _isn't_ done (unless I'm missing something) is any sort of
> > limitation on the use of those threads. So you could have a container
> > that starts one nfsd thread, but its clients can still have many more
> > RPCs in flight, up to the total number of nfsd's running on the host.
> > Limiting that might be possible, but I'm not yet convinced that it's a
> > good idea.
> > 
> > It might be interesting to gather some metrics though. How often does the
> > number of RPCs in flight exceed the number of threads that we started in
> > a namespace? It might also be good to gather a high-water mark.
> 
> I had a bunch of trace points queued up for this purpose,
> and all that got thrown out by Neil's recent work fixing
> up our pool thread scheduler.
> 
> I'm not sure that queued requests are all that meaningful
> or even quantifiable. A more meaningful metric is round
> trip time measured on clients.
> 

I wouldn't try to queue anything just yet, but knowing when a particular
container is exceeding the number of threads it started seems like a
helpful metric. That could be done in a later patch though. I wouldn't
block this work on that.
 
> 
> > > > By making this per-namespace as Josef suggests it should be
> > > > fairly simple to tell whether my clients are regularly overrunning the
> > > > threads I started. With this info as global, I have no idea what netns
> > > > the RPCs being counted are against. I can't do anything with that info.
> > > 
> > > That's true. I'm just not sure we need to do anything
> > > significant about it as part of this patch series.
> > > 
> > > I'm not at all advocating leaving it like this. I agree it
> > > needs updating.
> > > 
> > > For now, just fill in that thread count stat with the global
> > > thread count, and later when we have a better concept for
> > > what it means to "adjust the nfsd thread count from inside a
> > > container" we can come back and make it make sense.
> > 
> > It would be good to step back and decide how we think this should work.
> > What would this look like if we were designing it today? Then we can
> > decide how to get there.
> > 
> > Personally, I think keeping the view inside the container as isolated as
> > possible is the right thing to do.
> 
> I agree. I would like to get Josef's patches in quickly too.
> 
> Should that metric report svc_serv::sv_nrthreads instead of
> the global thread count?
> 

No. The sv_nrthreads is just the same value that /proc/fs/nfsd/threads
shows. The _metric_ should show the th_cnt for that particular
namespace.

It would also be nice to know what the high-water mark for th_cnt is.
IOW, if my container's clients exceeded the number of threads I've
started, then it would be helpful to know by how much.
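
For the reporting side I'm picturing something along these lines, given
the struct net that the seq_file belongs to (illustrative only; the real
"th" line in /proc/net/rpc/nfsd has a different format, and th_highwater
is a made-up field):

	struct nfsd_net *nn = net_generic(net, nfsd_net_id);

	seq_printf(seq, "th %d %u\n",
		   atomic_read(&nn->th_cnt),	/* RPCs in flight in this netns */
		   nn->th_highwater);		/* hypothetical high-water mark */
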
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-01-26 15:35 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
2024-01-25 19:53 ` [PATCH v2 01/13] sunrpc: don't change ->sv_stats if it doesn't exist Josef Bacik
2024-01-25 19:53 ` [PATCH v2 02/13] nfs: stop setting ->pg_stats for unused stats Josef Bacik
2024-01-25 19:53 ` [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create* Josef Bacik
2024-01-25 20:56   ` Chuck Lever
2024-01-25 21:56     ` Josef Bacik
2024-01-25 19:53 ` [PATCH v2 04/13] sunrpc: remove ->pg_stats from svc_program Josef Bacik
2024-01-25 19:53 ` [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args Josef Bacik
2024-01-25 20:53   ` Chuck Lever
2024-01-25 21:54     ` Josef Bacik
2024-01-25 22:30       ` Jeff Layton
2024-01-26 13:49       ` Chuck Lever III
2024-01-25 19:53 ` [PATCH v2 06/13] sunrpc: use the struct net as the svc proc private Josef Bacik
2024-01-25 19:53 ` [PATCH v2 07/13] nfsd: rename NFSD_NET_* to NFSD_STATS_* Josef Bacik
2024-01-25 19:53 ` [PATCH v2 08/13] nfsd: expose /proc/net/sunrpc/nfsd in net namespaces Josef Bacik
2024-01-25 19:53 ` [PATCH v2 09/13] nfsd: make all of the nfsd stats per-network namespace Josef Bacik
2024-01-25 19:53 ` [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net Josef Bacik
2024-01-25 21:01   ` Chuck Lever
2024-01-25 21:56     ` Josef Bacik
2024-01-26 13:01       ` Jeff Layton
2024-01-26 13:48         ` Chuck Lever III
2024-01-26 14:08           ` Jeff Layton
2024-01-26 14:27             ` Chuck Lever III
2024-01-26 15:03               ` Jeff Layton
2024-01-26 15:16                 ` Chuck Lever III
2024-01-26 15:35                   ` Jeff Layton
2024-01-25 19:53 ` [PATCH v2 11/13] nfsd: make svc_stat per-network namespace instead of global Josef Bacik
2024-01-25 19:53 ` [PATCH v2 12/13] nfs: expose /proc/net/sunrpc/nfs in net namespaces Josef Bacik
2024-01-25 19:53 ` [PATCH v2 13/13] nfs: make the rpc_stat per net namespace Josef Bacik
2024-01-26 13:12 ` [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox