All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-01 16:45 Lever, Charles
  2004-04-02  0:10 ` Greg Banks
  0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-04-01 16:45 UTC (permalink / raw)
  To: Greg Banks; +Cc: nfs

> The minor dimension of the 2d array of struct=20
> rpc_metrics_totals pointed to by rpc_tally is per-CPU.  To=20
> get correct per-CPU separation you will want to=20
> cacheline-align the per-CPU parts (so that cachelines which=20
> straddle the per-CPU parts don't end up bouncing between=20
> CPUs).  For example, in rpc_alloc_tally(),
>=20
> size =3D NR_CPUS * ROUNDUP(ops * sizeof(struct=20
> rpc_metric_totals), L1_CACHE_BYTES);
>=20
> and change the stride of the tl_totals[] initialisation loop.

that doesn't work, unfortunately, unless sizeof(struct rpc_tally)
is also an exact multiple of L1_CACHE_BYTES.  otherwise the whole
metrics array starts on a non-cache-aligned boundary since these
are all cut from the same piece of memory.

what is the preferred mechanism these days for ensuring that
structures like these are sized in multiples of L1_CACHE_BYTES?
if i can get the C compiler to do this, that would be a cleaner
solution.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-04  2:35 Lever, Charles
  2004-04-04  6:17 ` Greg Banks
  0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-04-04  2:35 UTC (permalink / raw)
  To: Greg Banks, Trond Myklebust; +Cc: nfs

> On Thu, Apr 01, 2004 at 07:36:27PM -0500, Trond Myklebust wrote:
> > On Thu, 2004-04-01 at 12:05, Lever, Charles wrote:
> >=20
> > > if we saved the export path somewhere in the nfs_server
> > > structure, that would be mighty convenient for iostats
> > > and i don't think that would break namespaces.
> >=20
> > ...but it would force us to introduce a new mount_data=20
> structure, and
> > hence a new "mount" ABI so that will have to wait until 2.7.x.
>=20
> Why not just save a copy of @devname in nfs_get_sb() ?  It's
> the same name that's stored in struct vfsmount and reported
> in /proc/mounts.

unless i've missed something, there are only two NFS functions
that take vfsmount as an argument.  one is nfs_show_options, and
the other is nfs_getattr.

the vfsmount struct is not available to the NFS client during a
mount operation.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-01 17:05 Lever, Charles
  2004-04-02  0:17 ` Greg Banks
  2004-04-02  0:36 ` Trond Myklebust
  0 siblings, 2 replies; 19+ messages in thread
From: Lever, Charles @ 2004-04-01 17:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Greg Banks, nfs

> I don't see any way for the userspace program which reads=20
> these to figure out which of the iostat files corresponds to=20
> which mount. We get the hostname in the filename and again in=20
> the contents but nowhere do we get the mount path, only the=20
> device minor number in the filename.  How about adding a line=20
> to nfs_iostat_show() to print nfss->mnt_path?

nfss->mnt_path is an NFSv4 only field at the moment.

let me make sure i understand exactly what it contains.  is
this the mounted-on directory on the client, or the export
path on the server?

if we saved the export path somewhere in the nfs_server
structure, that would be mighty convenient for iostats
and i don't think that would break namespaces.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-01 15:15 Lever, Charles
  2004-04-01 23:58 ` Greg Banks
  0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-04-01 15:15 UTC (permalink / raw)
  To: Greg Banks; +Cc: Jeremy McNicoll, Patrick Mochel, nfs

hi greg-

thanks for the review.

> I don't understand why you have an rpc_tally pointer in rpc_message.
> This just seems to complicate matters by forcing you to initialise
> the pointer to the rpc_tally in the nfs_server on every call.  AFAICT
> all the place where you need the rpc_tally pointer you already have
> an rpc_clnt pointer which is initialised directly from the nfs_server
> and could be made to hold the rpc_tally* instead.  With that reorg
> you could drop most of the 08-rpc_call.patch and all the changes
> to nfs3proc.c and proc.c in nfs-iostat.patch and replace them with
> a small change to nfs_create_client().   That would greatly reduce
> the code perturbation without affecting performance or functionality.

trond wanted a mechanism that could be used for other things besides
per-mount statistics.  for instance, you can allocate a different
tally structure for each inode on a file system, and collect stats
for each of them.  or you can use the same tally struct for files
in the same directory.  nailing the tally structure into the rpc_clnt
makes that kind of usage (which might be nice for debugging) impossible.

there was interest in removing rpc_call() anyway.  this is just a
convenient excuse.

in terms of performance impact, this change is actually pretty
minor compared to the massive amount of data copying done by XDR
and the socket layer, and to the lock contention that occurs in
the RPC client.  as i've inlined nfs3_rpc_wrapper, that more
than makes up for setting a single extra pointer, at least in
the NFSv3 synchronous path.  i don't see that there are any data
dependencies that would cause a pipeline stall when rpc_talp is
set, do you?

> I don't see any way for the userspace program which reads these to
> figure out which of the iostat files corresponds to which mount.
> We get the hostname in the filename and again in the contents but
> nowhere do we get the mount path, only the device minor number in
> the filename.  How about adding a line to nfs_iostat_show() to
> print nfss->mnt_path?

the problem there is namespace.  each process can have its own
set of mount points (ie export to local directory pairings).  we
still need to work out exactly how to provide that information.

i'm not too concerned about that, as iostat on local file systems
currently produces results based on devices, forcing an adminis-
trator to go back to /etc/fstab or df or mount to figure out
what's going on anyway.

> The minor dimension of the 2d array of struct rpc_metrics_totals
> pointed to by rpc_tally is per-CPU.  To get correct per-CPU
> separation you will want to cacheline-align the per-CPU parts
> (so that cachelines which straddle the per-CPU parts don't end
> up bouncing between CPUs).  For example, in rpc_alloc_tally(),
>=20
> size =3D NR_CPUS * ROUNDUP(ops * sizeof(struct=20
> rpc_metric_totals), L1_CACHE_BYTES);
>=20
> and change the stride of the tl_totals[] initialisation loop.

makes sense, will do.

> As others have mentioned, having at least four do_gettimeofday()
> calls per RPC call is evil; we've seen major problems scaling
> do_gettimeofday() on large CPU count Altix machines.  This problem
> also affects some NIC drivers because the network stack does
> a do_gettimeofday() to timestamp every incoming ethernet packet;
> David Miller's proposed solution to that is to define a new
> scalable high-resolution timestamp API.  I don't know if that's
> gone anywhere, you might try asking on the netdev list.

i asked on lkml yesterday.  folks there say "no such API exists."
at this point i'm thinking of using jiffies and calling it a day.
at some later point we can come back and add support for finer
resolution timestamps.  i've already added a timestamp resolution
indicator in the iostat file format, so changing later should be
no problem if user-level pays attention to it.

> So when the mount is unmounted, all stats are lost?  How does this
> work on clients which do intermittent traffic on automount mounts?
> Would it be possible to get some global stats too?

global stats still exist in /proc/net/rpc/  -- anything you
think is missing there?

> The accumulation-over-CPU loops in nfs_iostat_show() and=20
> rpc_print_tally()
> would be simpler if all the fields in struct nfs_iostat and struct
> rpc_metric_totals were a consistent size, so you could just treat the
> structs as an array of that type during the accumulation step.

i'm trying to keep rpc_metric_totals small because there are a whole
lot of them.  i'm not sure i understand exactly what you mean here.
can you give an example?

> In struct nfs_iostat, it would be nice to gather the number of times
> calls retried because they received EJUKEBOX from the server,=20
> the number
> of SETATTRs which cause a truncate, the number of WRITEs which cause
> a file extension, and the number of WRITEs which are not aligned to
> the wsize.

ok, good ideas.

> nfs_iostat_file_name() does an unbounded sprintf() of an=20
> unbounded string
> that comes from userspace (from an unprivileged user if the site runs
> automounter) into a buffer only NFS_PROC_NAMELEN=3D256 bytes long.

good catch.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-03-31 16:57 Lever, Charles
  2004-03-31 17:19 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-03-31 16:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Patrick Mochel, nfs

> My experience with gettimeofday() was that it slows down and=20
> eventually hangs the machine when you use it inside bottom=20
> halves. That's why I had to revert to using 'jiffies'=20
> instead. While I agree that precision may be important, a=20
> measurement that so impacts on what it is measuring is not=20
> particularly useful either...

using jiffies makes these measurements useless for anything
faster than a millisecond (or 10 milliseconds on current
x86), which is most everything on 100Mb or faster, these
days.  isn't there some other solution?

> Could you also move EXPORT_SYMBOL() definitions to after=20
> their function definition, as per the 2.6.x recommendations.

i would prefer that we did this in a separate patch, and
moved all the symbols at once.


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [PATCH] 1/3 - RPC metrics support
@ 2004-03-31 16:34 Lever, Charles
  2004-03-31 16:46 ` Trond Myklebust
  2004-04-01  8:13 ` Greg Banks
  0 siblings, 2 replies; 19+ messages in thread
From: Lever, Charles @ 2004-03-31 16:34 UTC (permalink / raw)
  To: Trond Myklebust, Olaf Kirch, Steve Dickson, Jeremy McNicoll,
	Patrick Mochel
  Cc: nfs

[-- Attachment #1: Type: text/plain, Size: 850 bytes --]

This patch provides a generic interface to the RPC client for tallying
per RPC request measurements.  The interface can be used by any
in-kernel application that calls the RPC client.  The API is:

rpc_alloc_tally
rpc_clear_tally

Provides a way for ULPs (upper level protocols) to allocate and free
that data structure that will hold the measurement totals.  This
structure is passed in to any RPC request that the ULP wants to have a
measurement for.

rpc_print_tally

Passing a seq_file and an RPC tally structure to this routine causes a
formatted human-readable byte stream representation of the tally struct
to appear on the seq_file.

rpc_clear_tally

Clear the counter fields in the RPC tally structure.

        - Chuck Lever
--
corporate:     <cel at netapp dot com>
personal:      <chucklever at bigfoot dot com>

[-- Attachment #2: 06-rpc_metrics.patch --]
[-- Type: application/octet-stream, Size: 19549 bytes --]

diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/clnt.h 06-rpc_metrics/include/linux/sunrpc/clnt.h
--- 05-dir-schedule/include/linux/sunrpc/clnt.h	2004-03-10 21:55:33.000000000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/clnt.h	2004-03-29 15:16:41.580240000 -0500
@@ -102,6 +102,7 @@ struct rpc_procinfo {
 	unsigned int		p_bufsiz;	/* req. buffer size */
 	unsigned int		p_count;	/* call count */
 	unsigned int		p_timer;	/* Which RTT timer to use */
+	char *			p_name;		/* name of procedure */
 };
 
 #define RPC_CONGESTED(clnt)	(RPCXPRT_CONGESTED((clnt)->cl_xprt))
diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/metrics.h 06-rpc_metrics/include/linux/sunrpc/metrics.h
--- 05-dir-schedule/include/linux/sunrpc/metrics.h	1969-12-31 19:00:00.000000000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/metrics.h	2004-03-30 13:05:11.385243000 -0500
@@ -0,0 +1,125 @@
+/*
+ *  linux/include/linux/sunrpc/metrics.h
+ *
+ *  Declarations for RPC client per-operation metrics
+ *
+ *  Copyright (C) 2004	Chuck Lever <cel@netapp.com>
+ */
+
+#ifndef _LINUX_SUNRPC_METRICS_H
+#define _LINUX_SUNRPC_METRICS_H
+
+#include <linux/time.h>
+#include <linux/seq_file.h>
+
+#define RPC_USECS_PER_SEC	(1000000ULL)
+#define RPC_USECS_PER_HOUR	(3600ULL * RPC_USECS_PER_SEC)
+
+/*
+ * This computes the elapsed time from then til now,
+ * in microseconds.  We cap this at one hour.
+ */
+static inline unsigned long
+rpc_calc_latency(struct timeval *then)
+{
+	struct timeval now;
+	long secs, usecs;
+	unsigned long long result;
+
+	do_gettimeofday(&now);
+
+	secs = now.tv_sec - then->tv_sec;
+	usecs = now.tv_usec - then->tv_usec;
+	result = ((secs * RPC_USECS_PER_SEC) + usecs);
+	if (result > RPC_USECS_PER_HOUR)
+		result = 0ULL;
+
+	return (unsigned long) result;
+}
+
+/*
+ * Metrics captured for each RPC request (one of these per rpc_task)
+ */
+struct rpc_metrics {
+	struct timeval		rm_task_start;	/* RPC execute time */
+	struct timeval		rm_xmit_time;	/* RPC transmit time */
+	unsigned long		rm_rtt;		/* RPC round-trip time */
+	size_t			rm_bytes_sent,	/* data bytes transmitted */
+				rm_bytes_recv,	/* data bytes received */
+				rm_sock_sndbuf;	/* socket buffer utilization */
+	unsigned int		rm_ntrans,	/* number of transmissions */
+				rm_cwnd,	/* congestion window size */
+				rm_cong,	/* congestion window fill */
+				rm_slots,	/* slot table utilization */
+				rm_sndq,	/* send queue utilization */
+				rm_backlog;	/* backlog utilization */
+};
+
+/*
+ * Metrics captured for an RPC transport (one of these per rpc_xprt)
+ */
+struct xprt_metrics {
+	struct timeval		xm_conn_start;	/* when we started connecting */
+	unsigned long long	xm_conn_wait;	/* total wait for connect */
+	unsigned long		xm_connects;	/* total count of connects */
+
+	unsigned int		xm_slots_inuse;	/* outstanding requests */
+	unsigned int		xm_sndq_len;	/* length of send queue */
+
+	unsigned int		xm_backlog_len;	/* length of backlog queue */
+	unsigned long		xm_major_tmos;	/* total major timeouts */
+	unsigned long		xm_partialwrs;	/* total partial writes */
+	unsigned long		xm_writespace;	/* total write space cbs */
+};
+
+/*
+ * One of these is allocated for each RPC operation type, per CPU.
+ * The array of these is indexed by CPU then by op type.
+ */
+struct rpc_metric_totals {
+	unsigned long		mt_ops,		/* count of operations */
+				mt_ntrans,	/* count of RPC transmissions */
+				mt_cwnd,	/* congestion window util */
+				mt_cong,	/* congestion window util */
+				mt_slot_u,	/* slot table utilization */
+				mt_sndq_u,	/* send queue utilization */
+				mt_bklog_u,	/* backlog utilization */
+				mt_errors;	/* count of EIO errors */
+
+	unsigned long long      mt_bytes_sent,	/* count of bytes out */
+				mt_bytes_recv,	/* count of bytes in */
+				mt_sendbuf;	/* total sendbuf */
+
+	unsigned long long	mt_rtt,		/* usecs for RPC RTT */
+				mt_rtt_s,	/* sum(rtt**2) */
+				mt_execute,	/* usecs for RPC execution */
+				mt_execute_s;	/* sum(execute**2) */
+};
+
+struct rpc_program;
+struct rpc_version;
+struct rpc_task;
+struct rpc_xprt;
+
+/*
+ * Passed in via rpc_message for collecting RPC metrics data
+ */
+struct rpc_tally {
+	struct rpc_program *		tl_program;
+	struct rpc_version *		tl_version;
+	struct rpc_xprt *		tl_xprt;
+	struct rpc_metric_totals *	tl_totals[NR_CPUS];
+};
+
+/*
+ * EXPORTed functions for managing tally structures
+ */
+struct rpc_tally *	rpc_alloc_tally(struct rpc_program *, struct rpc_version *,
+							struct rpc_xprt *);
+void			rpc_start_tally(struct rpc_task *);
+void			rpc_end_tally(struct rpc_task *);
+void			rpc_print_tally(struct seq_file *, struct rpc_tally *);
+void			rpc_clear_tally(struct rpc_tally *);
+void			rpc_free_tally(struct rpc_tally *);
+
+#endif /* _LINUX_SUNRPC_METRICS_H */
diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/sched.h 06-rpc_metrics/include/linux/sunrpc/sched.h
--- 05-dir-schedule/include/linux/sunrpc/sched.h	2004-03-26 14:29:38.456240000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/sched.h	2004-03-29 17:59:33.568240000 -0500
@@ -10,6 +10,7 @@
 #define _LINUX_SUNRPC_SCHED_H_
 
 #include <linux/timer.h>
+#include <linux/sunrpc/metrics.h>
 #include <linux/sunrpc/types.h>
 #include <linux/wait.h>
 #include <linux/sunrpc/xdr.h>
@@ -23,6 +24,7 @@ struct rpc_message {
 	void *			rpc_argp;	/* Arguments */
 	void *			rpc_resp;	/* Result */
 	struct rpc_cred *	rpc_cred;	/* Credentials */
+	struct rpc_tally *	rpc_talp;	/* Tally metrics */
 };
 
 /*
@@ -80,6 +82,7 @@ struct rpc_task {
 #ifdef RPC_DEBUG
 	unsigned short		tk_pid;		/* debugging aid */
 #endif
+	struct rpc_metrics	tk_metrics;	/* per-op statistics */
 };
 #define tk_auth			tk_client->cl_auth
 #define tk_xprt			tk_client->cl_xprt
diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/xprt.h 06-rpc_metrics/include/linux/sunrpc/xprt.h
--- 05-dir-schedule/include/linux/sunrpc/xprt.h	2004-03-26 14:29:38.485240000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/xprt.h	2004-03-29 15:16:41.589247000 -0500
@@ -188,6 +188,7 @@ struct rpc_xprt {
 
 	struct list_head	recv;
 
+	struct xprt_metrics	metrics;
 
 	void			(*old_data_ready)(struct sock *, int);
 	void			(*old_state_change)(struct sock *);
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/clnt.c 06-rpc_metrics/net/sunrpc/clnt.c
--- 05-dir-schedule/net/sunrpc/clnt.c	2004-03-26 14:29:38.519242000 -0500
+++ 06-rpc_metrics/net/sunrpc/clnt.c	2004-03-29 15:16:41.593240000 -0500
@@ -797,6 +797,7 @@ call_timeout(struct rpc_task *task)
 	to->to_retries = clnt->cl_timeout.to_retries;
 
 	dprintk("RPC: %4d call_timeout (major)\n", task->tk_pid);
+	clnt->cl_xprt->metrics.xm_major_tmos++;
 	if (RPC_IS_SOFT(task)) {
 		if (clnt->cl_chatty)
 			printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/sched.c 06-rpc_metrics/net/sunrpc/sched.c
--- 05-dir-schedule/net/sunrpc/sched.c	2004-03-26 14:29:38.537240000 -0500
+++ 06-rpc_metrics/net/sunrpc/sched.c	2004-03-29 15:16:41.598240000 -0500
@@ -890,6 +890,9 @@ void rpc_init_task(struct rpc_task *task
 	task->tk_magic = 0xf00baa;
 	task->tk_pid = rpc_task_id++;
 #endif
+
+	rpc_start_tally(task);
+
 	dprintk("RPC: %4d new task procpid %d\n", task->tk_pid,
 				current->pid);
 }
@@ -956,6 +959,9 @@ rpc_release_task(struct rpc_task *task)
 	}
 #endif
 
+	/* Tally per-op statistics */
+	rpc_end_tally(task);
+
 	/* Remove from global task list */
 	spin_lock(&rpc_sched_lock);
 	list_del(&task->tk_task);
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/stats.c 06-rpc_metrics/net/sunrpc/stats.c
--- 05-dir-schedule/net/sunrpc/stats.c	2004-03-10 21:55:54.000000000 -0500
+++ 06-rpc_metrics/net/sunrpc/stats.c	2004-03-31 10:51:24.296000000 -0500
@@ -20,6 +20,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/sunrpc/clnt.h>
+#include <linux/sunrpc/sched.h>
 #include <linux/sunrpc/svcsock.h>
 
 #define RPCDBG_FACILITY	RPCDBG_MISC
@@ -107,6 +108,255 @@ void svc_seq_show(struct seq_file *seq, 
 }
 
 /*
+ * Export a generic mechanism for RPC client applications to collect and
+ * report per-RPC statistics.
+ */
+struct rpc_tally *
+rpc_alloc_tally(struct rpc_program *prog, struct rpc_version *version,
+	struct rpc_xprt *xprt)
+{
+	unsigned int i, size, ops;
+	struct rpc_tally *new;
+	struct rpc_metric_totals *t;
+
+	/* one piece of memory for tally struct and metrics array */
+	ops = version->nrprocs;
+	size = NR_CPUS * ops * sizeof(struct rpc_metric_totals);
+	size += sizeof(struct rpc_tally);
+
+	new = (struct rpc_tally *) kmalloc(size, GFP_KERNEL);
+	if (new) {
+		memset(new, 0, size);
+		new->tl_xprt = xprt;
+		new->tl_program = prog;
+		new->tl_version = version;
+
+		/* save a pointer for each CPU's set of metric totals */
+		t = (struct rpc_metric_totals *) &new[1];
+		for (i = 0; i < NR_CPUS; i++) {
+			new->tl_totals[i] = t;
+			t += ops;
+		}
+	}
+	return new;
+}
+
+void
+rpc_free_tally(struct rpc_tally *tally)
+{
+	kfree(tally);
+}
+
+void
+rpc_start_tally(struct rpc_task *task)
+{
+	do_gettimeofday(&task->tk_metrics.rm_task_start);
+}
+
+void
+rpc_end_tally(struct rpc_task *task)
+{
+	unsigned long long tmp;
+	unsigned int op = task->tk_msg.rpc_proc->p_proc;
+	struct rpc_tally *tally = task->tk_msg.rpc_talp;
+	struct rpc_metrics *m;
+	struct rpc_metric_totals *t;
+
+	if (!tally)
+		return;
+
+	t = tally->tl_totals[smp_processor_id()];
+	t = &t[op];
+	m = &task->tk_metrics;
+
+	t->mt_ops++;
+
+	t->mt_ntrans += m->rm_ntrans;
+	t->mt_cwnd += m->rm_cwnd;
+	t->mt_cong += m->rm_cong;
+	t->mt_slot_u += m->rm_slots;
+	t->mt_sndq_u += m->rm_sndq;
+	t->mt_bklog_u += m->rm_backlog;
+
+	if (task->tk_status == EIO)
+		t->mt_errors++;
+
+	t->mt_bytes_sent += (unsigned long long) m->rm_bytes_sent;
+	t->mt_bytes_recv += (unsigned long long) m->rm_bytes_recv;
+	t->mt_sendbuf += (unsigned long long) m->rm_sock_sndbuf;
+
+	tmp = (unsigned long long) m->rm_rtt;
+	t->mt_rtt += tmp;
+	t->mt_rtt_s += tmp * tmp;
+	tmp = (unsigned long long) rpc_calc_latency(&m->rm_task_start);
+	t->mt_execute += tmp;
+	t->mt_execute_s += tmp * tmp;
+}
+
+static void
+__print_name(struct seq_file *seq, struct rpc_version *vers, unsigned int op)
+{
+	if (vers->procs[op].p_name)
+		seq_printf(seq, "%12s: ", vers->procs[op].p_name);
+	else if (op == 0)
+		seq_printf(seq, "        NULL: ");
+	else
+		seq_printf(seq, "%12u: ", op);
+}
+
+/*
+ * TODO: add some GSS metrics
+ */
+void
+rpc_print_tally(struct seq_file *seq, struct rpc_tally *tally)
+{
+	struct rpc_program *prog = tally->tl_program;
+	struct rpc_version *vers = tally->tl_version;
+	struct rpc_metric_totals total;
+	unsigned int i, op;
+
+	seq_printf(seq, "RPC metrics version: 1.0\n");
+	seq_printf(seq,
+		"protocol: '%s'  program: %u  version: %u\n",
+			prog->name,
+			prog->number,
+			vers->number);
+
+	/*
+	 * Transport stats
+	 */
+	if (tally->tl_xprt) {
+		const struct rpc_xprt *xprt = tally->tl_xprt;
+		const struct xprt_metrics *m = &xprt->metrics;
+
+		seq_printf(seq,
+			"\ntransport idle time: %lu seconds\n",
+				(jiffies - xprt->last_used) / HZ);
+		seq_printf(seq,
+			"major timeouts: %lu\n",
+				m->xm_major_tmos);
+		seq_printf(seq,
+			"transport partial writes: %lu\n",
+				m->xm_partialwrs);
+		seq_printf(seq,
+			"write_space callbacks: %lu\n",
+				m->xm_writespace);
+		if (tally->tl_xprt->stream) {
+			seq_printf(seq, "transport socket type: tcp\n");
+			seq_printf(seq,
+				"connect attempts: %lu\n",
+					m->xm_connects);
+			seq_printf(seq,
+				"total connect wait time: %Lu usecs\n",
+					m->xm_conn_wait);
+		} else
+			seq_printf(seq, "transport socket type: udp\n");
+	}
+
+	/*
+	 * op rate and byte rate stats
+	 */
+	seq_printf(seq, "\n#     optype    op count             bytes in");
+	seq_printf(seq, "            bytes out     ntrans     errors\n");
+
+	for (op = 0; op < vers->nrprocs; op++) {
+		memset(&total, 0, sizeof(struct rpc_metric_totals));
+		for (i = 0; i < NR_CPUS; i++) {
+			struct rpc_metric_totals *r = tally->tl_totals[i];
+			r = &r[op];
+			total.mt_ops += r->mt_ops;
+			total.mt_ntrans += r->mt_ntrans;
+			total.mt_errors += r->mt_errors;
+			total.mt_bytes_sent += r->mt_bytes_sent;
+			total.mt_bytes_recv += r->mt_bytes_recv;
+		}
+		__print_name(seq, vers, op);
+		seq_printf(seq,
+			"%10lu %20Lu %20Lu %10lu %10lu\n",
+				total.mt_ops,
+				total.mt_bytes_recv,
+				total.mt_bytes_sent,
+				total.mt_ntrans,
+				total.mt_errors);
+	}
+
+	/*
+	 * latency stats
+	 */
+	seq_printf(seq, "\nticks/sec: %Lu\n", RPC_USECS_PER_SEC);
+	seq_printf(seq, "#     optype       rtt total ticks         sum rtt ** 2");
+	seq_printf(seq, "  execute total ticks     sum execute ** 2\n");
+
+	for (op = 0; op < vers->nrprocs; op++) {
+		memset(&total, 0, sizeof(struct rpc_metric_totals));
+		for (i = 0; i < NR_CPUS; i++) {
+			struct rpc_metric_totals *r = tally->tl_totals[i];
+			r = &r[op];
+			total.mt_rtt += r->mt_rtt;
+			total.mt_rtt_s += r->mt_rtt_s;
+			total.mt_execute += r->mt_execute;
+			total.mt_execute_s += r->mt_execute_s;
+		}
+		__print_name(seq, vers, op);
+		seq_printf(seq,
+			"%20Lu %20Lu %20Lu %20Lu\n",
+				total.mt_rtt,
+				total.mt_rtt_s,
+				total.mt_execute,
+				total.mt_execute_s);
+	}
+
+	/*
+	 * queue stats
+	 */
+	seq_printf(seq, "\n#     optype   slot util    backlog  sndq util");
+	seq_printf(seq, "       cwnd       cong              sendbuf\n");
+
+	for (op = 0; op < vers->nrprocs; op++) {
+		memset(&total, 0, sizeof(struct rpc_metric_totals));
+		for (i = 0; i < NR_CPUS; i++) {
+			struct rpc_metric_totals *r = tally->tl_totals[i];
+			r = &r[op];
+			total.mt_slot_u += r->mt_slot_u;
+			total.mt_sndq_u += r->mt_sndq_u;
+			total.mt_bklog_u += r->mt_bklog_u;
+			total.mt_sendbuf += r->mt_sendbuf;
+			total.mt_cwnd += r->mt_cwnd;
+			total.mt_cong += r->mt_cong;
+		}
+		__print_name(seq, vers, op);
+		seq_printf(seq, "%10lu %10lu %10lu %10lu %10lu %20Lu\n",
+				total.mt_slot_u,
+				total.mt_sndq_u,
+				total.mt_bklog_u,
+				total.mt_cwnd,
+				total.mt_cong,
+				total.mt_sendbuf);
+	}
+}
+
+void
+rpc_clear_tally(struct rpc_tally *tally)
+{
+	unsigned int i, size = sizeof(struct rpc_metric_totals) *
+						tally->tl_version->nrprocs;
+	struct xprt_metrics *m = &tally->tl_xprt->metrics;
+
+	for (i = 0; i < NR_CPUS; i++)
+		memset(tally->tl_totals[i], 0, size);
+
+	/* we don't clear time-related fields  */
+	m->xm_connects = 0UL;
+	m->xm_conn_wait = 0ULL;
+	m->xm_slots_inuse = 0U;
+	m->xm_sndq_len = 0U;
+	m->xm_backlog_len = 0U;
+	m->xm_major_tmos = 0U;
+	m->xm_partialwrs = 0U;
+	m->xm_writespace = 0U;
+}
+
+/*
  * Register/unregister RPC proc files
  */
 static inline struct proc_dir_entry *
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/sunrpc_syms.c 06-rpc_metrics/net/sunrpc/sunrpc_syms.c
--- 05-dir-schedule/net/sunrpc/sunrpc_syms.c	2004-03-26 14:29:38.551240000 -0500
+++ 06-rpc_metrics/net/sunrpc/sunrpc_syms.c	2004-03-29 15:16:41.605246000 -0500
@@ -92,6 +92,10 @@ EXPORT_SYMBOL(auth_domain_lookup);
 
 /* RPC statistics */
 #ifdef CONFIG_PROC_FS
+EXPORT_SYMBOL(rpc_alloc_tally);
+EXPORT_SYMBOL(rpc_print_tally);
+EXPORT_SYMBOL(rpc_clear_tally);
+EXPORT_SYMBOL(rpc_free_tally);
 EXPORT_SYMBOL(rpc_proc_register);
 EXPORT_SYMBOL(rpc_proc_unregister);
 EXPORT_SYMBOL(svc_proc_register);
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/xprt.c 06-rpc_metrics/net/sunrpc/xprt.c
--- 05-dir-schedule/net/sunrpc/xprt.c	2004-03-26 14:29:38.586243000 -0500
+++ 06-rpc_metrics/net/sunrpc/xprt.c	2004-03-30 17:24:08.132240000 -0500
@@ -298,6 +298,7 @@ static int
 __xprt_get_cong(struct rpc_xprt *xprt, struct rpc_task *task)
 {
 	struct rpc_rqst *req = task->tk_rqstp;
+	struct rpc_metrics *metrics = &task->tk_metrics;
 
 	if (req->rq_cong)
 		return 1;
@@ -307,6 +308,8 @@ __xprt_get_cong(struct rpc_xprt *xprt, s
 		return 0;
 	req->rq_cong = 1;
 	xprt->cong += RPC_CWNDSCALE;
+	metrics->rm_cwnd = xprt->cwnd;
+	metrics->rm_cong = xprt->cong;
 	return 1;
 }
 
@@ -487,6 +490,8 @@ static void xprt_socket_connect(void *ar
 	/*
 	 * Tell the socket layer to start connecting...
 	 */
+	xprt->metrics.xm_connects++;
+	do_gettimeofday(&xprt->metrics.xm_conn_start);
 	status = sock->ops->connect(sock, (struct sockaddr *) &xprt->addr,
 			sizeof(xprt->addr), O_NONBLOCK);
 	dprintk("RPC: %p  connect status %d connected %d sock state %d\n",
@@ -559,6 +564,8 @@ xprt_connect_status(struct rpc_task *tas
 	struct rpc_xprt	*xprt = task->tk_xprt;
 
 	if (task->tk_status >= 0) {
+		xprt->metrics.xm_conn_wait += (unsigned long long)
+			rpc_calc_latency(&xprt->metrics.xm_conn_start);
 		dprintk("RPC: %4d xprt_connect_status: connection established\n",
 				task->tk_pid);
 		return;
@@ -613,6 +620,7 @@ xprt_complete_rqst(struct rpc_xprt *xprt
 {
 	struct rpc_task	*task = req->rq_task;
 	struct rpc_clnt *clnt = task->tk_client;
+	struct rpc_metrics *metrics = &task->tk_metrics;
 
 	/* Adjust congestion window */
 	if (!xprt->nocong) {
@@ -627,6 +635,13 @@ xprt_complete_rqst(struct rpc_xprt *xprt
 		}
 	}
 
+	metrics->rm_ntrans = req->rq_ntrans;
+	metrics->rm_bytes_recv = copied;
+	metrics->rm_slots = xprt->metrics.xm_slots_inuse;
+	metrics->rm_sndq = xprt->metrics.xm_sndq_len;
+	metrics->rm_backlog = xprt->metrics.xm_backlog_len;
+	metrics->rm_rtt = rpc_calc_latency(&metrics->rm_xmit_time);
+
 #ifdef RPC_PROFILE
 	/* Profile only reads for now */
 	if (copied > 1024) {
@@ -1062,6 +1077,8 @@ xprt_write_space(struct sock *sk)
 	if (xprt->shutdown)
 		goto out;
 
+	xprt->metrics.xm_writespace++;
+
 	/* Wait until we have enough socket memory */
 	if (xprt->stream) {
 		/* from net/ipv4/tcp.c:tcp_write_space */
@@ -1151,6 +1168,7 @@ xprt_transmit(struct rpc_task *task)
 	struct rpc_clnt *clnt = task->tk_client;
 	struct rpc_rqst	*req = task->tk_rqstp;
 	struct rpc_xprt	*xprt = req->rq_xprt;
+	struct rpc_metrics *metrics = &task->tk_metrics;
 	int status, retry = 0;
 
 
@@ -1195,16 +1213,24 @@ xprt_transmit(struct rpc_task *task)
 			/* If we've sent the entire packet, immediately
 			 * reset the count of bytes sent. */
 			if (req->rq_bytes_sent >= req->rq_slen) {
+				metrics->rm_bytes_sent = req->rq_bytes_sent;
+				metrics->rm_sock_sndbuf =
+					xprt->inet->sk_wmem_queued;
 				req->rq_bytes_sent = 0;
 				goto out_receive;
 			}
 		} else {
-			if (status >= req->rq_slen)
+			if (status >= req->rq_slen) {
+				metrics->rm_bytes_sent = status;
+				metrics->rm_sock_sndbuf =
+					atomic_read(&xprt->inet->sk_wmem_alloc);
 				goto out_receive;
+			}
 			status = -EAGAIN;
 			break;
 		}
 
+		xprt->metrics.xm_partialwrs++;
 		dprintk("RPC: %4d xmit incomplete (%d left of %d)\n",
 				task->tk_pid, req->rq_slen - req->rq_bytes_sent,
 				req->rq_slen);
@@ -1251,6 +1277,7 @@ xprt_transmit(struct rpc_task *task)
 	return;
  out_receive:
 	dprintk("RPC: %4d xmit complete\n", task->tk_pid);
+	do_gettimeofday(&metrics->rm_xmit_time);
 	/* Set the task's receive timeout value */
 	spin_lock_bh(&xprt->sock_lock);
 	if (!xprt->nocong) {

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2004-04-04  6:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-01 16:45 [PATCH] 1/3 - RPC metrics support Lever, Charles
2004-04-02  0:10 ` Greg Banks
  -- strict thread matches above, loose matches on Subject: below --
2004-04-04  2:35 Lever, Charles
2004-04-04  6:17 ` Greg Banks
2004-04-01 17:05 Lever, Charles
2004-04-02  0:17 ` Greg Banks
2004-04-02  0:36 ` Trond Myklebust
2004-04-02  0:49   ` Greg Banks
2004-04-02  1:41     ` Trond Myklebust
2004-04-02  2:42       ` Greg Banks
2004-04-01 15:15 Lever, Charles
2004-04-01 23:58 ` Greg Banks
2004-04-02  0:34   ` Trond Myklebust
2004-04-02  2:33     ` Greg Banks
2004-03-31 16:57 Lever, Charles
2004-03-31 17:19 ` Trond Myklebust
2004-03-31 16:34 Lever, Charles
2004-03-31 16:46 ` Trond Myklebust
2004-04-01  8:13 ` Greg Banks

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.