* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-01 17:05 Lever, Charles
2004-04-02 0:17 ` Greg Banks
2004-04-02 0:36 ` Trond Myklebust
0 siblings, 2 replies; 19+ messages in thread
From: Lever, Charles @ 2004-04-01 17:05 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Greg Banks, nfs
> I don't see any way for the userspace program which reads=20
> these to figure out which of the iostat files corresponds to=20
> which mount. We get the hostname in the filename and again in=20
> the contents but nowhere do we get the mount path, only the=20
> device minor number in the filename. How about adding a line=20
> to nfs_iostat_show() to print nfss->mnt_path?
nfss->mnt_path is an NFSv4 only field at the moment.
let me make sure i understand exactly what it contains. is
this the mounted-on directory on the client, or the export
path on the server?
if we saved the export path somewhere in the nfs_server
structure, that would be mighty convenient for iostats
and i don't think that would break namespaces.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 1/3 - RPC metrics support
2004-04-01 17:05 [PATCH] 1/3 - RPC metrics support Lever, Charles
@ 2004-04-02 0:17 ` Greg Banks
2004-04-02 0:36 ` Trond Myklebust
1 sibling, 0 replies; 19+ messages in thread
From: Greg Banks @ 2004-04-02 0:17 UTC (permalink / raw)
To: Lever, Charles; +Cc: Trond Myklebust, nfs
On Thu, Apr 01, 2004 at 09:05:20AM -0800, Lever, Charles wrote:
> > I don't see any way for the userspace program which reads
> > these to figure out which of the iostat files corresponds to
> > which mount. We get the hostname in the filename and again in
> > the contents but nowhere do we get the mount path, only the
> > device minor number in the filename. How about adding a line
> > to nfs_iostat_show() to print nfss->mnt_path?
>
> nfss->mnt_path is an NFSv4 only field at the moment.
>
> let me make sure i understand exactly what it contains. is
> this the mounted-on directory on the client, or the export
> path on the server?
>
> if we saved the export path somewhere in the nfs_server
> structure, that would be mighty convenient for iostats
> and i don't think that would break namespaces.
Sounds good.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
2004-04-01 17:05 [PATCH] 1/3 - RPC metrics support Lever, Charles
2004-04-02 0:17 ` Greg Banks
@ 2004-04-02 0:36 ` Trond Myklebust
2004-04-02 0:49 ` Greg Banks
1 sibling, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2004-04-02 0:36 UTC (permalink / raw)
To: Charles Lever; +Cc: Greg Banks, nfs
On Thu, 2004-04-01 at 12:05, Lever, Charles wrote:
> if we saved the export path somewhere in the nfs_server
> structure, that would be mighty convenient for iostats
> and i don't think that would break namespaces.
...but it would force us to introduce a new mount_data structure, and
hence a new "mount" ABI so that will have to wait until 2.7.x.
Cheers,
Trond
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 1/3 - RPC metrics support
2004-04-02 0:36 ` Trond Myklebust
@ 2004-04-02 0:49 ` Greg Banks
2004-04-02 1:41 ` Trond Myklebust
0 siblings, 1 reply; 19+ messages in thread
From: Greg Banks @ 2004-04-02 0:49 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Charles Lever, nfs
On Thu, Apr 01, 2004 at 07:36:27PM -0500, Trond Myklebust wrote:
> On Thu, 2004-04-01 at 12:05, Lever, Charles wrote:
>
> > if we saved the export path somewhere in the nfs_server
> > structure, that would be mighty convenient for iostats
> > and i don't think that would break namespaces.
>
> ...but it would force us to introduce a new mount_data structure, and
> hence a new "mount" ABI so that will have to wait until 2.7.x.
Why not just save a copy of @devname in nfs_get_sb() ? It's
the same name that's stored in struct vfsmount and reported
in /proc/mounts.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 1/3 - RPC metrics support
2004-04-02 0:49 ` Greg Banks
@ 2004-04-02 1:41 ` Trond Myklebust
2004-04-02 2:42 ` Greg Banks
0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2004-04-02 1:41 UTC (permalink / raw)
To: Greg Banks; +Cc: Charles Lever, nfs
On Thu, 2004-04-01 at 19:49, Greg Banks wrote:
> Why not just save a copy of @devname in nfs_get_sb() ? It's
> the same name that's stored in struct vfsmount and reported
> in /proc/mounts.
That is very different. That we can do...
Please don't confuse the export path and the device name. nfss->mnt_path
is the former: it is part of the NFS mount ABI, and contains the path
that we use to look up the mountpoint on the NFSv4 server. It allows us,
for instance, to recover in the case where the server changes the
filehandle of the mountpoint (NFSv4 servers are allowed to have
filehandles that expire).
Adding the export path to the NFSv2/v3 mount structures would for
instance allow us to do things such as identifying whether or not the
user is mounting a parent/subdirectory of one of our current mount
points.
The device name OTOH is just whatever label the caller of sys_mount()
feels like passing down to the kernel. It will usually contain the name
of the server and the export path ('cos that's what the nfs-utils
"mount" does), but it does not have to, and we currently do not depend
on it taking any particular value.
Cheers,
Trond
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] 1/3 - RPC metrics support
2004-04-02 1:41 ` Trond Myklebust
@ 2004-04-02 2:42 ` Greg Banks
0 siblings, 0 replies; 19+ messages in thread
From: Greg Banks @ 2004-04-02 2:42 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Charles Lever, nfs
Trond Myklebust wrote:
>
> On Thu, 2004-04-01 at 19:49, Greg Banks wrote:
>
> > Why not just save a copy of @devname in nfs_get_sb() ?
>
> That is very different. That we can do...
>
> Please don't confuse the export path and the device name. nfss->mnt_path
> is the former:
> The device name OTOH is just whatever label the caller of sys_mount()
> feels like passing down to the kernel. It will usually contain the name
> of the server and the export path ('cos that's what the nfs-utils
> "mount" does), but it does not have to, and we currently do not depend
> on it taking any particular value.
If all we care about is identifying the sb for purposes of iostat and
post-mortem debugging (I for one would find this useful), the precise
semantics placed on the string by the mount program don't matter that
much. You could add a `devname' field to nfs_server and document
that it doesn't mean anything at all. Alternatively you could check
that the first part matches (modulo domain names) the `hostname' field
and is followed by a ':' and a path, and if all that matches, save the
path.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-04 2:35 Lever, Charles
2004-04-04 6:17 ` Greg Banks
0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-04-04 2:35 UTC (permalink / raw)
To: Greg Banks, Trond Myklebust; +Cc: nfs
> On Thu, Apr 01, 2004 at 07:36:27PM -0500, Trond Myklebust wrote:
> > On Thu, 2004-04-01 at 12:05, Lever, Charles wrote:
> >=20
> > > if we saved the export path somewhere in the nfs_server
> > > structure, that would be mighty convenient for iostats
> > > and i don't think that would break namespaces.
> >=20
> > ...but it would force us to introduce a new mount_data=20
> structure, and
> > hence a new "mount" ABI so that will have to wait until 2.7.x.
>=20
> Why not just save a copy of @devname in nfs_get_sb() ? It's
> the same name that's stored in struct vfsmount and reported
> in /proc/mounts.
unless i've missed something, there are only two NFS functions
that take vfsmount as an argument. one is nfs_show_options, and
the other is nfs_getattr.
the vfsmount struct is not available to the NFS client during a
mount operation.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 1/3 - RPC metrics support
2004-04-04 2:35 Lever, Charles
@ 2004-04-04 6:17 ` Greg Banks
0 siblings, 0 replies; 19+ messages in thread
From: Greg Banks @ 2004-04-04 6:17 UTC (permalink / raw)
To: Lever, Charles; +Cc: Trond Myklebust, nfs
"Lever, Charles" wrote:
>
> > Why not just save a copy of @devname in nfs_get_sb() ? It's
> > the same name that's stored in struct vfsmount and reported
> > in /proc/mounts.
>
> unless i've missed something, there are only two NFS functions
> that take vfsmount as an argument. one is nfs_show_options, and
> the other is nfs_getattr.
>
> the vfsmount struct is not available to the NFS client during a
> mount operation.
The devname string that fs/super.c:do_kern_mount() passes to
nfs_get_sb() via file_system_type->get_sb is passed to
alloc_vfsmount() and used there to initialise the vfsmount's
mnt_devname. So while NFS doesn't have the vfsmount it does
have the same devname string.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-01 16:45 Lever, Charles
2004-04-02 0:10 ` Greg Banks
0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-04-01 16:45 UTC (permalink / raw)
To: Greg Banks; +Cc: nfs
> The minor dimension of the 2d array of struct=20
> rpc_metrics_totals pointed to by rpc_tally is per-CPU. To=20
> get correct per-CPU separation you will want to=20
> cacheline-align the per-CPU parts (so that cachelines which=20
> straddle the per-CPU parts don't end up bouncing between=20
> CPUs). For example, in rpc_alloc_tally(),
>=20
> size =3D NR_CPUS * ROUNDUP(ops * sizeof(struct=20
> rpc_metric_totals), L1_CACHE_BYTES);
>=20
> and change the stride of the tl_totals[] initialisation loop.
that doesn't work, unfortunately, unless sizeof(struct rpc_tally)
is also an exact multiple of L1_CACHE_BYTES. otherwise the whole
metrics array starts on a non-cache-aligned boundary since these
are all cut from the same piece of memory.
what is the preferred mechanism these days for ensuring that
structures like these are sized in multiples of L1_CACHE_BYTES?
if i can get the C compiler to do this, that would be a cleaner
solution.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 1/3 - RPC metrics support
2004-04-01 16:45 Lever, Charles
@ 2004-04-02 0:10 ` Greg Banks
0 siblings, 0 replies; 19+ messages in thread
From: Greg Banks @ 2004-04-02 0:10 UTC (permalink / raw)
To: Lever, Charles; +Cc: nfs
On Thu, Apr 01, 2004 at 08:45:40AM -0800, Lever, Charles wrote:
> > The minor dimension of the 2d array of struct
> > rpc_metrics_totals pointed to by rpc_tally is per-CPU. To
> > get correct per-CPU separation you will want to
> > cacheline-align the per-CPU parts (so that cachelines which
> > straddle the per-CPU parts don't end up bouncing between
> > CPUs). For example, in rpc_alloc_tally(),
> >
> > size = NR_CPUS * ROUNDUP(ops * sizeof(struct
> > rpc_metric_totals), L1_CACHE_BYTES);
> >
> > and change the stride of the tl_totals[] initialisation loop.
>
> that doesn't work, unfortunately, unless sizeof(struct rpc_tally)
> is also an exact multiple of L1_CACHE_BYTES. otherwise the whole
> metrics array starts on a non-cache-aligned boundary since these
> are all cut from the same piece of memory.
That's right, you'll need to add ____cacheline_aligned like this:
struct rpc_tally {
} ____cacheline_aligned;
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-04-01 15:15 Lever, Charles
2004-04-01 23:58 ` Greg Banks
0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-04-01 15:15 UTC (permalink / raw)
To: Greg Banks; +Cc: Jeremy McNicoll, Patrick Mochel, nfs
hi greg-
thanks for the review.
> I don't understand why you have an rpc_tally pointer in rpc_message.
> This just seems to complicate matters by forcing you to initialise
> the pointer to the rpc_tally in the nfs_server on every call. AFAICT
> all the place where you need the rpc_tally pointer you already have
> an rpc_clnt pointer which is initialised directly from the nfs_server
> and could be made to hold the rpc_tally* instead. With that reorg
> you could drop most of the 08-rpc_call.patch and all the changes
> to nfs3proc.c and proc.c in nfs-iostat.patch and replace them with
> a small change to nfs_create_client(). That would greatly reduce
> the code perturbation without affecting performance or functionality.
trond wanted a mechanism that could be used for other things besides
per-mount statistics. for instance, you can allocate a different
tally structure for each inode on a file system, and collect stats
for each of them. or you can use the same tally struct for files
in the same directory. nailing the tally structure into the rpc_clnt
makes that kind of usage (which might be nice for debugging) impossible.
there was interest in removing rpc_call() anyway. this is just a
convenient excuse.
in terms of performance impact, this change is actually pretty
minor compared to the massive amount of data copying done by XDR
and the socket layer, and to the lock contention that occurs in
the RPC client. as i've inlined nfs3_rpc_wrapper, that more
than makes up for setting a single extra pointer, at least in
the NFSv3 synchronous path. i don't see that there are any data
dependencies that would cause a pipeline stall when rpc_talp is
set, do you?
> I don't see any way for the userspace program which reads these to
> figure out which of the iostat files corresponds to which mount.
> We get the hostname in the filename and again in the contents but
> nowhere do we get the mount path, only the device minor number in
> the filename. How about adding a line to nfs_iostat_show() to
> print nfss->mnt_path?
the problem there is namespace. each process can have its own
set of mount points (ie export to local directory pairings). we
still need to work out exactly how to provide that information.
i'm not too concerned about that, as iostat on local file systems
currently produces results based on devices, forcing an adminis-
trator to go back to /etc/fstab or df or mount to figure out
what's going on anyway.
> The minor dimension of the 2d array of struct rpc_metrics_totals
> pointed to by rpc_tally is per-CPU. To get correct per-CPU
> separation you will want to cacheline-align the per-CPU parts
> (so that cachelines which straddle the per-CPU parts don't end
> up bouncing between CPUs). For example, in rpc_alloc_tally(),
>=20
> size =3D NR_CPUS * ROUNDUP(ops * sizeof(struct=20
> rpc_metric_totals), L1_CACHE_BYTES);
>=20
> and change the stride of the tl_totals[] initialisation loop.
makes sense, will do.
> As others have mentioned, having at least four do_gettimeofday()
> calls per RPC call is evil; we've seen major problems scaling
> do_gettimeofday() on large CPU count Altix machines. This problem
> also affects some NIC drivers because the network stack does
> a do_gettimeofday() to timestamp every incoming ethernet packet;
> David Miller's proposed solution to that is to define a new
> scalable high-resolution timestamp API. I don't know if that's
> gone anywhere, you might try asking on the netdev list.
i asked on lkml yesterday. folks there say "no such API exists."
at this point i'm thinking of using jiffies and calling it a day.
at some later point we can come back and add support for finer
resolution timestamps. i've already added a timestamp resolution
indicator in the iostat file format, so changing later should be
no problem if user-level pays attention to it.
> So when the mount is unmounted, all stats are lost? How does this
> work on clients which do intermittent traffic on automount mounts?
> Would it be possible to get some global stats too?
global stats still exist in /proc/net/rpc/ -- anything you
think is missing there?
> The accumulation-over-CPU loops in nfs_iostat_show() and=20
> rpc_print_tally()
> would be simpler if all the fields in struct nfs_iostat and struct
> rpc_metric_totals were a consistent size, so you could just treat the
> structs as an array of that type during the accumulation step.
i'm trying to keep rpc_metric_totals small because there are a whole
lot of them. i'm not sure i understand exactly what you mean here.
can you give an example?
> In struct nfs_iostat, it would be nice to gather the number of times
> calls retried because they received EJUKEBOX from the server,=20
> the number
> of SETATTRs which cause a truncate, the number of WRITEs which cause
> a file extension, and the number of WRITEs which are not aligned to
> the wsize.
ok, good ideas.
> nfs_iostat_file_name() does an unbounded sprintf() of an=20
> unbounded string
> that comes from userspace (from an unprivileged user if the site runs
> automounter) into a buffer only NFS_PROC_NAMELEN=3D256 bytes long.
good catch.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 1/3 - RPC metrics support
2004-04-01 15:15 Lever, Charles
@ 2004-04-01 23:58 ` Greg Banks
2004-04-02 0:34 ` Trond Myklebust
0 siblings, 1 reply; 19+ messages in thread
From: Greg Banks @ 2004-04-01 23:58 UTC (permalink / raw)
To: Lever, Charles; +Cc: Jeremy McNicoll, Patrick Mochel, nfs
On Thu, Apr 01, 2004 at 07:15:34AM -0800, Lever, Charles wrote:
>
> trond wanted a mechanism that could be used for other things besides
> per-mount statistics. for instance, you can allocate a different
> tally structure for each inode on a file system, and collect stats
> for each of them. or you can use the same tally struct for files
> in the same directory. nailing the tally structure into the rpc_clnt
> makes that kind of usage (which might be nice for debugging) impossible.
Ah. I didn't realise that was a requirement.
> there was interest in removing rpc_call() anyway.
Why?
> in terms of performance impact, this change is actually pretty
> minor
Agreed, I saw no performance issues other than do_gettimeofday().
> > I don't see any way for the userspace program which reads these to
> > figure out which of the iostat files corresponds to which mount.
>
> the problem there is namespace. each process can have its own
> set of mount points (ie export to local directory pairings). we
> still need to work out exactly how to provide that information.
Actually I was interested in knowing the serverside mount path,
which doesn't depend on the client's namespace (although it's not
necessarily unique).
The obvious solution to the namespace issue is to just use the
namespace of the process doing the /proc read, which should be
current->namespace during nfs_iostat_show(). For almost every
case this will be the default namespace anyway. In the unlikely
case the the sb is not mounted in that namespace, return early
and don't print any stats.
> i'm not too concerned about that, as iostat on local file systems
> currently produces results based on devices, forcing an adminis-
> trator to go back to /etc/fstab or df or mount to figure out
> what's going on anyway.
This works because for local filesystems those sources give you
the device name. However (at least on my system) neither fstab
nor mount nor /proc/mounts will tell you anything about the device
number for NFS filesystems.
So what I see from "mount" is
server:/ on /mnt/foo type nfs (rw,addr=134.14.54.149)
and in /proc/mounts
server:/ /mnt/foo nfs rw,v3,rsize=32768,wsize=32768,hard,udp,lock,addr=server 0 0
So how am I supposed to correlate this back to a device minor?
> > As others have mentioned, having at least four do_gettimeofday()
> > calls per RPC call is evil;
>
> i asked on lkml yesterday. folks there say "no such API exists."
> at this point i'm thinking of using jiffies and calling it a day.
Fair enough.
> > So when the mount is unmounted, all stats are lost? How does this
> > work on clients which do intermittent traffic on automount mounts?
> > Would it be possible to get some global stats too?
>
> global stats still exist in /proc/net/rpc/ -- anything you
> think is missing there?
It would be nice to have all the same stats as are gathered per-mount,
in the same format so we only have to write and debug one userspace
parsing routine. In particular, the latency stats.
> > The accumulation-over-CPU loops in nfs_iostat_show() and
> > rpc_print_tally()
> > would be simpler if all the fields in struct nfs_iostat and struct
> > rpc_metric_totals were a consistent size, so you could just treat the
> > structs as an array of that type during the accumulation step.
>
> i'm trying to keep rpc_metric_totals small because there are a whole
> lot of them.
Sure, highly admirable.
> i'm not sure i understand exactly what you mean here.
> can you give an example?
struct rpc_metric_totals {
u64 mt_ops; /* count of operations */
u64 mt_ntrans; /* count of RPC transmissions */
u64 mt_cwnd; /* congestion window util */
u64 mt_cong; /* congestion window util */
u64 mt_slot_u; /* slot table utilization */
u64 mt_sndq_u; /* send queue utilization */
u64 mt_bklog_u; /* backlog utilization */
u64 mt_errors; /* count of EIO errors */
u64 mt_bytes_sent; /* count of bytes out */
u64 mt_bytes_recv; /* count of bytes in */
u64 mt_sendbuf; /* total sendbuf */
u64 mt_rtt; /* usecs for RPC RTT */
u64 mt_rtt_s; /* sum(rtt**2) */
u64 mt_execute; /* usecs for RPC execution */
u64 mt_execute_s; /* sum(execute**2) */
};
rpc_print_tally(...)
{
...
for (op = 0; op < vers->nrprocs; op++) {
memset(&total, 0, sizeof(struct rpc_metric_totals));
for (i = 0; i < NR_CPUS; i++) {
u64 *r = tally->tl_totals[i];
for (k = 0 ; k < sizeof(struct rpc_metric_totals)/sizeof(u64) ; k++)
((u64 *)&total)[k] += r[k];
}
}
}
The idea being that adding another field to rpc_metric_totals
becomes easier. Obviously the cost is higher memory use.
Just an idea.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] 1/3 - RPC metrics support
2004-04-01 23:58 ` Greg Banks
@ 2004-04-02 0:34 ` Trond Myklebust
2004-04-02 2:33 ` Greg Banks
0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2004-04-02 0:34 UTC (permalink / raw)
To: Greg Banks; +Cc: Charles Lever, Jeremy McNicoll, Patrick Mochel, nfs
On Thu, 2004-04-01 at 18:58, Greg Banks wrote:
> struct rpc_metric_totals {
> u64 mt_ops; /* count of operations */
> u64 mt_ntrans; /* count of RPC transmissions */
> u64 mt_cwnd; /* congestion window util */
> u64 mt_cong; /* congestion window util */
> u64 mt_slot_u; /* slot table utilization */
> u64 mt_sndq_u; /* send queue utilization */
> u64 mt_bklog_u; /* backlog utilization */
> u64 mt_errors; /* count of EIO errors */
>
> u64 mt_bytes_sent; /* count of bytes out */
> u64 mt_bytes_recv; /* count of bytes in */
> u64 mt_sendbuf; /* total sendbuf */
>
> u64 mt_rtt; /* usecs for RPC RTT */
> u64 mt_rtt_s; /* sum(rtt**2) */
> u64 mt_execute; /* usecs for RPC execution */
> u64 mt_execute_s; /* sum(execute**2) */
> };
>
> rpc_print_tally(...)
> {
> ...
> for (op = 0; op < vers->nrprocs; op++) {
> memset(&total, 0, sizeof(struct rpc_metric_totals));
> for (i = 0; i < NR_CPUS; i++) {
> u64 *r = tally->tl_totals[i];
> for (k = 0 ; k < sizeof(struct rpc_metric_totals)/sizeof(u64) ; k++)
> ((u64 *)&total)[k] += r[k];
> }
> }
> }
Let's not go there... 64-bit arithmetic on a 32-bit platform is still
not one of gcc's strong points, so let's not force everything into one
mold simply in order to save a couple of lines of code. Performance does
matter too...
Cheers,
Trond
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] 1/3 - RPC metrics support
2004-04-02 0:34 ` Trond Myklebust
@ 2004-04-02 2:33 ` Greg Banks
0 siblings, 0 replies; 19+ messages in thread
From: Greg Banks @ 2004-04-02 2:33 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Charles Lever, Jeremy McNicoll, Patrick Mochel, nfs
Trond Myklebust wrote:
>
> On Thu, 2004-04-01 at 18:58, Greg Banks wrote:
>
> > struct rpc_metric_totals {
> > u64 mt_ops; /* count of operations */
>
> Let's not go there... 64-bit arithmetic on a 32-bit platform is still
> not one of gcc's strong points,
This is of course not something I considered ;-)
> so let's not force everything into one
> mold simply in order to save a couple of lines of code. Performance does
> matter too...
Fair enough.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
@ 2004-03-31 16:57 Lever, Charles
2004-03-31 17:19 ` Trond Myklebust
0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2004-03-31 16:57 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Patrick Mochel, nfs
> My experience with gettimeofday() was that it slows down and=20
> eventually hangs the machine when you use it inside bottom=20
> halves. That's why I had to revert to using 'jiffies'=20
> instead. While I agree that precision may be important, a=20
> measurement that so impacts on what it is measuring is not=20
> particularly useful either...
using jiffies makes these measurements useless for anything
faster than a millisecond (or 10 milliseconds on current
x86), which is most everything on 100Mb or faster, these
days. isn't there some other solution?
> Could you also move EXPORT_SYMBOL() definitions to after=20
> their function definition, as per the 2.6.x recommendations.
i would prefer that we did this in a separate patch, and
moved all the symbols at once.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH] 1/3 - RPC metrics support
2004-03-31 16:57 Lever, Charles
@ 2004-03-31 17:19 ` Trond Myklebust
0 siblings, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2004-03-31 17:19 UTC (permalink / raw)
To: Charles Lever; +Cc: Patrick Mochel, nfs
On Wed, 2004-03-31 at 11:57, Lever, Charles wrote:
> > My experience with gettimeofday() was that it slows down and
> > eventually hangs the machine when you use it inside bottom
> > halves. That's why I had to revert to using 'jiffies'
> > instead. While I agree that precision may be important, a
> > measurement that so impacts on what it is measuring is not
> > particularly useful either...
>
> using jiffies makes these measurements useless for anything
> faster than a millisecond (or 10 milliseconds on current
> x86), which is most everything on 100Mb or faster, these
> days. isn't there some other solution?
In 2.6. jiffies have 1 ms resolution, not 10 on x86. There may be some
alternative generic timing solution in Linux that does better than that,
and that can work within bottom halves without side-effects but I'm not
aware of such a solution.
You can of course use do_gettimeofday() for the non bh-safe parts if you
want to, but using it inside things like *data_ready() etc. was clearly
unacceptable.
The only generic solution I see is therefore to use statistics...
Cheers,
Trond
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] 1/3 - RPC metrics support
@ 2004-03-31 16:34 Lever, Charles
2004-03-31 16:46 ` Trond Myklebust
2004-04-01 8:13 ` Greg Banks
0 siblings, 2 replies; 19+ messages in thread
From: Lever, Charles @ 2004-03-31 16:34 UTC (permalink / raw)
To: Trond Myklebust, Olaf Kirch, Steve Dickson, Jeremy McNicoll,
Patrick Mochel
Cc: nfs
[-- Attachment #1: Type: text/plain, Size: 850 bytes --]
This patch provides a generic interface to the RPC client for tallying
per RPC request measurements. The interface can be used by any
in-kernel application that calls the RPC client. The API is:
rpc_alloc_tally
rpc_clear_tally
Provides a way for ULPs (upper level protocols) to allocate and free
that data structure that will hold the measurement totals. This
structure is passed in to any RPC request that the ULP wants to have a
measurement for.
rpc_print_tally
Passing a seq_file and an RPC tally structure to this routine causes a
formatted human-readable byte stream representation of the tally struct
to appear on the seq_file.
rpc_clear_tally
Clear the counter fields in the RPC tally structure.
- Chuck Lever
--
corporate: <cel at netapp dot com>
personal: <chucklever at bigfoot dot com>
[-- Attachment #2: 06-rpc_metrics.patch --]
[-- Type: application/octet-stream, Size: 19549 bytes --]
diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/clnt.h 06-rpc_metrics/include/linux/sunrpc/clnt.h
--- 05-dir-schedule/include/linux/sunrpc/clnt.h 2004-03-10 21:55:33.000000000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/clnt.h 2004-03-29 15:16:41.580240000 -0500
@@ -102,6 +102,7 @@ struct rpc_procinfo {
unsigned int p_bufsiz; /* req. buffer size */
unsigned int p_count; /* call count */
unsigned int p_timer; /* Which RTT timer to use */
+ char * p_name; /* name of procedure */
};
#define RPC_CONGESTED(clnt) (RPCXPRT_CONGESTED((clnt)->cl_xprt))
diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/metrics.h 06-rpc_metrics/include/linux/sunrpc/metrics.h
--- 05-dir-schedule/include/linux/sunrpc/metrics.h 1969-12-31 19:00:00.000000000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/metrics.h 2004-03-30 13:05:11.385243000 -0500
@@ -0,0 +1,125 @@
+/*
+ * linux/include/linux/sunrpc/metrics.h
+ *
+ * Declarations for RPC client per-operation metrics
+ *
+ * Copyright (C) 2004 Chuck Lever <cel@netapp.com>
+ */
+
+#ifndef _LINUX_SUNRPC_METRICS_H
+#define _LINUX_SUNRPC_METRICS_H
+
+#include <linux/time.h>
+#include <linux/seq_file.h>
+
+#define RPC_USECS_PER_SEC (1000000ULL)
+#define RPC_USECS_PER_HOUR (3600ULL * RPC_USECS_PER_SEC)
+
+/*
+ * This computes the elapsed time from then til now,
+ * in microseconds. We cap this at one hour.
+ */
+static inline unsigned long
+rpc_calc_latency(struct timeval *then)
+{
+ struct timeval now;
+ long secs, usecs;
+ unsigned long long result;
+
+ do_gettimeofday(&now);
+
+ secs = now.tv_sec - then->tv_sec;
+ usecs = now.tv_usec - then->tv_usec;
+ result = ((secs * RPC_USECS_PER_SEC) + usecs);
+ if (result > RPC_USECS_PER_HOUR)
+ result = 0ULL;
+
+ return (unsigned long) result;
+}
+
+/*
+ * Metrics captured for each RPC request (one of these per rpc_task)
+ */
+struct rpc_metrics {
+ struct timeval rm_task_start; /* RPC execute time */
+ struct timeval rm_xmit_time; /* RPC transmit time */
+ unsigned long rm_rtt; /* RPC round-trip time */
+ size_t rm_bytes_sent, /* data bytes transmitted */
+ rm_bytes_recv, /* data bytes received */
+ rm_sock_sndbuf; /* socket buffer utilization */
+ unsigned int rm_ntrans, /* number of transmissions */
+ rm_cwnd, /* congestion window size */
+ rm_cong, /* congestion window fill */
+ rm_slots, /* slot table utilization */
+ rm_sndq, /* send queue utilization */
+ rm_backlog; /* backlog utilization */
+};
+
+/*
+ * Metrics captured for an RPC transport (one of these per rpc_xprt)
+ */
+struct xprt_metrics {
+ struct timeval xm_conn_start; /* when we started connecting */
+ unsigned long long xm_conn_wait; /* total wait for connect */
+ unsigned long xm_connects; /* total count of connects */
+
+ unsigned int xm_slots_inuse; /* outstanding requests */
+ unsigned int xm_sndq_len; /* length of send queue */
+
+ unsigned int xm_backlog_len; /* length of backlog queue */
+ unsigned long xm_major_tmos; /* total major timeouts */
+ unsigned long xm_partialwrs; /* total partial writes */
+ unsigned long xm_writespace; /* total write space cbs */
+};
+
+/*
+ * One of these is allocated for each RPC operation type, per CPU.
+ * The array of these is indexed by CPU then by op type.
+ */
+struct rpc_metric_totals {
+ unsigned long mt_ops, /* count of operations */
+ mt_ntrans, /* count of RPC transmissions */
+ mt_cwnd, /* congestion window util */
+ mt_cong, /* congestion window util */
+ mt_slot_u, /* slot table utilization */
+ mt_sndq_u, /* send queue utilization */
+ mt_bklog_u, /* backlog utilization */
+ mt_errors; /* count of EIO errors */
+
+ unsigned long long mt_bytes_sent, /* count of bytes out */
+ mt_bytes_recv, /* count of bytes in */
+ mt_sendbuf; /* total sendbuf */
+
+ unsigned long long mt_rtt, /* usecs for RPC RTT */
+ mt_rtt_s, /* sum(rtt**2) */
+ mt_execute, /* usecs for RPC execution */
+ mt_execute_s; /* sum(execute**2) */
+};
+
+struct rpc_program;
+struct rpc_version;
+struct rpc_task;
+struct rpc_xprt;
+
+/*
+ * Passed in via rpc_message for collecting RPC metrics data
+ */
+struct rpc_tally {
+ struct rpc_program * tl_program;
+ struct rpc_version * tl_version;
+ struct rpc_xprt * tl_xprt;
+ struct rpc_metric_totals * tl_totals[NR_CPUS];
+};
+
+/*
+ * EXPORTed functions for managing tally structures
+ */
+struct rpc_tally * rpc_alloc_tally(struct rpc_program *, struct rpc_version *,
+ struct rpc_xprt *);
+void rpc_start_tally(struct rpc_task *);
+void rpc_end_tally(struct rpc_task *);
+void rpc_print_tally(struct seq_file *, struct rpc_tally *);
+void rpc_clear_tally(struct rpc_tally *);
+void rpc_free_tally(struct rpc_tally *);
+
+#endif /* _LINUX_SUNRPC_METRICS_H */
diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/sched.h 06-rpc_metrics/include/linux/sunrpc/sched.h
--- 05-dir-schedule/include/linux/sunrpc/sched.h 2004-03-26 14:29:38.456240000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/sched.h 2004-03-29 17:59:33.568240000 -0500
@@ -10,6 +10,7 @@
#define _LINUX_SUNRPC_SCHED_H_
#include <linux/timer.h>
+#include <linux/sunrpc/metrics.h>
#include <linux/sunrpc/types.h>
#include <linux/wait.h>
#include <linux/sunrpc/xdr.h>
@@ -23,6 +24,7 @@ struct rpc_message {
void * rpc_argp; /* Arguments */
void * rpc_resp; /* Result */
struct rpc_cred * rpc_cred; /* Credentials */
+ struct rpc_tally * rpc_talp; /* Tally metrics */
};
/*
@@ -80,6 +82,7 @@ struct rpc_task {
#ifdef RPC_DEBUG
unsigned short tk_pid; /* debugging aid */
#endif
+ struct rpc_metrics tk_metrics; /* per-op statistics */
};
#define tk_auth tk_client->cl_auth
#define tk_xprt tk_client->cl_xprt
diff -X ../../dont-diff -Naurp 05-dir-schedule/include/linux/sunrpc/xprt.h 06-rpc_metrics/include/linux/sunrpc/xprt.h
--- 05-dir-schedule/include/linux/sunrpc/xprt.h 2004-03-26 14:29:38.485240000 -0500
+++ 06-rpc_metrics/include/linux/sunrpc/xprt.h 2004-03-29 15:16:41.589247000 -0500
@@ -188,6 +188,7 @@ struct rpc_xprt {
struct list_head recv;
+ struct xprt_metrics metrics;
void (*old_data_ready)(struct sock *, int);
void (*old_state_change)(struct sock *);
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/clnt.c 06-rpc_metrics/net/sunrpc/clnt.c
--- 05-dir-schedule/net/sunrpc/clnt.c 2004-03-26 14:29:38.519242000 -0500
+++ 06-rpc_metrics/net/sunrpc/clnt.c 2004-03-29 15:16:41.593240000 -0500
@@ -797,6 +797,7 @@ call_timeout(struct rpc_task *task)
to->to_retries = clnt->cl_timeout.to_retries;
dprintk("RPC: %4d call_timeout (major)\n", task->tk_pid);
+ clnt->cl_xprt->metrics.xm_major_tmos++;
if (RPC_IS_SOFT(task)) {
if (clnt->cl_chatty)
printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/sched.c 06-rpc_metrics/net/sunrpc/sched.c
--- 05-dir-schedule/net/sunrpc/sched.c 2004-03-26 14:29:38.537240000 -0500
+++ 06-rpc_metrics/net/sunrpc/sched.c 2004-03-29 15:16:41.598240000 -0500
@@ -890,6 +890,9 @@ void rpc_init_task(struct rpc_task *task
task->tk_magic = 0xf00baa;
task->tk_pid = rpc_task_id++;
#endif
+
+ rpc_start_tally(task);
+
dprintk("RPC: %4d new task procpid %d\n", task->tk_pid,
current->pid);
}
@@ -956,6 +959,9 @@ rpc_release_task(struct rpc_task *task)
}
#endif
+ /* Tally per-op statistics */
+ rpc_end_tally(task);
+
/* Remove from global task list */
spin_lock(&rpc_sched_lock);
list_del(&task->tk_task);
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/stats.c 06-rpc_metrics/net/sunrpc/stats.c
--- 05-dir-schedule/net/sunrpc/stats.c 2004-03-10 21:55:54.000000000 -0500
+++ 06-rpc_metrics/net/sunrpc/stats.c 2004-03-31 10:51:24.296000000 -0500
@@ -20,6 +20,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/sunrpc/clnt.h>
+#include <linux/sunrpc/sched.h>
#include <linux/sunrpc/svcsock.h>
#define RPCDBG_FACILITY RPCDBG_MISC
@@ -107,6 +108,255 @@ void svc_seq_show(struct seq_file *seq,
}
/*
+ * Export a generic mechanism for RPC client applications to collect and
+ * report per-RPC statistics.
+ */
+struct rpc_tally *
+rpc_alloc_tally(struct rpc_program *prog, struct rpc_version *version,
+ struct rpc_xprt *xprt)
+{
+ unsigned int i, size, ops;
+ struct rpc_tally *new;
+ struct rpc_metric_totals *t;
+
+ /* one piece of memory for tally struct and metrics array */
+ ops = version->nrprocs;
+ size = NR_CPUS * ops * sizeof(struct rpc_metric_totals);
+ size += sizeof(struct rpc_tally);
+
+ new = (struct rpc_tally *) kmalloc(size, GFP_KERNEL);
+ if (new) {
+ memset(new, 0, size);
+ new->tl_xprt = xprt;
+ new->tl_program = prog;
+ new->tl_version = version;
+
+ /* save a pointer for each CPU's set of metric totals */
+ t = (struct rpc_metric_totals *) &new[1];
+ for (i = 0; i < NR_CPUS; i++) {
+ new->tl_totals[i] = t;
+ t += ops;
+ }
+ }
+ return new;
+}
+
+void
+rpc_free_tally(struct rpc_tally *tally)
+{
+ kfree(tally);
+}
+
+void
+rpc_start_tally(struct rpc_task *task)
+{
+ do_gettimeofday(&task->tk_metrics.rm_task_start);
+}
+
+void
+rpc_end_tally(struct rpc_task *task)
+{
+ unsigned long long tmp;
+ unsigned int op = task->tk_msg.rpc_proc->p_proc;
+ struct rpc_tally *tally = task->tk_msg.rpc_talp;
+ struct rpc_metrics *m;
+ struct rpc_metric_totals *t;
+
+ if (!tally)
+ return;
+
+ t = tally->tl_totals[smp_processor_id()];
+ t = &t[op];
+ m = &task->tk_metrics;
+
+ t->mt_ops++;
+
+ t->mt_ntrans += m->rm_ntrans;
+ t->mt_cwnd += m->rm_cwnd;
+ t->mt_cong += m->rm_cong;
+ t->mt_slot_u += m->rm_slots;
+ t->mt_sndq_u += m->rm_sndq;
+ t->mt_bklog_u += m->rm_backlog;
+
+ if (task->tk_status == EIO)
+ t->mt_errors++;
+
+ t->mt_bytes_sent += (unsigned long long) m->rm_bytes_sent;
+ t->mt_bytes_recv += (unsigned long long) m->rm_bytes_recv;
+ t->mt_sendbuf += (unsigned long long) m->rm_sock_sndbuf;
+
+ tmp = (unsigned long long) m->rm_rtt;
+ t->mt_rtt += tmp;
+ t->mt_rtt_s += tmp * tmp;
+ tmp = (unsigned long long) rpc_calc_latency(&m->rm_task_start);
+ t->mt_execute += tmp;
+ t->mt_execute_s += tmp * tmp;
+}
+
+static void
+__print_name(struct seq_file *seq, struct rpc_version *vers, unsigned int op)
+{
+ if (vers->procs[op].p_name)
+ seq_printf(seq, "%12s: ", vers->procs[op].p_name);
+ else if (op == 0)
+ seq_printf(seq, " NULL: ");
+ else
+ seq_printf(seq, "%12u: ", op);
+}
+
+/*
+ * TODO: add some GSS metrics
+ */
+void
+rpc_print_tally(struct seq_file *seq, struct rpc_tally *tally)
+{
+ struct rpc_program *prog = tally->tl_program;
+ struct rpc_version *vers = tally->tl_version;
+ struct rpc_metric_totals total;
+ unsigned int i, op;
+
+ seq_printf(seq, "RPC metrics version: 1.0\n");
+ seq_printf(seq,
+ "protocol: '%s' program: %u version: %u\n",
+ prog->name,
+ prog->number,
+ vers->number);
+
+ /*
+ * Transport stats
+ */
+ if (tally->tl_xprt) {
+ const struct rpc_xprt *xprt = tally->tl_xprt;
+ const struct xprt_metrics *m = &xprt->metrics;
+
+ seq_printf(seq,
+ "\ntransport idle time: %lu seconds\n",
+ (jiffies - xprt->last_used) / HZ);
+ seq_printf(seq,
+ "major timeouts: %lu\n",
+ m->xm_major_tmos);
+ seq_printf(seq,
+ "transport partial writes: %lu\n",
+ m->xm_partialwrs);
+ seq_printf(seq,
+ "write_space callbacks: %lu\n",
+ m->xm_writespace);
+ if (tally->tl_xprt->stream) {
+ seq_printf(seq, "transport socket type: tcp\n");
+ seq_printf(seq,
+ "connect attempts: %lu\n",
+ m->xm_connects);
+ seq_printf(seq,
+ "total connect wait time: %Lu usecs\n",
+ m->xm_conn_wait);
+ } else
+ seq_printf(seq, "transport socket type: udp\n");
+ }
+
+ /*
+ * op rate and byte rate stats
+ */
+ seq_printf(seq, "\n# optype op count bytes in");
+ seq_printf(seq, " bytes out ntrans errors\n");
+
+ for (op = 0; op < vers->nrprocs; op++) {
+ memset(&total, 0, sizeof(struct rpc_metric_totals));
+ for (i = 0; i < NR_CPUS; i++) {
+ struct rpc_metric_totals *r = tally->tl_totals[i];
+ r = &r[op];
+ total.mt_ops += r->mt_ops;
+ total.mt_ntrans += r->mt_ntrans;
+ total.mt_errors += r->mt_errors;
+ total.mt_bytes_sent += r->mt_bytes_sent;
+ total.mt_bytes_recv += r->mt_bytes_recv;
+ }
+ __print_name(seq, vers, op);
+ seq_printf(seq,
+ "%10lu %20Lu %20Lu %10lu %10lu\n",
+ total.mt_ops,
+ total.mt_bytes_recv,
+ total.mt_bytes_sent,
+ total.mt_ntrans,
+ total.mt_errors);
+ }
+
+ /*
+ * latency stats
+ */
+ seq_printf(seq, "\nticks/sec: %Lu\n", RPC_USECS_PER_SEC);
+ seq_printf(seq, "# optype rtt total ticks sum rtt ** 2");
+ seq_printf(seq, " execute total ticks sum execute ** 2\n");
+
+ for (op = 0; op < vers->nrprocs; op++) {
+ memset(&total, 0, sizeof(struct rpc_metric_totals));
+ for (i = 0; i < NR_CPUS; i++) {
+ struct rpc_metric_totals *r = tally->tl_totals[i];
+ r = &r[op];
+ total.mt_rtt += r->mt_rtt;
+ total.mt_rtt_s += r->mt_rtt_s;
+ total.mt_execute += r->mt_execute;
+ total.mt_execute_s += r->mt_execute_s;
+ }
+ __print_name(seq, vers, op);
+ seq_printf(seq,
+ "%20Lu %20Lu %20Lu %20Lu\n",
+ total.mt_rtt,
+ total.mt_rtt_s,
+ total.mt_execute,
+ total.mt_execute_s);
+ }
+
+ /*
+ * queue stats
+ */
+ seq_printf(seq, "\n# optype slot util backlog sndq util");
+ seq_printf(seq, " cwnd cong sendbuf\n");
+
+ for (op = 0; op < vers->nrprocs; op++) {
+ memset(&total, 0, sizeof(struct rpc_metric_totals));
+ for (i = 0; i < NR_CPUS; i++) {
+ struct rpc_metric_totals *r = tally->tl_totals[i];
+ r = &r[op];
+ total.mt_slot_u += r->mt_slot_u;
+ total.mt_sndq_u += r->mt_sndq_u;
+ total.mt_bklog_u += r->mt_bklog_u;
+ total.mt_sendbuf += r->mt_sendbuf;
+ total.mt_cwnd += r->mt_cwnd;
+ total.mt_cong += r->mt_cong;
+ }
+ __print_name(seq, vers, op);
+ seq_printf(seq, "%10lu %10lu %10lu %10lu %10lu %20Lu\n",
+ total.mt_slot_u,
+ total.mt_sndq_u,
+ total.mt_bklog_u,
+ total.mt_cwnd,
+ total.mt_cong,
+ total.mt_sendbuf);
+ }
+}
+
+void
+rpc_clear_tally(struct rpc_tally *tally)
+{
+ unsigned int i, size = sizeof(struct rpc_metric_totals) *
+ tally->tl_version->nrprocs;
+ struct xprt_metrics *m = &tally->tl_xprt->metrics;
+
+ for (i = 0; i < NR_CPUS; i++)
+ memset(tally->tl_totals[i], 0, size);
+
+ /* we don't clear time-related fields */
+ m->xm_connects = 0UL;
+ m->xm_conn_wait = 0ULL;
+ m->xm_slots_inuse = 0U;
+ m->xm_sndq_len = 0U;
+ m->xm_backlog_len = 0U;
+ m->xm_major_tmos = 0U;
+ m->xm_partialwrs = 0U;
+ m->xm_writespace = 0U;
+}
+
+/*
* Register/unregister RPC proc files
*/
static inline struct proc_dir_entry *
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/sunrpc_syms.c 06-rpc_metrics/net/sunrpc/sunrpc_syms.c
--- 05-dir-schedule/net/sunrpc/sunrpc_syms.c 2004-03-26 14:29:38.551240000 -0500
+++ 06-rpc_metrics/net/sunrpc/sunrpc_syms.c 2004-03-29 15:16:41.605246000 -0500
@@ -92,6 +92,10 @@ EXPORT_SYMBOL(auth_domain_lookup);
/* RPC statistics */
#ifdef CONFIG_PROC_FS
+EXPORT_SYMBOL(rpc_alloc_tally);
+EXPORT_SYMBOL(rpc_print_tally);
+EXPORT_SYMBOL(rpc_clear_tally);
+EXPORT_SYMBOL(rpc_free_tally);
EXPORT_SYMBOL(rpc_proc_register);
EXPORT_SYMBOL(rpc_proc_unregister);
EXPORT_SYMBOL(svc_proc_register);
diff -X ../../dont-diff -Naurp 05-dir-schedule/net/sunrpc/xprt.c 06-rpc_metrics/net/sunrpc/xprt.c
--- 05-dir-schedule/net/sunrpc/xprt.c 2004-03-26 14:29:38.586243000 -0500
+++ 06-rpc_metrics/net/sunrpc/xprt.c 2004-03-30 17:24:08.132240000 -0500
@@ -298,6 +298,7 @@ static int
__xprt_get_cong(struct rpc_xprt *xprt, struct rpc_task *task)
{
struct rpc_rqst *req = task->tk_rqstp;
+ struct rpc_metrics *metrics = &task->tk_metrics;
if (req->rq_cong)
return 1;
@@ -307,6 +308,8 @@ __xprt_get_cong(struct rpc_xprt *xprt, s
return 0;
req->rq_cong = 1;
xprt->cong += RPC_CWNDSCALE;
+ metrics->rm_cwnd = xprt->cwnd;
+ metrics->rm_cong = xprt->cong;
return 1;
}
@@ -487,6 +490,8 @@ static void xprt_socket_connect(void *ar
/*
* Tell the socket layer to start connecting...
*/
+ xprt->metrics.xm_connects++;
+ do_gettimeofday(&xprt->metrics.xm_conn_start);
status = sock->ops->connect(sock, (struct sockaddr *) &xprt->addr,
sizeof(xprt->addr), O_NONBLOCK);
dprintk("RPC: %p connect status %d connected %d sock state %d\n",
@@ -559,6 +564,8 @@ xprt_connect_status(struct rpc_task *tas
struct rpc_xprt *xprt = task->tk_xprt;
if (task->tk_status >= 0) {
+ xprt->metrics.xm_conn_wait += (unsigned long long)
+ rpc_calc_latency(&xprt->metrics.xm_conn_start);
dprintk("RPC: %4d xprt_connect_status: connection established\n",
task->tk_pid);
return;
@@ -613,6 +620,7 @@ xprt_complete_rqst(struct rpc_xprt *xprt
{
struct rpc_task *task = req->rq_task;
struct rpc_clnt *clnt = task->tk_client;
+ struct rpc_metrics *metrics = &task->tk_metrics;
/* Adjust congestion window */
if (!xprt->nocong) {
@@ -627,6 +635,13 @@ xprt_complete_rqst(struct rpc_xprt *xprt
}
}
+ metrics->rm_ntrans = req->rq_ntrans;
+ metrics->rm_bytes_recv = copied;
+ metrics->rm_slots = xprt->metrics.xm_slots_inuse;
+ metrics->rm_sndq = xprt->metrics.xm_sndq_len;
+ metrics->rm_backlog = xprt->metrics.xm_backlog_len;
+ metrics->rm_rtt = rpc_calc_latency(&metrics->rm_xmit_time);
+
#ifdef RPC_PROFILE
/* Profile only reads for now */
if (copied > 1024) {
@@ -1062,6 +1077,8 @@ xprt_write_space(struct sock *sk)
if (xprt->shutdown)
goto out;
+ xprt->metrics.xm_writespace++;
+
/* Wait until we have enough socket memory */
if (xprt->stream) {
/* from net/ipv4/tcp.c:tcp_write_space */
@@ -1151,6 +1168,7 @@ xprt_transmit(struct rpc_task *task)
struct rpc_clnt *clnt = task->tk_client;
struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
+ struct rpc_metrics *metrics = &task->tk_metrics;
int status, retry = 0;
@@ -1195,16 +1213,24 @@ xprt_transmit(struct rpc_task *task)
/* If we've sent the entire packet, immediately
* reset the count of bytes sent. */
if (req->rq_bytes_sent >= req->rq_slen) {
+ metrics->rm_bytes_sent = req->rq_bytes_sent;
+ metrics->rm_sock_sndbuf =
+ xprt->inet->sk_wmem_queued;
req->rq_bytes_sent = 0;
goto out_receive;
}
} else {
- if (status >= req->rq_slen)
+ if (status >= req->rq_slen) {
+ metrics->rm_bytes_sent = status;
+ metrics->rm_sock_sndbuf =
+ atomic_read(&xprt->inet->sk_wmem_alloc);
goto out_receive;
+ }
status = -EAGAIN;
break;
}
+ xprt->metrics.xm_partialwrs++;
dprintk("RPC: %4d xmit incomplete (%d left of %d)\n",
task->tk_pid, req->rq_slen - req->rq_bytes_sent,
req->rq_slen);
@@ -1251,6 +1277,7 @@ xprt_transmit(struct rpc_task *task)
return;
out_receive:
dprintk("RPC: %4d xmit complete\n", task->tk_pid);
+ do_gettimeofday(&metrics->rm_xmit_time);
/* Set the task's receive timeout value */
spin_lock_bh(&xprt->sock_lock);
if (!xprt->nocong) {
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH] 1/3 - RPC metrics support
2004-03-31 16:34 Lever, Charles
@ 2004-03-31 16:46 ` Trond Myklebust
2004-04-01 8:13 ` Greg Banks
1 sibling, 0 replies; 19+ messages in thread
From: Trond Myklebust @ 2004-03-31 16:46 UTC (permalink / raw)
To: Charles Lever
Cc: Olaf Kirch, Steve Dickson, Jeremy McNicoll, Patrick Mochel, nfs
My experience with gettimeofday() was that it slows down and eventually
hangs the machine when you use it inside bottom halves. That's why I had
to revert to using 'jiffies' instead. While I agree that precision may
be important, a measurement that so impacts on what it is measuring is
not particularly useful either...
Could you also move EXPORT_SYMBOL() definitions to after their function
definition, as per the 2.6.x recommendations.
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] 1/3 - RPC metrics support
2004-03-31 16:34 Lever, Charles
2004-03-31 16:46 ` Trond Myklebust
@ 2004-04-01 8:13 ` Greg Banks
1 sibling, 0 replies; 19+ messages in thread
From: Greg Banks @ 2004-04-01 8:13 UTC (permalink / raw)
To: Lever, Charles; +Cc: Jeremy McNicoll, Patrick Mochel, nfs
G'day,
[cc trimmed somewhat]
"Lever, Charles" wrote:
>
> -------------------------------------------------------------------------------------
> Name: 06-rpc_metrics.patch
> 06-rpc_metrics.patch Type: unspecified type (application/octet-stream)
> Encoding: base64
> Description: 06-rpc_metrics.patch
> -------------------------------------------------------------------------------------
> Name: 07-nfs-iostat.patch
> 07-nfs-iostat.patch Type: unspecified type (application/octet-stream)
> Encoding: base64
> Description: 07-nfs-iostat.patch
> -------------------------------------------------------------------------------------
> Name: 08-rpc_call.patch
> 08-rpc_call.patch Type: unspecified type (application/octet-stream)
> Encoding: base64
> Description: 08-rpc_call.patch
These patches look great, I'm very happy to see extended stats
on the client and I expect it will be very useful debugging
performance and other problems. Thanks! Just a few comments...
I don't understand why you have an rpc_tally pointer in rpc_message.
This just seems to complicate matters by forcing you to initialise
the pointer to the rpc_tally in the nfs_server on every call. AFAICT
all the place where you need the rpc_tally pointer you already have
an rpc_clnt pointer which is initialised directly from the nfs_server
and could be made to hold the rpc_tally* instead. With that reorg
you could drop most of the 08-rpc_call.patch and all the changes
to nfs3proc.c and proc.c in nfs-iostat.patch and replace them with
a small change to nfs_create_client(). That would greatly reduce
the code perturbation without affecting performance or functionality.
I don't see any way for the userspace program which reads these to
figure out which of the iostat files corresponds to which mount.
We get the hostname in the filename and again in the contents but
nowhere do we get the mount path, only the device minor number in
the filename. How about adding a line to nfs_iostat_show() to
print nfss->mnt_path?
The minor dimension of the 2d array of struct rpc_metrics_totals
pointed to by rpc_tally is per-CPU. To get correct per-CPU
separation you will want to cacheline-align the per-CPU parts
(so that cachelines which straddle the per-CPU parts don't end
up bouncing between CPUs). For example, in rpc_alloc_tally(),
size = NR_CPUS * ROUNDUP(ops * sizeof(struct rpc_metric_totals), L1_CACHE_BYTES);
and change the stride of the tl_totals[] initialisation loop.
As others have mentioned, having at least four do_gettimeofday()
calls per RPC call is evil; we've seen major problems scaling
do_gettimeofday() on large CPU count Altix machines. This problem
also affects some NIC drivers because the network stack does
a do_gettimeofday() to timestamp every incoming ethernet packet;
David Miller's proposed solution to that is to define a new
scalable high-resolution timestamp API. I don't know if that's
gone anywhere, you might try asking on the netdev list.
So when the mount is unmounted, all stats are lost? How does this
work on clients which do intermittent traffic on automount mounts?
Would it be possible to get some global stats too?
The accumulation-over-CPU loops in nfs_iostat_show() and rpc_print_tally()
would be simpler if all the fields in struct nfs_iostat and struct
rpc_metric_totals were a consistent size, so you could just treat the
structs as an array of that type during the accumulation step.
In struct nfs_iostat, it would be nice to gather the number of times
calls retried because they received EJUKEBOX from the server, the number
of SETATTRs which cause a truncate, the number of WRITEs which cause
a file extension, and the number of WRITEs which are not aligned to
the wsize.
nfs_iostat_file_name() does an unbounded sprintf() of an unbounded string
that comes from userspace (from an unprivileged user if the site runs
automounter) into a buffer only NFS_PROC_NAMELEN=256 bytes long.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2004-04-04 6:17 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-01 17:05 [PATCH] 1/3 - RPC metrics support Lever, Charles
2004-04-02 0:17 ` Greg Banks
2004-04-02 0:36 ` Trond Myklebust
2004-04-02 0:49 ` Greg Banks
2004-04-02 1:41 ` Trond Myklebust
2004-04-02 2:42 ` Greg Banks
-- strict thread matches above, loose matches on Subject: below --
2004-04-04 2:35 Lever, Charles
2004-04-04 6:17 ` Greg Banks
2004-04-01 16:45 Lever, Charles
2004-04-02 0:10 ` Greg Banks
2004-04-01 15:15 Lever, Charles
2004-04-01 23:58 ` Greg Banks
2004-04-02 0:34 ` Trond Myklebust
2004-04-02 2:33 ` Greg Banks
2004-03-31 16:57 Lever, Charles
2004-03-31 17:19 ` Trond Myklebust
2004-03-31 16:34 Lever, Charles
2004-03-31 16:46 ` Trond Myklebust
2004-04-01 8:13 ` Greg Banks
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.