* [Patch 7/8] documentation
@ 2006-05-02 6:20 Balbir Singh
0 siblings, 0 replies; 2+ messages in thread
From: Balbir Singh @ 2006-05-02 6:20 UTC (permalink / raw)
To: linux-kernel; +Cc: lse-tech, jlan
delayacct-doc.patch
Some documentation for delay accounting.
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
---
Documentation/accounting/delay-accounting.txt | 115 +++++++
Documentation/accounting/getdelays.c | 376 ++++++++++++++++++++++++++
Documentation/accounting/taskstats.txt | 2
3 files changed, 493 insertions(+)
diff -puN /dev/null Documentation/accounting/delay-accounting.txt
--- /dev/null 2004-06-24 23:34:38.000000000 +0530
+++ linux-2.6.17-rc3-balbir/Documentation/accounting/delay-accounting.txt 2006-05-02 10:36:37.000000000 +0530
@@ -0,0 +1,115 @@
+Delay accounting
+----------------
+
+Tasks encounter delays in execution when they wait
+for some kernel resource to become available e.g. a
+runnable task may wait for a free CPU to run on.
+
+The per-task delay accounting functionality measures
+the delays experienced by a task while
+
+a) waiting for a CPU (while being runnable)
+b) completion of synchronous block I/O initiated by the task
+c) swapping in pages
+
+and makes these statistics available to userspace through
+the taskstats interface.
+
+Such delays provide feedback for setting a task's cpu priority,
+io priority and rss limit values appropriately. Long delays for
+important tasks could be a trigger for raising its corresponding priority.
+
+The functionality, through its use of the taskstats interface, also provides
+delay statistics aggregated for all tasks (or threads) belonging to a
+thread group (corresponding to a traditional Unix process). This is a commonly
+needed aggregation that is more efficiently done by the kernel.
+
+Userspace utilities, particularly resource management applications, can also
+aggregate delay statistics into arbitrary groups. To enable this, delay
+statistics of a task are available both during its lifetime as well as on its
+exit, ensuring continuous and complete monitoring can be done.
+
+
+Interface
+---------
+
+Delay accounting uses the taskstats interface which is described
+in detail in a separate document in this directory. Taskstats returns a
+generic data structure to userspace corresponding to per-pid and per-tgid
+statistics. The delay accounting functionality populates specific fields of
+this structure. See
+ include/linux/taskstats.h
+for a description of the fields pertaining to delay accounting.
+It will generally be in the form of counters returning the cumulative
+delay seen for cpu, sync block I/O, swapin etc.
+
+Taking the difference of two successive readings of a given
+counter (say cpu_delay_total) for a task will give the delay
+experienced by the task waiting for the corresponding resource
+in that interval.
+
+When a task exits, records containing the per-task and per-process statistics
+are sent to userspace without requiring a command. More details are given in
+the taskstats interface description.
+
+The getdelays.c userspace utility in this directory allows simple commands to
+be run and the corresponding delay statistics to be displayed. It also serves
+as an example of using the taskstats interface.
+
+Usage
+-----
+
+Compile the kernel with
+ CONFIG_TASK_DELAY_ACCT=y
+ CONFIG_TASKSTATS=y
+
+Enable the accounting at boot time by adding
+the following to the kernel boot options
+ delayacct
+
+and after the system has booted up, use a utility
+similar to getdelays.c to access the delays
+seen by a given task or a task group (tgid).
+The utility also allows a given command to be
+executed and the corresponding delays to be
+seen.
+
+General format of the getdelays command
+
+getdelays [-t tgid] [-p pid] [-c cmd...]
+
+
+Get delays, since system boot, for pid 10
+# ./getdelays -p 10
+(output similar to next case)
+
+Get sum of delays, since system boot, for all pids with tgid 5
+# ./getdelays -t 5
+
+
+CPU count real total virtual total delay total
+ 7876 92005750 100000000 24001500
+IO count delay total
+ 0 0
+MEM count delay total
+ 0 0
+
+Get delays seen in executing a given simple command
+# ./getdelays -c ls /
+
+bin data1 data3 data5 dev home media opt root srv sys usr
+boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
+
+
+CPU count real total virtual total delay total
+ 6 4000250 4000000 0
+IO count delay total
+ 0 0
+MEM count delay total
+ 0 0
+
+
+
+
+
+
diff -puN /dev/null Documentation/accounting/getdelays.c
--- /dev/null 2004-06-24 23:34:38.000000000 +0530
+++ linux-2.6.17-rc3-balbir/Documentation/accounting/getdelays.c 2006-05-02 10:36:37.000000000 +0530
@@ -0,0 +1,376 @@
+/* getdelays.c
+ *
+ * Utility to get per-pid and per-tgid delay accounting statistics
+ * Also illustrates usage of the taskstats interface
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2005
+ * Copyright (C) Balbir Singh, IBM Corp. 2006
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <unistd.h>
+#include <poll.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <signal.h>
+
+#include <linux/genetlink.h>
+#include <linux/taskstats.h>
+
+/*
+ * Generic macros for dealing with netlink sockets. Might be duplicated
+ * elsewhere. It is recommended that commercial grade applications use
+ * libnl or libnetlink and use the interfaces provided by the library
+ */
+#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
+#define GENLMSG_PAYLOAD(glh) (NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN)
+#define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN))
+#define NLA_PAYLOAD(len) (len - NLA_HDRLEN)
+
+#define err(code, fmt, arg...) do { printf(fmt, ##arg); exit(code); } while (0)
+int done = 0;
+
+/*
+ * Create a raw netlink socket and bind
+ */
+static int create_nl_socket(int protocol, int groups)
+{
+ socklen_t addr_len;
+ int fd;
+ struct sockaddr_nl local;
+
+ fd = socket(AF_NETLINK, SOCK_RAW, protocol);
+ if (fd < 0)
+ return -1;
+
+ memset(&local, 0, sizeof(local));
+ local.nl_family = AF_NETLINK;
+ local.nl_groups = groups;
+
+ if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0)
+ goto error;
+
+ return fd;
+ error:
+ close(fd);
+ return -1;
+}
+
+int sendto_fd(int s, const char *buf, int bufLen)
+{
+ struct sockaddr_nl nladdr;
+ int r;
+
+ memset(&nladdr, 0, sizeof(nladdr));
+ nladdr.nl_family = AF_NETLINK;
+
+ while ((r = sendto(s, buf, bufLen, 0, (struct sockaddr *) &nladdr,
+ sizeof(nladdr))) < bufLen) {
+ if (r > 0) {
+ buf += r;
+ bufLen -= r;
+ } else if (errno != EAGAIN)
+ return -1;
+ }
+ return 0;
+}
+
+/*
+ * Probe the controller in genetlink to find the family id
+ * for the TASKSTATS family
+ */
+int get_family_id(int sd)
+{
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } family_req;
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } ans;
+
+ int id;
+ struct nlattr *na;
+ int rep_len;
+
+ /* Get family name */
+ family_req.n.nlmsg_type = GENL_ID_CTRL;
+ family_req.n.nlmsg_flags = NLM_F_REQUEST;
+ family_req.n.nlmsg_seq = 0;
+ family_req.n.nlmsg_pid = getpid();
+ family_req.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
+ family_req.g.cmd = CTRL_CMD_GETFAMILY;
+ family_req.g.version = 0x1;
+ na = (struct nlattr *) GENLMSG_DATA(&family_req);
+ na->nla_type = CTRL_ATTR_FAMILY_NAME;
+ na->nla_len = strlen(TASKSTATS_GENL_NAME) + 1 + NLA_HDRLEN;
+ strcpy(NLA_DATA(na), TASKSTATS_GENL_NAME);
+ family_req.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);
+
+ if (sendto_fd(sd, (char *) &family_req, family_req.n.nlmsg_len) < 0)
+ err(1, "error sending message via Netlink\n");
+
+ rep_len = recv(sd, &ans, sizeof(ans), 0);
+
+ if (rep_len < 0)
+ err(1, "error receiving reply message via Netlink\n");
+
+
+ /* Validate response message */
+ if (!NLMSG_OK((&ans.n), rep_len))
+ err(1, "invalid reply message received via Netlink\n");
+
+ if (ans.n.nlmsg_type == NLMSG_ERROR) { /* error */
+ printf("error received NACK - leaving\n");
+ exit(1);
+ }
+
+
+ na = (struct nlattr *) GENLMSG_DATA(&ans);
+ na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len));
+ if (na->nla_type == CTRL_ATTR_FAMILY_ID) {
+ id = *(__u16 *) NLA_DATA(na);
+ }
+ return id;
+}
+
+void print_taskstats(struct taskstats *t)
+{
+ printf("\n\nCPU %15s%15s%15s%15s\n"
+ " %15llu%15llu%15llu%15llu\n"
+ "IO %15s%15s\n"
+ " %15llu%15llu\n"
+ "MEM %15s%15s\n"
+ " %15llu%15llu\n\n",
+ "count", "real total", "virtual total", "delay total",
+ t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
+ t->cpu_delay_total,
+ "count", "delay total",
+ t->blkio_count, t->blkio_delay_total,
+ "count", "delay total", t->swapin_count, t->swapin_delay_total);
+}
+
+void sigchld(int sig)
+{
+ done = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ int rc;
+ int sk_nl;
+ struct nlmsghdr *nlh;
+ struct genlmsghdr *genlhdr;
+ char *buf;
+ struct taskstats_cmd_param *param;
+ __u16 id;
+ struct nlattr *na;
+
+ /* For receiving */
+ struct sockaddr_nl kern_nla, from_nla;
+ socklen_t from_nla_len;
+ int recv_len;
+ struct taskstats_reply *reply;
+
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } req;
+
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } ans;
+
+ int nl_sd = -1;
+ int rep_len;
+ int len = 0;
+ int aggr_len, len2;
+ struct sockaddr_nl nladdr;
+ pid_t tid = 0;
+ pid_t rtid = 0;
+ int cmd_type = TASKSTATS_TYPE_TGID;
+ int c, status;
+ int forking = 0;
+ struct sigaction act = {
+ .sa_handler = SIG_IGN,
+ .sa_mask = SA_NOMASK,
+ };
+ struct sigaction tact ;
+
+ if (argc < 3) {
+ printf("usage %s [-t tgid][-p pid][-c cmd]\n", argv[0]);
+ exit(-1);
+ }
+
+ tact.sa_handler = sigchld;
+ sigemptyset(&tact.sa_mask);
+ if (sigaction(SIGCHLD, &tact, NULL) < 0)
+ err(1, "sigaction failed for SIGCHLD\n");
+
+ while (1) {
+
+ c = getopt(argc, argv, "t:p:c:");
+ if (c < 0)
+ break;
+
+ switch (c) {
+ case 't':
+ tid = atoi(optarg);
+ if (!tid)
+ err(1, "Invalid tgid\n");
+ cmd_type = TASKSTATS_CMD_ATTR_TGID;
+ break;
+ case 'p':
+ tid = atoi(optarg);
+ if (!tid)
+ err(1, "Invalid pid\n");
+ cmd_type = TASKSTATS_CMD_ATTR_TGID;
+ break;
+ case 'c':
+ opterr = 0;
+ tid = fork();
+ if (tid < 0)
+ err(1, "fork failed\n");
+
+ if (tid == 0) { /* child process */
+ if (execvp(argv[optind - 1], &argv[optind - 1]) < 0) {
+ exit(-1);
+ }
+ }
+ forking = 1;
+ break;
+ default:
+ printf("usage %s [-t tgid][-p pid][-c cmd]\n", argv[0]);
+ exit(-1);
+ break;
+ }
+ if (c == 'c')
+ break;
+ }
+
+ /* Construct Netlink request message */
+
+ /* Send Netlink request message & get reply */
+
+ if ((nl_sd =
+ create_nl_socket(NETLINK_GENERIC, TASKSTATS_LISTEN_GROUP)) < 0)
+ err(1, "error creating Netlink socket\n");
+
+
+ id = get_family_id(nl_sd);
+
+ /* Send command needed */
+ req.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
+ req.n.nlmsg_type = id;
+ req.n.nlmsg_flags = NLM_F_REQUEST;
+ req.n.nlmsg_seq = 0;
+ req.n.nlmsg_pid = tid;
+ req.g.cmd = TASKSTATS_CMD_GET;
+ na = (struct nlattr *) GENLMSG_DATA(&req);
+ na->nla_type = cmd_type;
+ na->nla_len = sizeof(unsigned int) + NLA_HDRLEN;
+ *(__u32 *) NLA_DATA(na) = tid;
+ req.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);
+
+
+ if (!forking && sendto_fd(nl_sd, (char *) &req, req.n.nlmsg_len) < 0)
+ err(1, "error sending message via Netlink\n");
+
+ act.sa_handler = SIG_IGN;
+ sigemptyset(&act.sa_mask);
+ if (sigaction(SIGINT, &act, NULL) < 0)
+ err(1, "sigaction failed for SIGINT\n");
+
+ do {
+ int i;
+ struct pollfd pfd;
+ int pollres;
+
+ pfd.events = 0xffff & ~POLLOUT;
+ pfd.fd = nl_sd;
+ pollres = poll(&pfd, 1, 5000);
+ if (pollres < 0 || done) {
+ break;
+ }
+
+ rep_len = recv(nl_sd, &ans, sizeof(ans), 0);
+ nladdr.nl_family = AF_NETLINK;
+ nladdr.nl_groups = TASKSTATS_LISTEN_GROUP;
+
+ if (ans.n.nlmsg_type == NLMSG_ERROR) { /* error */
+ printf("error received NACK - leaving\n");
+ exit(1);
+ }
+
+ if (rep_len < 0) {
+ err(1, "error receiving reply message via Netlink\n");
+ break;
+ }
+
+ /* Validate response message */
+ if (!NLMSG_OK((&ans.n), rep_len))
+ err(1, "invalid reply message received via Netlink\n");
+
+ rep_len = GENLMSG_PAYLOAD(&ans.n);
+
+ na = (struct nlattr *) GENLMSG_DATA(&ans);
+ len = 0;
+ i = 0;
+ while (len < rep_len) {
+ len += NLA_ALIGN(na->nla_len);
+ switch (na->nla_type) {
+ case TASKSTATS_TYPE_AGGR_PID:
+ /* Fall through */
+ case TASKSTATS_TYPE_AGGR_TGID:
+ aggr_len = NLA_PAYLOAD(na->nla_len);
+ len2 = 0;
+ /* For nested attributes, na follows */
+ na = (struct nlattr *) NLA_DATA(na);
+ done = 0;
+ while (len2 < aggr_len) {
+ switch (na->nla_type) {
+ case TASKSTATS_TYPE_PID:
+ rtid = *(int *) NLA_DATA(na);
+ break;
+ case TASKSTATS_TYPE_TGID:
+ rtid = *(int *) NLA_DATA(na);
+ break;
+ case TASKSTATS_TYPE_STATS:
+ if (rtid == tid) {
+ print_taskstats((struct taskstats *)
+ NLA_DATA(na));
+ done = 1;
+ }
+ break;
+ }
+ len2 += NLA_ALIGN(na->nla_len);
+ na = (struct nlattr *) ((char *) na + len2);
+ if (done)
+ break;
+ }
+ }
+ na = (struct nlattr *) (GENLMSG_DATA(&ans) + len);
+ if (done)
+ break;
+ }
+ if (done)
+ break;
+ }
+ while (1);
+
+ close(nl_sd);
+ return 0;
+}
diff -puN Documentation/accounting/taskstats.txt~delayacct-doc Documentation/accounting/taskstats.txt
--- linux-2.6.17-rc3/Documentation/accounting/taskstats.txt~delayacct-doc 2006-05-02 10:36:37.000000000 +0530
+++ linux-2.6.17-rc3-balbir/Documentation/accounting/taskstats.txt 2006-05-02 10:36:37.000000000 +0530
@@ -39,6 +39,8 @@ belongs (the task does not need to be th
per-tgid stats to be sent for each exiting task is explained in the per-tgid
stats section below.
+getdelays.c is a simple utility demonstrating usage of the taskstats interface
+for reporting delay accounting statistics.
Interface
---------
_
^ permalink raw reply [flat|nested] 2+ messages in thread* [Patch 0/8] per-task delay accounting
@ 2006-04-22 2:16 Shailabh Nagar
2006-04-22 2:40 ` [Patch 7/8] documentation Shailabh Nagar
0 siblings, 1 reply; 2+ messages in thread
From: Shailabh Nagar @ 2006-04-22 2:16 UTC (permalink / raw)
To: linux-kernel
Cc: LSE, Jes Sorensen, Peter Chubb, Erich Focht, Levent Serinol,
Jay Lan
Here are the delay accounting patches again. I'm not using the
earlier email thread due to code being refactored a bit.
The previous posting
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0603.3/1776.html
of these patches elicited several review comments from Andrew Morton
all of which have been addressed.
The other main thread of the comments was whether other accounting
stakeholders would be ok with this interface. Towards this end,
I'd posted an overview of what the other packages do (which didn't seem
to make the archives) and some of the stakeholders responded.
I'll repost the analysis as a reply to this post. Meanwhile, here's
the list of the stakeholders identified by Andrew and a summary of status
of their comments.
1. CSA accounting/PAGG/JOB: Jay Lan <jlan@engr.sgi.com>
Raised several points
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0604.1/0397.html
all of which have been addressed in this set of patches.
2. per-process IO statistics: Levent Serinol <lserinol@gmail.com>
No reponse.
I'd ascertained that its needs are a subset of CSA.
3. per-cpu time statistics: Erich Focht <efocht@ess.nec.de>
No response.
I'd ascertained that its needs can be met by taskstats
interface whenever these statistics are submitted for inclusion.
4. Microstate accounting: Peter Chubb <peterc@gelato.unsw.edu.au>
Mentioned overlap of patches with delay accounting
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0603.3/2286.html
and also that a /proc interface was preferable due to convenience.
My position is that the netlink interface is a superset of /proc due to
former's ability to supply exit-time data.
5. ELSA: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Confirmed that ELSA is not a direct user of a new kernel statistics
interface since it is a consumer of CSA or BSD accounting's statistics.
6. pnotify: Jes Sorensen <jes@sgi.com>
(taken over pnotify from Erik Jacobson)
Informed over private email that pnotify replacement is
being worked on.
I'd ascertained that pnotify (or its replacemenent) will not be
concerned with exporting data to userspace or collecting any stats.
Thats left to the kernel module that uses pnotify to get
notifications. CSA is one expected user of pnotify.
Hence CSA's concerns are the only ones relevant to pnotify as well.
7. Scalable statistics counters with /proc reporting:
Ravikiran G Thirumalai, Dipankar Sarma <dipankar@in.ibm.com>
Confirmed these counters aren't relevant to this discussion.
--Shailabh
Series
delayacct-setup.patch
delayacct-blkio-swapin.patch
delayacct-schedstats.patch
genetlink-utils.patch
taskstats-setup.patch
delayacct-taskstats.patch
delayacct-doc.patch
delayacct-procfs.patch
^ permalink raw reply [flat|nested] 2+ messages in thread* [Patch 7/8] documentation
2006-04-22 2:16 [Patch 0/8] per-task delay accounting Shailabh Nagar
@ 2006-04-22 2:40 ` Shailabh Nagar
0 siblings, 0 replies; 2+ messages in thread
From: Shailabh Nagar @ 2006-04-22 2:40 UTC (permalink / raw)
To: linux-kernel; +Cc: LSE, Jay Lan
delayacct-doc.patch
Some documentation for delay accounting.
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Documentation/accounting/delay-accounting.txt | 115 +++++++
Documentation/accounting/getdelays.c | 376 ++++++++++++++++++++++++++
Documentation/accounting/taskstats.txt | 2
3 files changed, 493 insertions(+)
Index: linux-2.6.17-rc1/Documentation/accounting/delay-accounting.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.17-rc1/Documentation/accounting/delay-accounting.txt 2006-04-21 20:50:22.000000000 -0400
@@ -0,0 +1,115 @@
+Delay accounting
+----------------
+
+Tasks encounter delays in execution when they wait
+for some kernel resource to become available e.g. a
+runnable task may wait for a free CPU to run on.
+
+The per-task delay accounting functionality measures
+the delays experienced by a task while
+
+a) waiting for a CPU (while being runnable)
+b) completion of synchronous block I/O initiated by the task
+c) swapping in pages
+
+and makes these statistics available to userspace through
+the taskstats interface.
+
+Such delays provide feedback for setting a task's cpu priority,
+io priority and rss limit values appropriately. Long delays for
+important tasks could be a trigger for raising its corresponding priority.
+
+The functionality, through its use of the taskstats interface, also provides
+delay statistics aggregated for all tasks (or threads) belonging to a
+thread group (corresponding to a traditional Unix process). This is a commonly
+needed aggregation that is more efficiently done by the kernel.
+
+Userspace utilities, particularly resource management applications, can also
+aggregate delay statistics into arbitrary groups. To enable this, delay
+statistics of a task are available both during its lifetime as well as on its
+exit, ensuring continuous and complete monitoring can be done.
+
+
+Interface
+---------
+
+Delay accounting uses the taskstats interface which is described
+in detail in a separate document in this directory. Taskstats returns a
+generic data structure to userspace corresponding to per-pid and per-tgid
+statistics. The delay accounting functionality populates specific fields of
+this structure. See
+ include/linux/taskstats.h
+for a description of the fields pertaining to delay accounting.
+It will generally be in the form of counters returning the cumulative
+delay seen for cpu, sync block I/O, swapin etc.
+
+Taking the difference of two successive readings of a given
+counter (say cpu_delay_total) for a task will give the delay
+experienced by the task waiting for the corresponding resource
+in that interval.
+
+When a task exits, records containing the per-task and per-process statistics
+are sent to userspace without requiring a command. More details are given in
+the taskstats interface description.
+
+The getdelays.c userspace utility in this directory allows simple commands to
+be run and the corresponding delay statistics to be displayed. It also serves
+as an example of using the taskstats interface.
+
+Usage
+-----
+
+Compile the kernel with
+ CONFIG_TASK_DELAY_ACCT=y
+ CONFIG_TASKSTATS=y
+
+Enable the accounting at boot time by adding
+the following to the kernel boot options
+ delayacct
+
+and after the system has booted up, use a utility
+similar to getdelays.c to access the delays
+seen by a given task or a task group (tgid).
+The utility also allows a given command to be
+executed and the corresponding delays to be
+seen.
+
+General format of the getdelays command
+
+getdelays [-t tgid] [-p pid] [-c cmd...]
+
+
+Get delays, since system boot, for pid 10
+# ./getdelays -p 10
+(output similar to next case)
+
+Get sum of delays, since system boot, for all pids with tgid 5
+# ./getdelays -t 5
+
+
+CPU count real total virtual total delay total
+ 7876 92005750 100000000 24001500
+IO count delay total
+ 0 0
+MEM count delay total
+ 0 0
+
+Get delays seen in executing a given simple command
+# ./getdelays -c ls /
+
+bin data1 data3 data5 dev home media opt root srv sys usr
+boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
+
+
+CPU count real total virtual total delay total
+ 6 4000250 4000000 0
+IO count delay total
+ 0 0
+MEM count delay total
+ 0 0
+
+
+
+
+
+
Index: linux-2.6.17-rc1/Documentation/accounting/getdelays.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.17-rc1/Documentation/accounting/getdelays.c 2006-04-21 20:53:54.000000000 -0400
@@ -0,0 +1,376 @@
+/* getdelays.c
+ *
+ * Utility to get per-pid and per-tgid delay accounting statistics
+ * Also illustrates usage of the taskstats interface
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2005
+ * Copyright (C) Balbir Singh, IBM Corp. 2006
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <unistd.h>
+#include <poll.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <signal.h>
+
+#include <linux/genetlink.h>
+#include <linux/taskstats.h>
+
+/*
+ * Generic macros for dealing with netlink sockets. Might be duplicated
+ * elsewhere. It is recommended that commercial grade applications use
+ * libnl or libnetlink and use the interfaces provided by the library
+ */
+#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
+#define GENLMSG_PAYLOAD(glh) (NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN)
+#define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN))
+#define NLA_PAYLOAD(len) (len - NLA_HDRLEN)
+
+#define err(code, fmt, arg...) do { printf(fmt, ##arg); exit(code); } while (0)
+int done = 0;
+
+/*
+ * Create a raw netlink socket and bind
+ */
+static int create_nl_socket(int protocol, int groups)
+{
+ socklen_t addr_len;
+ int fd;
+ struct sockaddr_nl local;
+
+ fd = socket(AF_NETLINK, SOCK_RAW, protocol);
+ if (fd < 0)
+ return -1;
+
+ memset(&local, 0, sizeof(local));
+ local.nl_family = AF_NETLINK;
+ local.nl_groups = groups;
+
+ if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0)
+ goto error;
+
+ return fd;
+ error:
+ close(fd);
+ return -1;
+}
+
+int sendto_fd(int s, const char *buf, int bufLen)
+{
+ struct sockaddr_nl nladdr;
+ int r;
+
+ memset(&nladdr, 0, sizeof(nladdr));
+ nladdr.nl_family = AF_NETLINK;
+
+ while ((r = sendto(s, buf, bufLen, 0, (struct sockaddr *) &nladdr,
+ sizeof(nladdr))) < bufLen) {
+ if (r > 0) {
+ buf += r;
+ bufLen -= r;
+ } else if (errno != EAGAIN)
+ return -1;
+ }
+ return 0;
+}
+
+/*
+ * Probe the controller in genetlink to find the family id
+ * for the TASKSTATS family
+ */
+int get_family_id(int sd)
+{
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } family_req;
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } ans;
+
+ int id;
+ struct nlattr *na;
+ int rep_len;
+
+ /* Get family name */
+ family_req.n.nlmsg_type = GENL_ID_CTRL;
+ family_req.n.nlmsg_flags = NLM_F_REQUEST;
+ family_req.n.nlmsg_seq = 0;
+ family_req.n.nlmsg_pid = getpid();
+ family_req.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
+ family_req.g.cmd = CTRL_CMD_GETFAMILY;
+ family_req.g.version = 0x1;
+ na = (struct nlattr *) GENLMSG_DATA(&family_req);
+ na->nla_type = CTRL_ATTR_FAMILY_NAME;
+ na->nla_len = strlen(TASKSTATS_GENL_NAME) + 1 + NLA_HDRLEN;
+ strcpy(NLA_DATA(na), TASKSTATS_GENL_NAME);
+ family_req.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);
+
+ if (sendto_fd(sd, (char *) &family_req, family_req.n.nlmsg_len) < 0)
+ err(1, "error sending message via Netlink\n");
+
+ rep_len = recv(sd, &ans, sizeof(ans), 0);
+
+ if (rep_len < 0)
+ err(1, "error receiving reply message via Netlink\n");
+
+
+ /* Validate response message */
+ if (!NLMSG_OK((&ans.n), rep_len))
+ err(1, "invalid reply message received via Netlink\n");
+
+ if (ans.n.nlmsg_type == NLMSG_ERROR) { /* error */
+ printf("error received NACK - leaving\n");
+ exit(1);
+ }
+
+
+ na = (struct nlattr *) GENLMSG_DATA(&ans);
+ na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len));
+ if (na->nla_type == CTRL_ATTR_FAMILY_ID) {
+ id = *(__u16 *) NLA_DATA(na);
+ }
+ return id;
+}
+
+void print_taskstats(struct taskstats *t)
+{
+ printf("\n\nCPU %15s%15s%15s%15s\n"
+ " %15llu%15llu%15llu%15llu\n"
+ "IO %15s%15s\n"
+ " %15llu%15llu\n"
+ "MEM %15s%15s\n"
+ " %15llu%15llu\n\n",
+ "count", "real total", "virtual total", "delay total",
+ t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
+ t->cpu_delay_total,
+ "count", "delay total",
+ t->blkio_count, t->blkio_delay_total,
+ "count", "delay total", t->swapin_count, t->swapin_delay_total);
+}
+
+void sigchld(int sig)
+{
+ done = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ int rc;
+ int sk_nl;
+ struct nlmsghdr *nlh;
+ struct genlmsghdr *genlhdr;
+ char *buf;
+ struct taskstats_cmd_param *param;
+ __u16 id;
+ struct nlattr *na;
+
+ /* For receiving */
+ struct sockaddr_nl kern_nla, from_nla;
+ socklen_t from_nla_len;
+ int recv_len;
+ struct taskstats_reply *reply;
+
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } req;
+
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } ans;
+
+ int nl_sd = -1;
+ int rep_len;
+ int len = 0;
+ int aggr_len, len2;
+ struct sockaddr_nl nladdr;
+ pid_t tid = 0;
+ pid_t rtid = 0;
+ int cmd_type = TASKSTATS_TYPE_TGID;
+ int c, status;
+ int forking = 0;
+ struct sigaction act = {
+ .sa_handler = SIG_IGN,
+ .sa_mask = SA_NOMASK,
+ };
+ struct sigaction tact ;
+
+ if (argc < 3) {
+ printf("usage %s [-t tgid][-p pid][-c cmd]\n", argv[0]);
+ exit(-1);
+ }
+
+ tact.sa_handler = sigchld;
+ sigemptyset(&tact.sa_mask);
+ if (sigaction(SIGCHLD, &tact, NULL) < 0)
+ err(1, "sigaction failed for SIGCHLD\n");
+
+ while (1) {
+
+ c = getopt(argc, argv, "t:p:c:");
+ if (c < 0)
+ break;
+
+ switch (c) {
+ case 't':
+ tid = atoi(optarg);
+ if (!tid)
+ err(1, "Invalid tgid\n");
+ cmd_type = TASKSTATS_CMD_ATTR_TGID;
+ break;
+ case 'p':
+ tid = atoi(optarg);
+ if (!tid)
+ err(1, "Invalid pid\n");
+ cmd_type = TASKSTATS_CMD_ATTR_TGID;
+ break;
+ case 'c':
+ opterr = 0;
+ tid = fork();
+ if (tid < 0)
+ err(1, "fork failed\n");
+
+ if (tid == 0) { /* child process */
+ if (execvp(argv[optind - 1], &argv[optind - 1]) < 0) {
+ exit(-1);
+ }
+ }
+ forking = 1;
+ break;
+ default:
+ printf("usage %s [-t tgid][-p pid][-c cmd]\n", argv[0]);
+ exit(-1);
+ break;
+ }
+ if (c == 'c')
+ break;
+ }
+
+ /* Construct Netlink request message */
+
+ /* Send Netlink request message & get reply */
+
+ if ((nl_sd =
+ create_nl_socket(NETLINK_GENERIC, TASKSTATS_LISTEN_GROUP)) < 0)
+ err(1, "error creating Netlink socket\n");
+
+
+ id = get_family_id(nl_sd);
+
+ /* Send command needed */
+ req.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
+ req.n.nlmsg_type = id;
+ req.n.nlmsg_flags = NLM_F_REQUEST;
+ req.n.nlmsg_seq = 0;
+ req.n.nlmsg_pid = tid;
+ req.g.cmd = TASKSTATS_CMD_GET;
+ na = (struct nlattr *) GENLMSG_DATA(&req);
+ na->nla_type = cmd_type;
+ na->nla_len = sizeof(unsigned int) + NLA_HDRLEN;
+ *(__u32 *) NLA_DATA(na) = tid;
+ req.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);
+
+
+ if (!forking && sendto_fd(nl_sd, (char *) &req, req.n.nlmsg_len) < 0)
+ err(1, "error sending message via Netlink\n");
+
+ act.sa_handler = SIG_IGN;
+ sigemptyset(&act.sa_mask);
+ if (sigaction(SIGINT, &act, NULL) < 0)
+ err(1, "sigaction failed for SIGINT\n");
+
+ do {
+ int i;
+ struct pollfd pfd;
+ int pollres;
+
+ pfd.events = 0xffff & ~POLLOUT;
+ pfd.fd = nl_sd;
+ pollres = poll(&pfd, 1, 5000);
+ if (pollres < 0 || done) {
+ break;
+ }
+
+ rep_len = recv(nl_sd, &ans, sizeof(ans), 0);
+ nladdr.nl_family = AF_NETLINK;
+ nladdr.nl_groups = TASKSTATS_LISTEN_GROUP;
+
+ if (ans.n.nlmsg_type == NLMSG_ERROR) { /* error */
+ printf("error received NACK - leaving\n");
+ exit(1);
+ }
+
+ if (rep_len < 0) {
+ err(1, "error receiving reply message via Netlink\n");
+ break;
+ }
+
+ /* Validate response message */
+ if (!NLMSG_OK((&ans.n), rep_len))
+ err(1, "invalid reply message received via Netlink\n");
+
+ rep_len = GENLMSG_PAYLOAD(&ans.n);
+
+ na = (struct nlattr *) GENLMSG_DATA(&ans);
+ len = 0;
+ i = 0;
+ while (len < rep_len) {
+ len += NLA_ALIGN(na->nla_len);
+ switch (na->nla_type) {
+ case TASKSTATS_TYPE_AGGR_PID:
+ /* Fall through */
+ case TASKSTATS_TYPE_AGGR_TGID:
+ aggr_len = NLA_PAYLOAD(na->nla_len);
+ len2 = 0;
+ /* For nested attributes, na follows */
+ na = (struct nlattr *) NLA_DATA(na);
+ done = 0;
+ while (len2 < aggr_len) {
+ switch (na->nla_type) {
+ case TASKSTATS_TYPE_PID:
+ rtid = *(int *) NLA_DATA(na);
+ break;
+ case TASKSTATS_TYPE_TGID:
+ rtid = *(int *) NLA_DATA(na);
+ break;
+ case TASKSTATS_TYPE_STATS:
+ if (rtid == tid) {
+ print_taskstats((struct taskstats *)
+ NLA_DATA(na));
+ done = 1;
+ }
+ break;
+ }
+ len2 += NLA_ALIGN(na->nla_len);
+ na = (struct nlattr *) ((char *) na + len2);
+ if (done)
+ break;
+ }
+ }
+ na = (struct nlattr *) (GENLMSG_DATA(&ans) + len);
+ if (done)
+ break;
+ }
+ if (done)
+ break;
+ }
+ while (1);
+
+ close(nl_sd);
+ return 0;
+}
Index: linux-2.6.17-rc1/Documentation/accounting/taskstats.txt
===================================================================
--- linux-2.6.17-rc1.orig/Documentation/accounting/taskstats.txt 2006-04-21 20:29:22.000000000 -0400
+++ linux-2.6.17-rc1/Documentation/accounting/taskstats.txt 2006-04-21 20:50:22.000000000 -0400
@@ -39,6 +39,8 @@ belongs (the task does not need to be th
per-tgid stats to be sent for each exiting task is explained in the Advanced
Usage section below.
+getdelays.c is a simple utility demonstrating usage of the taskstats interface
+for reporting delay accounting statistics.
Interface
---------
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2006-05-02 6:23 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-02 6:20 [Patch 7/8] documentation Balbir Singh
-- strict thread matches above, loose matches on Subject: below --
2006-04-22 2:16 [Patch 0/8] per-task delay accounting Shailabh Nagar
2006-04-22 2:40 ` [Patch 7/8] documentation Shailabh Nagar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox