[PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor

* [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor
@ 2009-06-29 11:08 Paul Mackerras
  2009-06-29 11:13 ` [PATCH 2/2] perf_counter: tools: Reduce perf stat overhead Paul Mackerras
  2009-06-29 20:34 ` [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor Ingo Molnar
  0 siblings, 2 replies; 6+ messages in thread
From: Paul Mackerras @ 2009-06-29 11:08 UTC
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

At present, appending ":u" to an event sets the exclude_kernel bit,
and ":k" sets the exclude_user bit.  There is no way to set the
exclude_hv bit, which means that on systems with a hypervisor (e.g.
IBM pSeries systems), we get counts from hypervisor mode for an event
such as 0:1:u.

This fixes the problem by setting all three exclude bits when we see
the second ':' and then clearing the exclude bits corresponding to the
modes we want to count.  This also adds a ":h" modifier to allow the
user to ask for counts in hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 tools/perf/util/parse-events.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4d042f1..f2ffe2c 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -277,10 +277,15 @@ static int parse_event_symbols(const char *str, struct perf_counter_attr *attr)
 		sep = strchr(pstr, ':');
 		if (sep) {
 			pstr = sep + 1;
+			attr->exclude_user = 1;
+			attr->exclude_kernel = 1;
+			attr->exclude_hv = 1;
 			if (strchr(pstr, 'k'))
-				attr->exclude_user = 1;
+				attr->exclude_kernel = 0;
 			if (strchr(pstr, 'u'))
-				attr->exclude_kernel = 1;
+				attr->exclude_user = 0;
+			if (strchr(pstr, 'h'))
+				attr->exclude_hv = 0;
 		}
 		attr->type = type;
 		attr->config = id;
-- 
1.6.0.4
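
The set-all-then-clear logic in the hunk above can be tried outside of perf. The following is a minimal sketch, not the perf source: struct exclude_bits and apply_modifiers() are hypothetical names standing in for the three exclude bits of struct perf_counter_attr and for the modifier handling in parse_event_symbols().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the exclude bits of struct perf_counter_attr. */
struct exclude_bits {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
};

/*
 * Apply a modifier string such as "u", "k", "uk" or "h".
 * Every mode is excluded first; each mode named in the string is
 * then re-enabled by clearing its exclude bit.
 */
void apply_modifiers(const char *mods, struct exclude_bits *x)
{
	assert(mods != NULL && x != NULL);

	x->exclude_user = 1;
	x->exclude_kernel = 1;
	x->exclude_hv = 1;

	if (strchr(mods, 'k'))
		x->exclude_kernel = 0;
	if (strchr(mods, 'u'))
		x->exclude_user = 0;
	if (strchr(mods, 'h'))
		x->exclude_hv = 0;
}
```

With this ordering, "u" counts user mode only, and "kh" counts kernel and hypervisor modes. Note that in this sketch an empty modifier string leaves all three modes excluded.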



* [PATCH 2/2] perf_counter: tools: Reduce perf stat overhead
  2009-06-29 11:08 [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor Paul Mackerras
@ 2009-06-29 11:13 ` Paul Mackerras
  2009-06-29 20:51   ` [tip:perfcounters/urgent] perf_counter tools: Reduce perf stat measurement overhead/skew tip-bot for Paul Mackerras
  2009-06-29 20:34 ` [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor Ingo Molnar
  1 sibling, 1 reply; 6+ messages in thread
From: Paul Mackerras @ 2009-06-29 11:13 UTC
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

At present, perf stat creates its counters on the perf process.  Thus
the counters count the fork and various other activity in both the
parent and child, such as the resolver overhead for resolving PLT
entries for any libc functions that haven't been called before, such
as execvp.

This reduces the overhead by creating the counters on the child process
after the fork, using a couple of pipes to synchronize so that the
child process waits until the parent has created the counters before
doing the exec.  To eliminate the PLT resolution overhead on calling
execvp, this does a dummy execvp first which will always fail.

With this, the overhead of executing a program goes down from over
4800 instructions to about 90 instructions on powerpc (32-bit).
This was measured with a statically-linked program written in
assembler which only does the 3 instructions needed to call _exit(0).

Before:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

           4858  instructions

    0.001274523  seconds time elapsed

After:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

             92  instructions

    0.000468153  seconds time elapsed

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 tools/perf/builtin-stat.c |   64 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3e5ea4e..f0260ac 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -99,7 +99,7 @@ static u64			runtime_cycles_noise;
 #define ERR_PERF_OPEN \
 "Error: counter %d, sys_perf_counter_open() syscall returned with %d (%s)\n"
 
-static void create_perf_stat_counter(int counter)
+static void create_perf_stat_counter(int counter, int pid)
 {
 	struct perf_counter_attr *attr = attrs + counter;
 
@@ -119,7 +119,7 @@ static void create_perf_stat_counter(int counter)
 		attr->inherit	= inherit;
 		attr->disabled	= 1;
 
-		fd[0][counter] = sys_perf_counter_open(attr, 0, -1, -1, 0);
+		fd[0][counter] = sys_perf_counter_open(attr, pid, -1, -1, 0);
 		if (fd[0][counter] < 0 && verbose)
 			fprintf(stderr, ERR_PERF_OPEN, counter,
 				fd[0][counter], strerror(errno));
@@ -205,12 +205,58 @@ static int run_perf_stat(int argc, const char **argv)
 	int status = 0;
 	int counter;
 	int pid;
+	int child_ready_pipe[2], go_pipe[2];
+	char buf;
 
 	if (!system_wide)
 		nr_cpus = 1;
 
+	if (pipe(child_ready_pipe) < 0 || pipe(go_pipe) < 0) {
+		perror("failed to create pipes");
+		exit(1);
+	}
+
+	if ((pid = fork()) < 0)
+		perror("failed to fork");
+
+	if (!pid) {
+		close(child_ready_pipe[0]);
+		close(go_pipe[1]);
+		fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC);
+
+		/*
+		 * Do a dummy execvp to get the PLT entry resolved,
+		 * so we avoid the resolver overhead on the real
+		 * execvp call.
+		 */
+		execvp("", (char **)argv);
+
+		/*
+		 * Tell the parent we're ready to go
+		 */
+		close(child_ready_pipe[1]);
+
+		/*
+		 * Wait until the parent tells us to go.
+		 */
+		read(go_pipe[0], &buf, 1);
+
+		execvp(argv[0], (char **)argv);
+
+		perror(argv[0]);
+		exit(-1);
+	}
+
+	/*
+	 * Wait for the child to be ready to exec.
+	 */
+	close(child_ready_pipe[1]);
+	close(go_pipe[0]);
+	read(child_ready_pipe[0], &buf, 1);
+	close(child_ready_pipe[0]);
+
 	for (counter = 0; counter < nr_counters; counter++)
-		create_perf_stat_counter(counter);
+		create_perf_stat_counter(counter, pid);
 
 	/*
 	 * Enable counters and exec the command:
@@ -218,19 +264,9 @@ static int run_perf_stat(int argc, const char **argv)
 	t0 = rdclock();
 	prctl(PR_TASK_PERF_COUNTERS_ENABLE);
 
-	if ((pid = fork()) < 0)
-		perror("failed to fork");
-
-	if (!pid) {
-		if (execvp(argv[0], (char **)argv)) {
-			perror(argv[0]);
-			exit(-1);
-		}
-	}
-
+	close(go_pipe[1]);
 	wait(&status);
 
-	prctl(PR_TASK_PERF_COUNTERS_DISABLE);
 	t1 = rdclock();
 
 	walltime_nsecs[run_idx] = t1 - t0;
-- 
1.6.0.4
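
The two-pipe handshake in the patch above can be isolated from perf. The following is a minimal sketch under stated assumptions, not the perf code: run_after_setup() and its setup callback are hypothetical names, with the callback standing in for the counter creation that the parent does while the child is parked just before its exec. Unlike the patch, the sketch returns an error when pipe() or fork() fails.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, run setup(pid) in the parent while the child waits, then
 * release the child to exec argv. Returns the wait status, or -1 on
 * failure. setup may be NULL.
 */
int run_after_setup(char *const argv[], void (*setup)(pid_t))
{
	int child_ready_pipe[2], go_pipe[2];
	int status = 0;
	char buf;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	if (pipe(child_ready_pipe) < 0)
		return -1;
	if (pipe(go_pipe) < 0) {
		close(child_ready_pipe[0]);
		close(child_ready_pipe[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(child_ready_pipe[0]);
		close(child_ready_pipe[1]);
		close(go_pipe[0]);
		close(go_pipe[1]);
		return -1;
	}

	if (pid == 0) {
		close(child_ready_pipe[0]);
		close(go_pipe[1]);
		/* Keep the read end out of the exec'd program. */
		fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC);

		/* Dummy exec that always fails; resolves execvp's PLT entry early. */
		execvp("", argv);

		/* Closing our write end gives the parent EOF: "ready". */
		close(child_ready_pipe[1]);

		/* Block until the parent closes its write end: "go". */
		if (read(go_pipe[0], &buf, 1) < 0)
			_exit(127);

		execvp(argv[0], argv);
		perror(argv[0]);
		_exit(127);
	}

	close(child_ready_pipe[1]);
	close(go_pipe[0]);
	/* Returns 0 (EOF) once the child has closed its write end. */
	if (read(child_ready_pipe[0], &buf, 1) < 0)
		perror("read");
	close(child_ready_pipe[0]);

	if (setup)
		setup(pid);

	close(go_pipe[1]);	/* release the child */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return status;
}
```

No byte is ever written to either pipe: each side signals by closing its write end, which the other side observes as end-of-file on read().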



* Re: [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor
  2009-06-29 11:08 [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor Paul Mackerras
  2009-06-29 11:13 ` [PATCH 2/2] perf_counter: tools: Reduce perf stat overhead Paul Mackerras
@ 2009-06-29 20:34 ` Ingo Molnar
  2009-06-30 11:35   ` Paul Mackerras
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2009-06-29 20:34 UTC
  To: Paul Mackerras; +Cc: Peter Zijlstra, linux-kernel


* Paul Mackerras <paulus@samba.org> wrote:

> At present, appending ":u" to an event sets the exclude_kernel 
> bit, and ":k" sets the exclude_user bit.  There is no way to set 
> the exclude_hv bit, which means that on systems with a hypervisor 
> (e.g. IBM pSeries systems), we get counts from hypervisor mode for 
> an event such as 0:1:u.
> 
> This fixes the problem by setting all three exclude bits when we 
> see the second ':' and then clearing the exclude bits corresponding 
> to the modes we want to count.  This also adds a ":h" modifier to 
> allow the user to ask for counts in hypervisor mode.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
>  tools/perf/util/parse-events.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 4d042f1..f2ffe2c 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -277,10 +277,15 @@ static int parse_event_symbols(const char *str, struct perf_counter_attr *attr)
>  		sep = strchr(pstr, ':');
>  		if (sep) {
>  			pstr = sep + 1;
> +			attr->exclude_user = 1;
> +			attr->exclude_kernel = 1;
> +			attr->exclude_hv = 1;
>  			if (strchr(pstr, 'k'))
> -				attr->exclude_user = 1;
> +				attr->exclude_kernel = 0;
>  			if (strchr(pstr, 'u'))
> -				attr->exclude_kernel = 1;
> +				attr->exclude_user = 0;
> +			if (strchr(pstr, 'h'))
> +				attr->exclude_hv = 0;
>  		}

Hm, mind fixing the full range of problems with these flags please?

One problem is that things like:

	--event cycles:u

don't work as expected - the u/k/h flags work only with numeric events,
which is a pity. Also, it would be nice to have a 'general' option
to specify the context mask for all events, in some straightforward
format like this:

	--event-mask +u+k-h

Things like that. This bit is really not well developed right now.

	Ingo
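
The "+u+k-h" format suggested above is only a proposal in this thread, and nothing here shows that such an option exists in perf. The following sketch illustrates how a mask of that shape could be parsed, under assumed semantics: '+' means count that mode, '-' means exclude it, and modes not mentioned are left unchanged. struct context_mask and parse_context_mask() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for the three exclude bits. */
struct context_mask {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
};

/*
 * Parse a mask such as "+u+k-h" (assumed semantics, see above).
 * Returns 0 on success, or -1 on malformed input, in which case
 * *m is left unmodified.
 */
int parse_context_mask(const char *s, struct context_mask *m)
{
	struct context_mask tmp;

	assert(s != NULL && m != NULL);

	tmp = *m;
	if (*s == '\0')
		return -1;

	while (*s) {
		unsigned int exclude;

		if (*s == '+')
			exclude = 0;
		else if (*s == '-')
			exclude = 1;
		else
			return -1;
		s++;

		switch (*s) {
		case 'u': tmp.exclude_user = exclude; break;
		case 'k': tmp.exclude_kernel = exclude; break;
		case 'h': tmp.exclude_hv = exclude; break;
		default:  return -1;
		}
		s++;
	}

	*m = tmp;
	return 0;
}
```

Working on a copy and committing it only at the end means a malformed mask cannot leave the bits half-updated.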


* [tip:perfcounters/urgent] perf_counter tools: Reduce perf stat measurement overhead/skew
  2009-06-29 11:13 ` [PATCH 2/2] perf_counter: tools: Reduce perf stat overhead Paul Mackerras
@ 2009-06-29 20:51   ` tip-bot for Paul Mackerras
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Paul Mackerras @ 2009-06-29 20:51 UTC
  To: linux-tip-commits
  Cc: linux-kernel, vince, paulus, hpa, mingo, a.p.zijlstra, tglx,
	mingo

Commit-ID:  051ae7f7344f453616b6b10332d4d8e1d40ed823
Gitweb:     http://git.kernel.org/tip/051ae7f7344f453616b6b10332d4d8e1d40ed823
Author:     Paul Mackerras <paulus@samba.org>
AuthorDate: Mon, 29 Jun 2009 21:13:21 +1000
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 29 Jun 2009 22:38:09 +0200

perf_counter tools: Reduce perf stat measurement overhead/skew

Vince Weaver reported a 'perf stat' measurement overhead in the
count of retired instructions, which can inflate the reported
count by around 6000 instructions.

At present, perf stat creates its counters on the perf process.  Thus
the counters count the fork and various other activity in both the
parent and child, such as the resolver overhead for resolving PLT
entries for any libc functions that haven't been called before, such
as execvp.

This reduces the overhead by creating the counters on the child process
after the fork, using a couple of pipes to synchronize so that the
child process waits until the parent has created the counters before
doing the exec.  To eliminate the PLT resolution overhead on calling
execvp, this does a dummy execvp first which will always fail.

With this, the overhead of executing a program goes down from over
4800 instructions to about 90 instructions on powerpc (32-bit).
This was measured with a statically-linked program written in
assembler which only does the 3 instructions needed to call _exit(0).

Before:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

           4858  instructions

    0.001274523  seconds time elapsed

After:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

             92  instructions

    0.000468153  seconds time elapsed

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 tools/perf/builtin-stat.c |   64 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c5a2907..201ef23 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -99,7 +99,7 @@ static u64			runtime_cycles_noise;
 #define ERR_PERF_OPEN \
 "Error: counter %d, sys_perf_counter_open() syscall returned with %d (%s)\n"
 
-static void create_perf_stat_counter(int counter)
+static void create_perf_stat_counter(int counter, int pid)
 {
 	struct perf_counter_attr *attr = attrs + counter;
 
@@ -119,7 +119,7 @@ static void create_perf_stat_counter(int counter)
 		attr->inherit	= inherit;
 		attr->disabled	= 1;
 
-		fd[0][counter] = sys_perf_counter_open(attr, 0, -1, -1, 0);
+		fd[0][counter] = sys_perf_counter_open(attr, pid, -1, -1, 0);
 		if (fd[0][counter] < 0 && verbose)
 			fprintf(stderr, ERR_PERF_OPEN, counter,
 				fd[0][counter], strerror(errno));
@@ -205,12 +205,58 @@ static int run_perf_stat(int argc, const char **argv)
 	int status = 0;
 	int counter;
 	int pid;
+	int child_ready_pipe[2], go_pipe[2];
+	char buf;
 
 	if (!system_wide)
 		nr_cpus = 1;
 
+	if (pipe(child_ready_pipe) < 0 || pipe(go_pipe) < 0) {
+		perror("failed to create pipes");
+		exit(1);
+	}
+
+	if ((pid = fork()) < 0)
+		perror("failed to fork");
+
+	if (!pid) {
+		close(child_ready_pipe[0]);
+		close(go_pipe[1]);
+		fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC);
+
+		/*
+		 * Do a dummy execvp to get the PLT entry resolved,
+		 * so we avoid the resolver overhead on the real
+		 * execvp call.
+		 */
+		execvp("", (char **)argv);
+
+		/*
+		 * Tell the parent we're ready to go
+		 */
+		close(child_ready_pipe[1]);
+
+		/*
+		 * Wait until the parent tells us to go.
+		 */
+		read(go_pipe[0], &buf, 1);
+
+		execvp(argv[0], (char **)argv);
+
+		perror(argv[0]);
+		exit(-1);
+	}
+
+	/*
+	 * Wait for the child to be ready to exec.
+	 */
+	close(child_ready_pipe[1]);
+	close(go_pipe[0]);
+	read(child_ready_pipe[0], &buf, 1);
+	close(child_ready_pipe[0]);
+
 	for (counter = 0; counter < nr_counters; counter++)
-		create_perf_stat_counter(counter);
+		create_perf_stat_counter(counter, pid);
 
 	/*
 	 * Enable counters and exec the command:
@@ -218,19 +264,9 @@ static int run_perf_stat(int argc, const char **argv)
 	t0 = rdclock();
 	prctl(PR_TASK_PERF_COUNTERS_ENABLE);
 
-	if ((pid = fork()) < 0)
-		perror("failed to fork");
-
-	if (!pid) {
-		if (execvp(argv[0], (char **)argv)) {
-			perror(argv[0]);
-			exit(-1);
-		}
-	}
-
+	close(go_pipe[1]);
 	wait(&status);
 
-	prctl(PR_TASK_PERF_COUNTERS_DISABLE);
 	t1 = rdclock();
 
 	walltime_nsecs[run_idx] = t1 - t0;


* Re: [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor
  2009-06-29 20:34 ` [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor Ingo Molnar
@ 2009-06-30 11:35   ` Paul Mackerras
  2009-06-30 11:57     ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Mackerras @ 2009-06-30 11:35 UTC
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel

Ingo Molnar writes:

> Hm, mind fixing the full range of problems with these flags please?

Sure, looking at it now.

One thing I'd like to do is add complete lists of hardware events for
each processor so that perf can tell you the full set of things you
can measure, and can let you ask for them without having to know raw
event codes.  I know how to work out which processor we're running on
for powerpc; for x86 I assume cpuid or something similar is usable
from userspace, is it?

Paul.


* Re: [PATCH 1/2] perf_counter: tools: Make :u and :k exclude hypervisor
  2009-06-30 11:35   ` Paul Mackerras
@ 2009-06-30 11:57     ` Ingo Molnar
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2009-06-30 11:57 UTC
  To: Paul Mackerras; +Cc: Peter Zijlstra, linux-kernel


* Paul Mackerras <paulus@samba.org> wrote:

> Ingo Molnar writes:
> 
> > Hm, mind fixing the full range of problems with these flags 
> > please?
> 
> Sure, looking at it now.
> 
> One thing I'd like to do is add complete lists of hardware events 
> for each processor so that perf can tell you the full set of 
> things you can measure, and can let you ask for them without 
> having to know raw event codes. [...]

Excellent.

> [...] I know how to work out which processor we're running on for 
> powerpc; for x86 I assume cpuid or something similar is usable 
> from userspace, is it?

Yeah. /proc/cpuinfo can be read as well.

	Ingo
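
Reading /proc/cpuinfo from userspace, as mentioned above, can be sketched as follows. This is an illustration only: cpuinfo_match() and cpuinfo_lookup() are hypothetical helper names, and the set of field names varies by architecture ("vendor_id", "cpu family", "model" and "model name" are typical on x86).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * If line has the form "<key><blanks>: <value>\n", copy <value> into
 * out (truncating to fit) and return 1; otherwise return 0.
 */
int cpuinfo_match(const char *line, const char *key, char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p;
	size_t n;

	assert(outlen > 0);

	if (strncmp(line, key, klen) != 0)
		return 0;
	p = line + klen;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p != ':')
		return 0;	/* key was only a prefix of a longer field name */
	p++;
	while (*p == ' ' || *p == '\t')
		p++;

	n = strcspn(p, "\n");
	if (n >= outlen)
		n = outlen - 1;
	memcpy(out, p, n);
	out[n] = '\0';
	return 1;
}

/* Find the first occurrence of key in /proc/cpuinfo. Returns 1 if found. */
int cpuinfo_lookup(const char *key, char *out, size_t outlen)
{
	char line[512];
	FILE *f = fopen("/proc/cpuinfo", "r");
	int found = 0;

	if (!f)
		return 0;
	while (!found && fgets(line, sizeof(line), f))
		found = cpuinfo_match(line, key, out, outlen);
	fclose(f);
	return found;
}
```

Requiring a ':' right after the key and any blanks keeps a lookup for "model" from matching the "model name" line.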

