public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Vince Weaver <vince@deater.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Mackerras <paulus@samba.org>,
	linux-kernel@vger.kernel.org, Mike Galbraith <efault@gmx.de>
Subject: [patch] perf_counter: Add enable-on-exec attribute
Date: Tue, 30 Jun 2009 01:46:01 +0200	[thread overview]
Message-ID: <20090629234601.GA5869@elte.hu> (raw)
In-Reply-To: <Pine.LNX.4.64.0906291354380.1404@pianoman.cluster.toy>


* Vince Weaver <vince@deater.net> wrote:

>> If the 5 thousand cycles measurement overhead _still_ matters to 
>> you under such circumstances then by all means please submit the 
>> patches to improve it. Despite your claims this is totally 
>> fixable with the current perfcounters design, Peter outlined the 
>> steps of how to solve it, you can utilize ptrace if you want to.
>
> Is it really "totally" fixible?  I don't just mean getting the 
> overhead from ~3000 down to ~100, I mean down to zero.

Yes, it's truly very easy to get exactly the same output as pfmon, 
for the 'million.s' test app you posted:

  titan:~> perf stat -e 0:1:u ./million

   Performance counter stats for './million':

          1000001  instructions            

      0.000489736  seconds time elapsed

See the small patch below.

( Note that this approach does not use ptrace, hence it can be used
  to measure debuggers too. ptrace attach has the limitation of
  being exclusive - no task can be attached to twice. perfmon used
  ptrace attach, which limited its capabilities unreasonably. )

The question was really not whether we can do it - but whether we 
want to do it. I have no strong feelings either way - because as i 
told you in my first mail, all the other noise sources in the system 
dominate the metrics far more than this very small constant startup 
offset.

And the thing is, as a perfmon contributor i assume you have 
experience in these matters. Had you taken a serious, unbiased look 
at perfcounters, and had this problem truly bothered you personally, 
you could have come up with a similar patch yourself as well, while 
only spending a fraction of the energies you are putting into these 
emails. Instead you ignored our technical arguments, you refused to 
touch the code and you went on rambling against how perfcounters 
supposedly cannot solve this problem. Not very productive IMO.

	Ingo

---------------->
Subject: perf_counter: Add enable-on-exec attribute
From: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 29 22:05:11 CEST 2009

Add another attribute variant: attr.enable_on_exec.

The purpose is to allow the auto-enabling of such counters
on exec(), to measure exec()-ed workloads precisely, from
the first to the last instruction.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/exec.c                    |    3 +--
 include/linux/perf_counter.h |    5 ++++-
 kernel/perf_counter.c        |   39 ++++++++++++++++++++++++++++++++++++---
 tools/perf/builtin-stat.c    |    5 +++--
 4 files changed, 44 insertions(+), 8 deletions(-)

Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -996,8 +996,7 @@ int flush_old_exec(struct linux_binprm *
 	 * Flush performance counters when crossing a
 	 * security domain:
 	 */
-	if (!get_dumpable(current->mm))
-		perf_counter_exit_task(current);
+	perf_counter_exec(current);
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
Index: linux/include/linux/perf_counter.h
===================================================================
--- linux.orig/include/linux/perf_counter.h
+++ linux/include/linux/perf_counter.h
@@ -179,8 +179,9 @@ struct perf_counter_attr {
 				comm	       :  1, /* include comm data     */
 				freq           :  1, /* use freq, not period  */
 				inherit_stat   :  1, /* per task counts       */
+				enable_on_exec :  1, /* enable on exec        */
 
-				__reserved_1   : 52;
+				__reserved_1   : 51;
 
 	__u32			wakeup_events;	/* wakeup every n events */
 	__u32			__reserved_2;
@@ -712,6 +713,7 @@ static inline void perf_counter_mmap(str
 
 extern void perf_counter_comm(struct task_struct *tsk);
 extern void perf_counter_fork(struct task_struct *tsk);
+extern void perf_counter_exec(struct task_struct *tsk);
 
 extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
 
@@ -752,6 +754,7 @@ perf_swcounter_event(u32 event, u64 nr, 
 static inline void perf_counter_mmap(struct vm_area_struct *vma)	{ }
 static inline void perf_counter_comm(struct task_struct *tsk)		{ }
 static inline void perf_counter_fork(struct task_struct *tsk)		{ }
+static inline void perf_counter_exec(struct task_struct *tsk)		{ }
 static inline void perf_counter_init(void)				{ }
 #endif
 
Index: linux/kernel/perf_counter.c
===================================================================
--- linux.orig/kernel/perf_counter.c
+++ linux/kernel/perf_counter.c
@@ -903,6 +903,9 @@ static void perf_counter_enable(struct p
 	struct perf_counter_context *ctx = counter->ctx;
 	struct task_struct *task = ctx->task;
 
+	if (counter->attr.enable_on_exec)
+		return;
+
 	if (!task) {
 		/*
 		 * Enable the counter on the cpu that it's on
@@ -2856,6 +2859,32 @@ void perf_counter_fork(struct task_struc
 	perf_counter_fork_event(&fork_event);
 }
 
+void perf_counter_exec(struct task_struct *task)
+{
+	struct perf_counter_context *ctx;
+	struct perf_counter *counter;
+
+	if (!get_dumpable(task->mm)) {
+		perf_counter_exit_task(task);
+		return;
+	}
+
+	if (!task->perf_counter_ctxp)
+		return;
+
+	rcu_read_lock();
+	ctx = task->perf_counter_ctxp;
+	if (ctx) {
+		list_for_each_entry(counter, &ctx->counter_list, list_entry) {
+			if (counter->attr.enable_on_exec) {
+				counter->attr.enable_on_exec = 0;
+				__perf_counter_enable(counter);
+			}
+		}
+	}
+	rcu_read_unlock();
+}
+
 /*
  * comm tracking
  */
@@ -4064,10 +4093,14 @@ inherit_counter(struct perf_counter *par
 	 * not its attr.disabled bit.  We hold the parent's mutex,
 	 * so we won't race with perf_counter_{en, dis}able_family.
 	 */
-	if (parent_counter->state >= PERF_COUNTER_STATE_INACTIVE)
-		child_counter->state = PERF_COUNTER_STATE_INACTIVE;
-	else
+	if (parent_counter->state >= PERF_COUNTER_STATE_INACTIVE) {
+		if (child_counter->attr.enable_on_exec)
+			child_counter->state = PERF_COUNTER_STATE_OFF;
+		else
+			child_counter->state = PERF_COUNTER_STATE_INACTIVE;
+	} else {
 		child_counter->state = PERF_COUNTER_STATE_OFF;
+	}
 
 	if (parent_counter->attr.freq)
 		child_counter->hw.sample_period = parent_counter->hw.sample_period;
Index: linux/tools/perf/builtin-stat.c
===================================================================
--- linux.orig/tools/perf/builtin-stat.c
+++ linux/tools/perf/builtin-stat.c
@@ -116,8 +116,9 @@ static void create_perf_stat_counter(int
 					fd[cpu][counter], strerror(errno));
 		}
 	} else {
-		attr->inherit	= inherit;
-		attr->disabled	= 1;
+		attr->inherit		= inherit;
+		attr->disabled		= 1;
+		attr->enable_on_exec	= 1;
 
 		fd[0][counter] = sys_perf_counter_open(attr, pid, -1, -1, 0);
 		if (fd[0][counter] < 0 && verbose)


  parent reply	other threads:[~2009-06-29 23:46 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-24 13:59 performance counter 20% error finding retired instruction count Vince Weaver
2009-06-24 15:10 ` Ingo Molnar
2009-06-25  2:12   ` Vince Weaver
2009-06-25  6:50     ` Peter Zijlstra
2009-06-25  9:13       ` Ingo Molnar
2009-06-26 18:22   ` Vince Weaver
2009-06-26 19:12     ` Peter Zijlstra
2009-06-27  5:32       ` Ingo Molnar
2009-06-26 19:23     ` Vince Weaver
2009-06-27  6:04       ` performance counter ~0.4% " Ingo Molnar
2009-06-27  6:44         ` [numbers] perfmon/pfmon overhead of 17%-94% Ingo Molnar
2009-06-29 18:25           ` Vince Weaver
2009-06-29 21:02             ` Ingo Molnar
2009-07-02 21:07               ` Vince Weaver
2009-07-03  7:58                 ` Ingo Molnar
2009-07-03 21:43                   ` Vince Weaver
2009-07-03 18:31                 ` Andi Kleen
2009-07-03 21:25                   ` Vince Weaver
2009-07-03 23:40                     ` Andi Kleen
2009-06-29 23:46             ` Ingo Molnar [this message]
2009-06-29 23:55             ` Ingo Molnar
2009-06-30  0:05             ` Ingo Molnar
2009-06-27  6:48         ` performance counter ~0.4% error finding retired instruction count Paul Mackerras
2009-06-27 17:28           ` Ingo Molnar
2009-06-29  2:12             ` Paul Mackerras
2009-06-29  2:13               ` Paul Mackerras
2009-06-29  3:48               ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090629234601.GA5869@elte.hu \
    --to=mingo@elte.hu \
    --cc=a.p.zijlstra@chello.nl \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=vince@deater.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox