From: Ingo Molnar <mingo@elte.hu>
To: Vince Weaver <vince@deater.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Paul Mackerras <paulus@samba.org>,
linux-kernel@vger.kernel.org, Mike Galbraith <efault@gmx.de>
Subject: [patch] perf_counter: Add enable-on-exec attribute
Date: Tue, 30 Jun 2009 01:46:01 +0200 [thread overview]
Message-ID: <20090629234601.GA5869@elte.hu> (raw)
In-Reply-To: <Pine.LNX.4.64.0906291354380.1404@pianoman.cluster.toy>
* Vince Weaver <vince@deater.net> wrote:
>> If the 5 thousand cycles measurement overhead _still_ matters to
>> you under such circumstances then by all means please submit the
>> patches to improve it. Despite your claims this is totally
>> fixable with the current perfcounters design, Peter outlined the
>> steps of how to solve it, you can utilize ptrace if you want to.
>
> Is it really "totally" fixible? I don't just mean getting the
> overhead from ~3000 down to ~100, I mean down to zero.
Yes, it's truly very easy to get exactly the same output as pfmon,
for the 'million.s' test app you posted:
titan:~> perf stat -e 0:1:u ./million
Performance counter stats for './million':
1000001 instructions
0.000489736 seconds time elapsed
See the small patch below.
( Note that this approach does not use ptrace, hence it can be used
to measure debuggers too. ptrace attach has the limitation of
being exclusive - no task can be attached to twice. perfmon used
ptrace attach, which limited its capabilities unreasonably. )
The question was really not whether we can do it - but whether we
want to do it. I have no strong feelings either way - because as i
told you in my first mail, all the other noise sources in the system
dominate the metrics far more than this very small constant startup
offset.
And the thing is, as a perfmon contributor i assume you have
experience in these matters. Had you taken a serious, unbiased look
at perfcounters, and had this problem truly bothered you personally,
you could have come up with a similar patch yourself as well, while
only spending a fraction of the energies you are putting into these
emails. Instead you ignored our technical arguments, you refused to
touch the code and you went on rambling against how perfcounters
supposedly cannot solve this problem. Not very productive IMO.
Ingo
---------------->
Subject: perf_counter: Add enable-on-exec attribute
From: Ingo Molnar <mingo@elte.hu>
Date: Mon Jun 29 22:05:11 CEST 2009
Add another attribute variant: attr.enable_on_exec.
The purpose is to allow the auto-enabling of such counters
on exec(), to measure exec()-ed workloads precisely, from
the first to the last instruction.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
fs/exec.c | 3 +--
include/linux/perf_counter.h | 5 ++++-
kernel/perf_counter.c | 39 ++++++++++++++++++++++++++++++++++++---
tools/perf/builtin-stat.c | 5 +++--
4 files changed, 44 insertions(+), 8 deletions(-)
Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -996,8 +996,7 @@ int flush_old_exec(struct linux_binprm *
* Flush performance counters when crossing a
* security domain:
*/
- if (!get_dumpable(current->mm))
- perf_counter_exit_task(current);
+ perf_counter_exec(current);
/* An exec changes our domain. We are no longer part of the thread
group */
Index: linux/include/linux/perf_counter.h
===================================================================
--- linux.orig/include/linux/perf_counter.h
+++ linux/include/linux/perf_counter.h
@@ -179,8 +179,9 @@ struct perf_counter_attr {
comm : 1, /* include comm data */
freq : 1, /* use freq, not period */
inherit_stat : 1, /* per task counts */
+ enable_on_exec : 1, /* enable on exec */
- __reserved_1 : 52;
+ __reserved_1 : 51;
__u32 wakeup_events; /* wakeup every n events */
__u32 __reserved_2;
@@ -712,6 +713,7 @@ static inline void perf_counter_mmap(str
extern void perf_counter_comm(struct task_struct *tsk);
extern void perf_counter_fork(struct task_struct *tsk);
+extern void perf_counter_exec(struct task_struct *tsk);
extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
@@ -752,6 +754,7 @@ perf_swcounter_event(u32 event, u64 nr,
static inline void perf_counter_mmap(struct vm_area_struct *vma) { }
static inline void perf_counter_comm(struct task_struct *tsk) { }
static inline void perf_counter_fork(struct task_struct *tsk) { }
+static inline void perf_counter_exec(struct task_struct *tsk) { }
static inline void perf_counter_init(void) { }
#endif
Index: linux/kernel/perf_counter.c
===================================================================
--- linux.orig/kernel/perf_counter.c
+++ linux/kernel/perf_counter.c
@@ -903,6 +903,9 @@ static void perf_counter_enable(struct p
struct perf_counter_context *ctx = counter->ctx;
struct task_struct *task = ctx->task;
+ if (counter->attr.enable_on_exec)
+ return;
+
if (!task) {
/*
* Enable the counter on the cpu that it's on
@@ -2856,6 +2859,32 @@ void perf_counter_fork(struct task_struc
perf_counter_fork_event(&fork_event);
}
+void perf_counter_exec(struct task_struct *task)
+{
+ struct perf_counter_context *ctx;
+ struct perf_counter *counter;
+
+ if (!get_dumpable(task->mm)) {
+ perf_counter_exit_task(task);
+ return;
+ }
+
+ if (!task->perf_counter_ctxp)
+ return;
+
+ rcu_read_lock();
+ ctx = task->perf_counter_ctxp;
+ if (ctx) {
+ list_for_each_entry(counter, &ctx->counter_list, list_entry) {
+ if (counter->attr.enable_on_exec) {
+ counter->attr.enable_on_exec = 0;
+ __perf_counter_enable(counter);
+ }
+ }
+ }
+ rcu_read_unlock();
+}
+
/*
* comm tracking
*/
@@ -4064,10 +4093,14 @@ inherit_counter(struct perf_counter *par
* not its attr.disabled bit. We hold the parent's mutex,
* so we won't race with perf_counter_{en, dis}able_family.
*/
- if (parent_counter->state >= PERF_COUNTER_STATE_INACTIVE)
- child_counter->state = PERF_COUNTER_STATE_INACTIVE;
- else
+ if (parent_counter->state >= PERF_COUNTER_STATE_INACTIVE) {
+ if (child_counter->attr.enable_on_exec)
+ child_counter->state = PERF_COUNTER_STATE_OFF;
+ else
+ child_counter->state = PERF_COUNTER_STATE_INACTIVE;
+ } else {
child_counter->state = PERF_COUNTER_STATE_OFF;
+ }
if (parent_counter->attr.freq)
child_counter->hw.sample_period = parent_counter->hw.sample_period;
Index: linux/tools/perf/builtin-stat.c
===================================================================
--- linux.orig/tools/perf/builtin-stat.c
+++ linux/tools/perf/builtin-stat.c
@@ -116,8 +116,9 @@ static void create_perf_stat_counter(int
fd[cpu][counter], strerror(errno));
}
} else {
- attr->inherit = inherit;
- attr->disabled = 1;
+ attr->inherit = inherit;
+ attr->disabled = 1;
+ attr->enable_on_exec = 1;
fd[0][counter] = sys_perf_counter_open(attr, pid, -1, -1, 0);
if (fd[0][counter] < 0 && verbose)
next prev parent reply other threads:[~2009-06-29 23:46 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-24 13:59 performance counter 20% error finding retired instruction count Vince Weaver
2009-06-24 15:10 ` Ingo Molnar
2009-06-25 2:12 ` Vince Weaver
2009-06-25 6:50 ` Peter Zijlstra
2009-06-25 9:13 ` Ingo Molnar
2009-06-26 18:22 ` Vince Weaver
2009-06-26 19:12 ` Peter Zijlstra
2009-06-27 5:32 ` Ingo Molnar
2009-06-26 19:23 ` Vince Weaver
2009-06-27 6:04 ` performance counter ~0.4% " Ingo Molnar
2009-06-27 6:44 ` [numbers] perfmon/pfmon overhead of 17%-94% Ingo Molnar
2009-06-29 18:25 ` Vince Weaver
2009-06-29 21:02 ` Ingo Molnar
2009-07-02 21:07 ` Vince Weaver
2009-07-03 7:58 ` Ingo Molnar
2009-07-03 21:43 ` Vince Weaver
2009-07-03 18:31 ` Andi Kleen
2009-07-03 21:25 ` Vince Weaver
2009-07-03 23:40 ` Andi Kleen
2009-06-29 23:46 ` Ingo Molnar [this message]
2009-06-29 23:55 ` Ingo Molnar
2009-06-30 0:05 ` Ingo Molnar
2009-06-27 6:48 ` performance counter ~0.4% error finding retired instruction count Paul Mackerras
2009-06-27 17:28 ` Ingo Molnar
2009-06-29 2:12 ` Paul Mackerras
2009-06-29 2:13 ` Paul Mackerras
2009-06-29 3:48 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090629234601.GA5869@elte.hu \
--to=mingo@elte.hu \
--cc=a.p.zijlstra@chello.nl \
--cc=efault@gmx.de \
--cc=linux-kernel@vger.kernel.org \
--cc=paulus@samba.org \
--cc=vince@deater.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox