* [PATCH 0/5] perf events finer grained context instrumentation / context exclusion
@ 2010-06-10 3:49 Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 1/5] perf: Provide a proper stop action for software events Frederic Weisbecker
` (5 more replies)
0 siblings, 6 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 3:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras, Stephane Eranian,
Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
Here is the new version of per context exclusion, based on hooks
on irq_enter/irq_exit. I haven't observed slowdowns but I haven't
actually measured the impact.
It's pullable from:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
perf/exclusion-3
It's against latest tip:perf/core.
Thanks.
Frederic Weisbecker (5):
perf: Provide a proper stop action for software events
perf: Support disable() after stop() on software events
perf: New PERF_EVENT_STATE_PAUSED event state
perf: Introduce task, softirq and hardirq contexts exclusion
perf: Support for task/softirq/hardirq exclusion on tools
arch/x86/kernel/cpu/perf_event.c | 6 +-
include/linux/perf_event.h | 44 +++++++-
kernel/perf_event.c | 232 ++++++++++++++++++++++++++++++++------
kernel/softirq.c | 6 +
kernel/trace/trace_event_perf.c | 2 +-
tools/perf/util/parse-events.c | 37 +++++--
6 files changed, 276 insertions(+), 51 deletions(-)
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 3:49 [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Frederic Weisbecker
@ 2010-06-10 3:49 ` Frederic Weisbecker
2010-06-10 10:46 ` Peter Zijlstra
2010-06-10 3:49 ` [PATCH 2/5] perf: Support disable() after stop() on " Frederic Weisbecker
` (4 subsequent siblings)
5 siblings, 1 reply; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 3:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras, Stephane Eranian,
Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
In order to introduce the new context exclusions, software events will
eventually need to be stopped on demand. We want perf_event_stop() to
act on every event.
To achieve this, remove the stub stop/start pmu callbacks from software
and tracepoint events.
This may even optimize the case of hardware and software events
running at the same time: we now only stop/start all hardware
events when we reset a hardware event period, no longer for
software events.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
kernel/perf_event.c | 29 ++++++++++++++++-------------
1 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index c772a3d..5c004f7 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -1541,11 +1541,23 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
hwc->sample_period = sample_period;
if (local64_read(&hwc->period_left) > 8*sample_period) {
- perf_disable();
- perf_event_stop(event);
+ bool software_event = is_software_event(event);
+
+ /*
+ * Only hardware events need their irq period to be
+ * reprogrammed
+ */
+ if (!software_event) {
+ perf_disable();
+ perf_event_stop(event);
+ }
+
local64_set(&hwc->period_left, 0);
- perf_event_start(event);
- perf_enable();
+
+ if (!software_event) {
+ perf_event_start(event);
+ perf_enable();
+ }
}
}
@@ -4286,16 +4298,9 @@ static void perf_swevent_void(struct perf_event *event)
{
}
-static int perf_swevent_int(struct perf_event *event)
-{
- return 0;
-}
-
static const struct pmu perf_ops_generic = {
.enable = perf_swevent_enable,
.disable = perf_swevent_disable,
- .start = perf_swevent_int,
- .stop = perf_swevent_void,
.read = perf_swevent_read,
.unthrottle = perf_swevent_void, /* hwc->interrupts already reset */
};
@@ -4578,8 +4583,6 @@ static int swevent_hlist_get(struct perf_event *event)
static const struct pmu perf_ops_tracepoint = {
.enable = perf_trace_enable,
.disable = perf_trace_disable,
- .start = perf_swevent_int,
- .stop = perf_swevent_void,
.read = perf_swevent_read,
.unthrottle = perf_swevent_void,
};
--
1.6.2.3
* [PATCH 2/5] perf: Support disable() after stop() on software events
2010-06-10 3:49 [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 1/5] perf: Provide a proper stop action for software events Frederic Weisbecker
@ 2010-06-10 3:49 ` Frederic Weisbecker
2010-06-10 10:50 ` Peter Zijlstra
2010-06-10 3:49 ` [PATCH 3/5] perf: New PERF_EVENT_STATE_PAUSED event state Frederic Weisbecker
` (3 subsequent siblings)
5 siblings, 1 reply; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 3:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras, Stephane Eranian,
Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
If we call perf_event_stop() on a software event and then call the
disable() pmu callback on it, we end up calling hlist_del_rcu()
twice on the same hlist node, which crashes by dereferencing
LIST_POISON2.
Just use hlist_del_init_rcu() instead to fix this problem.
This prepares for the new context exclusions.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
kernel/perf_event.c | 2 +-
kernel/trace/trace_event_perf.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 5c004f7..2d818bc 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4291,7 +4291,7 @@ static int perf_swevent_enable(struct perf_event *event)
static void perf_swevent_disable(struct perf_event *event)
{
- hlist_del_rcu(&event->hlist_entry);
+ hlist_del_init_rcu(&event->hlist_entry);
}
static void perf_swevent_void(struct perf_event *event)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 4799d70..7bc1f26 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -122,7 +122,7 @@ int perf_trace_enable(struct perf_event *p_event)
void perf_trace_disable(struct perf_event *p_event)
{
- hlist_del_rcu(&p_event->hlist_entry);
+ hlist_del_init_rcu(&p_event->hlist_entry);
}
void perf_trace_destroy(struct perf_event *p_event)
--
1.6.2.3
* [PATCH 3/5] perf: New PERF_EVENT_STATE_PAUSED event state
2010-06-10 3:49 [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 1/5] perf: Provide a proper stop action for software events Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 2/5] perf: Support disable() after stop() on " Frederic Weisbecker
@ 2010-06-10 3:49 ` Frederic Weisbecker
2010-06-10 10:55 ` Peter Zijlstra
2010-06-10 3:49 ` [PATCH 4/5] perf: Introduce task, softirq and hardirq contexts exclusion Frederic Weisbecker
` (2 subsequent siblings)
5 siblings, 1 reply; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 3:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras, Stephane Eranian,
Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
This brings a new PERF_EVENT_STATE_PAUSED state. It means the event
is enabled but we don't want it to run; it must be in the same state
as after a pmu->stop() call. So the event has been reserved and
allocated, and it is ready to start after a pmu->start() call.
It is intended for hardware events that we want to keep reserved on
the cpu, ready to be started at any time. This is going to be useful
for the new context exclusion that follows.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
arch/x86/kernel/cpu/perf_event.c | 6 ++++--
include/linux/perf_event.h | 3 ++-
kernel/perf_event.c | 7 ++++---
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index f2da20f..9b0e52f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -839,7 +839,8 @@ void hw_perf_enable(void)
match_prev_assignment(hwc, cpuc, i))
continue;
- x86_pmu_stop(event);
+ if (event->state != PERF_EVENT_STATE_PAUSED)
+ x86_pmu_stop(event);
}
for (i = 0; i < cpuc->n_events; i++) {
@@ -851,7 +852,8 @@ void hw_perf_enable(void)
else if (i < n_running)
continue;
- x86_pmu_start(event);
+ if (event->state != PERF_EVENT_STATE_PAUSED)
+ x86_pmu_start(event);
}
cpuc->n_added = 0;
perf_events_lapic_init();
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 63b5aa5..48b3157 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -598,7 +598,8 @@ enum perf_event_active_state {
PERF_EVENT_STATE_ERROR = -2,
PERF_EVENT_STATE_OFF = -1,
PERF_EVENT_STATE_INACTIVE = 0,
- PERF_EVENT_STATE_ACTIVE = 1,
+ PERF_EVENT_STATE_PAUSED = 1,
+ PERF_EVENT_STATE_ACTIVE = 2,
};
struct file;
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 2d818bc..7c6502a 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -1541,20 +1541,21 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
hwc->sample_period = sample_period;
if (local64_read(&hwc->period_left) > 8*sample_period) {
- bool software_event = is_software_event(event);
+ bool reprogram = !is_software_event(event) &&
+ event->state != PERF_EVENT_STATE_PAUSED;
/*
* Only hardware events need their irq period to be
* reprogrammed
*/
- if (!software_event) {
+ if (reprogram) {
perf_disable();
perf_event_stop(event);
}
local64_set(&hwc->period_left, 0);
- if (!software_event) {
+ if (reprogram) {
perf_event_start(event);
perf_enable();
}
--
1.6.2.3
* [PATCH 4/5] perf: Introduce task, softirq and hardirq contexts exclusion
2010-06-10 3:49 [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Frederic Weisbecker
` (2 preceding siblings ...)
2010-06-10 3:49 ` [PATCH 3/5] perf: New PERF_EVENT_STATE_PAUSED event state Frederic Weisbecker
@ 2010-06-10 3:49 ` Frederic Weisbecker
2010-06-10 11:01 ` Peter Zijlstra
2010-06-10 3:49 ` [PATCH 5/5] perf: Support for task/softirq/hardirq exclusion on tools Frederic Weisbecker
2010-06-10 6:26 ` [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Ingo Molnar
5 siblings, 1 reply; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 3:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras, Stephane Eranian,
Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
This brings the possibility of excluding task and irq contexts from
the instrumentation, so that one can either filter out any kind of
context or confine the profiling to a single one.
In order to achieve that, we hook into irq_enter(), irq_exit() and
also the softirq paths. Each time we enter or exit a new non-nested
context, we determine the events that need to be paused or resumed.
Here we use the ->stop() and ->start() callbacks, which provide
lightweight pause/resume modes for the events.
The off-case (no running events with these new exclude properties
set) only adds a single atomic_read() per hook: two in the irq
path and two in the softirq path.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/perf_event.h | 41 +++++++++-
kernel/perf_event.c | 200 +++++++++++++++++++++++++++++++++++++++-----
kernel/softirq.c | 6 ++
3 files changed, 225 insertions(+), 22 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 48b3157..2bb8516 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -215,8 +215,11 @@ struct perf_event_attr {
*/
precise_ip : 2, /* skid constraint */
mmap_data : 1, /* non-exec mmap data */
+ exclude_task : 1, /* exclude task contexts */
+ exclude_softirq: 1, /* exclude softirq contexts */
+ exclude_hardirq: 1, /* exclude hardirq contexts */
- __reserved_1 : 46;
+ __reserved_1 : 43;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -930,8 +933,13 @@ static inline int is_software_event(struct perf_event *event)
}
extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+extern atomic_t nr_excluded_events;
extern void __perf_sw_event(u32, u64, int, struct pt_regs *, u64);
+extern void __perf_event_hardirq_enter(void);
+extern void __perf_event_hardirq_exit(void);
+extern void __perf_event_softirq_enter(void);
+extern void __perf_event_softirq_exit(void);
#ifndef perf_arch_fetch_caller_regs
static inline void
@@ -968,6 +976,31 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
}
extern void perf_event_mmap(struct vm_area_struct *vma);
+
+static inline void perf_event_hardirq_enter(void)
+{
+ if (atomic_read(&nr_excluded_events))
+ __perf_event_hardirq_enter();
+}
+
+static inline void perf_event_hardirq_exit(void)
+{
+ if (atomic_read(&nr_excluded_events))
+ __perf_event_hardirq_exit();
+}
+
+static inline void perf_event_softirq_enter(void)
+{
+ if (atomic_read(&nr_excluded_events))
+ __perf_event_softirq_enter();
+}
+
+static inline void perf_event_softirq_exit(void)
+{
+ if (atomic_read(&nr_excluded_events))
+ __perf_event_softirq_exit();
+}
+
extern struct perf_guest_info_callbacks *perf_guest_cbs;
extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
@@ -1039,6 +1072,12 @@ static inline int perf_event_task_enable(void) { return -EINVAL; }
static inline void
perf_sw_event(u32 event_id, u64 nr, int nmi,
struct pt_regs *regs, u64 addr) { }
+
+static inline void perf_event_hardirq_enter(void) { }
+static inline void perf_event_hardirq_exit(void) { }
+static inline void perf_event_softirq_enter(void) { }
+static inline void perf_event_softirq_exit(void) { }
+
static inline void
perf_bp_event(struct perf_event *event, void *data) { }
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 7c6502a..1c291e9 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -48,6 +48,7 @@ static atomic_t nr_events __read_mostly;
static atomic_t nr_mmap_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
+atomic_t nr_excluded_events __read_mostly;
/*
* perf event paranoia level:
@@ -642,17 +643,31 @@ event_sched_in(struct perf_event *event,
if (event->state <= PERF_EVENT_STATE_OFF)
return 0;
- event->state = PERF_EVENT_STATE_ACTIVE;
+ if (event->attr.exclude_task)
+ event->state = PERF_EVENT_STATE_PAUSED;
+ else
+ event->state = PERF_EVENT_STATE_ACTIVE;
+
event->oncpu = smp_processor_id();
+
/*
* The new state must be visible before we turn it on in the hardware:
*/
smp_wmb();
- if (event->pmu->enable(event)) {
- event->state = PERF_EVENT_STATE_INACTIVE;
- event->oncpu = -1;
- return -EAGAIN;
+ /*
+ * If we exclude the tasks, we only need to schedule hardware
+ * events that need to settle themselves, even in a pause mode.
+ * Software events can simply be scheduled anytime.
+ * If we want more granularity in all that, we can still provide
+ * later a pmu->reserve callback.
+ */
+ if (!event->attr.exclude_task || !is_software_event(event)) {
+ if (event->pmu->enable(event)) {
+ event->state = PERF_EVENT_STATE_INACTIVE;
+ event->oncpu = -1;
+ return -EAGAIN;
+ }
}
event->tstamp_running += ctx->time - event->tstamp_stopped;
@@ -1191,6 +1206,159 @@ static void perf_event_sync_stat(struct perf_event_context *ctx,
}
}
+static void perf_event_stop(struct perf_event *event)
+{
+ if (!event->pmu->stop)
+ return event->pmu->disable(event);
+
+ return event->pmu->stop(event);
+}
+
+static int perf_event_start(struct perf_event *event)
+{
+ if (!event->pmu->start)
+ return event->pmu->enable(event);
+
+ return event->pmu->start(event);
+}
+
+enum enter_context_t {
+ CONTEXT_HARDIRQ,
+ CONTEXT_SOFTIRQ,
+ CONTEXT_TASK,
+};
+
+static int event_enter_context(enum enter_context_t context,
+ struct perf_event *event)
+{
+ int exclude;
+ int ret = 0;
+
+ switch (context) {
+ case CONTEXT_HARDIRQ:
+ exclude = event->attr.exclude_hardirq;
+ break;
+ case CONTEXT_SOFTIRQ:
+ exclude = event->attr.exclude_softirq;
+ break;
+ case CONTEXT_TASK:
+ exclude = event->attr.exclude_task;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+
+ if (exclude && event->state == PERF_EVENT_STATE_ACTIVE) {
+ event->state = PERF_EVENT_STATE_PAUSED;
+ perf_event_stop(event);
+ } else if (!exclude && event->state == PERF_EVENT_STATE_PAUSED) {
+ event->state = PERF_EVENT_STATE_ACTIVE;
+ ret = perf_event_start(event);
+ }
+
+ return ret;
+}
+
+static void
+group_enter_context(enum enter_context_t context,
+ struct perf_event *group_event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *event;
+
+ if (group_event->state < PERF_EVENT_STATE_PAUSED)
+ return;
+
+ /*
+ * We probably want to make the exclude_* things all the same in a
+ * group, to enforce the group instrumentation and to optimize this
+ * path.
+ */
+ if (event_enter_context(context, group_event))
+ goto fail;
+
+ list_for_each_entry(event, &group_event->sibling_list, group_entry) {
+ if (event_enter_context(context, event))
+ goto fail;
+ }
+
+ return;
+
+ fail:
+ group_sched_out(group_event, cpuctx, ctx);
+ group_event->state = PERF_EVENT_STATE_ERROR;
+}
+
+static void
+ctx_enter_context(enum enter_context_t context,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *group_event;
+
+ raw_spin_lock(&ctx->lock);
+
+ list_for_each_entry(group_event, &ctx->pinned_groups, group_entry)
+ group_enter_context(context, group_event, cpuctx, ctx);
+
+ list_for_each_entry(group_event, &ctx->flexible_groups, group_entry)
+ group_enter_context(context, group_event, cpuctx, ctx);
+
+ raw_spin_unlock(&ctx->lock);
+}
+
+static void enter_context(enum enter_context_t context)
+{
+ struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
+ struct perf_event_context *ctx = current->perf_event_ctxp;
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ perf_disable();
+
+ ctx_enter_context(context, cpuctx, &cpuctx->ctx);
+ if (ctx)
+ ctx_enter_context(context, cpuctx, ctx);
+
+ perf_enable();
+
+ local_irq_restore(flags);
+}
+
+void __perf_event_hardirq_enter(void)
+{
+ /* Don't account nested cases */
+ if (!hardirq_count())
+ enter_context(CONTEXT_HARDIRQ);
+}
+
+void __perf_event_hardirq_exit(void)
+{
+ /* We are not truly leaving the irq if we nested */
+ if (hardirq_count())
+ return;
+
+ if (softirq_count())
+ enter_context(CONTEXT_SOFTIRQ);
+ else
+ enter_context(CONTEXT_TASK);
+}
+
+void __perf_event_softirq_enter(void)
+{
+ /* Softirqs can't nest */
+ enter_context(CONTEXT_SOFTIRQ);
+}
+
+void __perf_event_softirq_exit(void)
+{
+ /* Softirqs could have only interrupted a task context */
+ enter_context(CONTEXT_TASK);
+}
+
/*
* Called from scheduler to remove the events of the current task,
* with interrupts disabled.
@@ -1506,22 +1674,6 @@ do { \
return div64_u64(dividend, divisor);
}
-static void perf_event_stop(struct perf_event *event)
-{
- if (!event->pmu->stop)
- return event->pmu->disable(event);
-
- return event->pmu->stop(event);
-}
-
-static int perf_event_start(struct perf_event *event)
-{
- if (!event->pmu->start)
- return event->pmu->enable(event);
-
- return event->pmu->start(event);
-}
-
static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
{
struct hw_perf_event *hwc = &event->hw;
@@ -1908,6 +2060,9 @@ static void free_event(struct perf_event *event)
atomic_dec(&nr_comm_events);
if (event->attr.task)
atomic_dec(&nr_task_events);
+ if (event->attr.exclude_task || event->attr.exclude_softirq ||
+ event->attr.exclude_hardirq)
+ atomic_dec(&nr_excluded_events);
}
if (event->buffer) {
@@ -4933,6 +5088,9 @@ done:
atomic_inc(&nr_comm_events);
if (event->attr.task)
atomic_inc(&nr_task_events);
+ if (event->attr.exclude_task || event->attr.exclude_softirq ||
+ event->attr.exclude_hardirq)
+ atomic_inc(&nr_excluded_events);
}
return event;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 825e112..bb31457 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -198,6 +198,8 @@ asmlinkage void __do_softirq(void)
pending = local_softirq_pending();
account_system_vtime(current);
+ perf_event_softirq_enter();
+
__local_bh_disable((unsigned long)__builtin_return_address(0));
lockdep_softirq_enter();
@@ -246,6 +248,8 @@ restart:
account_system_vtime(current);
_local_bh_enable();
+
+ perf_event_softirq_exit();
}
#ifndef __ARCH_HAS_DO_SOFTIRQ
@@ -277,6 +281,7 @@ void irq_enter(void)
{
int cpu = smp_processor_id();
+ perf_event_hardirq_enter();
rcu_irq_enter();
if (idle_cpu(cpu) && !in_interrupt()) {
__irq_enter();
@@ -302,6 +307,7 @@ void irq_exit(void)
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
+ perf_event_hardirq_exit();
rcu_irq_exit();
#ifdef CONFIG_NO_HZ
/* Make sure that timer wheel updates are propagated */
--
1.6.2.3
* [PATCH 5/5] perf: Support for task/softirq/hardirq exclusion on tools
2010-06-10 3:49 [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Frederic Weisbecker
` (3 preceding siblings ...)
2010-06-10 3:49 ` [PATCH 4/5] perf: Introduce task, softirq and hardirq contexts exclusion Frederic Weisbecker
@ 2010-06-10 3:49 ` Frederic Weisbecker
2010-06-10 6:26 ` [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Ingo Molnar
5 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 3:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Paul Mackerras, Cyrill Gorcunov,
Zhang Yanmin, Steven Rostedt
This brings the following new modifiers for perf events:
- t = Profile task context
- s = Profile softirq context
- i = Profile hardirq context
Example:
perf record -a -g -e cycles:i ls -R /usr > /dev/null
3.11% ls [kernel.kallsyms] [k] __lock_acquire
|
--- __lock_acquire
|
|--95.83%-- lock_acquire
| _raw_spin_lock
| |
| |--30.43%-- perf_ctx_adjust_freq
| | perf_event_task_tick
| | scheduler_tick
| | update_process_times
| | tick_sched_timer
| | __run_hrtimer
| | hrtimer_interrupt
| | smp_apic_timer_interrupt
| | apic_timer_interrupt
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
tools/perf/util/parse-events.c | 37 ++++++++++++++++++++++++++-----------
1 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 9bf0f40..7a18e71 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -688,24 +688,36 @@ static enum event_result
parse_event_modifier(const char **strp, struct perf_event_attr *attr)
{
const char *str = *strp;
- int exclude = 0;
- int eu = 0, ek = 0, eh = 0, precise = 0;
+ int exclude_ring = 0, exclude_context = 0;
+ int eu = 0, ek = 0, eh = 0, et = 0, es = 0, ei = 0, precise = 0;
if (*str++ != ':')
return 0;
while (*str) {
if (*str == 'u') {
- if (!exclude)
- exclude = eu = ek = eh = 1;
+ if (!exclude_ring)
+ exclude_ring = eu = ek = eh = 1;
eu = 0;
} else if (*str == 'k') {
- if (!exclude)
- exclude = eu = ek = eh = 1;
+ if (!exclude_ring)
+ exclude_ring = eu = ek = eh = 1;
ek = 0;
} else if (*str == 'h') {
- if (!exclude)
- exclude = eu = ek = eh = 1;
+ if (!exclude_ring)
+ exclude_ring = eu = ek = eh = 1;
eh = 0;
+ } else if (*str == 't') {
+ if (!exclude_context)
+ exclude_context = et = es = ei = 1;
+ et = 0;
+ } else if (*str == 's') {
+ if (!exclude_context)
+ exclude_context = et = es = ei = 1;
+ es = 0;
+ } else if (*str == 'i') {
+ if (!exclude_context)
+ exclude_context = et = es = ei = 1;
+ ei = 0;
} else if (*str == 'p') {
precise++;
} else
@@ -715,9 +727,12 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
}
if (str >= *strp + 2) {
*strp = str;
- attr->exclude_user = eu;
- attr->exclude_kernel = ek;
- attr->exclude_hv = eh;
+ attr->exclude_user = eu;
+ attr->exclude_kernel = ek;
+ attr->exclude_hv = eh;
+ attr->exclude_task = et;
+ attr->exclude_softirq = es;
+ attr->exclude_hardirq = ei;
attr->precise_ip = precise;
return 1;
}
--
1.6.2.3
* Re: [PATCH 0/5] perf events finer grained context instrumentation / context exclusion
2010-06-10 3:49 [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Frederic Weisbecker
` (4 preceding siblings ...)
2010-06-10 3:49 ` [PATCH 5/5] perf: Support for task/softirq/hardirq exclusion on tools Frederic Weisbecker
@ 2010-06-10 6:26 ` Ingo Molnar
2010-06-10 7:15 ` Frederic Weisbecker
2010-06-10 7:31 ` Frederic Weisbecker
5 siblings, 2 replies; 26+ messages in thread
From: Ingo Molnar @ 2010-06-10 6:26 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
* Frederic Weisbecker <fweisbec@gmail.com> wrote:
> Here is the new version of per context exclusion, based on hooks on
> irq_enter/irq_exit. I haven't observed slowdowns but I haven't actually
> measured the impact.
One thing that would be nice to see in this discussion is a comparison of
before/after perf stat --repeat runs.
Something like:
perf stat --repeat ./hackbench 5
Done with full stat, and then also done with hardirqs/softirqs excluded. (i.e.
task context stats only)
I.e. does the feature really give us the expected statistical stability in
results? Does it really exclude hardirq/softirq workloads, etc.?
Thanks,
Ingo
* Re: [PATCH 0/5] perf events finer grained context instrumentation / context exclusion
2010-06-10 6:26 ` [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Ingo Molnar
@ 2010-06-10 7:15 ` Frederic Weisbecker
2010-06-10 7:31 ` Frederic Weisbecker
1 sibling, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 7:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 08:26:18AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > Here is the new version of per context exclusion, based on hooks on
> > irq_enter/irq_exit. I haven't observed slowdowns but I haven't actually
> > measured the impact.
>
> One thing that would be nice to see in this discussion is a comparison of
> before/after perf stat --repeat runs.
>
> Something like:
>
> perf stat --repeat ./hackbench 5
>
> Done with full stat, and then also done with hardirqs/softirqs excluded. (i.e.
> task context stats only)
Right, so I just tried each of the perf stat default events with :t and it hung up ;-)
(Not severely, I can still kill perf stat with ^Z, but there is something I need
to fix).
>
> I.e. does the feature really give us the expected statistical stability in
> results? Does it really exclude hardirq/softirq workloads, etc.?
But yeah, before posting these patches I gave that a try with the instruction
counter and it didn't change much compared to the usual results; the variations
are about the same.
I do know the exclusion works, from using perf record -g / perf report: the
callchains are truly reliable with respect to the exclusion rules, and
counting and sampling are treated the same in this scheme (we just deactivate
/reactivate, there is no post-hoc blocking or fixup).
I just don't know where the entropy comes from. Maybe once I have the hang
fixed I'll be able to test with all the default stat events and get a better
overview of the progress.
* Re: [PATCH 0/5] perf events finer grained context instrumentation / context exclusion
2010-06-10 6:26 ` [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Ingo Molnar
2010-06-10 7:15 ` Frederic Weisbecker
@ 2010-06-10 7:31 ` Frederic Weisbecker
2010-06-10 10:16 ` Ingo Molnar
1 sibling, 1 reply; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 7:31 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 08:26:18AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > Here is the new version of per context exclusion, based on hooks on
> > irq_enter/irq_exit. I haven't observed slowdowns but I haven't actually
> > measured the impact.
>
> One thing that would be nice to see in this discussion is a comparison of
> before/after perf stat --repeat runs.
>
> Something like:
>
> perf stat --repeat ./hackbench 5
>
> Done with full stat, and then also done with hardirqs/softirqs excluded. (i.e.
> task context stats only)
>
> I.e. does the feature really give us the expected statistical stability in
> results? Does it really exclude hardirq/softirq workloads, etc.?
>
> Thanks,
>
> Ingo
Just got some results:
$ sudo ./perf stat -e instructions -e cycles -e branches -e branch-misses -v -r 10 ./hackbench 5
Performance counter stats for './hackbench 5' (10 runs):
1313640764 instructions # 0,241 IPC ( +- 1,393% ) (scaled from 100,05%)
5440853130 cycles ( +- 0,925% ) (scaled from 100,05%)
214737441 branches ( +- 0,948% )
12332109 branch-misses # 5,743 % ( +- 1,239% )
1,727051101 seconds time elapsed ( +- 0,897% )
$ sudo ./perf stat -e instructions:t -e cycles:t -e branches:t -e branch-misses:t -v -r 10 ./hackbench 5
Performance counter stats for './hackbench 5' (10 runs):
1293802776 instructions # 0,245 IPC ( +- 0,343% )
5280769301 cycles ( +- 0,471% ) (scaled from 100,02%)
209495435 branches ( +- 0,392% )
11890938 branch-misses # 5,676 % ( +- 0,491% )
1,750534923 seconds time elapsed ( +- 0,463% )
So yeah, the results look a bit better. Still not perfect:
- we are still instrumenting the tiny windows between the true interrupt
entry and irq_enter() (and between irq_exit() and the true exit). Same for softirqs.
- residual randomness...
Another try, this time with a kernel download running in parallel to generate
network interrupts:
$ sudo ./perf stat -e instructions -e cycles -e branches -e branch-misses -v -r 10 ./hackbench 5
Performance counter stats for './hackbench 5' (10 runs):
1324759169 instructions # 0,244 IPC ( +- 0,494% ) (scaled from 100,09%)
5424824320 cycles ( +- 0,503% )
214443106 branches ( +- 0,516% )
12245614 branch-misses # 5,710 % ( +- 0,604% )
1,723413199 seconds time elapsed ( +- 0,483% )
$ sudo ./perf stat -e instructions:t -e cycles:t -e branches:t -e branch-misses:t -v -r 10 ./hackbench 5
Performance counter stats for './hackbench 5' (10 runs):
1292119132 instructions # 0,251 IPC ( +- 0,138% )
5138407131 cycles ( +- 2,708% )
209052068 branches ( +- 0,139% )
11835090 branch-misses # 5,661 % ( +- 0,105% )
1,752192124 seconds time elapsed ( +- 0,278% )
Again, globally better, except for the cycles this time.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/5] perf events finer grained context instrumentation / context exclusion
2010-06-10 7:31 ` Frederic Weisbecker
@ 2010-06-10 10:16 ` Ingo Molnar
2010-06-10 17:03 ` Frederic Weisbecker
0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2010-06-10 10:16 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
* Frederic Weisbecker <fweisbec@gmail.com> wrote:
> Performance counter stats for './hackbench 5' (10 runs):
>
> 1313640764 instructions # 0,241 IPC ( +- 1,393% ) (scaled from 100,05%)
> 214737441 branches ( +- 0,948% )
>
> 1293802776 instructions # 0,245 IPC ( +- 0,343% )
> 209495435 branches ( +- 0,392% )
Indeed it's about 4 times less noise, not bad.
Cycles is fundamentally random.
> So yeah, the results look a bit better. Still not perfect:
>
> - we are still instrumenting the tiny parts between the true interrupt
> and irq_enter() (same for irq_exit() and the end). Same for softirqs.
>
> - random randomnesses...
Random randomness shouldn't occur for something like instructions or branches.
Could you try some 'must not be variable' workload, like:
taskset 1 ./hackbench 1
If the workload is pinned to a single CPU then it ought to not be variable at
all. (modulo things like hash chain lengths and slab caching details, but
those should not cause 0.4% kind of noise IMO)
Btw., we could try to record all branches of an execution (using BTS, of a
relatively short but static-length run), and see where the variance comes
from. I doubt the current BTS code is ready for that, but it would be 'the'
magic trace-from-hell that includes all execution of the task, recorded at the
hardware level.
Ingo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 3:49 ` [PATCH 1/5] perf: Provide a proper stop action for software events Frederic Weisbecker
@ 2010-06-10 10:46 ` Peter Zijlstra
2010-06-10 11:10 ` Peter Zijlstra
2010-06-10 12:06 ` Ingo Molnar
0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2010-06-10 10:46 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, 2010-06-10 at 05:49 +0200, Frederic Weisbecker wrote:
> In order to introduce new context exclusions, software events will
> have to eventually stop when needed. We'll want perf_event_stop() to
> act on every event.
>
> To achieve this, remove the stub stop/start pmu callbacks of software
> and tracepoint events.
>
> This may even optimize the case of hardware and software events
> running at the same time: we now only stop/start all hardware
> events when we reset a hardware event period, no longer for
> software events.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
> kernel/perf_event.c | 29 ++++++++++++++++-------------
> 1 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index c772a3d..5c004f7 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -1541,11 +1541,23 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
> hwc->sample_period = sample_period;
>
> if (local64_read(&hwc->period_left) > 8*sample_period) {
> - perf_disable();
> - perf_event_stop(event);
> + bool software_event = is_software_event(event);
> +
> + /*
> + * Only hardware events need their irq period to be
> + * reprogrammed
> + */
> + if (!software_event) {
> + perf_disable();
> + perf_event_stop(event);
> + }
> +
> local64_set(&hwc->period_left, 0);
> - perf_event_start(event);
> - perf_enable();
> +
> + if (!software_event) {
> + perf_event_start(event);
> + perf_enable();
> + }
> }
> }
>
> @@ -4286,16 +4298,9 @@ static void perf_swevent_void(struct perf_event *event)
> {
> }
>
> -static int perf_swevent_int(struct perf_event *event)
> -{
> - return 0;
> -}
> -
> static const struct pmu perf_ops_generic = {
> .enable = perf_swevent_enable,
> .disable = perf_swevent_disable,
> - .start = perf_swevent_int,
> - .stop = perf_swevent_void,
> .read = perf_swevent_read,
> .unthrottle = perf_swevent_void, /* hwc->interrupts already reset */
> };
> @@ -4578,8 +4583,6 @@ static int swevent_hlist_get(struct perf_event *event)
> static const struct pmu perf_ops_tracepoint = {
> .enable = perf_trace_enable,
> .disable = perf_trace_disable,
> - .start = perf_swevent_int,
> - .stop = perf_swevent_void,
> .read = perf_swevent_read,
> .unthrottle = perf_swevent_void,
> };
I really don't like this.. we should be removing differences between
software and hardware pmu implementations, not add more :/
Something like the below would work, the only 'problem' is that it grows
hw_perf_event.
---
include/linux/perf_event.h | 1 +
kernel/perf_event.c | 27 ++++++++++++++++++---------
2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9073bde..2292659 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -531,6 +531,7 @@ struct hw_perf_event {
struct { /* software */
s64 remaining;
struct hrtimer hrtimer;
+ int stopped;
};
#ifdef CONFIG_HAVE_HW_BREAKPOINT
/* breakpoint */
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 403d180..14b691e 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4113,6 +4113,9 @@ static int perf_swevent_match(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs)
{
+ if (event->hw.stopped)
+ return 0;
+
if (event->attr.type != type)
return 0;
@@ -4282,22 +4285,28 @@ static void perf_swevent_disable(struct perf_event *event)
hlist_del_rcu(&event->hlist_entry);
}
-static void perf_swevent_void(struct perf_event *event)
+static void perf_swevent_throttle(struct perf_event *event)
{
+ /* hwc->interrupts already reset */
}
-static int perf_swevent_int(struct perf_event *event)
+static int perf_swevent_start(struct perf_event *event)
{
> - return 0;
> + event->hw.stopped = 0;
> + return 0;
> +}
+
> +static void perf_swevent_stop(struct perf_event *event)
+{
+ event->hw.stopped = 1;
}
static const struct pmu perf_ops_generic = {
.enable = perf_swevent_enable,
.disable = perf_swevent_disable,
- .start = perf_swevent_int,
- .stop = perf_swevent_void,
+ .start = perf_swevent_start,
+ .stop = perf_swevent_stop,
.read = perf_swevent_read,
- .unthrottle = perf_swevent_void, /* hwc->interrupts already reset */
+ .unthrottle = perf_swevent_throttle,
};
/*
@@ -4578,10 +4587,10 @@ static int swevent_hlist_get(struct perf_event *event)
static const struct pmu perf_ops_tracepoint = {
.enable = perf_trace_enable,
.disable = perf_trace_disable,
- .start = perf_swevent_int,
- .stop = perf_swevent_void,
+ .start = perf_swevent_start,
+ .stop = perf_swevent_stop,
.read = perf_swevent_read,
- .unthrottle = perf_swevent_void,
+ .unthrottle = perf_swevent_throttle,
};
static int perf_tp_filter_match(struct perf_event *event,
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 2/5] perf: Support disable() after stop() on software events
2010-06-10 3:49 ` [PATCH 2/5] perf: Support disable() after stop() on " Frederic Weisbecker
@ 2010-06-10 10:50 ` Peter Zijlstra
2010-06-10 16:31 ` Frederic Weisbecker
0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2010-06-10 10:50 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, 2010-06-10 at 05:49 +0200, Frederic Weisbecker wrote:
> If we call perf_event_stop() on a software event and then the
> disable() pmu callback after that, we'll call hlist_del_rcu()
> twice on the same hlist node and crash by dereferencing
> LIST_POISON2.
>
> Just use hlist_del_init_rcu() instead to fix this problem.
>
> This prepares for new context exclusions.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
> kernel/perf_event.c | 2 +-
> kernel/trace/trace_event_perf.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index 5c004f7..2d818bc 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -4291,7 +4291,7 @@ static int perf_swevent_enable(struct perf_event *event)
>
> static void perf_swevent_disable(struct perf_event *event)
> {
> - hlist_del_rcu(&event->hlist_entry);
> + hlist_del_init_rcu(&event->hlist_entry);
> }
>
> static void perf_swevent_void(struct perf_event *event)
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index 4799d70..7bc1f26 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -122,7 +122,7 @@ int perf_trace_enable(struct perf_event *p_event)
>
> void perf_trace_disable(struct perf_event *p_event)
> {
> - hlist_del_rcu(&p_event->hlist_entry);
> + hlist_del_init_rcu(&p_event->hlist_entry);
> }
>
> void perf_trace_destroy(struct perf_event *p_event)
Ok, so then why did you need the first patch?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/5] perf: New PERF_EVENT_STATE_PAUSED event state
2010-06-10 3:49 ` [PATCH 3/5] perf: New PERF_EVENT_STATE_PAUSED event state Frederic Weisbecker
@ 2010-06-10 10:55 ` Peter Zijlstra
2010-06-10 16:26 ` Frederic Weisbecker
0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2010-06-10 10:55 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, 2010-06-10 at 05:49 +0200, Frederic Weisbecker wrote:
> This brings a new PERF_EVENT_STATE_PAUSED state. It means the event
> is enabled but we don't want it to run; it must be in the same state
> as after a pmu->stop() call. So the event has been reserved and
> allocated, and it is ready to start after a pmu->start() call.
>
> It is intended for hardware events when we want them to be reserved on
> the cpu and ready to be started anytime. This is going to be useful
> for the new context exclusion that will follow.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
> arch/x86/kernel/cpu/perf_event.c | 6 ++++--
> include/linux/perf_event.h | 3 ++-
> kernel/perf_event.c | 7 ++++---
> 3 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index f2da20f..9b0e52f 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -839,7 +839,8 @@ void hw_perf_enable(void)
> match_prev_assignment(hwc, cpuc, i))
> continue;
>
> - x86_pmu_stop(event);
> + if (event->state != PERF_EVENT_STATE_PAUSED)
> + x86_pmu_stop(event);
> }
>
> for (i = 0; i < cpuc->n_events; i++) {
> @@ -851,7 +852,8 @@ void hw_perf_enable(void)
> else if (i < n_running)
> continue;
>
> - x86_pmu_start(event);
> + if (event->state != PERF_EVENT_STATE_PAUSED)
> + x86_pmu_start(event);
> }
> cpuc->n_added = 0;
> perf_events_lapic_init();
Shouldn't that latter be == PAUSED?
Also, you'll have to audit all struct pmu implementations to make sure
stop/disable or disable/disable sequences are good.
Also, I'd rather keep the whole event->state knowledge in the generic
code.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/5] perf: Introduce task, softirq and hardirq contexts exclusion
2010-06-10 3:49 ` [PATCH 4/5] perf: Introduce task, softirq and hardirq contexts exclusion Frederic Weisbecker
@ 2010-06-10 11:01 ` Peter Zijlstra
2010-06-10 16:24 ` Frederic Weisbecker
0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2010-06-10 11:01 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, 2010-06-10 at 05:49 +0200, Frederic Weisbecker wrote:
> @@ -642,17 +643,31 @@ event_sched_in(struct perf_event *event,
> if (event->state <= PERF_EVENT_STATE_OFF)
> return 0;
>
> - event->state = PERF_EVENT_STATE_ACTIVE;
> + if (event->attr.exclude_task)
> + event->state = PERF_EVENT_STATE_PAUSED;
> + else
> + event->state = PERF_EVENT_STATE_ACTIVE;
> +
> event->oncpu = smp_processor_id();
> +
Aah, so that is why you added the PAUSE state knowledge to the arch
code, you want to be able to call ->enable() on a PAUSEd event.
That means you need to audit/touch all implementations anyway, isn't
there a better interface we can use, like maybe extend ->enable() with a
flags argument?
> /*
> * The new state must be visible before we turn it on in the hardware:
> */
> smp_wmb();
>
> - if (event->pmu->enable(event)) {
> - event->state = PERF_EVENT_STATE_INACTIVE;
> - event->oncpu = -1;
> - return -EAGAIN;
> + /*
> + * If we exclude the tasks, we only need to schedule hardware
> + * events that need to settle themselves, even in a pause mode.
> + * Software events can simply be scheduled anytime.
> + * If we want more granularity in all that, we can still provide
> + * later a pmu->reserve callback.
> + */
> + if (!event->attr.exclude_task || !is_software_event(event)) {
> + if (event->pmu->enable(event)) {
> + event->state = PERF_EVENT_STATE_INACTIVE;
> + event->oncpu = -1;
> + return -EAGAIN;
> + }
> }
>
> event->tstamp_running += ctx->time - event->tstamp_stopped;
Remove is_software_event(), not add more.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 10:46 ` Peter Zijlstra
@ 2010-06-10 11:10 ` Peter Zijlstra
2010-06-10 16:12 ` Frederic Weisbecker
2010-06-10 12:06 ` Ingo Molnar
1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2010-06-10 11:10 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, 2010-06-10 at 12:46 +0200, Peter Zijlstra wrote:
>
> Something like the below would work, the only 'problem' is that it grows
> hw_perf_event.
If we do the whole PAUSEd thing right, we'd not need this I think.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 10:46 ` Peter Zijlstra
2010-06-10 11:10 ` Peter Zijlstra
@ 2010-06-10 12:06 ` Ingo Molnar
1 sibling, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2010-06-10 12:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, LKML, Arnaldo Carvalho de Melo,
Paul Mackerras, Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin,
Steven Rostedt
* Peter Zijlstra <peterz@infradead.org> wrote:
> Something like the below would work, the only 'problem' is that it grows
> hw_perf_event.
> @@ -531,6 +531,7 @@ struct hw_perf_event {
> struct { /* software */
> s64 remaining;
> struct hrtimer hrtimer;
> + int stopped;
IMO that's ok.
Ingo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 11:10 ` Peter Zijlstra
@ 2010-06-10 16:12 ` Frederic Weisbecker
2010-06-10 16:16 ` Peter Zijlstra
0 siblings, 1 reply; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 16:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 01:10:42PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 12:46 +0200, Peter Zijlstra wrote:
> >
> > Something like the below would work, the only 'problem' is that it grows
> > hw_perf_event.
>
> If we do the whole PAUSEd thing right, we'd not need this I think.
It's not needed, and moreover software_pmu:stop/start() can be the same
as software_pmu:disable/enable() without the need to add another check
in the fast path.
But we need perf_event_stop/start() to work on software events. And in fact
now that we use hlist_del_init, it's safe, but a bit wasteful in
the period reset path. That's another problem that is not critical, but
if you want to solve this by ripping out the differences between software and
hardware (which I agree with), we need a ->reset_period callback.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 16:12 ` Frederic Weisbecker
@ 2010-06-10 16:16 ` Peter Zijlstra
2010-06-10 16:29 ` Frederic Weisbecker
2010-06-10 19:54 ` Frederic Weisbecker
0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2010-06-10 16:16 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, 2010-06-10 at 18:12 +0200, Frederic Weisbecker wrote:
> On Thu, Jun 10, 2010 at 01:10:42PM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-06-10 at 12:46 +0200, Peter Zijlstra wrote:
> > >
> > > Something like the below would work, the only 'problem' is that it grows
> > > hw_perf_event.
> >
> > If we do the whole PAUSEd thing right, we'd not need this I think.
>
>
> It's not needed, and moreover software_pmu:stop/start() can be the same
> as software_pmu:disable/enable() without the need to add another check
> in the fast path.
>
> But we need perf_event_stop/start() to work on software events. And in fact
> now that we use hlist_del_init, it's safe, but a bit wasteful in
> the period reset path. That's another problem that is not critical, but
> if you want to solve this by ripping out the differences between software and
> hardware (which I agree with), we need a ->reset_period callback.
>
Why? ->start() should reprogram the hardware, so a
->stop()/poke-at-state/->start() cycle is much more flexible.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/5] perf: Introduce task, softirq and hardirq contexts exclusion
2010-06-10 11:01 ` Peter Zijlstra
@ 2010-06-10 16:24 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 16:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 01:01:27PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 05:49 +0200, Frederic Weisbecker wrote:
> > @@ -642,17 +643,31 @@ event_sched_in(struct perf_event *event,
> > if (event->state <= PERF_EVENT_STATE_OFF)
> > return 0;
> >
> > - event->state = PERF_EVENT_STATE_ACTIVE;
> > + if (event->attr.exclude_task)
> > + event->state = PERF_EVENT_STATE_PAUSED;
> > + else
> > + event->state = PERF_EVENT_STATE_ACTIVE;
> > +
> > event->oncpu = smp_processor_id();
> > +
>
> Aah, so that is why you added the PAUSE state knowledge to the arch
> code, you want to be able to call ->enable() on a PAUSEd event.
>
> That means you need to audit/touch all implementations anyway, isn't
> there a better interface we can use, like maybe extend ->enable() with a
> flags argument?
The problem is more with hw_perf_enable/disable(), as that is the place
where we do the true activations.
Moreover, we also need to keep track of this paused state from the
generic code.
If I add a flag to enable(), this is going to add the same check
everywhere, and in fact hardware events would need to maintain
an internal paused state, while this is something we already have in
the generic code.
> > /*
> > * The new state must be visible before we turn it on in the hardware:
> > */
> > smp_wmb();
> >
> > - if (event->pmu->enable(event)) {
> > - event->state = PERF_EVENT_STATE_INACTIVE;
> > - event->oncpu = -1;
> > - return -EAGAIN;
> > + /*
> > + * If we exclude the tasks, we only need to schedule hardware
> > + * events that need to settle themselves, even in a pause mode.
> > + * Software events can simply be scheduled anytime.
> > + * If we want more granularity in all that, we can still provide
> > + * later a pmu->reserve callback.
> > + */
> > + if (!event->attr.exclude_task || !is_software_event(event)) {
> > + if (event->pmu->enable(event)) {
> > + event->state = PERF_EVENT_STATE_INACTIVE;
> > + event->oncpu = -1;
> > + return -EAGAIN;
> > + }
> > }
> >
> > event->tstamp_running += ctx->time - event->tstamp_stopped;
>
> Remove is_software_event(), not add more.
I don't like this either; that's why I talked about the potential need
for a pmu->reserve() callback in the comments.
ie, pmu->reserve() would mean that the pmu knows how to deal with the
PAUSED state. If a pmu doesn't have it, then we would need to call
->enable() without setting the PAUSED state, so that exclude_* things
don't work with it (as we haven't audited the code).
What do you think? That would solve the audit-everywhere problem
plus the is_software_event() problem.
pmu->reserve() would be the same as pmu->enable() on the x86 pmu, and
it would be a stub for software events. On x86 we would just need
to use the generic PAUSED state to make the difference, as it
would be a waste to maintain a parallel internal state.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 3/5] perf: New PERF_EVENT_STATE_PAUSED event state
2010-06-10 10:55 ` Peter Zijlstra
@ 2010-06-10 16:26 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 16:26 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 12:55:17PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 05:49 +0200, Frederic Weisbecker wrote:
> > This brings a new PERF_EVENT_STATE_PAUSED state. It means the event
> > is enabled but we don't want it to run; it must be in the same state
> > as after a pmu->stop() call. So the event has been reserved and
> > allocated, and it is ready to start after a pmu->start() call.
> >
> > It is intended for hardware events when we want them to be reserved on
> > the cpu and ready to be started anytime. This is going to be useful
> > for the new context exclusion that will follow.
> >
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Stephane Eranian <eranian@google.com>
> > Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> > Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > ---
> > arch/x86/kernel/cpu/perf_event.c | 6 ++++--
> > include/linux/perf_event.h | 3 ++-
> > kernel/perf_event.c | 7 ++++---
> > 3 files changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> > index f2da20f..9b0e52f 100644
> > --- a/arch/x86/kernel/cpu/perf_event.c
> > +++ b/arch/x86/kernel/cpu/perf_event.c
> > @@ -839,7 +839,8 @@ void hw_perf_enable(void)
> > match_prev_assignment(hwc, cpuc, i))
> > continue;
> >
> > - x86_pmu_stop(event);
> > + if (event->state != PERF_EVENT_STATE_PAUSED)
> > + x86_pmu_stop(event);
> > }
> >
> > for (i = 0; i < cpuc->n_events; i++) {
> > @@ -851,7 +852,8 @@ void hw_perf_enable(void)
> > else if (i < n_running)
> > continue;
> >
> > - x86_pmu_start(event);
> > + if (event->state != PERF_EVENT_STATE_PAUSED)
> > + x86_pmu_start(event);
> > }
> > cpuc->n_added = 0;
> > perf_events_lapic_init();
>
> Also, I'd rather keep the whole event->state knowledge in the generic
> code.
Yeah, but if we do this, we need to maintain the exact same state at the
arch level. We need this for every pmu.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 16:16 ` Peter Zijlstra
@ 2010-06-10 16:29 ` Frederic Weisbecker
2010-06-10 16:38 ` Peter Zijlstra
2010-06-10 19:54 ` Frederic Weisbecker
1 sibling, 1 reply; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 16:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 06:16:16PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 18:12 +0200, Frederic Weisbecker wrote:
> > On Thu, Jun 10, 2010 at 01:10:42PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-06-10 at 12:46 +0200, Peter Zijlstra wrote:
> > > >
> > > > Something like the below would work, the only 'problem' is that it grows
> > > > hw_perf_event.
> > >
> > > If we do the whole PAUSEd thing right, we'd not need this I think.
> >
> >
> > It's not needed, and moreover software_pmu:stop/start() can be the same
> > as software_pmu:disable/enable() without the need to add another check
> > in the fast path.
> >
> > But we need perf_event_stop/start() to work on software events. And in fact
> > now that we use hlist_del_init, it's safe, but a bit wasteful in
> > the period reset path. That's another problem that is not critical, but
> > if you want to solve this by ripping out the differences between software and
> > hardware (which I agree with), we need a ->reset_period callback.
>
>
> Why? ->start() should reprogram the hardware, so a
> ->stop()/poke-at-state/->start() cycle is much more flexible.
Imagine you have several software and hardware events running on the
same cpu. Each time you reset the period for a software event, you do
a hw_pmu_disable() / hw_pmu_enable(), which writes/reads the hardware
registers for each hardware event, among other wasteful things.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/5] perf: Support disable() after stop() on software events
2010-06-10 10:50 ` Peter Zijlstra
@ 2010-06-10 16:31 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 16:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 12:50:17PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 05:49 +0200, Frederic Weisbecker wrote:
> > If we call perf_event_stop() on a software event and then the
> > disable() pmu callback after that, we'll call hlist_del_rcu()
> > twice on the same hlist node and crash by dereferencing
> > LIST_POISON2.
> >
> > Just use hlist_del_init_rcu() instead to fix this problem.
> >
> > This prepares for new context exclusions.
> >
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Stephane Eranian <eranian@google.com>
> > Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> > Cc: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > ---
> > kernel/perf_event.c | 2 +-
> > kernel/trace/trace_event_perf.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> > index 5c004f7..2d818bc 100644
> > --- a/kernel/perf_event.c
> > +++ b/kernel/perf_event.c
> > @@ -4291,7 +4291,7 @@ static int perf_swevent_enable(struct perf_event *event)
> >
> > static void perf_swevent_disable(struct perf_event *event)
> > {
> > - hlist_del_rcu(&event->hlist_entry);
> > + hlist_del_init_rcu(&event->hlist_entry);
> > }
> >
> > static void perf_swevent_void(struct perf_event *event)
> > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > index 4799d70..7bc1f26 100644
> > --- a/kernel/trace/trace_event_perf.c
> > +++ b/kernel/trace/trace_event_perf.c
> > @@ -122,7 +122,7 @@ int perf_trace_enable(struct perf_event *p_event)
> >
> > void perf_trace_disable(struct perf_event *p_event)
> > {
> > - hlist_del_rcu(&p_event->hlist_entry);
> > + hlist_del_init_rcu(&p_event->hlist_entry);
> > }
> >
> > void perf_trace_destroy(struct perf_event *p_event)
>
>
> Ok, so then why did you need the first patch?
Because we need perf_event_stop/start to act on software events too.
Avoiding the perf_event_enable/stop/start/perf_event_disable dance for
software events is more of an optimization that could be considered in
another patchset, though.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 16:29 ` Frederic Weisbecker
@ 2010-06-10 16:38 ` Peter Zijlstra
2010-06-10 17:04 ` Frederic Weisbecker
0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2010-06-10 16:38 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, 2010-06-10 at 18:29 +0200, Frederic Weisbecker wrote:
> Imagine you have several software and hardware events running on the
> same cpu. Each time you reset the period for a software event, you do
> a hw_pmu_disable() / hw_pmu_enable(), which writes/reads the hardware
> registers for each hardware event, among other wasteful things.
hw_perf_disable/enable() are on their way out. They should be replaced
with a struct pmu callback. We must remove all these weak functions if
we want to support multiple pmus.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/5] perf events finer grained context instrumentation / context exclusion
2010-06-10 10:16 ` Ingo Molnar
@ 2010-06-10 17:03 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 17:03 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 12:16:37PM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > Performance counter stats for './hackbench 5' (10 runs):
> >
> > 1313640764 instructions # 0,241 IPC ( +- 1,393% ) (scaled from 100,05%)
> > 214737441 branches ( +- 0,948% )
> >
> > 1293802776 instructions # 0,245 IPC ( +- 0,343% )
> > 209495435 branches ( +- 0,392% )
>
> Indeed it's about 4 times less noise, not bad.
>
> Cycles is fundamentally random.
>
> > So yeah, the results look a bit better. Still not perfect:
> >
> > - we are still instrumenting the tiny parts between the true interrupt
> > and irq_enter() (same for irq_exit() and the end). Same for softirqs.
> >
> > - random randomnesses...
>
> Random randomness shouldn't occur for something like instructions or branches.
>
> Could you try some 'must not be variable' workload, like:
>
> taskset 1 ./hackbench 1
>
> If the workload is pinned to a single CPU then it ought to not be variable at
> all. (modulo things like hash chain lengths and slab caching details, but
> those should not cause 0.4% kind of noise IMO)
Good idea; with that we at least have less variation between profiles.
Now the results:
$ sudo ./perf stat -e instructions -e cycles -e branches -e branch-misses -v -r 10 taskset 1 ./hackbench 1
Performance counter stats for 'taskset 1 ./hackbench 1' (10 runs):
318090069 instructions # 0,371 IPC ( +- 2,238% )
856426449 cycles ( +- 2,207% )
51704292 branches ( +- 2,264% )
2321798 branch-misses # 4,491 % ( +- 2,815% )
0,541982879 seconds time elapsed ( +- 2,185% )
$ sudo ./perf stat -e instructions:t -e cycles:t -e branches:t -e branch-misses:t -v -r 10 taskset 1 ./hackbench 1
Performance counter stats for 'taskset 1 ./hackbench 1' (10 runs):
305852952 instructions # 0,371 IPC ( +- 1,775% )
823521707 cycles ( +- 1,753% )
49712722 branches ( +- 1,801% )
2210546 branch-misses # 4,447 % ( +- 2,219% )
0,538258337 seconds time elapsed ( +- 1,737% )
I did the same tests after deactivating my secondary CPU (to disable SMT),
but there the results were about the same with and without :t.
>
> Btw., we could try to record all branches of an execution (using BTS, of a
> relatively short but static-length run), and see where the variance comes
> from. I doubt the current BTS code is ready for that, but it would be 'the'
> magic trace-from-hell that includes all execution of the task, recorded at the
> hardware level.
Agreed, we could cook a nice diff graph about this.
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 16:38 ` Peter Zijlstra
@ 2010-06-10 17:04 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 17:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 06:38:34PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 18:29 +0200, Frederic Weisbecker wrote:
>
> > Imagine you have several software and hardware events running on the
> > same cpu. Each time you reset this period for a software event, you do
> > a hw_pmu_disable() / hw_pmu_enable(), which writes/reads the hardware
> > registers for each hardware event, amongst other wasteful things.
>
> hw_perf_disable/enable() are on their way out. They should be replaced
> with a struct pmu callback. We must remove all these weak functions if
> we want to support multiple pmus.
Ok, that's good news.
* Re: [PATCH 1/5] perf: Provide a proper stop action for software events
2010-06-10 16:16 ` Peter Zijlstra
2010-06-10 16:29 ` Frederic Weisbecker
@ 2010-06-10 19:54 ` Frederic Weisbecker
1 sibling, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2010-06-10 19:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Paul Mackerras,
Stephane Eranian, Cyrill Gorcunov, Zhang Yanmin, Steven Rostedt
On Thu, Jun 10, 2010 at 06:16:16PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 18:12 +0200, Frederic Weisbecker wrote:
> > On Thu, Jun 10, 2010 at 01:10:42PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-06-10 at 12:46 +0200, Peter Zijlstra wrote:
> > > >
> > > > Something like the below would work, the only 'problem' is that it grows
> > > > hw_perf_event.
> > >
> > > If we do the whole PAUSEd thing right, we'd not need this I think.
> >
> >
> > It's not needed, and moreover software_pmu:stop/start() can be the same
> > as software_pmu:disable/enable() without the need to add another check
> > in the fast path.
> >
> > But we need perf_event_stop/start() to work on software events. And in fact
> > now that we use the hlist_del_init, it's safe, but a bit wasteful in
> > the period reset path. That's another problem that is not critical, but
> > if you want to solve this by ripping out the differences between software and
> > hardware (which I agree with), we need a ->reset_period callback.
> >
> Why? ->start() should reprogram the hardware, so a
> ->stop()/poke-at-state/->start() cycle is much more flexible.
Reconsidering the situation after remembering the race between software
events and period adjustment:
In fact, if we want to support start/stop on software events, we still
need the (!software event) check in perf_adjust_period(), otherwise
start and stop may race with the hlist ops on a software event.
So a stop/start cycle there is both useless and dangerous.
What about keeping this software event check for now?
Once we have a pmu:disable_all()/enable_all(), it can be
replaced with a more appropriate check.
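The check being discussed would sit roughly as follows in the period-reset path; this is a hypothetical sketch for illustration (is_software_event() is the real perf helper, but the surrounding structure and callback names here are simplified, not the actual kernel code):

```c
static void perf_adjust_period(struct perf_event *event, u64 period)
{
	/* ... compute the new sample period ... */

	if (!is_software_event(event)) {
		/* Hardware events: full stop/reprogram/start cycle
		 * so the counter picks up the new period. */
		event->pmu->stop(event);
		/* reprogram hw.period_left here */
		event->pmu->start(event);
	}
	/* Software events skip the cycle: their stop() now does
	 * hlist_del_init_rcu(), and a stop/start pair here could
	 * race with the hlist operations from another path. */
}
```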
end of thread, other threads:[~2010-06-10 19:54 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-10 3:49 [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 1/5] perf: Provide a proper stop action for software events Frederic Weisbecker
2010-06-10 10:46 ` Peter Zijlstra
2010-06-10 11:10 ` Peter Zijlstra
2010-06-10 16:12 ` Frederic Weisbecker
2010-06-10 16:16 ` Peter Zijlstra
2010-06-10 16:29 ` Frederic Weisbecker
2010-06-10 16:38 ` Peter Zijlstra
2010-06-10 17:04 ` Frederic Weisbecker
2010-06-10 19:54 ` Frederic Weisbecker
2010-06-10 12:06 ` Ingo Molnar
2010-06-10 3:49 ` [PATCH 2/5] perf: Support disable() after stop() on " Frederic Weisbecker
2010-06-10 10:50 ` Peter Zijlstra
2010-06-10 16:31 ` Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 3/5] perf: New PERF_EVENT_STATE_PAUSED event state Frederic Weisbecker
2010-06-10 10:55 ` Peter Zijlstra
2010-06-10 16:26 ` Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 4/5] perf: Introduce task, softirq and hardirq contexts exclusion Frederic Weisbecker
2010-06-10 11:01 ` Peter Zijlstra
2010-06-10 16:24 ` Frederic Weisbecker
2010-06-10 3:49 ` [PATCH 5/5] perf: Support for task/softirq/hardirq exclusion on tools Frederic Weisbecker
2010-06-10 6:26 ` [PATCH 0/5] perf events finer grained context instrumentation / context exclusion Ingo Molnar
2010-06-10 7:15 ` Frederic Weisbecker
2010-06-10 7:31 ` Frederic Weisbecker
2010-06-10 10:16 ` Ingo Molnar
2010-06-10 17:03 ` Frederic Weisbecker