* [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
2013-06-18 19:21 [PATCH 0/3] tracing/perf: perf_trace_buf/perf_xxx hacks Oleg Nesterov
@ 2013-06-18 19:22 ` Oleg Nesterov
2013-06-18 20:02 ` Steven Rostedt
0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-18 19:22 UTC (permalink / raw)
To: Steven Rostedt, Peter Zijlstra
Cc: Frederic Weisbecker, Ingo Molnar, Masami Hiramatsu,
Srikar Dronamraju, zhangwei(Jovi), linux-kernel
perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.
However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/trace/ftrace.h | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 8886877..04455b8 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -663,6 +663,12 @@ perf_trace_##call(void *__data, proto) \
int rctx; \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+ \
+ head = this_cpu_ptr(event_call->perf_events); \
+ if (__builtin_constant_p(!__task) && !__task && \
+ hlist_empty(head)) \
+ return; \
+ \
__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
sizeof(u64)); \
__entry_size -= sizeof(u32); \
@@ -677,7 +683,6 @@ perf_trace_##call(void *__data, proto) \
\
{ assign; } \
\
- head = this_cpu_ptr(event_call->perf_events); \
perf_trace_buf_submit(entry, __entry_size, rctx, __addr, \
__count, &__regs, head, __task); \
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
2013-06-18 19:22 ` [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible Oleg Nesterov
@ 2013-06-18 20:02 ` Steven Rostedt
2013-06-19 18:12 ` Oleg Nesterov
0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2013-06-18 20:02 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Frederic Weisbecker, Ingo Molnar,
Masami Hiramatsu, Srikar Dronamraju, zhangwei(Jovi), linux-kernel
On Tue, 2013-06-18 at 21:22 +0200, Oleg Nesterov wrote:
> perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
> make no sense if hlist_empty(head). Change perf_trace_##call()
> to check ->perf_events beforehand and do nothing if it is empty.
>
> However, we can only do this if __task == NULL, so we also add
> the __builtin_constant_p(__task) check.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> include/trace/ftrace.h | 7 ++++++-
> 1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 8886877..04455b8 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -663,6 +663,12 @@ perf_trace_##call(void *__data, proto) \
> int rctx; \
> \
> __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> + \
> + head = this_cpu_ptr(event_call->perf_events); \
> + if (__builtin_constant_p(!__task) && !__task && \
I'm trying to wrap my head around this:
__builtin_constant_p(!task)
is this the same as:
!__builtin_constant_p(task)
Or is it the same as:
__builtin_constant_p(task)
?
Because that '!' is confusing the heck out of me.
If !task is a constant, wouldn't task be a constant too, and if task is
not a constant then I would also assume !task is not a constant as well.
If this is the case, can we nuke the '!' from the __builtin_constant_p()?
The code is confusing enough as is. Or is it that the code is very
confusing and in keeping with the coding style, you are trying to come
up with new ways of adding to the confusion.
Or is this your way to confuse me as much as my code has confused
you? ;-)
-- Steve
> + hlist_empty(head)) \
> + return; \
> + \
> __entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
> sizeof(u64)); \
> __entry_size -= sizeof(u32); \
> @@ -677,7 +683,6 @@ perf_trace_##call(void *__data, proto) \
> \
> { assign; } \
> \
> - head = this_cpu_ptr(event_call->perf_events); \
> perf_trace_buf_submit(entry, __entry_size, rctx, __addr, \
> __count, &__regs, head, __task); \
> }
* Re: [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
2013-06-18 20:02 ` Steven Rostedt
@ 2013-06-19 18:12 ` Oleg Nesterov
2013-06-19 18:24 ` Steven Rostedt
0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-06-19 18:12 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, Frederic Weisbecker, Ingo Molnar,
Masami Hiramatsu, Srikar Dronamraju, zhangwei(Jovi), linux-kernel
On 06/18, Steven Rostedt wrote:
>
> On Tue, 2013-06-18 at 21:22 +0200, Oleg Nesterov wrote:
> > @@ -663,6 +663,12 @@ perf_trace_##call(void *__data, proto) \
> > int rctx; \
> > \
> > __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> > + \
> > + head = this_cpu_ptr(event_call->perf_events); \
> > + if (__builtin_constant_p(!__task) && !__task && \
>
>
> I'm trying to wrap my head around this:
>
> __builtin_constant_p(!task)
>
> is this the same as:
>
> !__builtin_constant_p(task)
>
> Or is it the same as:
>
> __builtin_constant_p(task)
>
> ?
>
> Because that '!' is confusing the heck out of me.
>
> If !task is a constant, wouldn't task be a constant too, and if task is
> not a constant then I would also assume !task is not a constant as well.
!__task looks more explicit/symmetrical to me. We need
if (is_compile_time_true(!__task) && hlist_empty(head))
return;
is_compile_time_true(cond) could be defined as
__builtin_constant_p(cond) && (cond)
or
__builtin_constant_p(!cond) && (cond)
but the first one looks cleaner.
However,
> If this is the case, can we nuke the '!' from the __builtin_constant_p()?
OK, I do not really mind, will do.
And,
> Or is this your way to confuse me as much as my code has confused
> you? ;-)
Of course! This was the main reason.
Steven, I convinced myself the patch should be correct. If you agree with
this hack:
- anything else I should do apart from the change above?
- should I resend the previous "[PATCH 0/3] tracing: more
list_empty(perf_events) checks" series?
This series depends on "[PATCH 3/3] tracing/perf: Move the
PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()".
Or I can drop this patch if you do not like it and rediff.
Just in case, there are other pending patches in trace_kprobe.c
which I am going to resend, but they are orthogonal.
Oleg.
* Re: [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
2013-06-19 18:12 ` Oleg Nesterov
@ 2013-06-19 18:24 ` Steven Rostedt
0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-06-19 18:24 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Frederic Weisbecker, Ingo Molnar,
Masami Hiramatsu, Srikar Dronamraju, zhangwei(Jovi), linux-kernel
On Wed, 2013-06-19 at 20:12 +0200, Oleg Nesterov wrote:
> > Or is this your way to confuse me as much as my code has confused
> > you? ;-)
>
> Of course! this was the main reason.
I knew it!
>
>
> Steven, I convinced myself the patch should be correct. If you agree with
> this hack:
>
> - anything else I should do apart from the change above?
>
> - should I resend the previous "[PATCH 0/3] tracing: more
> list_empty(perf_events) checks" series?
>
> This series depends on "[PATCH 3/3] tracing/perf: Move the
> PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()".
>
> Or I can drop this patch if you do not like it and rediff.
>
> Just in case, there are other pending patches in trace_kprobe.c
> which I am going to resend, but they are orthogonal.
I'll pull in the patches and play with them. I'll let you know what I
find.
Thanks,
-- Steve
* [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
@ 2013-07-18 18:30 Oleg Nesterov
2013-07-18 18:30 ` [PATCH 1/3] tracing/perf: expand TRACE_EVENT(sched_stat_runtime) Oleg Nesterov
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-07-18 18:30 UTC (permalink / raw)
To: Frederic Weisbecker, Peter Zijlstra, Steven Rostedt
Cc: David Ahern, Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi),
linux-kernel
Hello.
The patches are the same, I only tried to update the changelogs a bit.
I am also quoting my old email below, to explain what this hack tries
to do.
Say, "perf record -e sched:sched_switch -p1".
Every task except /sbin/init will do perf_trace_sched_switch() and
perf_trace_buf_prepare() + perf_trace_buf_submit() for no reason;
it doesn't have a counter.
So it makes sense to add the fast-path check at the start of
perf_trace_##call(),
if (hlist_empty(event_call->perf_events))
return;
The problem is, we should not do this if __task != NULL (iow, if
DECLARE_EVENT_CLASS() uses __perf_task()), perf_tp_event() has the
additional code for this case.
So we should do
if (!__task && hlist_empty(event_call->perf_events))
return;
But __task is changed by "{ assign; }" block right before
perf_trace_buf_submit(). Too late for the fast-path check,
we already called perf_trace_buf_prepare/fetch_regs.
So. After 2/3 __perf_task() (and __perf_count/addr) is called
when ftrace_get_offsets_##call(args) evaluates the arguments,
and we can check !__task && hlist_empty() right after that.
Oleg.
* [PATCH 1/3] tracing/perf: expand TRACE_EVENT(sched_stat_runtime)
2013-07-18 18:30 [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
@ 2013-07-18 18:30 ` Oleg Nesterov
2013-07-18 18:30 ` [PATCH 2/3] tracing/perf: reimplement TP_perf_assign() logic Oleg Nesterov
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-07-18 18:30 UTC (permalink / raw)
To: Frederic Weisbecker, Peter Zijlstra, Steven Rostedt
Cc: David Ahern, Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi),
linux-kernel
To simplify the review of the next patches:
1. We are going to reimplement __perf_task/counter and embed them
into TP_ARGS(). Expand TRACE_EVENT(sched_stat_runtime) into
DECLARE_EVENT_CLASS() + DEFINE_EVENT(); this way they can use
different TP_ARGS's.
2. Change perf_trace_##call() macro to do perf_fetch_caller_regs()
right before perf_trace_buf_prepare().
This way it evaluates TP_ARGS() asap, the next patch explores
this fact.
Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
"struct pt_regs *regs", perhaps it makes sense to remove this
argument. And perhaps we can teach perf_trace_buf_submit()
to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
in this case.
3. Cosmetic, but the typecast from "void*" buys nothing. It just
adds noise; remove it.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/trace/events/sched.h | 6 +++++-
include/trace/ftrace.h | 7 +++----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index e5586ca..249c024 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -372,7 +372,7 @@ DEFINE_EVENT(sched_stat_template, sched_stat_blocked,
* Tracepoint for accounting runtime (time the task is executing
* on a CPU).
*/
-TRACE_EVENT(sched_stat_runtime,
+DECLARE_EVENT_CLASS(sched_stat_runtime,
TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
@@ -401,6 +401,10 @@ TRACE_EVENT(sched_stat_runtime,
(unsigned long long)__entry->vruntime)
);
+DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
+ TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
+ TP_ARGS(tsk, runtime, vruntime));
+
/*
* Tracepoint for showing priority inheritance modifying a tasks
* priority.
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index c162a57..aed594a 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -659,15 +659,14 @@ perf_trace_##call(void *__data, proto) \
int __data_size; \
int rctx; \
\
- perf_fetch_caller_regs(&__regs); \
- \
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
sizeof(u64)); \
__entry_size -= sizeof(u32); \
\
- entry = (struct ftrace_raw_##call *)perf_trace_buf_prepare( \
- __entry_size, event_call->event.type, &__regs, &rctx); \
+ perf_fetch_caller_regs(&__regs); \
+ entry = perf_trace_buf_prepare(__entry_size, \
+ event_call->event.type, &__regs, &rctx); \
if (!entry) \
return; \
\
--
1.5.5.1
* [PATCH 2/3] tracing/perf: reimplement TP_perf_assign() logic
2013-07-18 18:30 [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
2013-07-18 18:30 ` [PATCH 1/3] tracing/perf: expand TRACE_EVENT(sched_stat_runtime) Oleg Nesterov
@ 2013-07-18 18:30 ` Oleg Nesterov
2013-07-18 18:30 ` [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible Oleg Nesterov
2013-07-19 7:43 ` [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Ingo Molnar
3 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-07-18 18:30 UTC (permalink / raw)
To: Frederic Weisbecker, Peter Zijlstra, Steven Rostedt
Cc: David Ahern, Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi),
linux-kernel
The next patch tries to avoid the costly perf_trace_buf_* calls
when possible but there is a problem. We can only do this if
__task == NULL, perf_tp_event(task != NULL) has the additional
code for this case.
Unfortunately, TP_perf_assign/__perf_xxx, which change the default
values of the __count/__task variables for perf_trace_buf_submit(), are
called "too late", after we have already done perf_trace_buf_prepare(),
and the optimization above can't work.
So this patch simply embeds __perf_xxx() into TP_ARGS(), this way
DECLARE_EVENT_CLASS() can use the result of assignments hidden in
"args" right after ftrace_get_offsets_##call() which is mostly
trivial. This allows us to have the fast-path "!__task &&
hlist_empty()" check at the start; see the next patch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/trace/events/sched.h | 16 +++-------------
include/trace/ftrace.h | 19 +++++++++++--------
2 files changed, 14 insertions(+), 21 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 249c024..2e7d994 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -57,7 +57,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
TP_PROTO(struct task_struct *p, int success),
- TP_ARGS(p, success),
+ TP_ARGS(__perf_task(p), success),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
@@ -73,9 +73,6 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
__entry->prio = p->prio;
__entry->success = success;
__entry->target_cpu = task_cpu(p);
- )
- TP_perf_assign(
- __perf_task(p);
),
TP_printk("comm=%s pid=%d prio=%d success=%d target_cpu=%03d",
@@ -313,7 +310,7 @@ DECLARE_EVENT_CLASS(sched_stat_template,
TP_PROTO(struct task_struct *tsk, u64 delay),
- TP_ARGS(tsk, delay),
+ TP_ARGS(__perf_task(tsk), __perf_count(delay)),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
@@ -325,10 +322,6 @@ DECLARE_EVENT_CLASS(sched_stat_template,
memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
__entry->pid = tsk->pid;
__entry->delay = delay;
- )
- TP_perf_assign(
- __perf_count(delay);
- __perf_task(tsk);
),
TP_printk("comm=%s pid=%d delay=%Lu [ns]",
@@ -376,7 +369,7 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
- TP_ARGS(tsk, runtime, vruntime),
+ TP_ARGS(tsk, __perf_count(runtime), vruntime),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
@@ -390,9 +383,6 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
__entry->pid = tsk->pid;
__entry->runtime = runtime;
__entry->vruntime = vruntime;
- )
- TP_perf_assign(
- __perf_count(runtime);
),
TP_printk("comm=%s pid=%d runtime=%Lu [ns] vruntime=%Lu [ns]",
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index aed594a..8886877 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -503,8 +503,14 @@ static inline notrace int ftrace_get_offsets_##call( \
#undef TP_fast_assign
#define TP_fast_assign(args...) args
-#undef TP_perf_assign
-#define TP_perf_assign(args...)
+#undef __perf_addr
+#define __perf_addr(a) (a)
+
+#undef __perf_count
+#define __perf_count(c) (c)
+
+#undef __perf_task
+#define __perf_task(t) (t)
#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
@@ -632,16 +638,13 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call
#define __get_str(field) (char *)__get_dynamic_array(field)
#undef __perf_addr
-#define __perf_addr(a) __addr = (a)
+#define __perf_addr(a) (__addr = (a))
#undef __perf_count
-#define __perf_count(c) __count = (c)
+#define __perf_count(c) (__count = (c))
#undef __perf_task
-#define __perf_task(t) __task = (t)
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...) args
+#define __perf_task(t) (__task = (t))
#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
--
1.5.5.1
* [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
2013-07-18 18:30 [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
2013-07-18 18:30 ` [PATCH 1/3] tracing/perf: expand TRACE_EVENT(sched_stat_runtime) Oleg Nesterov
2013-07-18 18:30 ` [PATCH 2/3] tracing/perf: reimplement TP_perf_assign() logic Oleg Nesterov
@ 2013-07-18 18:30 ` Oleg Nesterov
2013-07-19 7:43 ` [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Ingo Molnar
3 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-07-18 18:30 UTC (permalink / raw)
To: Frederic Weisbecker, Peter Zijlstra, Steven Rostedt
Cc: David Ahern, Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi),
linux-kernel
perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.
This removes the overhead for tasks without events associated
with them. For example, "perf record -e sched:sched_switch -p1"
attaches the counter(s) to a single task, but every task in the
system will do perf_trace_buf_prepare/submit() just to realize
that it was not attached to this event.
However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.
With this patch "perf bench sched pipe" shows approximately 4%
improvement when "perf record -p1" runs in parallel, many thanks
to Steven for the testing.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/trace/ftrace.h | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 8886877..04455b8 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -663,6 +663,12 @@ perf_trace_##call(void *__data, proto) \
int rctx; \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+ \
+ head = this_cpu_ptr(event_call->perf_events); \
+ if (__builtin_constant_p(!__task) && !__task && \
+ hlist_empty(head)) \
+ return; \
+ \
__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
sizeof(u64)); \
__entry_size -= sizeof(u32); \
@@ -677,7 +683,6 @@ perf_trace_##call(void *__data, proto) \
\
{ assign; } \
\
- head = this_cpu_ptr(event_call->perf_events); \
perf_trace_buf_submit(entry, __entry_size, rctx, __addr, \
__count, &__regs, head, __task); \
}
--
1.5.5.1
* Re: [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
2013-07-18 18:30 [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
` (2 preceding siblings ...)
2013-07-18 18:30 ` [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible Oleg Nesterov
@ 2013-07-19 7:43 ` Ingo Molnar
2013-07-19 20:27 ` Steven Rostedt
3 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2013-07-19 7:43 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Frederic Weisbecker, Peter Zijlstra, Steven Rostedt, David Ahern,
Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi), linux-kernel
* Oleg Nesterov <oleg@redhat.com> wrote:
> Hello.
>
> The patches are the same, I only tried to update the changelogs a bit.
> I am also quoting my old email below, to explain what this hack tries
> to do.
>
> Say, "perf record -e sched:sched_switch -p1".
>
> Every task except /sbin/init will do perf_trace_sched_switch() and
> perf_trace_buf_prepare() + perf_trace_buf_submit() for no reason;
> it doesn't have a counter.
>
> So it makes sense to add the fast-path check at the start of
> perf_trace_##call(),
>
> if (hlist_empty(event_call->perf_events))
> return;
>
> The problem is, we should not do this if __task != NULL (iow, if
> DECLARE_EVENT_CLASS() uses __perf_task()), perf_tp_event() has the
> additional code for this case.
>
> So we should do
>
> if (!__task && hlist_empty(event_call->perf_events))
> return;
>
> But __task is changed by "{ assign; }" block right before
> perf_trace_buf_submit(). Too late for the fast-path check,
> we already called perf_trace_buf_prepare/fetch_regs.
>
> So. After 2/3 __perf_task() (and __perf_count/addr) is called
> when ftrace_get_offsets_##call(args) evaluates the arguments,
> and we can check !__task && hlist_empty() right after that.
>
> Oleg.
Nice improvement.
Peter, Steve, any objections?
Thanks,
Ingo
* Re: [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
2013-07-19 7:43 ` [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Ingo Molnar
@ 2013-07-19 20:27 ` Steven Rostedt
2013-07-19 21:00 ` David Ahern
2013-07-20 15:40 ` Oleg Nesterov
0 siblings, 2 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-07-19 20:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Oleg Nesterov, Frederic Weisbecker, Peter Zijlstra, David Ahern,
Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi), linux-kernel
On Fri, 2013-07-19 at 09:43 +0200, Ingo Molnar wrote:
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Hello.
> >
> > The patches are the same, I only tried to update the changelogs a bit.
> > I am also quoting my old email below, to explain what this hack tries
> > to do.
> >
> > Say, "perf record -e sched:sched_switch -p1".
> >
> > Every task except /sbin/init will do perf_trace_sched_switch() and
> > perf_trace_buf_prepare() + perf_trace_buf_submit() for no reason;
> > it doesn't have a counter.
> >
> > So it makes sense to add the fast-path check at the start of
> > perf_trace_##call(),
> >
> > if (hlist_empty(event_call->perf_events))
> > return;
> >
> > The problem is, we should not do this if __task != NULL (iow, if
> > DECLARE_EVENT_CLASS() uses __perf_task()), perf_tp_event() has the
> > additional code for this case.
> >
> > So we should do
> >
> > if (!__task && hlist_empty(event_call->perf_events))
> > return;
> >
> > But __task is changed by "{ assign; }" block right before
> > perf_trace_buf_submit(). Too late for the fast-path check,
> > we already called perf_trace_buf_prepare/fetch_regs.
> >
> > So. After 2/3 __perf_task() (and __perf_count/addr) is called
> > when ftrace_get_offsets_##call(args) evaluates the arguments,
> > and we can check !__task && hlist_empty() right after that.
> >
> > Oleg.
>
> Nice improvement.
>
> Peter, Steve, any objections?
>
Yep, agreed.
The whole series...
Reviewed-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
* Re: [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
2013-07-19 20:27 ` Steven Rostedt
@ 2013-07-19 21:00 ` David Ahern
2013-07-20 15:40 ` Oleg Nesterov
1 sibling, 0 replies; 14+ messages in thread
From: David Ahern @ 2013-07-19 21:00 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Oleg Nesterov, Frederic Weisbecker, Peter Zijlstra,
Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi), linux-kernel
On 7/19/13 2:27 PM, Steven Rostedt wrote:
> On Fri, 2013-07-19 at 09:43 +0200, Ingo Molnar wrote:
>> * Oleg Nesterov <oleg@redhat.com> wrote:
>>
>>> Hello.
>>>
>>> The patches are the same, I only tried to update the changelogs a bit.
>>> I am also quoting my old email below, to explain what this hack tries
>>> to do.
>>>
>>> Say, "perf record -e sched:sched_switch -p1".
>>>
>>> Every task except /sbin/init will do perf_trace_sched_switch() and
>>> perf_trace_buf_prepare() + perf_trace_buf_submit() for no reason;
>>> it doesn't have a counter.
>>>
>>> So it makes sense to add the fast-path check at the start of
>>> perf_trace_##call(),
>>>
>>> if (hlist_empty(event_call->perf_events))
>>> return;
>>>
>>> The problem is, we should not do this if __task != NULL (iow, if
>>> DECLARE_EVENT_CLASS() uses __perf_task()), perf_tp_event() has the
>>> additional code for this case.
>>>
>>> So we should do
>>>
>>> if (!__task && hlist_empty(event_call->perf_events))
>>> return;
>>>
>>> But __task is changed by "{ assign; }" block right before
>>> perf_trace_buf_submit(). Too late for the fast-path check,
>>> we already called perf_trace_buf_prepare/fetch_regs.
>>>
>>> So. After 2/3 __perf_task() (and __perf_count/addr) is called
>>> when ftrace_get_offsets_##call(args) evaluates the arguments,
>>> and we can check !__task && hlist_empty() right after that.
>>>
>>> Oleg.
>>
>> Nice improvement.
>>
>> Peter, Steve, any objections?
>>
>
> Yep, agreed.
>
>
> The whole series...
>
> Reviewed-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
For what it is worth,
Tested-by: David Ahern <dsahern@gmail.com>
* Re: [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
2013-07-19 20:27 ` Steven Rostedt
2013-07-19 21:00 ` David Ahern
@ 2013-07-20 15:40 ` Oleg Nesterov
2013-07-26 15:20 ` Oleg Nesterov
1 sibling, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2013-07-20 15:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Frederic Weisbecker, Peter Zijlstra, David Ahern,
Ingo Molnar, Masami Hiramatsu, zhangwei(Jovi), linux-kernel
On 07/19, Steven Rostedt wrote:
>
> On Fri, 2013-07-19 at 09:43 +0200, Ingo Molnar wrote:
> >
> > Peter, Steve, any objections?
> >
>
> Yep, agreed.
>
>
> The whole series...
>
> Reviewed-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
Thanks!
But, to avoid confusion, please do not forget that this series
textually depends on cd92bf61 "Move the PERF_MAX_TRACE_SIZE check
into perf_trace_buf_prepare()" in your tree. So this should be
routed via rostedt/linux-trace as well.
Oleg.
* Re: [PATCH RESEND 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
2013-07-20 15:40 ` Oleg Nesterov
@ 2013-07-26 15:20 ` Oleg Nesterov
0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-07-26 15:20 UTC (permalink / raw)
To: Ingo Molnar, Steven Rostedt
Cc: Frederic Weisbecker, Peter Zijlstra, David Ahern, Ingo Molnar,
Masami Hiramatsu, zhangwei(Jovi), linux-kernel
On 07/20, Oleg Nesterov wrote:
>
> On 07/19, Steven Rostedt wrote:
> >
> > On Fri, 2013-07-19 at 09:43 +0200, Ingo Molnar wrote:
> > >
> > > Peter, Steve, any objections?
> > >
> >
> > Yep, agreed.
> >
> >
> > The whole series...
> >
> > Reviewed-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
>
> Thanks!
>
> But, to avoid confusion, please do not forget that this series
> textually depends on cd92bf61 "Move the PERF_MAX_TRACE_SIZE check
> into perf_trace_buf_prepare()" in your tree. So this should be
> routed via rostedt/linux-trace as well.
Update. cd92bf61 is already in Linus's tree.
So, Ingo, if you were going to take these patches - please ;)
It seems that everybody agrees with this hack. Please tell me
if I should resend this series once again or make a small branch
for git-pull.
Oleg.
* [PATCH 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
2013-08-05 16:50 [PATCH " Oleg Nesterov
@ 2013-08-05 16:51 ` Oleg Nesterov
0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2013-08-05 16:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, David Ahern,
Masami Hiramatsu, zhangwei(Jovi), linux-kernel
perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.
This removes the overhead for tasks without events associated
with them. For example, "perf record -e sched:sched_switch -p1"
attaches the counter(s) to a single task, but every task in the
system will do perf_trace_buf_prepare/submit() just to realize
that it was not attached to this event.
However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.
With this patch "perf bench sched pipe" shows approximately 4%
improvement when "perf record -p1" runs in parallel, many thanks
to Steven for the testing.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Reviewed-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/ftrace.h | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 4163d93..5c7ab17 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -667,6 +667,12 @@ perf_trace_##call(void *__data, proto) \
int rctx; \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+ \
+ head = this_cpu_ptr(event_call->perf_events); \
+ if (__builtin_constant_p(!__task) && !__task && \
+ hlist_empty(head)) \
+ return; \
+ \
__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
sizeof(u64)); \
__entry_size -= sizeof(u32); \
@@ -681,7 +687,6 @@ perf_trace_##call(void *__data, proto) \
\
{ assign; } \
\
- head = this_cpu_ptr(event_call->perf_events); \
perf_trace_buf_submit(entry, __entry_size, rctx, __addr, \
__count, &__regs, head, __task); \
}
--
1.5.5.1