* [RFC] tracepoint/jump_label overhead
@ 2011-11-17 3:55 Eric Dumazet
2011-11-17 15:25 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2011-11-17 3:55 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: linux-kernel, Ingo Molnar, Christoph Lameter
The general admitted claim of a tracepoint being on x86 a single
instruction :
jmp +0
Is not always true.
For example in mm/slub.c, kmem_cache_alloc()
void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_);
trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags);
return ret;
We can check compiler output and see that 4 extra instructions were
added because s->objsize & s->size are evaluated.
I noticed this in a perf session, because these 4 extra instructions
added some noticeable latency/cost.
c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx
c10e26a7: 85 db test %ebx,%ebx
c10e26a9: 75 6d jne c10e2718 (doing the memset())
c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1
c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2
c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3
c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4
c10e26b7: e9 00 00 00 00 jmp c10e26bc
c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax
c10e26bf: 83 c4 28 add $0x28,%esp
c10e26c2: 5b pop %ebx
c10e26c3: 5e pop %esi
c10e26c4: 5f pop %edi
c10e26c5: c9 leave
A fix would be to not declare an inline function but a macro...
#define trace_kmem_cache_alloc(...) \
if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \
__DO_TRACE(&__tracepoint_kmem_cache_alloc, \
...
Anyone has some clever idea how to make this possible ?
Thanks
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] tracepoint/jump_label overhead
2011-11-17 3:55 [RFC] tracepoint/jump_label overhead Eric Dumazet
@ 2011-11-17 15:25 ` Peter Zijlstra
2011-11-17 15:38 ` Avi Kivity
2011-11-17 15:56 ` Mathieu Desnoyers
0 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2011-11-17 15:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Christoph Lameter,
rostedt
On Thu, 2011-11-17 at 04:55 +0100, Eric Dumazet wrote:
> The general admitted claim of a tracepoint being on x86 a single
> instruction :
>
> jmp +0
>
> Is not always true.
>
> For example in mm/slub.c, kmem_cache_alloc()
>
> void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_);
> trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags);
> return ret;
>
> We can check compiler output and see that 4 extra instructions were
> added because s->objsize & s->size are evaluated.
>
> I noticed this in a perf session, because these 4 extra instructions
> added some noticeable latency/cost.
>
> c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx
> c10e26a7: 85 db test %ebx,%ebx
> c10e26a9: 75 6d jne c10e2718 (doing the memset())
> c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1
> c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2
> c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3
> c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4
> c10e26b7: e9 00 00 00 00 jmp c10e26bc
> c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax
> c10e26bf: 83 c4 28 add $0x28,%esp
> c10e26c2: 5b pop %ebx
> c10e26c3: 5e pop %esi
> c10e26c4: 5f pop %edi
> c10e26c5: c9 leave
>
>
> A fix would be to not declare an inline function but a macro...
>
> #define trace_kmem_cache_alloc(...) \
> if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \
> __DO_TRACE(&__tracepoint_kmem_cache_alloc, \
> ...
>
> Anyone has some clever idea how to make this possible ?
Right so you're not really supposed to use arguments that require
evaluation in tracepoints, although I bet its common these days :/
The problem here is that its 'hard' to pass s in and have the
TP_fast_assign() thing do the dereference because of the sl[auo]b thing.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] tracepoint/jump_label overhead
2011-11-17 15:25 ` Peter Zijlstra
@ 2011-11-17 15:38 ` Avi Kivity
2011-11-17 15:56 ` Mathieu Desnoyers
1 sibling, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2011-11-17 15:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Eric Dumazet, Mathieu Desnoyers, linux-kernel, Ingo Molnar,
Christoph Lameter, rostedt
On 11/17/2011 05:25 PM, Peter Zijlstra wrote:
> On Thu, 2011-11-17 at 04:55 +0100, Eric Dumazet wrote:
> > The general admitted claim of a tracepoint being on x86 a single
> > instruction :
> >
> > jmp +0
> >
> > Is not always true.
> >
> > For example in mm/slub.c, kmem_cache_alloc()
> >
> > void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_);
> > trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags);
> > return ret;
> >
> > We can check compiler output and see that 4 extra instructions were
> > added because s->objsize & s->size are evaluated.
> >
> > I noticed this in a perf session, because these 4 extra instructions
> > added some noticeable latency/cost.
> >
> > c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx
> > c10e26a7: 85 db test %ebx,%ebx
> > c10e26a9: 75 6d jne c10e2718 (doing the memset())
> > c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1
> > c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2
> > c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3
> > c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4
> > c10e26b7: e9 00 00 00 00 jmp c10e26bc
> > c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax
> > c10e26bf: 83 c4 28 add $0x28,%esp
> > c10e26c2: 5b pop %ebx
> > c10e26c3: 5e pop %esi
> > c10e26c4: 5f pop %edi
> > c10e26c5: c9 leave
> >
> >
> > A fix would be to not declare an inline function but a macro...
> >
> > #define trace_kmem_cache_alloc(...) \
> > if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \
> > __DO_TRACE(&__tracepoint_kmem_cache_alloc, \
> > ...
> >
> > Anyone has some clever idea how to make this possible ?
You could do that with a code generator, which I'm sure everyone will like.
> Right so you're not really supposed to use arguments that require
> evaluation in tracepoints, although I bet its common these days :/
>
> The problem here is that its 'hard' to pass s in and have the
> TP_fast_assign() thing do the dereference because of the sl[auo]b thing.
>
You could have sl[auo]b define a function or macro which tp_fast_assign
then uses to dereference its parameter.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] tracepoint/jump_label overhead
2011-11-17 15:25 ` Peter Zijlstra
2011-11-17 15:38 ` Avi Kivity
@ 2011-11-17 15:56 ` Mathieu Desnoyers
2011-11-17 20:50 ` [RFC PATCH] Tracepoint: introduce tracepoint() API Mathieu Desnoyers
1 sibling, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2011-11-17 15:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Eric Dumazet, linux-kernel, Ingo Molnar, Christoph Lameter,
rostedt
* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-11-17 at 04:55 +0100, Eric Dumazet wrote:
> > The general admitted claim of a tracepoint being on x86 a single
> > instruction :
> >
> > jmp +0
> >
> > Is not always true.
> >
> > For example in mm/slub.c, kmem_cache_alloc()
> >
> > void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_);
> > trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags);
> > return ret;
> >
> > We can check compiler output and see that 4 extra instructions were
> > added because s->objsize & s->size are evaluated.
> >
> > I noticed this in a perf session, because these 4 extra instructions
> > added some noticeable latency/cost.
> >
> > c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx
> > c10e26a7: 85 db test %ebx,%ebx
> > c10e26a9: 75 6d jne c10e2718 (doing the memset())
> > c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1
> > c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2
> > c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3
> > c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4
> > c10e26b7: e9 00 00 00 00 jmp c10e26bc
> > c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax
> > c10e26bf: 83 c4 28 add $0x28,%esp
> > c10e26c2: 5b pop %ebx
> > c10e26c3: 5e pop %esi
> > c10e26c4: 5f pop %edi
> > c10e26c5: c9 leave
> >
> >
> > A fix would be to not declare an inline function but a macro...
> >
> > #define trace_kmem_cache_alloc(...) \
> > if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \
> > __DO_TRACE(&__tracepoint_kmem_cache_alloc, \
> > ...
> >
> > Anyone has some clever idea how to make this possible ?
>
> Right so you're not really supposed to use arguments that require
> evaluation in tracepoints, although I bet its common these days :/
>
> The problem here is that its 'hard' to pass s in and have the
> TP_fast_assign() thing do the dereference because of the sl[auo]b thing.
I created a variant of tracepoints for userspace in my lttng-ust project
where I changed the invokation of a tracepoint to:
tracepoint(provider, name, arg1, arg2, arg3, ..., arg10);
This allows me to much with a the tracepoint call-site since it becomes
a macro. This macro uses the __tracepoint_provider_name() static inline
underneath.
I just took a go at implementing the behavior proposed by Eric Dumazet
in the lttng-ust tree, it seems to work out nicely! Now doing this at
kernel-level would imply changing the trace_system_event(..) call-sites
into trace(system, event, ...), trace(system_event, ...) or
tracepoint(system, event, ...). Is that something you are willing to
consider doing ?
Diff for lttng-ust:
diff --git a/include/lttng/tracepoint.h b/include/lttng/tracepoint.h
index 2ca4598..8e3b604 100644
--- a/include/lttng/tracepoint.h
+++ b/include/lttng/tracepoint.h
@@ -35,8 +35,11 @@ extern "C" {
* Tracepoints should be added to the instrumented code using the
* "tracepoint()" macro.
*/
-#define tracepoint(provider, name, args...) \
- __trace_##provider##___##name(args)
+#define tracepoint(provider, name, args...) \
+ do { \
+ if (caa_unlikely(__tracepoint_##provider##___##name.state)) \
+ __trace_##provider##___##name(args); \
+ } while (0)
/*
* it_func[0] is never NULL because there is at least one element in the array
@@ -148,13 +151,6 @@ extern "C" {
#define _TP_ARGS_PROTO_DATA(...) _TP_PROTO_DATA_N(_TP_NARGS(0, ##__VA_ARGS__), ##__VA_ARGS__)
#define _TP_ARGS_VARS_DATA(...) _TP_VARS_DATA_N(_TP_NARGS(0, ##__VA_ARGS__), ##__VA_ARGS__)
-#define __CHECK_TRACE(provider, name, proto, args) \
- do { \
- if (caa_unlikely(__tracepoint_##provider##___##name.state)) \
- __DO_TRACE(&__tracepoint_##provider##___##name, \
- TP_PARAMS(proto), TP_PARAMS(args)); \
- } while (0)
-
/*
* Make sure the alignment of the structure in the __tracepoints section will
* not add unwanted padding between the beginning of the section and the
@@ -164,8 +160,8 @@ extern "C" {
extern struct tracepoint __tracepoint_##provider##___##name; \
static inline void __trace_##provider##___##name(proto) \
{ \
- __CHECK_TRACE(provider, name, TP_PARAMS(data_proto), \
- TP_PARAMS(data_args)); \
+ __DO_TRACE(&__tracepoint_##provider##___##name, \
+ TP_PARAMS(data_proto), TP_PARAMS(data_args)); \
} \
static inline int \
__register_trace_##provider##___##name(void (*probe)(data_proto), void *data) \
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RFC PATCH] Tracepoint: introduce tracepoint() API
2011-11-17 15:56 ` Mathieu Desnoyers
@ 2011-11-17 20:50 ` Mathieu Desnoyers
2011-11-17 21:06 ` Steven Rostedt
0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Desnoyers @ 2011-11-17 20:50 UTC (permalink / raw)
To: Eric Dumazet
Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Christoph Lameter,
rostedt
Introduce:
tracepoint(event_name, arg1, arg2, ...)
while keeping the old tracepoint API in place, e.g.:
trace_event_name(arg1, arg2, ...)
This allows skipping parameter side-effects (pointer dereference,
function calls, ...) when the tracepoint is not dynamically activated.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index d530a44..c9c73f7 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
#ifdef CONFIG_TRACEPOINTS
+#define tracepoint(name, args...) \
+ do { \
+ if (static_branch(&__tracepoint_##name.key)) \
+ __trace_##name(args); \
+ } while (0)
+
/*
* it_func[0] is never NULL because there is at least one element in the array
* when the array itself is non NULL.
@@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
*/
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
extern struct tracepoint __tracepoint_##name; \
+ static inline void __trace_##name(proto) \
+ { \
+ __DO_TRACE(&__tracepoint_##name, \
+ TP_PROTO(data_proto), \
+ TP_ARGS(data_args), \
+ TP_CONDITION(cond)); \
+ } \
static inline void trace_##name(proto) \
{ \
if (static_branch(&__tracepoint_##name.key)) \
- __DO_TRACE(&__tracepoint_##name, \
- TP_PROTO(data_proto), \
- TP_ARGS(data_args), \
- TP_CONDITION(cond)); \
+ __trace_##name(args); \
} \
static inline int \
register_trace_##name(void (*probe)(data_proto), void *data) \
@@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
EXPORT_SYMBOL(__tracepoint_##name)
#else /* !CONFIG_TRACEPOINTS */
+
+#define tracepoint(name, args...) __trace_##name(args)
+
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
+ static inline void __trace_##name(proto) \
+ { } \
static inline void trace_##name(proto) \
{ } \
static inline int \
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API
2011-11-17 20:50 ` [RFC PATCH] Tracepoint: introduce tracepoint() API Mathieu Desnoyers
@ 2011-11-17 21:06 ` Steven Rostedt
2011-11-17 21:58 ` Jason Baron
2011-11-17 23:25 ` Mathieu Desnoyers
0 siblings, 2 replies; 9+ messages in thread
From: Steven Rostedt @ 2011-11-17 21:06 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Eric Dumazet, Peter Zijlstra, linux-kernel, Ingo Molnar,
Christoph Lameter
On Thu, 2011-11-17 at 15:50 -0500, Mathieu Desnoyers wrote:
> Introduce:
>
> tracepoint(event_name, arg1, arg2, ...)
>
> while keeping the old tracepoint API in place, e.g.:
>
> trace_event_name(arg1, arg2, ...)
>
> This allows skipping parameter side-effects (pointer dereference,
> function calls, ...) when the tracepoint is not dynamically activated.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index d530a44..c9c73f7 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
>
> #ifdef CONFIG_TRACEPOINTS
>
> +#define tracepoint(name, args...) \
> + do { \
> + if (static_branch(&__tracepoint_##name.key)) \
> + __trace_##name(args); \
> + } while (0)
> +
> /*
> * it_func[0] is never NULL because there is at least one element in the array
> * when the array itself is non NULL.
> @@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> */
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> extern struct tracepoint __tracepoint_##name; \
> + static inline void __trace_##name(proto) \
> + { \
> + __DO_TRACE(&__tracepoint_##name, \
> + TP_PROTO(data_proto), \
> + TP_ARGS(data_args), \
> + TP_CONDITION(cond)); \
> + } \
I wrote a patch earlier today that does almost the exact same thing, but
I had more in macro part, which I would have cleaned up after the RFC. I
didn't add another static inline, but I think this approach is a little
cleaner (with the second static inline).
I didn't post mine because I was still analyzing the assembly to make
sure it did what I expected. But I got side tracked on other things (RT
related) and didn't quite finish the analysis.
Did you do a compare of kmem_cache_alloc() to see if this fixes the
reported problem?
-- Steve
> static inline void trace_##name(proto) \
> { \
> if (static_branch(&__tracepoint_##name.key)) \
> - __DO_TRACE(&__tracepoint_##name, \
> - TP_PROTO(data_proto), \
> - TP_ARGS(data_args), \
> - TP_CONDITION(cond)); \
> + __trace_##name(args); \
> } \
> static inline int \
> register_trace_##name(void (*probe)(data_proto), void *data) \
> @@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> EXPORT_SYMBOL(__tracepoint_##name)
>
> #else /* !CONFIG_TRACEPOINTS */
> +
> +#define tracepoint(name, args...) __trace_##name(args)
> +
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> + static inline void __trace_##name(proto) \
> + { } \
> static inline void trace_##name(proto) \
> { } \
> static inline int \
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API
2011-11-17 21:06 ` Steven Rostedt
@ 2011-11-17 21:58 ` Jason Baron
2011-11-17 22:27 ` Steven Rostedt
2011-11-17 23:25 ` Mathieu Desnoyers
1 sibling, 1 reply; 9+ messages in thread
From: Jason Baron @ 2011-11-17 21:58 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, Eric Dumazet, Peter Zijlstra, linux-kernel,
Ingo Molnar, Christoph Lameter
On Thu, Nov 17, 2011 at 04:06:54PM -0500, Steven Rostedt wrote:
> On Thu, 2011-11-17 at 15:50 -0500, Mathieu Desnoyers wrote:
> > Introduce:
> >
> > tracepoint(event_name, arg1, arg2, ...)
> >
> > while keeping the old tracepoint API in place, e.g.:
> >
> > trace_event_name(arg1, arg2, ...)
> >
> > This allows skipping parameter side-effects (pointer dereference,
> > function calls, ...) when the tracepoint is not dynamically activated.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > ---
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index d530a44..c9c73f7 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> >
> > #ifdef CONFIG_TRACEPOINTS
> >
> > +#define tracepoint(name, args...) \
> > + do { \
> > + if (static_branch(&__tracepoint_##name.key)) \
> > + __trace_##name(args); \
> > + } while (0)
> > +
> > /*
> > * it_func[0] is never NULL because there is at least one element in the array
> > * when the array itself is non NULL.
> > @@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> > */
> > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > extern struct tracepoint __tracepoint_##name; \
> > + static inline void __trace_##name(proto) \
> > + { \
> > + __DO_TRACE(&__tracepoint_##name, \
> > + TP_PROTO(data_proto), \
> > + TP_ARGS(data_args), \
> > + TP_CONDITION(cond)); \
> > + } \
>
> I wrote a patch earlier today that does almost the exact same thing, but
> I had more in macro part, which I would have cleaned up after the RFC. I
> didn't add another static inline, but I think this approach is a little
> cleaner (with the second static inline).
>
> I didn't post mine because I was still analyzing the assembly to make
> sure it did what I expected. But I got side tracked on other things (RT
> related) and didn't quite finish the analysis.
>
> Did you do a compare of kmem_cache_alloc() to see if this fixes the
> reported problem?
>
> -- Steve
>
> > static inline void trace_##name(proto) \
> > { \
> > if (static_branch(&__tracepoint_##name.key)) \
> > - __DO_TRACE(&__tracepoint_##name, \
> > - TP_PROTO(data_proto), \
> > - TP_ARGS(data_args), \
> > - TP_CONDITION(cond)); \
> > + __trace_##name(args); \
> > } \
> > static inline int \
> > register_trace_##name(void (*probe)(data_proto), void *data) \
> > @@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> > EXPORT_SYMBOL(__tracepoint_##name)
> >
> > #else /* !CONFIG_TRACEPOINTS */
> > +
> > +#define tracepoint(name, args...) __trace_##name(args)
> > +
> > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > + static inline void __trace_##name(proto) \
> > + { } \
> > static inline void trace_##name(proto) \
> > { } \
> > static inline int \
> >
>
I brought this issue up a while back, with a very similar patch to what
Mathieu wrote. Please see:
http://thread.gmane.org/gmane.linux.kernel/838911
One conclusion from that thread was that it would be nice if gcc could
optimize the load so that it only happens in the unlikely path. I filed
a gcc bug for that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40207,
but it doesn't seem to be implemented yet...
Thanks,
-Jason
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API
2011-11-17 21:58 ` Jason Baron
@ 2011-11-17 22:27 ` Steven Rostedt
0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2011-11-17 22:27 UTC (permalink / raw)
To: Jason Baron
Cc: Mathieu Desnoyers, Eric Dumazet, Peter Zijlstra, linux-kernel,
Ingo Molnar, Christoph Lameter
On Thu, 2011-11-17 at 16:58 -0500, Jason Baron wrote:
> I brought this issue up a while back, with a very similar patch to what
> Mathieu wrote. Please see:
>
> http://thread.gmane.org/gmane.linux.kernel/838911
Ah, I remember this now. Mathieu's version is a bit nicer. And we should
only do this for those places that are the problem cases. Basically only
the places that dereference structures, or have other indirections.
>
> One conclusion from that thread was that it would be nice if gcc could
> optimize the load so that it only happens in the unlikely path. I filed
> a gcc bug for that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40207,
> but it doesn't seem to be implemented yet...
I agree that this is a compiler issue, but this has been two years and
no progress. Lets go with Mathieu's solution, and we should add a Link:
tag to Jason's patches as well, with a side note that talks about
Jason's previous changes with it.
-- Steve
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API
2011-11-17 21:06 ` Steven Rostedt
2011-11-17 21:58 ` Jason Baron
@ 2011-11-17 23:25 ` Mathieu Desnoyers
1 sibling, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2011-11-17 23:25 UTC (permalink / raw)
To: Steven Rostedt
Cc: Eric Dumazet, Peter Zijlstra, linux-kernel, Ingo Molnar,
Christoph Lameter
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Thu, 2011-11-17 at 15:50 -0500, Mathieu Desnoyers wrote:
> > Introduce:
> >
> > tracepoint(event_name, arg1, arg2, ...)
> >
> > while keeping the old tracepoint API in place, e.g.:
> >
> > trace_event_name(arg1, arg2, ...)
> >
> > This allows skipping parameter side-effects (pointer dereference,
> > function calls, ...) when the tracepoint is not dynamically activated.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > ---
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index d530a44..c9c73f7 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> >
> > #ifdef CONFIG_TRACEPOINTS
> >
> > +#define tracepoint(name, args...) \
> > + do { \
> > + if (static_branch(&__tracepoint_##name.key)) \
> > + __trace_##name(args); \
> > + } while (0)
> > +
> > /*
> > * it_func[0] is never NULL because there is at least one element in the array
> > * when the array itself is non NULL.
> > @@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> > */
> > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > extern struct tracepoint __tracepoint_##name; \
> > + static inline void __trace_##name(proto) \
> > + { \
> > + __DO_TRACE(&__tracepoint_##name, \
> > + TP_PROTO(data_proto), \
> > + TP_ARGS(data_args), \
> > + TP_CONDITION(cond)); \
> > + } \
>
> I wrote a patch earlier today that does almost the exact same thing, but
> I had more in macro part, which I would have cleaned up after the RFC. I
> didn't add another static inline, but I think this approach is a little
> cleaner (with the second static inline).
I'm glad you like it :)
>
> I didn't post mine because I was still analyzing the assembly to make
> sure it did what I expected. But I got side tracked on other things (RT
> related) and didn't quite finish the analysis.
>
> Did you do a compare of kmem_cache_alloc() to see if this fixes the
> reported problem?
I did not test it against this specific reported case, but I had this
exact same kind of issue a while back when moving from Kernel Markers
(which were macros) to the Tracepoint static inlines. Macros were
letting the compiler optimize away the side-effects, which was not
possible with static inlines only.
Eric, does it work better for you with this patch and by using
tracepoint() instead of trace_...() ?
Thanks,
Mathieu
>
> -- Steve
>
> > static inline void trace_##name(proto) \
> > { \
> > if (static_branch(&__tracepoint_##name.key)) \
> > - __DO_TRACE(&__tracepoint_##name, \
> > - TP_PROTO(data_proto), \
> > - TP_ARGS(data_args), \
> > - TP_CONDITION(cond)); \
> > + __trace_##name(args); \
> > } \
> > static inline int \
> > register_trace_##name(void (*probe)(data_proto), void *data) \
> > @@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin,
> > EXPORT_SYMBOL(__tracepoint_##name)
> >
> > #else /* !CONFIG_TRACEPOINTS */
> > +
> > +#define tracepoint(name, args...) __trace_##name(args)
> > +
> > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > + static inline void __trace_##name(proto) \
> > + { } \
> > static inline void trace_##name(proto) \
> > { } \
> > static inline int \
> >
>
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-11-17 23:25 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-17 3:55 [RFC] tracepoint/jump_label overhead Eric Dumazet
2011-11-17 15:25 ` Peter Zijlstra
2011-11-17 15:38 ` Avi Kivity
2011-11-17 15:56 ` Mathieu Desnoyers
2011-11-17 20:50 ` [RFC PATCH] Tracepoint: introduce tracepoint() API Mathieu Desnoyers
2011-11-17 21:06 ` Steven Rostedt
2011-11-17 21:58 ` Jason Baron
2011-11-17 22:27 ` Steven Rostedt
2011-11-17 23:25 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).