* [RFC] tracepoint/jump_label overhead @ 2011-11-17 3:55 Eric Dumazet 2011-11-17 15:25 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Eric Dumazet @ 2011-11-17 3:55 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: linux-kernel, Ingo Molnar, Christoph Lameter The general admitted claim of a tracepoint being on x86 a single instruction : jmp +0 Is not always true. For example in mm/slub.c, kmem_cache_alloc() void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_); trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags); return ret; We can check compiler output and see that 4 extra instructions were added because s->objsize & s->size are evaluated. I noticed this in a perf session, because these 4 extra instructions added some noticeable latency/cost. c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx c10e26a7: 85 db test %ebx,%ebx c10e26a9: 75 6d jne c10e2718 (doing the memset()) c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1 c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2 c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3 c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4 c10e26b7: e9 00 00 00 00 jmp c10e26bc c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax c10e26bf: 83 c4 28 add $0x28,%esp c10e26c2: 5b pop %ebx c10e26c3: 5e pop %esi c10e26c4: 5f pop %edi c10e26c5: c9 leave A fix would be to not declare an inline function but a macro... #define trace_kmem_cache_alloc(...) \ if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \ __DO_TRACE(&__tracepoint_kmem_cache_alloc, \ ... Anyone has some clever idea how to make this possible ? Thanks ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] tracepoint/jump_label overhead 2011-11-17 3:55 [RFC] tracepoint/jump_label overhead Eric Dumazet @ 2011-11-17 15:25 ` Peter Zijlstra 2011-11-17 15:38 ` Avi Kivity 2011-11-17 15:56 ` Mathieu Desnoyers 0 siblings, 2 replies; 9+ messages in thread From: Peter Zijlstra @ 2011-11-17 15:25 UTC (permalink / raw) To: Eric Dumazet Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Christoph Lameter, rostedt On Thu, 2011-11-17 at 04:55 +0100, Eric Dumazet wrote: > The general admitted claim of a tracepoint being on x86 a single > instruction : > > jmp +0 > > Is not always true. > > For example in mm/slub.c, kmem_cache_alloc() > > void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_); > trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags); > return ret; > > We can check compiler output and see that 4 extra instructions were > added because s->objsize & s->size are evaluated. > > I noticed this in a perf session, because these 4 extra instructions > added some noticeable latency/cost. > > c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx > c10e26a7: 85 db test %ebx,%ebx > c10e26a9: 75 6d jne c10e2718 (doing the memset()) > c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1 > c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2 > c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3 > c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4 > c10e26b7: e9 00 00 00 00 jmp c10e26bc > c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax > c10e26bf: 83 c4 28 add $0x28,%esp > c10e26c2: 5b pop %ebx > c10e26c3: 5e pop %esi > c10e26c4: 5f pop %edi > c10e26c5: c9 leave > > > A fix would be to not declare an inline function but a macro... > > #define trace_kmem_cache_alloc(...) \ > if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \ > __DO_TRACE(&__tracepoint_kmem_cache_alloc, \ > ... > > Anyone has some clever idea how to make this possible ? Right so you're not really supposed to use arguments that require evaluation in tracepoints, although I bet its common these days :/ The problem here is that its 'hard' to pass s in and have the TP_fast_assign() thing do the dereference because of the sl[auo]b thing. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] tracepoint/jump_label overhead 2011-11-17 15:25 ` Peter Zijlstra @ 2011-11-17 15:38 ` Avi Kivity 2011-11-17 15:56 ` Mathieu Desnoyers 1 sibling, 0 replies; 9+ messages in thread From: Avi Kivity @ 2011-11-17 15:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Eric Dumazet, Mathieu Desnoyers, linux-kernel, Ingo Molnar, Christoph Lameter, rostedt On 11/17/2011 05:25 PM, Peter Zijlstra wrote: > On Thu, 2011-11-17 at 04:55 +0100, Eric Dumazet wrote: > > The general admitted claim of a tracepoint being on x86 a single > > instruction : > > > > jmp +0 > > > > Is not always true. > > > > For example in mm/slub.c, kmem_cache_alloc() > > > > void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_); > > trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags); > > return ret; > > > > We can check compiler output and see that 4 extra instructions were > > added because s->objsize & s->size are evaluated. > > > > I noticed this in a perf session, because these 4 extra instructions > > added some noticeable latency/cost. > > > > c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx > > c10e26a7: 85 db test %ebx,%ebx > > c10e26a9: 75 6d jne c10e2718 (doing the memset()) > > c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1 > > c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2 > > c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3 > > c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4 > > c10e26b7: e9 00 00 00 00 jmp c10e26bc > > c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax > > c10e26bf: 83 c4 28 add $0x28,%esp > > c10e26c2: 5b pop %ebx > > c10e26c3: 5e pop %esi > > c10e26c4: 5f pop %edi > > c10e26c5: c9 leave > > > > > > A fix would be to not declare an inline function but a macro... > > > > #define trace_kmem_cache_alloc(...) \ > > if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \ > > __DO_TRACE(&__tracepoint_kmem_cache_alloc, \ > > ... > > > > Anyone has some clever idea how to make this possible ? You could do that with a code generator, which I'm sure everyone will like. > Right so you're not really supposed to use arguments that require > evaluation in tracepoints, although I bet its common these days :/ > > The problem here is that its 'hard' to pass s in and have the > TP_fast_assign() thing do the dereference because of the sl[auo]b thing. > You could have sl[auo]b define a function or macro which tp_fast_assign then uses to dereference its parameter. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] tracepoint/jump_label overhead 2011-11-17 15:25 ` Peter Zijlstra 2011-11-17 15:38 ` Avi Kivity @ 2011-11-17 15:56 ` Mathieu Desnoyers 2011-11-17 20:50 ` [RFC PATCH] Tracepoint: introduce tracepoint() API Mathieu Desnoyers 1 sibling, 1 reply; 9+ messages in thread From: Mathieu Desnoyers @ 2011-11-17 15:56 UTC (permalink / raw) To: Peter Zijlstra Cc: Eric Dumazet, linux-kernel, Ingo Molnar, Christoph Lameter, rostedt * Peter Zijlstra (peterz@infradead.org) wrote: > On Thu, 2011-11-17 at 04:55 +0100, Eric Dumazet wrote: > > The general admitted claim of a tracepoint being on x86 a single > > instruction : > > > > jmp +0 > > > > Is not always true. > > > > For example in mm/slub.c, kmem_cache_alloc() > > > > void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_); > > trace_kmem_cache_alloc(_RET_IP_, ret, s->objsize, s->size, gfpflags); > > return ret; > > > > We can check compiler output and see that 4 extra instructions were > > added because s->objsize & s->size are evaluated. > > > > I noticed this in a perf session, because these 4 extra instructions > > added some noticeable latency/cost. > > > > c10e26a4: 8b 5d d8 mov -0x28(%ebp),%ebx > > c10e26a7: 85 db test %ebx,%ebx > > c10e26a9: 75 6d jne c10e2718 (doing the memset()) > > c10e26ab: 8b 76 0c mov 0xc(%esi),%esi // extra 1 > > c10e26ae: 8b 5d 04 mov 0x4(%ebp),%ebx // extra 2 > > c10e26b1: 89 75 f0 mov %esi,-0x10(%ebp) // extra 3 > > c10e26b4: 89 5d ec mov %ebx,-0x14(%ebp) // extra 4 > > c10e26b7: e9 00 00 00 00 jmp c10e26bc > > c10e26bc: 8b 45 d8 mov -0x28(%ebp),%eax > > c10e26bf: 83 c4 28 add $0x28,%esp > > c10e26c2: 5b pop %ebx > > c10e26c3: 5e pop %esi > > c10e26c4: 5f pop %edi > > c10e26c5: c9 leave > > > > > > A fix would be to not declare an inline function but a macro... > > > > #define trace_kmem_cache_alloc(...) \ > > if (static_branch(&__tracepoint_kmem_cache_alloc.key)) \ > > __DO_TRACE(&__tracepoint_kmem_cache_alloc, \ > > ... > > > > Anyone has some clever idea how to make this possible ? > > Right so you're not really supposed to use arguments that require > evaluation in tracepoints, although I bet its common these days :/ > > The problem here is that its 'hard' to pass s in and have the > TP_fast_assign() thing do the dereference because of the sl[auo]b thing. I created a variant of tracepoints for userspace in my lttng-ust project where I changed the invokation of a tracepoint to: tracepoint(provider, name, arg1, arg2, arg3, ..., arg10); This allows me to much with a the tracepoint call-site since it becomes a macro. This macro uses the __tracepoint_provider_name() static inline underneath. I just took a go at implementing the behavior proposed by Eric Dumazet in the lttng-ust tree, it seems to work out nicely! Now doing this at kernel-level would imply changing the trace_system_event(..) call-sites into trace(system, event, ...), trace(system_event, ...) or tracepoint(system, event, ...). Is that something you are willing to consider doing ? Diff for lttng-ust: diff --git a/include/lttng/tracepoint.h b/include/lttng/tracepoint.h index 2ca4598..8e3b604 100644 --- a/include/lttng/tracepoint.h +++ b/include/lttng/tracepoint.h @@ -35,8 +35,11 @@ extern "C" { * Tracepoints should be added to the instrumented code using the * "tracepoint()" macro. */ -#define tracepoint(provider, name, args...) \ - __trace_##provider##___##name(args) +#define tracepoint(provider, name, args...) \ + do { \ + if (caa_unlikely(__tracepoint_##provider##___##name.state)) \ + __trace_##provider##___##name(args); \ + } while (0) /* * it_func[0] is never NULL because there is at least one element in the array @@ -148,13 +151,6 @@ extern "C" { #define _TP_ARGS_PROTO_DATA(...) _TP_PROTO_DATA_N(_TP_NARGS(0, ##__VA_ARGS__), ##__VA_ARGS__) #define _TP_ARGS_VARS_DATA(...) _TP_VARS_DATA_N(_TP_NARGS(0, ##__VA_ARGS__), ##__VA_ARGS__) -#define __CHECK_TRACE(provider, name, proto, args) \ - do { \ - if (caa_unlikely(__tracepoint_##provider##___##name.state)) \ - __DO_TRACE(&__tracepoint_##provider##___##name, \ - TP_PARAMS(proto), TP_PARAMS(args)); \ - } while (0) - /* * Make sure the alignment of the structure in the __tracepoints section will * not add unwanted padding between the beginning of the section and the @@ -164,8 +160,8 @@ extern "C" { extern struct tracepoint __tracepoint_##provider##___##name; \ static inline void __trace_##provider##___##name(proto) \ { \ - __CHECK_TRACE(provider, name, TP_PARAMS(data_proto), \ - TP_PARAMS(data_args)); \ + __DO_TRACE(&__tracepoint_##provider##___##name, \ + TP_PARAMS(data_proto), TP_PARAMS(data_args)); \ } \ static inline int \ __register_trace_##provider##___##name(void (*probe)(data_proto), void *data) \ -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RFC PATCH] Tracepoint: introduce tracepoint() API 2011-11-17 15:56 ` Mathieu Desnoyers @ 2011-11-17 20:50 ` Mathieu Desnoyers 2011-11-17 21:06 ` Steven Rostedt 0 siblings, 1 reply; 9+ messages in thread From: Mathieu Desnoyers @ 2011-11-17 20:50 UTC (permalink / raw) To: Eric Dumazet Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Christoph Lameter, rostedt Introduce: tracepoint(event_name, arg1, arg2, ...) while keeping the old tracepoint API in place, e.g.: trace_event_name(arg1, arg2, ...) This allows skipping parameter side-effects (pointer dereference, function calls, ...) when the tracepoint is not dynamically activated. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> --- diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index d530a44..c9c73f7 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, #ifdef CONFIG_TRACEPOINTS +#define tracepoint(name, args...) \ + do { \ + if (static_branch(&__tracepoint_##name.key)) \ + __trace_##name(args); \ + } while (0) + /* * it_func[0] is never NULL because there is at least one element in the array * when the array itself is non NULL. @@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, */ #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ extern struct tracepoint __tracepoint_##name; \ + static inline void __trace_##name(proto) \ + { \ + __DO_TRACE(&__tracepoint_##name, \ + TP_PROTO(data_proto), \ + TP_ARGS(data_args), \ + TP_CONDITION(cond)); \ + } \ static inline void trace_##name(proto) \ { \ if (static_branch(&__tracepoint_##name.key)) \ - __DO_TRACE(&__tracepoint_##name, \ - TP_PROTO(data_proto), \ - TP_ARGS(data_args), \ - TP_CONDITION(cond)); \ + __trace_##name(args); \ } \ static inline int \ register_trace_##name(void (*probe)(data_proto), void *data) \ @@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, EXPORT_SYMBOL(__tracepoint_##name) #else /* !CONFIG_TRACEPOINTS */ + +#define tracepoint(name, args...) __trace_##name(args) + #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ + static inline void __trace_##name(proto) \ + { } \ static inline void trace_##name(proto) \ { } \ static inline int \ -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API 2011-11-17 20:50 ` [RFC PATCH] Tracepoint: introduce tracepoint() API Mathieu Desnoyers @ 2011-11-17 21:06 ` Steven Rostedt 2011-11-17 21:58 ` Jason Baron 2011-11-17 23:25 ` Mathieu Desnoyers 0 siblings, 2 replies; 9+ messages in thread From: Steven Rostedt @ 2011-11-17 21:06 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Eric Dumazet, Peter Zijlstra, linux-kernel, Ingo Molnar, Christoph Lameter On Thu, 2011-11-17 at 15:50 -0500, Mathieu Desnoyers wrote: > Introduce: > > tracepoint(event_name, arg1, arg2, ...) > > while keeping the old tracepoint API in place, e.g.: > > trace_event_name(arg1, arg2, ...) > > This allows skipping parameter side-effects (pointer dereference, > function calls, ...) when the tracepoint is not dynamically activated. > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > --- > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index d530a44..c9c73f7 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > > #ifdef CONFIG_TRACEPOINTS > > +#define tracepoint(name, args...) \ > + do { \ > + if (static_branch(&__tracepoint_##name.key)) \ > + __trace_##name(args); \ > + } while (0) > + > /* > * it_func[0] is never NULL because there is at least one element in the array > * when the array itself is non NULL. > @@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > */ > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ > extern struct tracepoint __tracepoint_##name; \ > + static inline void __trace_##name(proto) \ > + { \ > + __DO_TRACE(&__tracepoint_##name, \ > + TP_PROTO(data_proto), \ > + TP_ARGS(data_args), \ > + TP_CONDITION(cond)); \ > + } \ I wrote a patch earlier today that does almost the exact same thing, but I had more in macro part, which I would have cleaned up after the RFC. I didn't add another static inline, but I think this approach is a little cleaner (with the second static inline). I didn't post mine because I was still analyzing the assembly to make sure it did what I expected. But I got side tracked on other things (RT related) and didn't quite finish the analysis. Did you do a compare of kmem_cache_alloc() to see if this fixes the reported problem? -- Steve > static inline void trace_##name(proto) \ > { \ > if (static_branch(&__tracepoint_##name.key)) \ > - __DO_TRACE(&__tracepoint_##name, \ > - TP_PROTO(data_proto), \ > - TP_ARGS(data_args), \ > - TP_CONDITION(cond)); \ > + __trace_##name(args); \ > } \ > static inline int \ > register_trace_##name(void (*probe)(data_proto), void *data) \ > @@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > EXPORT_SYMBOL(__tracepoint_##name) > > #else /* !CONFIG_TRACEPOINTS */ > + > +#define tracepoint(name, args...) __trace_##name(args) > + > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ > + static inline void __trace_##name(proto) \ > + { } \ > static inline void trace_##name(proto) \ > { } \ > static inline int \ > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API 2011-11-17 21:06 ` Steven Rostedt @ 2011-11-17 21:58 ` Jason Baron 2011-11-17 22:27 ` Steven Rostedt 2011-11-17 23:25 ` Mathieu Desnoyers 1 sibling, 1 reply; 9+ messages in thread From: Jason Baron @ 2011-11-17 21:58 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Eric Dumazet, Peter Zijlstra, linux-kernel, Ingo Molnar, Christoph Lameter On Thu, Nov 17, 2011 at 04:06:54PM -0500, Steven Rostedt wrote: > On Thu, 2011-11-17 at 15:50 -0500, Mathieu Desnoyers wrote: > > Introduce: > > > > tracepoint(event_name, arg1, arg2, ...) > > > > while keeping the old tracepoint API in place, e.g.: > > > > trace_event_name(arg1, arg2, ...) > > > > This allows skipping parameter side-effects (pointer dereference, > > function calls, ...) when the tracepoint is not dynamically activated. > > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > --- > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > index d530a44..c9c73f7 100644 > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > > > > #ifdef CONFIG_TRACEPOINTS > > > > +#define tracepoint(name, args...) \ > > + do { \ > > + if (static_branch(&__tracepoint_##name.key)) \ > > + __trace_##name(args); \ > > + } while (0) > > + > > /* > > * it_func[0] is never NULL because there is at least one element in the array > > * when the array itself is non NULL. > > @@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > > */ > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ > > extern struct tracepoint __tracepoint_##name; \ > > + static inline void __trace_##name(proto) \ > > + { \ > > + __DO_TRACE(&__tracepoint_##name, \ > > + TP_PROTO(data_proto), \ > > + TP_ARGS(data_args), \ > > + TP_CONDITION(cond)); \ > > + } \ > > I wrote a patch earlier today that does almost the exact same thing, but > I had more in macro part, which I would have cleaned up after the RFC. I > didn't add another static inline, but I think this approach is a little > cleaner (with the second static inline). > > I didn't post mine because I was still analyzing the assembly to make > sure it did what I expected. But I got side tracked on other things (RT > related) and didn't quite finish the analysis. > > Did you do a compare of kmem_cache_alloc() to see if this fixes the > reported problem? > > -- Steve > > > static inline void trace_##name(proto) \ > > { \ > > if (static_branch(&__tracepoint_##name.key)) \ > > - __DO_TRACE(&__tracepoint_##name, \ > > - TP_PROTO(data_proto), \ > > - TP_ARGS(data_args), \ > > - TP_CONDITION(cond)); \ > > + __trace_##name(args); \ > > } \ > > static inline int \ > > register_trace_##name(void (*probe)(data_proto), void *data) \ > > @@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > > EXPORT_SYMBOL(__tracepoint_##name) > > > > #else /* !CONFIG_TRACEPOINTS */ > > + > > +#define tracepoint(name, args...) __trace_##name(args) > > + > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ > > + static inline void __trace_##name(proto) \ > > + { } \ > > static inline void trace_##name(proto) \ > > { } \ > > static inline int \ > > > I brought this issue up a while back, with a very similar patch to what Mathieu wrote. Please see: http://thread.gmane.org/gmane.linux.kernel/838911 One conclusion from that thread was that it would be nice if gcc could optimize the load so that it only happens in the unlikely path. I filed a gcc bug for that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40207, but it doesn't seem to be implemented yet... Thanks, -Jason ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API 2011-11-17 21:58 ` Jason Baron @ 2011-11-17 22:27 ` Steven Rostedt 0 siblings, 0 replies; 9+ messages in thread From: Steven Rostedt @ 2011-11-17 22:27 UTC (permalink / raw) To: Jason Baron Cc: Mathieu Desnoyers, Eric Dumazet, Peter Zijlstra, linux-kernel, Ingo Molnar, Christoph Lameter On Thu, 2011-11-17 at 16:58 -0500, Jason Baron wrote: > I brought this issue up a while back, with a very similar patch to what > Mathieu wrote. Please see: > > http://thread.gmane.org/gmane.linux.kernel/838911 Ah, I remember this now. Mathieu's version is a bit nicer. And we should only do this for those places that are the problem cases. Basically only the places that dereference structures, or have other indirections. > > One conclusion from that thread was that it would be nice if gcc could > optimize the load so that it only happens in the unlikely path. I filed > a gcc bug for that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40207, > but it doesn't seem to be implemented yet... I agree that this is a compiler issue, but this has been two years and no progress. Lets go with Mathieu's solution, and we should add a Link: tag to Jason's patches as well, with a side note that talks about Jason's previous changes with it. -- Steve ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] Tracepoint: introduce tracepoint() API 2011-11-17 21:06 ` Steven Rostedt 2011-11-17 21:58 ` Jason Baron @ 2011-11-17 23:25 ` Mathieu Desnoyers 1 sibling, 0 replies; 9+ messages in thread From: Mathieu Desnoyers @ 2011-11-17 23:25 UTC (permalink / raw) To: Steven Rostedt Cc: Eric Dumazet, Peter Zijlstra, linux-kernel, Ingo Molnar, Christoph Lameter * Steven Rostedt (rostedt@goodmis.org) wrote: > On Thu, 2011-11-17 at 15:50 -0500, Mathieu Desnoyers wrote: > > Introduce: > > > > tracepoint(event_name, arg1, arg2, ...) > > > > while keeping the old tracepoint API in place, e.g.: > > > > trace_event_name(arg1, arg2, ...) > > > > This allows skipping parameter side-effects (pointer dereference, > > function calls, ...) when the tracepoint is not dynamically activated. > > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > --- > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > index d530a44..c9c73f7 100644 > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -107,6 +107,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > > > > #ifdef CONFIG_TRACEPOINTS > > > > +#define tracepoint(name, args...) \ > > + do { \ > > + if (static_branch(&__tracepoint_##name.key)) \ > > + __trace_##name(args); \ > > + } while (0) > > + > > /* > > * it_func[0] is never NULL because there is at least one element in the array > > * when the array itself is non NULL. > > @@ -144,13 +150,17 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > > */ > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ > > extern struct tracepoint __tracepoint_##name; \ > > + static inline void __trace_##name(proto) \ > > + { \ > > + __DO_TRACE(&__tracepoint_##name, \ > > + TP_PROTO(data_proto), \ > > + TP_ARGS(data_args), \ > > + TP_CONDITION(cond)); \ > > + } \ > > I wrote a patch earlier today that does almost the exact same thing, but > I had more in macro part, which I would have cleaned up after the RFC. I > didn't add another static inline, but I think this approach is a little > cleaner (with the second static inline). I'm glad you like it :) > > I didn't post mine because I was still analyzing the assembly to make > sure it did what I expected. But I got side tracked on other things (RT > related) and didn't quite finish the analysis. > > Did you do a compare of kmem_cache_alloc() to see if this fixes the > reported problem? I did not test it against this specific reported case, but I had this exact same kind of issue a while back when moving from Kernel Markers (which were macros) to the Tracepoint static inlines. Macros were letting the compiler optimize away the side-effects, which was not possible with static inlines only. Eric, does it work better for you with this patch and by using tracepoint() instead of trace_...() ? Thanks, Mathieu > > -- Steve > > > static inline void trace_##name(proto) \ > > { \ > > if (static_branch(&__tracepoint_##name.key)) \ > > - __DO_TRACE(&__tracepoint_##name, \ > > - TP_PROTO(data_proto), \ > > - TP_ARGS(data_args), \ > > - TP_CONDITION(cond)); \ > > + __trace_##name(args); \ > > } \ > > static inline int \ > > register_trace_##name(void (*probe)(data_proto), void *data) \ > > @@ -193,7 +203,12 @@ void tracepoint_update_probe_range(struct tracepoint * const *begin, > > EXPORT_SYMBOL(__tracepoint_##name) > > > > #else /* !CONFIG_TRACEPOINTS */ > > + > > +#define tracepoint(name, args...) __trace_##name(args) > > + > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \ > > + static inline void __trace_##name(proto) \ > > + { } \ > > static inline void trace_##name(proto) \ > > { } \ > > static inline int \ > > > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-11-17 23:25 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-17 3:55 [RFC] tracepoint/jump_label overhead Eric Dumazet 2011-11-17 15:25 ` Peter Zijlstra 2011-11-17 15:38 ` Avi Kivity 2011-11-17 15:56 ` Mathieu Desnoyers 2011-11-17 20:50 ` [RFC PATCH] Tracepoint: introduce tracepoint() API Mathieu Desnoyers 2011-11-17 21:06 ` Steven Rostedt 2011-11-17 21:58 ` Jason Baron 2011-11-17 22:27 ` Steven Rostedt 2011-11-17 23:25 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).