* Kernel marker has no performance impact on ia64.
@ 2008-06-02 22:12 Hideo AOKI
2008-06-02 22:32 ` Peter Zijlstra
0 siblings, 1 reply; 30+ messages in thread
From: Hideo AOKI @ 2008-06-02 22:12 UTC (permalink / raw)
To: mingo, mathieu.desnoyers, peterz
Cc: Masami Hiramatsu, linux-kernel, Hideo AOKI
Hello,
I evaluated overhead of kernel marker using linux-2.6-sched-fixes
git tree, which includes several markers for LTTng, using an ia64
server.
While the immediate trace mark feature isn't implemented on ia64,
there is no major performance regression. So, I think that we
don't have any issues to propose merging marker point patches
into Linus's tree from the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled
without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the
average and difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below is the results. As you can see, major performance
regression wasn't found in any case. Even if number of processes
increases, differences between marker-enabled kernel and marker-
disabled kernel doesn't increase. Moreover, if number of CPUs
increases, the differences doesn't increase either.
Curiously, marker-enabled kernel is better than marker-disabled
kernel in more than half cases, although I guess it comes from
the difference of memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Best regards,
Hideo
P.S. When I compiled the linux-2.6-sched-fixes tree on ia64, I
had to revert the following git commit since pteval_t is defined
on x86 only.
commit 8686f2b37e7394b51dd6593678cbfd85ecd28c65
Date: Tue May 6 15:42:40 2008 -0700
generic, x86, PAT: fix mprotect
--
Hitachi Computer Products (America) Inc.
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: Kernel marker has no performance impact on ia64. 2008-06-02 22:12 Kernel marker has no performance impact on ia64 Hideo AOKI @ 2008-06-02 22:32 ` Peter Zijlstra 2008-06-02 23:21 ` Mathieu Desnoyers 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2008-06-02 22:32 UTC (permalink / raw) To: Hideo AOKI; +Cc: mingo, mathieu.desnoyers, Masami Hiramatsu, linux-kernel On Mon, 2008-06-02 at 18:12 -0400, Hideo AOKI wrote: > Hello, > > I evaluated overhead of kernel marker using linux-2.6-sched-fixes > git tree, which includes several markers for LTTng, using an ia64 > server. > > While the immediate trace mark feature isn't implemented on ia64, > there is no major performance regression. So, I think that we > don't have any issues to propose merging marker point patches > into Linus's tree from the viewpoint of performance impact. Performance is atm the least of the concerns regarding this work. I'm still convinced markers are too ugly to live. I also worry greatly about the fact that its too easy to expose too much to user-space. There are no clear rules and the free form marker format just begs for an inconsistent mess to arise. IMHO the current free-form trace_mark() should be removed from the tree - its great for ad-hoc debugging but its a disaster waiting to happen for anything else. Anybody doing ad-hoc debugging can patch it in themselves if needed. Regular trace points can be custom made; this has the advantages that it raises the implementation barrier and hopefully that encourages some thought in the process. It also avoid the code from growing into something that looks like someone had a long night of debugging. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-02 22:32 ` Peter Zijlstra @ 2008-06-02 23:21 ` Mathieu Desnoyers 2008-06-03 6:07 ` Takashi Nishiie ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Mathieu Desnoyers @ 2008-06-02 23:21 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel * Peter Zijlstra (peterz@infradead.org) wrote: > On Mon, 2008-06-02 at 18:12 -0400, Hideo AOKI wrote: > > Hello, > > > > I evaluated overhead of kernel marker using linux-2.6-sched-fixes > > git tree, which includes several markers for LTTng, using an ia64 > > server. > > > > While the immediate trace mark feature isn't implemented on ia64, > > there is no major performance regression. So, I think that we > > don't have any issues to propose merging marker point patches > > into Linus's tree from the viewpoint of performance impact. > > Performance is atm the least of the concerns regarding this work. > > I'm still convinced markers are too ugly to live. > > I also worry greatly about the fact that its too easy to expose too much > to user-space. There are no clear rules and the free form marker format > just begs for an inconsistent mess to arise. > > IMHO the current free-form trace_mark() should be removed from the tree > - its great for ad-hoc debugging but its a disaster waiting to happen > for anything else. Anybody doing ad-hoc debugging can patch it in > themselves if needed. > > Regular trace points can be custom made; this has the advantages that it > raises the implementation barrier and hopefully that encourages some > thought in the process. It also avoid the code from growing into > something that looks like someone had a long night of debugging. > Maybe we could settle for an intermediate solution : I agree with you that defining the trace points in headers, like you did for the scheduler, makes the code much cleaner and makes things much easier to maintain afterward. However, having the trace_mark mechanism underneath helps a lot in plugging a generic tracer (actually, if we can settle the marker issue, I've got a kernel tracer, LTTng, that I've been waiting for quite a while to push to mainline that I would like to post someday). So I would be in favor of requiring tracing statements to be described in static inline functions, in header files, that could preferably call trace_mark() and optionally also call other in-kernel tracers directly. Ideally, we could re-use the immediate values infrastructure to control activation of these trace points with minimal impact on the system. One of my goal is to provide a mechanism that can feed both non-debug and debug information to a generic tracing mechanism to allow system-wide analysis of the kernel, both for production system and kernel debugging. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Kernel marker has no performance impact on ia64. 2008-06-02 23:21 ` Mathieu Desnoyers @ 2008-06-03 6:07 ` Takashi Nishiie 2008-06-04 4:58 ` Masami Hiramatsu 2008-06-04 22:27 ` Peter Zijlstra 2 siblings, 0 replies; 30+ messages in thread From: Takashi Nishiie @ 2008-06-03 6:07 UTC (permalink / raw) To: 'Mathieu Desnoyers', 'Peter Zijlstra' Cc: 'Hideo AOKI', mingo, 'Masami Hiramatsu', linux-kernel Hello. > One of my goal is to provide a mechanism that can feed both non-debug > and debug information to a generic tracing mechanism to allow > system-wide analysis of the kernel, both for production system and > kernel debugging. I agree with the target. I understand there is a person who is the neat paranoiac that it doesn't want to leave the code for debugging for a final code. If it is a situation in which the code for debugging without united interfaces lies scattered, the opinion might be correct. It is an effective method only to set up the marker if the code for debugging without united interfaces can be appropriately set up with the marker. I think that the mechanism that can offer both non-debugging and debugging information to the generic pursuit mechanism to permit the analysis of the entire kernel system in the meaning of obtaining dynamic information for a long time not obtained by a static necropsy by kdump is indispensable in a mission-critical system. Regards. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-02 23:21 ` Mathieu Desnoyers 2008-06-03 6:07 ` Takashi Nishiie @ 2008-06-04 4:58 ` Masami Hiramatsu 2008-06-04 23:26 ` Mathieu Desnoyers 2008-06-04 22:27 ` Peter Zijlstra 2 siblings, 1 reply; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-04 4:58 UTC (permalink / raw) To: Mathieu Desnoyers, Peter Zijlstra; +Cc: Hideo AOKI, mingo, linux-kernel Hi Peter and Mathieu, Thank you for your comments. Mathieu Desnoyers wrote: > * Peter Zijlstra (peterz@infradead.org) wrote: >> On Mon, 2008-06-02 at 18:12 -0400, Hideo AOKI wrote: >>> Hello, >>> >>> I evaluated overhead of kernel marker using linux-2.6-sched-fixes >>> git tree, which includes several markers for LTTng, using an ia64 >>> server. >>> >>> While the immediate trace mark feature isn't implemented on ia64, >>> there is no major performance regression. So, I think that we >>> don't have any issues to propose merging marker point patches >>> into Linus's tree from the viewpoint of performance impact. >> Performance is atm the least of the concerns regarding this work. >> >> I'm still convinced markers are too ugly to live. >> >> I also worry greatly about the fact that its too easy to expose too much >> to user-space. There are no clear rules and the free form marker format >> just begs for an inconsistent mess to arise. Sure, I think we should review each point carefully and should make clear rules what is acceptable or not and why. >> >> IMHO the current free-form trace_mark() should be removed from the tree >> - its great for ad-hoc debugging but its a disaster waiting to happen >> for anything else. Anybody doing ad-hoc debugging can patch it in >> themselves if needed. >> >> Regular trace points can be custom made; this has the advantages that it >> raises the implementation barrier and hopefully that encourages some >> thought in the process. It also avoid the code from growing into >> something that looks like someone had a long night of debugging. >> > > Maybe we could settle for an intermediate solution : I agree with you > that defining the trace points in headers, like you did for the > scheduler, makes the code much cleaner and makes things much easier to > maintain afterward. However, having the trace_mark mechanism underneath > helps a lot in plugging a generic tracer (actually, if we can settle the > marker issue, I've got a kernel tracer, LTTng, that I've been waiting > for quite a while to push to mainline that I would like to post someday). That's good to me. BTW, I'd like to know your plan, would those static inline functions be defined in new headers or marker.h(or other existing headers)? Regards, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-04 4:58 ` Masami Hiramatsu @ 2008-06-04 23:26 ` Mathieu Desnoyers 2008-06-04 23:40 ` Masami Hiramatsu 0 siblings, 1 reply; 30+ messages in thread From: Mathieu Desnoyers @ 2008-06-04 23:26 UTC (permalink / raw) To: Masami Hiramatsu; +Cc: Peter Zijlstra, Hideo AOKI, mingo, linux-kernel * Masami Hiramatsu (mhiramat@redhat.com) wrote: > Hi Peter and Mathieu, > > Thank you for your comments. > > Mathieu Desnoyers wrote: > > * Peter Zijlstra (peterz@infradead.org) wrote: > >> On Mon, 2008-06-02 at 18:12 -0400, Hideo AOKI wrote: > >>> Hello, > >>> > >>> I evaluated overhead of kernel marker using linux-2.6-sched-fixes > >>> git tree, which includes several markers for LTTng, using an ia64 > >>> server. > >>> > >>> While the immediate trace mark feature isn't implemented on ia64, > >>> there is no major performance regression. So, I think that we > >>> don't have any issues to propose merging marker point patches > >>> into Linus's tree from the viewpoint of performance impact. > >> Performance is atm the least of the concerns regarding this work. > >> > >> I'm still convinced markers are too ugly to live. > >> > >> I also worry greatly about the fact that its too easy to expose too much > >> to user-space. There are no clear rules and the free form marker format > >> just begs for an inconsistent mess to arise. > > Sure, I think we should review each point carefully > and should make clear rules what is acceptable or not and why. > > >> > >> IMHO the current free-form trace_mark() should be removed from the tree > >> - its great for ad-hoc debugging but its a disaster waiting to happen > >> for anything else. Anybody doing ad-hoc debugging can patch it in > >> themselves if needed. > >> > >> Regular trace points can be custom made; this has the advantages that it > >> raises the implementation barrier and hopefully that encourages some > >> thought in the process. It also avoid the code from growing into > >> something that looks like someone had a long night of debugging. > >> > > > > Maybe we could settle for an intermediate solution : I agree with you > > that defining the trace points in headers, like you did for the > > scheduler, makes the code much cleaner and makes things much easier to > > maintain afterward. However, having the trace_mark mechanism underneath > > helps a lot in plugging a generic tracer (actually, if we can settle the > > marker issue, I've got a kernel tracer, LTTng, that I've been waiting > > for quite a while to push to mainline that I would like to post someday). > > That's good to me. > BTW, I'd like to know your plan, would those static inline functions be > defined in new headers or marker.h(or other existing headers)? > Hi Masami, What do you think of kernel/sched-trace.h for the scheduler as proposed by Peter ? Having these headers close to the c file instrumentation they deal with seems to scale maintenance better. Placing all these in one big kernel header included everywhere would require to recompile the whole kernel when the header is touched, which is, I guess, not what we want. Mathieu > Regards, > > -- > Masami Hiramatsu > > Software Engineer > Hitachi Computer Products (America) Inc. > Software Solutions Division > > e-mail: mhiramat@redhat.com > > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-04 23:26 ` Mathieu Desnoyers @ 2008-06-04 23:40 ` Masami Hiramatsu 0 siblings, 0 replies; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-04 23:40 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Peter Zijlstra, Hideo AOKI, mingo, linux-kernel Hi Mathieu, Mathieu Desnoyers wrote: >>> Maybe we could settle for an intermediate solution : I agree with you >>> that defining the trace points in headers, like you did for the >>> scheduler, makes the code much cleaner and makes things much easier to >>> maintain afterward. However, having the trace_mark mechanism underneath >>> helps a lot in plugging a generic tracer (actually, if we can settle the >>> marker issue, I've got a kernel tracer, LTTng, that I've been waiting >>> for quite a while to push to mainline that I would like to post someday). >> That's good to me. >> BTW, I'd like to know your plan, would those static inline functions be >> defined in new headers or marker.h(or other existing headers)? >> > > Hi Masami, > > What do you think of kernel/sched-trace.h for the scheduler as proposed > by Peter ? Having these headers close to the c file instrumentation they > deal with seems to scale maintenance better. Placing all these in one > big kernel header included everywhere would require to recompile the > whole kernel when the header is touched, which is, I guess, not what we > want. I agree with you, one big kernel header is hard to maintain, especially by patches :-) Thanks, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-02 23:21 ` Mathieu Desnoyers 2008-06-03 6:07 ` Takashi Nishiie 2008-06-04 4:58 ` Masami Hiramatsu @ 2008-06-04 22:27 ` Peter Zijlstra 2008-06-04 23:22 ` Mathieu Desnoyers 2 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2008-06-04 22:27 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel On Mon, 2008-06-02 at 19:21 -0400, Mathieu Desnoyers wrote: > * Peter Zijlstra (peterz@infradead.org) wrote: > > On Mon, 2008-06-02 at 18:12 -0400, Hideo AOKI wrote: > > > Hello, > > > > > > I evaluated overhead of kernel marker using linux-2.6-sched-fixes > > > git tree, which includes several markers for LTTng, using an ia64 > > > server. > > > > > > While the immediate trace mark feature isn't implemented on ia64, > > > there is no major performance regression. So, I think that we > > > don't have any issues to propose merging marker point patches > > > into Linus's tree from the viewpoint of performance impact. > > > > Performance is atm the least of the concerns regarding this work. > > > > I'm still convinced markers are too ugly to live. > > > > I also worry greatly about the fact that its too easy to expose too much > > to user-space. There are no clear rules and the free form marker format > > just begs for an inconsistent mess to arise. > > > > IMHO the current free-form trace_mark() should be removed from the tree > > - its great for ad-hoc debugging but its a disaster waiting to happen > > for anything else. Anybody doing ad-hoc debugging can patch it in > > themselves if needed. > > > > Regular trace points can be custom made; this has the advantages that it > > raises the implementation barrier and hopefully that encourages some > > thought in the process. It also avoid the code from growing into > > something that looks like someone had a long night of debugging. > > > > Maybe we could settle for an intermediate solution : I agree with you > that defining the trace points in headers, like you did for the > scheduler, makes the code much cleaner and makes things much easier to > maintain afterward. However, having the trace_mark mechanism underneath > helps a lot in plugging a generic tracer (actually, if we can settle the > marker issue, I've got a kernel tracer, LTTng, that I've been waiting > for quite a while to push to mainline that I would like to post someday). > > So I would be in favor of requiring tracing statements to be described > in static inline functions, in header files, that could preferably call > trace_mark() and optionally also call other in-kernel tracers directly. > > Ideally, we could re-use the immediate values infrastructure to control > activation of these trace points with minimal impact on the system. > > One of my goal is to provide a mechanism that can feed both non-debug > and debug information to a generic tracing mechanism to allow > system-wide analysis of the kernel, both for production system and > kernel debugging. So are you proposing something like: static inline void trace_sched_switch(struct task_struct *prev, struct task_struct *next) { trace_mark(sched_switch, prev, next); } dropping the silly fmt string but using the multiplex of trace_mark, and then doing the stringify bit: "prev_pid %d next_pid %d prev_state %ld\n" in the actual tracer? IMHO the 'type safety' of the fmt string is over-rated, since it cannot distinguish between a task_struct * or a bio *, both are a pointers - and half arsed type safely is worse than no type safety. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-04 22:27 ` Peter Zijlstra @ 2008-06-04 23:22 ` Mathieu Desnoyers 2008-06-05 8:12 ` Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Mathieu Desnoyers @ 2008-06-04 23:22 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel * Peter Zijlstra (peterz@infradead.org) wrote: > On Mon, 2008-06-02 at 19:21 -0400, Mathieu Desnoyers wrote: > > * Peter Zijlstra (peterz@infradead.org) wrote: > > > On Mon, 2008-06-02 at 18:12 -0400, Hideo AOKI wrote: > > > > Hello, > > > > > > > > I evaluated overhead of kernel marker using linux-2.6-sched-fixes > > > > git tree, which includes several markers for LTTng, using an ia64 > > > > server. > > > > > > > > While the immediate trace mark feature isn't implemented on ia64, > > > > there is no major performance regression. So, I think that we > > > > don't have any issues to propose merging marker point patches > > > > into Linus's tree from the viewpoint of performance impact. > > > > > > Performance is atm the least of the concerns regarding this work. > > > > > > I'm still convinced markers are too ugly to live. > > > > > > I also worry greatly about the fact that its too easy to expose too much > > > to user-space. There are no clear rules and the free form marker format > > > just begs for an inconsistent mess to arise. > > > > > > IMHO the current free-form trace_mark() should be removed from the tree > > > - its great for ad-hoc debugging but its a disaster waiting to happen > > > for anything else. Anybody doing ad-hoc debugging can patch it in > > > themselves if needed. > > > > > > Regular trace points can be custom made; this has the advantages that it > > > raises the implementation barrier and hopefully that encourages some > > > thought in the process. It also avoid the code from growing into > > > something that looks like someone had a long night of debugging. > > > > > > > Maybe we could settle for an intermediate solution : I agree with you > > that defining the trace points in headers, like you did for the > > scheduler, makes the code much cleaner and makes things much easier to > > maintain afterward. However, having the trace_mark mechanism underneath > > helps a lot in plugging a generic tracer (actually, if we can settle the > > marker issue, I've got a kernel tracer, LTTng, that I've been waiting > > for quite a while to push to mainline that I would like to post someday). > > > > So I would be in favor of requiring tracing statements to be described > > in static inline functions, in header files, that could preferably call > > trace_mark() and optionally also call other in-kernel tracers directly. > > > > Ideally, we could re-use the immediate values infrastructure to control > > activation of these trace points with minimal impact on the system. > > > > One of my goal is to provide a mechanism that can feed both non-debug > > and debug information to a generic tracing mechanism to allow > > system-wide analysis of the kernel, both for production system and > > kernel debugging. > > So are you proposing something like: > > static inline void > trace_sched_switch(struct task_struct *prev, struct task_struct *next) > { > trace_mark(sched_switch, prev, next); > } > Not exactly. Something more along the lines of static inline void trace_sched_switch(struct task_struct *prev, struct task_struct *next) { /* Internal tracers. */ ftrace_sched_switch(prev, next); othertracer_sched_switch(prev, next); /* * System-wide tracing. Useful information is exported here. * Probes connecting to these markers are expected to only use the * information provided to them for data collection purpose. Type * casting pointers is discouraged. */ trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld", prev->pid, next->pid, prev->state); } > dropping the silly fmt string but using the multiplex of trace_mark, and > then doing the stringify bit: > > "prev_pid %d next_pid %d prev_state %ld\n" > > in the actual tracer? > It would make much more sense to put this formatting information along with the trace point (e.g. in a a kernel/sched-trace.h header) rather that to hide it in a tracer (loadable module) because this information is an interface to the trace point. > > IMHO the 'type safety' of the fmt string is over-rated, since it cannot > distinguish between a task_struct * or a bio *, both are a pointers - > and half arsed type safely is worse than no type safety. > I totally agree with you that not having the capacity to inspect pointer types is a problem for tracers which wants to receive the "raw" pointer and deal with the data they need like big boys. On the other hand, it requires them to be closely tied to the kernel internals and therefore it makes sense to call them directly from the tracing site, thus bypassing the marker format string. However, letting the marker specify the data format so a tracer could format it into a memory buffer (in a binary or text format, depending on the implementation) or so that a tool like systemtap can use this identified information without having to be closely tied to the kernel makes sense to me. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-04 23:22 ` Mathieu Desnoyers @ 2008-06-05 8:12 ` Peter Zijlstra 2008-06-05 14:28 ` Masami Hiramatsu 2008-06-12 13:53 ` Mathieu Desnoyers 0 siblings, 2 replies; 30+ messages in thread From: Peter Zijlstra @ 2008-06-05 8:12 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt On Wed, 2008-06-04 at 19:22 -0400, Mathieu Desnoyers wrote: > * Peter Zijlstra (peterz@infradead.org) wrote: > > So are you proposing something like: > > > > static inline void > > trace_sched_switch(struct task_struct *prev, struct task_struct *next) > > { > > trace_mark(sched_switch, prev, next); > > } > > > > Not exactly. Something more along the lines of > > static inline void > trace_sched_switch(struct task_struct *prev, struct task_struct *next) > { > /* Internal tracers. */ > ftrace_sched_switch(prev, next); > othertracer_sched_switch(prev, next); > /* > * System-wide tracing. Useful information is exported here. > * Probes connecting to these markers are expected to only use the > * information provided to them for data collection purpose. Type > * casting pointers is discouraged. > */ > trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld", > prev->pid, next->pid, prev->state); > } Advantage of my method would be that ftrace (and othertracer) can use the same marker and doesn't need yet another hoook. > > dropping the silly fmt string but using the multiplex of trace_mark, and > > then doing the stringify bit: > > > > "prev_pid %d next_pid %d prev_state %ld\n" > > > > in the actual tracer? > > > > It would make much more sense to put this formatting information along > with the trace point (e.g. in a a kernel/sched-trace.h header) rather > that to hide it in a tracer (loadable module) because this information > is an interface to the trace point. I'm not sure - it seems to me it should be part of the tracer because its a detail/subset of the actual data - rendering it useless for others who'd like a different set. > > IMHO the 'type safety' of the fmt string is over-rated, since it cannot > > distinguish between a task_struct * or a bio *, both are a pointers - > > and half arsed type safely is worse than no type safety. > > > > I totally agree with you that not having the capacity to inspect pointer > types is a problem for tracers which wants to receive the "raw" pointer > and deal with the data they need like big boys. On the other hand, it > requires them to be closely tied to the kernel internals and therefore > it makes sense to call them directly from the tracing site, thus > bypassing the marker format string. > > However, letting the marker specify the data format so a tracer could > format it into a memory buffer (in a binary or text format, depending on > the implementation) or so that a tool like systemtap can use this > identified information without having to be closely tied to the kernel > makes sense to me. So s-tap is meant to parse this sting and interpret the varargs without being closely tied to the kernel? - Somehow that doesn't make me feel warm and fuzzy. That not only ties userspace to the information present in the marker, but to the actual string as well. The stronger you make this bind the less I like it. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-05 8:12 ` Peter Zijlstra @ 2008-06-05 14:28 ` Masami Hiramatsu 2008-06-12 14:04 ` Mathieu Desnoyers 2008-06-12 13:53 ` Mathieu Desnoyers 1 sibling, 1 reply; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-05 14:28 UTC (permalink / raw) To: Peter Zijlstra, Mathieu Desnoyers Cc: Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi Peter and Mathieu, Peter Zijlstra wrote: > On Wed, 2008-06-04 at 19:22 -0400, Mathieu Desnoyers wrote: >> * Peter Zijlstra (peterz@infradead.org) wrote: > >>> So are you proposing something like: >>> >>> static inline void >>> trace_sched_switch(struct task_struct *prev, struct task_struct *next) >>> { >>> trace_mark(sched_switch, prev, next); >>> } >>> >> Not exactly. Something more along the lines of >> >> static inline void >> trace_sched_switch(struct task_struct *prev, struct task_struct *next) >> { >> /* Internal tracers. */ >> ftrace_sched_switch(prev, next); >> othertracer_sched_switch(prev, next); >> /* >> * System-wide tracing. Useful information is exported here. >> * Probes connecting to these markers are expected to only use the >> * information provided to them for data collection purpose. Type >> * casting pointers is discouraged. >> */ >> trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld", >> prev->pid, next->pid, prev->state); >> } > > Advantage of my method would be that ftrace (and othertracer) can use > the same marker and doesn't need yet another hoook. If so, I'd like to suggest below changes, - introduce below macro in marker.h #define DEFINE_TRACE(name, vargs, args...) \ static inline void trace_##name vargs \ { \ trace_mark(name, #vargs, ##args); \ } - remove __marker_check_format from __trace_mark - and write a definition in sched_trace.h DEFINE_TRACE(sched_switch, (struct task_struct *prev, struct task_struct *next), prev, next); Thus, we can remove fmt string and also ensure the type checking, because; - Type checking at the trace point is done by the compiler. - Type checking of probe handler can be done by comparing #vargs strings. Thanks, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-05 14:28 ` Masami Hiramatsu @ 2008-06-12 14:04 ` Mathieu Desnoyers 2008-06-12 15:31 ` Masami Hiramatsu 0 siblings, 1 reply; 30+ messages in thread From: Mathieu Desnoyers @ 2008-06-12 14:04 UTC (permalink / raw) To: Masami Hiramatsu Cc: Peter Zijlstra, Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi Masami, * Masami Hiramatsu (mhiramat@redhat.com) wrote: > Hi Peter and Mathieu, > > Peter Zijlstra wrote: > > On Wed, 2008-06-04 at 19:22 -0400, Mathieu Desnoyers wrote: > >> * Peter Zijlstra (peterz@infradead.org) wrote: > > > >>> So are you proposing something like: > >>> > >>> static inline void > >>> trace_sched_switch(struct task_struct *prev, struct task_struct *next) > >>> { > >>> trace_mark(sched_switch, prev, next); > >>> } > >>> > >> Not exactly. Something more along the lines of > >> > >> static inline void > >> trace_sched_switch(struct task_struct *prev, struct task_struct *next) > >> { > >> /* Internal tracers. */ > >> ftrace_sched_switch(prev, next); > >> othertracer_sched_switch(prev, next); > >> /* > >> * System-wide tracing. Useful information is exported here. > >> * Probes connecting to these markers are expected to only use the > >> * information provided to them for data collection purpose. Type > >> * casting pointers is discouraged. > >> */ > >> trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld", > >> prev->pid, next->pid, prev->state); > >> } > > > > Advantage of my method would be that ftrace (and othertracer) can use > > the same marker and doesn't need yet another hoook. > > If so, I'd like to suggest below changes, > > - introduce below macro in marker.h > > #define DEFINE_TRACE(name, vargs, args...) \ > static inline void trace_##name vargs \ > { \ > trace_mark(name, #vargs, ##args); \ > } > > - remove __marker_check_format from __trace_mark > - and write a definition in sched_trace.h > > DEFINE_TRACE(sched_switch, (struct task_struct *prev, struct task_struct *next), > prev, next); > > Thus, we can remove fmt string and also ensure the type checking, because; > - Type checking at the trace point is done by the compiler. > - Type checking of probe handler can be done by comparing #vargs strings. > Hrm, interesting! The only problem I see with this is that it won't allow a tracer to efficiently parse the "format information". Parsing C code is not as straightforward and compact as parsing a format string. However, Peter and you are about to convince me that an hybrid between the solution you propose here and the marker scheme could be used. Your scheme could be used to declare the markers and probes (DEFINE_TRACE()) in header files. It would declare a static inline function called at the instrumentation site and a probe prototype that could then be used in the probe module to declare the probe that will have to be connected to the marker. This part would allow custom-made probes. Within the tracer, we would declare custom-made probes for each of these DEFINE_TRACE statements and associate them with format strings. Because the probe has to match the prototype, type correctness is ensured. The format strings would at that point be the exact same as the current trace_mark() statements. The information passed to trace_mark() would be send for direct interpretation or serialization with only basic types available, similar to printk(). We sould leave the trace_mark() statements available for people who want to add their own debug-style instrumentation to their kernel without having to add DEFINE_TRACE() statements and modify the tracer accordingly. I guess a bit of polishing will come with the implementation, which I plan to start soon. Thanks! Mathieu > Thanks, > > -- > Masami Hiramatsu > > Software Engineer > Hitachi Computer Products (America) Inc. > Software Solutions Division > > e-mail: mhiramat@redhat.com > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 14:04 ` Mathieu Desnoyers @ 2008-06-12 15:31 ` Masami Hiramatsu 0 siblings, 0 replies; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-12 15:31 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Peter Zijlstra, Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi Mathieu, Mathieu Desnoyers wrote: >> If so, I'd like to suggest below changes, >> >> - introduce below macro in marker.h >> >> #define DEFINE_TRACE(name, vargs, args...) \ >> static inline void trace_##name vargs \ >> { \ >> trace_mark(name, #vargs, ##args); \ >> } >> >> - remove __marker_check_format from __trace_mark >> - and write a definition in sched_trace.h >> >> DEFINE_TRACE(sched_switch, (struct task_struct *prev, struct task_struct *next), >> prev, next); >> >> Thus, we can remove fmt string and also ensure the type checking, because; >> - Type checking at the trace point is done by the compiler. >> - Type checking of probe handler can be done by comparing #vargs strings. >> > > Hrm, interesting! The only problem I see with this is that it won't > allow a tracer to efficiently parse the "format information". Parsing C > code is not as straightforward and compact as parsing a format string. Sure, Parsing C code is not a good idea. I think each tracer can have a lookup table to get a printf-style format corresponding to each "regular" marking point. Maintaining this lookup table is not so hard, because these "regular" marking points should be enough stable. > However, Peter and you are about to convince me that an hybrid between > the solution you propose here and the marker scheme could be used. > > Your scheme could be used to declare the markers and probes > (DEFINE_TRACE()) in header files. It would declare a static inline > function called at the instrumentation site and a probe prototype > that could then be used in the probe module to declare the probe that > will have to be connected to the marker. This part would allow > custom-made probes. > > Within the tracer, we would declare custom-made probes for each of these > DEFINE_TRACE statements and associate them with format strings. Because > the probe has to match the prototype, type correctness is ensured. The > format strings would at that point be the exact same as the current > trace_mark() statements. The information passed to trace_mark() would be > send for direct interpretation or serialization with only basic types > available, similar to printk(). If the tracer including systemtap introduces above the lookup table, that can translate "name(arguments)" to "format" easily, and can continue to use its format string parser. > We sould leave the trace_mark() statements available for people who want > to add their own debug-style instrumentation to their kernel without > having to add DEFINE_TRACE() statements and modify the tracer > accordingly. I agree, trace_mark() still useful for "non-regular" markers temporarily inserted to the developing code by individual developers. > I guess a bit of polishing will come with the implementation, which I > plan to start soon. > > Thanks! > > Mathieu > Thank you! -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-05 8:12 ` Peter Zijlstra 2008-06-05 14:28 ` Masami Hiramatsu @ 2008-06-12 13:53 ` Mathieu Desnoyers 2008-06-12 14:27 ` Peter Zijlstra 1 sibling, 1 reply; 30+ messages in thread From: Mathieu Desnoyers @ 2008-06-12 13:53 UTC (permalink / raw) To: Peter Zijlstra Cc: Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt, Frank Ch. Eigler Hi Peter, * Peter Zijlstra (peterz@infradead.org) wrote: > On Wed, 2008-06-04 at 19:22 -0400, Mathieu Desnoyers wrote: > > * Peter Zijlstra (peterz@infradead.org) wrote: > > > > So are you proposing something like: > > > > > > static inline void > > > trace_sched_switch(struct task_struct *prev, struct task_struct *next) > > > { > > > trace_mark(sched_switch, prev, next); > > > } > > > > > > > Not exactly. Something more along the lines of > > > > static inline void > > trace_sched_switch(struct task_struct *prev, struct task_struct *next) > > { > > /* Internal tracers. */ > > ftrace_sched_switch(prev, next); > > othertracer_sched_switch(prev, next); > > /* > > * System-wide tracing. Useful information is exported here. > > * Probes connecting to these markers are expected to only use the > > * information provided to them for data collection purpose. Type > > * casting pointers is discouraged. > > */ > > trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld", > > prev->pid, next->pid, prev->state); > > } > > Advantage of my method would be that ftrace (and othertracer) can use > the same marker and doesn't need yet another hoook. > Am I correct by saying that the method you propose completely removes type checking between the instrumentation site and what the probes expect ? If yes, this seems to be too fragile. Every time a marker would change, one would have to audit _every_ probes, both in-kernel and in modules. Adding type checking to the marker infrastructure makes automatic detection of these changes possible. > > > dropping the silly fmt string but using the multiplex of trace_mark, and > > > then doing the stringify bit: > > > > > > "prev_pid %d next_pid %d prev_state %ld\n" > > > > > > in the actual tracer? > > > > > > > It would make much more sense to put this formatting information along > > with the trace point (e.g. in a a kernel/sched-trace.h header) rather > > that to hide it in a tracer (loadable module) because this information > > is an interface to the trace point. > > I'm not sure - it seems to me it should be part of the tracer because > its a detail/subset of the actual data - rendering it useless for others > who'd like a different set. > If it ends up elsewhere, then we have to ensure type correctness in some way. > > > IMHO the 'type safety' of the fmt string is over-rated, since it cannot > > > distinguish between a task_struct * or a bio *, both are a pointers - > > > and half arsed type safely is worse than no type safety. > > > > > > > I totally agree with you that not having the capacity to inspect pointer > > types is a problem for tracers which wants to receive the "raw" pointer > > and deal with the data they need like big boys. On the other hand, it > > requires them to be closely tied to the kernel internals and therefore > > it makes sense to call them directly from the tracing site, thus > > bypassing the marker format string. > > > > However, letting the marker specify the data format so a tracer could > > format it into a memory buffer (in a binary or text format, depending on > > the implementation) or so that a tool like systemtap can use this > > identified information without having to be closely tied to the kernel > > makes sense to me. > > So s-tap is meant to parse this sting and interpret the varargs without > being closely tied to the kernel? - Somehow that doesn't make me feel > warm and fuzzy. That not only ties userspace to the information present > in the marker, but to the actual string as well. > > The stronger you make this bind the less I like it. > Well, the string contains each field name and type. Therefore, SystemTAP can hook on a marker and parse the string looking for some elements by passing a NULL format string upon probe registration. Alternatively, it can provide the exact format string expected when it registers its probe to the marker and a check will be done to verify that the format string passed along with the registered probe matches the marker format string. Also, about what you said earlier in this thread : "Regular trace points can be custom made; this has the advantages that it raises the implementation barrier and hopefully that encourages some thought in the process. It also avoid the code from growing into something that looks like someone had a long night of debugging." Before it has been moved to the markers, LTTng was once designed with custom-made code to save the trace information through custom hooks. To help maintainers instrument their own subsystem and do the right choice without being a tracing expert, we created a code generator which generated this custom code for each trace point given a description of the trace points. It turned out that keeping this duplicate list of trace points was cumbersome and that the generated code did eat a lot of instruction cache. This is why to turned to markers, so we could re-use a common infrastructure to serialize the data into trace buffers. We turned to the marker format string to allow the types to serialize to be parsed efficiently by the tracer. I strongly recommend not to declare the types associated with a kernel trace point in two unrelated locations without type checking in-between them (e.g. trace_mark in kernel code, string in the tracer module), because it would then become harder to track consistency when the code changes. However, I would not be against an hybrid of Masami's proposal and current markers, which I will propose in reply to his email. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 13:53 ` Mathieu Desnoyers @ 2008-06-12 14:27 ` Peter Zijlstra 2008-06-12 15:53 ` Frank Ch. Eigler 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2008-06-12 14:27 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt, Frank Ch. Eigler On Thu, 2008-06-12 at 09:53 -0400, Mathieu Desnoyers wrote: > Hi Peter, > > * Peter Zijlstra (peterz@infradead.org) wrote: > > On Wed, 2008-06-04 at 19:22 -0400, Mathieu Desnoyers wrote: > > > * Peter Zijlstra (peterz@infradead.org) wrote: > > > > > > So are you proposing something like: > > > > > > > > static inline void > > > > trace_sched_switch(struct task_struct *prev, struct task_struct *next) > > > > { > > > > trace_mark(sched_switch, prev, next); > > > > } > > > > > > > > > > Not exactly. Something more along the lines of > > > > > > static inline void > > > trace_sched_switch(struct task_struct *prev, struct task_struct *next) > > > { > > > /* Internal tracers. */ > > > ftrace_sched_switch(prev, next); > > > othertracer_sched_switch(prev, next); > > > /* > > > * System-wide tracing. Useful information is exported here. > > > * Probes connecting to these markers are expected to only use the > > > * information provided to them for data collection purpose. Type > > > * casting pointers is discouraged. > > > */ > > > trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld", > > > prev->pid, next->pid, prev->state); > > > } > > > > Advantage of my method would be that ftrace (and othertracer) can use > > the same marker and doesn't need yet another hoook. > > > > Am I correct by saying that the method you propose completely removes > type checking between the instrumentation site and what the probes > expect ? If yes, this seems to be too fragile. Every time a marker would > change, one would have to audit _every_ probes, both in-kernel and in > modules. Adding type checking to the marker infrastructure makes > automatic detection of these changes possible. would be as simple as: git grep sched_switch every time someone changes trace_sched_switch() arguments. Doesn't seem too hard, you could even make checkpatch remind you to do that if it sees a change to a trace_* function. The down-side of runtime type checking (of which Masami's proposal is the best so far), is that you'll still not find the breakage until someone actually tries to use a tracer - so you'll still need the above. > > > > dropping the silly fmt string but using the multiplex of trace_mark, and > > > > then doing the stringify bit: > > > > > > > > "prev_pid %d next_pid %d prev_state %ld\n" > > > > > > > > in the actual tracer? > > > > > > > > > > It would make much more sense to put this formatting information along > > > with the trace point (e.g. in a a kernel/sched-trace.h header) rather > > > that to hide it in a tracer (loadable module) because this information > > > is an interface to the trace point. > > > > I'm not sure - it seems to me it should be part of the tracer because > > its a detail/subset of the actual data - rendering it useless for others > > who'd like a different set. > > > > If it ends up elsewhere, then we have to ensure type correctness in some > way. Sure, idealy we'd want compile time type safety. We'd want trace_sched_switch()'s arguments to match trace_sched_switch_handler()'s arguments, and a compile time error if this is not the case. However - we cannot seem to get that. Runtime type safety just doesn't help this case. But the point I was making here is that: trace_sched_switch(prev->pid, next->pid, next->state) could be useless for some other tracer who'd want: trace_sched_switch(prev->vruntime, next->vruntime) Also, the ->pid stuff isn't even alive on the normal code path, so adding that to the marker also bloats the code generated there. So by using the marker: trace_sched_switch(prev, next) We can have various tracers that display different information and avoid livelyness issues. > > > > IMHO the 'type safety' of the fmt string is over-rated, since it cannot > > > > distinguish between a task_struct * or a bio *, both are a pointers - > > > > and half arsed type safely is worse than no type safety. > > > > > > > > > > I totally agree with you that not having the capacity to inspect pointer > > > types is a problem for tracers which wants to receive the "raw" pointer > > > and deal with the data they need like big boys. On the other hand, it > > > requires them to be closely tied to the kernel internals and therefore > > > it makes sense to call them directly from the tracing site, thus > > > bypassing the marker format string. > > > > > > However, letting the marker specify the data format so a tracer could > > > format it into a memory buffer (in a binary or text format, depending on > > > the implementation) or so that a tool like systemtap can use this > > > identified information without having to be closely tied to the kernel > > > makes sense to me. > > > > So s-tap is meant to parse this sting and interpret the varargs without > > being closely tied to the kernel? - Somehow that doesn't make me feel > > warm and fuzzy. That not only ties userspace to the information present > > in the marker, but to the actual string as well. > > > > The stronger you make this bind the less I like it. > > > > Well, the string contains each field name and type. Therefore, SystemTAP > can hook on a marker and parse the string looking for some elements by > passing a NULL format string upon probe registration. Alternatively, it > can provide the exact format string expected when it registers its probe > to the marker and a check will be done to verify that the format string > passed along with the registered probe matches the marker format string. Yes, I get that, its one of the ugliest things I've met in this whole marker story. Why can't stap not insert a normal trace handler that extracts the information from prev/next it wants? > Also, about what you said earlier in this thread : > "Regular trace points can be custom made; this has the advantages that > it raises the implementation barrier and hopefully that encourages some > thought in the process. It also avoid the code from growing into > something that looks like someone had a long night of debugging." > > Before it has been moved to the markers, LTTng was once designed with > custom-made code to save the trace information through custom hooks. To > help maintainers instrument their own subsystem and do the right choice > without being a tracing expert, > we created a code generator which > generated this custom code for each trace point given a description of > the trace points. > It turned out that keeping this duplicate list of > trace points was cumbersome and that the generated code did eat a lot of > instruction cache. Well, your last proposal of static inline functions basically returns thereto. So what was cumbersome about it? The I$ issue is unfortunate indeed - but it seems to be the price to pay for compile time type safety. As for that code-generator, that seems a sane idea, esp if the input file is simply a regular C header file with trace point definitions. > This is why to turned to markers, so we could re-use > a common infrastructure to serialize the data into trace buffers. We > turned to the marker format string to allow the types to serialize to be > parsed efficiently by the tracer. I strongly recommend not to declare > the types associated with a kernel trace point in two unrelated > locations without type checking in-between them (e.g. trace_mark in > kernel code, string in the tracer module), because it would then become > harder to track consistency when the code changes. I see the value of trace_mark() in debugging sessions, but merging these things is like merging the resulting code file after a printk debugging session. > However, I would not be against an hybrid of Masami's proposal and > current markers, which I will propose in reply to his email. Ah - I'm looking forward.. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 14:27 ` Peter Zijlstra @ 2008-06-12 15:53 ` Frank Ch. Eigler 2008-06-12 16:16 ` Masami Hiramatsu 2008-06-12 16:53 ` Peter Zijlstra 0 siblings, 2 replies; 30+ messages in thread From: Frank Ch. Eigler @ 2008-06-12 15:53 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt Hi - On Thu, Jun 12, 2008 at 04:27:03PM +0200, Peter Zijlstra wrote: > [...] > > Well, the string contains each field name and type. Therefore, SystemTAP > > can hook on a marker and parse the string looking for some elements by > > passing a NULL format string upon probe registration. Alternatively, it > > can provide the exact format string expected when it registers its probe > > to the marker and a check will be done to verify that the format string > > passed along with the registered probe matches the marker format string. > > Yes, I get that, its one of the ugliest things I've met in this whole > marker story. Why can't stap not insert a normal trace handler that > extracts the information from prev/next it wants? [...] Think this through. How should systemtap (or another user-space separate-compiled tool like lttng) do this exactly? (a) rely on debugging information? Even assuming we could get proper anchors (PC addresses or unambiguous type names) to start searching dwarf data, we lose a key attractions of markers: that it can robustly transfer data *without* dwarf data kept around. (b) rely on hand-written C code (prototypes, pointer derefrencing wrappers) distributed with systemtap? Not only would this be a brittle maintenance pain in the form of cude duplication, but then errors in it couldn't even be detected until the final C compilation stage. That would make a lousy user experience. (c) have systemtap try to parse the mhiramat-proposed "(struct task_struct * next, struct task_struct * prev)" format strings? Then we're asking systemtap to parse potentially general C type expressions, find the kernel headers that declare the types? Parse available subfields? That seems too much to ask for. (d) or another way? - FChE ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 15:53 ` Frank Ch. Eigler @ 2008-06-12 16:16 ` Masami Hiramatsu 2008-06-12 16:43 ` Frank Ch. Eigler 2008-06-12 16:53 ` Peter Zijlstra 1 sibling, 1 reply; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-12 16:16 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Peter Zijlstra, Mathieu Desnoyers, Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi Frank, Frank Ch. Eigler wrote: > Hi - > > On Thu, Jun 12, 2008 at 04:27:03PM +0200, Peter Zijlstra wrote: >> [...] >>> Well, the string contains each field name and type. Therefore, SystemTAP >>> can hook on a marker and parse the string looking for some elements by >>> passing a NULL format string upon probe registration. Alternatively, it >>> can provide the exact format string expected when it registers its probe >>> to the marker and a check will be done to verify that the format string >>> passed along with the registered probe matches the marker format string. >> Yes, I get that, its one of the ugliest things I've met in this whole >> marker story. Why can't stap not insert a normal trace handler that >> extracts the information from prev/next it wants? [...] > > Think this through. How should systemtap (or another user-space > separate-compiled tool like lttng) do this exactly? > > (a) rely on debugging information? Even assuming we could get proper > anchors (PC addresses or unambiguous type names) to start > searching dwarf data, we lose a key attractions of markers: that > it can robustly transfer data *without* dwarf data kept around. > > (b) rely on hand-written C code (prototypes, pointer derefrencing > wrappers) distributed with systemtap? Not only would this be a > brittle maintenance pain in the form of cude duplication, but then > errors in it couldn't even be detected until the final C > compilation stage. That would make a lousy user experience. > > (c) have systemtap try to parse the mhiramat-proposed "(struct > task_struct * next, struct task_struct * prev)" format strings? > Then we're asking systemtap to parse potentially general C type > expressions, find the kernel headers that declare the types? > Parse available subfields? That seems too much to ask for. > > (d) or another way? use a lookup table. we can expect that the marking points which regularly inserted in the upstream kernel are stable(not so frequently change). In that case, we can easily maintain a lookup table which has pairs of format strings like as "sched_switch(struct task_struct * next, struct task_struct * prev)":"next %p prev %p" out of tree. Thus, you can use the printf-style format parser. This actually is a kind of duplication, but in this way, I think we can detect errors before generating C code, and easily add lookup pairs of format strings. (additionally, we can choose %s or %p for "char *" ;-)) > > > - FChE Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 16:16 ` Masami Hiramatsu @ 2008-06-12 16:43 ` Frank Ch. Eigler 2008-06-12 16:56 ` Peter Zijlstra 2008-06-12 17:05 ` Masami Hiramatsu 0 siblings, 2 replies; 30+ messages in thread From: Frank Ch. Eigler @ 2008-06-12 16:43 UTC (permalink / raw) To: Masami Hiramatsu Cc: Peter Zijlstra, Mathieu Desnoyers, Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi - On Thu, Jun 12, 2008 at 12:16:35PM -0400, Masami Hiramatsu wrote: > [...] > > Think this through. How should systemtap (or another user-space > > separate-compiled tool like lttng) do this exactly? > > [...] > > (d) or another way? > > use a lookup table. we can expect that the marking points which > regularly inserted in the upstream kernel are stable(not so > frequently change). In that case, we can easily maintain > a lookup table which has pairs of format strings like as > "sched_switch(struct task_struct * next, struct task_struct * prev)":"next %p prev %p" > out of tree. Thus, you can use the printf-style format parser. That's an interesting idea, but errors in this table would themselves only be caught at C compilation time. Worse, it does nothing helpful for actually pulling out the next/prev fields of interest. Remember, real tracing users don't care so much about the task_struct pointers, but about observable things like PIDs. Systemtap would be back to the debuginfo or C-header-guessing/parsing job (or embedded-C, yuck). This is another reason why markers are a nice solution. They allow passing of actual useful values: not just the %p pointers but the most interesting derived values (e.g., prev->pid). And they do this *efficiently* - in out-of-line code that imposes no measurable overhead in the normal case.. - FChE ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 16:43 ` Frank Ch. Eigler @ 2008-06-12 16:56 ` Peter Zijlstra 2008-06-12 22:10 ` Mathieu Desnoyers 2008-06-12 17:05 ` Masami Hiramatsu 1 sibling, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2008-06-12 16:56 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Masami Hiramatsu, Mathieu Desnoyers, Hideo AOKI, mingo, linux-kernel, Steven Rostedt On Thu, 2008-06-12 at 12:43 -0400, Frank Ch. Eigler wrote: > This is another reason why markers are a nice solution. They allow > passing of actual useful values: not just the %p pointers but the most > interesting derived values (e.g., prev->pid). Useful to whoem? stap isn't the holy grail of tracing and certainly not the only consumer of trace points, so restricting the information to just what stap needs is actually making trace points less useful. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 16:56 ` Peter Zijlstra @ 2008-06-12 22:10 ` Mathieu Desnoyers 0 siblings, 0 replies; 30+ messages in thread From: Mathieu Desnoyers @ 2008-06-12 22:10 UTC (permalink / raw) To: Peter Zijlstra Cc: Frank Ch. Eigler, Masami Hiramatsu, Hideo AOKI, mingo, linux-kernel, Steven Rostedt * Peter Zijlstra (peterz@infradead.org) wrote: > On Thu, 2008-06-12 at 12:43 -0400, Frank Ch. Eigler wrote: > > > This is another reason why markers are a nice solution. They allow > > passing of actual useful values: not just the %p pointers but the most > > interesting derived values (e.g., prev->pid). > > Useful to whoem? stap isn't the holy grail of tracing and certainly not > the only consumer of trace points, so restricting the information to > just what stap needs is actually making trace points less useful. > LTTng also currently relies on the markers identifying useful data and ideally won't contain any tracepoint-specific code to extract subfields from structures. By the way, a point that should be clarified is that putting a "prev->pid" field in a marker does not make this variable live unless the marker is activated. When disabled, the marker site jumps over the stack setup, including any pointer dereference. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 16:43 ` Frank Ch. Eigler 2008-06-12 16:56 ` Peter Zijlstra @ 2008-06-12 17:05 ` Masami Hiramatsu 2008-06-12 17:48 ` Frank Ch. Eigler 1 sibling, 1 reply; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-12 17:05 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Peter Zijlstra, Mathieu Desnoyers, Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi, Frank Ch. Eigler wrote: > Hi - > > On Thu, Jun 12, 2008 at 12:16:35PM -0400, Masami Hiramatsu wrote: >> [...] >>> Think this through. How should systemtap (or another user-space >>> separate-compiled tool like lttng) do this exactly? >>> [...] >>> (d) or another way? >> use a lookup table. we can expect that the marking points which >> regularly inserted in the upstream kernel are stable(not so >> frequently change). In that case, we can easily maintain >> a lookup table which has pairs of format strings like as >> "sched_switch(struct task_struct * next, struct task_struct * prev)":"next %p prev %p" >> out of tree. Thus, you can use the printf-style format parser. > > That's an interesting idea, but errors in this table would themselves > only be caught at C compilation time. Hmm, why would you think so? I think if we can't find corresponding entry from the lookup table, it becomes an error. > Worse, it does nothing helpful > for actually pulling out the next/prev fields of interest. Remember, > real tracing users don't care so much about the task_struct pointers, > but about observable things like PIDs. Systemtap would be back to the > debuginfo or C-header-guessing/parsing job (or embedded-C, yuck). Yeah, but that is same as previous marker. It depends on what parameter the kernel pass to the marker. I mean, the parameter issue is not an issue of the marker framework, but a discussion point of marking points. > This is another reason why markers are a nice solution. They allow > passing of actual useful values: not just the %p pointers but the most > interesting derived values (e.g., prev->pid). And they do this > *efficiently* - in out-of-line code that imposes no measurable > overhead in the normal case.. Even if you use trace_mark() markers, you have to post a kernel patch which passes the prev->pid to the marking point and to discuss it. for example, DEFINE_TRACE(sched_switch, (int prev_pid, int next_pid), prev_pid, next_pid) But it might not so general, we have to discuss what parameters are enough good for each marking point. > > > - FChE Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 17:05 ` Masami Hiramatsu @ 2008-06-12 17:48 ` Frank Ch. Eigler 2008-06-12 19:34 ` Masami Hiramatsu 0 siblings, 1 reply; 30+ messages in thread From: Frank Ch. Eigler @ 2008-06-12 17:48 UTC (permalink / raw) To: Masami Hiramatsu Cc: Peter Zijlstra, Mathieu Desnoyers, Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi - On Thu, Jun 12, 2008 at 01:05:52PM -0400, Masami Hiramatsu wrote: > [...] > >> "sched_switch(struct task_struct * next, struct task_struct * prev)":"next %p prev %p" > >> out of tree. Thus, you can use the printf-style format parser. > > > > That's an interesting idea, but errors in this table would themselves > > only be caught at C compilation time. > Hmm, why would you think so? I think if we can't find corresponding > entry from the lookup table, it becomes an error. Sure, but if the entry exists but is wrong, we'd emit C code that won't compile. > [...] Even if you use trace_mark() markers, you have to post a > kernel patch which passes the prev->pid to the marking point and to > discuss it. for example, > DEFINE_TRACE(sched_switch, (int prev_pid, int next_pid), prev_pid, next_pid) (If it were up to me, I would add the task pointers too, which debuginfo-less systemtap could ignore but other tracers may use.) > But it might not so general, we have to discuss what parameters are > enough good for each marking point. That's exactly what the "lttng instrumentation markers" threads from the recent past had started. - FChE ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 17:48 ` Frank Ch. Eigler @ 2008-06-12 19:34 ` Masami Hiramatsu 2008-06-13 4:19 ` Takashi Nishiie 0 siblings, 1 reply; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-12 19:34 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Peter Zijlstra, Mathieu Desnoyers, Hideo AOKI, mingo, linux-kernel, Steven Rostedt Hi Frank, Frank Ch. Eigler wrote: > Hi - > > On Thu, Jun 12, 2008 at 01:05:52PM -0400, Masami Hiramatsu wrote: >> [...] >>>> "sched_switch(struct task_struct * next, struct task_struct * prev)":"next %p prev %p" >>>> out of tree. Thus, you can use the printf-style format parser. >>> That's an interesting idea, but errors in this table would themselves >>> only be caught at C compilation time. > >> Hmm, why would you think so? I think if we can't find corresponding >> entry from the lookup table, it becomes an error. > > Sure, but if the entry exists but is wrong, we'd emit C code that > won't compile. I think if someone changes the trace point in the kernel, Module.markers is also changed. ex.) DEFINE_TRACE(sched_switch, (struct task_struct * next, struct task_struct * prev), next, prev); if someone changes above to below, DEFINE_TRACE(sched_switch, (int prev_pid, int next_pid), prev_pid, next_pid); Module.markers also change like sched_switch vmlinux (struct task_struct * next, struct task_struct * prev) to sched_switch vmlinux (int prev_pid, int next_pid) In this case, the below entry never matches to new Module.markers. "sched_switch(struct task_struct * next, struct task_struct * prev)":"next %p prev %p" Thus, we can find an error. However, of cause, we should take care of the task_struct changes. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Kernel marker has no performance impact on ia64. 2008-06-12 19:34 ` Masami Hiramatsu @ 2008-06-13 4:19 ` Takashi Nishiie 2008-06-13 18:02 ` Masami Hiramatsu 0 siblings, 1 reply; 30+ messages in thread From: Takashi Nishiie @ 2008-06-13 4:19 UTC (permalink / raw) To: 'Masami Hiramatsu', 'Frank Ch. Eigler' Cc: 'Peter Zijlstra', 'Mathieu Desnoyers', 'Hideo AOKI', mingo, linux-kernel, 'Steven Rostedt' Hi,Hiramatsu Hiramatsu wrote: >I think if someone changes the trace point in the kernel, >Module.markers is also changed. > >ex.) > DEFINE_TRACE(sched_switch, (struct task_struct * next, struct task_struct * prev), > next, prev); > >if someone changes above to below, > > DEFINE_TRACE(sched_switch, (int prev_pid, int next_pid), prev_pid, next_pid); > >Module.markers also change like > > sched_switch vmlinux (struct task_struct * next, struct task_struct * prev) > >to > > sched_switch vmlinux (int prev_pid, int next_pid) > >In this case, the below entry never matches to new Module.markers. > >"sched_switch(struct task_struct * next, struct task_struct * prev)":"next %p prev %p" > >Thus, we can find an error. >However, of cause, we should take care of the task_struct changes. The one expected of markers with SystemTap and LTTng is only different the foundation of this discussion. SystemTap: Because the output format is basically defined with user script in systemtap script, "format information" is unnecessary. It can do nothing but guess from "format information" in present markers though the type of the argument is indispensable for SystemTap on the other hand. If the type of an accurate argument is understood, correct processing can be executed. Moreover, being able to refer to the member of the structure if the argument is a pointer of the structure becomes possible, and it is convenient. LTTng: Because the output format is not originally defined in LTTng, "format information" is indispensable. I think that I only have to add the item that shows the type of an accurate argument if the format of "Module.markers" is changed. Thank you, -- Takashi Nishiie ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-13 4:19 ` Takashi Nishiie @ 2008-06-13 18:02 ` Masami Hiramatsu 2008-06-16 2:58 ` Takashi Nishiie 0 siblings, 1 reply; 30+ messages in thread From: Masami Hiramatsu @ 2008-06-13 18:02 UTC (permalink / raw) To: Takashi Nishiie Cc: 'Frank Ch. Eigler', 'Peter Zijlstra', 'Mathieu Desnoyers', 'Hideo AOKI', mingo, linux-kernel, 'Steven Rostedt' Hi Takashi, Thank you for comment. Takashi Nishiie wrote: > Hi,Hiramatsu > > Hiramatsu wrote: > >> I think if someone changes the trace point in the kernel, >> Module.markers is also changed. >> >> ex.) >> DEFINE_TRACE(sched_switch, (struct task_struct * next, struct task_struct > * prev), >> next, prev); >> >> if someone changes above to below, >> >> DEFINE_TRACE(sched_switch, (int prev_pid, int next_pid), prev_pid, > next_pid); >> Module.markers also change like >> >> sched_switch vmlinux (struct task_struct * next, struct task_struct * > prev) >> to >> >> sched_switch vmlinux (int prev_pid, int next_pid) >> >> In this case, the below entry never matches to new Module.markers. >> >> "sched_switch(struct task_struct * next, struct task_struct * prev)":"next > %p prev %p" >> Thus, we can find an error. >> However, of cause, we should take care of the task_struct changes. > > The one expected of markers with SystemTap and LTTng is only > different the foundation of this discussion. > > SystemTap: > Because the output format is basically defined with user script in > systemtap script, "format information" is unnecessary. It can do > nothing but guess from "format information" in present markers though > the type of the argument is indispensable for SystemTap on the other > hand. If the type of an accurate argument is understood, correct > processing can be executed. Moreover, being able to refer to the > member of the structure if the argument is a pointer of the structure > becomes possible, and it is convenient. It's not so easy for systemtap... for example, if a marker pass 'char *', how can we do it? it might have to record as a string, or might have to record as a pointer. Obviously, printf-style information includes a kind of 'semantic' information. So, I think the 'lookup table' is good. > LTTng: > Because the output format is not originally defined in LTTng, "format > information" is indispensable. Sure, however, LTTng can have original "format information" itself. > I think that I only have to add the item that shows the type of an > accurate argument if the format of "Module.markers" is changed. I think those 'semantic' type information (which informs you what is this value and how this value should be recorded) should be defined by each tracer, because what information they want to extract from arguments are strongly depends on their use cases. Actually, that requires tracers to pay maintenance costs, but it's not so big if the "regular" marking point is enough stable. Regards, > > Thank you, > > -- > Takashi Nishiie > > > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Kernel marker has no performance impact on ia64. 2008-06-13 18:02 ` Masami Hiramatsu @ 2008-06-16 2:58 ` Takashi Nishiie 0 siblings, 0 replies; 30+ messages in thread From: Takashi Nishiie @ 2008-06-16 2:58 UTC (permalink / raw) To: 'Masami Hiramatsu' Cc: 'Frank Ch. Eigler', 'Peter Zijlstra', 'Mathieu Desnoyers', 'Hideo AOKI', mingo, linux-kernel, 'Steven Rostedt' Hi. Takashi Nishiie wrote: >> I think that I only have to add the item that shows the type of an >> accurate argument if the format of "Module.markers" is changed. > Masami Hiramatsu wrote: >I think those 'semantic' type information (which informs you what >is this value and how this value should be recorded) should be >defined by each tracer, because what information they want to >extract from arguments are strongly depends on their use cases. > >Actually, that requires tracers to pay maintenance costs, but >it's not so big if the "regular" marking point is enough stable. Certainly, because it only has to be able to acquire name of the point that wants to be traced, value that can be referred, the type, and those three items if interchangeability with an existing tool is not thought, it might not have to stick to a detailed thing. By the way, 'pr_debug' and 'dev_debug' and 'DEBUGP', etc. might unite handling by replacing it with kernel markers. Thank you, -- Takashi Nishiie ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 15:53 ` Frank Ch. Eigler 2008-06-12 16:16 ` Masami Hiramatsu @ 2008-06-12 16:53 ` Peter Zijlstra 2008-06-12 17:38 ` Frank Ch. Eigler 1 sibling, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2008-06-12 16:53 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Mathieu Desnoyers, Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt On Thu, 2008-06-12 at 11:53 -0400, Frank Ch. Eigler wrote: > Hi - > > On Thu, Jun 12, 2008 at 04:27:03PM +0200, Peter Zijlstra wrote: > > [...] > > > Well, the string contains each field name and type. Therefore, SystemTAP > > > can hook on a marker and parse the string looking for some elements by > > > passing a NULL format string upon probe registration. Alternatively, it > > > can provide the exact format string expected when it registers its probe > > > to the marker and a check will be done to verify that the format string > > > passed along with the registered probe matches the marker format string. > > > > Yes, I get that, its one of the ugliest things I've met in this whole > > marker story. Why can't stap not insert a normal trace handler that > > extracts the information from prev/next it wants? [...] > > Think this through. How should systemtap (or another user-space > separate-compiled tool like lttng) do this exactly? lttng has trace handlers to write data into some buffer, right? The trace point need not be concerned with which data, nor into what buffer. > (a) rely on debugging information? Even assuming we could get proper > anchors (PC addresses or unambiguous type names) to start > searching dwarf data, we lose a key attractions of markers: that > it can robustly transfer data *without* dwarf data kept around. Perhaps you can ship a reduced set of dwarf info tailored to contain the bits you need, like where to find ->pid in struct task_struct. Or you can hook into the lttng tracer. > (b) rely on hand-written C code (prototypes, pointer derefrencing > wrappers) distributed with systemtap? Not only would this be a > brittle maintenance pain in the form of cude duplication, but then > errors in it couldn't even be detected until the final C > compilation stage. That would make a lousy user experience. Not really sure what you mean here - I throught compile time errors were the goal?! > (c) have systemtap try to parse the mhiramat-proposed "(struct > task_struct * next, struct task_struct * prev)" format strings? > Then we're asking systemtap to parse potentially general C type > expressions, find the kernel headers that declare the types? > Parse available subfields? That seems too much to ask for. tcc and sourcefs sound like way fun ;-) > (d) or another way? Get your own tracer in kernel - that provides exactly what stap needs? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 16:53 ` Peter Zijlstra @ 2008-06-12 17:38 ` Frank Ch. Eigler 2008-06-13 11:01 ` Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Frank Ch. Eigler @ 2008-06-12 17:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt Hi - On Thu, Jun 12, 2008 at 06:53:41PM +0200, Peter Zijlstra wrote: > [...] > > Think this through. How should systemtap (or another user-space > > separate-compiled tool like lttng) do this exactly? > > lttng has trace handlers to write data into some buffer, right? > The trace point need not be concerned with which data, nor into what > buffer. The "which data" is definitely a trace point concern. It communicates from the developer to users as to what values are likely to be of interest. > > (a) rely on debugging information? Even assuming we could get proper > > anchors (PC addresses or unambiguous type names) to start > > searching dwarf data, we lose a key attractions of markers: that > > it can robustly transfer data *without* dwarf data kept around. > > Perhaps you can ship a reduced set of dwarf info [...] "I" don't ship or generate dwarf data. Distributors do. > > (b) rely on hand-written C code (prototypes, pointer derefrencing > > wrappers) distributed with systemtap? Not only would this be a > > brittle maintenance pain in the form of cude duplication, but then > > errors in it couldn't even be detected until the final C > > compilation stage. That would make a lousy user experience. > > Not really sure what you mean here - I throught compile time errors > were the goal?! For us, it's too late. In systemtap, we try to give people useful information when they make mistakes. For probes of whatever sort, we want to know the available data types and names, while just starting to process their script, so that we can check types and suggest alternatives. C code compilation is quite some way removed and is supposed to be a systemtap internal implementation detail. > > (c) have systemtap try to parse the mhiramat-proposed "(struct > > task_struct * next, struct task_struct * prev)" format strings? > > Then we're asking systemtap to parse potentially general C type > > expressions, find the kernel headers that declare the types? > > Parse available subfields? That seems too much to ask for. > > tcc and sourcefs sound like way fun ;-) Really... > > (d) or another way? > > Get your own tracer in kernel - that provides exactly what stap needs? You are missing that (a) this is the point of markers - to allow the the gajillion tracers a single place per event to hook through, and (b) we would like to leave to subsystem developers and/or end-users as to what data should be available. We don't want to get into the middle of it. > > This is another reason why markers are a nice solution. They allow > > passing of actual useful values: not just the %p pointers but the most > > interesting derived values (e.g., prev->pid). > > Useful to whoem? stap isn't the holy grail of tracing and certainly not > the only consumer of trace points, so restricting the information to > just what stap needs is actually making trace points less useful. Only you are talking about "restricting". I am talking about thoughtfully *adding* simple scalar values that most tracing type tools will generally want, so that they don't have to perform heroic measures like ship their own dwarf subsets or parse kernel source code. Which scalars? Well, it depends on the subsystem and event, and you can review here several discussions about specific lttng instrumentation markers to see how they generally go. - FChE ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-12 17:38 ` Frank Ch. Eigler @ 2008-06-13 11:01 ` Peter Zijlstra 2008-06-13 14:17 ` Frank Ch. Eigler 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2008-06-13 11:01 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Mathieu Desnoyers, Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt On Thu, 2008-06-12 at 13:38 -0400, Frank Ch. Eigler wrote: > Hi - > > On Thu, Jun 12, 2008 at 06:53:41PM +0200, Peter Zijlstra wrote: > > [...] > > > Think this through. How should systemtap (or another user-space > > > separate-compiled tool like lttng) do this exactly? > > > > lttng has trace handlers to write data into some buffer, right? > > The trace point need not be concerned with which data, nor into what > > buffer. > > The "which data" is definitely a trace point concern. It communicates > from the developer to users as to what values are likely to be of > interest. But that is tracer specific - I might write a scheduler tracer that looks at the quality of scheduling decisions and thus wants to look at the virtual timeline variables and the scheduling class of the tasks involved. That's a whole different context, but the trace points are the same. Are you saying trace points are not to allow me to do that? > > > (a) rely on debugging information? Even assuming we could get proper > > > anchors (PC addresses or unambiguous type names) to start > > > searching dwarf data, we lose a key attractions of markers: that > > > it can robustly transfer data *without* dwarf data kept around. > > > > Perhaps you can ship a reduced set of dwarf info [...] > > "I" don't ship or generate dwarf data. Distributors do. That's ignoring the point - 'someone' could generate reduced debug info to allow you to easily get what you want. > > > (b) rely on hand-written C code (prototypes, pointer derefrencing > > > wrappers) distributed with systemtap? Not only would this be a > > > brittle maintenance pain in the form of cude duplication, but then > > > errors in it couldn't even be detected until the final C > > > compilation stage. That would make a lousy user experience. > > > > Not really sure what you mean here - I throught compile time errors > > were the goal?! > > For us, it's too late. In systemtap, we try to give people useful > information when they make mistakes. For probes of whatever sort, we > want to know the available data types and names, while just starting > to process their script, so that we can check types and suggest > alternatives. C code compilation is quite some way removed and is > supposed to be a systemtap internal implementation detail. Sounds like you have an incomplete Native-Interface system then. You should be able to match a native function description back to your script language. > > > (c) have systemtap try to parse the mhiramat-proposed "(struct > > > task_struct * next, struct task_struct * prev)" format strings? > > > Then we're asking systemtap to parse potentially general C type > > > expressions, find the kernel headers that declare the types? > > > Parse available subfields? That seems too much to ask for. > > > > tcc and sourcefs sound like way fun ;-) > > Really... Yeah, wouldn't it be cool if the kernel came with an embedded compiler and a filesystem that included its exact source code? Together with the entry instrumentation site and dynamic jump patches you can do really weird stuff... /me dreams on :-) > > > (d) or another way? > > > > Get your own tracer in kernel - that provides exactly what stap needs? > > You are missing that (a) this is the point of markers - to allow the > the gajillion tracers a single place per event to hook through, and > (b) we would like to leave to subsystem developers and/or end-users as > to what data should be available. We don't want to get into the > middle of it. I think a) and b) contradict each other, you cannot cater for all tracers and limit the data they can use. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Kernel marker has no performance impact on ia64. 2008-06-13 11:01 ` Peter Zijlstra @ 2008-06-13 14:17 ` Frank Ch. Eigler 0 siblings, 0 replies; 30+ messages in thread From: Frank Ch. Eigler @ 2008-06-13 14:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, Hideo AOKI, mingo, Masami Hiramatsu, linux-kernel, Steven Rostedt Hi - On Fri, Jun 13, 2008 at 01:01:12PM +0200, Peter Zijlstra wrote: > [...] > > > The trace point need not be concerned with which data, nor into what > > > buffer. > > > > The "which data" is definitely a trace point concern. It communicates > > from the developer to users as to what values are likely to be of > > interest. > > But that is tracer specific - I might write a scheduler tracer that > looks at the quality of scheduling decisions and thus wants to look at > the virtual timeline variables and the scheduling class of the tasks > involved. > That's a whole different context, but the trace points are the same. Are > you saying trace points are not to allow me to do that? Of course markers can allow you to do that. You can add two markers in one spot if you really want to, one with far more details than the other. You can still write your own "tracer" kernel-side code to attach to that if you wish. But you might find that you don't have to, if a more general tool can get you the same data (and more). (A separate issue, brought up some number of times here, has been whether even detailed debugging-oriented markers would be worth keeping in the code. I've argued that, yes, it's worthwhile, since they cost nearly nothing, and in exchange would allow your users/customers to gather the same debugging data for bug reports, in situ on a live system.) > > > > (a) rely on debugging information? Even assuming we could get proper > > > > anchors (PC addresses or unambiguous type names) to start > > > > searching dwarf data, we lose a key attractions of markers: that > > > > it can robustly transfer data *without* dwarf data kept around. > > > > > > Perhaps you can ship a reduced set of dwarf info [...] > > > > "I" don't ship or generate dwarf data. Distributors do. > > That's ignoring the point - 'someone' could generate reduced debug info > to allow you to easily get what you want. But is there clear benefit to blocking potential incremental progress, waiting for newer technology? What if one can transition smoothly once godot does show up? > [...] > > > Get your own tracer in kernel - that provides exactly what stap needs? > > > > You are missing that (a) this is the point of markers - to allow the > > the gajillion tracers a single place per event to hook through, and > > (b) we would like to leave to subsystem developers and/or end-users as > > to what data should be available. We don't want to get into the > > middle of it. > > I think a) and b) contradict each other, you cannot cater for all > tracers and limit the data they can use. We are not talking about "limiting". Markers are about identifying the large intersection of interest (events + data) amongst multiple tracing-type tools, and letting them share an single efficient hooking mechanism for getting at that. No one said it must to be the solitary tracing-hook mechanism. My (b) point was that systemtap per se does not want to specify what subset of data should be available to an end user -- so we don't want systemtap-specific markers or hooks or whatever. We want to expose whatever the developer deemed appropriate. I merely lobby that this set should include some values that tools can use without heroic measures. - FChE ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2008-06-16 2:58 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-06-02 22:12 Kernel marker has no performance impact on ia64 Hideo AOKI 2008-06-02 22:32 ` Peter Zijlstra 2008-06-02 23:21 ` Mathieu Desnoyers 2008-06-03 6:07 ` Takashi Nishiie 2008-06-04 4:58 ` Masami Hiramatsu 2008-06-04 23:26 ` Mathieu Desnoyers 2008-06-04 23:40 ` Masami Hiramatsu 2008-06-04 22:27 ` Peter Zijlstra 2008-06-04 23:22 ` Mathieu Desnoyers 2008-06-05 8:12 ` Peter Zijlstra 2008-06-05 14:28 ` Masami Hiramatsu 2008-06-12 14:04 ` Mathieu Desnoyers 2008-06-12 15:31 ` Masami Hiramatsu 2008-06-12 13:53 ` Mathieu Desnoyers 2008-06-12 14:27 ` Peter Zijlstra 2008-06-12 15:53 ` Frank Ch. Eigler 2008-06-12 16:16 ` Masami Hiramatsu 2008-06-12 16:43 ` Frank Ch. Eigler 2008-06-12 16:56 ` Peter Zijlstra 2008-06-12 22:10 ` Mathieu Desnoyers 2008-06-12 17:05 ` Masami Hiramatsu 2008-06-12 17:48 ` Frank Ch. Eigler 2008-06-12 19:34 ` Masami Hiramatsu 2008-06-13 4:19 ` Takashi Nishiie 2008-06-13 18:02 ` Masami Hiramatsu 2008-06-16 2:58 ` Takashi Nishiie 2008-06-12 16:53 ` Peter Zijlstra 2008-06-12 17:38 ` Frank Ch. Eigler 2008-06-13 11:01 ` Peter Zijlstra 2008-06-13 14:17 ` Frank Ch. Eigler
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox