* [RFC] The New and Improved Logdev (now with kprobes!)
@ 2006-10-05 5:11 Steven Rostedt
2006-10-05 14:31 ` Mathieu Desnoyers
0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2006-10-05 5:11 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Mathieu Desnoyers, Chris Wright, fche, Tom Zanussi
Hi all,
This is my annual post to LKML advertising my beloved logdev
logger/tracer/debugger tool. OK, it's not as fancy as LTT and
SystemTap, but it's mine so I hold it dear to my heart :-)
This has been used mainly for debugging. Although it has tracing
abilities, there are much better tools out there for that. What this has
that the others don't is the output on a lockup or crash. That's what I
mainly use this tool for.
I'm currently 5732 messages behind on LKML and I'm trying desperately to
catch up. But yesterday I read a nice little friendly thread between
some of my dear colleagues about static vs dynamic trace points. (I'm
friends with those on both sides of that fence so I will keep my
opinions and comments far from that fire).
Anyway, it made me think. Logdev is solely dependent on static trace
points. I wanted to change that. So looking into how kprobes works, I
added it to logdev. Now tglx can no longer say that it's just another
trivial logger that anyone and everyone and their grandmother has
written!
Well, as luck would have it, the cable to my Internet access had a loose
connection at the top of the telephone pole. This loose connection
allowed water to seep in and it made it all the way down to the splice.
I still had cable TV, but it killed the reception to the cable modem. So
while I was waiting for Road Runner to appear and replace the cable I
did a hack fest on Logdev.
So now, when logdev is compiled into the kernel, and you have
CONFIG_KPROBES turned on, you will have the ability to log using the
logger dynamically from user space.
I currently have four methods, but the potential is so much more.
1. break point and a watch address
This simply allows you to set a break point at some address (or pass in
a function name if it exists in kallsyms).
example:
logprobe -f hrtimer_start -v jiffies_64
produces (either on the serial console on a crash, or via the user utility):
[ 7167.692815] cpu:0 emacs:4358 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e1a
[ 7167.760701] cpu:0 emacs:3960 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e2b
[ 7167.760700] cpu:1 emacs:4362 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e2b
[ 7167.791281] cpu:1 Xorg:3714 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e33
[ 7167.800631] cpu:0 emacs:4358 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e35
[ 7167.839553] cpu:0 Xorg:3714 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e3f
[ 7167.868523] cpu:1 emacs:4362 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e46
[ 7167.872506] cpu:1 emacs:3960 func: hrtimer_start (0xc0137aab) var: jiffies_64 (0xc0428400) = 001a3e47
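(An aside, for those wondering what logprobe sets up under the hood: the kprobes side amounts to registering a probe whose pre-handler does the logging. The sketch below is only my reading of the generic kprobes API, not logdev's actual code; the .symbol_name convenience is a feature of newer kernels, and on 2.6.18 you would fill in kp.addr yourself via kallsyms instead.)

```
/*
 * Sketch: probe hrtimer_start and log jiffies_64 each time it is hit,
 * roughly what "logprobe -f hrtimer_start -v jiffies_64" asks for.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/jiffies.h>
#include <linux/sched.h>

static struct kprobe kp = {
	.symbol_name = "hrtimer_start",	/* resolved through kallsyms */
};

/* Called just before the probed instruction executes. */
static int pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	printk(KERN_INFO "cpu:%d %s:%d func: %s (%p) jiffies_64 = %llx\n",
	       smp_processor_id(), current->comm, current->pid,
	       kp.symbol_name, p->addr,
	       (unsigned long long)get_jiffies_64());
	return 0;
}

static int __init logprobe_init(void)
{
	kp.pre_handler = pre_handler;
	return register_kprobe(&kp);
}

static void __exit logprobe_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(logprobe_init);
module_exit(logprobe_exit);
MODULE_LICENSE("GPL");
```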
2. break point and watch from current
This allows a user to see something on the current task_struct. You need
to know the offset exactly. In the below example, I know that 20 (dec)
is the offset in the task_struct to lock_depth.
example:
logprobe -f schedule -c 20 "lock_depth"
produces:
[ 8757.854029] cpu:1 sawfish:3862 func: schedule (0xc02f8320) lock_depth index:20 = 0xffffffff
3. break point and watch fixed type
This is a catch all for me. I currently only implement preempt_count.
logprobe -t pc -f _spin_lock
produces:
[ 9442.215693] cpu:0 logread:6398 func: _spin_lock (0xc02fab9d) preempt_count:0x0
4. function break point and parameters
This one was a fun little hack! It seems to work on x86, at least (I
haven't tried it on x86_64 or others yet).
Here we can see the function parameters using a printf type format.
example:
logprobe -f try_to_wake_up "task=%p state=%x sync=%d"
produces:
[ 7837.656037] cpu:0 Xorg:3714 func: try_to_wake_up (0xc01197d3) task=c19bdf30 state=c0318e49 sync=0
Logdev still uses my own custom made ring buffer, but I'm working with
Tom Zanussi to get it working with relayfs. It sorta works, but is
currently in a "broken" state. So don't select it, unless you don't
mind my code eating your Doritos!
Anyway, like I said, this is my annual push of Logdev (more of a tap
than a push), just to let other kernel hackers know what I have as a
debugging aid, and to see if it can be of use to anyone else out there.
That alone would make me happy (doesn't take much ;-)
All the tools and everything else are under GPLv2, and can currently be found at
http://rostedt.homelinux.com/logdev when water isn't ruining my
connection.
Have fun!
-- Steve
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 5:11 [RFC] The New and Improved Logdev (now with kprobes!) Steven Rostedt
@ 2006-10-05 14:31 ` Mathieu Desnoyers
2006-10-05 15:49 ` Steven Rostedt
0 siblings, 1 reply; 15+ messages in thread
From: Mathieu Desnoyers @ 2006-10-05 14:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Chris Wright, fche, Tom Zanussi
Hi Steven,
The dynamic abilities of your logdev are very interesting! If I may offer
some ideas:
It would be great to have this logging information recorded into a standardized
buffer format so it could be analyzed with data gathered by other
instrumentation. Instead of using Tom's relay mechanism directly, you might
want to have a look at LTTng (http://ltt.polymtl.ca): it would be a simple
matter of describing your own facility (group of events) and the data types
they record, running genevent (the serialization code generator), and calling
those serialization functions when you want to record to the buffers from logdev.
One thing logdev seems to have that LTTng doesn't currently is the integration
with a mechanism that dumps the output upon a crash (LKCD integration). It's no
rocket science, but I just did not have time to do it.
I think it would be great to integrate those infrastructures together so we can
easily merge information coming from various sources (markers, logdev, SystemTap
scripts, LKET).
* Steven Rostedt (rostedt@goodmis.org) wrote:
> 1. break point and a watch address
>
> This simply allows you to set a break point at some address (or pass in
> a function name if it exists in kallsyms).
>
> logprobe -f hrtimer_start -v jiffies_64
>
Does it automatically get the data type, or is there any way to specify it?
>
> 2. break point and watch from current
>
> This allows a user to see something on the current task_struct. You need
> to know the offset exactly. In the below example, I know that 20 (dec)
> is the offset in the task_struct to lock_depth.
>
> example:
>
> logprobe -f schedule -c 20 "lock_depth"
>
> produces:
>
> [ 8757.854029] cpu:1 sawfish:3862 func: schedule (0xc02f8320) lock_depth index:20 = 0xffffffff
>
Could we think of a quick hack that would involve running gcc on stdin and
returning an "offsetof", all in user space?
>
> 3. break point and watch fixed type
>
> This is a catch all for me. I currently only implement preempt_count.
>
>
> logprobe -t pc -f _spin_lock
>
> produces:
>
> [ 9442.215693] cpu:0 logread:6398 func: _spin_lock (0xc02fab9d) preempt_count:0x0
>
Ouch, I can imagine the performance impact of this breakpoint though :) This is
a case where marking the code helps a lot.
Regards,
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 14:31 ` Mathieu Desnoyers
@ 2006-10-05 15:49 ` Steven Rostedt
2006-10-05 17:01 ` Mathieu Desnoyers
0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2006-10-05 15:49 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Chris Wright, fche, Tom Zanussi
On Thu, 5 Oct 2006, Mathieu Desnoyers wrote:
> Hi Steven,
>
> The dynamic abilities of your logdev are very interesting! If I may emit some
> ideas :
Thanks, I appreciate all constructive ideas!
>
> It would be great to have this logging information recorded into a standardized
> buffer format so it could be analyzed with data gathered by other
> instrumentation. Instead of using Tom's relay mechanism directly, you might
> want to have a look at LTTng (http://ltt.polymtl.ca) : it would be a simple
> matter of describing your own facility (group of events), the data types they
> record, run genevent (serialization code generator) and call those
> serialization functions when you want to record to the buffers from logdev.
Hmm, interesting. But in the meantime, what you describe seems a little
out of scope for logdev. This doesn't mean that it can't be applied, now
or later. But currently, I use logdev for 90% debugging and 10%
analyzing. Perhaps for the analyzing part, this would be useful. I have
to admit, I didn't get far trying to convert LTTng to 2.6.18. Didn't have
the time. Ah, I see you have a patch there now for 2.6.18. Adding this
would be good to do. But unfortunately, my time is currently very limited
(whose isn't? But mine currently is more limited than it usually is).
When things slow down for me a little, I'll see where you're at and take
a look. Something we can also discuss at the next OLS.
>
> One thing logdev seems to have that LTTng doesn't currently is the integration
> with a mechanism that dumps the output upon a crash (LKCD integration). It's no
> rocket science, but I just did not have time to do it.
heehee, in pine the "no" was cut off by my 80 cols, and it looked like
"It's rocket science". Well if it _was_ rocket science, I wouldn't be
able to do it ;-)
But from that really bright thread (lit up mainly by the flames), there
was strong talk about LTTng not being a tracer for debugging. It _can_ be a
debugging tool, but that's not its main purpose. It is an analyzing tool.
Logdev _was_ written to be a debugging tool, and that is why I never
pushed too hard to get it into the kernel: mainly because it was used by
kernel hackers only.
This is why the output of the crashes is very important for Logdev, and
not so important for LTTng. Logdev's biggest asset was the ability to
find deadlocks. The output always showed the order of events between
processors, and time and time again, I've submitted race condition
fix patches to the kernel, to the -rt patch and even to tglx's hrtimer
work.
To logdev, speed of the trace is important, but not that important.
Accuracy of the trace is the most important. Originally, I had a single
buffer and used spinlocks to protect it. All CPUs shared this
buffer. The reason for this is that I wanted simple code to prove that the
sequence of events really did happen in a certain order. I just recently
changed the ring buffer to use a lockless buffer per cpu, but I still
question its accuracy. But I guess it does make things faster now.
>
> I think it would be great to integrate those infrastructures together so we can
> easily merge information coming from various sources (markers, logdev, systemTAP
> scripts, LKET).
The one argument I have against this, is that some of these have different
objectives. Merging too much can dilute the objective of the app. But I
do think that a cooperation between the tools would be nice.
>
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > 1. break point and a watch address
> >
> > This simply allows you to set a break point at some address (or pass in
> > a function name if it exists in kallsyms).
> >
> > logprobe -f hrtimer_start -v jiffies_64
> >
> Does it automatically get the data type, or is there any way to specify it?
Not yet, but all of the kprobes code for logdev was written in about 26
hours, and really only because I lost access to the Internet did I do so
much. I've got to get back to other things, so the progress will once again
slow down to almost a halt.
I would really like to integrate the logdev tools with gdb so that I can
load the vmlinux kernel and get all sorts of good stuff.
But that's not going to happen in 26 hours :-)
I also need to learn how to do that.
>
> >
> > 2. break point and watch from current
> >
> > This allows a user to see something on the current task_struct. You need
> > to know the offset exactly. In the below example, I know that 20 (dec)
> > is the offset in the task_struct to lock_depth.
> >
> > example:
> >
> > logprobe -f schedule -c 20 "lock_depth"
> >
> > produces:
> >
> > [ 8757.854029] cpu:1 sawfish:3862 func: schedule (0xc02f8320) lock_depth index:20 = 0xffffffff
> >
>
> Could we think of a quick hack that would involve running gcc on stdin and
> returning an "offsetof", all in user space?
If you have an idea, I'd love to see it!
>
> >
> > 3. break point and watch fixed type
> >
> > This is a catch all for me. I currently only implement preempt_count.
> >
> >
> > logprobe -t pc -f _spin_lock
> >
> > produces:
> >
> > [ 9442.215693] cpu:0 logread:6398 func: _spin_lock (0xc02fab9d) preempt_count:0x0
> >
> Ouch, I can imagine the performance impact of this breakpoint though :) This is
> a case where marking the code helps a lot.
True, but it also matters what you write. My old static tracing still
noticeably slowed down the system. But this thread is _not_ about
static vs dynamic, that's been beaten to death already and I don't want
to be involved in that debate until the emotions calm down (like that has
ever happened on LKML).
But the above really didn't slow the system down too noticeably.
>
>
> Regards,
>
> Mathieu
Well Mathieu, thanks a lot for taking the time to look at the code. I'd
like to know more about LTTng too, and understand it better. Hopefully,
when things slow down a little I will.
I know I said I'm staying out of the debate, but I need to ask this
anyway. Couldn't LTTng be fully implemented with dynamic traces? And if
so, what would be the harm in getting that into the kernel, and then
maintaining a separate patch to convert those dynamic traces into static
ones where performance is critical? This way, you can get the
infrastructure into the kernel and get more eyes on it. It would also make
the patch smaller.
As you stated, there can be more users than LTTng of what gets into the
kernel. Now, granted, I don't know LTTng too well, so all this that I'm
saying could just be coming out of my butt. But from what I did read,
LTTng is a _good_ tool, and should be supported. It just seems that the
method isn't accepted.
I've learned a lot from Ingo and Thomas. One thing I watched them do with
the -rt patch was to get small pieces into the kernel, a little at a time.
But these pieces got into the kernel not because they came from Ingo, but
because they actually benefited other parts of the kernel.
Take relay getting into the kernel: that was a part of LTT that helped
other parts of the kernel. So if there is an underlying infrastructure
that can be used by multiple systems, then that should be something to
strive for.
So if you convert LTTng to use dynamic traces only (for now), and that
gets accepted into the kernel, you will then have a larger user base of
LTTng. Yes, the performance may be a problem, but you have a separate
patch (as you do now) to change to static tracing for those that need the
performance. But in the mean time, you have those that can use it without
recompiling their kernel.
Now here's the kicker!
When the demand comes to make LTTng (the one in the kernel) perform better,
then the kernel developers will feel the pressure to either introduce
static tracing at critical points or fix up the dynamic tracing to
perform better.
As Ingo showed in the thread, he added code to speed up kprobes. This was
done just because it was noticed that kprobes was slow. It would never have
been done if no one had noticed. But as soon as there is a demand for speed
on that system, there will be lots of good ideas coming out to make it
better.
So I personally am not for or against static tracing, simply because I
don't mind patching my own kernel, and I'm not maintaining it. Logdev was
used to add static tracing quickly. Logdev started in the 2.1 kernel, and
is now at 2.6.18. I've seldom had problems porting it. The most
difficult port (which took 2 hours) was to the -rt patch, and that was
just because my tracing had to be aware of spinlocks turning into mutexes
and interrupts not really being disabled.
But the moral here is that logdev was mostly contained outside of the kernel,
and I could just slam in trace points when needed. After a bug was found,
I removed them all right away. (Thank God for subversion and quilt.)
Just some thoughts. Although I'm sure I'm going to regret bringing this
back up. :-/
Tschuess,
-- Steve
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 15:49 ` Steven Rostedt
@ 2006-10-05 17:01 ` Mathieu Desnoyers
2006-10-05 18:09 ` Steven Rostedt
0 siblings, 1 reply; 15+ messages in thread
From: Mathieu Desnoyers @ 2006-10-05 17:01 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Chris Wright, fche, Tom Zanussi
* Steven Rostedt (rostedt@goodmis.org) wrote:
> > It would be great to have this logging information recorded into a standardized
> > buffer format so it could be analyzed with data gathered by other
> > instrumentation. Instead of using Tom's relay mechanism directly, you might
> > want to have a look at LTTng (http://ltt.polymtl.ca) : it would be a simple
> > matter of describing your own facility (group of events), the data types they
> > record, run genevent (serialization code generator) and call those
> > serialization functions when you want to record to the buffers from logdev.
>
> Hmm, interesting. But in the meantime, what you describe seems a little
> out of scope with logdev. This doesn't mean that it can't be applied, now
> or later. But currently, I do use logdev for 90% debugging and 10%
> analyzing. Perhaps for the analyzing part, this would be useful. I have
> to admit, I didn't get far trying to convert LTTng to 2.6.18. Didn't have
> the time. Ah, I see you have a patch there now for 2.6.18. Adding this
> would be good to do. But unfortunately, my time is currently very limited
> (whose isn't? But mine currently is more limited than it usually is).
>
The uses of LTTng that I am aware of are not limited to analysis: some users,
Autodesk for instance, use its user space tracing capabilities extensively to
find deadlocks and deadline misses in their video applications. What I have
found is that having a general overview of the system in the same trace
where the debugging information sits is a very powerful aid to developers.
> When things slow down for me a little, I'll see where you are at, and take
> a look. Something we can also discuss at the next OLS.
>
Sure, I'll be glad to discuss it.
> To logdev, speed of the trace is important, but not that important.
> Accuracy of the trace is the most important. Originally, I had a single
> buffer, and would use spinlocks to protect it. All CPUs would share this
> buffer. The reason for this is that I wanted simple code to prove that the
> sequence of events really did happen in a certain order. I just recently
> changed the ring buffer to use a lockless buffer per cpu, but I still
> question its accuracy. But I guess it does make things faster now.
>
That's why I use the CPUs' timestamp counter (when synchronized) directly.
I do not rely on the kernel time base when it is not needed. As I use the
timestamps to merge the events from the multiple buffers, they must be as
accurate as possible.
> >
> > I think it would be great to integrate those infrastructures together so we can
> > easily merge information coming from various sources (markers, logdev, systemTAP
> > scripts, LKET).
>
> The one argument I have against this, is that some of these have different
> objectives. Merging too much can dilute the objective of the app. But I
> do think that a cooperation between the tools would be nice.
>
Yes, I don't think that it should become "one" big project, just that each
project should be able to interface with the others.
> I know I said I'm staying out of the debate, but I need to ask this
> anyway. Couldn't LTTng be fully implemented with dynamic traces? And if
> so, then what would be the case, to get that into the kernel, and then
> maintain a separate patch to convert those dynamic traces into static
> ones where performance is critical. This way, you can get the
> infrastructure into the kernel, and get more eyes on it. Also make the
> patch smaller.
>
In its current state, LTTng is already split into such pieces. The parts
that are the most highly reusable are:
- Code markup mechanism (markers)
- Serialization mechanism (facilities) within probes (ltt-probes kernel
modules) dynamically connected to markers.
- Tracing control mechanism (ltt-tracer, ltt-control)
- Buffer management mechanism (ltt-relay)
To answer your question, I will distinguish the elements of this "dynamic"
term that is so widely used:
* Dynamic probe connection
LTTng 0.6.0 now supports dynamic probe connection on the markers. A probe is a
dynamically loadable kernel module. It supports load/unload of these modules.
* Dynamic registration of new events/event record types
LTTng supports such dynamic registration since the 0.5.x series.
* Probe placement
What makes debugging-information-based probe placement unsuitable as the only
option for LTTng:
- inability to extract all the local variables
- performance impact
- inability to follow the kernel code changes as well as a marker inserted
in the code itself.
Regards,
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 17:01 ` Mathieu Desnoyers
@ 2006-10-05 18:09 ` Steven Rostedt
2006-10-05 18:29 ` Daniel Walker
2006-10-05 20:50 ` Mathieu Desnoyers
0 siblings, 2 replies; 15+ messages in thread
From: Steven Rostedt @ 2006-10-05 18:09 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Chris Wright, fche, Tom Zanussi
On Thu, 5 Oct 2006, Mathieu Desnoyers wrote:
>
> The uses of LTTng that I am aware of are not limited to analysis: some users,
> Autodesk for instance, use its user space tracing capabilities extensively to
> find deadlocks and deadline misses in their video applications. What I have
> found is that having a general overview of the system in the same trace
> where the debugging information sits is a very powerful aid to developers.
>
Well, I never said it wasn't good for debugging :-) But then again, when
someone does an analysis, it can be argued that they are also debugging.
Why analyze when the system works 100% efficiently :-P
> > When things slow down for me a little, I'll see where you are at, and take
> > a look. Something we can also discuss at the next OLS.
> >
>
> Sure, I'll be glad to discuss about it.
OK, I'll bring a notebook.
>
> > To logdev, speed of the trace is important, but not that important.
> > Accuracy of the trace is the most important. Originally, I had a single
> > buffer, and would use spinlocks to protect it. All CPUs would share this
> > buffer. The reason for this is that I wanted simple code to prove that the
> > sequence of events really did happen in a certain order. I just recently
> > changed the ring buffer to use a lockless buffer per cpu, but I still
> > question its accuracy. But I guess it does make things faster now.
> >
>
> That's why I use the CPUs' timestamp counter (when synchronized) directly.
> I do not rely on the kernel time base when it is not needed. As I use the
> timestamps to merge the events from the multiple buffers, they must be as
> accurate as possible.
My problem with using a timestamp, is that I ran logdev on too many archs.
So I need to have a timestamp that I can get to that is always reliable.
How does LTTng get the time for different archs? Does it have separate
code for each arch?
> > I know I said I'm staying out of the debate, but I need to ask this
> > anyway. Couldn't LTTng be fully implemented with dynamic traces? And if
> > so, then what would be the case, to get that into the kernel, and then
> > maintain a separate patch to convert those dynamic traces into static
> > ones where performance is critical. This way, you can get the
> > infrastructure into the kernel, and get more eyes on it. Also make the
> > patch smaller.
> >
>
> In its current state, LTTng is already split into such pieces. The parts
> that are the most highly reusable are :
>
> - Code markup mechanism (markers)
Is this the static marks everyone is fighting over?
> - Serialization mechanism (facilities) within probes (ltt-probes kernel
> modules) dynamically connected to markers.
Sorry, I don't really understand what the above is. Is it a loadable
module that connects to the static markers? You might need to dumb this
one down for me.
> - Tracing control mechanism (ltt-tracer, ltt-control)
Is this in kernel or tools?
> - Buffer management mechanism (ltt-relay)
So this uses the current relay system?
>
> To answer your question, I will distinguish elements of this "dynamic"
> term that is so widely used:
Ah, terminology. I've never been good at that. I programmed in an
object-oriented fashion for years before I knew what object-oriented was :P
So when I talk about dynamic, I'm really talking about a way to cause a
trigger for tracing inside the kernel without actually modifying the
kernel source to do so. So currently, in the vanilla kernel, we can either
statically place a marker (some macro, like I have in logdev: ldprint),
or use something that modifies the binary code or uses some hardware
mechanism to trigger it (such as kprobes, which, as of yesterday, logdev
does too).
The greatest resistance that I currently see to LTTng is the adding of
static trace points. So if LTTng isn't fully crippled by working with
dynamic addition of trace points (without modifying the code), then try to
get that in first. See below.
>
> * Dynamic probe connection
>
> LTTng 0.6.0 now supports dynamic probe connection on the markers. A probe is a
> dynamically loadable kernel module. It supports load/unload of these modules.
But are the markers still static? I'm confused here. I'm not sure what it
means to have a kernel module do a dynamic probe connection on a marker.
In logdev, I used to have the tracepoints use crazy macros so that I could
load and unload the logdev main module. But I still needed to have hooks
in the kernel. I finally got rid of that support, and by doing so I
cleaned up logdev quite a bit.
>
> * Dynamic registration of new events/event record types
>
> LTTng supports such dynamic registration since the 0.5.x series.
I feel really stupid! What do you define as an event, and how would
one add a new one dynamically?
>
> * Probe placement
>
> What makes debugging-information-based probe placement unsuitable as the only
> option for LTTng:
First thing, and this is a key point: "only option". OK, while reading that
nasty thread, I saw that LTTng can still function when certain features
are not present. Basically, convert all possible static tracepoints into
dynamic ones and make a code base for that. Have a patch to convert
critical trace points that are not suitable for performance into static
traces, and also add static traces that could not be done dynamically.
This way you have a functioning LTTng in the kernel (if the
resistance falls by doing this), and still maintain a patch as a "value
add" for your customers. Perhaps call it "Turbo LTTng" ;-)
> - inability to extract all the local variables
Some comments about the above. This is interesting, but not always needed.
As with kgdb, we can't look at all local variables since gcc may optimize
them out. But I've never needed to know a local variable unless I was
debugging that code. Which usually means that I add my static tracing
with logdev until I find the bug, and then remove the tracing.
So local variable tracing is not a good candidate for static tracing to
be put in the kernel anyway.
I'm not saying that a dynamic-only LTTng should strip out the static
tracing abilities. I'm saying that it just won't be using them when brought
into the kernel. But the patch can still use them, and those that are
debugging the kernel can pull in your patch.
> - performance impact
Yes, good point! But when people see the problem, they will find a way
to fix it. In the meantime, those that need the performance can still
use your "Turbo LTTng".
Also, even if the dynamic trace points never fit the needs of LTTng, if
LTTng were in the kernel there would be more pressure on the kernel
developers to add the static trace points in the critical sections. That is
assuming that the dynamic trace points can't be fixed.
> - inability to follow the kernel code changes as well as a marker inserted
> in the code itself.
This one I will argue against. Basically, if you have a static trace
point, it too can be moved around and "broken" by the maintainer of that
code. There are times that I've submitted patches ignoring the effect they
would have on some accounting code, simply because I didn't know what that
accounting code does, and didn't care. The same will happen with the
tracing code. If the maintainer doesn't fully understand what is being
traced and changes the code, they might just break the tracing tool. Now
someone, like you, will need to submit a patch to fix that static marker.
And here is where we get into the burden part, because now the maintainer
of the said code needs to make sure that your patch didn't break anything
else.
A dynamic trace point won't ever bother the maintainer. But it may still
break. When it does, you just fix your dynamic part and go on. No one will
be bothered except the ones that use your stuff.
Also, and this is the cool part (IMHO), this will drive more innovation in
what the kernel can do with the compiler. Like having a debug setting in
the compiler where the dynamic trace point adder can read the code better
and see what to do with it. As you mentioned about my logdev reading an
offset: have gdb tricks automatically find things for you. All your
tools will need is a vmlinux built for the running kernel.
I need to understand how gdb gets its info better, and use that to really
extract things dynamically.
Basically, Mathieu, I want to help you get this into the kernel. I could
be wrong, since I'm only a spectator, and not really involved on either
side. But I have been reading LKML long enough to have an idea of what it
takes.
If you can modularize LTTng further, add the non-intrusive parts to the
kernel. If you can make LTTng functional (but "crippled" due to the
limitations you describe) and have it doing what the naysayers want,
you will have a better time getting it accepted. Once accepted, it will
be a lot easier to add controversial things than it is to add them before
any of it is accepted.
Just a thought.
Cheers,
-- Steve
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 18:09 ` Steven Rostedt
@ 2006-10-05 18:29 ` Daniel Walker
2006-10-05 18:38 ` Steven Rostedt
2006-10-05 20:50 ` Mathieu Desnoyers
1 sibling, 1 reply; 15+ messages in thread
From: Daniel Walker @ 2006-10-05 18:29 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Thomas Gleixner,
Karim Yaghmour, Andrew Morton, Chris Wright, fche, Tom Zanussi
On Thu, 2006-10-05 at 14:09 -0400, Steven Rostedt wrote:
>
> My problem with using a timestamp, is that I ran logdev on too many archs.
> So I need to have a timestamp that I can get to that is always reliable.
> How does LTTng get the time for different archs? Does it have separate
> code for each arch?
>
I just got done updating a patchset that exposes the clocksources from
generic time to take low-level timestamps. But even without that you
can just call gettimeofday() directly to get a timestamp.
Daniel
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 18:29 ` Daniel Walker
@ 2006-10-05 18:38 ` Steven Rostedt
2006-10-05 18:49 ` Daniel Walker
0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2006-10-05 18:38 UTC (permalink / raw)
To: Daniel Walker
Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Thomas Gleixner,
Karim Yaghmour, Andrew Morton, Chris Wright, fche, Tom Zanussi
On Thu, 5 Oct 2006, Daniel Walker wrote:
> On Thu, 2006-10-05 at 14:09 -0400, Steven Rostedt wrote:
>
> >
> > My problem with using a timestamp, is that I ran logdev on too many archs.
> > So I need to have a timestamp that I can get to that is always reliable.
> > How does LTTng get the time for different archs? Does it have separate
> > code for each arch?
> >
>
> I just got done updating a patchset that exposes the clocksources from
> generic time to take low level time stamps.. But even without that you
> can just call gettimeofday() directly to get a timestamp .
>
unless you're tracing something that is holding the xtime_lock ;-)
-- Steve
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 18:38 ` Steven Rostedt
@ 2006-10-05 18:49 ` Daniel Walker
2006-10-05 19:39 ` Daniel Walker
2006-10-05 20:18 ` Mathieu Desnoyers
0 siblings, 2 replies; 15+ messages in thread
From: Daniel Walker @ 2006-10-05 18:49 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Thomas Gleixner,
Karim Yaghmour, Andrew Morton, Chris Wright, fche, Tom Zanussi
On Thu, 2006-10-05 at 14:38 -0400, Steven Rostedt wrote:
> On Thu, 5 Oct 2006, Daniel Walker wrote:
>
> > On Thu, 2006-10-05 at 14:09 -0400, Steven Rostedt wrote:
> >
> > >
> > > My problem with using a timestamp, is that I ran logdev on too many archs.
> > > So I need to have a timestamp that I can get to that is always reliable.
> > > How does LTTng get the time for different archs? Does it have separate
> > > code for each arch?
> > >
> >
> > I just got done updating a patchset that exposes the clocksources from
> > generic time to take low level time stamps.. But even without that you
> > can just call gettimeofday() directly to get a timestamp .
> >
>
> unless you're tracing something that is holding the xtime_lock ;-)
That's part of the reason for the changes that I made to the clocksource
API. It makes it so instrumentation, among other things, can generically
read a low-level cycle clock. On PPC you would read the
decrementer, and on x86 you would read the TSC. However, the
application has no idea what it's reading.
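A rough sketch of the idea, in ordinary C rather than the actual patchset's API (the struct and function names here are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cycle_t;

/* Illustrative stand-in for a clocksource descriptor: each arch
 * supplies its own low-level counter read (TSC, decrementer, ...). */
struct clocksource {
    const char *name;
    cycle_t (*read)(void);
};

/* A fake counter so the sketch is self-contained and runnable. */
static cycle_t fake_counter;
static cycle_t fake_read(void) { return fake_counter++; }

static struct clocksource fake_cs = { .name = "fake", .read = fake_read };

/* Instrumentation takes a raw timestamp without knowing what hardware
 * counter backs it -- and without going anywhere near xtime_lock. */
static inline cycle_t get_timestamp(struct clocksource *cs)
{
    return cs->read();
}
```

The point is that the tracer only ever sees opaque cycle_t values; turning them into wall-clock time is a separate, offline problem.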
I submitted one version to LKML already, but I'm planning to submit
another version shortly.
Daniel
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 18:49 ` Daniel Walker
@ 2006-10-05 19:39 ` Daniel Walker
2006-10-05 20:18 ` Mathieu Desnoyers
1 sibling, 0 replies; 15+ messages in thread
From: Daniel Walker @ 2006-10-05 19:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Thomas Gleixner,
Karim Yaghmour, Andrew Morton, Chris Wright, fche, Tom Zanussi
On Thu, 2006-10-05 at 11:49 -0700, Daniel Walker wrote:
> That's part of the reason for the changes that I made to the clocksource
> API . It makes it so instrumentation, with other things, can generically
> read a low level cycle clock. Like on PPC you would read the
> decrementer, and on x86 you would read the TSC . However, the
> application has no idea what it's reading.
Meant to say PowerPC uses the timebase clocksource. Sorry, I've got too
many architectures swirling around.
Daniel
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 18:49 ` Daniel Walker
2006-10-05 19:39 ` Daniel Walker
@ 2006-10-05 20:18 ` Mathieu Desnoyers
2006-10-05 20:26 ` Steven Rostedt
1 sibling, 1 reply; 15+ messages in thread
From: Mathieu Desnoyers @ 2006-10-05 20:18 UTC (permalink / raw)
To: Daniel Walker
Cc: Steven Rostedt, LKML, Ingo Molnar, Thomas Gleixner,
Karim Yaghmour, Andrew Morton, Chris Wright, fche, Tom Zanussi
* Daniel Walker (dwalker@mvista.com) wrote:
> On Thu, 2006-10-05 at 14:38 -0400, Steven Rostedt wrote:
> > On Thu, 5 Oct 2006, Daniel Walker wrote:
> >
> > > On Thu, 2006-10-05 at 14:09 -0400, Steven Rostedt wrote:
> > >
> > > >
> > > > My problem with using a timestamp, is that I ran logdev on too many archs.
> > > > So I need to have a timestamp that I can get to that is always reliable.
> > > > How does LTTng get the time for different archs? Does it have separate
> > > > code for each arch?
> > > >
> > >
> > > I just got done updating a patchset that exposes the clocksources from
> > > generic time to take low level time stamps.. But even without that you
> > > can just call gettimeofday() directly to get a timestamp .
> > >
> >
> > unless you're tracing something that is holding the xtime_lock ;-)
>
> That's part of the reason for the changes that I made to the clocksource
> API . It makes it so instrumentation, with other things, can generically
> read a low level cycle clock. Like on PPC you would read the
> decrementer, and on x86 you would read the TSC . However, the
> application has no idea what it's reading.
>
> I submitted one version to LKML already, but I'm planning to submit
> another version shortly.
>
Just as a detail: LTTng traces NMIs, which can happen on top of the
xtime_lock. So yes, I have to consider the impact of this kind of lock when I
choose my time source, which is currently a per-architecture TSC read,
or a read of the jiffies counter when the architecture does not have a
TSC synchronised across the CPUs. This is abstracted in include/asm-*/ltt.h.
I know it doesn't support dynamic ticks; I'm working on using the hrtimers
instead, but I must make sure that the seqlock read will fail if it nests over
a write seqlock.
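The "fail instead of spin" behaviour can be sketched with a toy seqcount (this is a simplified model, not the kernel's seqlock implementation):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* A writer bumps seq to an odd value, updates the data, then bumps it
 * back to even.  A normal reader spins until seq is even; from NMI
 * context that spin deadlocks if the NMI landed inside the writer, so
 * this "try" variant gives up immediately instead of spinning. */
struct seq_time {
    atomic_uint seq;
    uint64_t    time;
};

static bool try_read_time(struct seq_time *st, uint64_t *out)
{
    unsigned int start = atomic_load(&st->seq);
    if (start & 1)                          /* writer in progress */
        return false;
    *out = st->time;
    return atomic_load(&st->seq) == start;  /* detect a racing writer */
}
```

On failure the tracer can fall back to a coarser time source rather than hang inside the NMI handler.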
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 20:18 ` Mathieu Desnoyers
@ 2006-10-05 20:26 ` Steven Rostedt
2006-10-05 20:31 ` Mathieu Desnoyers
0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2006-10-05 20:26 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Daniel Walker, LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour,
Andrew Morton, Chris Wright, fche, Tom Zanussi
On Thu, 5 Oct 2006, Mathieu Desnoyers wrote:
>
> Just as a detail : LTTng traces NMI, which can happen on top of a
> xtime_lock. So yes, I have to consider the impact of this kind of lock when I
> choose my time source, which is currently a per architecture TSC read,
> or a read of the jiffies counter when the architecture does not have a
> synchronised TSC over the CPUs. This is abstracted in include/asm-*/ltt.h.
>
I'm curious. How do you show the interactions between two CPUs when the
TSCs aren't in sync? Jiffies are not fine-grained enough to give the order of
events that happen within microseconds.
-- Steve
> I know it doesn't support dynamic ticks, I'm working on using the HRtimers
> instead, but I must make sure that the seqlock read will fail if it nests over
> a write seqlock.
>
> Mathieu
>
> OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
> Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
>
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 20:26 ` Steven Rostedt
@ 2006-10-05 20:31 ` Mathieu Desnoyers
0 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2006-10-05 20:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Daniel Walker, LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour,
Andrew Morton, Chris Wright, fche, Tom Zanussi
* Steven Rostedt (rostedt@goodmis.org) wrote:
>
> On Thu, 5 Oct 2006, Mathieu Desnoyers wrote:
>
> >
> > Just as a detail : LTTng traces NMI, which can happen on top of a
> > xtime_lock. So yes, I have to consider the impact of this kind of lock when I
> > choose my time source, which is currently a per architecture TSC read,
> > or a read of the jiffies counter when the architecture does not have a
> > synchronised TSC over the CPUs. This is abstracted in include/asm-*/ltt.h.
> >
>
> I'm curious. How do you show the interactions between two CPUs when the
> TSC isn't in sync? Using jiffies is not fast enough to know the order of
> events that happen within usecs.
>
I shift the jiffies and OR that with a logical clock which increments atomically
and is shared across the CPUs. It is slow and ugly, but it works. :)
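The scheme might look something like this (the 32-bit split and the names are illustrative, not LTTng's actual layout):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Fallback timestamp for machines without synchronised TSCs: jiffies in
 * the high bits, a logical counter shared by all CPUs in the low bits.
 * Events get a total order even within a single tick; the cost is a
 * global atomic increment on every event. */
#define LOGICAL_BITS 32

static atomic_uint_fast64_t logical_clock;

static uint64_t trace_timestamp(uint64_t jiffies)
{
    uint64_t seq = atomic_fetch_add(&logical_clock, 1);
    return (jiffies << LOGICAL_BITS) | (seq & ((1ULL << LOGICAL_BITS) - 1));
}
```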
Mathieu
> -- Steve
>
>
> > I know it doesn't support dynamic ticks, I'm working on using the HRtimers
> > instead, but I must make sure that the seqlock read will fail if it nests over
> > a write seqlock.
> >
> > Mathieu
> >
> > OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
> > Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
> >
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 18:09 ` Steven Rostedt
2006-10-05 18:29 ` Daniel Walker
@ 2006-10-05 20:50 ` Mathieu Desnoyers
2006-10-05 21:28 ` Steven Rostedt
1 sibling, 1 reply; 15+ messages in thread
From: Mathieu Desnoyers @ 2006-10-05 20:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Chris Wright, fche, Tom Zanussi
* Steven Rostedt (rostedt@goodmis.org) wrote:
> My problem with using a timestamp, is that I ran logdev on too many archs.
> So I need to have a timestamp that I can get to that is always reliable.
> How does LTTng get the time for different archs? Does it have separate
> code for each arch?
>
See my answer to Daniel Walker on this one.
> > In its current state, LTTng is already split into such pieces. The parts
> > that are the most highly reusable are:
> >
> > - Code markup mechanism (markers)
>
> Is this the static marks everyone is fighting over?
>
I consider that the previous discussion led to a general consensus in which it
was recognised that marking the code is generally acceptable and needed. As a
personal initiative following this discussion, I proposed a marker mechanism,
iterated on it for two weeks, and it is now at the 0.20 release. It went through
about three complete rewrites during the process, but it seems that most
objections have been answered.
This week, I ported LTTng to the marker mechanism by creating "probes", which
are the dynamically loadable modules that connect to the markers.
> > - Serialization mechanism (facilities) within probes (ltt-probes kernel
> > modules) dynamically connected to markers.
>
> Sorry, I don't really understand what the above is. Is it a loadable
> module that connects to the static markers? You might need to dumb this
> one down for me.
>
Ok,
A marker, in my implementation, is a statement placed in the code, e.g.:
MARK(kernel_sched_schedule, "%d %d %ld", prev->pid, next->pid, prev->state);
A probe is a dynamically loadable module which implements a callback, e.g.:
#define KERNEL_SCHED_SCHEDULE_FORMAT "%d %d %ld"
void probe_kernel_sched_schedule(const char *format, ...)
{
va_list ap;
/* Declare args */
int prev_pid, next_pid;
long state;
/* Assign args */
va_start(ap, format);
prev_pid = va_arg(ap, typeof(prev_pid));
next_pid = va_arg(ap, typeof(next_pid));
state = va_arg(ap, typeof(state));
/* Call tracer */
trace_process_schedchange(prev_pid, next_pid, state);
va_end(ap);
}
This, in my case, takes the variable arguments of the marker call and gives
them to my inline tracing function.
trace_process_schedchange is the "serialization" mechanism, which takes its input
(arguments) and writes them into an event record in the buffers, dealing with
reentrancy.
>
> > - Tracing control mechanism (ltt-tracer, ltt-control)
>
> Is this in kernel or tools?
>
I am absolutely not talking about the user space tools here, only kernel code.
That would be another discussion :)
In fact, ltt-control is the netlink interface that lets user space control
the tracer. It's not mandatory: the tracer can be controlled from within the
kernel too (useful for embedded systems).
> > - Buffer management mechanism (ltt-relay)
>
> So this uses the current relay system?
>
Yes, but I do my own synchronization. I implement my own reserve/commit and I
also export my own ioctl and poll to communicate with the user space buffer
reading daemon.
> The greatest resistance that I currently see with LTTng is the adding of
> static trace points. So if LTTng isn't fully crippled by working with
> dynamic addition of trace points (not modifying the code), then try to get
> that in first. See below.
>
My first goal is to have the infrastructure in, without the instrumentation. And
yes, I want to connect this infrastructure with a nice dynamic instrumentation
tool like logdev or systemtap, as that will quickly make the
infrastructure usable to a user base.
> >
> > * Dynamic probe connexion
> >
> > LTTng 0.6.0 now supports dynamic probe connexion on the markers. A probe is a
> > dynamically loadable kernel module. It supports load/unload of these modules.
>
> But are the markers still static? I'm confused here. Not sure what it
> means to have a kernel module do a dynamic probe connexion on a marker.
> In logdev, I used to have the tracepoints use crazy macros so that I could
> load and unload the logdev main module. But I still needed to have hooks
> into the kernel. I finally got rid of that support and by doing so I
> cleaned up logdev quite a bit.
>
Yes, a marker is a low-performance-impact hook in the kernel (I jump over the
call when the marker is disabled).
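In miniature, a marker site might work like this (a toy model using a plain branch and a function pointer; the names are invented, and the real mechanism can be smarter, e.g. patching the branch at runtime):

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef void (*probe_fn)(const char *fmt, ...);

/* One marker site: while no probe is connected, the branch falls
 * through and the (varargs) call is skipped entirely. */
struct marker_site {
    const char *name;
    const char *format;
    probe_fn    probe;   /* NULL while the marker is disabled */
};

#define TRACE_MARK(site, ...)                               \
    do {                                                    \
        if ((site)->probe)                                  \
            (site)->probe((site)->format, __VA_ARGS__);     \
    } while (0)

/* A trivial probe for demonstration. */
static int hits;
static void count_probe(const char *fmt, ...) { (void)fmt; hits++; }
```

Connecting a probe is then just storing a function pointer; disconnecting restores the cheap fall-through path.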
>
> >
> > * Dynamic registration of new events/event record types
> >
> > LTTng supports such dynamic registration since the 0.5.x series.
>
> I feel really stupid! What do you define as an event, and how would
> one add a new one dynamically.
>
Please don't :) Defining a new event would be to say:
I want to create an event named "schedchange", which belongs to the "kernel"
subsystem. In its definition, I say that it will take two integers and a long,
respectively named "prev_pid", "next_pid" and "state".
We can think of various events for various subsystems, and even for modules. It
becomes interesting to have dynamically loadable event definitions which get
loaded with kernel modules. The "description" of the events is saved in the
trace, in a special low-traffic channel (small buffers with a separate file).
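A "schedchange" description of that kind might be modelled like this (struct and field names are invented for illustration, not LTTng's actual facility format):

```c
#include <assert.h>
#include <string.h>

enum field_type { FIELD_INT, FIELD_LONG };

/* One named, typed field of an event. */
struct event_field {
    const char     *name;
    enum field_type type;
};

/* A dynamically registerable event description: a named event in a
 * subsystem, plus its field list.  Something like this is what would go
 * to the low-traffic "description" channel so the trace is
 * self-describing. */
struct event_desc {
    const char        *subsystem;
    const char        *name;
    int                nr_fields;
    struct event_field fields[8];
};

static const struct event_desc schedchange = {
    .subsystem = "kernel",
    .name      = "schedchange",
    .nr_fields = 3,
    .fields = {
        { "prev_pid", FIELD_INT  },
        { "next_pid", FIELD_INT  },
        { "state",    FIELD_LONG },
    },
};
```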
> >
> > * Probe placement
> >
> > What makes debugging information based probe placement unsuitable as the only
> > option for LTTng :
>
> First thing, which is a key point: "only option". OK, while reading that
> nasty thread, I saw that LTTng can still function when certain features
> are not present. Basically, convert all possible static tracepoints into
> dynamic ones and make a code base for that. Have a patch to convert
> critical trace points that are not suitable for performance into static
> traces, and also add static traces that were not able to be done by
> dynamic ones. This way you have a functioning LTTng in the kernel (if the
> resistance falls by doing this), and still maintain a patch for a "value
> added" to your customers. Perhaps call it "Turbo LTTng" ;-)
>
I won't try to convert all the existing "marker" based trace points into
dynamic ones, as I see no real use for it in the long run. However, I would
really like to see kprobe-based instrumentation a little more nicely
integrated with LTTng (and it is not hard to do!).
> Basically, Mathieu, I want to help you get this into the kernel. I could
> be wrong, since I'm only a spectator, and not really involved on either
> side. But I have been reading LKML long enough to have an idea of what it
> takes.
>
> If you can modularize LTTng further down, add the non-intrusive parts to the
> kernel. If you can make LTTng functional (but "crippled" due to the
> limitations you describe) and have it do what the naysayers want,
> you will have an easier time getting it accepted. Once accepted, it will
> be a lot easier to add controversial things than it is to add them before
> any of it is accepted.
>
Yes, that's why I am splitting my project into parts ("markers, tracer, control,
facilities, ...") and plan to keep the most intrusive ones as an external patchset
for the moment. Anyway, the marker-probe mechanism lets me put all the
serialization code inside probes external to the kernel, which can be connected
either with the "marker" mechanism or with kprobes; it doesn't matter.
Thanks for your hints,
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 20:50 ` Mathieu Desnoyers
@ 2006-10-05 21:28 ` Steven Rostedt
2006-10-06 1:29 ` Mathieu Desnoyers
0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2006-10-05 21:28 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Chris Wright, fche, Tom Zanussi
On Thu, 5 Oct 2006, Mathieu Desnoyers wrote:
> >
> > Is this the static marks everyone is fighting over?
> >
>
> I consider that the previous discussion led to a general consensus in which it
> was recognised that marking the code is generally acceptable and needed. As a
> personal initiative following this discussion, I proposed a marker mechanism,
> iterated on it for two weeks, and it is now at the 0.20 release. It went through
> about three complete rewrites during the process, but it seems that most
> objections have been answered.
Currently at 5932 unread messages, and I'm still reading that thread :-)
A little birdy pointed me to http://lwn.net/Articles/200059/ which talks
about this. Good to see it. I should have read the rest of the thread
before posting, but this was originally about my logdev, and I got
excited.
>
> This week, I ported LTTng to the marker mechanism by creating "probes", which
> are the dynamically loadable modules that connect to the markers.
Cool.
>
> > > - Serialization mechanism (facilities) within probes (ltt-probes kernel
> > > modules) dynamically connected to markers.
> >
> > Sorry, I don't really understand what the above is. Is it a loadable
> > module that connects to the static markers? You might need to dumb this
> > one down for me.
> >
>
> Ok,
>
> A marker, in my implementation, is a statement placed in the code, i.e. :
>
> MARK(kernel_sched_schedule, "%d %d %ld", prev->pid, next->pid, prev->state);
>
> A probe is a dynamically loadable module which implements a callback, i.e. :
>
>
> #define KERNEL_SCHED_SCHEDULE_FORMAT "%d %d %ld"
> void probe_kernel_sched_schedule(const char *format, ...)
> {
> va_list ap;
> /* Declare args */
> int prev_pid, next_pid;
> long state;
>
> /* Assign args */
> va_start(ap, format);
> prev_pid = va_arg(ap, typeof(prev_pid));
> next_pid = va_arg(ap, typeof(next_pid));
> state = va_arg(ap, typeof(state));
>
> /* Call tracer */
> trace_process_schedchange(prev_pid, next_pid, state);
>
> va_end(ap);
> }
>
> Which, in my case, takes the variable arguments of the marker call and gives
> them to my inline tracing function.
>
> trace_process_schedchange is the "serialization" mechanism which takes its input
> (arguments) and writes them into an event record in the buffers, dealing with
> reentrancy.
>
>
OK, this makes a lot more sense. I still have a ton of questions, but they
are probably answered in the 5932 messages I have yet to read/skim.
> >
> > > - Tracing control mechanism (ltt-tracer, ltt-control)
> >
> > Is this in kernel or tools?
> >
>
> I am absolutely not talking about the user space tools here, only kernel code.
> This would be another discussion :)
I didn't think you were, but I had to be sure.
>
> In fact, ltt-control is the netlink interface that helps user space controlling
> the tracer. It's not mandatory : the tracer can be controlled from within the
> kernel too (useful for embedded systems).
>
> > > - Buffer management mechanism (ltt-relay)
> >
> > So this uses the current relay system?
> >
>
> Yes, but I do my own synchronization. I implement my own reserve/commit and I
> also export my own ioctl and poll to communicate with the user space buffer
> reading daemon.
Sounds like what logdev does too.
>
> > The greatest resistance that I currently see with LTTng is the adding of
> > static trace points. So if LTTng isn't fully crippled by working with
> > dynamic addition of trace points (not modifying the code), then try to get
> > that in first. See below.
> >
>
> My first goal is to have the infrastructure in, without the instrumentation. And
> yes, I want to connect this infrastructure with a nice dynamic instrumentation
> tool like logdev or systemtap, as it will give fast usability of the
> infrastructure to a user base.
systemtap would obviously be preferred over my logdev (but I can still
dream ;)
> > > * Dynamic registration of new events/event record types
> > >
> > > LTTng supports such dynamic registration since the 0.5.x series.
> >
> > I feel really stupid! What do you define as an event, and how would
> > one add a new one dynamically.
> >
>
> Please don't :) Defining a new event would be to say :
>
> I want to create an event named "schedchange", which belongs to the "kernel"
> subsystem. In its definition, I say that it will take two integers and a long,
> > respectively named "prev_pid", "next_pid" and "state".
>
> We can think of various events for various subsystems, and even for modules. It
> becomes interesting to have dynamically loadable event definitions which get
> loaded with kernel modules. The "description" of the events is saved in the
> trace, in a special low traffic channel (small buffers with a separate file).
>
But these events still need the marker in the source code right?
> > >
> > > * Probe placement
> > >
> > > What makes debugging information based probe placement unsuitable as the only
> > > option for LTTng :
> >
> > First thing which is a key point: "only option" OK, while reading that
> > nasty thread, I saw that LTTng can still function when certain features
> > are not present. Basically, convert all possible static tracepoints into
> > dynamic ones and make a code base for that. Have a patch to convert
> > critical trace points that are not suitable for performance into static
> > traces, and also add static traces that were not able to be done by
> > dynamic ones. This way you have a functioning LTTng in the kernel (if the
> > resistance falls by doing this), and still maintain a patch for a "value
> > added" to your customers. Perhaps call it "Turbo LTTng" ;-)
> >
>
> I won't try to convert all the existing "marker" based trace points into
> dynamic ones, as I see no real use of it in the long run. However, I would
> really like to see a kprobe based instrumentation being a little more nicely
> integrated with LTTng (and it is not hard to do!).
That should definitely be a step. If I'm understanding this (which I may
not be), you can have a dynamic event added while also using dynamic
trace points like kprobes.
>
> > Basically, Mathieu, I want to help you get this into the kernel. I could
> > be wrong, since I'm only a spectator, and not really involved on either
> > side. But I have been reading LKML long enough to have an idea of what it
> > takes.
> >
> > If you can modulize LTTng further down. Add non intrusive parts to the
> > kernel. If you can make a LTTng functional (but "crippled" due to the
> > limitations you are saying) and have it doing what the ney-sayers want,
> > you will have a better time getting it accepted. Once accepted, it will
> > be a lot easier to add controversial things than it is to add it before
> > any of it is accepted.
> >
>
> Yes, that's why I am splitting my projects in parts "markers, tracer, control,
> facilities, ..." and plan to keep the most intrusives as an external patchset
> for the moment. Anyway, the marker-probe mechanism lets me put all the
> serialization code inside probes external to the kernel, which can be connected
> either with the "marker" mechanism or with kprobes, it doesn't matter.
>
> Thanks for your hints,
No prob, I should read the rest of the thread, and try to catch up more,
before posting more comments.
Later,
-- Steve
* Re: [RFC] The New and Improved Logdev (now with kprobes!)
2006-10-05 21:28 ` Steven Rostedt
@ 2006-10-06 1:29 ` Mathieu Desnoyers
0 siblings, 0 replies; 15+ messages in thread
From: Mathieu Desnoyers @ 2006-10-06 1:29 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Thomas Gleixner, Karim Yaghmour, Andrew Morton,
Chris Wright, fche, Tom Zanussi
* Steven Rostedt (rostedt@goodmis.org) wrote:
>
> > Please don't :) Defining a new event would be to say :
> >
> > I want to create an event named "schedchange", which belongs to the "kernel"
> > subsystem. In its definition, I say that it will take two integers and a long,
> > respectively named "prev_pid", "next_pid" and "state".
> >
> > We can think of various events for various subsystems, and even for modules. It
> > becomes interesting to have dynamically loadable event definitions which get
> > loaded with kernel modules. The "description" of the events is saved in the
> > trace, in a special low traffic channel (small buffers with a separate file).
> >
>
> But these events still need the marker in the source code right?
>
Yes, but not necessarily: it could also be a kernel module built
on the fly by a generator like SystemTAP which defines new events.
>
> That should definitely be a step. If I'm understanding this (which I may
> not be), you can have a dynamic event added while also using dynamic
> trace points like kprobes.
>
Yes, we could use a kprobes-based approach with only one data type (a string is
always a good example), but we could also define one specific event and its
associated data types for each probe.
> No prob, I should read the rest of the thread, and try to catch up more,
> before posting more comments.
>
No problem, constructive comments and ideas are always welcome.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
end of thread, other threads: [~2006-10-06 1:34 UTC | newest]
Thread overview: 15+ messages
2006-10-05 5:11 [RFC] The New and Improved Logdev (now with kprobes!) Steven Rostedt
2006-10-05 14:31 ` Mathieu Desnoyers
2006-10-05 15:49 ` Steven Rostedt
2006-10-05 17:01 ` Mathieu Desnoyers
2006-10-05 18:09 ` Steven Rostedt
2006-10-05 18:29 ` Daniel Walker
2006-10-05 18:38 ` Steven Rostedt
2006-10-05 18:49 ` Daniel Walker
2006-10-05 19:39 ` Daniel Walker
2006-10-05 20:18 ` Mathieu Desnoyers
2006-10-05 20:26 ` Steven Rostedt
2006-10-05 20:31 ` Mathieu Desnoyers
2006-10-05 20:50 ` Mathieu Desnoyers
2006-10-05 21:28 ` Steven Rostedt
2006-10-06 1:29 ` Mathieu Desnoyers