public inbox for linux-kernel@vger.kernel.org
* LTT user input
@ 2004-07-22 20:47 zanussi
  2004-07-23 10:01 ` Roger Luethi
  2004-07-28  2:48 ` Todd Poynor
  0 siblings, 2 replies; 13+ messages in thread
From: zanussi @ 2004-07-22 20:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: karim, richardj_moore, bob, michel.dagenais

Hi,

One of the things people mentioned wanting to see during Karim's LTT
talk at the Kernel Summit was cases where LTT had been useful to real
users.  Here are some examples culled from the ltt/ltt-dev mailing
lists:

http://www.listserv.shafik.org/pipermail/ltt/2004-July/000631.html
http://www.listserv.shafik.org/pipermail/ltt/2004-July/000630.html
http://www.listserv.shafik.org/pipermail/ltt/2004-July/000629.html
http://www.listserv.shafik.org/pipermail/ltt/2004-March/000559.html
http://www.listserv.shafik.org/pipermail/ltt/2003-April/000341.html
http://www.listserv.shafik.org/pipermail/ltt/2002-April/000199.html
http://www.listserv.shafik.org/pipermail/ltt/2001-December/000118.html
http://www.listserv.shafik.org/pipermail/ltt/2001-July/000064.html
http://www.listserv.shafik.org/pipermail/ltt/2001-April/000020.html

As with most other tools, we don't tend to hear from users unless they
have problems with the tool. :-( LTT has also been picked up by
Debian, SuSE, and MontaVista - maybe they have user input that we
don't get to see as well...

Another thing that came up was the impression that the overhead of
tracing is too high.  I'm not sure where the number mentioned (5%)
came from, but the performance numbers we generated for the relayfs OLS
paper last year, using LTT as a test case, were 1.40% when tracing
everything but having the userspace daemon discard the transferred
data and 2.01% when tracing everything and having the daemon write all
data to disk.

The test system was a 4-way 700MHz Pentium III system, tracing all
event types (syscall entry/exit, interrupt entry/exit, trap
entry/exit, scheduling changes, kernel timer, softirq, process,
filesystem, memory management, socket, ipc, network device).  For each
number, we ran 10 kernel compiles while tracing.  Each 10-compile run
generated about 200 million events comprising about 2 gigabytes.

Tom

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: LTT user input
  2004-07-22 20:47 LTT user input zanussi
@ 2004-07-23 10:01 ` Roger Luethi
  2004-07-23 17:34   ` zanussi
  2004-07-28  2:48 ` Todd Poynor
  1 sibling, 1 reply; 13+ messages in thread
From: Roger Luethi @ 2004-07-23 10:01 UTC (permalink / raw)
  To: zanussi; +Cc: linux-kernel, karim, richardj_moore, bob, michel.dagenais

On Thu, 22 Jul 2004 15:47:03 -0500, zanussi@us.ibm.com wrote:
> One of the things people mentioned wanting to see during Karim's LTT
> talk at the Kernel Summit was cases where LTT had been useful to real
> users.  Here are some examples culled from the ltt/ltt-dev mailing
> lists:
[...]
> Another thing that came up was the impression that the overhead of
> tracing is too high.  I'm not sure where the number mentioned (5%)

The examples you mentioned confirm what Andrew mentioned recently:
What little public evidence there is comes from developers trying
to understand the kernel or debugging their own applications.

I'd be interested to see examples of how these tools help regular sys
admins or technically inclined users (no Aunt Tillie compatibility
required) -- IMO that would go a long way to make a case for inclusion [1].

Another concern raised at the summit (and what I am personally most
concerned about) is the overlap in all the frameworks that add logging
hooks for all kinds of purposes: auditing, performance, user level
debugging, etc.

Out-of-mainline examples that have been around for a while include:

- systrace http://niels.xtdnet.nl/systrace/
- syscalltrack http://syscalltrack.sourceforge.net/
- LTT http://www.opersys.com/LTT/

I wonder if a basic framework that can serve more than one purpose
makes sense.

When considering which tracing functionality should be in mainline,
performance measurements for user-space come in pretty much at the
bottom of my list: Questions like "which process is overwriting this
config file behind my back" seem a lot more common and more likely to
be asked by people not willing or capable of compiling a patched kernel
for that purpose. And tools that are useful for kernel developers (while
unpopular with the powers that be) are nice to have in mainline because
as a kernel hacker, you often _have_ to debug the latest kernel for
which your favorite debug tool is not working yet. An argument for
adding security auditing to mainline is that it helps convince the
conservative and cautious security folks that the functionality is
accepted and here to stay.

None of these arguments apply for LTT as it presents itself: If you
are debugging or tuning a multi-threaded user space app or trying to
understand the kernel, patching some kernel supported by the respective
tool should hardly be a problem.

Please note that I just compared the relative merits of merging various
kinds of tracing functionality into mainline. I did not argue in favor
or against the inclusion of LTT-type functionality.

My point is that the best bet for tools that seem to aim at user-space
performance debugging is to demonstrate how they can be useful for a
wider audience, or to hitch a ride with a framework that does appeal
to a wider audience.

Roger

[1] You could take a page from how DTrace was introduced:
    http://www.sun.com/bigadmin/content/dtrace/
    Or take a look at:
    http://syscalltrack.sourceforge.net/when.html
    http://syscalltrack.sourceforge.net/examples.html


* Re: LTT user input
  2004-07-23 10:01 ` Roger Luethi
@ 2004-07-23 17:34   ` zanussi
  2004-07-23 19:19     ` Roger Luethi
  0 siblings, 1 reply; 13+ messages in thread
From: zanussi @ 2004-07-23 17:34 UTC (permalink / raw)
  To: Roger Luethi
  Cc: zanussi, linux-kernel, karim, richardj_moore, bob,
	michel.dagenais

Roger Luethi writes:
 > On Thu, 22 Jul 2004 15:47:03 -0500, zanussi@us.ibm.com wrote:
 > > One of the things people mentioned wanting to see during Karim's LTT
 > > talk at the Kernel Summit was cases where LTT had been useful to real
 > > users.  Here are some examples culled from the ltt/ltt-dev mailing
 > > lists:
 > [...]
 > > Another thing that came up was the impression that the overhead of
 > > tracing is too high.  I'm not sure where the number mentioned (5%)
 > 
 > The examples you mentioned confirm what Andrew mentioned recently:
 > What little public evidence there is comes from developers trying
 > to understand the kernel or debugging their own applications.
 > 
 > I'd be interested to see examples of how these tools help regular sys
 > admins or technically inclined users (no Aunt Tillie compatibility
 > required) -- IMO that would go a long way to make a case for inclusion [1].
 > 
 > Another concern raised at the summit (and what I am personally most
 > concerned about) is the overlap in all the frameworks that add logging
 > hooks for all kinds of purposes: auditing, performance, user level
 > debugging, etc.
 > 
 > Out-of-mainline examples that have been around for a while include:
 > 
 > - systrace http://niels.xtdnet.nl/systrace/
 > - syscalltrack http://syscalltrack.sourceforge.net/
 > - LTT http://www.opersys.com/LTT/
 > 
 > I wonder if a basic framework that can serve more than one purpose
 > makes sense.
 > 

I agree that it would make sense for all these tools to at least share
a common set of hooks in the kernel; it would be great if a single
framework could serve them all too.  The question at the summit was
'why not use the auditing framework for tracing?'.  I haven't had a
chance to look much at the code, but the performance numbers published
for tracing syscalls using the auditing framework aren't encouraging
for an application as intensive as tracing the entire system, as LTT
does.

http://marc.theaimsgroup.com/?l=linux-kernel&m=107826445023282&w=2


 > When considering which tracing functionality should be in mainline,
 > performance measurements for user-space come in pretty much at the
 > bottom of my list: Questions like "which process is overwriting this
 > config file behind my back" seem a lot more common and more likely to
 > be asked by people not willing or capable of compiling a patched kernel
 > for that purpose. And tools that are useful for kernel developers (while
 > unpopular with the powers that be) are nice to have in mainline because
 > as a kernel hacker, you often _have_ to debug the latest kernel for
 > which your favorite debug tool is not working yet. An argument for
 > adding security auditing to mainline is that it helps convince the
 > conservative and cautious security folks that the functionality is
 > accepted and here to stay.
 > 

OK, so performance isn't that important for your application, but for
LTT it is, the idea being that tracing the system should disrupt it as
little as possible and be able to deal with large numbers of events
efficiently.  That's also why the base LTT tracer doesn't do things in
the kernel that some of these other tools do, such as filtering on
param values for instance.  That type of filtering in the kernel can
however be done using the dynamic tracepoints provided by dprobes,
which can conditionally log data into the LTT data stream.  There's
even a C compiler that allows you to define your probes in C and
access arbitrary kernel data symbolically, including function params
and locals.

 > None of these arguments apply for LTT as it presents itself: If you
 > are debugging or tuning a multi-threaded user space app or trying to
 > understand the kernel, patching some kernel supported by the respective
 > tool should hardly be a problem.
 > 
 > Please note that I just compared the relative merits of merging various
 > kinds of tracing functionality into mainline. I did not argue in favor
 > or against the inclusion of LTT-type functionality.
 > 
 > My point is that the best bet for tools that seem to aim at user-space
 > performance debugging is to demonstrate how they can be useful for a
 > wider audience, or to hitch a ride with a framework that does appeal
 > to a wider audience.
 > 
 > Roger
 > 
 > [1] You could take a page from how DTrace was introduced:
 >     http://www.sun.com/bigadmin/content/dtrace/

Yes, dtrace is interesting.  It has a lot of bells and whistles, but
the basic architecture seems very similar to the pieces we already
have and have had for awhile:

- basic infrastructure (LTT)
- static tracepoints via something like kernel hooks
  (http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/)
- dynamic tracepoints via something like dprobes
  (http://www-124.ibm.com/developerworks/oss/linux/projects/dprobes/)
- low-level probe language something like dprobes' rpn language
- high-level probe language something like the dprobes C compiler

I too would like to have a polished 400 page manual with copious usage
examples but there are only so many hours in the day... ;-)

 >     Or take a look at:
 >     http://syscalltrack.sourceforge.net/when.html
 >     http://syscalltrack.sourceforge.net/examples.html

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS



* Re: LTT user input
  2004-07-23 17:34   ` zanussi
@ 2004-07-23 19:19     ` Roger Luethi
  2004-07-23 20:44       ` zanussi
  2004-07-23 22:40       ` Robert Wisniewski
  0 siblings, 2 replies; 13+ messages in thread
From: Roger Luethi @ 2004-07-23 19:19 UTC (permalink / raw)
  To: zanussi; +Cc: linux-kernel, karim, richardj_moore, bob, michel.dagenais

On Fri, 23 Jul 2004 12:34:19 -0500, zanussi@us.ibm.com wrote:
> I agree that it would make sense for all these tools to at least share
> a common set of hooks in the kernel; it would be great if a single
> framework could serve them all too.  The question at the summit was
> 'why not use the auditing framework for tracing?'.  I haven't had a
> chance to look much at the code, but the performance numbers published
> for tracing syscalls using the auditing framework aren't encouraging
> for an application as intensive as tracing the entire system, as LTT
> does.
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107826445023282&w=2

Looking for a common base was certainly easier before one tracing
framework got merged. I don't claim to know if a common basic framework
would be beneficial, but I am somewhat amazed that not more effort has
gone into exploring this.

>  > When considering which tracing functionality should be in mainline,
>  > performance measurements for user-space come in pretty much at the
>  > bottom of my list: Questions like "which process is overwriting this
>  > config file behind my back" seem a lot more common and more likely to
>  > be asked by people not willing or capable of compiling a patched kernel
>  > for that purpose. And tools that are useful for kernel developers (while
>  > unpopular with the powers that be) are nice to have in mainline because
>  > as a kernel hacker, you often _have_ to debug the latest kernel for
>  > which your favorite debug tool is not working yet. An argument for
>  > adding security auditing to mainline is that it helps convince the
>  > conservative and cautious security folks that the functionality is
>  > accepted and here to stay.
>  > 
> 
> OK, so performance isn't that important for your application, but for

What is important to me is irrelevant. Both Linus and Andrew have stated
that demonstrated usefulness for many people is one key criterion for
merging new stuff.

> LTT it is, the idea being that tracing the system should disrupt it as

That's your problem right there. Nobody cares if LTT is happy. It is
people who matter. LTT users.

> little as possible and be able to deal with large numbers of events
> efficiently.  That's also why the base LTT tracer doesn't do things in
> the kernel that some of these other tools do, such as filtering on
> param values for instance.  That type of filtering in the kernel can

Which seems reasonable. It would be nice though if adding parameter
filters became easier with a basic framework merged.

> even a C compiler that allows you to define your probes in C and
> access arbitrary kernel data symbolically, including function params
> and locals.

Heh, don't tell Linus. You may want to tout other benefits instead.

>  > [1] You could take a page from how DTrace was introduced:
>  >     http://www.sun.com/bigadmin/content/dtrace/
> 
> Yes, dtrace is interesting.  It has a lot of bells and whistles, but
> the basic architecture seems very similar to the pieces we already
> have and have had for awhile:
> 
> - basic infrastructure (LTT)
> - static tracepoints via something like kernel hooks
>   (http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/)
> - dynamic tracepoints via something like dprobes
>   (http://www-124.ibm.com/developerworks/oss/linux/projects/dprobes/)
> - low-level probe language something like dprobes' rpn language
> - high-level probe language something like the dprobes C compiler
> 
> I too would like to have a polished 400 page manual with copious usage
> examples but there are only so many hours in the day... ;-)

What got many people interested in DTrace was hardly a polished 400
page manual. Most of the excitement I've seen was based on one usenet
posting and the Usenix paper.

Here's a challenge: Take the "Introducing DTrace" usenet posting and
let us know how much closer you can get to those results compared to
Linux mainline. Bonus points for explaining which components from your
list quoted above were required for each result. I suspect that merging
whatever might be realistically considered for mainline will not result
in functionality even remotely comparable to DTrace.

Roger


* Re: LTT user input
  2004-07-23 19:19     ` Roger Luethi
@ 2004-07-23 20:44       ` zanussi
  2004-07-23 22:06         ` Roger Luethi
  2004-07-23 22:40       ` Robert Wisniewski
  1 sibling, 1 reply; 13+ messages in thread
From: zanussi @ 2004-07-23 20:44 UTC (permalink / raw)
  To: Roger Luethi
  Cc: zanussi, linux-kernel, karim, richardj_moore, bob,
	michel.dagenais

Roger Luethi writes:
 > On Fri, 23 Jul 2004 12:34:19 -0500, zanussi@us.ibm.com wrote:
 > > I agree that it would make sense for all these tools to at least share
 > > a common set of hooks in the kernel; it would be great if a single
 > > framework could serve them all too.  The question at the summit was
 > > 'why not use the auditing framework for tracing?'.  I haven't had a
 > > chance to look much at the code, but the performance numbers published
 > > for tracing syscalls using the auditing framework aren't encouraging
 > > for an application as intensive as tracing the entire system, as LTT
 > > does.
 > > 
 > > http://marc.theaimsgroup.com/?l=linux-kernel&m=107826445023282&w=2
 > 
 > Looking for a common base was certainly easier before one tracing
 > framework got merged. I don't claim to know if a common basic framework
 > would be beneficial, but I am somewhat amazed that not more effort has
 > gone into exploring this.

I didn't know the auditing framework was a tracing framework.  It
certainly doesn't seem light-weight enough for real system tracing,
which was the question.  Are there other frameworks we should consider
tracing on top of?

 > 
 > >  > When considering which tracing functionality should be in
 > >  > mainline, performance measurements for user-space come in
 > >  > pretty much at the bottom of my list: Questions like "which
 > >  > process is overwriting this config file behind my back" seem a
 > >  > lot more common and more likely to be asked by people not
 > >  > willing or capable of compiling a patched kernel for that
 > >  > purpose. And tools that are useful for kernel developers
 > >  > (while unpopular with the powers that be) are nice to have in
 > >  > mainline because as a kernel hacker, you often _have_ to debug
 > >  > the latest kernel for which your favorite debug tool is not
 > >  > working yet. An argument for adding security auditing to
 > >  > mainline is that it helps convince the conservative and
 > >  > cautious security folks that the functionality is accepted and
 > >  > here to stay.
 > >  > 
 > > 
 > > OK, so performance isn't that important for your application, but for
 > 
 > What is important to me is irrelevant. Both Linus and Andrew have stated
 > that demonstrated usefulness for many people is one key criterion for
 > merging new stuff.

And where was the 'demonstrated usefulness for many people' of the
auditing framework?

 > 
 > > LTT it is, the idea being that tracing the system should disrupt it as
 > 
 > That's your problem right there. Nobody cares if LTT is happy. It is
 > people who matter. LTT users.
 > 

Right, so LTT is the only potential user of the framework that would
care about performance.  I guess we and anyone else who does can't use
it then.

 > > little as possible and be able to deal with large numbers of events
 > > efficiently.  That's also why the base LTT tracer doesn't do things in
 > > the kernel that some of these other tools do, such as filtering on
 > > param values for instance.  That type of filtering in the kernel can
 > 
 > Which seems reasonable. It would be nice though if adding parameter
 > filters became easier with a basic framework merged.

I don't see why it would be too hard to add to any basic framework.

 > 
 > > even a C compiler that allows you to define your probes in C and
 > > access arbitrary kernel data symbolically, including function params
 > > and locals.
 > 
 > Heh, don't tell Linus. You may want to tout other benefits instead.

Well, this is what DTrace does too and in almost exactly the same way,
using an in-kernel interpreter similar to a stripped-down JVM where
nothing malicious can get out and alter the system.  It's basically
where all the 'magic' of DTrace happens.  I know, trying to get
something like this into mainline would be a hard sell, but if you
know of anything less scary that would let us do things as exciting as
DTrace does, let me know...

 > 
 > >  > [1] You could take a page from how DTrace was introduced:
 > >  >     http://www.sun.com/bigadmin/content/dtrace/
 > > 
 > > Yes, dtrace is interesting.  It has a lot of bells and whistles, but
 > > the basic architecture seems very similar to the pieces we already
 > > have and have had for awhile:
 > > 
 > > - basic infrastructure (LTT)
 > > - static tracepoints via something like kernel hooks
 > >   (http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/)
 > > - dynamic tracepoints via something like dprobes
 > >   (http://www-124.ibm.com/developerworks/oss/linux/projects/dprobes/)
 > > - low-level probe language something like dprobes' rpn language
 > > - high-level probe language something like the dprobes C compiler
 > > 
 > > I too would like to have a polished 400 page manual with copious usage
 > > examples but there are only so many hours in the day... ;-)
 > 
 > What got many people interested in DTrace was hardly a polished 400
 > page manual. Most of the excitement I've seen was based on one usenet
 > posting and the Usenix paper.
 > 
 > Here's a challenge: Take the "Introducing DTrace" usenet posting and
 > let us know how much closer you can get to those results compared to
 > Linux mainline. Bonus points for explaining which components from your
 > list quoted above were required for each result. I suspect that merging
 > whatever might be realistically considered for mainline will not result
 > in functionality even remotely comparable to DTrace.
 > 
 > Roger

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS



* Re: LTT user input
  2004-07-23 20:44       ` zanussi
@ 2004-07-23 22:06         ` Roger Luethi
  2004-09-01 16:36           ` zanussi
  0 siblings, 1 reply; 13+ messages in thread
From: Roger Luethi @ 2004-07-23 22:06 UTC (permalink / raw)
  To: zanussi; +Cc: linux-kernel, karim, richardj_moore, bob, michel.dagenais

On Fri, 23 Jul 2004 15:44:19 -0500, zanussi@us.ibm.com wrote:
> I didn't know the auditing framework was a tracing framework.  It
> certainly doesn't seem light-weight enough for real system tracing,
> which was the question.  Are there other frameworks we should consider
> tracing on top of?

I haven't looked at any of these frameworks closely enough to answer
that. My comments were largely based on the observation that there are
several interesting projects that instrument the kernel (typically system
calls) to log information: auditing, performance, or something else.

All of them seem useful, but we can't keep adding hooks for each purpose.
It's like what we had before LSM (in a different area).

>  > What is important to me is irrelevant. Both Linus and Andrew have stated
>  > that demonstrated usefulness for many people is one key criterion for
>  > merging new stuff.
> 
> And where was the 'demonstrated usefulness for many people' of the
> auditing framework?

Well, it's _one_ key criterion. I suspect in this case the decisive
factor was rather the desire to please certain institutions that won't
consider an OS if it can't spy on its users <g>. But I'm making this up,
I'm sure someone remembers the real answer.

Quite frankly, I couldn't care less about auditing. I am much more
interested in tools that help me track down problems. Dprobes and LTT
do look promising. Then again, so did devfs.

>  > That's your problem right there. Nobody cares if LTT is happy. It is
>  > people who matter. LTT users.
> 
> Right, so LTT is the only potential user of the framework that would
> care about performance.  I guess we and anyone else who does can't use
> it then.

No reason to be sarcastic. I didn't say nobody uses it. But those users
aren't exactly highly visible, either.

If you want a textbook example of how to spectacularly fail on this
very issue, recall the LKCD flame war (a couple of years ago?).

> Well, this is what DTrace does too and in almost exactly the same way,
> using an in-kernel interpreter similar to a stripped-down JVM where
> nothing malicious can get out and alter the system.  It's basically
> where all the 'magic' of DTrace happens.  I know, trying to get
> something like this into mainline would be a hard sell, but if you
> know of anything less scary that would let us do things as exciting as
> DTrace does, let me know...

Heh, that's your job :-). Given that a Java/FORTH/whatever interpreter
is unlikely to be merged into mainline anytime soon, what excitement
can we still offer with the complex stuff living in user space?

Even if your goal is to beat DTrace eventually, you need to sell patches
on their own merits, not based on what we could do in some unlikely or
distant future. DTrace is a red herring, more interesting is what we
can do with, say, basic LTT infrastructure, or dprobes, etc.

Roger


* Re: LTT user input
  2004-07-23 19:19     ` Roger Luethi
  2004-07-23 20:44       ` zanussi
@ 2004-07-23 22:40       ` Robert Wisniewski
  2004-07-23 23:45         ` Roger Luethi
  1 sibling, 1 reply; 13+ messages in thread
From: Robert Wisniewski @ 2004-07-23 22:40 UTC (permalink / raw)
  To: Roger Luethi
  Cc: zanussi, linux-kernel, karim, richardj_moore, bob,
	michel.dagenais

Roger Luethi writes:
 > On Fri, 23 Jul 2004 12:34:19 -0500, zanussi@us.ibm.com wrote:
 > > I agree that it would make sense for all these tools to at least share
 > > a common set of hooks in the kernel; it would be great if a single
 > > framework could serve them all too.  The question at the summit was
 > > 'why not use the auditing framework for tracing?'.  I haven't had a
 > > chance to look much at the code, but the performance numbers published
 > > for tracing syscalls using the auditing framework aren't encouraging
 > > for an application as intensive as tracing the entire system, as LTT
 > > does.
 > > 
 > > http://marc.theaimsgroup.com/?l=linux-kernel&m=107826445023282&w=2
 > 
 > Looking for a common base was certainly easier before one tracing
 > framework got merged. I don't claim to know if a common basic framework
 > would be beneficial, but I am somewhat amazed that not more effort has
 > gone into exploring this.

Argh.  I had up to this point been passively following this thread because
a while ago, prior to dtrace and other such work, I, Karim, and others
invested quite a bit of effort and time responding to this group pointing
out the benefits of performance monitoring via tracing.

IN FACT this was exactly one of the points I ardently made.  Having each
subsystem set up their own monitoring was not only counter productive in
terms of time and implementation effort, but prevented a unified view of
performance from being achieved.  Nevertheless, it appears that some
subsystem tracing has been incorporated, though tbh I have not followed as
closely recently.

LTT and relayfs offered the best-performing, most comprehensive solution,
and were reasonably unintrusive.  The work was integrated with dprobes,
allowing dynamic insertion and the zero cost for non-monitored code
proclaimed by dtrace.  As Karim has pointed out in previous posts, though
the technical concerns that were raised were addressed, it didn't seem to
help, as other nits would crop up, appearing to imply that something else
was happening.
a performance monitoring infrastructure, then I wonder how you would
interpret reactions to dtrace.

Robert Wisniewski
The K42 MP OS Project
IBM T.J. Watson Research Center
http://www.research.ibm.com/K42/



* Re: LTT user input
  2004-07-23 22:40       ` Robert Wisniewski
@ 2004-07-23 23:45         ` Roger Luethi
  2004-07-25 19:58           ` Karim Yaghmour
  0 siblings, 1 reply; 13+ messages in thread
From: Roger Luethi @ 2004-07-23 23:45 UTC (permalink / raw)
  To: Robert Wisniewski
  Cc: zanussi, linux-kernel, karim, richardj_moore, michel.dagenais

On Fri, 23 Jul 2004 18:40:26 -0400, Robert Wisniewski wrote:
>  > Looking for a common base was certainly easier before one tracing
>  > framework got merged. I don't claim to know if a common basic framework
>  > would be beneficial, but I am somewhat amazed that not more effort has
>  > gone into exploring this.
> 
> Argh.  I had up to this point been passively following this thread because
> a while ago, prior to dtrace and other such work I, Karim, and others
> invested quite of bit of effort and time responding to this group pointing
> out the benefits of performance monitoring via tracing and
> 
> IN FACT this was exactly one of the points I ardently made.  Having each
> subsystem set up their own monitoring was not only counter productive in
> terms of time and implementation effort, but prevented a unified view of
> performance from being achieved.  Nevertheless, it appears that some

This may be somewhat of a misunderstanding: You seem to be talking about
a unified framework for performance monitoring -- something I silently
assumed should be the case, while the discussion here was about various
forms of logging -- with performance monitoring being one of them.

So the question is (again, this is an issue that has been raised at the
kernel summit as well): Is there some overlap between those various
frameworks? Or do we really need completely separate frameworks for
logging time stamps (performance), auditing information, etc.?

> proclaimed by dtrace.  As Karim has pointed out in previous posts, though
> the technical concerns that were raised were addressed, it didn't seem to
> help as other nits would crop up appearing to imply that something else was
> happening.

My postings were motivated by my personal interest in better tracing
and monitoring facilities. However, I'm getting LKCD flashbacks when
reading your arguments. Which doesn't bode well.

> If indeed the remaining issue is whether there is a benefit to
> a performance monitoring infrastructure, then I wonder how you would
> interpret reactions to dtrace.

DTrace is not a performance monitoring infrastructure, so what's your
point? -- But let's assume for the sake of argument that LTT, dprobes
& Co.  provide something comparable to DTrace, and we just disagree on
what "performance monitoring" means: The chances of getting such a pile
of complexity into mainline are virtually zero (unless it's called ACPI
and required to boot some machines :-/).

So what you can push for inclusion is bound to be a subset, and the
question remains: What does such a subset, which is clearly nothing
like DTrace, offer?

Roger


* Re: LTT user input
  2004-07-23 23:45         ` Roger Luethi
@ 2004-07-25 19:58           ` Karim Yaghmour
  2004-07-25 21:10             ` Roger Luethi
  2004-07-27 23:51             ` Tim Bird
  0 siblings, 2 replies; 13+ messages in thread
From: Karim Yaghmour @ 2004-07-25 19:58 UTC (permalink / raw)
  To: Roger Luethi
  Cc: Robert Wisniewski, zanussi, linux-kernel, richardj_moore,
	michel.dagenais


Roger Luethi wrote:
> So the question is (again, this is an issue that has been raised at the
> kernel summit as well): Is there some overlap between those various
> frameworks? Or do we really need completely separate frameworks for
> logging time stamps (performance), auditing information, etc.?

Hmm... I was at the kernel summit and OLS this week, and I had taken a
vacation from my laptop, so I haven't had the chance to reply to you
sooner. Nevertheless, let me tell you about something I discussed
with some people at OLS, and with which most of them agreed:
the LKML smart-ass effect.

Here's how this works: whenever you post something to LKML, you have
to assume that there's at least one smart-ass out there who's going to
pick on a tiny fraction of what you said and blow it out of proportion
while using other people's past quotes to try to paint you into as tiny
a corner as possible. Of course, the more famous (in kernel development
terms) the person being quoted, the more convincing the smart-ass thinks
he is.

Obviously, this is a general rule, and some people are better at this
than others. It can sometimes be funny, other times just annoying, and
other times still outright counter-productive. All in all, I personally
believe that this works against Linux in the long term, because a lot
of people avoid the LKML for that very reason.

So I have a few questions for you:
- Were you at KS?
- Were you at OLS?
- If you were at either event, then why didn't you come and talk to me
face-to-face?
- If you weren't, then how can you judge the general mood of kernel
developers regarding LTT's adoption?

As to the issue you mention above, I don't remember any of the kernel
developers I've spoken to mentioning the need to merge what you claim
to be overlapping functionality (not that such a thing would be bad; I
had suggested to the maintainers of some of the other tools you mention
that they use LTT because it already existed, and their answer was: we'd
gladly use it if it were already part of the kernel.) What was made very
clear to me by quite a few people, and by Andrew in person, was that LTT
had a sales problem (i.e. the LTT development team has to demonstrate
that it is actually needed by real users.) And this criticism is fair
enough. We have indeed neglected to document, with real-world scenarios,
how LTT has been essential to solving problems.

As for DTrace, its existence and current feature set are only a
testament to the fact that the Linux development model can sometimes
have perverse effects, as in the case of LTT. The only reason some
people can rave and wax about DTrace's capabilities while others, such
as yourself, drag LTT through the mud for lacking those features, is
that the DTrace developers did not have to put up with being excluded
from their kernel for 5 years. As I said earlier, we would be eons ahead
if LTT had been integrated into the kernel in one of the multiple
attempts that were made to get it in over the past few years. Lest you
think that DTrace actually got all its features in a single merge
iteration ...

No one has summarized what happens to tools like LTT better than Andrew
in his keynote at OLS: kernel developers are not always aware of the
usefulness of certain tools and sometimes need to be educated about it.
I concur with Andrew, and accept part of the blame for not having done
enough to address this issue in the past.

Nevertheless, not all is bad. Andrew and others made suggestions to me
during KS/OLS, and I intend to follow up on them.

Plus, I've run into a ton of people who have told me that this type of
tool is essential for their day-to-day work. I will stop short of
covering actual names, but you should hear about such things in the near
future.

> DTrace is not a performance monitoring infrastructure, so what's your
> point? -- But let's assume for the sake of argument that LTT, dprobes
> & Co.  provide something comparable to DTrace, and we just disagree on
> what "performance monitoring" means: The chances of getting such a pile
> of complexity into mainline are virtually zero (unless it's called ACPI
> and required to boot some machines :-/).

You may want to be somewhat constructive here. You don't necessarily need
to follow the Modus Operandi of others on this list. The fact of the matter
is that we've been maintaining a very large stack of software components
for the past few years. We didn't do this just for the fun of it. We've
done it because we were asked to make the pieces small, efficient, and as
independent as possible. As a result, you can use crash dump without
tracing, you can use dprobes without LTT, and you can use LTT without
dprobes, etc.

> So what you can push for inclusion is bound to be a subset, and the
> question remains: What does such a subset, which is clearly nothing
> like DTrace, offer?

This kind of sound-bite would be great if this were FOX, but it isn't.
So if the benchmark is going to be DTrace, then you have to look at
how DTrace came to be. It came to be because its developers did not have
to release a new patch for every iteration of the Solaris OS for
5 years. Level the playing field for us, and you'll see what comes next.
That's what OSS is about. It's when you see things like DTrace speed past
projects like LTT/DProbes/etc. that you begin to understand that the
kernel development model is not fail-safe.

There's absolutely no justification for letting a set of OSS projects
led by motivated people be overtaken by a proprietary product.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546



* Re: LTT user input
  2004-07-25 19:58           ` Karim Yaghmour
@ 2004-07-25 21:10             ` Roger Luethi
  2004-07-27 23:51             ` Tim Bird
  1 sibling, 0 replies; 13+ messages in thread
From: Roger Luethi @ 2004-07-25 21:10 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Robert Wisniewski, zanussi, linux-kernel, richardj_moore,
	michel.dagenais

Wow, that was low. Does your doctor even let you go outside with such
a thin skin? I did not say most things you are attacking me for, and I
could easily defend the points I did make. But I don't care for wasting
my time in flame wars, and while I didn't see one before, there is one
now. It's your pet project, the stage is all yours, have fun.

Roger


* Re: LTT user input
  2004-07-25 19:58           ` Karim Yaghmour
  2004-07-25 21:10             ` Roger Luethi
@ 2004-07-27 23:51             ` Tim Bird
  1 sibling, 0 replies; 13+ messages in thread
From: Tim Bird @ 2004-07-27 23:51 UTC (permalink / raw)
  To: karim
  Cc: Roger Luethi, Robert Wisniewski, zanussi, linux-kernel,
	richardj_moore, michel.dagenais

Karim Yaghmour wrote:
> Plus, I've run into a ton of people who have told me that this type of
> tool is essential for their day-to-day work. I will stop short of
> covering actual names, but you should hear about such things in the near
> future.

Sony has used LTT in the past, and we plan to use it for a few more
development projects underway currently.  It would be nice if we
didn't have to wait for all the pieces to fall together for each
new kernel release (arch support, trace point patches, desired
sub-system stability).  Having seen LTT used for a number of years,
I'd have to agree with Karim's assessment that it would probably
be "neater" today if so much time hadn't been spent over the
years wrangling it into the kernel.

=============================
Tim Bird
Architecture Group Co-Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
E-mail: tim.bird@am.sony.com
=============================


* Re: LTT user input
  2004-07-22 20:47 LTT user input zanussi
  2004-07-23 10:01 ` Roger Luethi
@ 2004-07-28  2:48 ` Todd Poynor
  1 sibling, 0 replies; 13+ messages in thread
From: Todd Poynor @ 2004-07-28  2:48 UTC (permalink / raw)
  To: zanussi; +Cc: linux-kernel, karim, richardj_moore, bob, michel.dagenais

zanussi@us.ibm.com wrote:

> As with most other tools, we don't tend to hear from users unless they
> have problems with the tool. :-( LTT has also been picked up by
> Debian, SuSE, and MontaVista - maybe they have user input that we
> don't get to see as well...

I used LTT once to help investigate system startup performance issues on 
a Linux-based cell phone prototype.  One thing that might be different 
from most LTT users' experiences is that it was somebody else's 
software, for which I did not have the source.  This might help 
illustrate how system administrators can analyze systems for 
improvement, as opposed to the more typical developer scenario, 
although this was still during the development phase of the system.

LTT helped quantify the performance impacts of various system activities 
that might be best minimized (including unneeded system startup scripts 
and the importance of using shell builtins, as well as suggesting 
improvements that might be obtained through use of prelinking shared 
libraries), point out various repeated operations that could probably be 
consolidated (such as file access, process scheduling, and X 
client/server communication), and rule out low memory or the need for 
swapping as a cause of performance problems at that phase of system 
operation.

A great tool, highly recommended.

-- 
Todd Poynor
MontaVista Software



* Re: LTT user input
  2004-07-23 22:06         ` Roger Luethi
@ 2004-09-01 16:36           ` zanussi
  0 siblings, 0 replies; 13+ messages in thread
From: zanussi @ 2004-09-01 16:36 UTC (permalink / raw)
  To: Roger Luethi
  Cc: zanussi, linux-kernel, karim, richardj_moore, bob,
	michel.dagenais

Roger Luethi writes:
 > 
 > Heh, that's your job :-). Given that a Java/FORTH/whatever interpreter
 > is unlikely to be merged into mainline anytime soon, what excitement
 > can we still offer with the complex stuff living in user space?
 > 
 > Even if your goal is to beat DTrace eventually, you need to sell patches
 > on their own merits, not based on what we could do in some unlikely or
 > distant future. DTrace is a red herring, more interesting is what we
 > can do with, say, basic LTT infrastructure, or dprobes, etc.
 > 
 > Roger


I agree, and to that end I have taken the existing trace infrastructure
(LTT and kprobes), bolted a Perl interpreter onto the user side to
make it capable of continuously monitoring the trace stream with
arbitrary logic, and come up with a few example scripts which I hope
might interest a wider audience and demonstrate the utility of this
approach.  The approach is really pretty simple at its core: static and
dynamic instrumentation, as provided by LTT and kprobes respectively, do
little more in the kernel than efficiently get the relevant data to
user space, where user-defined scripts can use the full power of
standard languages like Perl to do whatever they like.

I've posted the code to the ltt-dev mailing list - obviously I won't
repost it here; if you're interested you can grab it from the archive:

http://www.listserv.shafik.org/pipermail/ltt-dev/2004-August/000649.html

I am, though, including the text of that posting below, as it goes into
more detail than the little I've described above, and contains some
concrete examples.


Tom


- copy of posting to ltt mailing list -

Hi,

The attached patch adds a new continuous trace monitoring capability
to the LTT trace daemon, allowing user-defined Perl scripts to analyze
and look for patterns in the LTT trace event data as it becomes
available to the daemon.  The same scripts can be used off-line if the
tracevisualizer is pointed at an existing trace file.  Note that this
is purely a user tools modification - no kernel files were harmed in
the making of this feature ;-)

Also attached are a couple of example kprobes modules which
demonstrate a way to insert dynamic tracepoints into the kernel in
order to gather data not included by the LTT static tracepoints.  The
gathered data is then passed along to LTT via custom events.

What this capability attempts to do is give regular sysadmins or
technically inclined users a quick and easy way not only to gather
system-wide statistics or detect patterns in system event data in an
ad-hoc manner, but also to answer the kinds of questions that tools
like syscalltrack answer, e.g. which process is modifying my config
file behind my back, who's deleting an important file once in a while,
who's killing a particular process, etc.  (See examples below.)

Basically the way it works is that when the trace daemon receives a
buffer of trace events from the kernel, it iterates over each event in
the buffer and invokes a callback handler in a user-defined Perl
script for that event (if there's a handler defined for the event
type).  This gives the script a chance to do whatever Perlish thing it
feels is appropriate for that event e.g. update counts, calculate time
differences, page someone, etc.  Since the embedded Perl interpreter
is persistent for the lifetime of the trace, global data structures
persist across handlers and are available to all.  Typically what
handlers do is update global counters or hashes or flags and let the
script-end handler output the results.  But of course since this is
Perl, anything goes and the only limit is your imagination (and what
you can reasonably do in a handler).  A word on performance - I was at
first sceptical that any scripting language interpreter could handle
the volume of events that LTT can throw at it, but in practice I
haven't seen any evidence of the trace scripts being unable to keep up
with the event stream, even during relatively heavy activity
e.g. kernel compile.  If it does become a problem, you can always do a
normal trace to disk and post-process the file using the same script
with the tracevisualizer.

The complete list of callback handlers is listed in the allcounts.pl
script, which can be found in the tracewatch-scripts directory.
Running this script causes all trace events to be counted and the
results displayed when the trace ends (you can stop a trace with Ctrl-C,
by killing the tracedaemon (but don't use kill -9), or via the normal
tracedaemon timeout (-ts option)):

# tracedaemon -o trace.out -z allcounts.pl

callback invocation counts:
   TraceWatch::network_packet_in: 808
   TraceWatch::irq_exit: 17508
   TraceWatch::memory_page_alloc: 21
   TraceWatch::softirq_soft_irq: 17500
   TraceWatch::irq_entry: 17508
   TraceWatch::schedchange: 44
   TraceWatch::fs_select: 76
   TraceWatch::fs_ioctl: 12
   TraceWatch::timer_expired: 9
   TraceWatch::fs_iowait_start: 2
   TraceWatch::trap_exit: 132
   TraceWatch::fs_read: 4
   TraceWatch::process_wakeup: 26
   TraceWatch::syscall_entry: 60
   TraceWatch::softirq_tasklet_action: 1
   TraceWatch::syscall_exit: 60
   TraceWatch::trap_entry: 132
   TraceWatch::network_packet_out: 14
   TraceWatch::kernel_timer: 16687
   TraceWatch::socket_send: 1
   TraceWatch::fs_write: 10


Here's the output of a short script (tracewatch-scripts/syscall.pl)
that simply counts system-wide syscalls:

# tracedaemon -o trace.out -z syscall.pl

Total # of syscalls: 517

Counts by syscall number:

   sigreturn: 2
   stat64: 6
   time: 6
   ioctl: 92
   fstat64: 3
   poll: 2
   rt_sigaction: 1
   rt_sigprocmask: 4
   read: 36
   alarm: 1
   writev: 1
   fcntl64: 262
   write: 40
   select: 61

And here's the script, showing that a syscall_entry() handler is
defined to catch syscall events, which updates a global variable
containing the total syscall count and updates a per-syscall count by
updating a global hash keyed on the $syscall_name parameter of the
syscall_entry() handler.  The end_watch() handler is called when
tracing stops and allows the script to output its results, which in
this case entails just iterating over the hash and printing the
key/value pairs:

# Track the total number of syscalls by syscall name
#
# Usage: tracedaemon trace.out -o -z syscall.pl

package TraceWatch;

sub end_watch {
     print "\nTotal # of syscalls: $syscall_count\n";
     print "\nCounts by syscall number:\n\n";
     while (($key, $value) = each %syscall_counts) {
	print "  $key: $value\n";
     }
     print "\n";
}

sub syscall_entry {
     my ($tv_sec, $tv_usec, $syscall_name, $address) = @_;

     $syscall_count++;
     $syscall_counts{$syscall_name}++;
}


The tracewatch-scripts/syscalls-by-pid.pl script breaks down the
syscall totals to individual syscall totals for each pid.  Here's the
output:

# tracedaemon -o trace.out -z syscalls-by-pid.pl

Total # of syscalls: 998

Syscall counts by pid:

PID: 1327 [nmbd]
   close: 1
   socketcall: 4
   time: 9
   rt_sigprocmask: 10
   ioctl: 7
   fcntl64: 262
   select: 5
PID: 1 [init]
   stat64: 6
   time: 3
   select: 3
   fstat64: 3
PID: 1806 [wterm]
   read: 162
   ioctl: 225
   writev: 2
   select: 164
   write: 112
PID: 2199 [tracedaemon]
   poll: 1
   ioctl: 4
   write: 1
PID: 1359 [cron]
   stat64: 3
   rt_sigaction: 1
   rt_sigprocmask: 2
   time: 2
   nanosleep: 1
PID: 1270 [atalkd]
   sigreturn: 2
   select: 2


Here's the script, which is a little more involved but demonstrates a
few important things.  First, the start_watch() handler is called
before tracing starts to let the script set things up beforehand.  In
this case, start_watch() calls a helper function, get_process_names()
(from read-proc.pl), which reads /proc and returns a pid/procname hash.
The process_fork() and fs_exec() callbacks are used here only to keep
this hash up-to-date (this combination is common enough that it should
be put in a separate module, which would also make the actually
important part of the script look as simple as it really is).  We also
see here another bookkeeping handler, schedchange, which allows us to
keep track of the current pid.  The real meat of this script is in the
syscall_entry() handler, which basically keeps track of things using
nested hashes.  Isn't that wonderful?

# Tracks the total number of individual syscall invocations for each pid.
#
# Usage: tracedaemon trace.out -o -z syscalls-by-pid.pl

package TraceWatch;
require "read-proc.pl";

my $current_pid = -1;
my $last_entry;
my $last_fork_pid = -1;

# At start of tracing, get all the current pids from /proc
sub start_watch {
     get_process_names();
}

# At end of tracing, dump our nested hash
sub end_watch {
     print "\nTotal # of syscalls: $syscall_count\n";
     print "\nSyscall counts by pid:\n\n";
     while (($pid, $syscall_name_hash) = each %pids) {
	print "PID: $pid [$process_names{$pid}]\n";
	while (($syscall_name, $count) = each %$syscall_name_hash) {
	    print "  $syscall_name: $count\n";
	}
     }
     print "\n";
}

# For each syscall entry, add count to nested pid/syscall hash
sub syscall_entry {
     my ($tv_sec, $tv_usec, $syscall_name, $address) = @_;
     $syscall_count++;
     if ($current_pid != -1) { # ignore until we have a current pid
	$pids{$current_pid}{$syscall_name}++;
     }
}

# We need to track the current pid as one of our hash keys
sub schedchange {
     my ($tv_sec, $tv_usec, $in_pid, $out_pid, $out_pid_state) = @_;
     $current_pid = $in_pid;
}

# We need to track exec so we can keep our pid/name table up-to-date.
# The process_fork() callback has saved the pid we make the association with.
sub fs_exec {
     my ($tv_sec, $tv_usec, $filename) = @_;

     if ($last_fork_pid != -1) {
	$process_names{$last_fork_pid} = $filename; # process_fork saved the pid
     }
}

# We need to track forks so we can keep our pid/name table up-to-date.
sub process_fork {
     my ($tv_sec, $tv_usec, $pid) = @_;

     $last_fork_pid = $pid;
}


If we wanted to get further details about a particular pid, such as
how much time was spent in each syscall for that pid, we could run
tracewatch-scripts/syscall-times-for-pid.pl:

# tracedaemon -o trace.out -z syscall-times-for-pid.pl

Total times per syscall type for pid 1327:

time: 2 usecs for 2 calls
rt_sigprocmask: 7 usecs for 4 calls
fcntl64: 326 usecs for 262 calls
select: 628866 usecs for 2 calls

See the script for an example of manipulating timestamps.


Up until now, the examples have focused mainly on gathering and
summarizing data.  The following examples use the data in the trace
stream to detect possibly sporadic conditions that the user would like
to be notified of when they happen.  For instance, if you have an
important file that keeps getting modified by some unknown assailant,
the tracewatch-scripts/who-modified.pl script helps you track it down.
It provides handlers for the fs_open(), fs_write() and fs_close()
callbacks, which allow it to detect that a file has been modified.  It
also demonstrates the use of the ltt::stop_trace() call, which you can
use from inside your Perl script to automatically stop the trace.  In
this case, when the script detects that the file has been modified, it
prints out that fact and who the culprit was, and then stops the
trace.  There's also a tracewatch-scripts/who-modified-with-tk.pl
script that does the same thing except that when it detects the
modification, it pops up a Tk window, which means you don't have to
constantly be checking the output of the script.  Or use Net::Pager
and have it page you at the beach ;-)

# tracedaemon -o trace.out -z who-modified.pl

The file you were watching (passwd), has been modified!  The culprit is pid 2213 [emacs21-x]


The final two examples demonstrate the same idea, but in both cases,
the LTT trace stream doesn't provide enough information to allow
detection of the problem.  The general solution to this is to use
kprobes to insert dynamic tracepoints, which do nothing more than log
the data necessary for our script to detect the situation (kprobes has
been included in the -mm kernel tree and will likely be included in
mainline.)  In the first example, we want to be notified when some
particular file disappears behind our backs and who the culprit is.
Here are the steps you need to carry out to test this:

# tracedaemon -o trace.out -z unlink.pl
# insmod trace-unlink-params.ko
# touch rabbit
# rm rabbit

The file you were watching (rabbit), has disappeared!  The culprit is pid 2631 [rm]

In the first step, we start the tracedaemon with the
tracewatch-scripts/unlink.pl script.  We then insmod the test kprobes
module, trace-unlink-params.ko, which instruments the sys_unlink()
system call to send an LTT event when any file is unlinked.  Here's
the relevant code in the kprobes handler in
syscall-kprobes/trace-unlink-params.c.  It simply copies the string
from userspace and logs it to ltt via ltt_log_raw_event().  getname()
can sleep, so it shouldn't really be called from here, but we're just
playing around for now...

	char *tmp = getname(pathname);
	
	if(!IS_ERR(tmp))
		ltt_log_raw_event(scpt.trace_id, strlen(tmp)+1, tmp);

The data we just logged in our kprobe will end up in our Perl
interpreter via the custom_event() handler.  All we need to do there
is use Perl's unpack() routine to get the data back out.  In this
case, we know that what we've logged is a character string, so we go
ahead and unpack one of those, compare it with the file name we're
tracking, and if we get a match, we've detected the file deletion and
can let the user know who the culprit was.

# If the given file disappeared, print the alert message and stop tracing
sub custom_event {
     my ($tv_sec, $tv_usec, $event_id, $data_size, $data) = @_;

     ($filename) = unpack("A*", $data);
     if ($filename =~ /^($alert_if_disappears)$/) {
	print "The file you were watching ($alert_if_disappears), has disappeared!  The culprit is pid $current_pid [$process_names{$current_pid}]\n";
	ltt::stop_trace();
     }
}


The final example is tracewatch-scripts/kill.pl.  This is similar to
the previous example, except that here, we're trying to figure out
who's killing a particular process.  Here, I started vi, got its pid
from ps and killed it.

# tracedaemon -o trace.out -z kill.pl
# insmod trace-kill-params.ko
# kill 2832

The pid you were watching (2832), was killed!  The culprit is pid 2836 [bash]


We start the tracedaemon with the tracewatch-scripts/kill.pl script.
Again, we then insmod the test kprobes module, trace-kill-params.ko,
which instruments the kill_something_info() kernel function to send an
LTT event when any process is killed.  Here's the relevant code in the
kprobes handler in syscall-kprobes/trace-kill-params.c.  It fills a
simple struct with the relevant values and logs it to ltt via
ltt_log_raw_event().

	event_data.sig = sig;
	event_data.pid = pid;
	event_data.sender_pid = info->si_pid;
	
	ltt_log_raw_event(scpt.trace_id, sizeof(event_data), &event_data);


And here's the corresponding custom_event() Perl handler.  Again, we
use unpack() to unpack 3 ints from the data, compare it with the
process we're interested in, and if we get a match, we know the
process has been killed, and who the culprit is.

# If the given process was killed, print the alert message and stop tracing
sub custom_event {
     my ($tv_sec, $tv_usec, $event_id, $data_size, $data) = @_;

     ($sig, $pid, $sender_pid) = unpack("iii", $data);
     if ($pid == $alert_if_killed) {
	print "The pid you were watching ($alert_if_killed), was killed!  The culprit is pid $current_pid [$process_names{$current_pid}]\n";
	ltt::stop_trace(); # Calls into the trace daemon or visualizer
     }
}


Well, that's it as far as examples and documentation go - it should be
pretty straightforward, if you know a little bit of Perl, to just follow
and expand on the current examples.  If you come up with a useful Perl
script, please post it or send it to me and I'll try to include it in
the next version, if there is one.  Oh, and it should be obvious that I'm
not an expert Perl programmer, so any cleanup of the current scripts
would be welcome too.

I consider the current code to be somewhere between a prototype and an
alpha feature - the actual Perl interface and scripting engine seem
pretty solid at this point, and there are callbacks for all current LTT
events, so in that sense things are complete, but there are some gaping
holes that I'll fix if there's sufficient interest:

- currently things break badly if you trace more than 1 cpu

- currently you need to trace everything in order to get anything.
   The reason for this is that data isn't ready for userspace until a
   sub-buffer is complete (since it uses relayfs bulk mode).  It also
   means there can be a considerable lag between the time an event
   happens and it's seen by the script.  relayfs also supports a packet
   mode, which can be read(2) from when a single event is available.
   This would give you pretty much immediate response time, at the cost
   of lower throughput.  Some thought needs to be given to tuning this
   tradeoff.
- tracevisualizer (i.e. reading from trace file) does the wrong thing
   with the pid/name hash, which it reads from the current system, but
   should be reading the proc.out file actually associated with the trace.
- TSC timestamping doesn't work - you need to use the -o tracedaemon
   option for gettimeofday timestamping
- command-line needs cleaning up

Just FYI, for the time being the only command lines that are reasonably
guaranteed not to cause you any problems are the following:

# tracedaemon trace.out -o -z scriptfile

where trace.out is just a placeholder and currently results in a
0-length file.

# tracevisualizer trace.out -z scriptfile

where trace.out is a real tracefile produced normally by the
tracedaemon.


Unfortunately, getting everything properly patched isn't much fun at 
this point.  This patch is against the 0.9.6-pre3 user tools.  You apply 
the LTT user tools patch (tracewatch.tar.bz2) after you've applied the 
following usertools patch:

http://www.listserv.shafik.org/pipermail/ltt-dev/2004-April/000611.html

which itself is applied to the user tools:

http://www.opersys.com/ftp/pub/LTT/ltt-0.9.6-pre3.tar.bz2

For the kernel side, I used the relayfs and LTT patches recently posted
to ltt-dev by Mathieu Desnoyers, and the kprobes patches recently posted
to the lkml by Prasanna Panchamukhi.  You might want to try applying the
relayfs and LTT patches to the latest -mm kernel, which already includes
kprobes.

relayfs:

http://www.listserv.shafik.org/pipermail/ltt-dev/2004-August/000637.html

LTT:

http://www.listserv.shafik.org/pipermail/ltt-dev/2004-August/000638.html

kprobes:

http://marc.theaimsgroup.com/?l=linux-kernel&m=109231438003930&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=109231406530886&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=109231366419453&w=2

Regards,

Tom


-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS



end of thread, other threads:[~2004-09-01 16:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-22 20:47 LTT user input zanussi
2004-07-23 10:01 ` Roger Luethi
2004-07-23 17:34   ` zanussi
2004-07-23 19:19     ` Roger Luethi
2004-07-23 20:44       ` zanussi
2004-07-23 22:06         ` Roger Luethi
2004-09-01 16:36           ` zanussi
2004-07-23 22:40       ` Robert Wisniewski
2004-07-23 23:45         ` Roger Luethi
2004-07-25 19:58           ` Karim Yaghmour
2004-07-25 21:10             ` Roger Luethi
2004-07-27 23:51             ` Tim Bird
2004-07-28  2:48 ` Todd Poynor
