[RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

public inbox for linux-kernel@vger.kernel.org

* [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-14 23:22   ` 2.6.11-rc1-mm1 Tim Bird
@ 2005-01-15 13:08     ` Thomas Gleixner
  2005-01-16  2:09       ` Karim Yaghmour
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2005-01-15 13:08 UTC (permalink / raw)
  To: Tim Bird; +Cc: LKML, Karim Yaghmour

</Flame off>

On Fri, 2005-01-14 at 15:22 -0800, Tim Bird wrote:
>  but not 1) supporting infrastructure for timestamping, managing event
>  data, etc., and 2) a static list of generally useful tracepoints.

Both points are well taken. That's the essential minimum that
instrumentation needs.

I'd like to see this infrastructure usable for all kinds of
instrumentation mechanisms which are built in to the kernel already or
functions which are used for similar purposes in experimental trees and
other instrumentation related projects. 

This requires separating the backend from the infrastructure, so you
can choose from a set of backends which fit best for the intended use.

One of those backends is LTT+relayfs. 
I really respect the work you have done there, but please accept that I
just see the limitations and try to figure out a way to make it more
generic and flexible before it is cemented into the kernel and makes it
hard to use for other interesting instrumentation aspects and maybe
enforces redundant implementation of infrastructure related
functionality.

E.g. tracking down timing-related issues can make use of such
functionality if the infrastructure is provided separately.
I guess a lot of developers would be happy to use it when it is already
around in the kernel, and it can help testers give better
information to developers.

tglx

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-15 13:08     ` [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) Thomas Gleixner
@ 2005-01-16  2:09       ` Karim Yaghmour
  2005-01-16  3:11         ` Roman Zippel
  0 siblings, 1 reply; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-16  2:09 UTC (permalink / raw)
  To: tglx; +Cc: Tim Bird, LKML, Andrew Morton

Hello Thomas,

I don't mind having a general discussion about instrumentation, but
it has to be understood that the topic is so general and means so
many different things to different people that we are unlikely to
reach any useful consensus. Believe me, it's not for the lack of
trying. More below.

Thomas Gleixner wrote:
> </Flame off>

:D

> One of those backends is LTT+relayfs. 
> I really respect the work you have done there, but please accept that I
> just see the limitations and try to figure out a way to make it more
> generic and flexible before it is cemented into the kernel and makes it
> hard to use for other interesting instrumentation aspects and maybe
> enforces redundant implementation of infrastructure related
> functionality.
> 
> E.g. tracking down timing-related issues can make use of such
> functionality if the infrastructure is provided separately.
> I guess a lot of developers would be happy to use it when it is already
> around in the kernel, and it can help testers give better
> information to developers.

I would invite you to review the history behind LTT and the history
behind the efforts to get LTT integrated in the kernel (which are
two separate topics.) If you look back, you will see that I worked
very hard trying to get people to think about a common framework
and that I and others made numerous suggestions in this regard. Here
are a few examples:

- DProbes (kprobes ancestor):
Shortly after dprobes came out in 2000, I was one of the first to
suggest that there could be interfacing between both to allow
dynamically added trace points. We worked with, and eventually
joined forces with, the IBM team working on this and very early
on, LTT and DProbes were interfacing:
http://marc.theaimsgroup.com/?l=linux-kernel&m=97079714009328&w=2
- OProfile:
When time came to integrate oprofile in the kernel, I tried to push
for oprofile to use ltt as its logging engine (to John's utter
horror.) relayfs didn't exist at the time, and obviously oprofile
made it in without relying on ltt.
Here's a posting from July 2002 where I suggested oprofile rely on
ltt. In that same posting I listed a number of drivers/subsystems
that already contained tracing statements. Obviously I was pointing
out that there was an opportunity to create a common, uniform
infrastructure based on ltt:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102624656615567&w=2
- Syscalltrack:
In replying to a posting of someone looking for tracing info, there
was a brief discussion as to how syscalltrack could use ltt instead
of: a) redirecting the syscall table, b) have its own buffering
mechanism. Again, relayfs didn't exist at the time:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102822343523369&w=2
- Event logging:
When there was discussion about event logging, there was a suggestion
to use ltt's engine. Again, relayfs wasn't there:
http://marc.theaimsgroup.com/?l=linux-kernel&m=101836133400796&w=2

And there are many other cases. As you can see, it's not as if
I didn't try to have this discussion before. Unfortunately, interest
in this was rather limited.

In addition, and this is a very important issue, quite a few
kernel developers mistook LTT for a kernel debugging tool, which
it was never meant to be. When, in fact, if you ask those who have
looked at using it for that purpose (try Marcelo or Andrea) you will
see that they didn't find it to be appropriate for them. And
rightly so, it was never meant for that purpose. Even lately, when
I suggested Ingo try using relayfs instead of his custom tracing
code for his preemption work, he looked at it and said that it
wasn't suited, but would consider reusing parts of it if it were
in the kernel.

So, in general, one thing I learned over the years is to not touch
the topic of kernel debugging even with a 10-foot pole when
discussing LTT.

What you are hinting at here (mention of developers vs. testers,
for example), and your stated preference for the type of ring-buffer
you described earlier clearly goes in the direction I've learned to
avoid: buffering support for the general purpose of kernel debugging.

Let me say outright that I see the relevance of what you are looking
for, but let me also say that what we tried to achieve with relayfs
is to provide a general mechanism for kernel subsystems that need to
convey large amounts of data to user-space. We did not attempt to
solve the problem of providing a buffering framework for core kernel
debugging. As I mentioned to Ingo in the mail I referred to earlier
regarding the type of buffering you are looking for:
> The above tracer may indeed be very appropriate for kernel development,
> but it doesn't provide enough functionality for the requirements of
> mainstream users.

If there is interest for using either relayfs and/or ltt for that
purpose, then this is an entirely different mandate and a few things
would need to be added for that to happen. For starters, we could
add another mode to relayfs. Currently, it supports a locking and
a lockless buffering scheme. We could also have ring-buffer mode
which would function very much as you, and Ingo before, have
described. But let me be crystal clear about this: don't count on
me to make a case for it on LKML. I've had enough flak as it is.
If you believe this is necessary, then you are welcome to make a
case for it, and obtain support from others on LKML. Obviously, as
the maintainers of relayfs, we see no reason to avoid extending it
for purposes others may find it useful for and/or accepting patches
to that end, if indeed such extensions don't preclude its adoption
in the mainline kernel.
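
To make the ring-buffer mode discussed above more concrete, here is a minimal user-space sketch in C of a buffer that overwrites its oldest record when full. It is not relayfs code: the names (`ring_buf`, `ring_write`, `ring_read`), the fixed record size, and the slot count are all assumptions made for illustration, and locking is left out entirely.

```c
#include <stddef.h>
#include <string.h>

#define RING_SLOTS  4    /* hypothetical capacity, in records */
#define RECORD_SIZE 16   /* hypothetical fixed record size, in bytes */

struct ring_buf {
    unsigned char slots[RING_SLOTS][RECORD_SIZE];
    unsigned long head;  /* total number of records ever written */
};

/* Initialise an empty ring. */
static void ring_init(struct ring_buf *rb)
{
    memset(rb, 0, sizeof(*rb));
}

/* Write one record, silently overwriting the oldest one when full. */
static void ring_write(struct ring_buf *rb, const void *data, size_t len)
{
    unsigned char *slot = rb->slots[rb->head % RING_SLOTS];

    if (len > RECORD_SIZE)
        len = RECORD_SIZE;           /* truncate oversized records */
    memset(slot, 0, RECORD_SIZE);
    memcpy(slot, data, len);
    rb->head++;
}

/* Number of records currently held (at most RING_SLOTS). */
static size_t ring_count(const struct ring_buf *rb)
{
    return rb->head < RING_SLOTS ? (size_t)rb->head : RING_SLOTS;
}

/*
 * Copy the i-th oldest surviving record into out (RECORD_SIZE bytes).
 * Returns 0 on success, -1 if i is out of range.
 */
static int ring_read(const struct ring_buf *rb, size_t i, void *out)
{
    size_t n = ring_count(rb);
    unsigned long first;

    if (i >= n)
        return -1;
    first = rb->head - n;            /* sequence number of the oldest record */
    memcpy(out, rb->slots[(first + i) % RING_SLOTS], RECORD_SIZE);
    return 0;
}
```

The point of the sketch is the trade-off: the writer never blocks and never fails, at the cost of losing the oldest data, which is roughly what a post-mortem debugging buffer wants and what a trace that must be complete does not.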

Hope this helps clarify things a little,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-16  2:09       ` Karim Yaghmour
@ 2005-01-16  3:11         ` Roman Zippel
  2005-01-16  4:23           ` Karim Yaghmour
  0 siblings, 1 reply; 16+ messages in thread
From: Roman Zippel @ 2005-01-16  3:11 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: tglx, Tim Bird, LKML, Andrew Morton

Hi,

On Sat, 15 Jan 2005, Karim Yaghmour wrote:

> In addition, and this is a very important issue, quite a few
> kernel developers mistook LTT for a kernel debugging tool, which
> it was never meant to be. When, in fact, if you ask those who have
> looked at using it for that purpose (try Marcelo or Andrea) you will
> see that they didn't find it to be appropriate for them. And
> rightly so, it was never meant for that purpose. Even lately, when
> I suggested Ingo try using relayfs instead of his custom tracing
> code for his preemption work, he looked at it and said that it
> wasn't suited, but would consider reusing parts of it if it were
> in the kernel.

Well, that's really a core problem. We don't want to duplicate 
infrastructure, which practically does the same. So if relayfs isn't 
usable in this kind of situation, it really raises the question whether 
relayfs is usable at all. We need to make relayfs generally usable, 
otherwise it will join the fate of devfs.

bye, Roman

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-16  3:11         ` Roman Zippel
@ 2005-01-16  4:23           ` Karim Yaghmour
  2005-01-16 23:43             ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-16  4:23 UTC (permalink / raw)
  To: Roman Zippel; +Cc: tglx, Tim Bird, LKML, Andrew Morton, Tom Zanussi


Hello Roman,

Roman Zippel wrote:
> On Sat, 15 Jan 2005, Karim Yaghmour wrote:
>>In addition, and this is a very important issue, quite a few
>>kernel developers mistook LTT for a kernel debugging tool, which
>>it was never meant to be. When, in fact, if you ask those who have
>>looked at using it for that purpose (try Marcelo or Andrea) you will
>>see that they didn't find it to be appropriate for them. And
>>rightly so, it was never meant for that purpose. Even lately, when
>>I suggested Ingo try using relayfs instead of his custom tracing
>>code for his preemption work, he looked at it and said that it
>>wasn't suited, but would consider reusing parts of it if it were
>>in the kernel.
> 
> Well, that's really a core problem. We don't want to duplicate 
> infrastructure, which practically does the same. So if relayfs isn't 
> usable in this kind of situation, it really raises the question whether 
> relayfs is usable at all. We need to make relayfs generally usable, 
> otherwise it will join the fate of devfs.

Hmm, coming from you I will take this as a pretty strong endorsement
for what I was suggesting earlier: provide a basic buffering mode
in relayfs to be used in kernel debugging. However, it must be
understood that this is separate from the existing modes and ltt,
for example, could not use such a basic infrastructure. If this is
ok with you, and no one wants to complain too loudly about this, I
will go ahead and add this to our to-do list for relayfs.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-16  4:23           ` Karim Yaghmour
@ 2005-01-16 23:43             ` Thomas Gleixner
  2005-01-17  1:54               ` Karim Yaghmour
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2005-01-16 23:43 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi

On Sat, 2005-01-15 at 23:23 -0500, Karim Yaghmour wrote:
> > Well, that's really a core problem. We don't want to duplicate 
> > infrastructure, which practically does the same. So if relayfs isn't 
> > usable in this kind of situation, it really raises the question whether 
> > relayfs is usable at all. We need to make relayfs generally usable, 
> > otherwise it will join the fate of devfs.
> 
> Hmm, coming from you I will take this as a pretty strong endorsement
> for what I was suggesting earlier: provide a basic buffering mode
> in relayfs to be used in kernel debugging. However, it must be
> understood that this is separate from the existing modes and ltt,
> for example, could not use such a basic infrastructure. If this is
> ok with you, and no one wants to complain too loudly about this, I
> will go ahead and add this to our to-do list for relayfs.

This implies separating

- infrastructure 
- event registration
- transport mechanism

tglx



* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-16 23:43             ` Thomas Gleixner
@ 2005-01-17  1:54               ` Karim Yaghmour
  2005-01-17 10:26                 ` Thomas Gleixner
  2005-01-19  7:13                 ` Werner Almesberger
  0 siblings, 2 replies; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-17  1:54 UTC (permalink / raw)
  To: tglx
  Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

Thomas Gleixner wrote:
> This implies separating
> 
> - infrastructure 
> - event registration
> - transport mechanism

Like I said in my first response: we can't be everything for everybody,
the requirements are just too broad. ISO tried it with OSI. Have a
look at net/* for the result.

Currently, LTT provides the first two in one piece, and relayfs
provides the third. Like I acknowledged earlier, there is room for
generalizing the transport mechanism, and I'm thinking of amending
the relayfs API proposal further and renaming the modes to make them
more straightforward:
- Managed (locking or lockless.)
- Ad-Hoc (which works like Ingo, yourself, and others have requested.)

If you really want to define layers, then there are actually four
layers:
1- hooking mechanism
2- event definition / registration
3- event management infrastructure
4- transport mechanism

LTT currently does 1, 2 & 3. Clearly, as in the mail I referred to
earlier, there is code in the kernel that already does 1, 2, 3,
and 4 in a very hardwired/ad-hoc fashion and there isn't anyone asking
for them to remove it. We're offering 4 separately and are putting
LTT on top of it. If you want to get 1 & 2 separately, have a look
at kernel hooks and genevent:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/
http://www.listserv.shafik.org/pipermail/ltt-dev/2003-January/000408.html
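
As a rough illustration of what keeping the four layers above separate could look like, here is a small user-space sketch in C. None of it is LTT, relayfs, or kernel hooks code: every name (`transport`, `event_register`, `event_emit`, `TRACE_POINT`, `mem_transport`) is hypothetical, and a sequence counter stands in for a real timestamp.

```c
#include <stddef.h>
#include <string.h>

/* Layer 4: transport. A backend only has to accept opaque bytes. */
struct transport {
    int (*write)(struct transport *t, const void *buf, size_t len);
};

/* Layer 2: event definition / registration. */
#define MAX_EVENTS 8
struct event_def {
    const char *name;
    int enabled;
};
static struct event_def event_table[MAX_EVENTS];
static int event_count;

/* Register an event type; returns its id, or -1 if the table is full. */
static int event_register(const char *name)
{
    if (event_count >= MAX_EVENTS)
        return -1;
    event_table[event_count].name = name;
    event_table[event_count].enabled = 1;
    return event_count++;
}

/* Layer 3: event management (filtering, header, ordering). */
struct event_header {
    int id;
    unsigned long seq;      /* stand-in for a timestamp */
    size_t payload_len;
};
static unsigned long next_seq;

/* Returns 1 if logged, 0 if filtered out, -1 on bad id or transport error. */
static int event_emit(struct transport *t, int id,
                      const void *payload, size_t len)
{
    struct event_header hdr;

    if (id < 0 || id >= event_count)
        return -1;
    if (!event_table[id].enabled)
        return 0;
    hdr.id = id;
    hdr.seq = next_seq++;
    hdr.payload_len = len;
    if (t->write(t, &hdr, sizeof(hdr)) < 0)
        return -1;
    if (t->write(t, payload, len) < 0)
        return -1;
    return 1;
}

/* Layer 1: hooking mechanism. Here a static marker macro; a dynamic
 * probe handler could call event_emit() in exactly the same way. */
#define TRACE_POINT(t, id, obj) event_emit((t), (id), &(obj), sizeof(obj))

/* One possible backend: append to a flat in-memory buffer. */
struct mem_transport {
    struct transport base;  /* first member, so the cast below is valid */
    unsigned char buf[256];
    size_t used;
};

static int mem_write(struct transport *t, const void *buf, size_t len)
{
    struct mem_transport *m = (struct mem_transport *)t;

    if (len > sizeof(m->buf) - m->used)
        return -1;          /* buffer full */
    memcpy(m->buf + m->used, buf, len);
    m->used += len;
    return 0;
}
```

With this split, swapping `mem_transport` for another backend touches none of the registration or hook code, which is the reuse argument made in this thread; whether the real code can be cut along these lines is exactly what is being debated.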

We'd gladly take a serious look at using the former if it was
included, and there is work in progress on making
the latter the standard way of declaring LTT events instead
of using a static ltt-events.h.

Five years ago, there was a discussion about integrating GKHI into
the kernel (the kernel hooks ancestor). Have a look for yourself
as to the response to this suggestion (basically people weren't
ready to accept a generalized hooking mechanism without a defined
set of hooks, and then others didn't like the idea at all because
creating general hooks in the kernel which anybody can register
to creates legal and maintenance problems ... basically it's a
can of worms):
http://marc.theaimsgroup.com/?l=linux-kernel&m=97371908916365&w=2

There's only so much we can push into the kernel at the same time.
Not to mention that before you can be generic, you've got to have
some specific implementation to start working from. I believe
that what we've ironed out through the discussion of the past
two days is a good basis.

There is some irony in all this. For years, we were told that we
couldn't make it into the kernel because we were perceived as
providing a kernel debugging tool, and now that we're starting
to get our things seriously reviewed we're being told that maybe
it ain't really that useful because those who want to do kernel
debugging can't use it as-is ... go figure.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-17  1:54               ` Karim Yaghmour
@ 2005-01-17 10:26                 ` Thomas Gleixner
  2005-01-17 20:34                   ` Karim Yaghmour
  2005-01-19  7:13                 ` Werner Almesberger
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2005-01-17 10:26 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

On Sun, 2005-01-16 at 20:54 -0500, Karim Yaghmour wrote:

> If you really want to define layers, then there are actually four
> layers:
> 1- hooking mechanism
> 2- event definition / registration
> 3- event management infrastructure
> 4- transport mechanism
> 
> LTT currently does 1, 2 & 3. Clearly, as in the mail I referred to
> earlier, there is code in the kernel that already does 1, 2, 3,
> and 4 in a very hardwired/ad-hoc fashion and there isn't anyone asking
> for them to remove it. We're offering 4 separately and are putting
> LTT on top of it. If you want to get 1 & 2 separately, have a look
> at kernel hooks and genevent:

I know that there is enough code which does x,y,z hardcoded/hardwired
already. 

That's the point. Adding another hardwired implementation does not give
us a possibility to solve the hardwired problem of the already available
stuff.

tglx



* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-17 10:26                 ` Thomas Gleixner
@ 2005-01-17 20:34                   ` Karim Yaghmour
  2005-01-17 22:18                     ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-17 20:34 UTC (permalink / raw)
  To: tglx
  Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore


Thomas Gleixner wrote:
> That's the point. Adding another hardwired implementation does not give
> us a possibility to solve the hardwired problem of the already available
> stuff.

Well then, like I said before, you know what you need to do:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-17 20:34                   ` Karim Yaghmour
@ 2005-01-17 22:18                     ` Thomas Gleixner
  2005-01-17 23:57                       ` Karim Yaghmour
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2005-01-17 22:18 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

On Mon, 2005-01-17 at 15:34 -0500, Karim Yaghmour wrote:
> Thomas Gleixner wrote:
> > That's the point. Adding another hardwired implementation does not give
> > us a possibility to solve the hardwired problem of the already available
> > stuff.
> 
> Well then, like I said before, you know what you need to do:
> http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/

Oh, I guess my English must be really bad.

I was talking about separation of layers, so why do I need
kernelhooks?

The separation of layers makes it possible to actually reuse
functionality and gives the possibility that existing hardwired stuff
can be cleaned up to use the new functionality too.

If we add another hardwired implementation then we do not have said
benefits.

tglx




* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-17 22:18                     ` Thomas Gleixner
@ 2005-01-17 23:57                       ` Karim Yaghmour
  2005-01-18  8:46                         ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-17 23:57 UTC (permalink / raw)
  To: tglx
  Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore


Thomas Gleixner wrote:
> If we add another hardwired implementation then we do not have said
> benefits.

Please stop handwaving. Folks like Andrew, Christoph, Zwane, Roman,
and others actually made specific requests for changes in the code.
What makes you think you're so special that you think you are
entitled to stay on the side and handwave about concepts?

If there is a limitation with the code, please present actual
snippets that need to be changed and suggest alternatives. That's
what everyone else does on this list.

If you want to clean-up the existing tracing code in the kernel,
then here are some ltt calls you may be interested in:
int ltt_create_event(char *event_type,
		     char *event_desc,
		     int format_type,
		     char *format_data);
int ltt_log_raw_event(int event_id, int event_size, void *event_data);

And here's an actual example:
...
  delta_id = ltt_create_event("Delta",
                              NULL,
                              CUSTOM_EVENT_FORMAT_TYPE_HEX,
                              NULL);
...
  ltt_log_raw_event(delta_id, sizeof(a_delta_event), &a_delta_event);
...
  ltt_destroy_event(delta_id);

You can then use LibLTT to read the trace and extract your custom
events and format your binary data as it suits you.

Save the bandwidth and start cleaning.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-17 23:57                       ` Karim Yaghmour
@ 2005-01-18  8:46                         ` Thomas Gleixner
  2005-01-18 16:31                           ` Karim Yaghmour
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2005-01-18  8:46 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

On Mon, 2005-01-17 at 18:57 -0500, Karim Yaghmour wrote: 
> Thomas Gleixner wrote:
> > If we add another hardwired implementation then we do not have said
> > benefits.
> 
> Please stop handwaving. Folks like Andrew, Christoph, Zwane, Roman,
> and others actually made specific requests for changes in the code.
> What makes you think you're so special that you think you are
> entitled to stay on the side and handwave about concepts?

So the points you added to your todo list which were brought up by me
are worthless?

I'm not handwaving. I started this RFC to move the discussion into a
general discussion about instrumentation. A couple of people are
seriously interested in doing this. If you are not interested then ignore
the thread, but you're way not in a position to tell me to shut up.

You turned this thread into your LTT prayer wheel.

Roman pointed out your unwillingness to create a common framework
before. But I have to disagree with him in one point. It's not amazing,
it's annoying.

> If there is a limitation with the code, please present actual
> snippets that need to be changed and suggest alternatives. That's
> what everyone else does on this list.

I pointed you to actually broken code and you accused me of throwing
mud.

> Save the bandwidth 

Please remove me from cc, it's a good start to save bandwidth.

> and start cleaning.

Yes, I did already start cleaning

cat ../broken-out/ltt* | patch -p1 -R

tglx

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-18  8:46                         ` Thomas Gleixner
@ 2005-01-18 16:31                           ` Karim Yaghmour
  0 siblings, 0 replies; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-18 16:31 UTC (permalink / raw)
  To: tglx
  Cc: Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

Thomas,

Thomas Gleixner wrote:
> Yes, I did already start cleaning
> 
> cat ../broken-out/ltt* | patch -p1 -R

:D

If it gives you a warm and fuzzy feeling to have the last
cheap-shot, then I'm all for it, it is of no consequence anyway.
And _please_ don't forget to answer this very email with
something of the same substance.

For my part I consider that I've invested a substantial amount
of time in responding to both your conceptual and practical
feedback, as the archives clearly show.

That being said, I have to thank you for making sure that all
the obvious questions have been asked. I now have more than a
dozen archive links of my answers to those. They'll sure come in
handy when writing an FAQ.

Thanks again,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-17  1:54               ` Karim Yaghmour
  2005-01-17 10:26                 ` Thomas Gleixner
@ 2005-01-19  7:13                 ` Werner Almesberger
  2005-01-19 17:38                   ` Karim Yaghmour
  1 sibling, 1 reply; 16+ messages in thread
From: Werner Almesberger @ 2005-01-19  7:13 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: tglx, Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

From all I've heard and seen of LTT (and I have to admit that most
of it comes from reading this thread, not from reading the code),
I have the impression that it may try to be a bit too specialized,
and thus might miss opportunities for synergy. 

You must be getting tired of people trying to redesign things from
scratch, but maybe you'll humor me anyway ;-)

Karim Yaghmour wrote:
> If you really want to define layers, then there are actually four
> layers:
> 1- hooking mechanism
> 2- event definition / registration
> 3- event management infrastructure
> 4- transport mechanism

For 1, kprobes would seem largely sufficient. In cases where you
don't have a usable attachment point (e.g. in the middle of a
function and you need access to variables with unknown location),
you can add lightweight instrumentation that arranges the code
flow suitably. [1, 2]

2 and 3 should be the main domain of LTT, with 2 sitting on top
of kprobes. kprobes currently doesn't have a nice way for
describing handlers, but that can be fixed [3]. But you probably
don't need a "nice" interface right now, but might be satisfied
with one that works and is fast (?)

From the discussion, it seems that the management is partially
done by relayfs. I find this a little strange. E.g. instead of
filtering events, you may just not generate them in the first
place, e.g. by not placing a probe, or by filtering in LTT,
before submitting the event.

Timestamps may be fine either way. Restoring sequence should be
a task user-space can handle: in the worst case, you'd have to
read and merge from #cpus streams. Seeking works in that context,
too.
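
The user-space merge described above can be sketched as follows, in C. This is only an illustration under stated assumptions: each per-CPU stream is already ordered by timestamp, timestamps are comparable across CPUs, and the `trace_event` layout and the function name `merge_streams` are invented for the example rather than taken from LTT or relayfs.

```c
#include <stddef.h>

/* Hypothetical event record; real trace formats will differ. */
struct trace_event {
    unsigned long long ts;  /* timestamp, assumed comparable across CPUs */
    int cpu;
    int id;
};

/* One time-ordered stream per CPU, with a read cursor. */
struct cpu_stream {
    const struct trace_event *events;
    size_t count;
    size_t pos;
};

/*
 * Merge nstreams time-ordered streams into out, oldest event first.
 * Ties go to the stream with the lower index. Returns the number of
 * events written, which is at most out_cap.
 */
static size_t merge_streams(struct cpu_stream *streams, size_t nstreams,
                            struct trace_event *out, size_t out_cap)
{
    size_t written = 0;

    while (written < out_cap) {
        size_t best = nstreams;      /* sentinel: nothing selected yet */
        size_t i;

        for (i = 0; i < nstreams; i++) {
            const struct cpu_stream *s = &streams[i];

            if (s->pos >= s->count)
                continue;            /* this stream is exhausted */
            if (best == nstreams ||
                s->events[s->pos].ts <
                    streams[best].events[streams[best].pos].ts)
                best = i;
        }
        if (best == nstreams)
            break;                   /* all streams exhausted */
        out[written++] = streams[best].events[streams[best].pos++];
    }
    return written;
}
```

The linear scan over streams costs O(#cpus) per event, which is fine for a handful of CPUs; a heap would be the usual replacement on larger machines.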

Last but not least, 4 should be simple. Particularly since you're
worried about extreme speeds, there should be as little
processing as you can afford. If you need to seek efficiently
(do you, really ?), you may not even want message boundaries at
that level.

Something that isn't entirely clear to me is if you also need to
aggregate information in buffers. E.g. by updating a record until
it has been retrieved by user space, or by updating a record
when there is no space to create a new one. Such functionality
would add complexity and needs tight synchronization with the
transport.

[1] I've seen the argument that kprobes aren't portable. This
    strikes me as highly questionable. Even if an architecture
    doesn't have a trap instruction (or equivalent code sequence)
    that is at least as short as the shortest instruction, you
    can always fall back to adding instrumentation [2]. Also, if
    you know where your basic blocks are, you may be able to
    use traps that span multiple instructions. I recall that
    things of this kind are already planned for kprobes.

[2] See the "reliable markers" of umlsim from umlsim.sf.net.
    Implementation: cd umlsim/lib; make; tail -50 markers_kernel.h
    Examples: cd umlsim/sim/tests; cat sbug.marker
    They're basically extra-light markup in the source code.
    Works on ia32, but I haven't found a way to get the assembler
    to cooperate for amd64, yet.

[3] I've already solved this problem in umlsim: there, I have a
    Perl/C-like scripting language that allows handlers to do
    pretty much anything they want. Of course, kprobes would
    want pre-compiled C code, not some scripts, but I think the
    design could be developed in a direction that would allow
    both. Will take a while, but since I'll eventually have to
    rewrite the "microcode" anyway, ...

So my comments are basically as follows:

1) kprobes seems like a suitable and elegant mechanism for
   placing all the hooks LTT needs, so I think that it would
   be better to build on this basis, and extend it where
   necessary, than to build yet another specialized variant
   in parallel.
2) LTT should do what it is good at, and not have to worry
   about the rest (i.e. supporting infrastructure).
3) relayfs should be lean and fast, as you intend it to be, so
   that non-LTT tracing or fnord debugging fnord code may find
   it useful, too.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/

* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-19  7:13                 ` Werner Almesberger
@ 2005-01-19 17:38                   ` Karim Yaghmour
  0 siblings, 0 replies; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-19 17:38 UTC (permalink / raw)
  To: Werner Almesberger
  Cc: tglx, Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore


Werner Almesberger wrote:
> From all I've heard and seen of LTT (and I have to admit that most
> of it comes from reading this thread, not from reading the code),

Might I add that this is part of the problem ... No personal
offence intended, but there's been _A LOT_ of things said about
LTT that were based on third-hand account and no direct contact
with the toolset/code. And part of the problem is that _many_
people on this list, and elsewhere, have done some form of
tracing or another as part of their development, so they all
have their idea of how this is best done. Yet, while such
experience can help provide additional ideas to LTT's development,
it also often requires re-explaining to every new suggestor why we
added features he couldn't imagine would be useful to any of
his/her own tracing needs ... Sometimes I wish my interests lay
in some arcane feature that few had ever played with ;)

IOW, while I don't discount anybody else's experience with tracing,
please give us at least the benefit of the doubt by actually:
a) Looking at the code
b) Looking at the mailing list archives
c) Asking us questions directly related to the code

> I have the impression that it may try to be a bit too specialized,
> and thus might miss opportunities for synergy. 

Bear with me on this one ...

> You must be getting tired of people trying to redesign things from
> scratch, but maybe you'll humor me anyway ;-)

Hey, from you Werner I'll take anything. It's always a pleasure
talking with you :)

> Karim Yaghmour wrote:
> 
>>If you really want to define layers, then there are actually four
>>layers:
>>1- hooking mechanism
>>2- event definition / registration
>>3- event management infrastructure
>>4- transport mechanism
> 
> 
> For 1, kprobes would seem largely sufficient. In cases where you
> don't have a usable attachment point (e.g. in the middle of a
> function and you need access to variables with unknown location),
> you can add lightweight instrumentation that arranges the code
> flow suitably. [1, 2]

Let me say outright, as I said to Andi early on in the sister thread,
that I have no problems with having the trace points being fed by
kprobes. In fact, in 2000, way back before kprobes even existed, LTT
was already interfacing with DProbes for dynamic insertion of trace
points.

... There I said it ... now watch me have to repeat this yet again
later on ... :/

However, kprobes is not magic:
a) Like I said to Andi:
> As far as kprobes go, then you still need to have some form or another
> of marking the code for key events, unless you keep maintaining a set
> of kprobes-able points separately, which really makes it unusable for
> the rest of us, as the users of LTT have discovered over time (having
> to create a new patch for every new kernel that comes out.)

b) Like I said to Andrew back in July:
> I've double-checked what I already knew about kprobes and have looked again
> at the site and the patch, and unless there's some feature of kprobes I don't
> know about that allows using something else than the debug interrupt to add
> hooks,
...
> Generating new interrupts is simply unacceptable for LTT's functionality.
> Not to mention that it breaks LTT because tracing something will generate
> events of its own, which will generate tracing events of their own ...
> recursion.

Ok, you can argue about the recursion thing with an "if()", but you'll
have to admit that like in the case I described to Roman:
> ... Say you're getting
> 2MB/s of data (which is not unrealistic on a loaded system.) That means
> that if I'm tracing for 2 days, I've got 345GB of data (~7.5GB/hour).
IOW, something like 200,000 events/s (average of 10 bytes/event). Do I
really need to explain that 200,000 traps/interrupts per second is
not something you want ... ?
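
The figures above can be checked with a few lines of C. This is only a
sketch of the arithmetic: the 2MB/s rate, the 10-byte average event and
the 2-day run come from the message, while the decimal interpretation of
"MB" (10^6 bytes) and the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the thread; decimal megabytes are an assumption. */
#define BYTES_PER_SEC   UINT64_C(2000000)  /* 2 MB/s of trace data */
#define BYTES_PER_EVENT UINT64_C(10)       /* average event size */
#define SECS_PER_DAY    UINT64_C(86400)

/* Events generated per second at the quoted data rate. */
static uint64_t events_per_sec(void)
{
	return BYTES_PER_SEC / BYTES_PER_EVENT;
}

/* Total trace size in bytes after tracing for the given number of days. */
static uint64_t trace_bytes(uint64_t days)
{
	/* Guard against overflowing the 64-bit product. */
	assert(days <= UINT64_MAX / (BYTES_PER_SEC * SECS_PER_DAY));
	return BYTES_PER_SEC * SECS_PER_DAY * days;
}
```

With these inputs, events_per_sec() gives 200,000 and trace_bytes(2)
gives 345,600,000,000 bytes, in line with the "345GB" quoted above.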

But don't despair, like I said to Andi:
> So lately I've been thinking that there may be a middle-ground here
> where everyone could be happy. Define three states for the hooks:
> disabled, static, marker. The third one just adds some info into
> System.map for allowing the automation of the insertion of kprobes
> hooks (though you would still need the debugging info to find the
> values of the variables that you want to log.) Hence, you get to
> choose which type of poison you prefer. For my part, I think the
> noop/early-check should be sufficient to get better performance from
> the existing hook-set.
I have received very little feedback on this suggestion, though I
really think it's worth entertaining, especially with your mention
of uml-sim markers further below.
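
A rough sketch of how the three hook states could be selected at build
time, assuming a single macro at each call site. HOOK_MODE, TRACE_HOOK,
trace_event() and switch_task() are hypothetical names, not LTT code, and
the marker branch is only a placeholder: it does not emit the System.map
information described above.

```c
#include <assert.h>

#define HOOK_DISABLED 0  /* hook compiles away entirely */
#define HOOK_STATIC   1  /* hook is a direct call into the tracer */
#define HOOK_MARKER   2  /* hook leaves only a marker for later probes */

#ifndef HOOK_MODE
#define HOOK_MODE HOOK_STATIC
#endif

static unsigned long hook_calls;  /* events seen by the stand-in tracer */

/* Stand-in for the tracer entry point (hypothetical). */
static void trace_event(int id, long arg)
{
	assert(id >= 0);
	(void)arg;
	hook_calls++;
}

#if HOOK_MODE == HOOK_STATIC
#define TRACE_HOOK(id, arg) trace_event((id), (arg))
#elif HOOK_MODE == HOOK_MARKER
/* A real marker would have to leave a findable address behind for
 * kprobes; this sketch does not attempt that and emits nothing. */
#define TRACE_HOOK(id, arg) ((void)0)
#else
#define TRACE_HOOK(id, arg) ((void)0)
#endif

/* Example instrumented function: the call site is the same in all modes. */
static long switch_task(long prev, long next)
{
	TRACE_HOOK(1, next);
	return prev + next;  /* placeholder for the real work */
}
```

The point of the sketch is that the source markup stays identical; only
the header decides which "poison" a given build gets.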

As for the location of ltt trace points, then they are very rarely
at function boundaries. Here's a classic:
		prepare_arch_switch(rq, next);
		ltt_ev_schedchange(prev, next);
		prev = context_switch(rq, prev, next);

> 2 and 3 should be the main domain of LTT, with 2 sitting on top
> of kprobes. kprobes currently doesn't have a nice way for
> describing handlers, but that can be fixed [3]. But you probably
> don't need a "nice" interface right now, but might be satisfied
> with one that works and is fast (?)

The functions have been there for DProbes for 5 years:
int ltt_create_event(char *event_type,
		     char *event_desc,
		     int format_type,
		     char *format_data)
int ltt_log_raw_event(int event_id, int event_size, void *event_data)
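
To illustrate how a caller might use a pair of functions shaped like the
two above, here is a userspace sketch. The demo_* functions only mirror
the parameter lists shown; their bodies, the return conventions (id >= 0
or -1 on creation, 0 or -1 on logging) and the record layout are
assumptions, not LTT's implementation.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_TYPES 16
#define DEMO_BUF_SIZE  256

struct demo_type {
	const char *name, *desc;
	int format_type;
	const char *format_data;
};

static struct demo_type demo_types[DEMO_MAX_TYPES];
static int demo_ntypes;
static unsigned char demo_buf[DEMO_BUF_SIZE];
static int demo_used;

/* Same parameter list as ltt_create_event(); registers a new event type. */
static int demo_create_event(char *event_type, char *event_desc,
			     int format_type, char *format_data)
{
	if (demo_ntypes == DEMO_MAX_TYPES)
		return -1;
	demo_types[demo_ntypes].name = event_type;
	demo_types[demo_ntypes].desc = event_desc;
	demo_types[demo_ntypes].format_type = format_type;
	demo_types[demo_ntypes].format_data = format_data;
	return demo_ntypes++;
}

/* Same parameter list as ltt_log_raw_event(); appends id, size, payload. */
static int demo_log_raw_event(int event_id, int event_size, void *event_data)
{
	assert(event_size == 0 || event_data != NULL);
	if (event_id < 0 || event_id >= demo_ntypes)
		return -1;
	if (event_size < 0 || event_size > DEMO_BUF_SIZE - demo_used - 2)
		return -1;
	demo_buf[demo_used++] = (unsigned char)event_id;
	demo_buf[demo_used++] = (unsigned char)event_size;
	if (event_size > 0)
		memcpy(demo_buf + demo_used, event_data, (size_t)event_size);
	demo_used += event_size;
	return 0;
}
```

A dynamically inserted probe would call the creation function once to
obtain an id and then pass that id with each raw payload it logs.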

> From the discussion, it seems that the management is partially
> done by relayfs. I find this a little strange. E.g. instead of
> filtering events, you may just not generate them in the first
> place, e.g. by not placing a probe, or by filtering in LTT,
> before submitting the event.

Like I said to Andi:
> ... For one thing, the current
> ltt hooks aren't as fast as they should be (i.e. we check whether
> the tracing is enabled for a certain event way too far in the code-path.)
> This should be rather simple to fix.
And I've already got the code snippet to fix this ready.
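
The snippet itself is not shown here, so the following is only a guess at
the general shape of such a fix: test a per-event enable bit at the very
start of the hook, before any formatting or copying is done. All names
(enabled_mask, TRACE, log_event) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EVENTS 64

static uint64_t enabled_mask;    /* bit n set: event n is being traced */
static unsigned long formatted;  /* how often the costly path ran */

static void event_enable(unsigned id)
{
	assert(id < MAX_EVENTS);
	enabled_mask |= (uint64_t)1 << id;
}

static int event_enabled(unsigned id)
{
	return id < MAX_EVENTS && ((enabled_mask >> id) & 1);
}

/* Stand-in for timestamping, formatting and copying into a buffer. */
static void log_event(unsigned id, long payload)
{
	(void)id;
	(void)payload;
	formatted++;
}

/* The check comes first at the call site, so a disabled event costs a
 * load, a mask and a branch, and its payload is never evaluated. */
#define TRACE(id, payload) \
	do { if (event_enabled(id)) log_event((id), (payload)); } while (0)
```

Because the payload expression sits inside the if, any work needed to
gather the event data is also skipped when the event is disabled.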

> Timestamps may be fine either way. Restoring sequence should be
> a task user-space can handle: in the worst case, you'd have to
> read and merge from #cpus streams. Seeking works in that context,
> too.
> 
> Last but not least, 4 should be simple. Particularly since you're
> worried about extreme speeds, there should be as little
> processing as you can afford. If you need to seek efficiently
> (do you, really ?), you may not even want message boundaries at
> that level.

Like I said to Roman:
> Removing this data would require more data for each event to
> be logged, and require parsing through the trace before reading it in
> order to obtain markers allowing random access. This wouldn't be so
> bad if we were expecting users to use LTT sporadically for very short
> periods of time. However, given ltt's target audience (i.e. need to
> run traces for hours, maybe days, weeks), traces would rapidly become
> useless because while plowing through a few hundred KBs of data and
> allocating RAM for building internal structures as you go is fine,
> plowing through tens of GBs of data, possibly hundreds, requires that
> you come up with a format that won't require unreasonable resources
> from your system, while incurring negligible runtime costs for generating
> it. We believe the format we currently have achieves the right balance
> here.

What we've agreed with Roman is that relayfs won't write anything at
the boundaries. Its clients will provide it with callbacks to be
invoked at buffer boundaries. When invoked, said callbacks can add
whatever they feel is important to the buffer, relayfs doesn't care.
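
The arrangement described here can be sketched as a channel that calls a
client-supplied function whenever it opens a new sub-buffer and otherwise
writes nothing of its own. This is not the relayfs API; struct channel,
boundary_cb and the sizes are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 32
#define N_SUBBUFS   4

/* Client callback: may write up to `room` bytes, returns bytes written. */
typedef size_t (*boundary_cb)(char *dst, size_t room, void *priv);

struct channel {
	char data[N_SUBBUFS][SUBBUF_SIZE];
	size_t cur;   /* index of the current sub-buffer */
	size_t used;  /* bytes used in the current sub-buffer */
	boundary_cb start_cb;
	void *priv;
};

/* Open the current sub-buffer and let the client add what it wants. */
static void channel_open_subbuf(struct channel *ch)
{
	ch->used = 0;
	if (ch->start_cb)
		ch->used = ch->start_cb(ch->data[ch->cur], SUBBUF_SIZE, ch->priv);
	if (ch->used > SUBBUF_SIZE)
		ch->used = SUBBUF_SIZE;
}

static void channel_init(struct channel *ch, boundary_cb cb, void *priv)
{
	assert(ch != NULL);
	memset(ch, 0, sizeof *ch);
	ch->start_cb = cb;
	ch->priv = priv;
	channel_open_subbuf(ch);
}

/* Append a record, switching sub-buffers when it does not fit. */
static int channel_write(struct channel *ch, const void *rec, size_t len)
{
	if (ch->used + len > SUBBUF_SIZE) {
		ch->cur = (ch->cur + 1) % N_SUBBUFS;
		channel_open_subbuf(ch);
		if (ch->used + len > SUBBUF_SIZE)
			return -1;  /* record too large for a sub-buffer */
	}
	memcpy(ch->data[ch->cur] + ch->used, rec, len);
	ch->used += len;
	return 0;
}

/* Example client callback: stamps each sub-buffer with a sequence number. */
static size_t demo_header(char *dst, size_t room, void *priv)
{
	unsigned *seq = priv;

	if (room < sizeof *seq)
		return 0;
	memcpy(dst, seq, sizeof *seq);
	(*seq)++;
	return sizeof *seq;
}
```

A tracer would register something like demo_header to add the data it
needs for seeking, while a client that needs nothing passes no callback.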

> Something that isn't entirely clear to me is if you also need to
> aggregate information in buffers. E.g. by updating a record until
> it has been retrieved by user space, or by updating a record
> when there is no space to create a new one. Such functionality
> would add complexity and needs tight synchronization with the
> transport.

If I understand you correctly, you are talking about the fact that
the transport layer's management of the buffers is synchronized
with some user-space entity that consumes the buffers produced
and talks back to relayfs (albeit indirectly) to let it know that
said buffers are now available? If so, then that's why I suggested
elsewhere that we have two modes for relayfs: managed and adhoc.
In the former, you have the required mechanics for what I just
described. In the latter, you have a very basic buffering scheme
that cares nothing about user-space synchronization.
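
A small sketch of the difference between the two modes, under the
assumption that "managed" means the producer never reuses a sub-buffer
until the consumer has acknowledged it, and "adhoc" means it simply
wraps around. The names and the drop-on-full policy are illustrative
only.

```c
#include <assert.h>
#include <stddef.h>

enum ring_mode { MODE_ADHOC, MODE_MANAGED };

struct ring {
	enum ring_mode mode;
	size_t n_bufs;    /* number of sub-buffers in the ring */
	size_t produced;  /* sub-buffers handed to the producer so far */
	size_t consumed;  /* sub-buffers acknowledged by the consumer */
	size_t lost;      /* requests refused because the ring was full */
};

/* Producer asks for the next sub-buffer; returns 1 if granted, else 0. */
static int ring_next(struct ring *r)
{
	assert(r->n_bufs > 0);
	if (r->mode == MODE_MANAGED &&
	    r->produced - r->consumed >= r->n_bufs) {
		r->lost++;  /* managed: never overwrite unread data */
		return 0;
	}
	r->produced++;      /* adhoc: always wraps, old data is overwritten */
	return 1;
}

/* Consumer reports that it has finished with n sub-buffers. */
static void ring_ack(struct ring *r, size_t n)
{
	size_t pending = r->produced - r->consumed;

	r->consumed += n < pending ? n : pending;
}
```

In adhoc mode ring_ack() is never needed, which is what keeps that mode
free of any user-space synchronization.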

> [1] I've seen the argument that kprobes aren't portable. This
>     strikes me a highly questionable. Even if an architecture
>     doesn't have a trap instruction (or equivalent code sequence)
>     that is at least as short as the shortest instruction, you
>     can always fall back to adding instrumentation [2]. Also, if
>     you know where your basic blocks are, you may be able to
>     use traps that span multiple instructions. I recall that
>     things of this kind are already planned for kprobes.

I have nothing against kprobes. People keep referring to it as if
it magically made all the related problems go away, and it doesn't.
See above.

> [2] See the "reliable markers" of umlsim from umlsim.sf.net.
>     Implementation: cd umlsim/lib; make; tail -50 markers_kernel.h
>     Examples: cd umlsim/sim/tests; cat sbug.marker
>     They're basically extra-light markup in the source code.
>     Works on ia32, but I haven't found a way to get the assembler
>     to cooperate for amd64, yet.

Nothing precludes us from moving in this direction once something is
in the kernel; it's all currently hidden away in a .h, and it would
be the same with this.

> [3] I've already solved this problem in umlsim: there, I have a
>     Perl/C-like scripting language that allows handlers to do
>     pretty much anything they want. Of course, kprobes would
>     want pre-compiled C code, not some scripts, but I think the
>     design could be developed in a direction that would allow
>     both. Will take a while, but since I'll eventually have to
>     rewrite the "microcode" anyway, ...

Like I said, nothing precludes us ...

> So my comments are basically as follows:
> 
> 1) kprobes seems like a suitable and elegant mechanism for
>    placing all the hooks LTT needs, so I think that it would
>    be better to build on this basis, and extend it where
>    necessary, than to build yet another specialized variant
>    in parallel.

Whichever way you look at this, you need to mark the code. What's
in the .h is something we can tweak ad-nauseam.

> 2) LTT should do what it is good at, and not have to worry
>    about the rest (i.e. supporting infrastructure).

I'm guessing that when you're talking about "supporting
infrastructure" you are referring to the trace statements. If so,
please see above. Also note that without the existing marker set
LTT is useless to its users (application developers, sysadmins,
power users, etc.)

> 3) relayfs should be lean and fast, as you intend it to be, so
>    that non-LTT tracing or fnord debugging fnord code may find
>    it useful, too.

relayfs has already been used for many non-LTT-related purposes. Ask
Hubertus or Jamal, to name a few.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
@ 2005-01-20 21:39 Werner Almesberger
  2005-01-20 23:07 ` Karim Yaghmour
  0 siblings, 1 reply; 16+ messages in thread
From: Werner Almesberger @ 2005-01-20 21:39 UTC (permalink / raw)
  To: Karim Yaghmour
  Cc: tglx, Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

[ 3rd try. Apologies to Karim, Thomas, and Roman, who apparently also
  received my previous attempts. For some reason, one of my upstream
  DNS servers decided to send me highly bogus MX records. ]

Karim Yaghmour wrote:
> Might I add that this is part of the problem ... No personal
> offence intended, but there's been _A LOT_ of things said about
> LTT that were based on third-hand account and no direct contact
> with the toolset/code.

Sigh, yes, guilty as charged ...

At least today, I have a good excuse: my cable modem died, and I
couldn't possibly have downloaded things to look at :)

> > As far as kprobes go, then you still need to have some form or another
> > of marking the code for key events, unless you keep maintaining a set
> > of kprobes-able points separately, which really makes it unusable for
> > the rest of us, as the users of LTT have discovered over time (having
> > to create a new patch for every new kernel that comes out.)

Yes, I think you will need some set of "pads" in the code, where you
can attach probes. I'm not sure how many, though. An alternative, at
least in some cases, would be to move such things into separate
functions, so that you could put the probe just at function entry.
Then add a comment that this function isn't supposed to be torn
apart without dire need.

> > Generating new interrupts is simply unacceptable for LTT's functionality.

Absolutely. If I remember correctly, this is in the process of being
addressed in kprobes. You basically have the following choices:

 - if the probe target is an instruction long enough, replace it with
   a jump or call (that's what I think the kprobes folks are working
   on. I remember for sure that they were thinking about it.)
 - if the probe target is in a basic block with enough room after the
   target, see above (needs feedback from compiler or assembler)
 - if all else fails, add some NOPs (i.e. the marker approach)

> I have received very little feedback on this suggestion,

Probably because everybody saw that it was good :-)

> As for the location of ltt trace points, then they are very rarely
> at function boundaries. Here's a classic:
> 		prepare_arch_switch(rq, next);
> 		ltt_ev_schedchange(prev, next);
> 		prev = context_switch(rq, prev, next);

Yes, in some cases, you don't have a choice but to add some marker.

> > Removing this data would require more data for each event to
> > be logged, and require parsing through the trace before reading it in
> > order to obtain markers allowing random access.

So you need seeking, even in the presence of fine-grained control
over what gets traced in the first place ? (As opposed to extracting
the interesting data from the full trace, given that the latter
shouldn't contain too much noise.)

> If I understand you correctly, you are talking about the fact that
> the transport layer's management of the buffers is synchronized
> with some user-space entity that consumes the buffers produced
> and talks back to relayfs (albeit indirectly) to let it know that
> said buffers are now available?

Or that they have been consumed. My question is just whether this
kind of aggregation is something you need.

> I have nothing against kprobes. People keep referring to it as if
> it magically made all the related problems go away, and it doesn't.

Yes, I know just too well :-) In umlsim, I have pretty much the
same problems, and the solutions aren't always nice. So far, I've
been lucky enough that I could almost always find a suitable
function entry to abuse.

However, since a kprobes-based mechanism is - in the worst case,
i.e. when needing markup - as good as direct calls to LTT, and gives
you a lot more flexibility if things aren't quite as hostile, I
think it makes sense to focus on such a solution.

> Nothing precludes us from moving in this direction once something is
> in the kernel; it's all currently hidden away in a .h, and it would
> be the same with this.

Yup, but you could move even more intelligence outside the kernel.
All you really need in the kernel is a place to put the probe,
plus some debugging information to tell you where you find the
data (the latter possibly combined with gently coercing the
compiler to put it at some accessible place).

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/


* Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
  2005-01-20 21:39 [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) Werner Almesberger
@ 2005-01-20 23:07 ` Karim Yaghmour
  0 siblings, 0 replies; 16+ messages in thread
From: Karim Yaghmour @ 2005-01-20 23:07 UTC (permalink / raw)
  To: Werner Almesberger
  Cc: tglx, Roman Zippel, Tim Bird, LKML, Andrew Morton, Tom Zanussi,
	Richard J Moore

Werner Almesberger wrote:
>  - if the probe target is an instruction long enough, replace it with
>    a jump or call (that's what I think the kprobes folks are working
>    on. I remember for sure that they were thinking about it.)

I heard about this years ago, but I don't know that anything came of
it. I suspect that this is not as simple as it looks and that the
only reliable way to do it is with a trap.

> Probably because everybody saw that it was good :-)

Great, thanks. That's what we'll aim for then. We've already got
the "disable" and "static" states implemented, so now we need to figure
out how best to implement this tagging. IBM's kernel hooks
allowed the NOP solution, so I'm guessing it shouldn't be that
much of a stretch to extend it for marking up the code for kprobes
and friends. I don't know whether this code is still maintained or
not, but I'd like to hear input as to whether this is a good basis,
or whether you're thinking of something like your uml-sim hooks?

> So you need seeking, even in the presence of fine-grained control
> over what gets traced in the first place ? (As opposed to extracting
> the interesting data from the full trace, given that the latter
> shouldn't contain too much noise.)

The problem is that you don't necessarily know beforehand what
the problem is. So here's an actual example:

I had a client who had this box on which a task was always getting
picked up by the OOM killer. Try as they might, the development
team couldn't figure out which part of the code was causing this.
So we put LTT in there and in less than 5 minutes we found the
problem. It turned out that a user-space access to a memory-mapped
FPGA caused an unexpected FP interrupt to occur, and the application
found itself in a recursive signal handler. In this case there was
an application symptom, but it was a hardware problem.

This is just a simple example, but there are plenty of other
examples where a sysadmin will be experiencing some weird,
hard-to-reproduce bugs on some of his systems and he'll spend
a considerable amount of time trying to guess what's happening.
This is especially complicated when there's no indication as to
what the root of the problem is. So at that point being able to
log everything and being able to rapidly browse through it is
critical.

Once you've done such a first trace you _may_ _possibly_ be
able to refine your search requirements and relog with that in
mind, but that's after the fact.

> Or that they have been consumed. My question is just whether this
> kind of aggregation is something you need.

Absolutely. If you're thinking about short traces of 100KB or a few
MB, then a simpler scheme would be possible. But when we're talking
about GBs and 100s of GBs spanning days, there's got to be a managed
way of doing it.

>>I have nothing against kprobes. People keep referring to it as if
>>it magically made all the related problems go away, and it doesn't.
> 
> 
> Yes, I know just too well :-) In umlsim, I have pretty much the
> same problems, and the solutions aren't always nice. So far, I've
> been lucky enough that I could almost always find a suitable
> function entry to abuse.

Glad you acknowledge as much.

> However, since a kprobes-based mechanism is - in the worst case,
> i.e. when needing markup - as good as direct calls to LTT, and gives
> you a lot more flexibility if things aren't quite as hostile, I
> think it makes sense to focus on such a solution.

You certainly have a lot more experience than I do with that, so
I'd like to solicit your help. As above: what's the best way to
provide this in addition to the static and disable points?

> Yup, but you could move even more intelligence outside the kernel.
> All you really need in the kernel is a place to put the probe,
> plus some debugging information to tell you where you find the
> data (the latter possibly combined with gently coercing the
> compiler to put it at some accessible place).

Right, but then you end up with a mechanism with generalized hooks.
Actually there was a time when LTT was a driver and you could
either build it as a module or keep it built-in. However, when
we published patches to get LTT accepted in 2.5 we were told on
LKML to move LTT into kernel/ and avoid all this driver stuff.
Having it, or parts of it, in the kernel makes it much simpler
and much more likely that the existing ad-hoc tracing code
spread across the sources will be removed in exchange for a
single agreed-upon way of doing things.

It must be said that like I had done with relayfs, the LTT patch
will go through a major redux and I will post the patches for
review like before on LKML.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


Thread overview: 16+ messages
2005-01-20 21:39 [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) Werner Almesberger
2005-01-20 23:07 ` Karim Yaghmour
  -- strict thread matches above, loose matches on Subject: below --
2005-01-14  8:23 2.6.11-rc1-mm1 Andrew Morton
2005-01-14 22:46 ` 2.6.11-rc1-mm1 Thomas Gleixner
2005-01-14 23:22   ` 2.6.11-rc1-mm1 Tim Bird
2005-01-15 13:08     ` [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1) Thomas Gleixner
2005-01-16  2:09       ` Karim Yaghmour
2005-01-16  3:11         ` Roman Zippel
2005-01-16  4:23           ` Karim Yaghmour
2005-01-16 23:43             ` Thomas Gleixner
2005-01-17  1:54               ` Karim Yaghmour
2005-01-17 10:26                 ` Thomas Gleixner
2005-01-17 20:34                   ` Karim Yaghmour
2005-01-17 22:18                     ` Thomas Gleixner
2005-01-17 23:57                       ` Karim Yaghmour
2005-01-18  8:46                         ` Thomas Gleixner
2005-01-18 16:31                           ` Karim Yaghmour
2005-01-19  7:13                 ` Werner Almesberger
2005-01-19 17:38                   ` Karim Yaghmour
