public inbox for linux-kernel@vger.kernel.org
From: Josh Stone <jistone@redhat.com>
To: Theodore Tso <tytso@mit.edu>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ext4: Add a stub for mpage_da_data in the trace header
Date: Wed, 30 Sep 2009 17:13:05 -0700
Message-ID: <4AC3F411.5040506@redhat.com>
In-Reply-To: <20090930212332.GN24383@mit.edu>

On 09/30/2009 02:23 PM, Theodore Tso wrote:
> On Wed, Sep 30, 2009 at 12:45:23PM -0700, Josh Stone wrote:
>> If you just want the data in the trace buffer, then SystemTap is not the
>> tool for you.  By all means, just write yourself a perl script or
>> something that parses the trace buffer however you like.
>>
>> On the other hand, stap is useful to do some processing/inspection
>> *live*, at the moment the event happens.  For that, we register our own
>> tracepoint handler that can do something different than ftrace.
> 
> So there are two things I would point out here.  First of all, now
> that ftrace has the ability to do basic filtering, just about the only
> thing SystemTap can do which is unique is complex filtering,
> summary statistics, or some kind of correlation between multiple
> events (within SystemTap's restricted memory allocation limits).

This "only thing" seems like quite a lot to me, but I suppose the
significance could be a matter of opinion.  I would also add that
SystemTap can better support concurrent users who want to monitor
different things.
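
To make "summary statistics" and "correlation between multiple events"
concrete, here is a minimal sketch in Python (not SystemTap script) of
what a pair of live handlers might do: pair an entry event with its exit
event per thread and keep running totals in memory instead of logging
every event.  The class, its method names, and the event arguments are
all hypothetical and only model the idea; none of this is a SystemTap or
kernel API.

```python
from collections import defaultdict


class LatencyAggregator:
    """Hypothetical model of two correlated live handlers.

    Assumption: each event delivers a thread id and a timestamp in
    nanoseconds. Nothing here is a real SystemTap or ftrace interface.
    """

    def __init__(self):
        self._start = {}  # thread id -> timestamp of the pending entry event
        self.stats = defaultdict(lambda: {"count": 0, "total": 0, "max": 0})

    def on_entry(self, tid, timestamp_ns):
        # First event: remember when this thread entered.
        self._start[tid] = timestamp_ns

    def on_exit(self, tid, name, timestamp_ns):
        # Second event: correlate with the entry seen on the same thread.
        start = self._start.pop(tid, None)
        if start is None:
            return  # exit without a recorded entry, nothing to correlate
        elapsed = timestamp_ns - start
        entry = self.stats[name]
        entry["count"] += 1
        entry["total"] += elapsed
        entry["max"] = max(entry["max"], elapsed)

    def mean(self, name):
        # Average latency for one event name, or None if never observed.
        entry = self.stats.get(name)
        if not entry or entry["count"] == 0:
            return None
        return entry["total"] / entry["count"]
```

Only the aggregate survives, so memory use grows with the number of
distinct names and in-flight threads rather than with the number of
events, which is the sort of bounded bookkeeping the quoted memory
limits would allow.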

> So I'm not sure it's such a great idea to cede a large bit of
> functionality as something that SystemTap will never
> accomplish --- especially when it's far more convenient and stable
> to depend on fixed trace points than on setting arbitrary dynamic trace
> points in the middle of source files, which will break all the time
> when distros release new kernels, etc.

I don't understand your point about ceding here.  But yes, I agree that
fixed trace points are more convenient and stable, which is why we've
long supported static instrumentation in the kernel.

> Secondly, while I'm not so sure it's that big of a restriction to have
> SystemTap pull events out of the trace buffer, if you must capture the
> event right as it happens, it should be possible to set a kprobe in the
> ftrace subsystem, and then pull out the data of the event from the
> trace buffer.

This is possible, but it's a step backward for a few reasons:

- A kprobe will be inherently slower than a tracepoint handler.
- It requires debuginfo (maybe not to place the probe, but surely to dig
into ftrace's internal data structures).
- It requires knowledge about the ftrace internals, which is fragile and
unmaintainable.
- It assumes that every bit of data that the user wants is captured in
the trace buffer.

I think that last point is particularly significant.  Kernel devs are
not prescient, so the trace event might not be capturing all of the data
that's relevant to a particular troubleshooting effort.  With stap you
can gather whatever data you want.

(By the way, I seem to recall that we once discussed adding a proper
hook for stap to grab ftrace data as it comes, but I don't think that
went anywhere.)

> Keep in mind that one of the advantages DTrace has over SystemTap is
> that it can use pre-defined events in the kernel, and does not have to
> keep userspace macro files in sync with a changing kernel source
> base.  It seems counterproductive to throw away the opportunity of
> being able to read the tracepoint event data, since it would give
> SystemTap a lot more power.

Aren't "pre-defined events" == tracepoints?  That's exactly what we're
trying to use in SystemTap!  But then, DTrace doesn't dictate what data
is captured at those events, so I don't understand why you think we
should be more restrictive.

>> However, SystemTap does *not* require the kernel debuginfo for using
>> tracepoints, even when reading parameters.  It should work in the
>> complete absence of CONFIG_DEBUG_INFO, so if you find otherwise, please
>> let me know and I will fix it.
> 
> Well, how is it going to do that if you don't have access to the
> structure definition?  This is why fetching the information from the
> ring buffer is much more powerful.

True, when neither a header nor debuginfo for a private type is
available, then it will be opaque to us, so the ring buffer can offer
pre-defined insight into those structures.  But in sched_switch, for
example, the ring buffer only knows prev/next->comm/pid/prio/state,
whereas stap puts the entire rq and task_structs at your disposal.  Each
has power in its own place...
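
The sched_switch contrast can be modelled in a few lines of Python.  This
is only a sketch under stated assumptions: Task is a stand-in for a
task_struct, the nvcsw field is an illustrative extra that the fixed
record does not carry, and both functions are hypothetical rather than
any ftrace or SystemTap interface.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Stand-in for a task_struct (a model, not the kernel layout)."""
    comm: str
    pid: int
    prio: int
    state: str
    nvcsw: int = 0  # illustrative extra field, absent from the fixed record


# The fields the pre-defined event copies, as listed in the text above.
RECORDED_FIELDS = ("comm", "pid", "prio", "state")


def fixed_record(prev, next_):
    """Model the ring-buffer entry: only fields chosen in advance."""
    record = {}
    for label, task in (("prev", prev), ("next", next_)):
        for name in RECORDED_FIELDS:
            record[f"{label}_{name}"] = getattr(task, name)
    return record


def live_handler(prev, next_, wanted):
    """Model a live handler: it sees the whole objects, so the caller
    decides at probe time which fields of the incoming task to read."""
    return {name: getattr(next_, name) for name in wanted}
```

For example, fixed_record(prev, next_) never contains nvcsw, while
live_handler(prev, next_, ["pid", "nvcsw"]) can return it, because the
handler is handed the whole object rather than a pre-selected copy.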

Josh

Thread overview: 6+ messages
2009-09-29 21:40 [PATCH] ext4: Add a stub for mpage_da_data in the trace header Josh Stone
2009-09-30 13:33 ` Christoph Hellwig
2009-09-30 14:20   ` Theodore Tso
2009-09-30 19:45     ` Josh Stone
2009-09-30 21:23       ` Theodore Tso
2009-10-01  0:13         ` Josh Stone [this message]
