From: Josh Stone <jistone@redhat.com>
To: Theodore Tso <tytso@mit.edu>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ext4: Add a stub for mpage_da_data in the trace header
Date: Wed, 30 Sep 2009 17:13:05 -0700
Message-ID: <4AC3F411.5040506@redhat.com>
In-Reply-To: <20090930212332.GN24383@mit.edu>

On 09/30/2009 02:23 PM, Theodore Tso wrote:
> On Wed, Sep 30, 2009 at 12:45:23PM -0700, Josh Stone wrote:
>> If you just want the data in the trace buffer, then SystemTap is not the
>> tool for you. By all means, just write yourself a perl script or
>> something that parses the trace buffer however you like.
>>
>> On the other hand, stap is useful to do some processing/inspection
>> *live*, at the moment the event happens. For that, we register our own
>> tracepoint handler that can do something different than ftrace.
>
> So there are two things I would point out here. First of all, now
> that ftrace has the ability to do basic filtering, just about the only
> thing SystemTap can do which is unique is either complex filtering,
> summary statistics, or some kind of correlation between multiple
> events (within SystemTap's restricted memory allocation limits).

This "only thing" seems like quite a lot to me, but I suppose the
significance could be a matter of opinion. I would also add that
SystemTap can better support concurrent users who want to monitor
different things.

> So I'm not sure it's such a great idea to cede a large bit of
> functionality as being something that SystemTap will never
> accomplish --- especially when it's far more convenient and stable
> to depend on fixed trace points than setting arbitrary dynamic trace
> points in the middle of source files which will break all the time
> when distros release new kernels, etc.

I don't understand your point about ceding here. But yes, I agree that
fixed trace points are more convenient and stable, which is why we've
long supported static instrumentation in the kernel.

> Secondly, while I'm not so sure it's that big of a restriction to have
> Systemtap pull events out of the trace buffer, if you must capture the
> event right as it happens, it should be possible to set a kprobe in the
> ftrace subsystem, and then pull out the data of the event from the
> trace buffer.

This is possible, but it's a step backward for a few reasons:

- A kprobe will be inherently slower than a tracepoint handler.
- It requires debuginfo (maybe not to place the probe, but surely to dig
  into ftrace's internal data structures).
- It requires knowledge about the ftrace internals, which is fragile and
  unmaintainable.
- It assumes that every bit of data that the user wants is captured in
  the trace buffer.

I think that last point is particularly significant. Kernel devs are
not prescient, so the trace event might not be capturing all of the data
that's relevant to a particular troubleshooting effort. With stap you
can gather whatever data you want.

(By the way, I seem to recall that we once discussed adding a proper
hook for stap to grab ftrace data as it comes, but I don't think that
went anywhere.)

> Keep in mind that one of the advantages DTrace has over SystemTap is
> that it can use pre-defined events in the kernel, and not have to
> keep userspace macro files in sync with a changing kernel source
> base. It seems counterproductive to throw away the opportunity of
> being able to read the tracepoint event data, since it would give
> SystemTap a lot more power.

Aren't "pre-defined events" == tracepoints? That's exactly what we're
trying to use in SystemTap! But then, DTrace doesn't dictate what data
is captured at those events, so I don't understand why you think we
should be more restrictive.

>> However, SystemTap does *not* require the kernel debuginfo for using
>> tracepoints, even when reading parameters. It should work in the
>> complete absence of CONFIG_DEBUG_INFO, so if you find otherwise, please
>> let me know and I will fix it.
>
> Well, how is it going to do that if you don't have access to the
> structure definition? This is why fetching the information from the
> ring buffer is much more powerful.

True, when neither a header nor debuginfo for a private type is
available, then it will be opaque to us, so the ring buffer can offer
pre-defined insight into those structures. But in sched_switch, for
example, the ring buffer only knows prev/next->comm/pid/prio/state,
whereas stap has the entire rq and task_structs at your disposal. Each
has power in its own place...
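
To make the ring-buffer side of that comparison concrete, here is a
minimal Python sketch of the kind of post-processing script mentioned
earlier in the thread. It pulls the prev/next comm, pid, prio and state
fields out of one sched_switch text line. The line layout follows the
key=value text format of more recent kernels, which is an assumption
here (the 2009 output looked different), and the names
parse_sched_switch and SAMPLE are hypothetical, chosen for illustration.

```python
import re

# Assumed layout of a sched_switch text line (key=value style used by
# more recent kernels); adjust the pattern for other formats.
SCHED_SWITCH = re.compile(
    r"sched_switch:\s+"
    r"prev_comm=(?P<prev_comm>.+?)\s+prev_pid=(?P<prev_pid>\d+)\s+"
    r"prev_prio=(?P<prev_prio>\d+)\s+prev_state=(?P<prev_state>\S+)\s+==>\s+"
    r"next_comm=(?P<next_comm>.+?)\s+next_pid=(?P<next_pid>\d+)\s+"
    r"next_prio=(?P<next_prio>\d+)"
)


def parse_sched_switch(line):
    """Return the recorded fields as a dict, or None if the line is not
    a sched_switch event in the assumed format."""
    match = SCHED_SWITCH.search(line)
    if match is None:
        return None
    event = match.groupdict()
    # Numeric fields arrive as strings; convert them for later arithmetic.
    for key in ("prev_pid", "prev_prio", "next_pid", "next_prio"):
        event[key] = int(event[key])
    return event


# Hypothetical sample line, written by hand for this sketch.
SAMPLE = (
    "bash-1234  [000] d... 100.000001: sched_switch: "
    "prev_comm=bash prev_pid=1234 prev_prio=120 prev_state=S ==> "
    "next_comm=swapper/0 next_pid=0 next_prio=120"
)

if __name__ == "__main__":
    print(parse_sched_switch(SAMPLE))
```

Only the fields the event chose to record are available to such a
script. Anything else in the rq or the task_structs never reaches the
text line, which is the limitation described above.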

Josh