From: Josh Stone <jistone@redhat.com>
To: Theodore Tso <tytso@mit.edu>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ext4: Add a stub for mpage_da_data in the trace header
Date: Wed, 30 Sep 2009 17:13:05 -0700
Message-ID: <4AC3F411.5040506@redhat.com>
In-Reply-To: <20090930212332.GN24383@mit.edu>
On 09/30/2009 02:23 PM, Theodore Tso wrote:
> On Wed, Sep 30, 2009 at 12:45:23PM -0700, Josh Stone wrote:
>> If you just want the data in the trace buffer, then SystemTap is not the
>> tool for you. By all means, just write yourself a perl script or
>> something that parses the trace buffer however you like.
>>
>> On the other hand, stap is useful to do some processing/inspection
>> *live*, at the moment the event happens. For that, we register our own
>> tracepoint handler that can do something different than ftrace.
>
> So there are two things I would point out here. First of all, now
> that ftrace has the ability to do basic filtering, just about the only
> thing SystemTap can do which is unique is either complex filtering,
> summary statistics, or some kind of correlation between multiple
> events (within the limits of restricted memory allocation limits of
> SystemTap).
This "only thing" seems like quite a lot to me, but I suppose the
significance could be a matter of opinion. I would also add that
SystemTap can better support concurrent users who want to monitor
different things.
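
As a rough illustration of the "concurrent users" point, here is a toy Python model. It is not kernel or SystemTap code: the `Tracepoint` class, the `demo_io` event, and its `comm`/`nbytes` arguments are all hypothetical names invented for this sketch. It only shows the general shape of the idea, where one instrumentation point has several independently registered handlers, each doing its own live processing.

```python
from collections import Counter


class Tracepoint:
    """Toy stand-in for a tracepoint: a named hook with a list of probes."""

    def __init__(self, name):
        self.name = name
        self._probes = []

    def register(self, probe):
        self._probes.append(probe)

    def unregister(self, probe):
        self._probes.remove(probe)

    def fire(self, **args):
        # Every registered probe sees the event at the moment it happens.
        for probe in list(self._probes):
            probe(**args)


# Hypothetical event; the name and arguments are made up for this sketch.
demo_io = Tracepoint("demo_io")

# User 1: live summary statistics (bytes per process), no event log kept.
bytes_by_comm = Counter()


def sum_bytes(comm, nbytes, **_):
    bytes_by_comm[comm] += nbytes


# User 2: an unrelated filter that only cares about large requests.
large = []


def watch_large(comm, nbytes, **_):
    if nbytes >= 1 << 20:
        large.append((comm, nbytes))


demo_io.register(sum_bytes)
demo_io.register(watch_large)

demo_io.fire(comm="dd", nbytes=4096)
demo_io.fire(comm="dd", nbytes=2 << 20)
demo_io.fire(comm="cp", nbytes=8192)

# User 2 detaches; user 1 keeps running undisturbed.
demo_io.unregister(watch_large)
demo_io.fire(comm="cp", nbytes=4 << 20)
```

Neither handler knows about the other, and detaching one does not affect the other's running totals.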
> So I'm not sure it's such a great idea to cede a large bit of
> functionality to as being something that SystemTap will never
> accomplish --- especially when it's far more convenient and stable
> to depend on fixed trace points than setting arbitrary dynamic trace
> points in the middle of source files which will break all the time
> when distro's release new kernels, etc.
I don't understand your point about ceding here. But yes, I agree that
fixed trace points are more convenient and stable, which is why we've
long supported static instrumentation in the kernel.
> Secondly, while I'm not so sure it's that big of a restriction to have
> Systemtap pull events out of the trace buffer, if you must capture the
> event right as it happens, it should be possible set a kprobe in the
> ftrace subsystem, and then pull out the data of the event from the
> trace buffer.
This is possible, but it's a step backward for a few reasons:
- A kprobe will be inherently slower than a tracepoint handler.
- It requires debuginfo (maybe not to place the probe, but surely to dig
  into ftrace's internal data structures).
- It requires knowledge about the ftrace internals, which is fragile and
  unmaintainable.
- It assumes that every bit of data that the user wants is captured in
  the trace buffer.
I think that last point is particularly significant. Kernel devs are
not prescient, so the trace event might not be capturing all of the data
that's relevant to a particular troubleshooting effort. With stap you
can gather whatever data you want.
(By the way, I seem to recall that we once discussed adding a proper
hook for stap to grab ftrace data as it comes, but I don't think that
went anywhere.)
> Keep in mind that one of the advantage DTrace has over SystemTap is
> that it can use pre-defined events in the kernel, and not have to
> keep userspace macro files in sync with a changing kernel source
> base. It seems counterproductive to throw away the opportunity of
> being able to read the tracepoint event data, since it would give
> SystemTap a lot more power.
Aren't "pre-defined events" == tracepoints? That's exactly what we're
trying to use in SystemTap! But then, DTrace doesn't dictate what data
is captured at those events, so I don't understand why you think we
should be more restrictive.
>> However, SystemTap does *not* require the kernel debuginfo for using
>> tracepoints, even when reading parameters. It should work in the
>> complete absence of CONFIG_DEBUGINFO, so if you find otherwise, please
>> let me know and I will fix it.
>
> Well, how is it going to do that if you don't have access to the
> structure definition? This is why fetching the information from the
> ring buffer is much more powerful.
True, when neither a header nor debuginfo for a private type is
available, it will be opaque to us, so the ring buffer can offer
pre-defined insight into those structures. But in sched_switch, for
example, the ring buffer only knows prev/next->comm/pid/prio/state,
whereas with stap you have the entire rq and task_structs at your
disposal. Each has power in its own place...
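
To make the sched_switch contrast concrete, here is a small Python sketch. It is a toy model only, not the kernel's tracepoint API or SystemTap's runtime: the `Task` class stands in for task_struct with just a handful of fields, and `nvcsw` is used as an illustrative "extra" field that a fixed-format record would not carry. The function names are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # A few fields standing in for task_struct; the real struct has many more.
    comm: str
    pid: int
    prio: int
    state: int
    nvcsw: int = 0  # illustrative extra field, not part of the fixed record


ring_buffer = []


def record_fixed_fields(prev, nxt):
    # Buffer-style probe: copies only the fields chosen ahead of time.
    ring_buffer.append({
        "prev_comm": prev.comm, "prev_pid": prev.pid,
        "prev_prio": prev.prio, "prev_state": prev.state,
        "next_comm": nxt.comm, "next_pid": nxt.pid,
        "next_prio": nxt.prio, "next_state": nxt.state,
    })


seen_nvcsw = {}


def live_handler(prev, nxt):
    # Live probe: receives the objects themselves, so it can read any field.
    seen_nvcsw[prev.pid] = prev.nvcsw


def sched_switch(prev, nxt, probes):
    # The event hands the same arguments to every attached probe.
    for probe in probes:
        probe(prev, nxt)


a = Task("bash", 100, 120, 1, nvcsw=7)
b = Task("make", 200, 120, 0, nvcsw=3)
sched_switch(a, b, [record_fixed_fields, live_handler])
```

A consumer reading `ring_buffer` afterwards can only see the eight recorded fields, while the live handler was free to look at anything reachable from its arguments.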
Josh