From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org
Subject: Re: [patch 34/37] LTTng instrumentation ipc
Date: Fri, 25 Apr 2008 08:56:07 -0400 [thread overview]
Message-ID: <20080425125607.GC28411@Krystal> (raw)
In-Reply-To: <y0mve26r874.fsf@ton.toronto.redhat.com>
* Frank Ch. Eigler (fche@redhat.com) wrote:
>
> Alexey Dobriyan <adobriyan@gmail.com> writes:
>
Alexey, I'll reply to Frank's email, but first let me thank you for
looking into this.
> > [...]
> > Can I write
> > if (rv < 0)
> > trace_mark(foo, "rv %d", rv);
>
> Sure.
>
> > Looks like i could. But people want also want to see success, so what?
> > Two markers per exit?
> >
> > rv = ipc_get(...);
> > if (rv < 0)
> > trace_marker(foo_err, ...);
> > trace_marker(foo_all, ...);
>
> That seems excessive. Just pass "rv" value and let the consumer
> decide whether they care about < 0.
>
> You seem to be operating under the mistaken assumption that marker
> consumers will simply have to pass on the full firehose flow without
> filtering. That is not so. I suspect lttng can do it, but I know
> that with systemtap, it's trivial to encode conditions on the marker
> parameters and other state (e.g., recent events of interest), so that
> only finely tuned events actually get sent to the end user.
>
The preferred way to do it would be
trace_mark(foo, "rv %d", rv);
And let the probe deal with rv. The main reason is that by adding
if (test)
trace_mark()
you will add a branch in the normal kernel code flow and slow down the
kernel a bit when disabled compared to the optimized markers.
>
> > Also everything inserted so far is static. Sometimes only one bit in
> > mask is interesting and to save time to parse nibbles people do:
> > printk("foo = %d\n", !!(mask & foo));
> > And interesting bits vary.
>
> OK, perhaps pass both mask & foo, and let the consumer perform the
> arithmetic they deem appropriate.
>
Agreed. However, adding stuff like
!!(mask & foo)
as parameter to a marker won't add to the disabled marker runtime cost,
since it's evaluated within the "marker enabled" block. So, if all you
really need is !!(mask & foo) (and never other information about foo),
then it could make sense to use the most restrictive version so we don't
export internal details about the kernel implementation.
>
> > Again, all events aren't interesting:
> > if (file && file->f_op == &shm_file_operations)
> > printk("%s: file = %p: START\n", __func__, file);
> > Can I write this with markers?
>
> Of course, if you really want to.
>
__func__ is not really interesting here, because you can name your
marker. A useful trick can be to use __builtin_return_address(0) when
needed though.
for the rest, a way that would not export too much information about
kernel's internals :
trace_mark(shm_start, "is_shm_fop %d file %p",
file->f_op == &shm_file_operations, file);
And yes, there is a small runtime cost added here when the marker is
enabled: the test is done in the probe called rather that in the spot.
On the other hand, the impact is nearly zero when the marker is
disabled.
If you really really want to, I could modify the markers to make this
even easier by doing something like :
trace_mark_cond(file->f_op == &shm_file_operations,
shm_start, "file %p", file);
Where the first argument would be a supplementary condition tested in
the marker block. That would make the active marker case faster. How do
you like that ? (see patch appended at the end of the email)
> > So what is proposed? Insert markers at places that look strategic?
>
> "strategic" is the wrong term. Choose those places that reflect
> internal occurrences that are useful but difficult to reverse-engineer
> from other visible interface points like system calls. Data that
> helps answer questions like "Why did (subtle internal phenomenon)
> happen?" in a live system.
>
I totally agree. And we need to do some work in the system call tracing
area as a starting point. That will help remove some unnecessary
instrumentation I have in LTTng.
>
> > mm/ patch is full if "file %p". Do you realize that pointers are
> > only sometimes interesting and sometimes you want dentry (not pointer,
> > but name!):
> > printk("file = '%s'\n", file->f_dentry->d_name.name);
>
> It may not be excessive to put both file and the dname.name as marker
> parameters.
>
eek, well, if we really want to identify a file, we need more than its
name. mount points, full path and file name are required. It brings me a
few years back, but I don't think the dentry name gives us that. This is
why I extract information about all opened files to my tracer once
(mapping the mount point, path and file name to file pointer) and then I
don't have to do the lookup each time the marker is encountered. Yes,
this involved a file pointers dumping at tracer start and keeping track
of open/close events.
>
> > So, let me say even more explicitly. Looking at proposed places elite
> > enough to be blessed with marker...
> >
> > Those which are close enough to system call boundary are essentially
> > strace(1).
>
> Those may not sound worthwhile to put a marker for, BUT, you're
> ignoring the huge differences of impact and scope. A system-wide
> marker-based trace (filtered a la systemtap if desired) can be done
> with a tiny fraction of system load and none of the disruption caused
> by an strace of all the processes.
>
I agree with both ;) Actually we need a low-overhead hook in
syscall_trace(), so we can perform efficient system-wide tracing of
system calls. I'll dig in this as soon as I find time.
Basic ideas :
- I already have the TIF_KERNEL_TRACE thread flag added to all
architectures in another patchset.
- We add a function called on TIF_KERNEL_TRACE, from do_syscall_trace(),
which is architecture-specific. It's basically a big switch() for all
system calls. syscalls which takes similar types could be grouped
together, but I don't think it would be useful at all. It might be
better just to add a trace_mark for each so we extract the syscall
fields in the marker string.
- We perform the page fault (caused by strings and structures) reads on
the spot, because we prefer not to do this in atomic context.
- We put a marker, e.g., for x86_32, a pseudo-code like :
syscall_trace_enter()
{
...
if (test_thread_flag(TIF_KERNEL_TRACE))
do_marker_syscall_trace();
...
}
do_marker_syscall_trace()
{
char *tmpbuf;
switch(regs->orig_ax) {
case SYS_OPEN:
tmpbuf = vmalloc(4096); /* what size is needed ? */
copy_from_user(tmpbuf, regs->bx);
trace_mark(sys_open, "filename %p flags %d mode %d",
tmpbuf, regs->cx, regs->dx);
vfree(tmpbuf);
break;
}
}
Modulo some optimization, what do you think of this ? If someone is
willing to implement this, I can provide the patchset for
TIF_KERNEL_TRACE.
>
> > [...] Markers points won't be removed, only accumulated -- somebody
> > _might_ be interested in this information.
>
> We all (data producers and consumers) need to use good judgment and
> accept moderate change.
>
>
> - FChE
Yes, and I see more and more that we need the in-kernel syscall tracing
infrastructure as a starting point. Then, the only markers left will
deal with useful inner-kernel information like scheduler change, vmalloc
memory allocation, and so on.
Mathieu
Markers condition
>
> > Again, all events aren't interesting:
> > if (file && file->f_op == &shm_file_operations)
> > printk("%s: file = %p: START\n", __func__, file);
> > Can I write this with markers?
>
> Of course, if you really want to.
>
__func__ is not really interesting here, because you can name your
marker. A useful trick can be to use __builtin_return_address(0) when
needed though.
for the rest, a way that would not export too much information about
kernel's internals :
trace_mark(shm_start, "is_shm_fop %d file %p",
file->f_op == &shm_file_operations, file);
And yes, there is a small runtime cost added here when the marker is
enabled: the test is done in the probe called rather that in the spot.
On the other hand, the impact is nearly zero when the marker is
disabled.
If you really really whant to, I could modify the markers to make this
even easier by doing something like :
trace_mark_cond(file->f_op == &shm_file_operations,
shm_start, "file %p", file);
Where the first argument would be a supplementary condition tested in
the marker block. That would make the active marker case faster. How do
you like that ?
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: akpm@linux-foundation.org
---
include/linux/marker.h | 34 +++++++++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 5 deletions(-)
Index: linux-2.6-lttng/include/linux/marker.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/marker.h 2008-04-25 08:16:21.000000000 -0400
+++ linux-2.6-lttng/include/linux/marker.h 2008-04-25 08:21:27.000000000 -0400
@@ -64,7 +64,7 @@ struct marker {
* If generic is true, a variable read is used.
* If generic is false, immediate values are used.
*/
-#define __trace_mark(generic, name, call_private, format, args...) \
+#define __trace_mark(generic, cond, name, call_private, format, args...)\
do { \
static const char __mstrtab_##name[] \
__attribute__((section("__markers_strings"))) \
@@ -76,12 +76,12 @@ struct marker {
{ __mark_empty_function, NULL}, NULL }; \
__mark_check_format(format, ## args); \
if (!generic) { \
- if (unlikely(imv_cond(__mark_##name.state))) \
+ if (unlikely(imv_cond(__mark_##name.state) && (cond))) \
(*__mark_##name.call) \
(&__mark_##name, call_private, \
## args); \
} else { \
- if (unlikely(_imv_read(__mark_##name.state))) \
+ if (unlikely(_imv_read(__mark_##name.state) && (cond)))\
(*__mark_##name.call) \
(&__mark_##name, call_private, \
## args); \
@@ -108,7 +108,7 @@ static inline void marker_update_probe_r
* to be enabled when immediate values are present.
*/
#define trace_mark(name, format, args...) \
- __trace_mark(0, name, NULL, format, ## args)
+ __trace_mark(0, 1, name, NULL, format, ## args)
/**
* _trace_mark - Marker using variable read
@@ -122,7 +122,31 @@ static inline void marker_update_probe_r
* lockdep, some traps, printk).
*/
#define _trace_mark(name, format, args...) \
- __trace_mark(1, name, NULL, format, ## args)
+ __trace_mark(1, 1, name, NULL, format, ## args)
+
+/**
+ * trace_mark_cond - Marker using code patching, testing a condition
+ * @cond: condition to test
+ * @name: marker name, not quoted.
+ * @format: format string
+ * @args...: variable argument list
+ *
+ * Like trace_mark(), but tests if cond is true to execute the trace mark.
+ */
+#define trace_mark_cond(cond, name, format, args...) \
+ __trace_mark(0, cond, name, NULL, format, ## args)
+
+/**
+ * _trace_mark_cond - Marker using variable read, testing a condition
+ * @cond: condition to test
+ * @name: marker name, not quoted.
+ * @format: format string
+ * @args...: variable argument list
+ *
+ * Like _trace_mark(), but tests if cond is true to execute the trace mark.
+ */
+#define _trace_mark_cond(cond, name, format, args...) \
+ __trace_mark(1, cond, name, NULL, format, ## args)
/**
* MARK_NOARGS - Format string for a marker with no argument.
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2008-04-25 12:56 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-24 15:03 [patch 00/37] Linux Kernel Markers instrumentation for sched-devel.git Mathieu Desnoyers
2008-04-24 15:03 ` [patch 01/37] Stringify support commas Mathieu Desnoyers
2008-04-24 15:03 ` [patch 02/37] x86_64 page fault NMI-safe Mathieu Desnoyers
2008-04-24 15:03 ` [patch 03/37] Change Alpha active count bit Mathieu Desnoyers
2008-04-24 15:03 ` [patch 04/37] Change avr32 " Mathieu Desnoyers
2008-04-24 15:03 ` [patch 05/37] x86 NMI-safe INT3 and Page Fault Mathieu Desnoyers
2008-04-24 15:03 ` [patch 06/37] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
2008-04-24 15:03 ` [patch 07/37] Kprobes - do not use kprobes mutex in arch code Mathieu Desnoyers
2008-04-24 15:03 ` [patch 08/37] Kprobes - declare kprobe_mutex static Mathieu Desnoyers
2008-04-24 15:03 ` [patch 09/37] Fix sched-devel text_poke Mathieu Desnoyers
2008-04-24 15:03 ` [patch 10/37] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2008-04-24 15:03 ` [patch 11/37] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 12/37] Add all cpus option to stop machine run Mathieu Desnoyers
2008-04-24 15:03 ` [patch 13/37] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2008-04-25 14:55 ` Ingo Molnar
2008-04-26 9:36 ` Ingo Molnar
2008-04-26 11:09 ` Ingo Molnar
2008-04-26 14:17 ` Mathieu Desnoyers
2008-04-24 15:03 ` [patch 14/37] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2008-04-24 15:03 ` [patch 15/37] Immediate Values - x86 Optimization Mathieu Desnoyers
2008-04-24 15:03 ` [patch 16/37] Add text_poke and sync_core to powerpc Mathieu Desnoyers
2008-04-24 15:03 ` [patch 17/37] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2008-04-24 15:03 ` [patch 18/37] Immediate Values - Documentation Mathieu Desnoyers
2008-04-24 15:03 ` [patch 19/37] Immediate Values Support init Mathieu Desnoyers
2008-04-24 15:03 ` [patch 20/37] Immediate Values - Move Kprobes x86 restore_interrupt to kdebug.h Mathieu Desnoyers
2008-04-24 15:03 ` [patch 21/37] Add __discard section to x86 Mathieu Desnoyers
2008-04-24 15:03 ` [patch 22/37] Immediate Values - x86 Optimization NMI and MCE support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 23/37] Immediate Values - Powerpc Optimization NMI " Mathieu Desnoyers
2008-04-24 15:03 ` [patch 24/37] Immediate Values Use Arch NMI and MCE Support Mathieu Desnoyers
2008-04-24 15:03 ` [patch 25/37] Immediate Values - Jump Mathieu Desnoyers
2008-04-24 15:03 ` [patch 26/37] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
2008-04-24 15:03 ` [patch 27/37] From: Adrian Bunk <bunk@kernel.org> Mathieu Desnoyers
2008-04-28 9:54 ` Adrian Bunk
2008-04-28 12:37 ` Mathieu Desnoyers
2008-04-24 15:03 ` [patch 28/37] Markers - remove extra format argument Mathieu Desnoyers
2008-04-24 15:03 ` [patch 29/37] Markers - define non optimized marker Mathieu Desnoyers
2008-04-24 15:03 ` [patch 30/37] Linux Kernel Markers - Use Immediate Values Mathieu Desnoyers
2008-04-24 15:03 ` [patch 31/37] Markers use imv jump Mathieu Desnoyers
2008-04-24 15:03 ` [patch 32/37] Port ftrace to markers Mathieu Desnoyers
2008-04-24 15:03 ` [patch 33/37] LTTng instrumentation fs Mathieu Desnoyers
2008-04-24 15:03 ` [patch 34/37] LTTng instrumentation ipc Mathieu Desnoyers
2008-04-24 23:02 ` Alexey Dobriyan
2008-04-25 2:15 ` Frank Ch. Eigler
2008-04-25 12:56 ` Mathieu Desnoyers [this message]
2008-04-25 13:17 ` [RFC] system-wide in-kernel syscall tracing Mathieu Desnoyers
2008-05-04 21:04 ` [patch 34/37] LTTng instrumentation ipc Alexey Dobriyan
2008-05-04 20:39 ` Frank Ch. Eigler
2008-04-24 15:03 ` [patch 35/37] LTTng instrumentation kernel Mathieu Desnoyers
2008-04-24 15:04 ` [patch 36/37] LTTng instrumentation mm Mathieu Desnoyers
2008-04-28 2:12 ` Masami Hiramatsu
2008-04-24 15:04 ` [patch 37/37] LTTng instrumentation net Mathieu Desnoyers
2008-04-24 15:52 ` Pavel Emelyanov
2008-04-24 16:13 ` Mathieu Desnoyers
2008-04-24 16:30 ` Pavel Emelyanov
2008-04-26 19:38 ` [patch 00/37] Linux Kernel Markers instrumentation for sched-devel.git Peter Zijlstra
2008-04-26 20:11 ` Mathieu Desnoyers
2008-04-27 10:00 ` Peter Zijlstra
2008-05-04 15:08 ` Mathieu Desnoyers
2008-04-28 18:22 ` Ingo Molnar
2008-04-28 18:36 ` Andrew Morton
2008-04-28 18:40 ` Christoph Hellwig
2008-04-28 18:47 ` Andrew Morton
2008-04-28 18:49 ` Christoph Hellwig
2008-04-28 19:01 ` KOSAKI Motohiro
2008-04-28 19:52 ` Peter Zijlstra
2008-04-28 22:25 ` Masami Hiramatsu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080425125607.GC28411@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=fche@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox