From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>, systemtap@sourceware.org
Subject: Re: [RFC] simple dprobe like markers for the kernel
Date: Wed, 9 Jul 2008 23:39:44 -0400 [thread overview]
Message-ID: <20080710033943.GA19022@Krystal> (raw)
In-Reply-To: <1215638551.3444.39.camel@localhost.localdomain>
* James Bottomley (James.Bottomley@HansenPartnership.com) wrote:
> I've been looking at using the existing in kernel markers for dtrace
> named probing in systemtap. What I find is that they're a bit
> heavyweight when compared to what dtrace does (because of the way they
> drop stubbable calling points).
>
> This patch adds incredibly simple markers which are designed to be used
> via kprobes. All it does is add an extra section to the kernel (and
> modules) which annotates the location in source file/line of the marker
> and a description of the variables of interest. Tools like systemtap
> can then use the kernel dwarf2 debugging information to transform this
> to a precise probe point that gives access to the named variables.
>
> The beauty of this scheme is that it has zero cost in the unactivated
> case (the extra section is discardable if you're not interested in the
> information, and nothing is actually added into the routine being
> marked). The disadvantage is that it's really unusable for rolling your
> own marker probes because it relies on the dwarf2 information to locate
> the probe point for kprobes and unravel the local variables of interest,
> so you need an external tool like systemtap to help you.
>
> The scheme uses a printk format like string to describe the variables of
> interest, so if those variables disappear, the compile breaks (even in
> the unmarked case) which should help us keep the marked probe points
> current.
>
> For instance, this is what SCSI would look like with a probe point added
> just before the command goes to the low level device
>
> trace_simple(queuecommand, "Command being queued %p Done function %p", cmd, scsi_done);
> rtn = host->hostt->queuecommand(cmd, scsi_done);
> trace_simple(queuecommand_return, "Command returning %p Return value %d", cmd, rtn);
>
> Here you can see that each trace point describes two variables whose
> values can be viewed at that point by the relevant tools. The format
> strings and variables can be used by a tool to perform dtrace -l like
> functionality:
>
> MODULE FUNCTION NAME DESCRIPTION
> scsi_mod scsi_dispatch_io queuecommand Command being queued $sdev; Done function $scsi_done
> scsi_mod scsi_dispatch_io queuecommand_return Command being queued $sdev; Return value $ret
>
> So the trace points recommend to the user what variables to use and
> briefly what they mean.
>
> James
>
Hi James,
It's interesting to see this try at a stubless marker scheme. A few
things as you and Frank pointed out :
- It depends on an external tool to parse the dwarf info, so it cannot
be used by in-kernel tracers such as ftrace.
- It does not require variable liveliness at the marker site : the
compiler can freely optimize out the variable whenever it needs to.
Besides those core concerns, I went through your patch, a small detail
seems incorrect. Please see the comment below.
> ---
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index f054778..c0c38b8 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -299,6 +299,8 @@
> .debug_funcnames 0 : { *(.debug_funcnames) } \
> .debug_typenames 0 : { *(.debug_typenames) } \
> .debug_varnames 0 : { *(.debug_varnames) } \
> + /* simple markers (depends on dwarf2 debugging info) */ \
> + __simple_marker (INFO) : { *(__simple_marker) } \
>
> /* Stabs debugging sections. */
> #define STABS_DEBUG \
> diff --git a/include/linux/simple_marker.h b/include/linux/simple_marker.h
> new file mode 100644
> index 0000000..675f5b1
> --- /dev/null
> +++ b/include/linux/simple_marker.h
> @@ -0,0 +1,41 @@
> +#include <linux/stringify.h>
> +
> +/* To be used for string format validity checking with gcc */
> +static inline void __printf(1, 2)
> +__trace_simple_check_format(const char *fmt, ...)
> +{
> +}
> +
> +#ifdef CONFIG_DEBUG_INFO
> +#define trace_simple(name, format, args...) \
> + do { \
> + static const char __simple_name_##name[] \
> + __attribute__((section("__simple_marker"))) \
> + __attribute__((__used__)) \
> + = #name; \
> + static const char __simple_file_##name[] \
> + __attribute__((section("__simple_marker"))) \
> + __attribute__((__used__)) \
> + = __FILE__; \
> + static const char __simple_line_##name[] \
> + __attribute__((section("__simple_marker"))) \
> + __attribute__((__used__)) \
> + = __stringify(__LINE__); \
> + static const char __simple_format_##name[] \
> + __attribute__((section("__simple_marker"))) \
> + __attribute__((__used__)) \
> + = #format; \
> + static const char __simple_args_##name[] \
> + __attribute__((section("__simple_marker"))) \
> + __attribute__((__used__)) \
> + = #args; \
All those variables placed in the __simple_marker section are not
guaranteed to be placed nicely together. There should be a structure
containing pointers to name, file, line, format and args strings (all
together) in a special section, and then those strings could be emitted
in another section. Otherwise, the compiler can freely choose to
interleave different strings from various tracing statements.
Mathieu
> + if (0) \
> + __trace_simple_check_format(format, ## args); \
> + } while(0)
> +#else
> +#define trace_simple(name, format, args...) \
> + do { \
> + if (0) \
> + __trace_simple_check_format(format, ## args); \
> + } while(0)
> +#endif
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2008-07-10 3:39 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-09 21:22 [RFC] simple dprobe like markers for the kernel James Bottomley
2008-07-10 3:39 ` Mathieu Desnoyers [this message]
2008-07-10 13:55 ` James Bottomley
[not found] <1215638551.3444.39.camel__22002.9595810503$1215638656$gmane$org@localhost.localdomain>
2008-07-10 2:29 ` Frank Ch. Eigler
2008-07-10 13:49 ` James Bottomley
2008-07-10 14:22 ` Frank Ch. Eigler
2008-07-10 14:43 ` James Bottomley
2008-07-10 15:30 ` Theodore Tso
2008-07-10 15:57 ` James Bottomley
2008-07-10 18:18 ` Frank Ch. Eigler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080710033943.GA19022@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=James.Bottomley@HansenPartnership.com \
--cc=linux-kernel@vger.kernel.org \
--cc=systemtap@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox