Re: [RFC] simple dprobe like markers for the kernel

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>, systemtap@sourceware.org
Subject: Re: [RFC] simple dprobe like markers for the kernel
Date: Wed, 9 Jul 2008 23:39:44 -0400	[thread overview]
Message-ID: <20080710033943.GA19022@Krystal> (raw)
In-Reply-To: <1215638551.3444.39.camel@localhost.localdomain>

* James Bottomley (James.Bottomley@HansenPartnership.com) wrote:
> I've been looking at using the existing in kernel markers for dtrace
> named probing in systemtap.  What I find is that they're a bit
> heavyweight when compared to what dtrace does (because of the way they
> drop stubbable calling points).
> 
> This patch adds incredibly simple markers which are designed to be used
> via kprobes.  All it does is add an extra section to the kernel (and
> modules) which annotates the location in source file/line of the marker
> and a description of the variables of interest.  Tools like systemtap
> can then use the kernel dwarf2 debugging information to transform this
> to a precise probe point that gives access to the named variables.
> 
> The beauty of this scheme is that it has zero cost in the unactivated
> case (the extra section is discardable if you're not interested in the
> information, and nothing is actually added into the routine being
> marked).  The disadvantage is that it's really unusable for rolling your
> own marker probes because it relies on the dwarf2 information to locate
> the probe point for kprobes and unravel the local variables of interest,
> so you need an external tool like systemtap to help you.
> 
> The scheme uses a printk format like string to describe the variables of
> interest, so if those variables disappear, the compile breaks (even in
> the unmarked case) which should help us keep the marked probe points
> current.
> 
> For instance, this is what SCSI would look like with a probe point added
> just before the command goes to the low level device
> 
> 		trace_simple(queuecommand, "Command being queued %p Done function %p", cmd, scsi_done);
> 		rtn = host->hostt->queuecommand(cmd, scsi_done);
> 		trace_simple(queuecommand_return, "Command returning %p Return value %d", cmd, rtn);
> 
> Here you can see that each trace point describes two variables whose
> values can be viewed at that point by the relevant tools.  The format
> strings and variables can be used by a tool to perform dtrace -l like
> functionality:
> 
> MODULE    FUNCTION         NAME                DESCRIPTION
> scsi_mod  scsi_dispatch_io queuecommand        Command being queued $sdev; Done function $scsi_done
> scsi_mod  scsi_dispatch_io queuecommand_return Command being queued $sdev; Return value $ret
> 
> So the trace points recommend to the user what variables to use and
> briefly what they mean.
> 
> James
> 

Hi James,

It's interesting to see this try at a stubless marker scheme. A few
things as you and Frank pointed out :
- It depends on an external tool to parse the dwarf info, so it cannot
  be used by in-kernel tracers such as ftrace.
- It does not require variable liveliness at the marker site : the
  compiler can freely optimize out the variable whenever it needs to.

Besides those core concerns, I went through your patch, a small detail
seems incorrect. Please see the comment below.

> ---
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index f054778..c0c38b8 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -299,6 +299,8 @@
>  		.debug_funcnames 0 : { *(.debug_funcnames) }		\
>  		.debug_typenames 0 : { *(.debug_typenames) }		\
>  		.debug_varnames  0 : { *(.debug_varnames) }		\
> +		/* simple markers (depends on dwarf2 debugging info) */	\
> +		__simple_marker (INFO) : { *(__simple_marker) }		\
>  
>  		/* Stabs debugging sections.  */
>  #define STABS_DEBUG							\
> diff --git a/include/linux/simple_marker.h b/include/linux/simple_marker.h
> new file mode 100644
> index 0000000..675f5b1
> --- /dev/null
> +++ b/include/linux/simple_marker.h
> @@ -0,0 +1,41 @@
> +#include <linux/stringify.h>
> +
> +/* To be used for string format validity checking with gcc */
> +static inline void __printf(1, 2)
> +__trace_simple_check_format(const char *fmt, ...)
> +{
> +}
> +
> +#ifdef CONFIG_DEBUG_INFO
> +#define trace_simple(name, format, args...)				\
> +	do {								\
> +		static const char __simple_name_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= #name;						\
> +		static const char __simple_file_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= __FILE__;						\
> +		static const char __simple_line_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= __stringify(__LINE__);				\
> +		static const char __simple_format_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= #format;						\
> +		static const char __simple_args_##name[]		\
> +		__attribute__((section("__simple_marker")))		\
> +		__attribute__((__used__))				\
> +		= #args;						\

All those variables placed in the __simple_marker section are not
guaranteed to be placed nicely together. There should be a structure
containing pointers to name, file, line, format and args strings (all
together) in a special section, and then those strings could be emitted
in another section. Otherwise, the compiler can freely choose to
interleave different strings from various tracing statements.

Mathieu

> +		if (0)							\
> +			__trace_simple_check_format(format, ## args);	\
> +	} while(0)
> +#else
> +#define trace_simple(name, format, args...)				\
> +	do {								\
> +		if (0)							\
> +			__trace_simple_check_format(format, ## args);	\
> +	 } while(0)
> +#endif
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

next prev parent reply	other threads:[~2008-07-10  3:39 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-09 21:22 [RFC] simple dprobe like markers for the kernel James Bottomley
2008-07-10  3:39 ` Mathieu Desnoyers [this message]
2008-07-10 13:55   ` James Bottomley
     [not found] <1215638551.3444.39.camel__22002.9595810503$1215638656$gmane$org@localhost.localdomain>
2008-07-10  2:29 ` Frank Ch. Eigler
2008-07-10 13:49   ` James Bottomley
2008-07-10 14:22     ` Frank Ch. Eigler
2008-07-10 14:43       ` James Bottomley
2008-07-10 15:30         ` Theodore Tso
2008-07-10 15:57           ` James Bottomley
2008-07-10 18:18           ` Frank Ch. Eigler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080710033943.GA19022@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=systemtap@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox