Re: [PATCH] Linux Kernel Markers 0.5 for Linux 2.6.17 (with probe management)

From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Martin Bligh <mbligh@google.com>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	prasanna@in.ibm.com, Andrew Morton <akpm@osdl.org>,
	Paul Mundt <lethal@linux-sh.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Jes Sorensen <jes@sgi.com>, Tom Zanussi <zanussi@us.ibm.com>,
	Richard J Moore <richardj_moore@uk.ibm.com>,
	Michel Dagenais <michel.dagenais@polymtl.ca>,
	Christoph Hellwig <hch@infradead.org>,
	Greg Kroah-Hartman <gregkh@suse.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	William Cohen <wcohen@redhat.com>,
	ltt-dev@shafik.org, systemtap@sources.redhat.com,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: [PATCH] Linux Kernel Markers 0.5 for Linux 2.6.17 (with probe management)
Date: Fri, 22 Sep 2006 11:08:11 -0400
Message-ID: <20060922150810.GB20839@Krystal>
In-Reply-To: <20060922070714.GB4167@elte.hu>

Good morning Ingo,

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> 
> > I clearly expressed my position in the previous emails, so did you. 
> > You argued about a use of tracing that is not relevant to my vision of 
> > reality, which is :
> > 
> > - Embedded systems developers won't want a breakpoint-based probe
> 
> are you arguing that i'm trying to force breakpoint-based probing on 
> you? I dont. In fact i explicitly mentioned that i'd accept and support 
> a 5-byte NOP in the body of the marker, in the following mails:
> 
>     "just go for [...] the 5-NOP variant"
>       http://marc.theaimsgroup.com/?l=linux-kernel&m=115859771924187&w=2
>         (my reply to your second proposal)
> 
>     "or at most one NOP"
>       http://marc.theaimsgroup.com/?l=linux-kernel&m=115865412332230&w=2
>         (my reply to your third proposal)
> 
>     "at most a NOP inserted"
>       http://marc.theaimsgroup.com/?l=linux-kernel&m=115886524224874&w=2
>         (my reply to your fifth proposal)
> 
> That enables the probe to be turned into a function call - not an INT3 
> breakpoint. Does it take some effort to implement that on your part? 
> Yes, of course, but getting code upstream is never easy, /especially/ in 
> cases where most of the users wont use a particular feature.
> 

Some details are worth mentioning:

- The 5-NOP variant implies replacing five 1-byte instructions with one 5-byte
  instruction, which is trickier. Masami Hiramatsu's proposal of a 2-byte short
  jump + 3 NOPs is nicer.
- Patching such a 5-byte instruction region does not turn a marker into a
  complete function call, which includes argument passing.
- The argument "most of the users wont use a particular feature" contradicts
  what you said earlier about every distribution wanting to enable a tracing
  mechanism for their users.
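
To make the first point concrete, here is a minimal user-space sketch of the
two 5-byte layouts. It assumes the usual x86 encodings (0x90 for a 1-byte NOP,
0xEB for a short jump with an 8-bit displacement). The names site_five_nops,
site_jmp_nops and interior_resume_points() are made up for this illustration
and come from no patch in this thread. The toy decoder counts the offsets
inside the site where a CPU's instruction pointer can rest:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SITE_LEN    5
#define OP_NOP      0x90  /* x86 one-byte NOP */
#define OP_JMP_REL8 0xEB  /* x86 short jump, followed by an 8-bit displacement */

/* Five 1-byte NOPs: five separate instructions. */
static const uint8_t site_five_nops[SITE_LEN] = { 0x90, 0x90, 0x90, 0x90, 0x90 };

/* 2-byte jump over 3 NOPs: only the jump is ever executed. */
static const uint8_t site_jmp_nops[SITE_LEN] = { 0xEB, 0x03, 0x90, 0x90, 0x90 };

/*
 * Walk the site as a CPU entering at offset 0 would, and count the offsets
 * strictly inside it (1..4) where the instruction pointer can rest between
 * two instructions. Returns -1 for anything this toy decoder does not
 * handle (unknown opcode, truncated jump, backward jump).
 */
static int interior_resume_points(const uint8_t *site)
{
	int ip = 0;
	int count = 0;

	assert(site != NULL);
	while (ip < SITE_LEN) {
		if (site[ip] == OP_NOP) {
			ip += 1;
		} else if (site[ip] == OP_JMP_REL8 && ip + 1 < SITE_LEN) {
			int8_t disp = (int8_t)site[ip + 1];

			if (disp < 0)
				return -1;
			ip += 2 + disp;
		} else {
			return -1;
		}
		if (ip < SITE_LEN)
			count++;
	}
	return count;
}
```

With five NOPs there are four interior offsets where an interrupted CPU may
resume, so a 5-byte replacement can land under its feet. With the jump variant
there are none, because execution goes straight from offset 0 to the end of the
site.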

> > - High performance computing users won't want a breakpoint-based probe
> 
> I am not forcing breakpoint-based probing, at all. I dont want _static, 
> build-time function call based_ probing, and there is a big difference. 
> And one reason why i want to avoid "static, build-time function call 
> based probing" is because high-performance computing users dont want any 
> overhead at all in the kernel fastpath.
> 

I think that the performance benefits gained by using tracing information to
study a system make the overhead of a jump in the kernel fast path
insignificant. Having a stack setup + function call already put there by the
compiler has the following advantages:

- It is very robust (I could think of using it on a live server, which is not
  true of the djprobe approach).
- It is predictable on every architecture.
- The information extracted is _always_ coherent with the marked variables,
  because the compiler itself created the full function call (stack setup
  included).
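
As a rough user-space illustration of what "a stack setup + function call
already put there by the compiler" means, here is a sketch of a marker built
from a flag load, a conditional jump and an ordinary call. The names struct
marker, MARK(), marker_arm() and record_probe() are assumptions made for this
example only. This is not the code from the Linux Kernel Markers patch.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*probe_fn)(const char *name, long value);

struct marker {
	volatile int enabled;	/* loaded on every pass through the site */
	probe_fn probe;		/* called only when enabled is non-zero */
};

/*
 * Load the flag, jump over the call when it is zero. The call itself,
 * including argument setup, is emitted by the compiler, so the probe
 * always sees the value the marked variable has at this point.
 */
#define MARK(m, name, value)				\
	do {						\
		if ((m)->enabled)			\
			(m)->probe((name), (value));	\
	} while (0)

/* Install the probe first, then flip the flag. */
static void marker_arm(struct marker *m, probe_fn fn)
{
	assert(m != NULL && fn != NULL);
	m->probe = fn;
	m->enabled = 1;
}

/* Example probe: remembers the last value and counts hits. */
static long last_seen;
static int hits;

static void record_probe(const char *name, long value)
{
	(void)name;
	last_seen = value;
	hits++;
}

/* A function with a marked local variable. */
static long traced_work(struct marker *m, long input)
{
	long local = input * 2;

	MARK(m, "traced_work_local", local);
	return local + 1;
}
```

While the marker is disarmed, traced_work() only pays for the load and the
jump. Once marker_arm() has run, record_probe() receives the local variable
through a normal call, with no instruction patching involved.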


> > - djprobe is far away from being in an acceptable state on 
> >   architectures with very inconvenient erratas (x86).
> 
> djprobes over a NOP marker are perfectly usable and safe: just add a 
> simple constraint to them to only allow a djprobes insertion if it 
> replaces a 5-byte NOP.
> 

A 2-byte jump + 3 bytes of NOPs. Yes, it should be possible to modify it without
causing an illegal instruction, but how? Are you aware that their approach has
to:
- put an int3
- wait for _all_ the CPUs to execute this int3
- then change the 5 bytes instruction

I can think of a lot of cases where the CPUs will never execute this int3.
Sending an IPI or launching a kernel thread on each CPU to make sure that this
int3 is executed would probably give more guarantees there. But that is not
even my point: I have seen very skillful teams work hard on those
hardware-caused problems for years, and the result is still not usable. It looks
to me like a race between software developers and hardware manufacturers, where
the software guy is always one step behind. This kind of scenario happens when
you want to use an architecture in a way it was not designed and tested for.

As long as CPU manufacturers do not design for live instruction patching (and
why should they? The int3 breakpoint is all that is needed from their
perspective), this will be a race that software developers will lose.
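
The three-step sequence listed above (put an int3, wait for all CPUs, then
change the 5 bytes) can be modelled in user space to show where it stalls. This
is only a toy model of the sequence as described in this message, not the
djprobe implementation. The names struct patch_state, patch_begin(),
cpu_executes_site() and patch_try_finish() are hypothetical, and the "CPUs" are
plain array slots.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN 5
#define NCPUS    4
#define OP_INT3  0xCC	/* x86 breakpoint opcode */

struct patch_state {
	uint8_t site[SITE_LEN];	/* the 5-byte region being patched */
	int cpu_hit[NCPUS];	/* which simulated CPUs have executed the int3 */
};

/* Step 1: put an int3 on the first byte of the site. */
static void patch_begin(struct patch_state *ps)
{
	memset(ps->cpu_hit, 0, sizeof(ps->cpu_hit));
	ps->site[0] = OP_INT3;
}

/* A simulated CPU runs through the site; it traps only if the int3 is there. */
static void cpu_executes_site(struct patch_state *ps, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (ps->site[0] == OP_INT3)
		ps->cpu_hit[cpu] = 1;
}

/*
 * Steps 2 and 3: only once every CPU has executed the int3, write the tail
 * of the new instruction, then its first byte. Returns 1 when the patch was
 * applied, 0 when at least one CPU has not hit the int3 yet.
 */
static int patch_try_finish(struct patch_state *ps, const uint8_t insn[SITE_LEN])
{
	int cpu;

	for (cpu = 0; cpu < NCPUS; cpu++)
		if (!ps->cpu_hit[cpu])
			return 0;

	memcpy(ps->site + 1, insn + 1, SITE_LEN - 1);
	ps->site[0] = insn[0];
	return 1;
}
```

In this model, a CPU that never runs through the site keeps patch_try_finish()
returning 0 forever, which is the case an IPI or a per-CPU kernel thread would
have to cover.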


> > - kprobe and djprobe cannot access local variables in every cases
> 
> it is possible with the marker mechanism i outlined before:
> 
>   http://marc.theaimsgroup.com/?l=linux-kernel&m=115886524224874&w=2
> 
> have i missed to address any concern of yours?
> 

Interesting idea. That would make it possible to probe local variables at the
marker site. That's very good for use of kprobes on low rate debug-type markers,
but that doesn't solve my concern about the cat-and-mouse race expressed earlier
about live kernel polymorphic code.

I would be all in for this kind of combo:

If you can find a way to make a kprobe-based probe extract the variables from
such a variable-dependency marked site, that would be great for dynamic probing
of low event rate code paths. For the high event rate paths, and while we wait
for such a probe to exist, I think that the load+jump over a complete call is
the lowest cost, most robust, coherent, predictable and portable mechanism I
have seen so far.


Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
