From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Martin Bligh <mbligh@google.com>,
"Frank Ch. Eigler" <fche@redhat.com>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
prasanna@in.ibm.com, Andrew Morton <akpm@osdl.org>,
Paul Mundt <lethal@linux-sh.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Jes Sorensen <jes@sgi.com>, Tom Zanussi <zanussi@us.ibm.com>,
Richard J Moore <richardj_moore@uk.ibm.com>,
Michel Dagenais <michel.dagenais@polymtl.ca>,
Christoph Hellwig <hch@infradead.org>,
Greg Kroah-Hartman <gregkh@suse.de>,
Thomas Gleixner <tglx@linutronix.de>,
William Cohen <wcohen@redhat.com>,
ltt-dev@shafik.org, systemtap@sources.redhat.com,
Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: [PATCH] Linux Kernel Markers 0.5 for Linux 2.6.17 (with probe management)
Date: Fri, 22 Sep 2006 11:08:11 -0400 [thread overview]
Message-ID: <20060922150810.GB20839@Krystal> (raw)
In-Reply-To: <20060922070714.GB4167@elte.hu>
Good morning Ingo,
* Ingo Molnar (mingo@elte.hu) wrote:
>
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
>
> > I clearly expressed my position in the previous emails, so did you.
> > You argued about a use of tracing that is not relevant to my vision of
> > reality, which is :
> >
> > - Embedded systems developers won't want a breakpoint-based probe
>
> are you arguing that i'm trying to force breakpoint-based probing on
> you? I dont. In fact i explicitly mentioned that i'd accept and support
> a 5-byte NOP in the body of the marker, in the following mails:
>
> "just go for [...] the 5-NOP variant"
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115859771924187&w=2
> (my reply to your second proposal)
>
> "or at most one NOP"
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115865412332230&w=2
> (my reply to your third proposal)
>
> "at most a NOP inserted"
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115886524224874&w=2
> (my reply to your fifth proposal)
>
> That enables the probe to be turned into a function call - not an INT3
> breakpoint. Does it take some effort to implement that on your part?
> Yes, of course, but getting code upstream is never easy, /especially/ in
> cases where most of the users wont use a particular feature.
>
Some details are worth to be mentioned :
- The 5-NOP variant will imply a replacement of 5 1 bytes instructions with 1 5
bytes one, which is trickier. Masami Hiramatsu's proposal of 2 bytes near jump
+ 3 NOPS is nicer.
- Patching such a 5-bytes instruction memory region doesn't turn markers into a
complete function call, which includes argument passing.
- The argument "most of the users wont use a particular feature" contradicts
what you said earlier about every distribution wanting to enable a tracing
mechanism for their users.
> > - High performance computing users won't want a breakpoint-based probe
>
> I am not forcing breakpoint-based probing, at all. I dont want _static,
> build-time function call based_ probing, and there is a big difference.
> And one reason why i want to avoid "static, build-time function call
> based probing" is because high-performance computing users dont want any
> overhead at all in the kernel fastpath.
>
I think that the performance benefits gained by using tracing information for
studying a system makes the overhead of a jump in the kernel fast path
insignificant. Having a stack setup + function call already put there by the
compiler has the following advantages :
- It is very robust (I could think of using it on a live server, which is not
true of the djprobe approach).
- It is predictable on every architecture.
- The information extracted is _always_ coherent with the marked variables,
because the compiler itself created the full function call (stack setup
included).
> > - djprobe is far away from being in an acceptable state on
> > architectures with very inconvenient erratas (x86).
>
> djprobes over a NOP marker are perfectly usable and safe: just add a
> simple constraint to them to only allow a djprobes insertion if it
> replaces a 5-byte NOP.
>
2 bytes jump + 3 bytes nops.. Yes, it should modify it without causing an
illegal instruction, but how ? Are you aware that their approach has to :
- put an int3
- wait for _all_ the CPUs to execute this int3
- then change the 5 bytes instruction
I can think of a lot of cases where the CPUs will never execute this int3.
Probably that sending an IPI or launching a kernel thread on each CPU to make
sure that this int3 is executed could give more guarantees there. But my point
is not even there : I have seen very skillful teams work hard on those
hardware-caused problems for years and the result is still not usable. It looks
to me like a race between software developers and hardware manufacturers, where
the software guy is always one step behind. This kind of scenario happens when
you want to use an architecture in a way it was not designed and tested for.
As long as CPU manufacturers won't design for live instruction patching (and why
should they do that ? the in3 breakpoint is all what is needed from their
perspective), this will be a race where software developers will lose.
> > - kprobe and djprobe cannot access local variables in every cases
>
> it is possible with the marker mechanism i outlined before:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115886524224874&w=2
>
> have i missed to address any concern of yours?
>
Interesting idea. That would make it possible to probe local variables at the
marker site. That's very good for use of kprobes on low rate debug-type markers,
but that doesn't solve my concern about the cat-and-mouse race expressed earlier
about live kernel polymorphic code.
I would be all in for this kind of combo :
If you can find a way to make a kprobe-based probe extract the variables from
such a variable-dependency marked site, that would be great for dynamic of low
event rate code paths. For the high event rate, and while we wait for such a
probe to exist, I think that the load+jump over a complete call is the lowest
cost, most robust, coherent, predictable and portable mechanism I have seen
so far.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2006-09-22 15:08 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-09-21 16:00 [PATCH] Linux Kernel Markers 0.5 for Linux 2.6.17 (with probe management) Mathieu Desnoyers
2006-09-21 16:06 ` Ingo Molnar
2006-09-21 21:42 ` Mathieu Desnoyers
2006-09-21 21:49 ` Mathieu Desnoyers
2006-09-22 6:29 ` Karim Yaghmour
2006-09-22 6:49 ` Ingo Molnar
2006-09-22 14:03 ` Mathieu Desnoyers
2006-09-22 16:53 ` Ingo Molnar
2006-09-22 17:11 ` Mathieu Desnoyers
2006-09-22 17:12 ` Ingo Molnar
2006-09-22 17:28 ` Mathieu Desnoyers
2006-09-22 7:07 ` Ingo Molnar
2006-09-22 8:14 ` Karim Yaghmour
2006-09-22 15:08 ` Mathieu Desnoyers [this message]
2006-09-22 16:24 ` Karim Yaghmour
2006-09-22 16:13 ` Mathieu Desnoyers
2006-09-22 17:03 ` Karim Yaghmour
2006-09-22 18:06 ` Mathieu Desnoyers
2006-09-22 19:24 ` Karim Yaghmour
2006-09-22 16:45 ` Ingo Molnar
2006-09-22 14:31 ` Christoph Hellwig
2006-09-23 16:51 ` Mathieu Desnoyers
2006-09-21 17:56 ` Frank Ch. Eigler
2006-09-21 18:50 ` Ingo Molnar
2006-09-21 19:54 ` Jeremy Fitzhardinge
2006-09-25 17:45 ` Frank Ch. Eigler
2006-09-21 20:59 ` Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060922150810.GB20839@Krystal \
--to=compudj@krystal.dyndns.org \
--cc=akpm@osdl.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=fche@redhat.com \
--cc=gregkh@suse.de \
--cc=hch@infradead.org \
--cc=jes@sgi.com \
--cc=lethal@linux-sh.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ltt-dev@shafik.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mbligh@google.com \
--cc=michel.dagenais@polymtl.ca \
--cc=mingo@elte.hu \
--cc=prasanna@in.ibm.com \
--cc=richardj_moore@uk.ibm.com \
--cc=systemtap@sources.redhat.com \
--cc=tglx@linutronix.de \
--cc=wcohen@redhat.com \
--cc=zanussi@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox