From: Ingo Molnar <mingo@elte.hu>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Tom Tromey <tromey@redhat.com>,
Kyle Moffett <kyle@moffetthome.net>,
"Frank Ch. Eigler" <fche@redhat.com>,
Oleg Nesterov <oleg@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Fr??d??ric Weisbecker <fweisbec@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
linux-next@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
utrace-devel@redhat.com, Thomas Gleixner <tglx@linutronix.de>,
JimKeniston <jkenisto@us.ibm.com>
Subject: Re: linux-next: add utrace tree
Date: Wed, 27 Jan 2010 09:54:42 +0100 [thread overview]
Message-ID: <20100127085442.GA28422@elte.hu> (raw)
In-Reply-To: <1264575134.4283.1983.camel@laptop>
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2010-01-26 at 15:37 -0800, Linus Torvalds wrote:
> >
> > On Tue, 26 Jan 2010, Tom Tromey wrote:
> > >
> > > In non-stop mode (where you can stop one thread but leave the others
> > > running), gdb wants to have the breakpoints always inserted. So,
> > > something must emulate the displaced instruction.
> >
> > I'm almost totally uninterested in breakpoints that actually re-write
> > instructions. It's impossible to do that efficiently and well, especially
> > in threaded environments.
> >
> > So if you do instruction rewriting, I can only say "that's your problem".
>
> Right, so you're going to love uprobes, which does exactly that. The current
> proposal is overwriting the target instruction with an INT3 and injecting an
> extra vma into the target process's address space containing the original
> instruction(s) and possible jumps back to the old code stream.
>
> I'm all in favor of not doing that extra vma and instead use stack or TLS
> space, but then people complain about having to make that executable (which
> is something I don't really mind, x86 had executable everything for very
> long, and also, its only so when debugging the thing anyway).
I think the best solution for user probes (by far) is to use a simplified
in-kernel instruction emulator for the few common probes instruction. (Kprobes
already partially decodes x86 instructions to make it safe to apply
accelerated probes and there's other decoding logic in the kernel too.)
The design and practical advantages are numerous:
- People want to probe their function prologues most of the time ...
a single INT3 there will in most cases just hit the initial stack
allocation and that's it. We could get quite good coverage (and very fast
emulation) for the common case in not too much code - and much of that code
we already have available. No re-trapping, no extra instruction patching
and complex maintenance of trampolines.
- It's as transparent as it gets - no user-space trampoline or other visible
state that modifies behavior or can be stomped upon by user-space bugs.
- Lightweight and simple probe insertion: no weird setup sequence needing the
stopping of all tasks to install the trampoline. We just add the INT3 and
off you go.
- Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on
task local state.
- The points we can probe are never truly limited as it's all freely
upscalable: if you cannot probe an instruction you want to probe today,
extend the emulator. Deny the rest. _All_ versions of uprobes code i've
seen so far already restricts the probe-compatible instruction set:
RIP-relative instructions are excluded on 64-bit for example.
- Emulation has the _least_ semantical side effects as we really execute
'that' instruction - not some other instruction put elsewhere into a
special vma or into the process/thread stack, or some special in-kernel
trampoline, etc.
- Emulation can be very fast for the common case as well. Nobody will probe
weird, complex instructions. They will use 'perf probe' to insert probes
into their functions 90% of the time ...
- FPU and complex ops and pagefault emulation is not really what i'd expect
to be necessary for simple probing - but it _can_ be added by people who
care about it, if they so wish.
Such a scheme would be _far_ more preferable form a maintenance POV as well,
as the initial code will be small, and we can extend it gradually. All the
other proposals are complex 'all or nothing' schemes with no flexibility for
complexity at all.
Thanks,
Ingo
next prev parent reply other threads:[~2010-01-27 8:55 UTC|newest]
Thread overview: 125+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20100119211646.GF16096@redhat.com>
2010-01-20 0:12 ` linux-next: add utrace tree Stephen Rothwell
2010-01-20 5:49 ` Ingo Molnar
2010-01-20 6:15 ` Ananth N Mavinakayanahalli
2010-01-20 6:28 ` Ingo Molnar
2010-01-20 6:40 ` Ananth N Mavinakayanahalli
2010-01-20 10:43 ` Frederic Weisbecker
2010-01-20 6:59 ` Stephen Rothwell
2010-01-20 13:24 ` Frank Ch. Eigler
2010-01-20 7:29 ` Ingo Molnar
2010-01-20 14:38 ` Stephen Rothwell
2010-01-21 1:22 ` Roland McGrath
2010-01-22 0:17 ` Stephen Rothwell
2010-01-22 0:30 ` Andrew Morton
2010-01-22 0:31 ` Andrew Morton
2010-01-22 0:51 ` Frank Ch. Eigler
2010-01-22 1:05 ` Andrew Morton
2010-01-22 1:25 ` Frank Ch. Eigler
2010-01-22 1:32 ` Linus Torvalds
2010-01-22 2:22 ` Frank Ch. Eigler
2010-01-22 2:35 ` Linus Torvalds
2010-01-22 20:51 ` Oleg Nesterov
2010-01-23 6:04 ` Ingo Molnar
2010-01-23 12:03 ` Frank Ch. Eigler
2010-01-24 16:36 ` Thomas Gleixner
2010-01-22 1:28 ` Linus Torvalds
2010-01-22 5:21 ` Ananth N Mavinakayanahalli
2010-01-22 13:43 ` Valdis.Kletnieks
2010-01-22 19:39 ` Oleg Nesterov
2010-01-26 13:58 ` Pavel Machek
2010-01-22 18:28 ` Oleg Nesterov
2010-01-22 20:01 ` Frank Ch. Eigler
2010-01-22 20:16 ` Peter Zijlstra
2010-01-22 21:44 ` Frank Ch. Eigler
2010-01-22 21:59 ` Linus Torvalds
2010-01-22 22:13 ` Frank Ch. Eigler
2010-01-23 0:11 ` Linus Torvalds
2010-01-23 0:22 ` Linus Torvalds
2010-01-23 6:20 ` Kyle Moffett
2010-01-23 11:01 ` Alan Cox
2010-01-23 11:51 ` Frank Ch. Eigler
2010-01-23 15:57 ` Arnaldo Carvalho de Melo
2010-01-23 11:23 ` Ingo Molnar
2010-01-23 11:47 ` Frank Ch. Eigler
2010-01-23 19:48 ` tytso
2010-01-24 18:01 ` Frank Ch. Eigler
2010-01-25 1:42 ` Kyle Moffett
2010-01-25 4:55 ` tytso
2010-01-25 16:52 ` Linus Torvalds
2010-01-25 17:02 ` Frank Ch. Eigler
2010-01-25 17:36 ` Linus Torvalds
2010-01-25 17:45 ` Linus Torvalds
2010-01-25 17:54 ` Steven Rostedt
2010-01-25 18:03 ` Alan Cox
2010-01-25 18:12 ` Linus Torvalds
2010-01-25 18:30 ` Steven Rostedt
2010-01-25 18:45 ` Thomas Gleixner
2010-01-25 20:34 ` Ingo Molnar
2010-01-25 20:30 ` Mark Wielaard
2010-01-25 20:42 ` Linus Torvalds
2010-01-26 0:02 ` Renzo Davoli
2010-01-26 0:07 ` Linus Torvalds
2010-01-26 16:08 ` Johannes Stezenbach
2010-01-26 16:28 ` Linus Torvalds
2010-01-26 16:34 ` Christoph Hellwig
2010-01-28 23:53 ` Benjamin Herrenschmidt
2010-01-29 0:21 ` Linus Torvalds
2010-01-25 4:59 ` Ananth N Mavinakayanahalli
2010-01-25 10:13 ` Peter Zijlstra
2010-01-24 5:04 ` Linus Torvalds
2010-01-24 10:25 ` tytso
2010-01-24 13:20 ` Frank Ch. Eigler
2010-01-25 21:05 ` Tom Tromey
2010-01-25 21:41 ` Linus Torvalds
2010-01-26 14:21 ` Ananth N Mavinakayanahalli
2010-01-26 23:20 ` Tom Tromey
2010-01-26 23:37 ` Linus Torvalds
2010-01-27 6:52 ` Peter Zijlstra
2010-01-27 8:54 ` Ingo Molnar [this message]
2010-01-28 1:52 ` Jim Keniston
2010-01-28 8:55 ` Ingo Molnar
2010-01-29 0:59 ` Jim Keniston
2010-01-29 7:39 ` Ingo Molnar
2010-01-29 7:52 ` Ananth N Mavinakayanahalli
2010-01-29 7:55 ` Ananth N Mavinakayanahalli
2010-01-29 9:16 ` Ingo Molnar
2010-01-29 9:11 ` Ingo Molnar
2010-01-29 9:31 ` Ananth N Mavinakayanahalli
2010-01-29 9:51 ` Ingo Molnar
2010-01-29 18:13 ` Frank Ch. Eigler
2010-01-29 4:55 ` Ananth N Mavinakayanahalli
2010-01-29 7:42 ` Ingo Molnar
2010-01-30 17:49 ` Steven Rostedt
2010-01-30 17:59 ` Linus Torvalds
2010-02-02 6:47 ` Masami Hiramatsu
2010-01-27 10:43 ` Linus Torvalds
2010-01-27 10:55 ` Peter Zijlstra
2010-01-27 10:58 ` Peter Zijlstra
2010-01-27 11:04 ` Linus Torvalds
2010-01-27 16:01 ` Frederic Weisbecker
2010-01-27 11:05 ` Ananth N Mavinakayanahalli
2010-01-27 11:08 ` Peter Zijlstra
2010-01-27 11:20 ` Ananth N Mavinakayanahalli
2010-02-08 10:09 ` Avi Kivity
2010-01-27 11:07 ` Srikar Dronamraju
2010-01-27 13:59 ` Steven Rostedt
2010-01-27 17:42 ` H. Peter Anvin
2010-01-27 18:53 ` Steven Rostedt
2010-02-08 6:54 ` Pavel Machek
2010-02-08 9:30 ` H. Peter Anvin
2010-02-08 9:53 ` Arjan van de Ven
2010-01-27 19:18 ` H. Peter Anvin
2010-01-27 0:38 ` Frank Ch. Eigler
2010-01-26 15:00 ` Frank Ch. Eigler
2010-01-26 17:33 ` Andi Kleen
2010-01-26 18:46 ` Linus Torvalds
2010-01-26 21:02 ` Andi Kleen
2010-01-26 21:53 ` Oleg Nesterov
2010-01-26 22:03 ` Andi Kleen
2010-01-26 23:32 ` Oleg Nesterov
2010-01-26 21:30 ` Oleg Nesterov
2010-01-26 23:27 ` Tom Tromey
2010-01-23 8:05 ` Alexey Dobriyan
2010-01-22 17:45 ` Oleg Nesterov
2010-01-20 8:52 ` Peter Zijlstra
2010-01-20 13:01 ` Frank Ch. Eigler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100127085442.GA28422@elte.hu \
--to=mingo@elte.hu \
--cc=acme@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=hpa@zytor.com \
--cc=jkenisto@us.ibm.com \
--cc=kyle@moffetthome.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-next@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sfr@canb.auug.org.au \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=tromey@redhat.com \
--cc=utrace-devel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).