public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Steven Rostedt <srostedt@redhat.com>
Subject: Re: [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions
Date: Mon, 23 Feb 2009 10:42:58 -0500	[thread overview]
Message-ID: <20090223154258.GB28727@Krystal> (raw)
In-Reply-To: <alpine.DEB.2.00.0902230928000.18221@gandalf.stny.rr.com>

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> On Sun, 22 Feb 2009, Mathieu Desnoyers wrote:
> 
> > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > 
> > > On Sun, 22 Feb 2009, Mathieu Desnoyers wrote:
> > > > > 
> > > > > We are changing over 19000 locations in the kernel. This touches almost 
> > > > > all kernel text pages anyway. You want to map a page in and out for over 
> > > > > 19000 locations?
> > > > > 
> > > > > -- Steve
> > > > > 
> > > > 
> > > > Hi Steve,
> > > > 
> > > > Can you provide numbers to indicate why it's required to be so intrusive
> > > > in the kernel mappings while doing these modifications ? I think opening
> > > > such time window where standard code mapping is writeable globally in
> > > > config RO_DATA kernels could open the door to unexpected side-effects,
> > > > so ideally going through the "backdoor" page mapped by text_poke seems
> > > > safer. Given similar performance, I would tend to use a text_poke-like
> > > > approach.
> > > > 
> > > 
> > > Not sure which numbers you are asking for. The dynamic function tracer 
> > > modifies all mcount calls, which are done by practically every function in 
> > > the kernel. With a normal Fedora kernel (and all its loaded modules), 
> > > that's between 15 to 20 thousand functions, depending on what modules are 
> > > loaded.
> > > 
> > 
> > I mean comparing the cost of changing the kernel mappings and doing the
> > edits (as you do) vs doing it through a text-poke-like mapping.
> 
> Well, I could try to do the benchmarks, but that would require a bit of 
> development (see below).
> 
> > 
> > > At boot up we convert them all to nops, but when we enable the function 
> > > tracer, we convert them back to calls to the function tracer. This is done 
> > > by a priviledge user, since the function tracer can add quite a bit of 
> > > overhead when activated.
> > > 
> > > I do not really see how changing this for the short period of time is any 
> > > different than making another mapping point to the kernel code. If you 
> > > could find a way to break this security, you should be able to break it 
> > > with another mapping as well.
> > 
> > It's not only about breaking the security. It's mostly to insure
> > internal kernel consistency. While you are changing these mappings, you
> > could possibly have a window where other kernel code is running (irq,
> > other cpus threads). That code could itself be buggy and use the
> > writeable window to overwrite some kernel code or RO data. This
> > side-effect would go undetected while users think the RO data *is* is,
> > but it wouldn't. You also bring a good point about security : if someone
> > ever rely on CONFIG_DEBUG_RODATA for some security reasons, then we give
> > a big window where kernel text and ro data is writeable at *known*
> > addresses, while we can randomize the address used for text_poke.
> 
> As Ingo already mentioned, if an attacker can write to kernel memory then 
> the game is pretty much over.
> 
> As for RO_DATA and bugs, it is a very small window for this to happen, and 
> the sys-admin is the one making the change. This is not some periodical 
> update. The sys-admin must be the one to initiate the tracer to modify 
> text, ie, enabling or disabling the function tracer. Which, by the way, is 
> something a sys-admin should only do when the system is off line. The 
> overhead of all functions being traced, would not be something to be 
> doing on a production system, unless they need to analyze something going 
> wrong.
> 

The argument "not to be used on production systems" is incompatible with
the LTTng view, sorry. If you design your code so it's usable only in
debugging scenarios on development machines and not in the field, then I
doubt LTTng will be able to rely on it. I'm OK with that, as long as
nobody argue that such tracepoint could be replaced by the function
tracer, because tracepoints has to be enabled in the field on production
machines.

I agree that the racy time window is not that large and is not really a
security concern, but it's still just annoying.


> > 
> > > 
> > > Also note that this dynamic tracing code works for not only x86, it also 
> > > works for PPC, ARM, superH and ia64. To use text_poke, that would require 
> > > all of these to have text_poke ported.
> > > 
> > 
> > Do these architecture have DEBUG_RODATA config ? If not, then a simple
> > memcpy is ok.
> 
> No, the stop_machine has nothing to do with RODATA config. It has to do 
> with a safe way of modifying text that might run on another CPU.
> 
> > 
> > > How does text_poke solve the issue of executing code on another CPU that 
> > > is changing?
> > > 
> > 
> > text_poke itself does not provide that. This must be insured by the
> 
> Exactly! Then I can not replace stop_machine with "text_poke".
> 
> > user on a case-by-case basis. For instance, kprobes is changing code
> > atomically _and_ just inserting/removing breakpoints. Doing this is fine
> > with cross-cpu code modification (XMC). alternative code is only
> > changing code when the CPU is UP, so it's also ok. However, changing
> > multi-bytes instructions without changing those for a trap-generating
> > instruction when the CPUs are up (SMP) falls into the erratas, and the
> > code that uses text_poke must carefully perform this modification, e.g.
> > by doing like I do in my immediate values patchset in the lttng git
> > tree (using a temporary breakpoint while doing the modification). This
> > would apply directly to the function tracer, and you could get rid of
> > this ugly latency-inducing stop_machine() call.
> 
> Then I would need to implement this break point code on every arch. 

No, just on every arch which has such XMC erratas. Intel and ia64 are
the two I am aware of. But I guess if you want to play safe, doing it on
each architecture makes sense.

> Actually, I find the stop_machine quite an elegant solution, and not that 
> ugly. Modifying code on an SMP box is very dangerous. The "stop_machine" 
> turns the SMP box into a UP box while it is running (with the exception of 
> NMIs, but we deal with that separately).

The whole aspect of "we deal with that separately" seems like it adds
complexity to something that should stay simple by working around the
real problem.

> 
> The current method (with DEBUG_RODATA on x86), is to make the kernel text 
> writable. Call stop_machine that will modify all the locations (no 
> interruptions), then switch the kernel text back to read only.
> 

I'm pretty sure that leads to unacceptably long interrupt latency on
production machines.

> What you want me to do is, memory map a location, set a break point on 
> it for anyone that happens to hit it (and do what? call the previous 
> command?)

No. iret just after the modified site. You are flipping between "call"
and "nops", so you take the simplest behavior : nops.

> modify the code, remove the break point, remove the memory 
> mapping and do that for another 19000 times. And this must also be 
> implemented on all archs that support dynamic ftrace.
> 

#define BREAKPOINT_INSN ...
#define BREAKPOINT_INSN_LEN ...

This can be abstracted pretty easily.


> That seems to be much more complex and quite frankly, much more error 
> prone. Tracing's number one priority is stability. The more complex it 
> becomes the less stable it will be.
> 

Given you plan to use tracing only in debugging setups seems to make you
miss another very important aspect : tracer intrusiveness. Disabling
interrupts for a few milliseconds in a row on a telecommunication system
is just unacceptable.

I agree that stability is very important, but, as Einstein said :
"Make everything as simple as possible, but not simpler".

Mathieu

> -- Steve
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2009-02-23 15:43 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-20  1:13 [git pull] changes for tip, and a nasty x86 page table bug Steven Rostedt
2009-02-20  1:13 ` [PATCH 1/6] x86: check PMD in spurious_fault handler Steven Rostedt
2009-02-20  1:13 ` [PATCH 2/6] x86: keep pmd rw bit set when creating 4K level pages Steven Rostedt
2009-02-20  1:13 ` [PATCH 3/6] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
2009-02-20  1:13 ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-20  1:32   ` Andrew Morton
2009-02-20  1:44     ` Steven Rostedt
2009-02-20  2:05       ` [PATCH][git pull] update to tip/tracing/ftrace Steven Rostedt
2009-02-22 17:50   ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Andi Kleen
2009-02-22 22:53     ` Steven Rostedt
2009-02-23  0:29       ` Andi Kleen
2009-02-23  2:33       ` Mathieu Desnoyers
2009-02-23  4:29         ` Steven Rostedt
2009-02-23  4:53           ` Mathieu Desnoyers
2009-02-23 14:48             ` Steven Rostedt
2009-02-23 15:42               ` Mathieu Desnoyers [this message]
2009-02-23 15:51                 ` Steven Rostedt
2009-02-23 15:55                   ` Steven Rostedt
2009-02-23 16:13                   ` Mathieu Desnoyers
2009-02-23 16:48                     ` Steven Rostedt
2009-02-23 17:31                       ` Mathieu Desnoyers
2009-02-23 18:17                         ` Steven Rostedt
2009-02-23 18:34                           ` Mathieu Desnoyers
2009-02-27 17:52                           ` Masami Hiramatsu
2009-02-27 18:07                             ` Mathieu Desnoyers
2009-02-27 18:34                               ` Masami Hiramatsu
2009-02-27 18:53                                 ` Mathieu Desnoyers
2009-02-27 20:57                                   ` Masami Hiramatsu
2009-03-02 17:01                                     ` [RFC][PATCH] x86: make text_poke() atomic Masami Hiramatsu
2009-03-02 17:19                                       ` Mathieu Desnoyers
2009-03-02 22:15                                         ` Masami Hiramatsu
2009-03-02 22:22                                           ` Ingo Molnar
2009-03-02 22:55                                             ` Masami Hiramatsu
2009-03-02 23:09                                               ` Ingo Molnar
2009-03-02 23:38                                                 ` Masami Hiramatsu
2009-03-02 23:49                                                   ` Ingo Molnar
2009-03-03  0:00                                                     ` Mathieu Desnoyers
2009-03-03  0:00                                                     ` [PATCH] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2009-03-03  0:32                                                       ` Ingo Molnar
2009-03-03  0:39                                                         ` Mathieu Desnoyers
2009-03-03  1:30                                                         ` [PATCH] Text Edit Lock - Architecture Independent Code (v2) Mathieu Desnoyers
2009-03-03  1:31                                                         ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Mathieu Desnoyers
2009-03-03  9:27                                                           ` Ingo Molnar
2009-03-03 12:06                                                             ` Ananth N Mavinakayanahalli
2009-03-03 14:28                                                               ` Mathieu Desnoyers
2009-03-03 14:33                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v3) Mathieu Desnoyers
2009-03-03 14:53                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Ingo Molnar
2009-03-03  0:01                                                     ` [PATCH] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2009-03-03  0:10                                                       ` Masami Hiramatsu
2009-03-03  0:05                                                     ` [RFC][PATCH] x86: make text_poke() atomic Masami Hiramatsu
2009-03-03  0:22                                                       ` Ingo Molnar
2009-03-03  0:31                                                         ` Masami Hiramatsu
2009-03-03 16:31                                                           ` [PATCH] x86: make text_poke() atomic using fixmap Masami Hiramatsu
2009-03-03 17:08                                                             ` Mathieu Desnoyers
2009-03-05 10:38                                                             ` Ingo Molnar
2009-03-06 14:06                                                               ` Ingo Molnar
2009-03-06 14:49                                                                 ` Masami Hiramatsu
2009-03-02 18:28                                       ` [RFC][PATCH] x86: make text_poke() atomic Arjan van de Ven
2009-03-02 18:36                                         ` Mathieu Desnoyers
2009-03-02 18:55                                           ` Arjan van de Ven
2009-03-02 19:13                                             ` Masami Hiramatsu
2009-03-02 19:23                                               ` H. Peter Anvin
2009-03-02 19:47                                             ` Mathieu Desnoyers
2009-03-02 18:42                                         ` Linus Torvalds
2009-03-03  4:54                                       ` Nick Piggin
2009-02-23 18:23                         ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-23  9:02         ` Ingo Molnar
2009-02-27 21:08     ` Pavel Machek
2009-02-28 16:56       ` Andi Kleen
2009-02-28 22:08         ` Pavel Machek
     [not found]           ` <87wsba1a9f.fsf@basil.nowhere.org>
2009-02-28 22:19             ` Pavel Machek
2009-02-28 23:52               ` Andi Kleen
2009-02-20  1:13 ` [PATCH 5/6] ftrace: immediately stop code modification if failure is detected Steven Rostedt
2009-02-20  1:13 ` [PATCH 6/6] ftrace: break out modify loop immediately on detection of error Steven Rostedt
2009-02-20  2:00 ` [git pull] changes for tip, and a nasty x86 page table bug Linus Torvalds
2009-02-20  2:08   ` Steven Rostedt
2009-02-20  3:44     ` Linus Torvalds
2009-02-20  4:00       ` Steven Rostedt
2009-02-20  4:17         ` Linus Torvalds
2009-02-20  4:34           ` Steven Rostedt
2009-02-20  5:02           ` Huang Ying
2009-02-20  7:29       ` [PATCH] x86: use the right protections for split-up pagetables Ingo Molnar
2009-02-20  7:39         ` [PATCH, v2] " Ingo Molnar
2009-02-20  8:02           ` Ingo Molnar
2009-02-20 10:24             ` Ingo Molnar
2009-02-20 13:57         ` [PATCH] " Steven Rostedt
2009-02-20 15:40         ` Linus Torvalds
2009-02-20 16:59           ` Ingo Molnar
2009-02-20 18:33           ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090223154258.GB28727@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=arjan@infradead.org \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox