All of lore.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Jiri Kosina <jkosina@suse.cz>,
	linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH v2 1/8] x86: speed up int3-based patching using less paranoid write
Date: Fri, 08 Nov 2013 21:04:26 +0900	[thread overview]
Message-ID: <527CD34A.30406@hitachi.com> (raw)
In-Reply-To: <1383901932-1945-3-git-send-email-pmladek@suse.cz>

(2013/11/08 18:12), Petr Mladek wrote:
> This change is inspired by the int3-based patching code used in
> ftrace. See the commit fd4363fff3d9 (x86: Introduce int3
> (breakpoint)-based instruction patching).
> 
> When trying to use text_poke_bp in ftrace, the result was slower than
> the original implementation.
> 
> It turned out that one difference was in text_poke vs. ftrace_write.
> text_poke did many extra operations to make sure that the change
> was atomic.

AFAIK, the main reason why text_poke is used is avoiding
RODATA protection (by alias mapping).

> In fact, we do not need this luxury in text_poke_bp because
> the modified code is guarded by the int3 handler. The int3
> interrupt itself is added and removed by an atomic operation
> because we modify only one byte.
> 
> This patch adds text_poke_part which is inspired by ftrace_write.
> Then it is used in text_poke_bp instead of the paranoid text_poke.
> 
> For example, I tried to switch between 7 tracers: blk, branch,
> function_graph, wakeup_rt, irqsoff, function, and nop. Every tracer
> has also been enabled and disabled. With 100 cycles, I got these
> times with text_poke:
> 
>     real    25m7.380s     25m13.44s     25m9.072s
>     user    0m0.004s      0m0.004s      0m0.004s
>     sys     0m3.480s      0m3.508s      0m3.420s
> 
> and with text_poke_part:
> 
>     real    3m23.335s     3m24.422s     3m26.686s
>     user    0m0.004s      0m0.004s	0m0.004s
>     sys     0m3.724s      0m3.772s	0m3.588s
> 
> Signed-off-by: Petr Mladek <pmladek@suse.cz>
> ---
>  arch/x86/kernel/alternative.c | 49 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index df94598..0586dc1 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -8,6 +8,7 @@
>  #include <linux/kprobes.h>
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
> +#include <linux/uaccess.h>
>  #include <linux/memory.h>
>  #include <linux/stop_machine.h>
>  #include <linux/slab.h>
> @@ -586,6 +587,44 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
>  	return addr;
>  }
>  
> +/**
> + * text_poke_part - Update part of the instruction on a live kernel when using
> + * 		    int3 breakpoint
> + * @addr:   address to modify
> + * @opcode: source of the copy
> + * @len:    length to copy
> + *
> + * If we patch instructions using int3 interrupt, we do not need to be so
> + * careful about an atomic write. Adding and removing the interrupt is atomic
> + * because we modify only one byte. The rest of the instruction could be
> + * modified in several steps because it is guarded by the interrupt handler.
> + * Hence we could use faster code here. See also text_poke_bp below for more
> + * details.
> + *
> + * Note: Must be called under text_mutex.
> + */
> +static int text_poke_part(void *addr, const void *opcode, size_t len)
> +{
> +	/*
> +	 * On x86_64, kernel text mappings are mapped read-only with
> +	 * CONFIG_DEBUG_RODATA. So we use the kernel identity mapping instead
> +	 * of the kernel text mapping to modify the kernel text.

Hmm, I couldn't find the guarantee that __va(__pa_symbol(x)) gives us a
raw(writable) text pages always.
Even it is true, it seems wired that we can still rewrite such protected
page without any spacial page remapping operation.


> +	 *
> +	 * For 32bit kernels, these mappings are same and we can use
> +	 * kernel identity mapping to modify code.
> +	 */
> +	if (((unsigned long)addr >= (unsigned long)_text) &&
> +	    ((unsigned long)addr < (unsigned long)_etext))
> +		addr = __va(__pa_symbol(addr));

You should also warn or return error if the given addr is not in the kernel text.

> +
> +	if (probe_kernel_write(addr, opcode, len))
> +		return -EPERM;
> +
> +	sync_core();
> +
> +	return 0;
> +}
> +
>  static void do_sync_core(void *info)
>  {
>  	sync_core();
> @@ -648,15 +687,15 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
>  	 */
>  	smp_wmb();
>  
> -	text_poke(addr, &int3, sizeof(int3));
> +	text_poke_part(addr, &int3, sizeof(int3));

Anyway, text_poke itself will cause a BUG if it hits an error, but text_poke_part() seems
silently failing. It should handle the error correctly (or call BUG()).

>  
>  	on_each_cpu(do_sync_core, NULL, 1);
>  
>  	if (len - sizeof(int3) > 0) {
>  		/* patch all but the first byte */
> -		text_poke((char *)addr + sizeof(int3),
> -			  (const char *) opcode + sizeof(int3),
> -			  len - sizeof(int3));
> +		text_poke_part((char *)addr + sizeof(int3),
> +			       (const char *) opcode + sizeof(int3),
> +			       len - sizeof(int3));

Here we should check an error too.

>  		/*
>  		 * According to Intel, this core syncing is very likely
>  		 * not necessary and we'd be safe even without it. But
> @@ -666,7 +705,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
>  	}
>  
>  	/* patch the first byte */
> -	text_poke(addr, opcode, sizeof(int3));
> +	text_poke_part(addr, opcode, sizeof(int3));

Ditto.

>  
>  	on_each_cpu(do_sync_core, NULL, 1);

Thank you,


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



  reply	other threads:[~2013-11-08 12:04 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-08  9:12 [PATCH v2 0/8] x86: use new text_poke_bp in ftrace Petr Mladek
2013-11-08  9:12 ` Petr Mladek
2013-11-08  9:12 ` [PATCH v2 1/8] x86: speed up int3-based patching using less paranoid write Petr Mladek
2013-11-08 12:04   ` Masami Hiramatsu [this message]
2013-11-08 12:43     ` Steven Rostedt
2013-11-12 10:44       ` Petr Mladek
2013-11-08 14:49     ` Steven Rostedt
2013-11-08  9:12 ` [PATCH v2 2/8] x86: return error code in text_poke_bp Petr Mladek
2013-11-08 12:36   ` Masami Hiramatsu
2013-11-08 14:53   ` Steven Rostedt
2013-11-08  9:12 ` [PATCH v2 3/8] x86: allow to call text_poke_bp during boot Petr Mladek
2013-11-08  9:12 ` [PATCH v2 4/8] x86: add generic function to modify more calls using int3 framework Petr Mladek
2013-11-08  9:12 ` [PATCH v2 5/8] x86: do not trace __probe_kernel_read Petr Mladek
2013-11-08  9:12 ` [PATCH v2 6/8] x86: modify ftrace function using the new int3-based framework Petr Mladek
2013-11-08  9:12 ` [PATCH v2 7/8] x86: patch all traced function calls using the " Petr Mladek
2013-11-08  9:12 ` [PATCH v2 8/8] x86: enable/disable ftrace graph call using new " Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=527CD34A.30406@hitachi.com \
    --to=masami.hiramatsu.pt@hitachi.com \
    --cc=fweisbec@gmail.com \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pmladek@suse.cz \
    --cc=rostedt@goodmis.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.