All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	David Long <dave.long@linaro.org>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Jim Keniston <jkenisto@us.ibm.com>,
	Jonathan Lebon <jlebon@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()
Date: Tue, 1 Apr 2014 18:39:34 +0200	[thread overview]
Message-ID: <20140401163934.GA26272@redhat.com> (raw)
In-Reply-To: <20140401143346.GA18503@redhat.com>

On 04/01, Oleg Nesterov wrote:
>
> Good point, I'll send v2.

Masami, et al, I was going to send v2 plus the next short RFC series
which fixes the problem today, but it turns out that I have no time.
Will try to do this tomorrow.

So let me explain the problem, and how (I think) it should be solved.
Unfortunately, I do not even know the terminology, so firstly I have
to explain you the things I recently learned when I investigated the
bug report ;)

Lets only discuss the "call" instruction (the one which starts with
"e8" byte), because (compared to "jmp") the fix is more complicated.
This instruction simply does

	push rip
	rip += offset_encoded_in_insn	// ignoring the length of insn

To simplify, suppose that we probe such an insn at virtual address
0x1000 and offset_encoded_in_insn == 0x10.

When this task hits the probe, the kernel simply copies this (unmodified)
insn into xol area, and the task does single step. Lets suppose that
the virtual address of xol slot == 0x2000. This means that regs->ip
becomes 0x2000 + 0x10 = 0x2010, and we only need to adjust ->ip and
return adrress.

Note that the new ->ip == 0x2010 can point to nowhere, to the unmapped
area or it can point to the kernel memory. This still works because
the task doesn't even try to execute the next insn at this address.

But! this does NOT works if the new ->ip is non-canonical (not sure this
is the common term, I mean the hole starting from 0xffff800000000000,
which I didn't know about until recently ;). In this case CPU generates
#GP and do_general_protection() kills the task which tries to execute
this insn out-of-line.

The test-case I wrote to ensure:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It dies if you probe "probe_func" (although this test-case is not very
reliable, randomize_va_space/etc can change the placement of xol area).

Now let me describe how I am going to fix this. Please let me know if
you think this can't work or you see the better solution.

To simplify, lets ignore the fact we have lib/insn.c. The first change
will add

	bool call_emulate_op(...)
	{
		// push return adress
		if (put_user(regs->ip + 5, (long __user *)(regs->sp - 8)))
			return false;

		regs->sp -= 8;
		regs->ip += 5 + *(s32*)(auprobe->insn + 1);
		return true;
	}

and change "case 0xe8" in arch_uprobe_analyze_insn() to setup
->ops = call_xol_ops.

But this is obviously incomplete because put_user() can fail. In this case
we could send a signal an restart, but this is not simple. We do not know
which signal (say, SIGSEGV or SIGBUS ?), we do not know what should we put
into siginfo, etc. And we do not want to emulate __do_page_fault ;)

So the 2nd patch will do the following:

	1. Add "long call_offset" into "struct arch_uprobe". (yes, we
	   should also add a "union" into arch_uprobe, but lets ignore
	   the details).

	2. change "case 0xe8" in arch_uprobe_analyze_insn() to also do

		s32 *offset = (void*)(auprobe->insn + 1);

		arch_uprobe->call_offset = *offset;
		/*
		 * Turn this insn into
		 *
		 *	1: call 1b;
		 *
		 * This is only needed to expand the stack if emulate
		 * fails. We do not turn it into, say, "pushf" because
		 * we do not want to change the 1st byte used by
		 * set_orig_insn().
		 *
		*offset = -5;

	3. Change call_emulate_op() to use arch_uprobe->call_offset

	4. Add

		//
		// In practice, this will be almost never called. put_user()
		// in call_emulate_op() failed, single-step can only succeed
		// if another thread expanded our stack.
		//
		int call_post_xol_op(...)
		{
			regs->sp += 8;
			// tell the caller to restart
			return -EAGAIN;
		}

Once again, if this can work we need more changes to handle jmp's/etc. But
lets discuss this later. I am thinking in horror about conditional jmp ;)
In fact this should be simple, just I do not know (yet) to parse such an
insn, and I simply do not know if lib/insn.c can help me to figure out which
flag in regs->flags ->emulate() should check.

So this all looks a bit complicated, but I do not see a simpler solution.
Well, we could avoid ->emulate() altogether, we could just mangle the
problematic instructions and complicate arch_uprobe_post_xol(), but I do
not think this would be better. And if nothing else, ->emulate is a) simple
and b) should likely succeed and make the probing faster.

But perhaps I missed something?

Oleg.


  reply	other threads:[~2014-04-01 17:40 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-31 19:43 [PATCH 0/7] uprobes/x86: introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
2014-03-31 19:43 ` [PATCH 1/7] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() Oleg Nesterov
2014-04-01  1:41   ` Masami Hiramatsu
2014-04-01 15:39   ` David Long
2014-04-03 16:32   ` Srikar Dronamraju
2014-03-31 19:43 ` [PATCH 2/7] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn() Oleg Nesterov
2014-04-01  2:00   ` Masami Hiramatsu
2014-04-03 16:33   ` Srikar Dronamraju
2014-03-31 19:44 ` [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn() Oleg Nesterov
2014-04-01  3:17   ` Masami Hiramatsu
2014-04-01 14:33     ` Oleg Nesterov
2014-04-01 16:39       ` Oleg Nesterov [this message]
2014-04-02 17:57         ` Jim Keniston
2014-04-02 19:14           ` Oleg Nesterov
2014-03-31 19:44 ` [PATCH 4/7] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg Oleg Nesterov
2014-04-01  3:10   ` Masami Hiramatsu
2014-04-03 16:33   ` Srikar Dronamraju
2014-03-31 19:44 ` [PATCH 5/7] uprobes/x86: Gather "riprel" functions together Oleg Nesterov
2014-04-01  3:21   ` Masami Hiramatsu
2014-04-02 19:53   ` Jim Keniston
2014-04-03 19:50     ` Oleg Nesterov
2014-03-31 19:44 ` [PATCH 6/7] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooks Oleg Nesterov
2014-04-01  3:24   ` Masami Hiramatsu
2014-04-03 16:34   ` Srikar Dronamraju
2014-03-31 19:44 ` [PATCH 7/7] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops Oleg Nesterov
2014-04-01  6:52   ` Masami Hiramatsu
2014-04-01 14:44     ` Oleg Nesterov
2014-04-01 15:01   ` Oleg Nesterov
2014-04-02 19:46   ` Jim Keniston
2014-04-03 19:49     ` Oleg Nesterov
2014-04-02 19:58 ` [PATCH 0/7] uprobes/x86: introduce " Jim Keniston
2014-04-03 20:00 ` [PATCH 8-9/7] " Oleg Nesterov
2014-04-03 20:00   ` [PATCH 8/7] uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails Oleg Nesterov
2014-04-03 20:01   ` [PATCH 9/7] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140401163934.GA26272@redhat.com \
    --to=oleg@redhat.com \
    --cc=ananth@in.ibm.com \
    --cc=dave.long@linaro.org \
    --cc=dvlasenk@redhat.com \
    --cc=fche@redhat.com \
    --cc=jkenisto@us.ibm.com \
    --cc=jlebon@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mingo@elte.hu \
    --cc=srikar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.