[PATCH] uprobes: fix scratch register selection for rip-relative fixups

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] uprobes: fix scratch register selection for rip-relative fixups
@ 2014-05-01 17:09 Denys Vlasenko
  2014-05-01 19:11 ` Oleg Nesterov
  2014-05-02  0:48 ` Jim Keniston
  0 siblings, 2 replies; 5+ messages in thread
From: Denys Vlasenko @ 2014-05-01 17:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denys Vlasenko, Jim Keniston, Masami Hiramatsu, Srikar Dronamraju,
	Ingo Molnar, Oleg Nesterov

Before this patch, instructions such as div, mul,
shifts with count in CL, cmpxchg are mishandled.

This patch adds vex prefix handling. In particular,
it avoids colliding with register operand encoded
in vex.vvvv field.

Since we need to avoid two possible register operands,
the selection of scratch register needs to be from at least
three registers.

After looking through a lot of CPU docs, it looks like
the safest choice is SI,DI,BX. Selecting BX needs care
to not collide with implicit use of BX by cmpxchg8b.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jim Keniston <jkenisto@us.ibm.com>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c | 154 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 116 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index c41bb4b..6cb53d1 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -41,8 +41,10 @@
 /* Instruction will modify TF, don't change it */
 #define UPROBE_FIX_SETF		0x04
 
-#define UPROBE_FIX_RIP_AX	0x08
-#define UPROBE_FIX_RIP_CX	0x10
+#define UPROBE_FIX_RIP_SI	0x08
+#define UPROBE_FIX_RIP_DI	0x10
+#define UPROBE_FIX_RIP_BX	0x20
+#define UPROBE_FIX_RIP_ALL	0x38
 
 #define	UPROBE_TRAP_NR		UINT_MAX
 
@@ -51,6 +53,8 @@
 #define OPCODE2(insn)		((insn)->opcode.bytes[1])
 #define OPCODE3(insn)		((insn)->opcode.bytes[2])
 #define MODRM_REG(insn)		X86_MODRM_REG((insn)->modrm.value)
+#define VEX2_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[1])
+#define VEX3_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[2])
 
 #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
 	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
@@ -274,63 +278,137 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
 static void riprel_analyze(struct arch_uprobe *auprobe, struct insn *insn)
 {
 	u8 *cursor;
+	u8 can_use_regs;
 	u8 reg;
+	int reg2;
 
 	if (!insn_rip_relative(insn))
 		return;
 
 	/*
-	 * insn_rip_relative() would have decoded rex_prefix, modrm.
+	 * insn_rip_relative() would have decoded rex_prefix,
+	 * vex_prefix, modrm.
 	 * Clear REX.b bit (extension of MODRM.rm field):
-	 * we want to encode rax/rcx, not r8/r9.
+	 * we want to encode low numbered reg, not r8+.
 	 */
 	if (insn->rex_prefix.nbytes) {
 		cursor = auprobe->insn + insn_offset_rex_prefix(insn);
-		*cursor &= 0xfe;	/* Clearing REX.B bit */
+		/* REX byte has 0100wrxb layout, clearing REX.b bit */
+		*cursor &= 0xfe;
+	}
+	/* Similar treatment for VEX3 prefix */
+	/* TODO: add XOP/EVEX treatment when insn decoder supports them */
+	if (insn->vex_prefix.nbytes == 3) {
+		/*
+		 * vex2:     c5    rvvvvLpp   (has no b bit)
+		 * vex3/xop: c4/8f rxbmmmmm wvvvvLpp
+		 * evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
+		 *   (evex will need setting of both b and x since
+		 *   in non-sib encoding evex.x is 4th bit of MODRM.rm)
+		 * Setting VEX3.b (setting because it has inverted meaning):
+		 */
+		cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
+		*cursor |= 0x20;
 	}
 
 	/*
+	 * Convert from rip-relative addressing
+	 * to register-relative addressing via a scratch register.
+	 *
+	 * This is tricky since there are insns with modrm byte
+	 * which also use registers not encoded in modrm byte:
+	 * [i]div/[i]mul: implicitly use dx:ax
+	 * shift ops: implicitly use cx
+	 * cmpxchg: implicitly uses ax
+	 * cmpxchg8/16b: implicitly uses dx:ax and bx:cx
+	 *   Encoding: 0f c7/1 modrm
+	 *   The code below thinks that reg=1 (cx), chooses si as scratch.
+	 * mulx: implicitly uses dx: mulx r/m,r1,r2: r1:r2 = dx * r/m
+	 *   First appeared in Haswell (BMI2 insn). It is vex-encoded.
+	 *   Example where none of bx,cx,dx can be used as scratch reg:
+	 *   c4 e2 63 f6 0d disp32   mulx disp32(%rip),%ebx,%ecx
+	 * [v]pcmpistri: implicitly uses cx, xmm0
+	 * [v]pcmpistrm: implicitly uses xmm0
+	 * [v]pcmpestri: implicitly uses ax, dx, cx, xmm0
+	 * [v]pcmpestrm: implicitly uses ax, dx, xmm0
+	 *   Evil SSE4.2 string comparison ops from hell.
+	 * maskmovq/[v]maskmovdqu: implicitly uses (ds:rdi) as destination.
+	 *   Encoding: 0f f7 modrm, 66 0f f7 modrm, vex-encoded: c5 f9 f7 modrm.
+	 *   Store op1, byte-masked by op2 msb's in each byte, to (ds:rdi).
+	 *   AMD says it has no 3-operand form (vex.vvvv must be 1111)
+	 *   and that it can have only register operands, not mem
+	 *   (its modrm byte must have mode=11).
+	 *   If these restrictions will ever be lifted,
+	 *   we'll need code to prevent selection of di as scratch reg!
+	 *
+	 * Summary: I don't know any insns with modrm byte which
+	 * use SI register implicitly. DI register is used only
+	 * by one insn (maskmovq) and BX register is used
+	 * only by one too (cmpxchg8b).
+	 * BP is stack-segment based (may be a problem?).
+	 * AX, DX, CX are off-limits (many implicit users).
+	 * SP is unusable (it's stack pointer - think about "pop mem";
+	 * also, rsp+disp32 needs sib encoding -> insn length change).
+	 */
+	reg = MODRM_REG(insn);
+	reg2 = -1;
+	if (insn->vex_prefix.nbytes == 2) {
+		reg2 = VEX2_VVVV(insn) ^ 0xf;
+	}
+	if (insn->vex_prefix.nbytes == 3) {
+		reg2 = VEX3_VVVV(insn) ^ 0xf;
+	}
+	/* TODO: add XOP, EXEV vvvv reading */
+
+	/* Register numbering is ax,cx,dx,bx, sp,bp,si,di, r8..r15 */
+	can_use_regs = UPROBE_FIX_RIP_SI | UPROBE_FIX_RIP_DI | UPROBE_FIX_RIP_BX;
+#if 0
+	/* In any case, one bit will remain.
+	 * Clearing bx bit is pointless. Its state does not affect
+	 * scratch reg selection: we always prefer si/di to bx.
+	 */
+	if (reg == 3 || reg2 == 3)
+		can_use_regs &= ~UPROBE_FIX_RIP_BX;
+#endif
+	if (reg == 6 || reg2 == 6)
+		can_use_regs &= ~UPROBE_FIX_RIP_SI;
+	if (reg == 7 || reg2 == 7)
+		can_use_regs &= ~UPROBE_FIX_RIP_DI;
+	/* TODO: force maskmovq to not use di */
+	auprobe->def.fixups = can_use_regs;
+
+	/*
+	 * Choose scratch reg. Order is important:
+	 * must not select bx if we can use si (cmpxchg8b case!)
+	 */
+	reg2 = 3; /* BX */
+	if (can_use_regs & UPROBE_FIX_RIP_DI)
+		reg2 = 7;
+	if (can_use_regs & UPROBE_FIX_RIP_SI)
+		reg2 = 6;
+	/*
 	 * Point cursor at the modrm byte.  The next 4 bytes are the
 	 * displacement.  Beyond the displacement, for some instructions,
 	 * is the immediate operand.
 	 */
 	cursor = auprobe->insn + insn_offset_modrm(insn);
 	/*
-	 * Convert from rip-relative addressing
-	 * to register-relative addressing via a scratch register.
+	 * Change modrm from "00 reg 101" to "10 reg reg2". Example:
+	 * 89 05 disp32  mov %eax,disp32(%rip) becomes
+	 * 89 86 disp32  mov %eax,disp32(%rsi)
 	 */
-	reg = MODRM_REG(insn);
-	if (reg == 0) {
-		/*
-		 * The register operand (if any) is either the A register
-		 * (%rax, %eax, etc.) or (if the 0x4 bit is set in the
-		 * REX prefix) %r8.  In any case, we know the C register
-		 * is NOT the register operand, so we use %rcx (register
-		 * #1) for the scratch register.
-		 */
-		auprobe->def.fixups |= UPROBE_FIX_RIP_CX;
-		/*
-		 * Change modrm from "00 000 101" to "10 000 001". Example:
-		 * 89 05 disp32  mov %eax,disp32(%rip) becomes
-		 * 89 81 disp32  mov %eax,disp32(%rcx)
-		 */
-		*cursor = 0x81;
-	} else {
-		/* Use %rax (register #0) for the scratch register. */
-		auprobe->def.fixups |= UPROBE_FIX_RIP_AX;
-		/*
-		 * Change modrm from "00 reg 101" to "10 reg 000". Example:
-		 * 89 1d disp32  mov %edx,disp32(%rip) becomes
-		 * 89 98 disp32  mov %edx,disp32(%rax)
-		 */
-		*cursor = (reg << 3) | 0x80;
-	}
+	*cursor = (reg << 3) | 0x80 | reg2;
 }
 
 static inline unsigned long *
 scratch_reg(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
-	return (auprobe->def.fixups & UPROBE_FIX_RIP_AX) ? &regs->ax : &regs->cx;
+	/* Order is important - more than one bit can be set! */
+	if (auprobe->def.fixups & UPROBE_FIX_RIP_SI)
+		return &regs->si;
+	if (auprobe->def.fixups & UPROBE_FIX_RIP_DI)
+		return &regs->di;
+	return &regs->bx;
 }
 
 /*
@@ -339,7 +417,7 @@ scratch_reg(struct arch_uprobe *auprobe, struct pt_regs *regs)
  */
 static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
-	if (auprobe->def.fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
+	if (auprobe->def.fixups & UPROBE_FIX_RIP_ALL) {
 		struct uprobe_task *utask = current->utask;
 		unsigned long *sr = scratch_reg(auprobe, regs);
 
@@ -350,7 +428,7 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 
 static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
-	if (auprobe->def.fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
+	if (auprobe->def.fixups & UPROBE_FIX_RIP_ALL) {
 		struct uprobe_task *utask = current->utask;
 		unsigned long *sr = scratch_reg(auprobe, regs);
 
@@ -724,9 +802,9 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
  *
  * If the original instruction was a rip-relative instruction such as
  * "movl %edx,0xnnnn(%rip)", we have instead executed an equivalent
- * instruction using a scratch register -- e.g., "movl %edx,0xnnnn(%rax)".
+ * instruction using a scratch register -- e.g., "movl %edx,0xnnnn(%rsi)".
  * We need to restore the contents of the scratch register
- * (FIX_RIP_AX or FIX_RIP_CX).
+ * (FIX_RIP_reg).
  */
 int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups
  2014-05-01 17:09 [PATCH] uprobes: fix scratch register selection for rip-relative fixups Denys Vlasenko
@ 2014-05-01 19:11 ` Oleg Nesterov
  2014-05-02  0:48 ` Jim Keniston
  1 sibling, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2014-05-01 19:11 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: linux-kernel, Jim Keniston, Masami Hiramatsu, Srikar Dronamraju,
	Ingo Molnar

On 05/01, Denys Vlasenko wrote:
>
> Before this patch, instructions such as div, mul,
> shifts with count in CL, cmpxchg are mishandled.

Thanks. I'll try to read this patch tomorrow, but you do know that I can't
review (or even understand ;) the change in riprel_analyze().

As for other changes, perhaps we can cleanup them later somehow, but I am
not going to discuss this now. I agree that the fix should be as simple
as possible.

Only on nit,

> -#define UPROBE_FIX_RIP_AX	0x08
> -#define UPROBE_FIX_RIP_CX	0x10
> +#define UPROBE_FIX_RIP_SI	0x08
> +#define UPROBE_FIX_RIP_DI	0x10
> +#define UPROBE_FIX_RIP_BX	0x20
> +#define UPROBE_FIX_RIP_ALL	0x38

OK, but please define UPROBE_FIX_RIP_ALL as

	#define UPROBE_FIX_RIP_ALL	\
		(UPROBE_FIX_RIP_SI | UPROBE_FIX_RIP_DI | UPROBE_FIX_RIP_BX)

Please do not bother to do this right now, lets wait for review.

Oleg.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups
  2014-05-01 17:09 [PATCH] uprobes: fix scratch register selection for rip-relative fixups Denys Vlasenko
  2014-05-01 19:11 ` Oleg Nesterov
@ 2014-05-02  0:48 ` Jim Keniston
  2014-05-02 15:04   ` Denys Vlasenko
  1 sibling, 1 reply; 5+ messages in thread
From: Jim Keniston @ 2014-05-02  0:48 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: linux-kernel, Masami Hiramatsu, Srikar Dronamraju, Ingo Molnar,
	Oleg Nesterov

On Thu, 2014-05-01 at 19:09 +0200, Denys Vlasenko wrote:
> Before this patch, instructions such as div, mul,
> shifts with count in CL, cmpxchg are mishandled.
> 
> This patch adds vex prefix handling. In particular,
> it avoids colliding with register operand encoded
> in vex.vvvv field.
> 
> Since we need to avoid two possible register operands,
> the selection of scratch register needs to be from at least
> three registers.
> 
> After looking through a lot of CPU docs, it looks like
> the safest choice is SI,DI,BX. Selecting BX needs care
> to not collide with implicit use of BX by cmpxchg8b.

I skipped reviewing some of this because I don't know about those parts
of the x86 architecture.  What I saw and understood looks like it'll
work, although I have a couple of suggestions there.  See below.

Thanks for your diligence and creativity on this.

Jim

> 
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Jim Keniston <jkenisto@us.ibm.com>
> CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/kernel/uprobes.c | 154 ++++++++++++++++++++++++++++++++++------------
>  1 file changed, 116 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index c41bb4b..6cb53d1 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -41,8 +41,10 @@
>  /* Instruction will modify TF, don't change it */
>  #define UPROBE_FIX_SETF		0x04
> 
> -#define UPROBE_FIX_RIP_AX	0x08
> -#define UPROBE_FIX_RIP_CX	0x10
> +#define UPROBE_FIX_RIP_SI	0x08
> +#define UPROBE_FIX_RIP_DI	0x10
> +#define UPROBE_FIX_RIP_BX	0x20
> +#define UPROBE_FIX_RIP_ALL	0x38

I agree with Oleg's suggestion about how to define UPROBE_FIX_RIP_ALL.

> 
>  #define	UPROBE_TRAP_NR		UINT_MAX
> 
> @@ -51,6 +53,8 @@
>  #define OPCODE2(insn)		((insn)->opcode.bytes[1])
>  #define OPCODE3(insn)		((insn)->opcode.bytes[2])
>  #define MODRM_REG(insn)		X86_MODRM_REG((insn)->modrm.value)
> +#define VEX2_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[1])
> +#define VEX3_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[2])

I disclaim any knowledge about VEX* stuff.

> 
>  #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
>  	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
> @@ -274,63 +278,137 @@ static inline bool is_64bit_mm(struct mm_struct *mm)
>  static void riprel_analyze(struct arch_uprobe *auprobe, struct insn *insn)
>  {
>  	u8 *cursor;
> +	u8 can_use_regs;
>  	u8 reg;
> +	int reg2;
> 
>  	if (!insn_rip_relative(insn))
>  		return;
> 
>  	/*
> -	 * insn_rip_relative() would have decoded rex_prefix, modrm.
> +	 * insn_rip_relative() would have decoded rex_prefix,
> +	 * vex_prefix, modrm.
>  	 * Clear REX.b bit (extension of MODRM.rm field):
> -	 * we want to encode rax/rcx, not r8/r9.
> +	 * we want to encode low numbered reg, not r8+.
>  	 */
>  	if (insn->rex_prefix.nbytes) {
>  		cursor = auprobe->insn + insn_offset_rex_prefix(insn);
> -		*cursor &= 0xfe;	/* Clearing REX.B bit */
> +		/* REX byte has 0100wrxb layout, clearing REX.b bit */
> +		*cursor &= 0xfe;
> +	}

skipped this next part...

> +	/* Similar treatment for VEX3 prefix */
> +	/* TODO: add XOP/EVEX treatment when insn decoder supports them */
> +	if (insn->vex_prefix.nbytes == 3) {
> +		/*
> +		 * vex2:     c5    rvvvvLpp   (has no b bit)
> +		 * vex3/xop: c4/8f rxbmmmmm wvvvvLpp
> +		 * evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
> +		 *   (evex will need setting of both b and x since
> +		 *   in non-sib encoding evex.x is 4th bit of MODRM.rm)
> +		 * Setting VEX3.b (setting because it has inverted meaning):
> +		 */
> +		cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
> +		*cursor |= 0x20;
>  	}
> 

... resumed reviewing here.

>  	/*
> +	 * Convert from rip-relative addressing
> +	 * to register-relative addressing via a scratch register.
> +	 *
> +	 * This is tricky since there are insns with modrm byte
> +	 * which also use registers not encoded in modrm byte:
> +	 * [i]div/[i]mul: implicitly use dx:ax
> +	 * shift ops: implicitly use cx
> +	 * cmpxchg: implicitly uses ax
> +	 * cmpxchg8/16b: implicitly uses dx:ax and bx:cx
> +	 *   Encoding: 0f c7/1 modrm
> +	 *   The code below thinks that reg=1 (cx), chooses si as scratch.

I verified the above in the AMD manual, but cannot comment on this next
section...

> +	 * mulx: implicitly uses dx: mulx r/m,r1,r2: r1:r2 = dx * r/m
> +	 *   First appeared in Haswell (BMI2 insn). It is vex-encoded.
> +	 *   Example where none of bx,cx,dx can be used as scratch reg:
> +	 *   c4 e2 63 f6 0d disp32   mulx disp32(%rip),%ebx,%ecx
> +	 * [v]pcmpistri: implicitly uses cx, xmm0
> +	 * [v]pcmpistrm: implicitly uses xmm0
> +	 * [v]pcmpestri: implicitly uses ax, dx, cx, xmm0
> +	 * [v]pcmpestrm: implicitly uses ax, dx, xmm0
> +	 *   Evil SSE4.2 string comparison ops from hell.
> +	 * maskmovq/[v]maskmovdqu: implicitly uses (ds:rdi) as destination.
> +	 *   Encoding: 0f f7 modrm, 66 0f f7 modrm, vex-encoded: c5 f9 f7 modrm.
> +	 *   Store op1, byte-masked by op2 msb's in each byte, to (ds:rdi).
> +	 *   AMD says it has no 3-operand form (vex.vvvv must be 1111)
> +	 *   and that it can have only register operands, not mem
> +	 *   (its modrm byte must have mode=11).
> +	 *   If these restrictions will ever be lifted,
> +	 *   we'll need code to prevent selection of di as scratch reg!

... resumed reviewing here.

> +	 *
> +	 * Summary: I don't know any insns with modrm byte which
> +	 * use SI register implicitly. DI register is used only
> +	 * by one insn (maskmovq) and BX register is used
> +	 * only by one too (cmpxchg8b).
> +	 * BP is stack-segment based (may be a problem?).
> +	 * AX, DX, CX are off-limits (many implicit users).
> +	 * SP is unusable (it's stack pointer - think about "pop mem";
> +	 * also, rsp+disp32 needs sib encoding -> insn length change).
> +	 */
> +	reg = MODRM_REG(insn);
> +	reg2 = -1;

VEX again.  Covering my ears and humming...

> +	if (insn->vex_prefix.nbytes == 2) {
> +		reg2 = VEX2_VVVV(insn) ^ 0xf;
> +	}
> +	if (insn->vex_prefix.nbytes == 3) {
> +		reg2 = VEX3_VVVV(insn) ^ 0xf;
> +	}
> +	/* TODO: add XOP, EXEV vvvv reading */
> +

... and I'm back.

> +	/* Register numbering is ax,cx,dx,bx, sp,bp,si,di, r8..r15 */
> +	can_use_regs = UPROBE_FIX_RIP_SI | UPROBE_FIX_RIP_DI | UPROBE_FIX_RIP_BX;
> +#if 0
> +	/* In any case, one bit will remain.
> +	 * Clearing bx bit is pointless. Its state does not affect
> +	 * scratch reg selection: we always prefer si/di to bx.
> +	 */
> +	if (reg == 3 || reg2 == 3)
> +		can_use_regs &= ~UPROBE_FIX_RIP_BX;
> +#endif
> +	if (reg == 6 || reg2 == 6)
> +		can_use_regs &= ~UPROBE_FIX_RIP_SI;
> +	if (reg == 7 || reg2 == 7)
> +		can_use_regs &= ~UPROBE_FIX_RIP_DI;
> +	/* TODO: force maskmovq to not use di */
> +	auprobe->def.fixups = can_use_regs;
> +
> +	/*
> +	 * Choose scratch reg. Order is important:
> +	 * must not select bx if we can use si (cmpxchg8b case!)
> +	 */
> +	reg2 = 3; /* BX */
> +	if (can_use_regs & UPROBE_FIX_RIP_DI)
> +		reg2 = 7;
> +	if (can_use_regs & UPROBE_FIX_RIP_SI)
> +		reg2 = 6;

It seems more natural to code this as:
	if (can_use_regs & UPROBE_FIX_RIP_SI)
		reg2 = 6;
	else if (can_use_regs & UPROBE_FIX_RIP_DI)
		reg2 = 7;
	else
		reg2 = 3; /* BX */
... which is pretty much how you do it in scratch_reg().

> +	/*
>  	 * Point cursor at the modrm byte.  The next 4 bytes are the
>  	 * displacement.  Beyond the displacement, for some instructions,
>  	 * is the immediate operand.
>  	 */
>  	cursor = auprobe->insn + insn_offset_modrm(insn);
>  	/*
> -	 * Convert from rip-relative addressing
> -	 * to register-relative addressing via a scratch register.
> +	 * Change modrm from "00 reg 101" to "10 reg reg2". Example:
> +	 * 89 05 disp32  mov %eax,disp32(%rip) becomes
> +	 * 89 86 disp32  mov %eax,disp32(%rsi)
>  	 */
> -	reg = MODRM_REG(insn);
> -	if (reg == 0) {
> -		/*
> -		 * The register operand (if any) is either the A register
> -		 * (%rax, %eax, etc.) or (if the 0x4 bit is set in the
> -		 * REX prefix) %r8.  In any case, we know the C register
> -		 * is NOT the register operand, so we use %rcx (register
> -		 * #1) for the scratch register.
> -		 */
> -		auprobe->def.fixups |= UPROBE_FIX_RIP_CX;
> -		/*
> -		 * Change modrm from "00 000 101" to "10 000 001". Example:
> -		 * 89 05 disp32  mov %eax,disp32(%rip) becomes
> -		 * 89 81 disp32  mov %eax,disp32(%rcx)
> -		 */
> -		*cursor = 0x81;
> -	} else {
> -		/* Use %rax (register #0) for the scratch register. */
> -		auprobe->def.fixups |= UPROBE_FIX_RIP_AX;
> -		/*
> -		 * Change modrm from "00 reg 101" to "10 reg 000". Example:
> -		 * 89 1d disp32  mov %edx,disp32(%rip) becomes
> -		 * 89 98 disp32  mov %edx,disp32(%rax)
> -		 */
> -		*cursor = (reg << 3) | 0x80;
> -	}
> +	*cursor = (reg << 3) | 0x80 | reg2;
>  }
> 
>  static inline unsigned long *
>  scratch_reg(struct arch_uprobe *auprobe, struct pt_regs *regs)
>  {
> -	return (auprobe->def.fixups & UPROBE_FIX_RIP_AX) ? &regs->ax : &regs->cx;
> +	/* Order is important - more than one bit can be set! */
> +	if (auprobe->def.fixups & UPROBE_FIX_RIP_SI)
> +		return &regs->si;
> +	if (auprobe->def.fixups & UPROBE_FIX_RIP_DI)
> +		return &regs->di;
> +	return &regs->bx;
>  }
> 
>  /*
> @@ -339,7 +417,7 @@ scratch_reg(struct arch_uprobe *auprobe, struct pt_regs *regs)
>   */
>  static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>  {
> -	if (auprobe->def.fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
> +	if (auprobe->def.fixups & UPROBE_FIX_RIP_ALL) {
>  		struct uprobe_task *utask = current->utask;
>  		unsigned long *sr = scratch_reg(auprobe, regs);
> 
> @@ -350,7 +428,7 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> 
>  static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>  {
> -	if (auprobe->def.fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
> +	if (auprobe->def.fixups & UPROBE_FIX_RIP_ALL) {
>  		struct uprobe_task *utask = current->utask;
>  		unsigned long *sr = scratch_reg(auprobe, regs);
> 
> @@ -724,9 +802,9 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
>   *
>   * If the original instruction was a rip-relative instruction such as
>   * "movl %edx,0xnnnn(%rip)", we have instead executed an equivalent
> - * instruction using a scratch register -- e.g., "movl %edx,0xnnnn(%rax)".
> + * instruction using a scratch register -- e.g., "movl %edx,0xnnnn(%rsi)".
>   * We need to restore the contents of the scratch register
> - * (FIX_RIP_AX or FIX_RIP_CX).
> + * (FIX_RIP_reg).
>   */
>  int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>  {



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups
  2014-05-02  0:48 ` Jim Keniston
@ 2014-05-02 15:04   ` Denys Vlasenko
  2014-05-05  6:00     ` Jim Keniston
  0 siblings, 1 reply; 5+ messages in thread
From: Denys Vlasenko @ 2014-05-02 15:04 UTC (permalink / raw)
  To: Jim Keniston
  Cc: linux-kernel, Masami Hiramatsu, Srikar Dronamraju, Ingo Molnar,
	Oleg Nesterov

On 05/02/2014 02:48 AM, Jim Keniston wrote:
> On Thu, 2014-05-01 at 19:09 +0200, Denys Vlasenko wrote:
>> +#define VEX2_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[1])
>> +#define VEX3_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[2])
> 
> I disclaim any knowledge about VEX* stuff.
...
> skipped this next part...
> 
>> +	/* Similar treatment for VEX3 prefix */
>> +	/* TODO: add XOP/EVEX treatment when insn decoder supports them */
>> +	if (insn->vex_prefix.nbytes == 3) {
>> +		/*
>> +		 * vex2:     c5    rvvvvLpp   (has no b bit)
>> +		 * vex3/xop: c4/8f rxbmmmmm wvvvvLpp
>> +		 * evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
>> +		 *   (evex will need setting of both b and x since
>> +		 *   in non-sib encoding evex.x is 4th bit of MODRM.rm)
>> +		 * Setting VEX3.b (setting because it has inverted meaning):
>> +		 */
>> +		cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
>> +		*cursor |= 0x20;
>>  	}
> 
> ... resumed reviewing here.

I do realize that most people reading this code won't be
aficionados of memorizing x86 insn encoding quirks.

To help them, I added the comment which spells out bit layout
of vex/evex/xop prefixes (see above). Does it help?

> VEX again.  Covering my ears and humming...
> 
>> +	if (insn->vex_prefix.nbytes == 2) {
>> +		reg2 = VEX2_VVVV(insn) ^ 0xf;
>> +	}
>> +	if (insn->vex_prefix.nbytes == 3) {
>> +		reg2 = VEX3_VVVV(insn) ^ 0xf;
>> +	}
>> +	/* TODO: add XOP, EXEV vvvv reading */
>> +

In fact there is room for improvement in this part:
it's better to ignore most-significant bit of this field
(32-bit mode strikes back).

>> +	reg2 = 3; /* BX */
>> +	if (can_use_regs & UPROBE_FIX_RIP_DI)
>> +		reg2 = 7;
>> +	if (can_use_regs & UPROBE_FIX_RIP_SI)
>> +		reg2 = 6;
> 
> It seems more natural to code this as:
> 	if (can_use_regs & UPROBE_FIX_RIP_SI)
> 		reg2 = 6;
> 	else if (can_use_regs & UPROBE_FIX_RIP_DI)
> 		reg2 = 7;
> 	else
> 		reg2 = 3; /* BX */
> ... which is pretty much how you do it in scratch_reg().

Okay.
I can also get rid of "can_use_regs" variable altogether then!
Sending new version in a jiffy.
-- 
vda



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] uprobes: fix scratch register selection for rip-relative fixups
  2014-05-02 15:04   ` Denys Vlasenko
@ 2014-05-05  6:00     ` Jim Keniston
  0 siblings, 0 replies; 5+ messages in thread
From: Jim Keniston @ 2014-05-05  6:00 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: linux-kernel, Masami Hiramatsu, Srikar Dronamraju, Ingo Molnar,
	Oleg Nesterov

On Fri, 2014-05-02 at 17:04 +0200, Denys Vlasenko wrote:
> On 05/02/2014 02:48 AM, Jim Keniston wrote:
> > On Thu, 2014-05-01 at 19:09 +0200, Denys Vlasenko wrote:
> >> +#define VEX2_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[1])
> >> +#define VEX3_VVVV(insn)		X86_VEX_V((insn)->vex_prefix.bytes[2])
> > 
> > I disclaim any knowledge about VEX* stuff.
> ...
> > skipped this next part...
> > 
> >> +	/* Similar treatment for VEX3 prefix */
> >> +	/* TODO: add XOP/EVEX treatment when insn decoder supports them */
> >> +	if (insn->vex_prefix.nbytes == 3) {
> >> +		/*
> >> +		 * vex2:     c5    rvvvvLpp   (has no b bit)
> >> +		 * vex3/xop: c4/8f rxbmmmmm wvvvvLpp
> >> +		 * evex:     62    rxbR00mm.wvvvv1pp.zllBVaaa
> >> +		 *   (evex will need setting of both b and x since
> >> +		 *   in non-sib encoding evex.x is 4th bit of MODRM.rm)
> >> +		 * Setting VEX3.b (setting because it has inverted meaning):
> >> +		 */
> >> +		cursor = auprobe->insn + insn_offset_vex_prefix(insn) + 1;
> >> +		*cursor |= 0x20;
> >>  	}
> > 
> > ... resumed reviewing here.
> 
> I do realize that most people reading this code won't be
> aficionados of memorizing x86 insn encoding quirks.
> 
> To help them, I added the comment which spells out bit layout
> of vex/evex/xop prefixes (see above). Does it help?
> 
...

Well, yes.  I read your comments in conjunction with the "VEX Prefix"
wikipedia page and figured out what you're doing, and it looks right.  I
would definitely leave the comments in; otherwise there's a chance that
even you won't remember all this stuff 6 months from now.

Jim


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-05-05  6:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-01 17:09 [PATCH] uprobes: fix scratch register selection for rip-relative fixups Denys Vlasenko
2014-05-01 19:11 ` Oleg Nesterov
2014-05-02  0:48 ` Jim Keniston
2014-05-02 15:04   ` Denys Vlasenko
2014-05-05  6:00     ` Jim Keniston

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox