[RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
@ 2014-04-26 17:29 Denys Vlasenko
  2014-04-26 17:29 ` [RFC PATCH] uprobes: simplify rip-relative handling Denys Vlasenko
  2014-04-27 11:29 ` [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Oleg Nesterov
  0 siblings, 2 replies; 5+ messages in thread
From: Denys Vlasenko @ 2014-04-26 17:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denys Vlasenko, Jim Keniston, Masami Hiramatsu, Oleg Nesterov

After adding these, it's clear we have some awkward choices there.
some valid instructions are prohibited from uprobing while
several invalid ones are allowed.

Hopefully future edits to the good-opcode tables will fix wrong bits
or explain why those bits are not wrong.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jim Keniston <jkenisto@us.ibm.com>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c | 153 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 134 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 29b152d..7021dbe 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -66,6 +66,49 @@
  * Good-instruction tables for 32-bit apps.  This is non-const and volatile
  * to keep gcc from statically optimizing it out, as variable_test_bit makes
  * some versions of gcc to think only *(unsigned long*) is used.
+ *
+ * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
+ * won't report *prefixes* as OPCODE1(insn).
+ * 0f - 2-byte opcode prefix
+ * 26,2e,36,3e - es:/cs:/ss:/ds:
+ * 64 - fs: (marked as "good", why?)
+ * 65 - gs: (marked as "good", why?)
+ * 66 - operand-size prefix
+ * 67 - address-size prefix
+ * f0 - lock prefix
+ * f2 - repnz    (marked as "good", why?)
+ * f3 - rep/repz (marked as "good", why?)
+ *
+ * Opcodes we'll probably never support:
+ * 6c-6f - ins,outs. SEGVs if used in userspace
+ * e4-e7 - in,out imm. SEGVs if used in userspace
+ * ec-ef - in,out acc. SEGVs if used in userspace
+ * cc - int3. SIGTRAP if used in userspace
+ * ce - into. Not used in userspace - no kernel support to make it useful. SEGVs
+ *	(why we support bound (62) then? it's similar, and similarly unused...)
+ * f1 - int1. SIGTRAP if used in userspace
+ * f4 - hlt. SEGVs if used in userspace
+ * fa - cli. SEGVs if used in userspace
+ * fb - sti. SEGVs if used in userspace
+ *
+ * Opcodes which need some work to be supported:
+ * 07,17,1f - pop es/ss/ds
+ *	Normally not used in userspace, but would execute if used.
+ *	Can cause GP or stack exception if tries to load wrong segment descriptor.
+ *	We hesitate to run them under single step since kernel's handling
+ *	of userspace single-stepping (TF flag) is fragile.
+ *	We can easily refuse to support push es/cs/ss/ds (06/0e/16/1e)
+ *	on the same grounds that they are never used.
+ * cd - int N.
+ *	Used by userspace for "int 80" syscall entry. (Other "int N"
+ *	cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
+ *	Not supported since kernel's handling of userspace single-stepping
+ *	(TF flag) is fragile.
+ * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
+ *
+ * Opcodes which can be enabled right away:
+ * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
+ * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
  */
 static volatile u32 good_insns_32[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
@@ -90,7 +133,48 @@ static volatile u32 good_insns_32[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 };
 
-/* Using this for both 64-bit and 32-bit apps */
+/* Using this for both 64-bit and 32-bit apps.
+ * Opcodes we don't support:
+ * 0f 00 - SLDT/STR/LLDT/LTR/VERR/VERW/-/- group. System insns
+ * 0f 01 - SGDT/SIDT/LGDT/LIDT/SMSW/-/LMSW/INVLPG group.
+ *	Also encodes tons of other system insns if mod=11.
+ *	Some are in fact non-system: xend, xtest, rdtscp, maybe more
+ * 0f 02 - lar (why? should be safe, it throws no exceptipons)
+ * 0f 03 - lsl (why? should be safe, it throws no exceptipons)
+ * 0f 04 - undefined
+ * 0f 05 - syscall
+ * 0f 06 - clts (CPL0 insn)
+ * 0f 07 - sysret
+ * 0f 08 - invd (CPL0 insn)
+ * 0f 09 - wbinvd (CPL0 insn)
+ * 0f 0a - undefined
+ * 0f 0b - ud1
+ * 0f 0c - undefined
+ * 0f 0d - prefetchFOO (amd prefetch insns)
+ * 0f 18 - prefetchBAR (intel prefetch insns)
+ * 0f 24 - mov from test regs (perhaps entire 20-27 area can be disabled (special reg ops))
+ * 0f 25 - undefined
+ * 0f 26 - mov to test regs
+ * 0f 27 - undefined
+ * 0f 30 - wrmsr (CPL0 insn)
+ * 0f 34 - sysenter
+ * 0f 35 - sysexit
+ * 0f 36 - undefined
+ * 0f 37 - getsec
+ * 0f 38-3f - 3-byte opcodes (why?? all look safe)
+ * 0f 78 - vmread
+ * 0f 79 - vmwrite
+ * 0f 7a - undefined
+ * 0f 7b - undefined
+ * 0f 7c - undefined
+ * 0f 7d - undefined
+ * 0f a6 - undefined
+ * 0f a7 - undefined
+ * 0f b8 - popcnt (why?? it's an ordinary ALU op)
+ * 0f d0 - undefined
+ * 0f f0 - lddqu (why?? it's an ordinary vector load op)
+ * 0f ff - undefined
+ */
 static volatile u32 good_2byte_insns[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 	/*      ----------------------------------------------         */
@@ -115,7 +199,55 @@ static volatile u32 good_2byte_insns[256 / 32] = {
 };
 
 #ifdef CONFIG_X86_64
-/* Good-instruction tables for 64-bit apps */
+/* Good-instruction tables for 64-bit apps.
+ *
+ * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
+ * won't report *prefixes* as OPCODE1(insn).
+ * 0f - 2-byte opcode prefix
+ * 26,2e,36,3e - es:/cs:/ss:/ds:
+ * 40-4f - rex prefixes
+ * 64 - fs: (marked as "good", why?)
+ * 65 - gs: (marked as "good", why?)
+ * 66 - operand-size prefix
+ * 67 - address-size prefix
+ * f0 - lock prefix
+ * f2 - repnz    (marked as "good", why?)
+ * f3 - rep/repz (marked as "good", why?)
+ *
+ * Genuinely invalid opcodes:
+ * 06,07 - formerly push/pop es
+ * 0e - formerly push cs
+ * 16,17 - formerly push/pop ss
+ * 1e,1f - formerly push/pop ds
+ * 27,2f,37,3f - formerly daa/das/aaa/aas
+ * 60,61 - formerly pusha/popa
+ * 62 - formerly bound. EVEX prefix for AVX512
+ * 82 - formerly redundant encoding of Group1
+ * 9a - formerly call seg:ofs (marked as "supported"???)
+ * c4,c5 - formerly les/lds. VEX prefixes for AVX
+ * ce - formerly into
+ * d4,d4 - formerly aam/aad
+ * d6 - formerly undocumented salc
+ * ea - formerly jmp seg:ofs (marked as "supported"???)
+ *
+ * Opcodes we'll probably never support:
+ * 6c-6f - ins,outs. SEGVs if used in userspace
+ * e4-e7 - in,out imm. SEGVs if used in userspace
+ * ec-ef - in,out acc. SEGVs if used in userspace
+ * cc - int3. SIGTRAP if used in userspace
+ * f1 - int1. SIGTRAP if used in userspace
+ * f4 - hlt. SEGVs if used in userspace
+ * fa - cli. SEGVs if used in userspace
+ * fb - sti. SEGVs if used in userspace
+ *
+ * Opcodes which need some work to be supported:
+ * cd - int N.
+ *	Used by userspace for "int 80" syscall entry. (Other "int N"
+ *	cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
+ *	Not supported since kernel's handling of userspace single-stepping
+ *	(TF flag) is fragile.
+ * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
+ */
 static volatile u32 good_insns_64[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 	/*      ----------------------------------------------         */
@@ -142,23 +274,6 @@ static volatile u32 good_insns_64[256 / 32] = {
 #undef W
 
 /*
- * opcodes we'll probably never support:
- *
- *  6c-6d, e4-e5, ec-ed - in
- *  6e-6f, e6-e7, ee-ef - out
- *  cc, cd - int3, int
- *  cf - iret
- *  d6 - illegal instruction
- *  f1 - int1/icebp
- *  f4 - hlt
- *  fa, fb - cli, sti
- *  0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2
- *
- * invalid opcodes in 64-bit mode:
- *
- *  06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5
- *  63 - we support this opcode in x86_64 but not in i386.
- *
  * opcodes we may need to refine support for:
  *
  *  0f - 2-byte instructions: For many of these instructions, the validity
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH] uprobes: simplify rip-relative handling
  2014-04-26 17:29 [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Denys Vlasenko
@ 2014-04-26 17:29 ` Denys Vlasenko
  2014-04-27 11:30   ` Oleg Nesterov
  2014-04-27 11:29 ` [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Oleg Nesterov
  1 sibling, 1 reply; 5+ messages in thread
From: Denys Vlasenko @ 2014-04-26 17:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denys Vlasenko, Jim Keniston, Masami Hiramatsu, Oleg Nesterov

It is possible to replace rip-relative addressing mode
with addressing mode of the same length: [reg+disp32].
This eliminates the need to fix up immediate.

Only

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jim Keniston <jkenisto@us.ibm.com>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/uprobes.h |  2 +-
 arch/x86/kernel/uprobes.c      | 30 +++++++++++++-----------------
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 93bee7b..9197119 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -46,7 +46,7 @@ struct arch_uprobe {
 
 	union {
 #ifdef CONFIG_X86_64
-		unsigned long			rip_rela_target_address;
+		int				insn_length;
 #endif
 		struct {
 			s32	offs;
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 7021dbe..b461dda 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -349,7 +349,7 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
  * If arch_uprobe->insn doesn't use rip-relative addressing, return
  * immediately.  Otherwise, rewrite the instruction so that it accesses
  * its memory operand indirectly through a scratch register.  Set
- * arch_uprobe->fixups and arch_uprobe->rip_rela_target_address
+ * arch_uprobe->fixups and arch_uprobe->insn_length
  * accordingly.  (The contents of the scratch register will be saved
  * before we single-step the modified instruction, and restored
  * afterward.)
@@ -395,9 +395,12 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
 	insn_get_length(insn);
 
 	/*
-	 * Convert from rip-relative addressing to indirect addressing
+	 * Convert from rip-relative addressing to register-relative addressing
 	 * via a scratch register.  Change the r/m field from 0x5 (%rip)
-	 * to 0x0 (%rax) or 0x1 (%rcx), and squeeze out the offset field.
+	 * to 0x0 (%rax) or 0x1 (%rcx), change mode field
+	 * from 00 to 10 (reg+disp32). Example:
+	 * 89 05 disp32  mov %eax,disp32(%rip) becomes
+	 * 89 81 disp32  mov %eax,disp32(%rcx)
 	 */
 	reg = MODRM_REG(insn);
 	if (reg == 0) {
@@ -409,23 +412,16 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
 		 * #1) for the scratch register.
 		 */
 		auprobe->fixups = UPROBE_FIX_RIP_CX;
-		/* Change modrm from 00 000 101 to 00 000 001. */
-		*cursor = 0x1;
+		/* Change modrm from 00 000 101 to 10 000 001. */
+		*cursor = 0x81;
 	} else {
 		/* Use %rax (register #0) for the scratch register. */
 		auprobe->fixups = UPROBE_FIX_RIP_AX;
-		/* Change modrm from 00 xxx 101 to 00 xxx 000 */
-		*cursor = (reg << 3);
+		/* Change modrm from 00 xxx 101 to 10 xxx 000 */
+		*cursor = (reg << 3) | 0x80;
 	}
 
-	/* Target address = address of next instruction + (signed) offset */
-	auprobe->rip_rela_target_address = (long)insn->length + insn->displacement.value;
-
-	/* Displacement field is gone; slide immediate field (if any) over. */
-	if (insn->immediate.nbytes) {
-		cursor++;
-		memmove(cursor, cursor + insn->displacement.nbytes, insn->immediate.nbytes);
-	}
+	auprobe->insn_length = insn->length;
 }
 
 /*
@@ -439,11 +435,11 @@ pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
 	if (auprobe->fixups & UPROBE_FIX_RIP_AX) {
 		autask->saved_scratch_register = regs->ax;
 		regs->ax = current->utask->vaddr;
-		regs->ax += auprobe->rip_rela_target_address;
+		regs->ax += auprobe->insn_length;
 	} else if (auprobe->fixups & UPROBE_FIX_RIP_CX) {
 		autask->saved_scratch_register = regs->cx;
 		regs->cx = current->utask->vaddr;
-		regs->cx += auprobe->rip_rela_target_address;
+		regs->cx += auprobe->insn_length;
 	}
 }
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
  2014-04-26 17:29 [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Denys Vlasenko
  2014-04-26 17:29 ` [RFC PATCH] uprobes: simplify rip-relative handling Denys Vlasenko
@ 2014-04-27 11:29 ` Oleg Nesterov
  1 sibling, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2014-04-27 11:29 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-kernel, Jim Keniston, Masami Hiramatsu

On 04/26, Denys Vlasenko wrote:
>
> After adding these, it's clear we have some awkward choices there.
> some valid instructions are prohibited from uprobing while
> several invalid ones are allowed.

Denys, both patches conflict with the pending changes in uprobe.c.

Could you resend after the next PULL request?

Oleg.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] uprobes: simplify rip-relative handling
  2014-04-26 17:29 ` [RFC PATCH] uprobes: simplify rip-relative handling Denys Vlasenko
@ 2014-04-27 11:30   ` Oleg Nesterov
  2014-04-27 13:13     ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2014-04-27 11:30 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-kernel, Jim Keniston, Masami Hiramatsu

On 04/26, Denys Vlasenko wrote:
>
> @@ -46,7 +46,7 @@ struct arch_uprobe {
>  
>  	union {
>  #ifdef CONFIG_X86_64
> -		unsigned long			rip_rela_target_address;
> +		int				insn_length;
>  #endif

in particular, we already have auprobe->def.ilen this patch can use.

Oleg.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] uprobes: simplify rip-relative handling
  2014-04-27 11:30   ` Oleg Nesterov
@ 2014-04-27 13:13     ` Oleg Nesterov
  0 siblings, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2014-04-27 13:13 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-kernel, Jim Keniston, Masami Hiramatsu

On 04/27, Oleg Nesterov wrote:
>
> On 04/26, Denys Vlasenko wrote:
> >
> > @@ -46,7 +46,7 @@ struct arch_uprobe {
> >
> >  	union {
> >  #ifdef CONFIG_X86_64
> > -		unsigned long			rip_rela_target_address;
> > +		int				insn_length;
> >  #endif
>
> in particular, we already have auprobe->def.ilen this patch can use.

And I forgot to mention...

I can easily misunderstand the change in handle_riprel_insn(), but it
seems that since you removed memmove(cursor, ...) the changed insn has
the same lenghth as the original insn?

If yes,

	- the patch is wrong, it breaks the "correction" logic

	- but, at the same time, we can simplify this logic and
	  just kill the "long *correction" arg of riprel_post_xol.

Oleg.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-04-27 13:13 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-26 17:29 [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Denys Vlasenko
2014-04-26 17:29 ` [RFC PATCH] uprobes: simplify rip-relative handling Denys Vlasenko
2014-04-27 11:30   ` Oleg Nesterov
2014-04-27 13:13     ` Oleg Nesterov
2014-04-27 11:29 ` [RFC PATCH] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox