public inbox for kvm@vger.kernel.org
* [PATCH v3 0/7] Streamline arithmetic instruction emulation
@ 2013-01-04 14:18 Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 1/7] KVM: x86 emulator: framework for streamlining arithmetic opcodes Avi Kivity
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

The current arithmetic instruction emulation is fairly clumsy: after
decode, each instruction gets a switch (size), and for every size
we fetch the operands, prepare flags, emulate the instruction, then store
back the flags and operands.

This patchset simplifies things by moving everything into common code
except the instruction itself.  All the pre- and post- processing is
coded just once.  The per-instruction code looks like:

  add %bl, %al
  ret

  add %bx, %ax
  ret

  add %ebx, %eax
  ret

  add %rbx, %rax
  ret
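
As a rough illustration of the split between shared code and per-size
stubs, here is a minimal sketch in portable C.  It is not the emulator
code: the names stub_fn, add_stubs and run_stub are hypothetical, and it
uses a function table for portability, whereas patch 1 computes the stub
address instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-size stubs: each one performs only the operation. */
static uint64_t add8(uint64_t dst, uint64_t src)  { return (uint8_t)(dst + src); }
static uint64_t add16(uint64_t dst, uint64_t src) { return (uint16_t)(dst + src); }
static uint64_t add32(uint64_t dst, uint64_t src) { return (uint32_t)(dst + src); }
static uint64_t add64(uint64_t dst, uint64_t src) { return dst + src; }

typedef uint64_t (*stub_fn)(uint64_t dst, uint64_t src);

/* One table per instruction, indexed by log2(operand size in bytes). */
static const stub_fn add_stubs[4] = { add8, add16, add32, add64 };

/* Shared code, written once: pick the stub for the operand size, run it. */
static uint64_t run_stub(const stub_fn *stubs, unsigned bytes,
                         uint64_t dst, uint64_t src)
{
    unsigned idx = 0;

    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
    while ((1u << idx) != bytes)
        idx++;
    return stubs[idx](dst, src);
}
```

Adding another instruction in this model means adding four one-line stubs
and a table; run_stub() stays untouched.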

The savings in size, for the ten instructions converted in this patchset,
are fairly large:

   text	   data	    bss	    dec	    hex	filename
  63724	      0	      0	  63724	   f8ec	arch/x86/kvm/emulate.o.before
  61268	      0	      0	  61268	   ef54	arch/x86/kvm/emulate.o.after

- around 2500 bytes (2456, to be exact).

v3: fix reversed operand order in 2-operand macro

v2: rebased

Avi Kivity (7):
  KVM: x86 emulator: framework for streamlining arithmetic opcodes
  KVM: x86 emulator: Support for declaring single operand fastops
  KVM: x86 emulator: introduce NoWrite flag
  KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite
  KVM: x86 emulator: convert NOT, NEG to fastop
  KVM: x86 emulator: add macros for defining 2-operand fastop emulation
  KVM: x86 emulator: convert basic ALU ops to fastop

 arch/x86/kvm/emulate.c | 215 +++++++++++++++++++++++++++----------------------
 1 file changed, 120 insertions(+), 95 deletions(-)

-- 
1.8.0.1



* [PATCH v3 1/7] KVM: x86 emulator: framework for streamlining arithmetic opcodes
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
@ 2013-01-04 14:18 ` Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 2/7] KVM: x86 emulator: Support for declaring single operand fastops Avi Kivity
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

We emulate arithmetic opcodes by executing a "similar" instruction (same
operation, different operands) on the cpu.  This ensures accurate emulation,
esp. wrt. eflags.  However, the prologue and epilogue around the opcode are
fairly long, consisting of a switch (for the operand size) and code to load
and save the operands.  This is repeated for every opcode.

This patch introduces an alternative way to emulate arithmetic opcodes.
Instead of the above, we have four (three on i386) functions consisting
of just the opcode and a ret; one for each operand size.  For example:

   .align 8
   em_notb:
	not %al
	ret

   .align 8
   em_notw:
	not %ax
	ret

   .align 8
   em_notl:
	not %eax
	ret

   .align 8
   em_notq:
	not %rax
	ret

The prologue and epilogue are shared across all opcodes.  Note the functions
use a special calling convention; notably eflags is an input/output parameter
and is not clobbered.  Rather than dispatching the four functions through a
jump table, each function is declared with a constant size (8 bytes), so the
address of the variant for a given operand size can be calculated.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
---
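The two calculations in fastop() can be sketched in portable C as follows.
This is only a model under stated assumptions: fastop_offset() and
merge_flags() are hypothetical names, and ARITH_FLAGS_MASK is assumed to
match EFLAGS_MASK in emulate.c, whose definition is not part of this diff.

```c
#include <assert.h>

#define FASTOP_SIZE 8UL

/* Arithmetic status flags, at their architectural EFLAGS bit positions. */
#define FLAG_CF 0x001UL
#define FLAG_PF 0x004UL
#define FLAG_AF 0x010UL
#define FLAG_ZF 0x040UL
#define FLAG_SF 0x080UL
#define FLAG_OF 0x800UL

/* Assumption: mirrors EFLAGS_MASK, which this patch does not show. */
#define ARITH_FLAGS_MASK \
    (FLAG_OF | FLAG_SF | FLAG_ZF | FLAG_AF | FLAG_PF | FLAG_CF)

/* Byte offset of the variant for an operand of 1, 2, 4 or 8 bytes. */
static unsigned long fastop_offset(unsigned long bytes)
{
    unsigned long shift = 0;

    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
    while ((bytes >> shift) != 1)
        shift++;                    /* shift ends up as log2(bytes) */
    return shift * FASTOP_SIZE;
}

/* Keep the guest's other flags; take the arithmetic ones from the result. */
static unsigned long merge_flags(unsigned long guest, unsigned long result)
{
    return (guest & ~ARITH_FLAGS_MASK) | (result & ARITH_FLAGS_MASK);
}
```

With four variants per opcode the offsets are 0, 8, 16 and 24, which is
why no table is needed.
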
 arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 53c5ad6..dd71567 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -149,6 +149,7 @@
 #define Aligned     ((u64)1 << 41)  /* Explicitly aligned (e.g. MOVDQA) */
 #define Unaligned   ((u64)1 << 42)  /* Explicitly unaligned (e.g. MOVDQU) */
 #define Avx         ((u64)1 << 43)  /* Advanced Vector Extensions */
+#define Fastop      ((u64)1 << 44)  /* Use opcode::u.fastop */
 
 #define X2(x...) x, x
 #define X3(x...) X2(x), x
@@ -159,6 +160,27 @@
 #define X8(x...) X4(x), X4(x)
 #define X16(x...) X8(x), X8(x)
 
+#define NR_FASTOP (ilog2(sizeof(ulong)) + 1)
+#define FASTOP_SIZE 8
+
+/*
+ * fastop functions have a special calling convention:
+ *
+ * dst:    [rdx]:rax  (in/out)
+ * src:    rbx        (in/out)
+ * src2:   rcx        (in)
+ * flags:  rflags     (in/out)
+ *
+ * Moreover, they are all exactly FASTOP_SIZE bytes long, so functions for
+ * different operand sizes can be reached by calculation, rather than a jump
+ * table (which would be bigger than the code).
+ *
+ * fastop functions are declared as taking a never-defined fastop parameter,
+ * so they can't be called from C directly.
+ */
+
+struct fastop;
+
 struct opcode {
 	u64 flags : 56;
 	u64 intercept : 8;
@@ -168,6 +190,7 @@ struct opcode {
 		const struct group_dual *gdual;
 		const struct gprefix *gprefix;
 		const struct escape *esc;
+		void (*fastop)(struct fastop *fake);
 	} u;
 	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
 };
@@ -3646,6 +3669,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define GD(_f, _g) { .flags = ((_f) | GroupDual | ModRM), .u.gdual = (_g) }
 #define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) }
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
+#define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
 #define II(_f, _e, _i) \
 	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
 #define IIP(_f, _e, _i, _p) \
@@ -4502,6 +4526,16 @@ static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
 		read_mmx_reg(ctxt, &op->mm_val, op->addr.mm);
 }
 
+static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
+{
+	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
+	fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
+	asm("push %[flags]; popf; call *%[fastop]; pushf; pop %[flags]\n"
+	    : "+a"(ctxt->dst.val), "+b"(ctxt->src.val), [flags]"+D"(flags)
+	: "c"(ctxt->src2.val), [fastop]"S"(fop));
+	ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
+	return X86EMUL_CONTINUE;
+}
 
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 {
@@ -4631,6 +4665,13 @@ special_insn:
 	}
 
 	if (ctxt->execute) {
+		if (ctxt->d & Fastop) {
+			void (*fop)(struct fastop *) = (void *)ctxt->execute;
+			rc = fastop(ctxt, fop);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+			goto writeback;
+		}
 		rc = ctxt->execute(ctxt);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
-- 
1.8.0.1



* [PATCH v3 2/7] KVM: x86 emulator: Support for declaring single operand fastops
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 1/7] KVM: x86 emulator: framework for streamlining arithmetic opcodes Avi Kivity
@ 2013-01-04 14:18 ` Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 3/7] KVM: x86 emulator: introduce NoWrite flag Avi Kivity
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
---
 arch/x86/kvm/emulate.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dd71567..42c53c8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -24,6 +24,7 @@
 #include "kvm_cache_regs.h"
 #include <linux/module.h>
 #include <asm/kvm_emulate.h>
+#include <linux/stringify.h>
 
 #include "x86.h"
 #include "tss.h"
@@ -439,6 +440,30 @@ static void invalidate_registers(struct x86_emulate_ctxt *ctxt)
 		}							\
 	} while (0)
 
+#define FOP_ALIGN ".align " __stringify(FASTOP_SIZE) " \n\t"
+#define FOP_RET   "ret \n\t"
+
+#define FOP_START(op) \
+	extern void em_##op(struct fastop *fake); \
+	asm(".pushsection .text, \"ax\" \n\t" \
+	    ".global em_" #op " \n\t" \
+            FOP_ALIGN \
+	    "em_" #op ": \n\t"
+
+#define FOP_END \
+	    ".popsection")
+
+#define FOP1E(op,  dst) \
+	FOP_ALIGN #op " %" #dst " \n\t" FOP_RET
+
+#define FASTOP1(op) \
+	FOP_START(op) \
+	FOP1E(op##b, al) \
+	FOP1E(op##w, ax) \
+	FOP1E(op##l, eax) \
+	ON64(FOP1E(op##q, rax))	\
+	FOP_END
+
 #define __emulate_1op_rax_rdx(ctxt, _op, _suffix, _ex)			\
 	do {								\
 		unsigned long _tmp;					\
-- 
1.8.0.1



* [PATCH v3 3/7] KVM: x86 emulator: introduce NoWrite flag
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 1/7] KVM: x86 emulator: framework for streamlining arithmetic opcodes Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 2/7] KVM: x86 emulator: Support for declaring single operand fastops Avi Kivity
@ 2013-01-04 14:18 ` Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 4/7] KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite Avi Kivity
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

Instead of disabling writeback via OP_NONE, just specify NoWrite.

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
---
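A minimal model of the effect, assuming simplified stand-ins for the
emulator types (struct ctxt, struct operand and run() below are
hypothetical; only the NoWrite bit value is taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define NoWrite ((uint64_t)1 << 45)     /* same bit as in the patch */

enum op_type { OP_NONE, OP_REG };

/* Hypothetical, heavily reduced versions of the emulator structures. */
struct operand {
    enum op_type type;
    uint64_t val;       /* computed result */
    uint64_t *reg;      /* where an OP_REG result is stored */
};

struct ctxt {
    uint64_t d;         /* decode flags of the current instruction */
    struct operand dst;
};

/* Returns 1 if the destination was written, 0 otherwise. */
static int writeback(struct ctxt *c)
{
    if (c->d & NoWrite)
        return 0;       /* CMP/TEST style: flags only, no store */
    if (c->dst.type == OP_REG) {
        *c->dst.reg = c->dst.val;
        return 1;
    }
    return 0;
}

/* Run writeback for a register destination and return the register. */
static uint64_t run(uint64_t flags, uint64_t reg_before, uint64_t result)
{
    uint64_t reg = reg_before;
    struct ctxt c = { .d = flags, .dst = { OP_REG, result, &reg } };

    assert(c.dst.reg != 0);
    writeback(&c);
    return reg;
}
```

The handler no longer has to modify dst.type; the decision is made from
the opcode table entry alone.
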
 arch/x86/kvm/emulate.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 42c53c8..fe113fb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -151,6 +151,7 @@
 #define Unaligned   ((u64)1 << 42)  /* Explicitly unaligned (e.g. MOVDQU) */
 #define Avx         ((u64)1 << 43)  /* Advanced Vector Extensions */
 #define Fastop      ((u64)1 << 44)  /* Use opcode::u.fastop */
+#define NoWrite     ((u64)1 << 45)  /* No writeback */
 
 #define X2(x...) x, x
 #define X3(x...) X2(x), x
@@ -1633,6 +1634,9 @@ static int writeback(struct x86_emulate_ctxt *ctxt)
 {
 	int rc;
 
+	if (ctxt->d & NoWrite)
+		return X86EMUL_CONTINUE;
+
 	switch (ctxt->dst.type) {
 	case OP_REG:
 		write_register_operand(&ctxt->dst);
-- 
1.8.0.1



* [PATCH v3 4/7] KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
                   ` (2 preceding siblings ...)
  2013-01-04 14:18 ` [PATCH v3 3/7] KVM: x86 emulator: introduce NoWrite flag Avi Kivity
@ 2013-01-04 14:18 ` Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 5/7] KVM: x86 emulator: convert NOT, NEG to fastop Avi Kivity
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
---
 arch/x86/kvm/emulate.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fe113fb..2af0c44 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3069,16 +3069,12 @@ static int em_xor(struct x86_emulate_ctxt *ctxt)
 static int em_cmp(struct x86_emulate_ctxt *ctxt)
 {
 	emulate_2op_SrcV(ctxt, "cmp");
-	/* Disable writeback. */
-	ctxt->dst.type = OP_NONE;
 	return X86EMUL_CONTINUE;
 }
 
 static int em_test(struct x86_emulate_ctxt *ctxt)
 {
 	emulate_2op_SrcV(ctxt, "test");
-	/* Disable writeback. */
-	ctxt->dst.type = OP_NONE;
 	return X86EMUL_CONTINUE;
 }
 
@@ -3747,7 +3743,7 @@ static const struct opcode group1[] = {
 	I(Lock | PageTable, em_and),
 	I(Lock, em_sub),
 	I(Lock, em_xor),
-	I(0, em_cmp),
+	I(NoWrite, em_cmp),
 };
 
 static const struct opcode group1A[] = {
@@ -3755,8 +3751,8 @@ static const struct opcode group1A[] = {
 };
 
 static const struct opcode group3[] = {
-	I(DstMem | SrcImm, em_test),
-	I(DstMem | SrcImm, em_test),
+	I(DstMem | SrcImm | NoWrite, em_test),
+	I(DstMem | SrcImm | NoWrite, em_test),
 	I(DstMem | SrcNone | Lock, em_not),
 	I(DstMem | SrcNone | Lock, em_neg),
 	I(SrcMem, em_mul_ex),
@@ -3920,7 +3916,7 @@ static const struct opcode opcode_table[256] = {
 	/* 0x30 - 0x37 */
 	I6ALU(Lock, em_xor), N, N,
 	/* 0x38 - 0x3F */
-	I6ALU(0, em_cmp), N, N,
+	I6ALU(NoWrite, em_cmp), N, N,
 	/* 0x40 - 0x4F */
 	X16(D(DstReg)),
 	/* 0x50 - 0x57 */
@@ -3946,7 +3942,7 @@ static const struct opcode opcode_table[256] = {
 	G(DstMem | SrcImm, group1),
 	G(ByteOp | DstMem | SrcImm | No64, group1),
 	G(DstMem | SrcImmByte, group1),
-	I2bv(DstMem | SrcReg | ModRM, em_test),
+	I2bv(DstMem | SrcReg | ModRM | NoWrite, em_test),
 	I2bv(DstMem | SrcReg | ModRM | Lock | PageTable, em_xchg),
 	/* 0x88 - 0x8F */
 	I2bv(DstMem | SrcReg | ModRM | Mov | PageTable, em_mov),
@@ -3966,12 +3962,12 @@ static const struct opcode opcode_table[256] = {
 	I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
 	I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
 	I2bv(SrcSI | DstDI | Mov | String, em_mov),
-	I2bv(SrcSI | DstDI | String, em_cmp),
+	I2bv(SrcSI | DstDI | String | NoWrite, em_cmp),
 	/* 0xA8 - 0xAF */
-	I2bv(DstAcc | SrcImm, em_test),
+	I2bv(DstAcc | SrcImm | NoWrite, em_test),
 	I2bv(SrcAcc | DstDI | Mov | String, em_mov),
 	I2bv(SrcSI | DstAcc | Mov | String, em_mov),
-	I2bv(SrcAcc | DstDI | String, em_cmp),
+	I2bv(SrcAcc | DstDI | String | NoWrite, em_cmp),
 	/* 0xB0 - 0xB7 */
 	X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
 	/* 0xB8 - 0xBF */
-- 
1.8.0.1



* [PATCH v3 5/7] KVM: x86 emulator: convert NOT, NEG to fastop
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
                   ` (3 preceding siblings ...)
  2013-01-04 14:18 ` [PATCH v3 4/7] KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite Avi Kivity
@ 2013-01-04 14:18 ` Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 6/7] KVM: x86 emulator: add macros for defining 2-operand fastop emulation Avi Kivity
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
---
 arch/x86/kvm/emulate.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2af0c44..09dbdc5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2050,17 +2050,8 @@ static int em_grp2(struct x86_emulate_ctxt *ctxt)
 	return X86EMUL_CONTINUE;
 }
 
-static int em_not(struct x86_emulate_ctxt *ctxt)
-{
-	ctxt->dst.val = ~ctxt->dst.val;
-	return X86EMUL_CONTINUE;
-}
-
-static int em_neg(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_1op(ctxt, "neg");
-	return X86EMUL_CONTINUE;
-}
+FASTOP1(not);
+FASTOP1(neg);
 
 static int em_mul_ex(struct x86_emulate_ctxt *ctxt)
 {
@@ -3753,8 +3744,8 @@ static const struct opcode group1A[] = {
 static const struct opcode group3[] = {
 	I(DstMem | SrcImm | NoWrite, em_test),
 	I(DstMem | SrcImm | NoWrite, em_test),
-	I(DstMem | SrcNone | Lock, em_not),
-	I(DstMem | SrcNone | Lock, em_neg),
+	F(DstMem | SrcNone | Lock, em_not),
+	F(DstMem | SrcNone | Lock, em_neg),
 	I(SrcMem, em_mul_ex),
 	I(SrcMem, em_imul_ex),
 	I(SrcMem, em_div_ex),
-- 
1.8.0.1



* [PATCH v3 6/7] KVM: x86 emulator: add macros for defining 2-operand fastop emulation
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
                   ` (4 preceding siblings ...)
  2013-01-04 14:18 ` [PATCH v3 5/7] KVM: x86 emulator: convert NOT, NEG to fastop Avi Kivity
@ 2013-01-04 14:18 ` Avi Kivity
  2013-01-04 14:18 ` [PATCH v3 7/7] KVM: x86 emulator: convert basic ALU ops to fastop Avi Kivity
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
---
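To see what FOP2E pastes together, the sketch below rebuilds the macro in
user space and inspects the resulting string.  STR() is a stand-in for the
kernel's __stringify(), and addb_text() and src_precedes_dst() are
hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x
#define STR(x) STR_(x)          /* stand-in for __stringify() */

#define FASTOP_SIZE 8
#define FOP_ALIGN ".align " STR(FASTOP_SIZE) " \n\t"
#define FOP_RET   "ret \n\t"

/* Same body as in the patch: AT&T syntax puts the source first. */
#define FOP2E(op, dst, src) \
    FOP_ALIGN #op " %" #src ", %" #dst " \n\t" FOP_RET

/* Assembly text generated for the byte-sized ADD variant. */
static const char *addb_text(void)
{
    return FOP2E(addb, al, bl);
}

/* Nonzero if src appears before dst in the generated text. */
static int src_precedes_dst(const char *text, const char *src,
                            const char *dst)
{
    const char *s, *d;

    assert(text && src && dst);
    s = strstr(text, src);
    d = strstr(text, dst);
    return s && d && s < d;
}
```

The macro takes (dst, src) but emits "src, dst", so the byte variant reads
"addb %bl, %al": rbx is the source and rax the destination, as in the
calling convention from patch 1.
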
 arch/x86/kvm/emulate.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 09dbdc5..3b5d4dd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -465,6 +465,17 @@ static void invalidate_registers(struct x86_emulate_ctxt *ctxt)
 	ON64(FOP1E(op##q, rax))	\
 	FOP_END
 
+#define FOP2E(op,  dst, src)	   \
+	FOP_ALIGN #op " %" #src ", %" #dst " \n\t" FOP_RET
+
+#define FASTOP2(op) \
+	FOP_START(op) \
+	FOP2E(op##b, al, bl) \
+	FOP2E(op##w, ax, bx) \
+	FOP2E(op##l, eax, ebx) \
+	ON64(FOP2E(op##q, rax, rbx)) \
+	FOP_END
+
 #define __emulate_1op_rax_rdx(ctxt, _op, _suffix, _ex)			\
 	do {								\
 		unsigned long _tmp;					\
@@ -3696,6 +3707,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define D2bv(_f)      D((_f) | ByteOp), D(_f)
 #define D2bvIP(_f, _i, _p) DIP((_f) | ByteOp, _i, _p), DIP(_f, _i, _p)
 #define I2bv(_f, _e)  I((_f) | ByteOp, _e), I(_f, _e)
+#define F2bv(_f, _e)  F((_f) | ByteOp, _e), F(_f, _e)
 #define I2bvIP(_f, _e, _i, _p) \
 	IIP((_f) | ByteOp, _e, _i, _p), IIP(_f, _e, _i, _p)
 
-- 
1.8.0.1



* [PATCH v3 7/7] KVM: x86 emulator: convert basic ALU ops to fastop
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
                   ` (5 preceding siblings ...)
  2013-01-04 14:18 ` [PATCH v3 6/7] KVM: x86 emulator: add macros for defining 2-operand fastop emulation Avi Kivity
@ 2013-01-04 14:18 ` Avi Kivity
  2013-01-08 11:38 ` [PATCH v3 0/7] Streamline arithmetic instruction emulation Gleb Natapov
  2013-01-10 17:33 ` Marcelo Tosatti
  8 siblings, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2013-01-04 14:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Gleb Natapov; +Cc: kvm

Opcodes:
	TEST
	CMP
	ADD
	ADC
	SUB
	SBB
	XOR
	OR
	AND

Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
---
 arch/x86/kvm/emulate.c | 112 +++++++++++++++----------------------------------
 1 file changed, 34 insertions(+), 78 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3b5d4dd..619a33d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3026,59 +3026,15 @@ static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)
 	return X86EMUL_CONTINUE;
 }
 
-static int em_add(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "add");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_or(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "or");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_adc(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "adc");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_sbb(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "sbb");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_and(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "and");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_sub(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "sub");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_xor(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "xor");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_cmp(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "cmp");
-	return X86EMUL_CONTINUE;
-}
-
-static int em_test(struct x86_emulate_ctxt *ctxt)
-{
-	emulate_2op_SrcV(ctxt, "test");
-	return X86EMUL_CONTINUE;
-}
+FASTOP2(add);
+FASTOP2(or);
+FASTOP2(adc);
+FASTOP2(sbb);
+FASTOP2(and);
+FASTOP2(sub);
+FASTOP2(xor);
+FASTOP2(cmp);
+FASTOP2(test);
 
 static int em_xchg(struct x86_emulate_ctxt *ctxt)
 {
@@ -3711,9 +3667,9 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define I2bvIP(_f, _e, _i, _p) \
 	IIP((_f) | ByteOp, _e, _i, _p), IIP(_f, _e, _i, _p)
 
-#define I6ALU(_f, _e) I2bv((_f) | DstMem | SrcReg | ModRM, _e),		\
-		I2bv(((_f) | DstReg | SrcMem | ModRM) & ~Lock, _e),	\
-		I2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
+#define F6ALU(_f, _e) F2bv((_f) | DstMem | SrcReg | ModRM, _e),		\
+		F2bv(((_f) | DstReg | SrcMem | ModRM) & ~Lock, _e),	\
+		F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
 
 static const struct opcode group7_rm1[] = {
 	DI(SrcNone | Priv, monitor),
@@ -3739,14 +3695,14 @@ static const struct opcode group7_rm7[] = {
 };
 
 static const struct opcode group1[] = {
-	I(Lock, em_add),
-	I(Lock | PageTable, em_or),
-	I(Lock, em_adc),
-	I(Lock, em_sbb),
-	I(Lock | PageTable, em_and),
-	I(Lock, em_sub),
-	I(Lock, em_xor),
-	I(NoWrite, em_cmp),
+	F(Lock, em_add),
+	F(Lock | PageTable, em_or),
+	F(Lock, em_adc),
+	F(Lock, em_sbb),
+	F(Lock | PageTable, em_and),
+	F(Lock, em_sub),
+	F(Lock, em_xor),
+	F(NoWrite, em_cmp),
 };
 
 static const struct opcode group1A[] = {
@@ -3754,8 +3710,8 @@ static const struct opcode group1A[] = {
 };
 
 static const struct opcode group3[] = {
-	I(DstMem | SrcImm | NoWrite, em_test),
-	I(DstMem | SrcImm | NoWrite, em_test),
+	F(DstMem | SrcImm | NoWrite, em_test),
+	F(DstMem | SrcImm | NoWrite, em_test),
 	F(DstMem | SrcNone | Lock, em_not),
 	F(DstMem | SrcNone | Lock, em_neg),
 	I(SrcMem, em_mul_ex),
@@ -3897,29 +3853,29 @@ static const struct escape escape_dd = { {
 
 static const struct opcode opcode_table[256] = {
 	/* 0x00 - 0x07 */
-	I6ALU(Lock, em_add),
+	F6ALU(Lock, em_add),
 	I(ImplicitOps | Stack | No64 | Src2ES, em_push_sreg),
 	I(ImplicitOps | Stack | No64 | Src2ES, em_pop_sreg),
 	/* 0x08 - 0x0F */
-	I6ALU(Lock | PageTable, em_or),
+	F6ALU(Lock | PageTable, em_or),
 	I(ImplicitOps | Stack | No64 | Src2CS, em_push_sreg),
 	N,
 	/* 0x10 - 0x17 */
-	I6ALU(Lock, em_adc),
+	F6ALU(Lock, em_adc),
 	I(ImplicitOps | Stack | No64 | Src2SS, em_push_sreg),
 	I(ImplicitOps | Stack | No64 | Src2SS, em_pop_sreg),
 	/* 0x18 - 0x1F */
-	I6ALU(Lock, em_sbb),
+	F6ALU(Lock, em_sbb),
 	I(ImplicitOps | Stack | No64 | Src2DS, em_push_sreg),
 	I(ImplicitOps | Stack | No64 | Src2DS, em_pop_sreg),
 	/* 0x20 - 0x27 */
-	I6ALU(Lock | PageTable, em_and), N, N,
+	F6ALU(Lock | PageTable, em_and), N, N,
 	/* 0x28 - 0x2F */
-	I6ALU(Lock, em_sub), N, I(ByteOp | DstAcc | No64, em_das),
+	F6ALU(Lock, em_sub), N, I(ByteOp | DstAcc | No64, em_das),
 	/* 0x30 - 0x37 */
-	I6ALU(Lock, em_xor), N, N,
+	F6ALU(Lock, em_xor), N, N,
 	/* 0x38 - 0x3F */
-	I6ALU(NoWrite, em_cmp), N, N,
+	F6ALU(NoWrite, em_cmp), N, N,
 	/* 0x40 - 0x4F */
 	X16(D(DstReg)),
 	/* 0x50 - 0x57 */
@@ -3945,7 +3901,7 @@ static const struct opcode opcode_table[256] = {
 	G(DstMem | SrcImm, group1),
 	G(ByteOp | DstMem | SrcImm | No64, group1),
 	G(DstMem | SrcImmByte, group1),
-	I2bv(DstMem | SrcReg | ModRM | NoWrite, em_test),
+	F2bv(DstMem | SrcReg | ModRM | NoWrite, em_test),
 	I2bv(DstMem | SrcReg | ModRM | Lock | PageTable, em_xchg),
 	/* 0x88 - 0x8F */
 	I2bv(DstMem | SrcReg | ModRM | Mov | PageTable, em_mov),
@@ -3965,12 +3921,12 @@ static const struct opcode opcode_table[256] = {
 	I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
 	I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
 	I2bv(SrcSI | DstDI | Mov | String, em_mov),
-	I2bv(SrcSI | DstDI | String | NoWrite, em_cmp),
+	F2bv(SrcSI | DstDI | String | NoWrite, em_cmp),
 	/* 0xA8 - 0xAF */
-	I2bv(DstAcc | SrcImm | NoWrite, em_test),
+	F2bv(DstAcc | SrcImm | NoWrite, em_test),
 	I2bv(SrcAcc | DstDI | Mov | String, em_mov),
 	I2bv(SrcSI | DstAcc | Mov | String, em_mov),
-	I2bv(SrcAcc | DstDI | String | NoWrite, em_cmp),
+	F2bv(SrcAcc | DstDI | String | NoWrite, em_cmp),
 	/* 0xB0 - 0xB7 */
 	X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
 	/* 0xB8 - 0xBF */
-- 
1.8.0.1



* Re: [PATCH v3 0/7] Streamline arithmetic instruction emulation
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
                   ` (6 preceding siblings ...)
  2013-01-04 14:18 ` [PATCH v3 7/7] KVM: x86 emulator: convert basic ALU ops to fastop Avi Kivity
@ 2013-01-08 11:38 ` Gleb Natapov
  2013-01-10 17:33 ` Marcelo Tosatti
  8 siblings, 0 replies; 10+ messages in thread
From: Gleb Natapov @ 2013-01-08 11:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On Fri, Jan 04, 2013 at 04:18:47PM +0200, Avi Kivity wrote:
> The current arithmetic instruction emulation is fairly clumsy: after
> decode, each instruction gets a switch (size), and for every size
> we fetch the operands, prepare flags, emulate the instruction, then store
> back the flags and operands.
> 
> This patchset simplifies things by moving everything into common code
> except the instruction itself.  All the pre- and post- processing is
> coded just once.  The per-instruction code looks like:
> 
>   add %bl, %al
>   ret
> 
>   add %bx, %ax
>   ret
> 
>   add %ebx, %eax
>   ret
> 
>   add %rbx, %rax
>   ret
> 
> The savings in size, for the ten instructions converted in this patchset,
> are fairly large:
> 
>    text	   data	    bss	    dec	    hex	filename
>   63724	      0	      0	  63724	   f8ec	arch/x86/kvm/emulate.o.before
>   61268	      0	      0	  61268	   ef54	arch/x86/kvm/emulate.o.after
> 
> - around 2500 bytes.
> 
> v3: fix reversed operand order in 2-operand macro
> 
> v2: rebased
> 
Acked-by: Gleb Natapov <gleb@redhat.com>

--
			Gleb.


* Re: [PATCH v3 0/7] Streamline arithmetic instruction emulation
  2013-01-04 14:18 [PATCH v3 0/7] Streamline arithmetic instruction emulation Avi Kivity
                   ` (7 preceding siblings ...)
  2013-01-08 11:38 ` [PATCH v3 0/7] Streamline arithmetic instruction emulation Gleb Natapov
@ 2013-01-10 17:33 ` Marcelo Tosatti
  8 siblings, 0 replies; 10+ messages in thread
From: Marcelo Tosatti @ 2013-01-10 17:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, kvm

On Fri, Jan 04, 2013 at 04:18:47PM +0200, Avi Kivity wrote:
> The current arithmetic instruction emulation is fairly clumsy: after
> decode, each instruction gets a switch (size), and for every size
> we fetch the operands, prepare flags, emulate the instruction, then store
> back the flags and operands.
> 
> This patchset simplifies things by moving everything into common code
> except the instruction itself.  All the pre- and post- processing is
> coded just once.  The per-instruction code looks like:
> 
>   add %bl, %al
>   ret
> 
>   add %bx, %ax
>   ret
> 
>   add %ebx, %eax
>   ret
> 
>   add %rbx, %rax
>   ret
> 
> The savings in size, for the ten instructions converted in this patchset,
> are fairly large:
> 
>    text	   data	    bss	    dec	    hex	filename
>   63724	      0	      0	  63724	   f8ec	arch/x86/kvm/emulate.o.before
>   61268	      0	      0	  61268	   ef54	arch/x86/kvm/emulate.o.after
> 
> - around 2500 bytes.
> 
> v3: fix reversed operand order in 2-operand macro
> 
> v2: rebased
> 
> Avi Kivity (7):
>   KVM: x86 emulator: framework for streamlining arithmetic opcodes
>   KVM: x86 emulator: Support for declaring single operand fastops
>   KVM: x86 emulator: introduce NoWrite flag
>   KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite
>   KVM: x86 emulator: convert NOT, NEG to fastop
>   KVM: x86 emulator: add macros for defining 2-operand fastop emulation
>   KVM: x86 emulator: convert basic ALU ops to fastop
> 
>  arch/x86/kvm/emulate.c | 215 +++++++++++++++++++++++++++----------------------
>  1 file changed, 120 insertions(+), 95 deletions(-)

Applied, thanks.


