kvm.vger.kernel.org archive mirror
* [RFC PATCH 0/5] KVM: speed up invalid guest state emulation
@ 2014-03-27 11:30 Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

This series identifies some low-hanging fruit for speeding up emulation
of invalid guest state.  I am a bit worried about patch 2, but
everything else should be relatively safe.

On the kvm-unit-tests microbenchmarks I get a 1.8-2.5x speedup (from 
740-1100 cycles/instruction to 280-600).

This is not for 3.15, of course.  Even patch 5 is not too useful without
the others; saving 40 clock cycles is a 10% speedup after the previous
patches, but only 4% before.

Please review carefully!

Paolo

Paolo Bonzini (5):
  KVM: vmx: speed up emulation of invalid guest state
  KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  KVM: x86: move around some checks
  KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"
  KVM: x86: speed up emulated moves

 arch/x86/include/asm/kvm_emulate.h |   2 +-
 arch/x86/kvm/emulate.c             | 182 ++++++++++++++++++++-----------------
 arch/x86/kvm/vmx.c                 |   5 +-
 arch/x86/kvm/x86.c                 |  28 ++++--
 4 files changed, 120 insertions(+), 97 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-04-16 22:52   ` Marcelo Tosatti
  2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction.  However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).

In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment).  So there is no need to
call guest_state_valid in the emulation loop.
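
For reference, the helper whose result is being cached is just a cheap
wrapper around guest_state_valid (quoted here as a sketch of what sits
in vmx.c):

static bool emulation_required(struct kvm_vcpu *vcpu)
{
	return emulate_invalid_guest_state && !guest_state_valid(vcpu);
}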

Emulation performance test results indicate 530-870 cycles
for common instructions, versus 740-1110 before this patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1320e0f8e611..73aa522db47b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3629,7 +3629,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
 
 out:
-	vmx->emulation_required |= emulation_required(vcpu);
+	vmx->emulation_required = emulation_required(vcpu);
 }
 
 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -5580,7 +5580,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 	cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
 
-	while (!guest_state_valid(vcpu) && count-- != 0) {
+	while (vmx->emulation_required && count-- != 0) {
 		if (intr_window_requested && vmx_interrupt_allowed(vcpu))
 			return handle_interrupt_window(&vmx->vcpu);
 
@@ -5614,7 +5614,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 			schedule();
 	}
 
-	vmx->emulation_required = emulation_required(vcpu);
 out:
 	return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-03-28 12:44   ` Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 3/5] KVM: x86: move around some checks Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

Despite the provisions to emulate up to 130 consecutive instructions, in
practice KVM will emulate just one before exiting handle_invalid_guest_state,
because x86_emulate_instruction always sets KVM_REQ_EVENT.

However, we only need to do this if an interrupt could be injected,
which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
away; b) if the interrupt flag has just been set (because instructions
other than STI can set it without enabling an interrupt shadow).
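
In sketch form (old_int_shadow, new_int_shadow, old_eflags and new_eflags
are only illustrative names; the real hunks are in the patch below):

	/* a) an interrupt shadow bit was dropped by the emulated insn */
	if (old_int_shadow && !(old_int_shadow & new_int_shadow))
		kvm_make_request(KVM_REQ_EVENT, vcpu);

	/* b) IF went from 0 to 1 across the emulated instruction */
	if ((new_eflags & ~old_eflags) & X86_EFLAGS_IF)
		kvm_make_request(KVM_REQ_EVENT, vcpu);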

This cuts another 250-300 clock cycles from the cost of emulating an
instruction (530-870 cycles before the patch on kvm-unit-tests,
290-600 afterwards).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd31aada351b..ce9523345f2e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 
 struct kvm_x86_ops *kvm_x86_ops;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 	 * means that the last instruction is an sti. We should not
 	 * leave the flag on in this case. The same goes for mov ss
 	 */
-	if (!(int_shadow & mask))
+	if (unlikely(int_shadow) && !(int_shadow & mask)) {
 		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+	}
 }
 
 static void inject_emulated_exception(struct kvm_vcpu *vcpu)
@@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
 	return dr6;
 }
 
-static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
+static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
 {
 	struct kvm_run *kvm_run = vcpu->run;
 
 	/*
-	 * Use the "raw" value to see if TF was passed to the processor.
-	 * Note that the new value of the flags has not been saved yet.
+	 * rflags is the old, "raw" value of the flags.  The new value has
+	 * not been saved yet.
 	 *
 	 * This is correct even for TF set by the guest, because "the
 	 * processor will not generate this exception after the instruction
 	 * that sets the TF flag".
 	 */
-	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
-
 	if (unlikely(rflags & X86_EFLAGS_TF)) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
 			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
@@ -5263,13 +5264,15 @@ restart:
 		r = EMULATE_DONE;
 
 	if (writeback) {
+		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
 		kvm_rip_write(vcpu, ctxt->eip);
 		if (r == EMULATE_DONE)
-			kvm_vcpu_check_singlestep(vcpu, &r);
-		kvm_set_rflags(vcpu, ctxt->eflags);
+			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
+		__kvm_set_rflags(vcpu, ctxt->eflags);
+		if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
 	} else
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
 
@@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_rflags);
 
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
 	    kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
 		rflags |= X86_EFLAGS_TF;
 	kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+	__kvm_set_rflags(vcpu, rflags);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 3/5] KVM: x86: move around some checks
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 5/5] KVM: x86: speed up emulated moves Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

The only purpose of this patch is to make the next patch simpler
to review.  No semantic change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c9f8f61df46c..9e40fbf94dcd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4347,12 +4347,15 @@ done_prefixes:
 		ctxt->d |= opcode.flags;
 	}
 
+	/* Unrecognised? */
+	if (ctxt->d == 0)
+		return EMULATION_FAILED;
+
 	ctxt->execute = opcode.u.execute;
 	ctxt->check_perm = opcode.check_perm;
 	ctxt->intercept = opcode.intercept;
 
-	/* Unrecognised? */
-	if (ctxt->d == 0 || (ctxt->d & NotImpl))
+	if (ctxt->d & NotImpl)
 		return EMULATION_FAILED;
 
 	if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
@@ -4494,19 +4497,19 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 	ctxt->mem_read.pos = 0;
 
-	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-			(ctxt->d & Undefined)) {
+	/* LOCK prefix is allowed only with some instructions */
+	if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
 
-	/* LOCK prefix is allowed only with some instructions */
-	if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
+	if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
 
-	if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
+	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+			(ctxt->d & Undefined)) {
 		rc = emulate_ud(ctxt);
 		goto done;
 	}
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
                   ` (2 preceding siblings ...)
  2014-03-27 11:30 ` [RFC PATCH 3/5] KVM: x86: move around some checks Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  2014-03-27 11:30 ` [RFC PATCH 5/5] KVM: x86: speed up emulated moves Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

There are several checks for "peculiar" aspects of instructions in both
x86_decode_insn and x86_emulate_insn.  Group them together, and guard
them with a single "if" that lets the processor quickly skip them all.
To do this effectively, add two more flag bits that say whether the
.intercept and .check_perm fields are valid.

This shaves about 10 cycles off each emulated instruction, which is
a 2 to 6% improvement.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	To review, use -b.

 arch/x86/kvm/emulate.c | 175 ++++++++++++++++++++++++++-----------------------
 1 file changed, 94 insertions(+), 81 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9e40fbf94dcd..94974055d906 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -161,6 +161,8 @@
 #define Fastop      ((u64)1 << 44)  /* Use opcode::u.fastop */
 #define NoWrite     ((u64)1 << 45)  /* No writeback */
 #define SrcWrite    ((u64)1 << 46)  /* Write back src operand */
+#define Intercept   ((u64)1 << 47)  /* Has valid intercept field */
+#define CheckPerm   ((u64)1 << 48)  /* Has valid check_perm field */
 
 #define DstXacc     (DstAccLo | SrcAccHi | SrcWrite)
 
@@ -3514,9 +3516,9 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 }
 
 #define D(_y) { .flags = (_y) }
-#define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
-#define DIP(_y, _i, _p) { .flags = (_y), .intercept = x86_intercept_##_i, \
-		      .check_perm = (_p) }
+#define DI(_y, _i) { .flags = (_y)|Intercept, .intercept = x86_intercept_##_i }
+#define DIP(_y, _i, _p) { .flags = (_y)|Intercept|CheckPerm, \
+		      .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define N    D(NotImpl)
 #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
@@ -3525,10 +3527,10 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
 #define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
 #define II(_f, _e, _i) \
-	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
+	{ .flags = (_f)|Intercept, .u.execute = (_e), .intercept = x86_intercept_##_i }
 #define IIP(_f, _e, _i, _p) \
-	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i, \
-	  .check_perm = (_p) }
+	{ .flags = (_f)|Intercept|CheckPerm, .u.execute = (_e), \
+	  .intercept = x86_intercept_##_i, .check_perm = (_p) }
 #define GP(_f, _g) { .flags = ((_f) | Prefix), .u.gprefix = (_g) }
 
 #define D2bv(_f)      D((_f) | ByteOp), D(_f)
@@ -4352,29 +4354,37 @@ done_prefixes:
 		return EMULATION_FAILED;
 
 	ctxt->execute = opcode.u.execute;
-	ctxt->check_perm = opcode.check_perm;
-	ctxt->intercept = opcode.intercept;
 
-	if (ctxt->d & NotImpl)
-		return EMULATION_FAILED;
+	if (unlikely(ctxt->d &
+		     (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
+		/*
+		 * These are copied unconditionally here, and checked unconditionally
+		 * in x86_emulate_insn.
+		 */
+		ctxt->check_perm = opcode.check_perm;
+		ctxt->intercept = opcode.intercept;
 
-	if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
-		return EMULATION_FAILED;
+		if (ctxt->d & NotImpl)
+			return EMULATION_FAILED;
 
-	if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
-		ctxt->op_bytes = 8;
+		if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
+			return EMULATION_FAILED;
 
-	if (ctxt->d & Op3264) {
-		if (mode == X86EMUL_MODE_PROT64)
+		if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
 			ctxt->op_bytes = 8;
-		else
-			ctxt->op_bytes = 4;
-	}
 
-	if (ctxt->d & Sse)
-		ctxt->op_bytes = 16;
-	else if (ctxt->d & Mmx)
-		ctxt->op_bytes = 8;
+		if (ctxt->d & Op3264) {
+			if (mode == X86EMUL_MODE_PROT64)
+				ctxt->op_bytes = 8;
+			else
+				ctxt->op_bytes = 4;
+		}
+
+		if (ctxt->d & Sse)
+			ctxt->op_bytes = 16;
+		else if (ctxt->d & Mmx)
+			ctxt->op_bytes = 8;
+	}
 
 	/* ModRM and SIB bytes. */
 	if (ctxt->d & ModRM) {
@@ -4508,75 +4518,78 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		goto done;
 	}
 
-	if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
-			(ctxt->d & Undefined)) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
-
-	if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
-	    || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
-
-	if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
-		rc = emulate_nm(ctxt);
-		goto done;
-	}
+	if (unlikely(ctxt->d &
+		     (No64|Undefined|Sse|Mmx|Intercept|CheckPerm|Priv|Prot|String))) {
+		if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+				(ctxt->d & Undefined)) {
+			rc = emulate_ud(ctxt);
+			goto done;
+		}
 
-	if (ctxt->d & Mmx) {
-		rc = flush_pending_x87_faults(ctxt);
-		if (rc != X86EMUL_CONTINUE)
+		if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
+		    || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
+			rc = emulate_ud(ctxt);
 			goto done;
-		/*
-		 * Now that we know the fpu is exception safe, we can fetch
-		 * operands from it.
-		 */
-		fetch_possible_mmx_operand(ctxt, &ctxt->src);
-		fetch_possible_mmx_operand(ctxt, &ctxt->src2);
-		if (!(ctxt->d & Mov))
-			fetch_possible_mmx_operand(ctxt, &ctxt->dst);
-	}
+		}
 
-	if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
-		rc = emulator_check_intercept(ctxt, ctxt->intercept,
-					      X86_ICPT_PRE_EXCEPT);
-		if (rc != X86EMUL_CONTINUE)
+		if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
+			rc = emulate_nm(ctxt);
 			goto done;
-	}
+		}
 
-	/* Privileged instruction can be executed only in CPL=0 */
-	if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
-		rc = emulate_gp(ctxt, 0);
-		goto done;
-	}
+		if (ctxt->d & Mmx) {
+			rc = flush_pending_x87_faults(ctxt);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+			/*
+			 * Now that we know the fpu is exception safe, we can fetch
+			 * operands from it.
+			 */
+			fetch_possible_mmx_operand(ctxt, &ctxt->src);
+			fetch_possible_mmx_operand(ctxt, &ctxt->src2);
+			if (!(ctxt->d & Mov))
+				fetch_possible_mmx_operand(ctxt, &ctxt->dst);
+		}
 
-	/* Instruction can only be executed in protected mode */
-	if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
-		rc = emulate_ud(ctxt);
-		goto done;
-	}
+		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+			rc = emulator_check_intercept(ctxt, ctxt->intercept,
+						      X86_ICPT_PRE_EXCEPT);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
 
-	/* Do instruction specific permission checks */
-	if (ctxt->check_perm) {
-		rc = ctxt->check_perm(ctxt);
-		if (rc != X86EMUL_CONTINUE)
+		/* Privileged instruction can be executed only in CPL=0 */
+		if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
+			rc = emulate_gp(ctxt, 0);
 			goto done;
-	}
+		}
 
-	if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
-		rc = emulator_check_intercept(ctxt, ctxt->intercept,
-					      X86_ICPT_POST_EXCEPT);
-		if (rc != X86EMUL_CONTINUE)
+		/* Instruction can only be executed in protected mode */
+		if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
+			rc = emulate_ud(ctxt);
 			goto done;
-	}
+		}
 
-	if (ctxt->rep_prefix && (ctxt->d & String)) {
-		/* All REP prefixes have the same first termination condition */
-		if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
-			ctxt->eip = ctxt->_eip;
-			goto done;
+		/* Do instruction specific permission checks */
+		if (ctxt->check_perm) {
+			rc = ctxt->check_perm(ctxt);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
+
+		if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+			rc = emulator_check_intercept(ctxt, ctxt->intercept,
+						      X86_ICPT_POST_EXCEPT);
+			if (rc != X86EMUL_CONTINUE)
+				goto done;
+		}
+
+		if (ctxt->rep_prefix && (ctxt->d & String)) {
+			/* All REP prefixes have the same first termination condition */
+			if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
+				ctxt->eip = ctxt->_eip;
+				goto done;
+			}
 		}
 	}
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH 5/5] KVM: x86: speed up emulated moves
  2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
                   ` (3 preceding siblings ...)
  2014-03-27 11:30 ` [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.

Avoiding a call to memcpy (with a constant length, the compiler optimizes
the call away) gains about 50 cycles on kvm-unit-tests for
register-to-register moves, and makes them about as fast as arithmetic
instructions.
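
To see why the constant length matters, here is a standalone sketch (not
the patch code; op_sketch is a stand-in for struct operand and its
16-byte valptr):

	#include <string.h>

	struct op_sketch {
		char valptr[16];	/* stands in for valptr[sizeof(sse128_t)] */
	};

	/* Constant length: the compiler expands the copy inline, typically
	 * as two 8-byte (or one 16-byte) moves, with no call to memcpy. */
	static inline void mov_fixed(struct op_sketch *d, const struct op_sketch *s)
	{
		memcpy(d->valptr, s->valptr, sizeof(s->valptr));
	}

	/* Variable length: usually a real memcpy call or an expanded loop. */
	static inline void mov_var(struct op_sketch *d, const struct op_sketch *s, size_t n)
	{
		memcpy(d->valptr, s->valptr, n);
	}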

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 2 +-
 arch/x86/kvm/emulate.c             | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 24ec1216596e..f7b1e45eb753 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -232,7 +232,7 @@ struct operand {
 	union {
 		unsigned long val;
 		u64 val64;
-		char valptr[sizeof(unsigned long) + 2];
+		char valptr[sizeof(sse128_t)];
 		sse128_t vec_val;
 		u64 mm_val;
 		void *data;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 94974055d906..4a3584d419e5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2955,7 +2955,7 @@ static int em_rdpmc(struct x86_emulate_ctxt *ctxt)
 
 static int em_mov(struct x86_emulate_ctxt *ctxt)
 {
-	memcpy(ctxt->dst.valptr, ctxt->src.valptr, ctxt->op_bytes);
+	memcpy(ctxt->dst.valptr, ctxt->src.valptr, sizeof(ctxt->src.valptr));
 	return X86EMUL_CONTINUE;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
  2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
@ 2014-03-28 12:44   ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-28 12:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm

On 27/03/2014 12:30, Paolo Bonzini wrote:
> Despite the provisions to emulate up to 130 consecutive instructions, in
> practice KVM will emulate just one before exiting handle_invalid_guest_state,
> because x86_emulate_instruction always sets KVM_REQ_EVENT.
> 
> However, we only need to do this if an interrupt could be injected,
> which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
> away; b) if the interrupt flag has just been set (because instructions
> other than STI can set it without enabling an interrupt shadow).
> 
> This cuts another 250-300 clock cycles from the cost of emulating an
> instruction (530-870 cycles before the patch on kvm-unit-tests,
> 290-600 afterwards).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd31aada351b..ce9523345f2e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>  
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu);
>  static void process_nmi(struct kvm_vcpu *vcpu);
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>  
>  struct kvm_x86_ops *kvm_x86_ops;
>  EXPORT_SYMBOL_GPL(kvm_x86_ops);
> @@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>  	 * means that the last instruction is an sti. We should not
>  	 * leave the flag on in this case. The same goes for mov ss
>  	 */
> -	if (!(int_shadow & mask))
> +	if (unlikely(int_shadow) && !(int_shadow & mask)) {
>  		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +	}

Better:

 	 * means that the last instruction is an sti. We should not
 	 * leave the flag on in this case. The same goes for mov ss
 	 */
-	if (!(int_shadow & mask))
+	mask &= ~int_shadow;
+	if (unlikely(mask != int_shadow))
 		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+
+	/*
+	 * The interrupt window might have opened if a bit has been cleared.
+	 */
+	if (unlikely(int_shadow & ~mask))
+		kvm_make_request(KVM_REQ_EVENT, vcpu);

Paolo

>  }
>  
>  static void inject_emulated_exception(struct kvm_vcpu *vcpu)
> @@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
>  	return dr6;
>  }
>  
> -static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
> +static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
>  {
>  	struct kvm_run *kvm_run = vcpu->run;
>  
>  	/*
> -	 * Use the "raw" value to see if TF was passed to the processor.
> -	 * Note that the new value of the flags has not been saved yet.
> +	 * rflags is the old, "raw" value of the flags.  The new value has
> +	 * not been saved yet.
>  	 *
>  	 * This is correct even for TF set by the guest, because "the
>  	 * processor will not generate this exception after the instruction
>  	 * that sets the TF flag".
>  	 */
> -	unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> -
>  	if (unlikely(rflags & X86_EFLAGS_TF)) {
>  		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>  			kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
> @@ -5263,13 +5264,15 @@ restart:
>  		r = EMULATE_DONE;
>  
>  	if (writeback) {
> +		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>  		toggle_interruptibility(vcpu, ctxt->interruptibility);
> -		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
>  		kvm_rip_write(vcpu, ctxt->eip);
>  		if (r == EMULATE_DONE)
> -			kvm_vcpu_check_singlestep(vcpu, &r);
> -		kvm_set_rflags(vcpu, ctxt->eflags);
> +			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
> +		__kvm_set_rflags(vcpu, ctxt->eflags);
> +		if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
> +			kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	} else
>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
>  
> @@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_get_rflags);
>  
> -void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  {
>  	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
>  	    kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
>  		rflags |= X86_EFLAGS_TF;
>  	kvm_x86_ops->set_rflags(vcpu, rflags);
> +}
> +
> +void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +{
> +	__kvm_set_rflags(vcpu, rflags);
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_rflags);
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
@ 2014-04-16 22:52   ` Marcelo Tosatti
  2014-04-18  4:19     ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2014-04-16 22:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Thu, Mar 27, 2014 at 12:30:34PM +0100, Paolo Bonzini wrote:
> About 25% of the time spent in emulation of invalid guest state
> is wasted in checking whether emulation is required for the next
> instruction.  However, this almost never changes except when a
> segment register (or TR or LDTR) changes, or when there is a mode
> transition (i.e. CR0 changes).
> 
> In fact, vmx_set_segment and vmx_set_cr0 already modify
> vmx->emulation_required (except that the former for some reason
> uses |= instead of just an assignment).  So there is no need to
> call guest_state_valid in the emulation loop.
> 
> Emulation performance test results indicate 530-870 cycles
> for common instructions, versus 740-1110 before this patch.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1320e0f8e611..73aa522db47b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3629,7 +3629,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
>  	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
>  
>  out:
> -	vmx->emulation_required |= emulation_required(vcpu);
> +	vmx->emulation_required = emulation_required(vcpu);
>  }
>  
>  static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
> @@ -5580,7 +5580,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  	cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>  	intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
>  
> -	while (!guest_state_valid(vcpu) && count-- != 0) {
> +	while (vmx->emulation_required && count-- != 0) {
>  		if (intr_window_requested && vmx_interrupt_allowed(vcpu))
>  			return handle_interrupt_window(&vmx->vcpu);
>  
> @@ -5614,7 +5614,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  			schedule();
>  	}
>  
> -	vmx->emulation_required = emulation_required(vcpu);
>  out:
>  	return ret;
>  }
> -- 
> 1.8.3.1

How about handling VM-entry error due to invalid state with

vmx->emulation_required = true;
continue to main vcpu loop;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-04-16 22:52   ` Marcelo Tosatti
@ 2014-04-18  4:19     ` Paolo Bonzini
  2014-04-21  2:13       ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-04-18  4:19 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, kvm

On 16/04/2014 18:52, Marcelo Tosatti wrote:
> How about handling VM-entry error due to invalid state with
>
> vmx->emulation_required = true;
> continue to main vcpu loop;

What would reset it to false though?  None of the places that call 
emulation_required() is a hot path right now, and this patch doesn't add 
any.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-04-18  4:19     ` Paolo Bonzini
@ 2014-04-21  2:13       ` Marcelo Tosatti
  2014-04-22  3:25         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2014-04-21  2:13 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm

On Fri, Apr 18, 2014 at 12:19:28AM -0400, Paolo Bonzini wrote:
> Il 16/04/2014 18:52, Marcelo Tosatti ha scritto:
> >How about handling VM-entry error due to invalid state with
> >
> >vmx->emulation_required = true;
> >continue to main vcpu loop;
> 
> What would reset it to false though?  None of the places that call
> emulation_required() is a hot path right now, and this patch doesn't
> add any.

The same code which resets it to false inside the
handle_invalid_guest_state loop (so you would stop emulating
at the same point as you do with this patch).

The advantage would be that a failure to set vmx->emulation_required to
true would not cause a VM-entry failure.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
  2014-04-21  2:13       ` Marcelo Tosatti
@ 2014-04-22  3:25         ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-04-22  3:25 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, kvm

On 20/04/2014 22:13, Marcelo Tosatti wrote:
> The same code which resets it to false inside the
> handle_invalid_guest_state loop (so you would stop emulating
> at the same point as you do with this patch).

So (barring any bugs where we fail to set vmx->emulation_required to 
true) the "vmx->emulation_required = true;" on vmentry error would be 
dead code.

> Advantage would be that failure to set vmx->emulation_required to
> true would not cause VM-entry failure.

A place where we fail to set vmx->emulation_required to true is quite 
likely to also be wrong when setting vmx->emulation_required to false. 
Since this is not something that has ever seen much churn, I think it's 
better to code it in a way that shows bugs easily.  The bugs do not 
affect the host anyway.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

