* [RFC PATCH 0/5] KVM: speed up invalid guest state emulation
@ 2014-03-27 11:30 Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
This series identifies some low-hanging fruit for speeding up emulation
of invalid guest state. I am a bit worried about patch 2, while
everything else should be relatively safe.
On the kvm-unit-tests microbenchmarks I get a 1.8-2.5x speedup (from
740-1100 cycles/instruction to 280-600).
This is not for 3.15, of course. Even patch 5 is not too useful without
the others; saving 40 clock cycles is a 10% speedup after the previous
patches, but only 4% before.
Please review carefully!
Paolo
Paolo Bonzini (5):
KVM: vmx: speed up emulation of invalid guest state
KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
KVM: x86: move around some checks
KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"
KVM: x86: speed up emulated moves
arch/x86/include/asm/kvm_emulate.h | 2 +-
arch/x86/kvm/emulate.c | 182 ++++++++++++++++++++-----------------
arch/x86/kvm/vmx.c | 5 +-
arch/x86/kvm/x86.c | 28 ++++--
4 files changed, 120 insertions(+), 97 deletions(-)
--
1.8.3.1
* [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
2014-04-16 22:52 ` Marcelo Tosatti
2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction. However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).
In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment). So there is no need to
call guest_state_valid in the emulation loop.
Emulation performance test results indicate 530-870 cycles
for common instructions, versus 740-1110 before this patch.
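The change amounts to caching the result of an expensive predicate and
recomputing it only where its inputs can change. A minimal user-space
sketch of that pattern, with invented names rather than the real KVM
structures (struct vcpu, set_cr0 and friends below are placeholders):

#include <stdbool.h>

/* Invented, simplified state -- not the real struct vcpu_vmx. */
struct vcpu {
        unsigned long cr0;              /* one input of the predicate      */
        bool emulation_required;        /* cached result of the slow check */
};

/* Stand-in for guest_state_valid(): pretend this walks every segment
 * register and is too expensive to run once per emulated instruction. */
static bool compute_emulation_required(const struct vcpu *v)
{
        return !(v->cr0 & 1);           /* e.g. still in real mode */
}

/* Every writer of the predicate's inputs refreshes the cache... */
static void set_cr0(struct vcpu *v, unsigned long cr0)
{
        v->cr0 = cr0;
        v->emulation_required = compute_emulation_required(v);
}

/* ...so the hot loop can test the cached flag instead of recomputing. */
static void emulation_loop(struct vcpu *v)
{
        int count = 130;

        while (v->emulation_required && count-- != 0)
                set_cr0(v, v->cr0 | 1); /* placeholder for "emulate one insn" */
}

The obvious risk, raised later in the thread, is a path that changes the
inputs without refreshing the cache.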
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/vmx.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1320e0f8e611..73aa522db47b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3629,7 +3629,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
out:
- vmx->emulation_required |= emulation_required(vcpu);
+ vmx->emulation_required = emulation_required(vcpu);
}
static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -5580,7 +5580,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
- while (!guest_state_valid(vcpu) && count-- != 0) {
+ while (vmx->emulation_required && count-- != 0) {
if (intr_window_requested && vmx_interrupt_allowed(vcpu))
return handle_interrupt_window(&vmx->vcpu);
@@ -5614,7 +5614,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
schedule();
}
- vmx->emulation_required = emulation_required(vcpu);
out:
return ret;
}
--
1.8.3.1
* [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
2014-03-28 12:44 ` Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 3/5] KVM: x86: move around some checks Paolo Bonzini
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
Despite the provisions to emulate up to 130 consecutive instructions, in
practice KVM will emulate just one before exiting handle_invalid_guest_state,
because x86_emulate_instruction always sets KVM_REQ_EVENT.
However, we only need to do this if an interrupt could be injected,
which happens (a) if an interrupt shadow bit (STI or MOV SS) has gone
away, or (b) if the interrupt flag has just been set (because
instructions other than STI can set it without enabling an interrupt shadow).
This cuts another 250-300 clock cycles from the cost of emulating an
instruction (530-870 cycles before the patch on kvm-unit-tests,
290-600 afterwards).
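The "has just been set" test added by the patch below is the usual
new-and-not-old bit idiom; a tiny standalone sketch of it (X86_EFLAGS_IF
is bit 9 of RFLAGS):

#include <stdbool.h>

#define X86_EFLAGS_IF (1UL << 9)

/* True when IF is set in new_flags but was clear in old_flags, i.e. the
 * emulated instruction has just enabled interrupts. */
static bool interrupts_just_enabled(unsigned long old_flags,
                                    unsigned long new_flags)
{
        return (new_flags & ~old_flags) & X86_EFLAGS_IF;
}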
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/x86.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd31aada351b..ce9523345f2e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
struct kvm_x86_ops *kvm_x86_ops;
EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
* means that the last instruction is an sti. We should not
* leave the flag on in this case. The same goes for mov ss
*/
- if (!(int_shadow & mask))
+ if (unlikely(int_shadow) && !(int_shadow & mask)) {
kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ }
}
static void inject_emulated_exception(struct kvm_vcpu *vcpu)
@@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
return dr6;
}
-static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
+static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
{
struct kvm_run *kvm_run = vcpu->run;
/*
- * Use the "raw" value to see if TF was passed to the processor.
- * Note that the new value of the flags has not been saved yet.
+ * rflags is the old, "raw" value of the flags. The new value has
+ * not been saved yet.
*
* This is correct even for TF set by the guest, because "the
* processor will not generate this exception after the instruction
* that sets the TF flag".
*/
- unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
-
if (unlikely(rflags & X86_EFLAGS_TF)) {
if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
@@ -5263,13 +5264,15 @@ restart:
r = EMULATE_DONE;
if (writeback) {
+ unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
- kvm_make_request(KVM_REQ_EVENT, vcpu);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
kvm_rip_write(vcpu, ctxt->eip);
if (r == EMULATE_DONE)
- kvm_vcpu_check_singlestep(vcpu, &r);
- kvm_set_rflags(vcpu, ctxt->eflags);
+ kvm_vcpu_check_singlestep(vcpu, rflags, &r);
+ __kvm_set_rflags(vcpu, ctxt->eflags);
+ if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
} else
vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
@@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_get_rflags);
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
{
if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
rflags |= X86_EFLAGS_TF;
kvm_x86_ops->set_rflags(vcpu, rflags);
+}
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+ __kvm_set_rflags(vcpu, rflags);
kvm_make_request(KVM_REQ_EVENT, vcpu);
}
EXPORT_SYMBOL_GPL(kvm_set_rflags);
--
1.8.3.1
* [RFC PATCH 3/5] KVM: x86: move around some checks
2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 5/5] KVM: x86: speed up emulated moves Paolo Bonzini
4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
The only purpose of this patch is to make the next patch simpler
to review. No semantic change.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/emulate.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c9f8f61df46c..9e40fbf94dcd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4347,12 +4347,15 @@ done_prefixes:
ctxt->d |= opcode.flags;
}
+ /* Unrecognised? */
+ if (ctxt->d == 0)
+ return EMULATION_FAILED;
+
ctxt->execute = opcode.u.execute;
ctxt->check_perm = opcode.check_perm;
ctxt->intercept = opcode.intercept;
- /* Unrecognised? */
- if (ctxt->d == 0 || (ctxt->d & NotImpl))
+ if (ctxt->d & NotImpl)
return EMULATION_FAILED;
if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
@@ -4494,19 +4497,19 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
ctxt->mem_read.pos = 0;
- if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
- (ctxt->d & Undefined)) {
+ /* LOCK prefix is allowed only with some instructions */
+ if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
rc = emulate_ud(ctxt);
goto done;
}
- /* LOCK prefix is allowed only with some instructions */
- if (ctxt->lock_prefix && (!(ctxt->d & Lock) || ctxt->dst.type != OP_MEM)) {
+ if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
rc = emulate_ud(ctxt);
goto done;
}
- if ((ctxt->d & SrcMask) == SrcMemFAddr && ctxt->src.type != OP_MEM) {
+ if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+ (ctxt->d & Undefined)) {
rc = emulate_ud(ctxt);
goto done;
}
--
1.8.3.1
* [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())"
2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
` (2 preceding siblings ...)
2014-03-27 11:30 ` [RFC PATCH 3/5] KVM: x86: move around some checks Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
2014-03-27 11:30 ` [RFC PATCH 5/5] KVM: x86: speed up emulated moves Paolo Bonzini
4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
There are several checks for "peculiar" aspects of instructions in both
x86_decode_insn and x86_emulate_insn. Group them together, and guard
them with a single "if" that lets the processor quickly skip them all.
To do this effectively, add two more flag bits that say whether the
.intercept and .check_perm fields are valid.
This skims about 10 cycles off each emulated instruction, which is
a 2 to 6% improvement.
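The transformation follows a common pattern: OR together all of the
rarely-set flag bits and test the union once, so that the common case
pays for a single well-predicted branch. A minimal sketch with invented
flag names (these are not the emulator's real decode flags):

#define unlikely(x)     __builtin_expect(!!(x), 0)

#define FLAG_A          (1u << 0)
#define FLAG_B          (1u << 1)
#define FLAG_C          (1u << 2)
#define RARE_FLAGS      (FLAG_A | FLAG_B | FLAG_C)

static int handle(unsigned int flags)
{
        if (unlikely(flags & RARE_FLAGS)) {
                /* The individual tests run only when at least one rare
                 * flag is set; most instructions skip the whole block. */
                if (flags & FLAG_A) { /* ...slow-path check A... */ }
                if (flags & FLAG_B) { /* ...slow-path check B... */ }
                if (flags & FLAG_C) { /* ...slow-path check C... */ }
        }
        return 0;
}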
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
To review, use -b (ignore whitespace changes).
arch/x86/kvm/emulate.c | 175 ++++++++++++++++++++++++++-----------------------
1 file changed, 94 insertions(+), 81 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9e40fbf94dcd..94974055d906 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -161,6 +161,8 @@
#define Fastop ((u64)1 << 44) /* Use opcode::u.fastop */
#define NoWrite ((u64)1 << 45) /* No writeback */
#define SrcWrite ((u64)1 << 46) /* Write back src operand */
+#define Intercept ((u64)1 << 47) /* Has intercept field */
+#define CheckPerm ((u64)1 << 48) /* Has check_perm field */
#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
@@ -3514,9 +3516,9 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
}
#define D(_y) { .flags = (_y) }
-#define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
-#define DIP(_y, _i, _p) { .flags = (_y), .intercept = x86_intercept_##_i, \
- .check_perm = (_p) }
+#define DI(_y, _i) { .flags = (_y)|Intercept, .intercept = x86_intercept_##_i }
+#define DIP(_y, _i, _p) { .flags = (_y)|Intercept|CheckPerm, \
+ .intercept = x86_intercept_##_i, .check_perm = (_p) }
#define N D(NotImpl)
#define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
#define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
@@ -3525,10 +3527,10 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
#define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
#define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
#define II(_f, _e, _i) \
- { .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
+ { .flags = (_f)|Intercept, .u.execute = (_e), .intercept = x86_intercept_##_i }
#define IIP(_f, _e, _i, _p) \
- { .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i, \
- .check_perm = (_p) }
+ { .flags = (_f)|Intercept|CheckPerm, .u.execute = (_e), \
+ .intercept = x86_intercept_##_i, .check_perm = (_p) }
#define GP(_f, _g) { .flags = ((_f) | Prefix), .u.gprefix = (_g) }
#define D2bv(_f) D((_f) | ByteOp), D(_f)
@@ -4352,29 +4354,37 @@ done_prefixes:
return EMULATION_FAILED;
ctxt->execute = opcode.u.execute;
- ctxt->check_perm = opcode.check_perm;
- ctxt->intercept = opcode.intercept;
- if (ctxt->d & NotImpl)
- return EMULATION_FAILED;
+ if (unlikely(ctxt->d &
+ (NotImpl|EmulateOnUD|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm))) {
+ /*
+ * These are copied unconditionally here, and checked unconditionally
+ * in x86_emulate_insn.
+ */
+ ctxt->check_perm = opcode.check_perm;
+ ctxt->intercept = opcode.intercept;
- if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
- return EMULATION_FAILED;
+ if (ctxt->d & NotImpl)
+ return EMULATION_FAILED;
- if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
- ctxt->op_bytes = 8;
+ if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
+ return EMULATION_FAILED;
- if (ctxt->d & Op3264) {
- if (mode == X86EMUL_MODE_PROT64)
+ if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
ctxt->op_bytes = 8;
- else
- ctxt->op_bytes = 4;
- }
- if (ctxt->d & Sse)
- ctxt->op_bytes = 16;
- else if (ctxt->d & Mmx)
- ctxt->op_bytes = 8;
+ if (ctxt->d & Op3264) {
+ if (mode == X86EMUL_MODE_PROT64)
+ ctxt->op_bytes = 8;
+ else
+ ctxt->op_bytes = 4;
+ }
+
+ if (ctxt->d & Sse)
+ ctxt->op_bytes = 16;
+ else if (ctxt->d & Mmx)
+ ctxt->op_bytes = 8;
+ }
/* ModRM and SIB bytes. */
if (ctxt->d & ModRM) {
@@ -4508,75 +4518,78 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
goto done;
}
- if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
- (ctxt->d & Undefined)) {
- rc = emulate_ud(ctxt);
- goto done;
- }
-
- if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
- || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
- rc = emulate_ud(ctxt);
- goto done;
- }
-
- if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
- rc = emulate_nm(ctxt);
- goto done;
- }
+ if (unlikely(ctxt->d &
+ (No64|Undefined|Sse|Mmx|Intercept|CheckPerm|Priv|Prot|String))) {
+ if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
+ (ctxt->d & Undefined)) {
+ rc = emulate_ud(ctxt);
+ goto done;
+ }
- if (ctxt->d & Mmx) {
- rc = flush_pending_x87_faults(ctxt);
- if (rc != X86EMUL_CONTINUE)
+ if (((ctxt->d & (Sse|Mmx)) && ((ops->get_cr(ctxt, 0) & X86_CR0_EM)))
+ || ((ctxt->d & Sse) && !(ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))) {
+ rc = emulate_ud(ctxt);
goto done;
- /*
- * Now that we know the fpu is exception safe, we can fetch
- * operands from it.
- */
- fetch_possible_mmx_operand(ctxt, &ctxt->src);
- fetch_possible_mmx_operand(ctxt, &ctxt->src2);
- if (!(ctxt->d & Mov))
- fetch_possible_mmx_operand(ctxt, &ctxt->dst);
- }
+ }
- if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
- rc = emulator_check_intercept(ctxt, ctxt->intercept,
- X86_ICPT_PRE_EXCEPT);
- if (rc != X86EMUL_CONTINUE)
+ if ((ctxt->d & (Sse|Mmx)) && (ops->get_cr(ctxt, 0) & X86_CR0_TS)) {
+ rc = emulate_nm(ctxt);
goto done;
- }
+ }
- /* Privileged instruction can be executed only in CPL=0 */
- if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
- rc = emulate_gp(ctxt, 0);
- goto done;
- }
+ if (ctxt->d & Mmx) {
+ rc = flush_pending_x87_faults(ctxt);
+ if (rc != X86EMUL_CONTINUE)
+ goto done;
+ /*
+ * Now that we know the fpu is exception safe, we can fetch
+ * operands from it.
+ */
+ fetch_possible_mmx_operand(ctxt, &ctxt->src);
+ fetch_possible_mmx_operand(ctxt, &ctxt->src2);
+ if (!(ctxt->d & Mov))
+ fetch_possible_mmx_operand(ctxt, &ctxt->dst);
+ }
- /* Instruction can only be executed in protected mode */
- if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
- rc = emulate_ud(ctxt);
- goto done;
- }
+ if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+ rc = emulator_check_intercept(ctxt, ctxt->intercept,
+ X86_ICPT_PRE_EXCEPT);
+ if (rc != X86EMUL_CONTINUE)
+ goto done;
+ }
- /* Do instruction specific permission checks */
- if (ctxt->check_perm) {
- rc = ctxt->check_perm(ctxt);
- if (rc != X86EMUL_CONTINUE)
+ /* Privileged instruction can be executed only in CPL=0 */
+ if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
+ rc = emulate_gp(ctxt, 0);
goto done;
- }
+ }
- if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
- rc = emulator_check_intercept(ctxt, ctxt->intercept,
- X86_ICPT_POST_EXCEPT);
- if (rc != X86EMUL_CONTINUE)
+ /* Instruction can only be executed in protected mode */
+ if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
+ rc = emulate_ud(ctxt);
goto done;
- }
+ }
- if (ctxt->rep_prefix && (ctxt->d & String)) {
- /* All REP prefixes have the same first termination condition */
- if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
- ctxt->eip = ctxt->_eip;
- goto done;
+ /* Do instruction specific permission checks */
+ if (ctxt->check_perm) {
+ rc = ctxt->check_perm(ctxt);
+ if (rc != X86EMUL_CONTINUE)
+ goto done;
+ }
+
+ if (unlikely(ctxt->guest_mode) && ctxt->intercept) {
+ rc = emulator_check_intercept(ctxt, ctxt->intercept,
+ X86_ICPT_POST_EXCEPT);
+ if (rc != X86EMUL_CONTINUE)
+ goto done;
+ }
+
+ if (ctxt->rep_prefix && (ctxt->d & String)) {
+ /* All REP prefixes have the same first termination condition */
+ if (address_mask(ctxt, reg_read(ctxt, VCPU_REGS_RCX)) == 0) {
+ ctxt->eip = ctxt->_eip;
+ goto done;
+ }
}
}
--
1.8.3.1
* [RFC PATCH 5/5] KVM: x86: speed up emulated moves
2014-03-27 11:30 [RFC PATCH 0/5] KVM: speed up invalid guest state emulation Paolo Bonzini
` (3 preceding siblings ...)
2014-03-27 11:30 ` [RFC PATCH 4/5] KVM: x86: protect checks on ctxt->d by a common "if (unlikely())" Paolo Bonzini
@ 2014-03-27 11:30 ` Paolo Bonzini
4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-27 11:30 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.
Avoiding a call to memcpy (the compiler optimizes it out) gains about 50
cycles on kvm-unit-tests for register-to-register moves, and makes them
about as fast as arithmetic instructions.
We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.
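The remark about the compiler optimizing memcpy out relies on the length
now being a compile-time constant. A small user-space illustration (the
struct is a simplified stand-in for the emulator's struct operand): with
a recent GCC or clang at -O2, the fixed-size copy is expanded into a
couple of register moves, while the variable-size one usually remains a
real call.

#include <string.h>

struct operand {
        unsigned char valptr[16];       /* room for the largest (SSE) value */
};

/* Runtime-variable length: typically compiled to an actual memcpy() call. */
void copy_n(struct operand *dst, const struct operand *src, size_t n)
{
        memcpy(dst->valptr, src->valptr, n);
}

/* Compile-time-constant length: inlined, no call overhead. */
void copy_all(struct operand *dst, const struct operand *src)
{
        memcpy(dst->valptr, src->valptr, sizeof(src->valptr));
}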
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/kvm_emulate.h | 2 +-
arch/x86/kvm/emulate.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 24ec1216596e..f7b1e45eb753 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -232,7 +232,7 @@ struct operand {
union {
unsigned long val;
u64 val64;
- char valptr[sizeof(unsigned long) + 2];
+ char valptr[sizeof(sse128_t)];
sse128_t vec_val;
u64 mm_val;
void *data;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 94974055d906..4a3584d419e5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2955,7 +2955,7 @@ static int em_rdpmc(struct x86_emulate_ctxt *ctxt)
static int em_mov(struct x86_emulate_ctxt *ctxt)
{
- memcpy(ctxt->dst.valptr, ctxt->src.valptr, ctxt->op_bytes);
+ memcpy(ctxt->dst.valptr, ctxt->src.valptr, sizeof(ctxt->src.valptr));
return X86EMUL_CONTINUE;
}
--
1.8.3.1
* Re: [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation
2014-03-27 11:30 ` [RFC PATCH 2/5] KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Paolo Bonzini
@ 2014-03-28 12:44 ` Paolo Bonzini
0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-03-28 12:44 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm
On 27/03/2014 12:30, Paolo Bonzini wrote:
> Despite the provisions to emulate up to 130 consecutive instructions, in
> practice KVM will emulate just one before exiting handle_invalid_guest_state,
> because x86_emulate_instruction always sets KVM_REQ_EVENT.
>
> However, we only need to do this if an interrupt could be injected,
> which happens (a) if an interrupt shadow bit (STI or MOV SS) has gone
> away, or (b) if the interrupt flag has just been set (because
> instructions other than STI can set it without enabling an interrupt shadow).
>
> This cuts another 250-300 clock cycles from the cost of emulating an
> instruction (530-870 cycles before the patch on kvm-unit-tests,
> 290-600 afterwards).
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/x86.c | 28 ++++++++++++++++++----------
> 1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd31aada351b..ce9523345f2e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -87,6 +87,7 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>
> static void update_cr8_intercept(struct kvm_vcpu *vcpu);
> static void process_nmi(struct kvm_vcpu *vcpu);
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>
> struct kvm_x86_ops *kvm_x86_ops;
> EXPORT_SYMBOL_GPL(kvm_x86_ops);
> @@ -4856,8 +4857,10 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
> * means that the last instruction is an sti. We should not
> * leave the flag on in this case. The same goes for mov ss
> */
> - if (!(int_shadow & mask))
> + if (unlikely(int_shadow) && !(int_shadow & mask)) {
> kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
> + kvm_make_request(KVM_REQ_EVENT, vcpu);
> + }
Better:
* means that the last instruction is an sti. We should not
* leave the flag on in this case. The same goes for mov ss
*/
- if (!(int_shadow & mask))
+ mask &= ~int_shadow;
+ if (unlikely(mask != int_shadow))
kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+
+ /*
+ * The interrupt window might have opened if a bit has been cleared.
+ */
+ if (unlikely(int_shadow & ~mask))
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
Paolo
> }
>
> static void inject_emulated_exception(struct kvm_vcpu *vcpu)
> @@ -5083,20 +5086,18 @@ static int kvm_vcpu_check_hw_bp(unsigned long addr, u32 type, u32 dr7,
> return dr6;
> }
>
> -static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, int *r)
> +static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflags, int *r)
> {
> struct kvm_run *kvm_run = vcpu->run;
>
> /*
> - * Use the "raw" value to see if TF was passed to the processor.
> - * Note that the new value of the flags has not been saved yet.
> + * rflags is the old, "raw" value of the flags. The new value has
> + * not been saved yet.
> *
> * This is correct even for TF set by the guest, because "the
> * processor will not generate this exception after the instruction
> * that sets the TF flag".
> */
> - unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> -
> if (unlikely(rflags & X86_EFLAGS_TF)) {
> if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
> @@ -5263,13 +5264,15 @@ restart:
> r = EMULATE_DONE;
>
> if (writeback) {
> + unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> toggle_interruptibility(vcpu, ctxt->interruptibility);
> - kvm_make_request(KVM_REQ_EVENT, vcpu);
> vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
> kvm_rip_write(vcpu, ctxt->eip);
> if (r == EMULATE_DONE)
> - kvm_vcpu_check_singlestep(vcpu, &r);
> - kvm_set_rflags(vcpu, ctxt->eflags);
> + kvm_vcpu_check_singlestep(vcpu, rflags, &r);
> + __kvm_set_rflags(vcpu, ctxt->eflags);
> + if (unlikely((ctxt->eflags & ~rflags) & X86_EFLAGS_IF))
> + kvm_make_request(KVM_REQ_EVENT, vcpu);
> } else
> vcpu->arch.emulate_regs_need_sync_to_vcpu = true;
>
> @@ -7385,12 +7388,17 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
> }
> EXPORT_SYMBOL_GPL(kvm_get_rflags);
>
> -void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> {
> if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
> kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
> rflags |= X86_EFLAGS_TF;
> kvm_x86_ops->set_rflags(vcpu, rflags);
> +}
> +
> +void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
> +{
> + __kvm_set_rflags(vcpu, rflags);
> kvm_make_request(KVM_REQ_EVENT, vcpu);
> }
> EXPORT_SYMBOL_GPL(kvm_set_rflags);
>
* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
2014-03-27 11:30 ` [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state Paolo Bonzini
@ 2014-04-16 22:52 ` Marcelo Tosatti
2014-04-18 4:19 ` Paolo Bonzini
0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2014-04-16 22:52 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm
On Thu, Mar 27, 2014 at 12:30:34PM +0100, Paolo Bonzini wrote:
> About 25% of the time spent in emulation of invalid guest state
> is wasted in checking whether emulation is required for the next
> instruction. However, this almost never changes except when a
> segment register (or TR or LDTR) changes, or when there is a mode
> transition (i.e. CR0 changes).
>
> In fact, vmx_set_segment and vmx_set_cr0 already modify
> vmx->emulation_required (except that the former for some reason
> uses |= instead of just an assignment). So there is no need to
> call guest_state_valid in the emulation loop.
>
> Emulation performance test results indicate 530-870 cycles
> for common instructions, versus 740-1110 before this patch.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1320e0f8e611..73aa522db47b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3629,7 +3629,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
> vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
>
> out:
> - vmx->emulation_required |= emulation_required(vcpu);
> + vmx->emulation_required = emulation_required(vcpu);
> }
>
> static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
> @@ -5580,7 +5580,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
> intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
>
> - while (!guest_state_valid(vcpu) && count-- != 0) {
> + while (vmx->emulation_required && count-- != 0) {
> if (intr_window_requested && vmx_interrupt_allowed(vcpu))
> return handle_interrupt_window(&vmx->vcpu);
>
> @@ -5614,7 +5614,6 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> schedule();
> }
>
> - vmx->emulation_required = emulation_required(vcpu);
> out:
> return ret;
> }
> --
> 1.8.3.1
How about handling VM-entry error due to invalid state with
vmx->emulation_required = true;
continue to main vcpu loop;
* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
2014-04-16 22:52 ` Marcelo Tosatti
@ 2014-04-18 4:19 ` Paolo Bonzini
2014-04-21 2:13 ` Marcelo Tosatti
0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-04-18 4:19 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-kernel, kvm
On 16/04/2014 18:52, Marcelo Tosatti wrote:
> How about handling VM-entry error due to invalid state with
>
> vmx->emulation_required = true;
> continue to main vcpu loop;
What would reset it to false though? None of the places that call
emulation_required() is a hot path right now, and this patch doesn't add
any.
Paolo
* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
2014-04-18 4:19 ` Paolo Bonzini
@ 2014-04-21 2:13 ` Marcelo Tosatti
2014-04-22 3:25 ` Paolo Bonzini
0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2014-04-21 2:13 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm
On Fri, Apr 18, 2014 at 12:19:28AM -0400, Paolo Bonzini wrote:
> On 16/04/2014 18:52, Marcelo Tosatti wrote:
> >How about handling VM-entry error due to invalid state with
> >
> >vmx->emulation_required = true;
> >continue to main vcpu loop;
>
> What would reset it to false though? None of the places that call
> emulation_required() is a hot path right now, and this patch doesn't
> add any.
The same code which resets it to false inside the
handle_invalid_guest_state loop (so you would stop emulating
at the same point as you do with this patch).
Advantage would be that failure to set vmx->emulation_required to
true would not cause VM-entry failure.
* Re: [RFC PATCH 1/5] KVM: vmx: speed up emulation of invalid guest state
2014-04-21 2:13 ` Marcelo Tosatti
@ 2014-04-22 3:25 ` Paolo Bonzini
0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-04-22 3:25 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-kernel, kvm
On 20/04/2014 22:13, Marcelo Tosatti wrote:
> The same code which resets it to false inside the
> handle_invalid_guest_state loop (so you would stop emulating
> at the same point as you do with this patch).
So (barring any bugs where we fail to set vmx->emulation_required to
true) the "vmx->emulation_required = true;" on vmentry error would be
dead code.
> Advantage would be that failure to set vmx->emulation_required to
> true would not cause VM-entry failure.
A place where we fail to set vmx->emulation_required to true is quite
likely to also be wrong when setting vmx->emulation_required to false.
Since this is not something that has ever seen much churn, I think it's
better to code it in a way that shows bugs easily. The bugs do not
affect the host anyway.
Paolo