kvm.vger.kernel.org archive mirror
* [PATCHv2 0/6] more VMX real mode emulation fixes
@ 2012-12-20 14:57 Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function Gleb Natapov
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 14:57 UTC (permalink / raw)
  To: kvm; +Cc: mtosatti

This series goes on top of my previous one, "Fix
emulate_invalid_guest_state=0 part 2". It not only fixes bugs, but also
nicely cleans up VMX real mode emulation: all real mode segment register
mangling is now contained in the fix_rmode_seg() function.

Changelog:
  v1 -> v2:
    - emulate_invalid_guest_state=0 broke again. Fix it.
    - additional patch to handle IO during emulation caused by #GP

Gleb Natapov (6):
  KVM: emulator: drop RPL check from linearize() function
  KVM: emulator: implement fninit, fnstsw, fnstcw
  KVM: VMX: make rmode_segment_valid() more strict.
  KVM: VMX: fix emulation of invalid guest state.
  KVM: VMX: Do not fix segment register during vcpu initialization.
  KVM: VMX: handle IO when emulation is due to #GP in real mode.

 arch/x86/kvm/emulate.c |  133 +++++++++++++++++++++++++++--
 arch/x86/kvm/vmx.c     |  219 +++++++++++++++++++++++++-----------------------
 2 files changed, 241 insertions(+), 111 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
@ 2012-12-20 14:57 ` Gleb Natapov
  2013-02-11 16:58   ` Jan Kiszka
  2012-12-20 14:57 ` [PATCHv2 2/6] KVM: emulator: implement fninit, fnstsw, fnstcw Gleb Natapov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 14:57 UTC (permalink / raw)
  To: kvm; +Cc: mtosatti

According to the Intel SDM Vol. 3, Section 5.5 "Privilege Levels" and
Section 5.6 "Privilege Level Checking When Accessing Data Segments", RPL
checking is done when a segment selector is loaded, not during data
access. We already perform this check during segment selector loading,
so drop the one done during data access. Checking RPL during data access
triggers a #GP if the RPL bits of a segment selector are still set after
a transition from real mode to protected mode.
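A rough standalone model (hypothetical names, not the kernel code) of the check being dropped: before this patch, the selector's RPL was folded into the effective privilege compared against the descriptor's DPL; afterwards only the CPL is used.

```c
#include <assert.h>

#define MODE_REAL 0
#define MODE_PROT 1

static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* Old check: fold the selector's RPL into the effective privilege. */
static int old_data_check(int mode, unsigned cpl, unsigned short sel,
			  unsigned dpl)
{
	unsigned rpl = (mode == MODE_REAL) ? 0 : (sel & 3u);

	return max_u(cpl, rpl) > dpl;	/* nonzero means #GP */
}

/* New check: RPL was already validated when the selector was loaded. */
static int new_data_check(unsigned cpl, unsigned dpl)
{
	return cpl > dpl;
}
```

A DPL-0 data segment accessed at CPL 0 through a selector whose RPL bits were left at 3 by real-mode code faults under the old check but passes the new one.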

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/emulate.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c7547b3..a3d31e3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -665,7 +665,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
 	ulong la;
 	u32 lim;
 	u16 sel;
-	unsigned cpl, rpl;
+	unsigned cpl;
 
 	la = seg_base(ctxt, addr.seg) + addr.ea;
 	switch (ctxt->mode) {
@@ -699,11 +699,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
 				goto bad;
 		}
 		cpl = ctxt->ops->cpl(ctxt);
-		if (ctxt->mode == X86EMUL_MODE_REAL)
-			rpl = 0;
-		else
-			rpl = sel & 3;
-		cpl = max(cpl, rpl);
 		if (!(desc.type & 8)) {
 			/* data segment */
 			if (cpl > desc.dpl)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv2 2/6] KVM: emulator: implement fninit, fnstsw, fnstcw
  2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function Gleb Natapov
@ 2012-12-20 14:57 ` Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 3/6] KVM: VMX: make rmode_segment_valid() more strict Gleb Natapov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 14:57 UTC (permalink / raw)
  To: kvm; +Cc: mtosatti


Signed-off-by: Gleb Natapov <gleb@redhat.com>
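This patch adds decoding of the x87 escape opcodes (0xd8-0xdf). The two-level table lookup it introduces in the decode loop can be modeled as a standalone sketch (hypothetical struct and names, not the kernel code):

```c
#include <assert.h>

/*
 * Model of the Escape decode: memory forms (ModRM < 0xc0) are selected
 * by the ModRM reg field, while register forms (ModRM >= 0xc0) are
 * selected by the full ModRM byte.
 */
struct escape_model {
	int op[8];	/* memory forms, indexed by (modrm >> 3) & 7 */
	int high[64];	/* register forms, indexed by modrm - 0xc0 */
};

static int escape_lookup(const struct escape_model *esc, unsigned char modrm)
{
	if (modrm > 0xbf)
		return esc->high[modrm - 0xc0];
	return esc->op[(modrm >> 3) & 7];
}
```

In the d9 table, for example, the fnstcw handler sits in op[7] (reached by any ModRM byte with reg == 7 and mod != 3), while in the db table fninit sits at high[0xe3 - 0xc0].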
---
 arch/x86/kvm/emulate.c |  126 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a3d31e3..53c5ad6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -115,6 +115,7 @@
 #define GroupDual   (2<<15)     /* Alternate decoding of mod == 3 */
 #define Prefix      (3<<15)     /* Instruction varies with 66/f2/f3 prefix */
 #define RMExt       (4<<15)     /* Opcode extension in ModRM r/m if mod == 3 */
+#define Escape      (5<<15)     /* Escape to coprocessor instruction */
 #define Sse         (1<<18)     /* SSE Vector instruction */
 /* Generic ModRM decode. */
 #define ModRM       (1<<19)
@@ -166,6 +167,7 @@ struct opcode {
 		const struct opcode *group;
 		const struct group_dual *gdual;
 		const struct gprefix *gprefix;
+		const struct escape *esc;
 	} u;
 	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
 };
@@ -182,6 +184,11 @@ struct gprefix {
 	struct opcode pfx_f3;
 };
 
+struct escape {
+	struct opcode op[8];
+	struct opcode high[64];
+};
+
 /* EFLAGS bit definitions. */
 #define EFLG_ID (1<<21)
 #define EFLG_VIP (1<<20)
@@ -991,6 +998,53 @@ static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
 	ctxt->ops->put_fpu(ctxt);
 }
 
+static int em_fninit(struct x86_emulate_ctxt *ctxt)
+{
+	if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
+		return emulate_nm(ctxt);
+
+	ctxt->ops->get_fpu(ctxt);
+	asm volatile("fninit");
+	ctxt->ops->put_fpu(ctxt);
+	return X86EMUL_CONTINUE;
+}
+
+static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
+{
+	u16 fcw;
+
+	if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
+		return emulate_nm(ctxt);
+
+	ctxt->ops->get_fpu(ctxt);
+	asm volatile("fnstcw %0": "+m"(fcw));
+	ctxt->ops->put_fpu(ctxt);
+
+	/* force 2 byte destination */
+	ctxt->dst.bytes = 2;
+	ctxt->dst.val = fcw;
+
+	return X86EMUL_CONTINUE;
+}
+
+static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
+{
+	u16 fsw;
+
+	if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
+		return emulate_nm(ctxt);
+
+	ctxt->ops->get_fpu(ctxt);
+	asm volatile("fnstsw %0": "+m"(fsw));
+	ctxt->ops->put_fpu(ctxt);
+
+	/* force 2 byte destination */
+	ctxt->dst.bytes = 2;
+	ctxt->dst.val = fsw;
+
+	return X86EMUL_CONTINUE;
+}
+
 static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
 				    struct operand *op)
 {
@@ -3590,6 +3644,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 #define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group | ModRM), .u.group = (_g) }
 #define GD(_f, _g) { .flags = ((_f) | GroupDual | ModRM), .u.gdual = (_g) }
+#define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) }
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
 #define II(_f, _e, _i) \
 	{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
@@ -3725,6 +3780,69 @@ static const struct gprefix pfx_vmovntpx = {
 	I(0, em_mov), N, N, N,
 };
 
+static const struct escape escape_d9 = { {
+	N, N, N, N, N, N, N, I(DstMem, em_fnstcw),
+}, {
+	/* 0xC0 - 0xC7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xC8 - 0xCF */
+	N, N, N, N, N, N, N, N,
+	/* 0xD0 - 0xD7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xD8 - 0xDF */
+	N, N, N, N, N, N, N, N,
+	/* 0xE0 - 0xE7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xE8 - 0xEF */
+	N, N, N, N, N, N, N, N,
+	/* 0xF0 - 0xF7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xF8 - 0xFF */
+	N, N, N, N, N, N, N, N,
+} };
+
+static const struct escape escape_db = { {
+	N, N, N, N, N, N, N, N,
+}, {
+	/* 0xC0 - 0xC7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xC8 - 0xCF */
+	N, N, N, N, N, N, N, N,
+	/* 0xD0 - 0xD7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xD8 - 0xDF */
+	N, N, N, N, N, N, N, N,
+	/* 0xE0 - 0xE7 */
+	N, N, N, I(ImplicitOps, em_fninit), N, N, N, N,
+	/* 0xE8 - 0xEF */
+	N, N, N, N, N, N, N, N,
+	/* 0xF0 - 0xF7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xF8 - 0xFF */
+	N, N, N, N, N, N, N, N,
+} };
+
+static const struct escape escape_dd = { {
+	N, N, N, N, N, N, N, I(DstMem, em_fnstsw),
+}, {
+	/* 0xC0 - 0xC7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xC8 - 0xCF */
+	N, N, N, N, N, N, N, N,
+	/* 0xD0 - 0xD7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xD8 - 0xDF */
+	N, N, N, N, N, N, N, N,
+	/* 0xE0 - 0xE7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xE8 - 0xEF */
+	N, N, N, N, N, N, N, N,
+	/* 0xF0 - 0xF7 */
+	N, N, N, N, N, N, N, N,
+	/* 0xF8 - 0xFF */
+	N, N, N, N, N, N, N, N,
+} };
+
 static const struct opcode opcode_table[256] = {
 	/* 0x00 - 0x07 */
 	I6ALU(Lock, em_add),
@@ -3821,7 +3939,7 @@ static const struct opcode opcode_table[256] = {
 	D2bv(DstMem | SrcOne | ModRM), D2bv(DstMem | ModRM),
 	N, I(DstAcc | SrcImmByte | No64, em_aad), N, N,
 	/* 0xD8 - 0xDF */
-	N, N, N, N, N, N, N, N,
+	N, E(0, &escape_d9), N, E(0, &escape_db), N, E(0, &escape_dd), N, N,
 	/* 0xE0 - 0xE7 */
 	X3(I(SrcImmByte, em_loop)),
 	I(SrcImmByte, em_jcxz),
@@ -4246,6 +4364,12 @@ done_prefixes:
 			case 0xf3: opcode = opcode.u.gprefix->pfx_f3; break;
 			}
 			break;
+		case Escape:
+			if (ctxt->modrm > 0xbf)
+				opcode = opcode.u.esc->high[ctxt->modrm - 0xc0];
+			else
+				opcode = opcode.u.esc->op[(ctxt->modrm >> 3) & 7];
+			break;
 		default:
 			return EMULATION_FAILED;
 		}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv2 3/6] KVM: VMX: make rmode_segment_valid() more strict.
  2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 2/6] KVM: emulator: implement fninit, fnstsw, fnstcw Gleb Natapov
@ 2012-12-20 14:57 ` Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 4/6] KVM: VMX: fix emulation of invalid guest state Gleb Natapov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 14:57 UTC (permalink / raw)
  To: kvm; +Cc: mtosatti

Currently rmode_segment_valid() allows entering vm86 mode if the segment
limit is greater than 0xffff or the db bit is set. Either can cause the
CPU to execute instructions incorrectly, since in vm86 mode the limit
will be set to 0xffff and db will be forced to 0.
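The tightened check can be modeled as a standalone sketch (hypothetical names, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A cached real-mode segment is only usable for vm86 entry if
 * base == selector << 4, the limit is exactly 0xffff, and the access
 * rights are 0xf3 (present, DPL 3, read/write data, accessed), with the
 * g and db bits both clear.
 */
struct seg_model {
	unsigned base, limit;
	unsigned short selector;
	unsigned ar;	/* access rights, including the g and db bits */
};

static bool rmode_seg_valid_model(const struct seg_model *s)
{
	if (s->base != (unsigned)(s->selector << 4))
		return false;
	if (s->limit != 0xffff)	/* the old check accepted limit > 0xffff */
		return false;
	return s->ar == 0xf3;
}
```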

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 23d5aec..7ebcac2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3341,15 +3341,13 @@ static bool rmode_segment_valid(struct kvm_vcpu *vcpu, int seg)
 
 	vmx_get_segment(vcpu, &var, seg);
 	var.dpl = 0x3;
-	var.g = 0;
-	var.db = 0;
 	if (seg == VCPU_SREG_CS)
 		var.type = 0x3;
 	ar = vmx_segment_access_rights(&var);
 
 	if (var.base != (var.selector << 4))
 		return false;
-	if (var.limit < 0xffff)
+	if (var.limit != 0xffff)
 		return false;
 	if (ar != 0xf3)
 		return false;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv2 4/6] KVM: VMX: fix emulation of invalid guest state.
  2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
                   ` (2 preceding siblings ...)
  2012-12-20 14:57 ` [PATCHv2 3/6] KVM: VMX: make rmode_segment_valid() more strict Gleb Natapov
@ 2012-12-20 14:57 ` Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 5/6] KVM: VMX: Do not fix segment register during vcpu initialization Gleb Natapov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 14:57 UTC (permalink / raw)
  To: kvm; +Cc: mtosatti

Currently, even when emulation of invalid guest state is enabled
(emulate_invalid_guest_state=1), segment registers are sometimes still
fixed up for entry to vm86 mode. The fix-up is avoided in enter_rmode(),
but vmx_set_segment() still does it unconditionally. This patch fixes
that.
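The real-mode fix-up that fix_rmode_seg() applies when emulate_invalid_guest_state=0 can be modeled as a standalone sketch (hypothetical names, not the kernel code): vm86 mode requires base == selector << 4 and a 64K limit, so the fix-up derives the selector from the base and paragraph-aligns the base.

```c
#include <assert.h>

struct rmode_fix {
	unsigned short selector;
	unsigned base, limit;
};

static struct rmode_fix fix_rmode_seg_model(unsigned base)
{
	struct rmode_fix f;

	f.selector = base >> 4;
	f.base = base & 0xffff0;	/* drop the non-aligned low 4 bits */
	f.limit = 0xffff;
	return f;
}
```

A base of 0x12345 is not paragraph-aligned, which is exactly the case the printk_once() in the real code warns about.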

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |  122 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 68 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7ebcac2..9dff310 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -624,6 +624,8 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 			    struct kvm_segment *var, int seg);
 static void vmx_get_segment(struct kvm_vcpu *vcpu,
 			    struct kvm_segment *var, int seg);
+static bool guest_state_valid(struct kvm_vcpu *vcpu);
+static u32 vmx_segment_access_rights(struct kvm_segment *var);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2758,18 +2760,23 @@ static __exit void hardware_unsetup(void)
 	free_kvm_area();
 }
 
-static void fix_pmode_dataseg(struct kvm_vcpu *vcpu, int seg, struct kvm_segment *save)
+static void fix_pmode_dataseg(struct kvm_vcpu *vcpu, int seg,
+		struct kvm_segment *save)
 {
-	const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
-	struct kvm_segment tmp = *save;
-
-	if (!(vmcs_readl(sf->base) == tmp.base && tmp.s)) {
-		tmp.base = vmcs_readl(sf->base);
-		tmp.selector = vmcs_read16(sf->selector);
-		tmp.dpl = tmp.selector & SELECTOR_RPL_MASK;
-		tmp.s = 1;
+	if (!emulate_invalid_guest_state) {
+		/*
+		 * CS and SS RPL should be equal during guest entry according
+		 * to VMX spec, but in reality it is not always so. Since vcpu
+		 * is in the middle of the transition from real mode to
+		 * protected mode it is safe to assume that RPL 0 is a good
+		 * default value.
+		 */
+		if (seg == VCPU_SREG_CS || seg == VCPU_SREG_SS)
+			save->selector &= ~SELECTOR_RPL_MASK;
+		save->dpl = save->selector & SELECTOR_RPL_MASK;
+		save->s = 1;
 	}
-	vmx_set_segment(vcpu, &tmp, seg);
+	vmx_set_segment(vcpu, save, seg);
 }
 
 static void enter_pmode(struct kvm_vcpu *vcpu)
@@ -2777,6 +2784,17 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
 	unsigned long flags;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	/*
+	 * Update real mode segment cache. It may not be up-to-date if a
+	 * segment register was written while the vcpu was in guest mode.
+	 */
+	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_ES], VCPU_SREG_ES);
+	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_DS], VCPU_SREG_DS);
+	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_FS], VCPU_SREG_FS);
+	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_GS], VCPU_SREG_GS);
+	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_SS], VCPU_SREG_SS);
+	vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_CS], VCPU_SREG_CS);
+
 	vmx->emulation_required = 1;
 	vmx->rmode.vm86_active = 0;
 
@@ -2794,22 +2812,12 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
 
 	update_exception_bitmap(vcpu);
 
-	if (emulate_invalid_guest_state)
-		return;
-
+	fix_pmode_dataseg(vcpu, VCPU_SREG_CS, &vmx->rmode.segs[VCPU_SREG_CS]);
+	fix_pmode_dataseg(vcpu, VCPU_SREG_SS, &vmx->rmode.segs[VCPU_SREG_SS]);
 	fix_pmode_dataseg(vcpu, VCPU_SREG_ES, &vmx->rmode.segs[VCPU_SREG_ES]);
 	fix_pmode_dataseg(vcpu, VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]);
 	fix_pmode_dataseg(vcpu, VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]);
 	fix_pmode_dataseg(vcpu, VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]);
-
-	vmx_segment_cache_clear(vmx);
-
-	vmcs_write16(GUEST_SS_SELECTOR, 0);
-	vmcs_write32(GUEST_SS_AR_BYTES, 0x93);
-
-	vmcs_write16(GUEST_CS_SELECTOR,
-		     vmcs_read16(GUEST_CS_SELECTOR) & ~SELECTOR_RPL_MASK);
-	vmcs_write32(GUEST_CS_AR_BYTES, 0x9b);
 }
 
 static gva_t rmode_tss_base(struct kvm *kvm)
@@ -2831,22 +2839,40 @@ static gva_t rmode_tss_base(struct kvm *kvm)
 static void fix_rmode_seg(int seg, struct kvm_segment *save)
 {
 	const struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
+	struct kvm_segment var = *save;
 
-	vmcs_write16(sf->selector, save->base >> 4);
-	vmcs_write32(sf->base, save->base & 0xffff0);
-	vmcs_write32(sf->limit, 0xffff);
-	vmcs_write32(sf->ar_bytes, 0xf3);
-	if (save->base & 0xf)
-		printk_once(KERN_WARNING "kvm: segment base is not paragraph"
-			    " aligned when entering protected mode (seg=%d)",
-			    seg);
+	var.dpl = 0x3;
+	if (seg == VCPU_SREG_CS)
+		var.type = 0x3;
+
+	if (!emulate_invalid_guest_state) {
+		var.selector = var.base >> 4;
+		var.base = var.base & 0xffff0;
+		var.limit = 0xffff;
+		var.g = 0;
+		var.db = 0;
+		var.present = 1;
+		var.s = 1;
+		var.l = 0;
+		var.unusable = 0;
+		var.type = 0x3;
+		var.avl = 0;
+		if (save->base & 0xf)
+			printk_once(KERN_WARNING "kvm: segment base is not "
+					"paragraph aligned when entering "
+					"protected mode (seg=%d)", seg);
+	}
+
+	vmcs_write16(sf->selector, var.selector);
+	vmcs_write32(sf->base, var.base);
+	vmcs_write32(sf->limit, var.limit);
+	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(&var));
 }
 
 static void enter_rmode(struct kvm_vcpu *vcpu)
 {
 	unsigned long flags;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	struct kvm_segment var;
 
 	if (enable_unrestricted_guest)
 		return;
@@ -2862,7 +2888,6 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	vmx->emulation_required = 1;
 	vmx->rmode.vm86_active = 1;
 
-
 	/*
 	 * Very old userspace does not call KVM_SET_TSS_ADDR before entering
 	 * vcpu. Call it here with phys address pointing 16M below 4G.
@@ -2890,28 +2915,13 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	vmcs_writel(GUEST_CR4, vmcs_readl(GUEST_CR4) | X86_CR4_VME);
 	update_exception_bitmap(vcpu);
 
-	if (emulate_invalid_guest_state)
-		goto continue_rmode;
-
-	vmx_get_segment(vcpu, &var, VCPU_SREG_SS);
-	vmx_set_segment(vcpu, &var, VCPU_SREG_SS);
+	fix_rmode_seg(VCPU_SREG_SS, &vmx->rmode.segs[VCPU_SREG_SS]);
+	fix_rmode_seg(VCPU_SREG_CS, &vmx->rmode.segs[VCPU_SREG_CS]);
+	fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.segs[VCPU_SREG_ES]);
+	fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]);
+	fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]);
+	fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]);
 
-	vmx_get_segment(vcpu, &var, VCPU_SREG_CS);
-	vmx_set_segment(vcpu, &var, VCPU_SREG_CS);
-
-	vmx_get_segment(vcpu, &var, VCPU_SREG_ES);
-	vmx_set_segment(vcpu, &var, VCPU_SREG_ES);
-
-	vmx_get_segment(vcpu, &var, VCPU_SREG_DS);
-	vmx_set_segment(vcpu, &var, VCPU_SREG_DS);
-
-	vmx_get_segment(vcpu, &var, VCPU_SREG_GS);
-	vmx_set_segment(vcpu, &var, VCPU_SREG_GS);
-
-	vmx_get_segment(vcpu, &var, VCPU_SREG_FS);
-	vmx_set_segment(vcpu, &var, VCPU_SREG_FS);
-
-continue_rmode:
 	kvm_mmu_reset_context(vcpu);
 }
 
@@ -3278,7 +3288,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 			vmcs_write16(sf->selector, var->selector);
 		else if (var->s)
 			fix_rmode_seg(seg, &vmx->rmode.segs[seg]);
-		return;
+		goto out;
 	}
 
 	vmcs_writel(sf->base, var->base);
@@ -3300,6 +3310,10 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 		var->type |= 0x1; /* Accessed */
 
 	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
+
+out:
+	if (!vmx->emulation_required)
+		vmx->emulation_required = !guest_state_valid(vcpu);
 }
 
 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv2 5/6] KVM: VMX: Do not fix segment register during vcpu initialization.
  2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
                   ` (3 preceding siblings ...)
  2012-12-20 14:57 ` [PATCHv2 4/6] KVM: VMX: fix emulation of invalid guest state Gleb Natapov
@ 2012-12-20 14:57 ` Gleb Natapov
  2012-12-20 14:57 ` [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode Gleb Natapov
  2013-01-03 12:53 ` [PATCHv2 0/6] more VMX real mode emulation fixes Marcelo Tosatti
  6 siblings, 0 replies; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 14:57 UTC (permalink / raw)
  To: kvm; +Cc: mtosatti

Segment registers will be fixed up according to the current emulation
policy when the vcpu switches to real mode for the first time.
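The now-unconditional initial access-rights value set by seg_setup() can be sketched as (hypothetical name, not the kernel code):

```c
#include <assert.h>

/*
 * Every segment starts as a present, DPL-0, read/write, accessed data
 * segment (0x93); CS additionally gets the code bit (0x08).
 */
static unsigned seg_setup_ar_model(int is_cs)
{
	unsigned ar = 0x93;

	if (is_cs)
		ar |= 0x08;	/* code segment */
	return ar;
}
```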

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |   18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9dff310..a101dd4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3621,12 +3621,9 @@ static void seg_setup(int seg)
 	vmcs_write16(sf->selector, 0);
 	vmcs_writel(sf->base, 0);
 	vmcs_write32(sf->limit, 0xffff);
-	if (enable_unrestricted_guest) {
-		ar = 0x93;
-		if (seg == VCPU_SREG_CS)
-			ar |= 0x08; /* code segment */
-	} else
-		ar = 0xf3;
+	ar = 0x93;
+	if (seg == VCPU_SREG_CS)
+		ar |= 0x08; /* code segment */
 
 	vmcs_write32(sf->ar_bytes, ar);
 }
@@ -3967,14 +3964,9 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 	vmx_segment_cache_clear(vmx);
 
 	seg_setup(VCPU_SREG_CS);
-	/*
-	 * GUEST_CS_BASE should really be 0xffff0000, but VT vm86 mode
-	 * insists on having GUEST_CS_BASE == GUEST_CS_SELECTOR << 4.  Sigh.
-	 */
-	if (kvm_vcpu_is_bsp(&vmx->vcpu)) {
+	if (kvm_vcpu_is_bsp(&vmx->vcpu))
 		vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
-		vmcs_writel(GUEST_CS_BASE, 0x000f0000);
-	} else {
+	else {
 		vmcs_write16(GUEST_CS_SELECTOR, vmx->vcpu.arch.sipi_vector << 8);
 		vmcs_writel(GUEST_CS_BASE, vmx->vcpu.arch.sipi_vector << 12);
 	}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.
  2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
                   ` (4 preceding siblings ...)
  2012-12-20 14:57 ` [PATCHv2 5/6] KVM: VMX: Do not fix segment register during vcpu initialization Gleb Natapov
@ 2012-12-20 14:57 ` Gleb Natapov
  2012-12-20 15:25   ` Alex Williamson
  2013-01-03 12:53 ` [PATCHv2 0/6] more VMX real mode emulation fixes Marcelo Tosatti
  6 siblings, 1 reply; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 14:57 UTC (permalink / raw)
  To: kvm; +Cc: mtosatti

With emulate_invalid_guest_state=0, if a vcpu is in real mode, VMX can
enter the vcpu with a smaller segment limit than the guest configured.
If the guest tries to access memory past this limit it gets a #GP, at
which point the instruction is emulated with the correct segment limit
applied. If IO is detected during this emulation it is not handled
correctly: the vcpu thread should exit to userspace to serve the IO, but
it returns to the guest instead. Since emulation is not completed until
userspace completes the IO, the faulting instruction is re-executed ad
infinitum.

The patch fixes that by exiting to userspace if IO happens during
instruction emulation.
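The essential return-value plumbing can be modeled as a standalone sketch (hypothetical names and enum, not the kernel code): the handler must propagate "exit to userspace" when emulation leaves IO pending, instead of unconditionally resuming the guest.

```c
#include <assert.h>

enum emul_result { EMUL_DONE, EMUL_IO_PENDING };

static int handle_gp_model(enum emul_result r)
{
	if (r == EMUL_DONE)
		return 1;	/* emulation finished, resume the guest */
	return 0;	/* let userspace serve the IO first */
}
```

Before this fix the handler effectively returned 1 in both cases, so the guest re-executed the faulting instruction with the IO still unserved.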

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/vmx.c |   75 ++++++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a101dd4..ab5933f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4230,28 +4230,9 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
 	return 0;
 }
 
-static int handle_rmode_exception(struct kvm_vcpu *vcpu,
-				  int vec, u32 err_code)
+static bool rmode_exception(struct kvm_vcpu *vcpu, int vec) 
 {
-	/*
-	 * Instruction with address size override prefix opcode 0x67
-	 * Cause the #SS fault with 0 error code in VM86 mode.
-	 */
-	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0)
-		if (emulate_instruction(vcpu, 0) == EMULATE_DONE)
-			return 1;
-	/*
-	 * Forward all other exceptions that are valid in real mode.
-	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
-	 *        the required debugging infrastructure rework.
-	 */
 	switch (vec) {
-	case DB_VECTOR:
-		if (vcpu->guest_debug &
-		    (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
-			return 0;
-		kvm_queue_exception(vcpu, vec);
-		return 1;
 	case BP_VECTOR:
 		/*
 		 * Update instruction length as we may reinject the exception
@@ -4260,7 +4241,12 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
 		to_vmx(vcpu)->vcpu.arch.event_exit_inst_len =
 			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
-			return 0;
+			return false;
+		/* fall through */
+	case DB_VECTOR:
+		if (vcpu->guest_debug &
+			(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
+			return false;
 		/* fall through */
 	case DE_VECTOR:
 	case OF_VECTOR:
@@ -4270,10 +4256,37 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
 	case SS_VECTOR:
 	case GP_VECTOR:
 	case MF_VECTOR:
-		kvm_queue_exception(vcpu, vec);
-		return 1;
+		return true;
+	break;
 	}
-	return 0;
+	return false;
+}
+
+static int handle_rmode_exception(struct kvm_vcpu *vcpu,
+				  int vec, u32 err_code)
+{
+	/*
+	 * Instruction with address size override prefix opcode 0x67
+	 * Cause the #SS fault with 0 error code in VM86 mode.
+	 */
+	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) {
+		if (emulate_instruction(vcpu, 0) == EMULATE_DONE) {
+			if (vcpu->arch.halt_request) {
+				vcpu->arch.halt_request = 0;
+				return kvm_emulate_halt(vcpu);
+			}
+			return 1;
+		}
+		return 0;
+	}
+
+	/*
+	 * Forward all other exceptions that are valid in real mode.
+	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
+	 *        the required debugging infrastructure rework.
+	 */
+	kvm_queue_exception(vcpu, vec);
+	return 1;
 }
 
 /*
@@ -4361,17 +4374,11 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
 	}
 
-	if (vmx->rmode.vm86_active &&
-	    handle_rmode_exception(vcpu, intr_info & INTR_INFO_VECTOR_MASK,
-								error_code)) {
-		if (vcpu->arch.halt_request) {
-			vcpu->arch.halt_request = 0;
-			return kvm_emulate_halt(vcpu);
-		}
-		return 1;
-	}
-
 	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
+
+	if (vmx->rmode.vm86_active && rmode_exception(vcpu, ex_no))
+		return handle_rmode_exception(vcpu, ex_no, error_code);
+
 	switch (ex_no) {
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.
  2012-12-20 14:57 ` [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode Gleb Natapov
@ 2012-12-20 15:25   ` Alex Williamson
  2012-12-20 16:43     ` Gleb Natapov
  2012-12-22 11:06     ` Avi Kivity
  0 siblings, 2 replies; 21+ messages in thread
From: Alex Williamson @ 2012-12-20 15:25 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, mtosatti

On Thu, 2012-12-20 at 16:57 +0200, Gleb Natapov wrote:
> With emulate_invalid_guest_state=0, if a vcpu is in real mode, VMX can
> enter the vcpu with a smaller segment limit than the guest configured.
> If the guest tries to access memory past this limit it gets a #GP, at
> which point the instruction is emulated with the correct segment limit
> applied. If IO is detected during this emulation it is not handled
> correctly: the vcpu thread should exit to userspace to serve the IO,
> but it returns to the guest instead. Since emulation is not completed
> until userspace completes the IO, the faulting instruction is
> re-executed ad infinitum.
> 
> The patch fixes that by exiting to userspace if IO happens during
> instruction emulation.

Thanks for finding the right fix, Gleb.  This originally came about from
an experiment in lazily mapping assigned device MMIO BARs.  That's
something I think might still have value for conserving memory slots,
but now I have to be aware of this bug.  That makes me wonder if we need
to expose a capability for the fix.  I could also just avoid the lazy
path if a device includes a ROM, but it's still difficult to know how
long such a workaround is required.  Thanks,

Alex

> Reported-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/kvm/vmx.c |   75 ++++++++++++++++++++++++++++------------------------
>  1 file changed, 41 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a101dd4..ab5933f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4230,28 +4230,9 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
>  	return 0;
>  }
>  
> -static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> -				  int vec, u32 err_code)
> +static bool rmode_exception(struct kvm_vcpu *vcpu, int vec) 
>  {
> -	/*
> -	 * Instruction with address size override prefix opcode 0x67
> -	 * Cause the #SS fault with 0 error code in VM86 mode.
> -	 */
> -	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0)
> -		if (emulate_instruction(vcpu, 0) == EMULATE_DONE)
> -			return 1;
> -	/*
> -	 * Forward all other exceptions that are valid in real mode.
> -	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> -	 *        the required debugging infrastructure rework.
> -	 */
>  	switch (vec) {
> -	case DB_VECTOR:
> -		if (vcpu->guest_debug &
> -		    (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> -			return 0;
> -		kvm_queue_exception(vcpu, vec);
> -		return 1;
>  	case BP_VECTOR:
>  		/*
>  		 * Update instruction length as we may reinject the exception
> @@ -4260,7 +4241,12 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
>  		to_vmx(vcpu)->vcpu.arch.event_exit_inst_len =
>  			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
>  		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
> -			return 0;
> +			return false;
> +		/* fall through */
> +	case DB_VECTOR:
> +		if (vcpu->guest_debug &
> +			(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> +			return false;
>  		/* fall through */
>  	case DE_VECTOR:
>  	case OF_VECTOR:
> @@ -4270,10 +4256,37 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
>  	case SS_VECTOR:
>  	case GP_VECTOR:
>  	case MF_VECTOR:
> -		kvm_queue_exception(vcpu, vec);
> -		return 1;
> +		return true;
> +	break;
>  	}
> -	return 0;
> +	return false;
> +}
> +
> +static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> +				  int vec, u32 err_code)
> +{
> +	/*
> +	 * Instruction with address size override prefix opcode 0x67
> +	 * Cause the #SS fault with 0 error code in VM86 mode.
> +	 */
> +	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) {
> +		if (emulate_instruction(vcpu, 0) == EMULATE_DONE) {
> +			if (vcpu->arch.halt_request) {
> +				vcpu->arch.halt_request = 0;
> +				return kvm_emulate_halt(vcpu);
> +			}
> +			return 1;
> +		}
> +		return 0;
> +	}
> +
> +	/*
> +	 * Forward all other exceptions that are valid in real mode.
> +	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> +	 *        the required debugging infrastructure rework.
> +	 */
> +	kvm_queue_exception(vcpu, vec);
> +	return 1;
>  }
>  
>  /*
> @@ -4361,17 +4374,11 @@ static int handle_exception(struct kvm_vcpu *vcpu)
>  		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
>  	}
>  
> -	if (vmx->rmode.vm86_active &&
> -	    handle_rmode_exception(vcpu, intr_info & INTR_INFO_VECTOR_MASK,
> -								error_code)) {
> -		if (vcpu->arch.halt_request) {
> -			vcpu->arch.halt_request = 0;
> -			return kvm_emulate_halt(vcpu);
> -		}
> -		return 1;
> -	}
> -
>  	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
> +
> +	if (vmx->rmode.vm86_active && rmode_exception(vcpu, ex_no))
> +		return handle_rmode_exception(vcpu, ex_no, error_code);
> +
>  	switch (ex_no) {
>  	case DB_VECTOR:
>  		dr6 = vmcs_readl(EXIT_QUALIFICATION);




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.
  2012-12-20 15:25   ` Alex Williamson
@ 2012-12-20 16:43     ` Gleb Natapov
  2012-12-20 22:58       ` Alex Williamson
  2012-12-22 11:06     ` Avi Kivity
  1 sibling, 1 reply; 21+ messages in thread
From: Gleb Natapov @ 2012-12-20 16:43 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, mtosatti

On Thu, Dec 20, 2012 at 08:25:41AM -0700, Alex Williamson wrote:
> On Thu, 2012-12-20 at 16:57 +0200, Gleb Natapov wrote:
> > With emulate_invalid_guest_state=0 if a vcpu is in real mode VMX can
> > enter the vcpu with smaller segment limit than guest configured.  If the
> > guest tries to access past this limit it will get #GP at which point
> > instruction will be emulated with correct segment limit applied. If
> > during the emulation IO is detected it is not handled correctly. Vcpu
> > thread should exit to userspace to serve the IO, but it returns to the
> > guest instead.  Since emulation is not completed till userspace completes
> > the IO the faulty instruction is re-executed ad infinitum.
> > 
> > The patch fixes that by exiting to userspace if IO happens during
> > instruction emulation.
> 
> Thanks for finding the right fix Gleb.  This originally came about from
> an experiment in lazily mapping assigned device MMIO BARs.  That's
> something I think might still have value for conserving memory slots,
> but now I have to be aware of this bug.  That makes me wonder if we need
> to expose a capability for the fix.  I could also just avoid the lazy
> path if a device includes a ROM, but it's still difficult to know how
> long such a workaround is required.  Thanks,
> 
I do not like the idea of having capability flags for emulator bugs. We
have too many of those.  And it is hard to know at which point exactly
this capability should be set.  Case in point: apply this patch series
and the one it depends upon, but do not apply the last patch that actually
fixes the bug you've found. Run your test. It will work. The test
case will also work on older kernels on AMD and on modern Intel CPUs with
the unrestricted guest capability.
 
> Alex
> 
> > Reported-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > ---
> >  arch/x86/kvm/vmx.c |   75 ++++++++++++++++++++++++++++------------------------
> >  1 file changed, 41 insertions(+), 34 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index a101dd4..ab5933f 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -4230,28 +4230,9 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
> >  	return 0;
> >  }
> >  
> > -static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > -				  int vec, u32 err_code)
> > +static bool rmode_exception(struct kvm_vcpu *vcpu, int vec) 
> >  {
> > -	/*
> > -	 * Instruction with address size override prefix opcode 0x67
> > -	 * Cause the #SS fault with 0 error code in VM86 mode.
> > -	 */
> > -	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0)
> > -		if (emulate_instruction(vcpu, 0) == EMULATE_DONE)
> > -			return 1;
> > -	/*
> > -	 * Forward all other exceptions that are valid in real mode.
> > -	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> > -	 *        the required debugging infrastructure rework.
> > -	 */
> >  	switch (vec) {
> > -	case DB_VECTOR:
> > -		if (vcpu->guest_debug &
> > -		    (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> > -			return 0;
> > -		kvm_queue_exception(vcpu, vec);
> > -		return 1;
> >  	case BP_VECTOR:
> >  		/*
> >  		 * Update instruction length as we may reinject the exception
> > @@ -4260,7 +4241,12 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> >  		to_vmx(vcpu)->vcpu.arch.event_exit_inst_len =
> >  			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> >  		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
> > -			return 0;
> > +			return false;
> > +		/* fall through */
> > +	case DB_VECTOR:
> > +		if (vcpu->guest_debug &
> > +			(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> > +			return false;
> >  		/* fall through */
> >  	case DE_VECTOR:
> >  	case OF_VECTOR:
> > @@ -4270,10 +4256,37 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> >  	case SS_VECTOR:
> >  	case GP_VECTOR:
> >  	case MF_VECTOR:
> > -		kvm_queue_exception(vcpu, vec);
> > -		return 1;
> > +		return true;
> > +	break;
> >  	}
> > -	return 0;
> > +	return false;
> > +}
> > +
> > +static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > +				  int vec, u32 err_code)
> > +{
> > +	/*
> > +	 * Instruction with address size override prefix opcode 0x67
> > +	 * Cause the #SS fault with 0 error code in VM86 mode.
> > +	 */
> > +	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) {
> > +		if (emulate_instruction(vcpu, 0) == EMULATE_DONE) {
> > +			if (vcpu->arch.halt_request) {
> > +				vcpu->arch.halt_request = 0;
> > +				return kvm_emulate_halt(vcpu);
> > +			}
> > +			return 1;
> > +		}
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * Forward all other exceptions that are valid in real mode.
> > +	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> > +	 *        the required debugging infrastructure rework.
> > +	 */
> > +	kvm_queue_exception(vcpu, vec);
> > +	return 1;
> >  }
> >  
> >  /*
> > @@ -4361,17 +4374,11 @@ static int handle_exception(struct kvm_vcpu *vcpu)
> >  		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
> >  	}
> >  
> > -	if (vmx->rmode.vm86_active &&
> > -	    handle_rmode_exception(vcpu, intr_info & INTR_INFO_VECTOR_MASK,
> > -								error_code)) {
> > -		if (vcpu->arch.halt_request) {
> > -			vcpu->arch.halt_request = 0;
> > -			return kvm_emulate_halt(vcpu);
> > -		}
> > -		return 1;
> > -	}
> > -
> >  	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
> > +
> > +	if (vmx->rmode.vm86_active && rmode_exception(vcpu, ex_no))
> > +		return handle_rmode_exception(vcpu, ex_no, error_code);
> > +
> >  	switch (ex_no) {
> >  	case DB_VECTOR:
> >  		dr6 = vmcs_readl(EXIT_QUALIFICATION);
> 
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread
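The failure mode in the quoted commit message can be reduced to a toy model: when emulation of the faulting instruction hits IO, the vcpu thread must exit to userspace so the IO can be served; merely re-entering the guest re-executes the same faulting instruction forever. The enum names below are illustrative stand-ins, not KVM's actual EMULATE_* or exit codes:

```c
/* Illustrative outcomes of one emulation attempt (not KVM's real codes). */
enum emu_result { EMU_DONE, EMU_IO_NEEDED, EMU_FAIL };

/* What the vcpu loop does next. */
enum loop_action { REENTER_GUEST, EXIT_TO_USERSPACE, REPORT_ERROR };

/*
 * Pre-fix behavior: pending IO was treated like completed emulation,
 * so the vcpu re-entered the guest, took the same #GP again, and the
 * instruction was re-executed ad infinitum.
 */
static enum loop_action after_emulation_buggy(enum emu_result r)
{
    return r == EMU_FAIL ? REPORT_ERROR : REENTER_GUEST;
}

/* Post-fix behavior: pending IO is surfaced to userspace first. */
static enum loop_action after_emulation_fixed(enum emu_result r)
{
    switch (r) {
    case EMU_IO_NEEDED:
        return EXIT_TO_USERSPACE;  /* userspace serves the IO, then resumes */
    case EMU_DONE:
        return REENTER_GUEST;
    default:
        return REPORT_ERROR;
    }
}
```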

* Re: [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.
  2012-12-20 16:43     ` Gleb Natapov
@ 2012-12-20 22:58       ` Alex Williamson
  2012-12-21  8:01         ` Gleb Natapov
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2012-12-20 22:58 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, mtosatti

On Thu, 2012-12-20 at 18:43 +0200, Gleb Natapov wrote:
> On Thu, Dec 20, 2012 at 08:25:41AM -0700, Alex Williamson wrote:
> > On Thu, 2012-12-20 at 16:57 +0200, Gleb Natapov wrote:
> > > With emulate_invalid_guest_state=0 if a vcpu is in real mode VMX can
> > > enter the vcpu with smaller segment limit than guest configured.  If the
> > > guest tries to access past this limit it will get #GP at which point
> > > instruction will be emulated with correct segment limit applied. If
> > > during the emulation IO is detected it is not handled correctly. Vcpu
> > > thread should exit to userspace to serve the IO, but it returns to the
> > > guest instead.  Since emulation is not completed till userspace completes
> > > the IO the faulty instruction is re-executed ad infinitum.
> > > 
> > > The patch fixes that by exiting to userspace if IO happens during
> > > instruction emulation.
> > 
> > Thanks for finding the right fix Gleb.  This originally came about from
> > an experiment in lazily mapping assigned device MMIO BARs.  That's
> > something I think might still have value for conserving memory slots,
> > but now I have to be aware of this bug.  That makes me wonder if we need
> > to expose a capability for the fix.  I could also just avoid the lazy
> > path if a device includes a ROM, but it's still difficult to know how
> > long such a workaround is required.  Thanks,
> > 
> I do not like the idea of having capability flags for emulator bugs. We
> have too many of those.  And it is hard to know at which point exactly
> this capability should be set.  Case in point: apply this patch series
> and the one it depends upon, but do not apply the last patch that actually
> fixes the bug you've found. Run your test. It will work. The test
> case will also work on older kernels on AMD and on modern Intel CPUs with
> the unrestricted guest capability.

I agree, capabilities are terrible; the only thing worse is having an
un-probe-able bug that could randomly bite us from an unknown option
ROM.  The capability doesn't eliminate that entirely though; we could
still have an MMIO BAR that must be emulated because of its size.  I'll
just have to avoid it as best I can.  Thanks,

Alex
 
> > > Reported-by: Alex Williamson <alex.williamson@redhat.com>
> > > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > > ---
> > >  arch/x86/kvm/vmx.c |   75 ++++++++++++++++++++++++++++------------------------
> > >  1 file changed, 41 insertions(+), 34 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index a101dd4..ab5933f 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -4230,28 +4230,9 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
> > >  	return 0;
> > >  }
> > >  
> > > -static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > > -				  int vec, u32 err_code)
> > > +static bool rmode_exception(struct kvm_vcpu *vcpu, int vec) 
> > >  {
> > > -	/*
> > > -	 * Instruction with address size override prefix opcode 0x67
> > > -	 * Cause the #SS fault with 0 error code in VM86 mode.
> > > -	 */
> > > -	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0)
> > > -		if (emulate_instruction(vcpu, 0) == EMULATE_DONE)
> > > -			return 1;
> > > -	/*
> > > -	 * Forward all other exceptions that are valid in real mode.
> > > -	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> > > -	 *        the required debugging infrastructure rework.
> > > -	 */
> > >  	switch (vec) {
> > > -	case DB_VECTOR:
> > > -		if (vcpu->guest_debug &
> > > -		    (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> > > -			return 0;
> > > -		kvm_queue_exception(vcpu, vec);
> > > -		return 1;
> > >  	case BP_VECTOR:
> > >  		/*
> > >  		 * Update instruction length as we may reinject the exception
> > > @@ -4260,7 +4241,12 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > >  		to_vmx(vcpu)->vcpu.arch.event_exit_inst_len =
> > >  			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> > >  		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
> > > -			return 0;
> > > +			return false;
> > > +		/* fall through */
> > > +	case DB_VECTOR:
> > > +		if (vcpu->guest_debug &
> > > +			(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> > > +			return false;
> > >  		/* fall through */
> > >  	case DE_VECTOR:
> > >  	case OF_VECTOR:
> > > @@ -4270,10 +4256,37 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > >  	case SS_VECTOR:
> > >  	case GP_VECTOR:
> > >  	case MF_VECTOR:
> > > -		kvm_queue_exception(vcpu, vec);
> > > -		return 1;
> > > +		return true;
> > > +	break;
> > >  	}
> > > -	return 0;
> > > +	return false;
> > > +}
> > > +
> > > +static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > > +				  int vec, u32 err_code)
> > > +{
> > > +	/*
> > > +	 * Instruction with address size override prefix opcode 0x67
> > > +	 * Cause the #SS fault with 0 error code in VM86 mode.
> > > +	 */
> > > +	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) {
> > > +		if (emulate_instruction(vcpu, 0) == EMULATE_DONE) {
> > > +			if (vcpu->arch.halt_request) {
> > > +				vcpu->arch.halt_request = 0;
> > > +				return kvm_emulate_halt(vcpu);
> > > +			}
> > > +			return 1;
> > > +		}
> > > +		return 0;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Forward all other exceptions that are valid in real mode.
> > > +	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> > > +	 *        the required debugging infrastructure rework.
> > > +	 */
> > > +	kvm_queue_exception(vcpu, vec);
> > > +	return 1;
> > >  }
> > >  
> > >  /*
> > > @@ -4361,17 +4374,11 @@ static int handle_exception(struct kvm_vcpu *vcpu)
> > >  		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
> > >  	}
> > >  
> > > -	if (vmx->rmode.vm86_active &&
> > > -	    handle_rmode_exception(vcpu, intr_info & INTR_INFO_VECTOR_MASK,
> > > -								error_code)) {
> > > -		if (vcpu->arch.halt_request) {
> > > -			vcpu->arch.halt_request = 0;
> > > -			return kvm_emulate_halt(vcpu);
> > > -		}
> > > -		return 1;
> > > -	}
> > > -
> > >  	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
> > > +
> > > +	if (vmx->rmode.vm86_active && rmode_exception(vcpu, ex_no))
> > > +		return handle_rmode_exception(vcpu, ex_no, error_code);
> > > +
> > >  	switch (ex_no) {
> > >  	case DB_VECTOR:
> > >  		dr6 = vmcs_readl(EXIT_QUALIFICATION);
> > 
> > 
> 
> --
> 			Gleb.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.
  2012-12-20 22:58       ` Alex Williamson
@ 2012-12-21  8:01         ` Gleb Natapov
  0 siblings, 0 replies; 21+ messages in thread
From: Gleb Natapov @ 2012-12-21  8:01 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, mtosatti

On Thu, Dec 20, 2012 at 03:58:49PM -0700, Alex Williamson wrote:
> On Thu, 2012-12-20 at 18:43 +0200, Gleb Natapov wrote:
> > On Thu, Dec 20, 2012 at 08:25:41AM -0700, Alex Williamson wrote:
> > > On Thu, 2012-12-20 at 16:57 +0200, Gleb Natapov wrote:
> > > > With emulate_invalid_guest_state=0 if a vcpu is in real mode VMX can
> > > > enter the vcpu with smaller segment limit than guest configured.  If the
> > > > guest tries to access past this limit it will get #GP at which point
> > > > instruction will be emulated with correct segment limit applied. If
> > > > during the emulation IO is detected it is not handled correctly. Vcpu
> > > > thread should exit to userspace to serve the IO, but it returns to the
> > > > guest instead.  Since emulation is not completed till userspace completes
> > > > the IO the faulty instruction is re-executed ad infinitum.
> > > > 
> > > > The patch fixes that by exiting to userspace if IO happens during
> > > > instruction emulation.
> > > 
> > > Thanks for finding the right fix Gleb.  This originally came about from
> > > an experiment in lazily mapping assigned device MMIO BARs.  That's
> > > something I think might still have value for conserving memory slots,
> > > but now I have to be aware of this bug.  That makes me wonder if we need
> > > to expose a capability for the fix.  I could also just avoid the lazy
> > > path if a device includes a ROM, but it's still difficult to know how
> > > long such a workaround is required.  Thanks,
> > > 
> > I do not like the idea of having capability flags for emulator bugs. We
> > have too many of those.  And it is hard to know at which point exactly
> > this capability should be set.  Case in point: apply this patch series
> > and the one it depends upon, but do not apply the last patch that actually
> > fixes the bug you've found. Run your test. It will work. The test
> > case will also work on older kernels on AMD and on modern Intel CPUs with
> > the unrestricted guest capability.
> 
> I agree, capabilities are terrible; the only thing worse is having an
> un-probe-able bug that could randomly bite us from an unknown option
> ROM.  The capability doesn't eliminate that entirely though; we could
> still have an MMIO BAR that must be emulated because of its size.  I'll
> just have to avoid it as best I can.  Thanks,
> 
Precisely (regarding the "capability doesn't eliminate that entirely"
part). You will not hit the bug with the patch series applied but without
the last patch, because option ROMs are now fully emulated. The reason is
that, according to the BIOS spec, option ROMs are called in big real mode
and we cannot enter the vcpu in this mode; the fact that you hit the #GP
bug is the result of another bug that caused us to enter the vcpu anyway
(with incorrect segment registers). Since option ROMs are now fully
emulated, we cannot tell for sure which ones will work and which will not
until we implement all real mode instructions in the emulator. We are
almost, but not yet, there. Note that this is true regardless of your
changes: when you assign a device with an option ROM, it may fail due to
missing emulation.

> Alex
>  
> > > > Reported-by: Alex Williamson <alex.williamson@redhat.com>
> > > > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > > > ---
> > > >  arch/x86/kvm/vmx.c |   75 ++++++++++++++++++++++++++++------------------------
> > > >  1 file changed, 41 insertions(+), 34 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > index a101dd4..ab5933f 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -4230,28 +4230,9 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
> > > >  	return 0;
> > > >  }
> > > >  
> > > > -static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > > > -				  int vec, u32 err_code)
> > > > +static bool rmode_exception(struct kvm_vcpu *vcpu, int vec) 
> > > >  {
> > > > -	/*
> > > > -	 * Instruction with address size override prefix opcode 0x67
> > > > -	 * Cause the #SS fault with 0 error code in VM86 mode.
> > > > -	 */
> > > > -	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0)
> > > > -		if (emulate_instruction(vcpu, 0) == EMULATE_DONE)
> > > > -			return 1;
> > > > -	/*
> > > > -	 * Forward all other exceptions that are valid in real mode.
> > > > -	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> > > > -	 *        the required debugging infrastructure rework.
> > > > -	 */
> > > >  	switch (vec) {
> > > > -	case DB_VECTOR:
> > > > -		if (vcpu->guest_debug &
> > > > -		    (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> > > > -			return 0;
> > > > -		kvm_queue_exception(vcpu, vec);
> > > > -		return 1;
> > > >  	case BP_VECTOR:
> > > >  		/*
> > > >  		 * Update instruction length as we may reinject the exception
> > > > @@ -4260,7 +4241,12 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > > >  		to_vmx(vcpu)->vcpu.arch.event_exit_inst_len =
> > > >  			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
> > > >  		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
> > > > -			return 0;
> > > > +			return false;
> > > > +		/* fall through */
> > > > +	case DB_VECTOR:
> > > > +		if (vcpu->guest_debug &
> > > > +			(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
> > > > +			return false;
> > > >  		/* fall through */
> > > >  	case DE_VECTOR:
> > > >  	case OF_VECTOR:
> > > > @@ -4270,10 +4256,37 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > > >  	case SS_VECTOR:
> > > >  	case GP_VECTOR:
> > > >  	case MF_VECTOR:
> > > > -		kvm_queue_exception(vcpu, vec);
> > > > -		return 1;
> > > > +		return true;
> > > > +	break;
> > > >  	}
> > > > -	return 0;
> > > > +	return false;
> > > > +}
> > > > +
> > > > +static int handle_rmode_exception(struct kvm_vcpu *vcpu,
> > > > +				  int vec, u32 err_code)
> > > > +{
> > > > +	/*
> > > > +	 * Instruction with address size override prefix opcode 0x67
> > > > +	 * Cause the #SS fault with 0 error code in VM86 mode.
> > > > +	 */
> > > > +	if (((vec == GP_VECTOR) || (vec == SS_VECTOR)) && err_code == 0) {
> > > > +		if (emulate_instruction(vcpu, 0) == EMULATE_DONE) {
> > > > +			if (vcpu->arch.halt_request) {
> > > > +				vcpu->arch.halt_request = 0;
> > > > +				return kvm_emulate_halt(vcpu);
> > > > +			}
> > > > +			return 1;
> > > > +		}
> > > > +		return 0;
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * Forward all other exceptions that are valid in real mode.
> > > > +	 * FIXME: Breaks guest debugging in real mode, needs to be fixed with
> > > > +	 *        the required debugging infrastructure rework.
> > > > +	 */
> > > > +	kvm_queue_exception(vcpu, vec);
> > > > +	return 1;
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -4361,17 +4374,11 @@ static int handle_exception(struct kvm_vcpu *vcpu)
> > > >  		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
> > > >  	}
> > > >  
> > > > -	if (vmx->rmode.vm86_active &&
> > > > -	    handle_rmode_exception(vcpu, intr_info & INTR_INFO_VECTOR_MASK,
> > > > -								error_code)) {
> > > > -		if (vcpu->arch.halt_request) {
> > > > -			vcpu->arch.halt_request = 0;
> > > > -			return kvm_emulate_halt(vcpu);
> > > > -		}
> > > > -		return 1;
> > > > -	}
> > > > -
> > > >  	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
> > > > +
> > > > +	if (vmx->rmode.vm86_active && rmode_exception(vcpu, ex_no))
> > > > +		return handle_rmode_exception(vcpu, ex_no, error_code);
> > > > +
> > > >  	switch (ex_no) {
> > > >  	case DB_VECTOR:
> > > >  		dr6 = vmcs_readl(EXIT_QUALIFICATION);
> > > 
> > > 
> > 
> > --
> > 			Gleb.
> 
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.
  2012-12-20 15:25   ` Alex Williamson
  2012-12-20 16:43     ` Gleb Natapov
@ 2012-12-22 11:06     ` Avi Kivity
  2012-12-23  9:44       ` Gleb Natapov
  1 sibling, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-12-22 11:06 UTC (permalink / raw)
  To: kvm

Alex Williamson <alex.williamson <at> redhat.com> writes:

> Thanks for finding the right fix Gleb.  This originally came about from
> an experiment in lazily mapping assigned device MMIO BARs.  That's
> something I think might still have value for conserving memory slots,
> but now I have to be aware of this bug.  That makes me wonder if we need
> to expose a capability for the fix.  I could also just avoid the lazy
> path if a device includes a ROM, but it's still difficult to know how
> long such a workaround is required.  Thanks,

Just backport the fix to all stable kernels and you're done.

(posting from gmane so cc: list lost)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode.
  2012-12-22 11:06     ` Avi Kivity
@ 2012-12-23  9:44       ` Gleb Natapov
  0 siblings, 0 replies; 21+ messages in thread
From: Gleb Natapov @ 2012-12-23  9:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, alex.williamson, mtosatti

On Sat, Dec 22, 2012 at 11:06:39AM +0000, Avi Kivity wrote:
> Alex Williamson <alex.williamson <at> redhat.com> writes:
> 
> > Thanks for finding the right fix Gleb.  This originally came about from
> > an experiment in lazily mapping assigned device MMIO BARs.  That's
> > something I think might still have value for conserving memory slots,
> > but now I have to be aware of this bug.  That makes me wonder if we need
> > to expose a capability for the fix.  I could also just avoid the lazy
> > path if a device includes a ROM, but it's still difficult to know how
> > long such a workaround is required.  Thanks,
> 
> Just backport the fix to all stable kernels and you're done.
> 
Sounds like a plan.

> (posting from gmane so cc: list lost)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 0/6] more VMX real mode emulation fixes
  2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
                   ` (5 preceding siblings ...)
  2012-12-20 14:57 ` [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode Gleb Natapov
@ 2013-01-03 12:53 ` Marcelo Tosatti
  6 siblings, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2013-01-03 12:53 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Thu, Dec 20, 2012 at 04:57:41PM +0200, Gleb Natapov wrote:
> This series goes on top of my previous one: "Fix
> emulate_invalid_guest_state=0 part 2". It not only fixes bugs,
> but also does a nice cleanup of VMX real mode emulation. All real mode
> segment register mangling is now contained in fix_rmode_seg() function.
> 
> Changelog:
>   v1 -> v2:
>     - emulate_invalid_guest_state=0 broke again. Fix it.
>     - additional patch to handle IO during emulation caused by #GP
> 
> Gleb Natapov (6):
>   KVM: emulator: drop RPL check from linearize() function
>   KVM: emulator: implement fninit, fnstsw, fnstcw
>   KVM: VMX: make rmode_segment_valid() more strict.
>   KVM: VMX: fix emulation of invalid guest state.
>   KVM: VMX: Do not fix segment register during vcpu initialization.
>   KVM: VMX: handle IO when emulation is due to #GP in real mode.
> 
>  arch/x86/kvm/emulate.c |  133 +++++++++++++++++++++++++++--
>  arch/x86/kvm/vmx.c     |  219 +++++++++++++++++++++++++-----------------------
>  2 files changed, 241 insertions(+), 111 deletions(-)
> 
> -- 
> 1.7.10.4

Applied, thanks.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2012-12-20 14:57 ` [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function Gleb Natapov
@ 2013-02-11 16:58   ` Jan Kiszka
  2013-02-11 17:25     ` Gleb Natapov
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2013-02-11 16:58 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, mtosatti

On 2012-12-20 15:57, Gleb Natapov wrote:
> According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
> "Privilege Level Checking When Accessing Data Segments" RPL checking is
> done during loading of a segment selector, not during data access. We
> already do checking during segment selector loading, so drop the check
> during data access. Checking RPL during data access triggers #GP if
> after transition from real mode to protected mode RPL bits in a segment
> selector are set.
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/kvm/emulate.c |    7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index c7547b3..a3d31e3 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -665,7 +665,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
>  	ulong la;
>  	u32 lim;
>  	u16 sel;
> -	unsigned cpl, rpl;
> +	unsigned cpl;
>  
>  	la = seg_base(ctxt, addr.seg) + addr.ea;
>  	switch (ctxt->mode) {
> @@ -699,11 +699,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
>  				goto bad;
>  		}
>  		cpl = ctxt->ops->cpl(ctxt);
> -		if (ctxt->mode == X86EMUL_MODE_REAL)
> -			rpl = 0;
> -		else
> -			rpl = sel & 3;
> -		cpl = max(cpl, rpl);
>  		if (!(desc.type & 8)) {
>  			/* data segment */
>  			if (cpl > desc.dpl)
> 

I suppose this one is queued for 3.8 and stable already, right? We
happen to hit the case reliably while booting an older SUSE guest on an
AMD host.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 21+ messages in thread
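The SDM rule this patch restores can be stated as two tiny predicates: RPL participates only in the privilege check performed when a selector is loaded into a data-segment register, while an ordinary data access through that register checks only CPL against the cached DPL. The sketch below is simplified (no conforming-code or expand-down cases, and the helper names are made up for illustration):

```c
#include <stdbool.h>

static unsigned int max_u(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/*
 * Check done when a selector is loaded into a data-segment register
 * (e.g. MOV DS, r/m16): the DPL must be numerically >= both CPL and
 * RPL, i.e. max(CPL, RPL) <= DPL.
 */
static bool data_seg_load_ok(unsigned int cpl, unsigned int rpl,
                             unsigned int dpl)
{
    return max_u(cpl, rpl) <= dpl;
}

/*
 * Check done on an ordinary data access through an already-loaded
 * register: only CPL vs. the cached DPL, with no RPL term. This is
 * exactly what the patch reduces __linearize() to.
 */
static bool data_access_ok(unsigned int cpl, unsigned int dpl)
{
    return cpl <= dpl;
}
```

The real-to-protected-mode scenario from the commit message follows directly: a segment register loaded in real mode may carry RPL bits, so applying the load-time check at access time would wrongly fail (max(0, 3) > 0), while the access-time check at CPL 0 correctly passes.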

* Re: [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2013-02-11 16:58   ` Jan Kiszka
@ 2013-02-11 17:25     ` Gleb Natapov
  2013-02-11 17:31       ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Gleb Natapov @ 2013-02-11 17:25 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm, mtosatti

On Mon, Feb 11, 2013 at 05:58:24PM +0100, Jan Kiszka wrote:
> On 2012-12-20 15:57, Gleb Natapov wrote:
> > According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
> > "Privilege Level Checking When Accessing Data Segments" RPL checking is
> > done during loading of a segment selector, not during data access. We
> > already do checking during segment selector loading, so drop the check
> > during data access. Checking RPL during data access triggers #GP if
> > after transition from real mode to protected mode RPL bits in a segment
> > selector are set.
> > 
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > ---
> >  arch/x86/kvm/emulate.c |    7 +------
> >  1 file changed, 1 insertion(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > index c7547b3..a3d31e3 100644
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -665,7 +665,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
> >  	ulong la;
> >  	u32 lim;
> >  	u16 sel;
> > -	unsigned cpl, rpl;
> > +	unsigned cpl;
> >  
> >  	la = seg_base(ctxt, addr.seg) + addr.ea;
> >  	switch (ctxt->mode) {
> > @@ -699,11 +699,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
> >  				goto bad;
> >  		}
> >  		cpl = ctxt->ops->cpl(ctxt);
> > -		if (ctxt->mode == X86EMUL_MODE_REAL)
> > -			rpl = 0;
> > -		else
> > -			rpl = sel & 3;
> > -		cpl = max(cpl, rpl);
> >  		if (!(desc.type & 8)) {
> >  			/* data segment */
> >  			if (cpl > desc.dpl)
> > 
> 
> I suppose this one is queued for 3.8 and stable already, right? We
> happen to hit the case reliably while booting an older SUSE guest on an
> AMD host.
> 
The patch was in the middle of the pile of VMX real mode fixes. I had
no reports that it could be triggered on its own, so it was queued
neither for 3.8 nor for stable. Is it a regression?  If yes, in what
version does the bug appear?

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2013-02-11 17:25     ` Gleb Natapov
@ 2013-02-11 17:31       ` Jan Kiszka
  2013-02-11 17:40         ` Gleb Natapov
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2013-02-11 17:31 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm@vger.kernel.org, mtosatti@redhat.com

On 2013-02-11 18:25, Gleb Natapov wrote:
> On Mon, Feb 11, 2013 at 05:58:24PM +0100, Jan Kiszka wrote:
>> On 2012-12-20 15:57, Gleb Natapov wrote:
>>> According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
>>> "Privilege Level Checking When Accessing Data Segments" RPL checking is
>>> done during loading of a segment selector, not during data access. We
>>> already do checking during segment selector loading, so drop the check
>>> during data access. Checking RPL during data access triggers #GP if
>>> after transition from real mode to protected mode RPL bits in a segment
>>> selector are set.
>>>
>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
>>> ---
>>>  arch/x86/kvm/emulate.c |    7 +------
>>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>> index c7547b3..a3d31e3 100644
>>> --- a/arch/x86/kvm/emulate.c
>>> +++ b/arch/x86/kvm/emulate.c
>>> @@ -665,7 +665,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
>>>  	ulong la;
>>>  	u32 lim;
>>>  	u16 sel;
>>> -	unsigned cpl, rpl;
>>> +	unsigned cpl;
>>>  
>>>  	la = seg_base(ctxt, addr.seg) + addr.ea;
>>>  	switch (ctxt->mode) {
>>> @@ -699,11 +699,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
>>>  				goto bad;
>>>  		}
>>>  		cpl = ctxt->ops->cpl(ctxt);
>>> -		if (ctxt->mode == X86EMUL_MODE_REAL)
>>> -			rpl = 0;
>>> -		else
>>> -			rpl = sel & 3;
>>> -		cpl = max(cpl, rpl);
>>>  		if (!(desc.type & 8)) {
>>>  			/* data segment */
>>>  			if (cpl > desc.dpl)
>>>
>>
>> I suppose this one is queued for 3.8 and stable already, right? We
>> happen to hit the case reliably while booting an older SUSE guest on an
>> AMD host.
>>
> The patch was in the middle of the pile of vmx real mode fixes. I had
> no reports that it can be triggered on its own, so it was not queued
> neither to 3.8 nor to stable. Is it a regression?  If yes what version
> the bug appears in?

It is a regression of 618ff15 ("implement segment permission checks"),
thus 3.0. We are running on such a 3.0.x host kernel (SLES11.2), and
this issue only triggers on specific hosts with specific guest
configurations. After no longer seeing it with kvm/next, I bisected the
fix to this commit and instrumented it to ensure the case was actually hit.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2013-02-11 17:31       ` Jan Kiszka
@ 2013-02-11 17:40         ` Gleb Natapov
  2013-02-11 17:50           ` Gleb Natapov
  0 siblings, 1 reply; 21+ messages in thread
From: Gleb Natapov @ 2013-02-11 17:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm@vger.kernel.org, mtosatti@redhat.com

On Mon, Feb 11, 2013 at 06:31:59PM +0100, Jan Kiszka wrote:
> On 2013-02-11 18:25, Gleb Natapov wrote:
> > On Mon, Feb 11, 2013 at 05:58:24PM +0100, Jan Kiszka wrote:
> >> On 2012-12-20 15:57, Gleb Natapov wrote:
> >>> According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
> >>> "Privilege Level Checking When Accessing Data Segments" RPL checking is
> >>> done during loading of a segment selector, not during data access. We
> >>> already do checking during segment selector loading, so drop the check
> >>> during data access. Checking RPL during data access triggers #GP if
> >>> after transition from real mode to protected mode RPL bits in a segment
> >>> selector are set.
> >>>
> >>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> >>> ---
> >>>  arch/x86/kvm/emulate.c |    7 +------
> >>>  1 file changed, 1 insertion(+), 6 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >>> index c7547b3..a3d31e3 100644
> >>> --- a/arch/x86/kvm/emulate.c
> >>> +++ b/arch/x86/kvm/emulate.c
> >>> @@ -665,7 +665,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
> >>>  	ulong la;
> >>>  	u32 lim;
> >>>  	u16 sel;
> >>> -	unsigned cpl, rpl;
> >>> +	unsigned cpl;
> >>>  
> >>>  	la = seg_base(ctxt, addr.seg) + addr.ea;
> >>>  	switch (ctxt->mode) {
> >>> @@ -699,11 +699,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
> >>>  				goto bad;
> >>>  		}
> >>>  		cpl = ctxt->ops->cpl(ctxt);
> >>> -		if (ctxt->mode == X86EMUL_MODE_REAL)
> >>> -			rpl = 0;
> >>> -		else
> >>> -			rpl = sel & 3;
> >>> -		cpl = max(cpl, rpl);
> >>>  		if (!(desc.type & 8)) {
> >>>  			/* data segment */
> >>>  			if (cpl > desc.dpl)
> >>>
> >>
> >> I suppose this one is queued for 3.8 and stable already, right? We
> >> happen to hit the case reliably while booting an older SUSE guest on an
> >> AMD host.
> >>
> > The patch was in the middle of the pile of vmx real mode fixes. I had
> > no reports that it can be triggered on its own, so it was queued
> > neither to 3.8 nor to stable. Is it a regression?  If yes what version
> > the bug appears in?
> 
> It is a regression of 618ff15 ("implement segment permission checks"),
Naturally :)

> thus 3.0. We are running on such a 3.0.x host kernel (SLES11.2), and
> this issue only triggers on specific hosts with specific guest
> configurations. After no longer seeing it with kvm/next, I bisected the
> fix to this commit and instrumented it to ensure the case was actually hit.
> 
I see. Too late to try and push it to 3.8 now, will queue for stable. Not
sure if 3.0 is still maintained by stable folks though.

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2013-02-11 17:40         ` Gleb Natapov
@ 2013-02-11 17:50           ` Gleb Natapov
  2013-02-11 18:05             ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Gleb Natapov @ 2013-02-11 17:50 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm@vger.kernel.org, mtosatti@redhat.com

On Mon, Feb 11, 2013 at 07:40:38PM +0200, Gleb Natapov wrote:
> On Mon, Feb 11, 2013 at 06:31:59PM +0100, Jan Kiszka wrote:
> > On 2013-02-11 18:25, Gleb Natapov wrote:
> > > On Mon, Feb 11, 2013 at 05:58:24PM +0100, Jan Kiszka wrote:
> > >> On 2012-12-20 15:57, Gleb Natapov wrote:
> > >>> According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
> > >>> "Privilege Level Checking When Accessing Data Segments" RPL checking is
> > >>> done during loading of a segment selector, not during data access. We
> > >>> already do checking during segment selector loading, so drop the check
> > >>> during data access. Checking RPL during data access triggers #GP if
> > >>> after transition from real mode to protected mode RPL bits in a segment
> > >>> selector are set.
> > >>>
> > >>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > >>> ---
> > >>>  arch/x86/kvm/emulate.c |    7 +------
> > >>>  1 file changed, 1 insertion(+), 6 deletions(-)
> > >>>
> > >>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > >>> index c7547b3..a3d31e3 100644
> > >>> --- a/arch/x86/kvm/emulate.c
> > >>> +++ b/arch/x86/kvm/emulate.c
> > >>> @@ -665,7 +665,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
> > >>>  	ulong la;
> > >>>  	u32 lim;
> > >>>  	u16 sel;
> > >>> -	unsigned cpl, rpl;
> > >>> +	unsigned cpl;
> > >>>  
> > >>>  	la = seg_base(ctxt, addr.seg) + addr.ea;
> > >>>  	switch (ctxt->mode) {
> > >>> @@ -699,11 +699,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
> > >>>  				goto bad;
> > >>>  		}
> > >>>  		cpl = ctxt->ops->cpl(ctxt);
> > >>> -		if (ctxt->mode == X86EMUL_MODE_REAL)
> > >>> -			rpl = 0;
> > >>> -		else
> > >>> -			rpl = sel & 3;
> > >>> -		cpl = max(cpl, rpl);
> > >>>  		if (!(desc.type & 8)) {
> > >>>  			/* data segment */
> > >>>  			if (cpl > desc.dpl)
> > >>>
> > >>
> > >> I suppose this one is queued for 3.8 and stable already, right? We
> > >> happen to hit the case reliably while booting an older SUSE guest on an
> > >> AMD host.
> > >>
> > > The patch was in the middle of the pile of vmx real mode fixes. I had
> > > no reports that it can be triggered on its own, so it was queued
> > > neither to 3.8 nor to stable. Is it a regression?  If yes what version
> > > the bug appears in?
> > 
> > It is a regression of 618ff15 ("implement segment permission checks"),
> Naturally :)
> 
> > thus 3.0. We are running on such a 3.0.x host kernel (SLES11.2), and
> > this issue only triggers on specific hosts with specific guest
> > configurations. After no longer seeing it with kvm/next, I bisected the
> > fix to this commit and instrumented it to ensure the case was actually hit.
> > 
> I see. Too late to try and push it to 3.8 now, will queue for stable. Not
> sure if 3.0 is still maintained by stable folks though.
> 
And just after I wrote that, 3.0.63 was announced.

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2013-02-11 17:50           ` Gleb Natapov
@ 2013-02-11 18:05             ` Jan Kiszka
  2013-02-11 18:40               ` Gleb Natapov
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kiszka @ 2013-02-11 18:05 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm@vger.kernel.org, mtosatti@redhat.com

On 2013-02-11 18:50, Gleb Natapov wrote:
> On Mon, Feb 11, 2013 at 07:40:38PM +0200, Gleb Natapov wrote:
>> On Mon, Feb 11, 2013 at 06:31:59PM +0100, Jan Kiszka wrote:
>>> On 2013-02-11 18:25, Gleb Natapov wrote:
>>>> On Mon, Feb 11, 2013 at 05:58:24PM +0100, Jan Kiszka wrote:
>>>>> On 2012-12-20 15:57, Gleb Natapov wrote:
>>>>>> According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
>>>>>> "Privilege Level Checking When Accessing Data Segments" RPL checking is
>>>>>> done during loading of a segment selector, not during data access. We
>>>>>> already do checking during segment selector loading, so drop the check
>>>>>> during data access. Checking RPL during data access triggers #GP if
>>>>>> after transition from real mode to protected mode RPL bits in a segment
>>>>>> selector are set.
>>>>>>
>>>>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
>>>>>> ---
>>>>>>  arch/x86/kvm/emulate.c |    7 +------
>>>>>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>>>>> index c7547b3..a3d31e3 100644
>>>>>> --- a/arch/x86/kvm/emulate.c
>>>>>> +++ b/arch/x86/kvm/emulate.c
>>>>>> @@ -665,7 +665,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
>>>>>>  	ulong la;
>>>>>>  	u32 lim;
>>>>>>  	u16 sel;
>>>>>> -	unsigned cpl, rpl;
>>>>>> +	unsigned cpl;
>>>>>>  
>>>>>>  	la = seg_base(ctxt, addr.seg) + addr.ea;
>>>>>>  	switch (ctxt->mode) {
>>>>>> @@ -699,11 +699,6 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
>>>>>>  				goto bad;
>>>>>>  		}
>>>>>>  		cpl = ctxt->ops->cpl(ctxt);
>>>>>> -		if (ctxt->mode == X86EMUL_MODE_REAL)
>>>>>> -			rpl = 0;
>>>>>> -		else
>>>>>> -			rpl = sel & 3;
>>>>>> -		cpl = max(cpl, rpl);
>>>>>>  		if (!(desc.type & 8)) {
>>>>>>  			/* data segment */
>>>>>>  			if (cpl > desc.dpl)
>>>>>>
>>>>>
>>>>> I suppose this one is queued for 3.8 and stable already, right? We
>>>>> happen to hit the case reliably while booting an older SUSE guest on an
>>>>> AMD host.
>>>>>
>>>> The patch was in the middle of the pile of vmx real mode fixes. I had
>>>> no reports that it can be triggered on its own, so it was queued
>>>> neither to 3.8 nor to stable. Is it a regression?  If yes what version
>>>> the bug appears in?
>>>
>>> It is a regression of 618ff15 ("implement segment permission checks"),
>> Naturally :)
>>
>>> thus 3.0. We are running on such a 3.0.x host kernel (SLES11.2), and
>>> this issue only triggers on specific hosts with specific guest
>>> configurations. After no longer seeing it with kvm/next, I bisected the
>>> fix to this commit and instrumented it to ensure the case was actually hit.
>>>
>> I see. Too late to try and push it to 3.8 now, will queue for stable. Not
>> sure if 3.0 is still maintained by stable folks though.
>>
> And just after I wrote that 3.0.63 was announced.

Well, if you consider pushing it that far, an adapted version will be
needed. For us, it is not that important as we already have to patch kvm
down there due to missing f1c1da2bde (SVM: Keep intercepting task
switching with NPT enabled).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function
  2013-02-11 18:05             ` Jan Kiszka
@ 2013-02-11 18:40               ` Gleb Natapov
  0 siblings, 0 replies; 21+ messages in thread
From: Gleb Natapov @ 2013-02-11 18:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: kvm@vger.kernel.org, mtosatti@redhat.com

On Mon, Feb 11, 2013 at 07:05:50PM +0100, Jan Kiszka wrote:
> >>> thus 3.0. We are running on such a 3.0.x host kernel (SLES11.2), and
> >>> this issue only triggers on specific hosts with specific guest
> >>> configurations. After no longer seeing it with kvm/next, I bisected the
> >>> fix to this commit and instrumented it to ensure the case was actually hit.
> >>>
> >> I see. Too late to try and push it to 3.8 now, will queue for stable. Not
> >> sure if 3.0 is still maintained by stable folks though.
> >>
> > And just after I wrote that 3.0.63 was announced.
> 
> Well, if you consider pushing it that far, an adapted version will be
> needed. For us, it is not that important as we already have to patch kvm
> down there due to missing f1c1da2bde (SVM: Keep intercepting task
> switching with NPT enabled).
> 
OK, no reason to push it that far then.

--
			Gleb.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-02-11 18:41 UTC | newest]

Thread overview: 21+ messages
-- links below jump to the message on this page --
2012-12-20 14:57 [PATCHv2 0/6] more VMX real mode emulation fixes Gleb Natapov
2012-12-20 14:57 ` [PATCHv2 1/6] KVM: emulator: drop RPL check from linearize() function Gleb Natapov
2013-02-11 16:58   ` Jan Kiszka
2013-02-11 17:25     ` Gleb Natapov
2013-02-11 17:31       ` Jan Kiszka
2013-02-11 17:40         ` Gleb Natapov
2013-02-11 17:50           ` Gleb Natapov
2013-02-11 18:05             ` Jan Kiszka
2013-02-11 18:40               ` Gleb Natapov
2012-12-20 14:57 ` [PATCHv2 2/6] KVM: emulator: implement fninit, fnstsw, fnstcw Gleb Natapov
2012-12-20 14:57 ` [PATCHv2 3/6] KVM: VMX: make rmode_segment_valid() more strict Gleb Natapov
2012-12-20 14:57 ` [PATCHv2 4/6] KVM: VMX: fix emulation of invalid guest state Gleb Natapov
2012-12-20 14:57 ` [PATCHv2 5/6] KVM: VMX: Do not fix segment register during vcpu initialization Gleb Natapov
2012-12-20 14:57 ` [PATCHv2 6/6] KVM: VMX: handle IO when emulation is due to #GP in real mode Gleb Natapov
2012-12-20 15:25   ` Alex Williamson
2012-12-20 16:43     ` Gleb Natapov
2012-12-20 22:58       ` Alex Williamson
2012-12-21  8:01         ` Gleb Natapov
2012-12-22 11:06     ` Avi Kivity
2012-12-23  9:44       ` Gleb Natapov
2013-01-03 12:53 ` [PATCHv2 0/6] more VMX real mode emulation fixes Marcelo Tosatti
