From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [RFC] Patch - Big real mode emulation
Date: Wed, 21 May 2008 16:59:46 +0300
Message-ID: <48342AD2.40406@qumranet.com>
References: <20080521113410.43ec182f@frecb000711.frec.bull.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Anthony Liguori, Mohammed Gamal,
	"Kamble, Nitin A", Alexander Graf, Marcelo Tosatti
To: Guillaume Thouvenin
Return-path:
Received: from bzq-179-150-194.static.bezeqint.net ([212.179.150.194]:32968
	"EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1759526AbYEUN7u (ORCPT );
	Wed, 21 May 2008 09:59:50 -0400
In-Reply-To: <20080521113410.43ec182f@frecb000711.frec.bull.fr>
Sender: kvm-owner@vger.kernel.org
List-ID:

Guillaume Thouvenin wrote:
> Hello,
>
> Here is a patch that allows OpenSuse 10.3 to boot. The problem with
> OpenSuse 10.3 is that it uses a version of gfxboot that reads SS after
> switching from real to protected mode, where SS contains an invalid
> value, which VMX does not allow.

Good to see progress on this issue.

> So this patch
>
> 1) removes the code that writes a sane value into SS, in order to
>    detect the VM-entry failure due to CS.RPL != SS.RPL
> 2) adds a handler to catch the VM-entry failure
>
> The handler calls the instruction emulator, and to boot OpenSuse we
> need to emulate the following instructions:
>
>     ljmp $0x18,$0x6e18
>     mov  $0x20,%ax
>     mov  %eax,%ds
>     mov  %ss,%eax
>     and  $0xffff,%esp
>     shl  $0x4,%eax
>     add  %eax,%esp
>     mov  $0x8,%ax
>     mov  %eax,%ss
>     -> At this point CS.RPL == SS.RPL
>
> There is an issue with the patch. When removing the SS patching we see
> other problems. So, to still be able to boot distributions that were
> already able to boot, we added a hack that modifies SS_SELECTOR (as
> was done previously) when emulation fails. The future solution will be
> to emulate the instructions that need emulating.
>
Which instructions are still problematic?

> index e537005..e4d6f3b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3016,8 +3016,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> 	return 0;
> }
>
> -static void get_segment(struct kvm_vcpu *vcpu,
> -			struct kvm_segment *var, int seg)
> +void get_segment(struct kvm_vcpu *vcpu,
> +		 struct kvm_segment *var, int seg)
> {
> 	kvm_x86_ops->get_segment(vcpu, var, seg);
> }
> @@ -3100,8 +3100,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> 	return 0;
> }
>
> -static void set_segment(struct kvm_vcpu *vcpu,
> -			struct kvm_segment *var, int seg)
> +void set_segment(struct kvm_vcpu *vcpu,
> +		 struct kvm_segment *var, int seg)
> {
> 	kvm_x86_ops->set_segment(vcpu, var, seg);
> }
> @@ -3259,8 +3259,8 @@ static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu,
> 	return 0;
> }
>
> -static int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
> -				   int type_bits, int seg)
> +int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
> +			    int type_bits, int seg)
> {
> 	struct kvm_segment kvm_seg;

These names need to be prefixed with kvm_, since they're much too generic.

> @@ -1342,6 +1346,10 @@ special_insn:
> 	switch (c->b) {
> 	case 0x00 ... 0x05:
> 	      add:		/* add */
> +		if ((c->d & ModRM) && c->modrm_mod == 3) {
> +			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
> +			c->dst.ptr = decode_register(c->modrm_rm, c->regs,
> +						     c->d & ByteOp);
> +		}

This is not specific to the add instruction. We really need to switch
decode_modrm() to decode into a struct operand.

> 		emulate_2op_SrcV("add", c->src, c->dst, ctxt->eflags);
> 		break;
> 	case 0x08 ... 0x0d:
> @@ -1514,14 +1522,120 @@ special_insn:
> 		break;
> 	case 0x88 ... 0x8b:	/* mov */
> 		goto mov;
> +	case 0x8c: { /* mov r/m, sreg */
> +		struct kvm_segment segreg;
> +
> +		if (c->modrm_mod == 0x3)
> +			c->src.val = c->modrm_val;
> +
> +		switch (c->modrm_reg) {
> +		case 0:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_ES);
> +			break;
> +		case 1:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_CS);
> +			break;
> +		case 2:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_SS);
> +			break;
> +		case 3:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_DS);
> +			break;
> +		case 4:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_FS);
> +			break;
> +		case 5:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_GS);
> +			break;
> +		default:

A lookup table would be nice. Or better, reorder VCPU_SREG_xx to reflect
the natural order. One can even imagine SrcSreg and DstSreg decoding
flags.

> +			printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n",
> +			       c->modrm);
> +			goto cannot_emulate;
> +		}
> +		c->dst.val = segreg.selector;
> +		c->dst.bytes = 2;
> +		c->dst.ptr = (unsigned long *)decode_register(c->modrm_rm,
> +						c->regs, c->d & ByteOp);
> +		break;
> +	}
> 	case 0x8d: /* lea r16/r32, m */
> 		c->dst.val = c->modrm_ea;
> 		break;
> +	case 0x8e: { /* mov seg, r/m16 */
> +		uint16_t sel;
> +
> +		sel = c->src.val;
> +		switch (c->modrm_reg) {
> +		case 0:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_ES) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 1:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 2:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_SS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 3:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_DS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 4:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_FS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 5:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_GS) < 0)
> +				goto cannot_emulate;
> +			break;

Ditto.
> +		default:
> +			printk(KERN_INFO "Invalid segreg in modrm byte 0x%02x\n",
> +			       c->modrm);
> +			goto cannot_emulate;
> +		}
> +
> +		c->dst.type = OP_NONE;  /* Disable writeback. */
> +		break;
> +	}
> 	case 0x8f: /* pop (sole member of Grp1a) */
> 		rc = emulate_grp1a(ctxt, ops);
> 		if (rc != 0)
> 			goto done;
> 		break;
> +	case 0x90 ... 0x97: /* xchg ax, r16 / xchg eax, r32 / xchg rax, r64 */

0x90 is a special case: nop (it's different from xchg %eax, %eax!).

> +		c->src.ptr = &c->regs[c->b & 0x7];
> +		c->dst.ptr = &c->regs[VCPU_REGS_RAX];
> +
> +		switch (c->op_bytes) {
> +		case 2:
> +			c->dst.val = *(u16 *)c->dst.ptr;
> +			c->src.val = *(u16 *)c->src.ptr;
> +			*(u16 *)c->dst.ptr = (u16)c->src.val;
> +			*(u16 *)c->src.ptr = (u16)c->dst.val;

You only need to write back the source pointer. The destination pointer
will be written back by the writeback code.

> +			break;
> +		case 4:
> +			c->dst.val = *(u32 *)c->dst.ptr;
> +			c->src.val = *(u32 *)c->src.ptr;
> +			*(u32 *)c->dst.ptr = (u32)c->src.val;
> +			*(u32 *)c->src.ptr = (u32)c->dst.val;

Needs to be (unsigned long) for the destination, since x86_64
zero-extends 32-bit writes.

> +			break;
> +		case 8:
> +			c->dst.val = *(u64 *)c->dst.ptr;
> +			c->src.val = *(u64 *)c->src.ptr;
> +			*(u64 *)c->dst.ptr = (u64)c->src.val;
> +			*(u64 *)c->src.ptr = (u64)c->dst.val;

On 32-bit hosts, registers are not u64 (this code will never be
executed there, but still).

> +			break;
> +		default:
> +			printk("xchg: op_bytes=%d is not supported.\n", c->op_bytes);
> +			goto cannot_emulate;
> +		}
> +		c->dst.type = OP_NONE;  /* Disable writeback. */
> +		break;
> +	case 0xb8: /* mov r, imm */

0xb8 .. 0xbf. Use DstReg instead of decoding yourself.

> +		c->dst.ptr = (unsigned long *)&c->regs[VCPU_REGS_RAX + (c->b & 0x7)];
> +		goto mov;
> 	case 0x9c: /* pushf */
> 		c->src.val = (unsigned long) ctxt->eflags;
> 		emulate_push(ctxt);
> @@ -1623,6 +1737,10 @@ special_insn:
> 		DPRINTF("Urk! I don't handle SCAS.\n");
> 		goto cannot_emulate;
> 	case 0xc0 ... 0xc1:
> +		if ((c->d & ModRM) && c->modrm_mod == 3) {
> +			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
> +			c->dst.ptr = decode_register(c->modrm_rm, c->regs,
> +						     c->d & ByteOp);
> +		}

Again, fix decode_modrm() instead.

> 		emulate_grp2(ctxt);
> 		break;
> 	case 0xc3: /* ret */
> @@ -1660,6 +1778,36 @@ special_insn:
> 		break;
> 	}
> 	case 0xe9: /* jmp rel */
> +		jmp_rel(c, c->src.val);
> +		c->dst.type = OP_NONE;  /* Disable writeback. */
> +		break;
> +	case 0xea: /* jmp far */ {
> +		uint32_t eip;
> +		uint16_t sel;
> +
> +		switch (c->op_bytes) {
> +		case 2:
> +			eip = insn_fetch(u16, 2, c->eip);
> +			eip = eip & 0x0000FFFF; /* clear upper 16 bits */

No need, insn_fetch will not sign extend.

> +			break;
> +		case 4:
> +			eip = insn_fetch(u32, 4, c->eip);
> +			break;
> +		default:
> +			DPRINTF("jmp far: Invalid op_bytes\n");
> +			goto cannot_emulate;
> +		}
> +		sel = insn_fetch(u16, 2, c->eip);
> +		if (ctxt->mode == X86EMUL_MODE_REAL)
> +			eip |= (sel << 4);

No, eip doesn't change.

> +		else if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0) {
> +			DPRINTF("jmp far: Failed to load CS descriptor\n");
> +			goto cannot_emulate;
> +		}
> +
> +		c->eip = eip;
> +		break;
> +	}
> 	case 0xeb: /* jmp rel short */
> 		jmp_rel(c, c->src.val);
> 		c->dst.type = OP_NONE;  /* Disable writeback. */
> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
> index a71f3aa..fb85da4 100644

The patch should be split into many parts (enhancing the emulator
infrastructure, one patch per instruction, vmx changes), but in essence
it does the right things.

-- 
error compiling committee.c: too many arguments to function