public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Avi Kivity <avi@qumranet.com>
To: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Cc: kvm@vger.kernel.org, Anthony Liguori <anthony@codemonkey.ws>,
	Mohammed Gamal <m.gamal005@gmail.com>,
	"Kamble, Nitin A" <nitin.a.kamble@intel.com>,
	Alexander Graf <alex@csgraf.de>,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [RFC] Patch - Big real mode emulation
Date: Wed, 21 May 2008 16:59:46 +0300	[thread overview]
Message-ID: <48342AD2.40406@qumranet.com> (raw)
In-Reply-To: <20080521113410.43ec182f@frecb000711.frec.bull.fr>

Guillaume Thouvenin wrote:
> Hello,
>
>  Here is a patch that allows to boot OpenSuse-10.3. The problem with
> Opensuse 10.3 is it uses a version of gfxboot that reads SS after
> switching from real to protected mode, where SS contains an invalid
> value, which VMX does not allow. 

Good to see progress on this issue.

> So this patch 
>
>  1) removes the code that writes sane value in SS in order to detect VM
> entry failure due to CS.RPL != SS.RPL
>  2) adds an handler to catch the VMentry failure
>
>  The handler calls instruction's emulator and to boot opensuse we need
> to emulate the following instructions:
>
> 	ljmp   $0x18,$0x6e18
> 	mov    $0x20,%ax
> 	mov    %eax,%ds
> 	mov    %ss,%eax
> 	and    $0xffff,%esp
> 	shl    $0x4,%eax
> 	add    %eax,%esp
> 	mov    $0x8,%ax
> 	mov    %eax,%ss	
> 	-> At this point CS.RPL == SS.RPL
>
>  There is an issue with the patch. When removing the SS patching we see
> other problems. So to be able to still boot distribution that was
> already able to boot we added a hack that allows to modify SS_SELECTOR
> (as it was done previously) when emulation failed. The future solution
> will be to emulate instruction that need to be emulated.
>   

Which instructions are still problematic?
> index e537005..e4d6f3b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3016,8 +3016,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>  	return 0;
>  }
>  
> -static void get_segment(struct kvm_vcpu *vcpu,
> -			struct kvm_segment *var, int seg)
> +void get_segment(struct kvm_vcpu *vcpu,
> +		 struct kvm_segment *var, int seg)
>  {
>  	kvm_x86_ops->get_segment(vcpu, var, seg);
>  }
> @@ -3100,8 +3100,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> -static void set_segment(struct kvm_vcpu *vcpu,
> -			struct kvm_segment *var, int seg)
> +void set_segment(struct kvm_vcpu *vcpu,
> +		 struct kvm_segment *var, int seg)
>  {
>  	kvm_x86_ops->set_segment(vcpu, var, seg);
>  }
> @@ -3259,8 +3259,8 @@ static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> -static int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
> -				   int type_bits, int seg)
> +int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
> +			    int type_bits, int seg)
>  {
>  	struct kvm_segment kvm_seg;
>   

These names need to be prefixed with kvm_, since they're much too generic.

> @@ -1342,6 +1346,10 @@ special_insn:
>  	switch (c->b) {
>  	case 0x00 ... 0x05:
>  	      add:		/* add */
> +		if ((c->d & ModRM) && c->modrm_mod == 3) {
> +			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
> +			c->dst.ptr =  decode_register(c->modrm_rm, c->regs, c->d & ByteOp);
> +		}
>   

This is not specific to the add instruction. We really need to switch 
decode_modrm() to decode into a struct operand.

>  		emulate_2op_SrcV("add", c->src, c->dst, ctxt->eflags);
>  		break;
>  	case 0x08 ... 0x0d:
> @@ -1514,14 +1522,120 @@ special_insn:
>  		break;
>  	case 0x88 ... 0x8b:	/* mov */
>  		goto mov;
> +	case 0x8c: { /* mov r/m, sreg */
> +		struct kvm_segment segreg;
> +
> +		if (c->modrm_mod == 0x3)
> +			c->src.val = c->modrm_val;
> +
> +		switch ( c->modrm_reg ) {
> +		case 0:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_ES);
> +			break;
> +		case 1:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_CS);
> +			break;
> +		case 2:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_SS);
> +			break;
> +		case 3:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_DS);
> +			break;
> +		case 4:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_FS);
> +			break;
> +		case 5:
> +			get_segment(ctxt->vcpu, &segreg, VCPU_SREG_GS);
> +			break;
> +		default:
>   

A look up table would be nice. Or better, reorder VCPU_SREG_xx to 
reflect the natural order.

One can even imagine a SrcSreg and DstSreg decoding flags.

> +			printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n",
> +					 c->modrm);
> +			goto cannot_emulate;
> +		}
> +		c->dst.val = segreg.selector;
> +		c->dst.bytes = 2;
> +		c->dst.ptr = (unsigned long *)decode_register(c->modrm_rm, c->regs,
> +							      c->d & ByteOp);
> +		break;
> +	}
>  	case 0x8d: /* lea r16/r32, m */
>  		c->dst.val = c->modrm_ea;
>  		break;
> +	case 0x8e: { /* mov seg, r/m16 */
> +		uint16_t sel;
> +
> +		sel = c->src.val;
> +		switch ( c->modrm_reg ) {
> +		case 0:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_ES) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 1:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 2:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_SS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 3:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_DS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 4:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_FS) < 0)
> +				goto cannot_emulate;
> +			break;
> +		case 5:
> +			if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_GS) < 0)
> +				goto cannot_emulate;
> +			break;
>   

Ditto.

> +		default:
> +			printk(KERN_INFO "Invalid segreg in modrm byte 0x%02x\n",
> +					  c->modrm);
> +			goto cannot_emulate;
> +		}
> +
> +		c->dst.type = OP_NONE;  /* Disable writeback. */
> +		break;
> +	}
>  	case 0x8f:		/* pop (sole member of Grp1a) */
>  		rc = emulate_grp1a(ctxt, ops);
>  		if (rc != 0)
>  			goto done;
>  		break;
> +	case 0x90 ... 0x97: /* xchg ax, r16 / xchg eax, r32 / xchg rax, r64 */
>   

0x90 is a special case, nop (it's different from xchg %eax, %eax!)

> +		c->src.ptr = & c->regs[c->b & 0x7];
> +		c->dst.ptr = & c->regs[VCPU_REGS_RAX];
> +
> +		switch (c->op_bytes) {
> +                case 2:
> +			c->dst.val = *(u16*) c->dst.ptr;
> +			c->src.val = *(u16*) c->src.ptr;
> +			*(u16 *) c->dst.ptr = (u16) c->src.val;
> +			*(u16 *) c->src.ptr = (u16) c->dst.val;
>   

You only need to write back the source pointer. The destination pointer 
will be written back by the writeback code.

> +			break;
> +		case 4:
> +			c->dst.val = *(u32*) c->dst.ptr;
> +			c->src.val = *(u32*) c->src.ptr;
> +			*(u32 *) c->dst.ptr = (u32) c->src.val;
> +			*(u32 *) c->src.ptr = (u32) c->dst.val;
>   

Needs to be (unsigned long) for the destination, since x86_64 sign extends.

> +			break;
> +		case 8:
> +			c->dst.val = *(u64*) c->dst.ptr;
> +			c->src.val = *(u64*) c->src.ptr;
> +			*(u64 *) c->dst.ptr = (u64) c->src.val;
> +			*(u64 *) c->src.ptr = (u64) c->dst.val;
>   

On 32-bit registers are not u64 (the code will never be executed, but 
still).

> +			break;
> +		default:
> +			printk("xchg: op_bytes=%d is not supported.\n", c->op_bytes);
> +			goto cannot_emulate;
> +		}
> +		c->dst.type = OP_NONE;  /* Disable writeback. */
> +		break;
> +	case 0xb8: /* mov r, imm */
>   

0xb8 .. 0xbf. Use DstReg instead of decoding yourself.

> +		c->dst.ptr = (unsigned long *)&c->regs[VCPU_REGS_RAX + (c->b & 0x7)];
> +		goto mov;
>  	case 0x9c: /* pushf */
>  		c->src.val =  (unsigned long) ctxt->eflags;
>  		emulate_push(ctxt);
> @@ -1623,6 +1737,10 @@ special_insn:
>  		DPRINTF("Urk! I don't handle SCAS.\n");
>  		goto cannot_emulate;
>  	case 0xc0 ... 0xc1:
> +		if ((c->d & ModRM) && c->modrm_mod == 3) {
> +			c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
> +			c->dst.ptr =  decode_register(c->modrm_rm, c->regs, c->d & ByteOp);
> +		}
>   

Again, fix decode_modrm instead.

>  		emulate_grp2(ctxt);
>  		break;
>  	case 0xc3: /* ret */
> @@ -1660,6 +1778,36 @@ special_insn:
>  		break;
>  	}
>  	case 0xe9: /* jmp rel */
> +		jmp_rel(c, c->src.val);
> +		c->dst.type = OP_NONE; /* Disable writeback. */
> +		break;
> +	case 0xea: /* jmp far */ {
> +		uint32_t eip;
> +		uint16_t sel;
> +
> +		switch (c->op_bytes) {
> +		case 2:
> +			eip = insn_fetch(u16, 2, c->eip);
> +			eip = eip & 0x0000FFFF; /* clear upper 16 bits */
>   

No need, insn_fetch will not sign extend.

> +			break;
> +		case 4:
> +			eip = insn_fetch(u32, 4, c->eip);
> +			break;
> +		default:
> +			DPRINTF("jmp far: Invalid op_bytes\n");
> +			goto cannot_emulate;
> +		}
> +		sel = insn_fetch(u16, 2, c->eip);
> +		if (ctxt->mode == X86EMUL_MODE_REAL)
> +			eip |= (sel << 4);
>   

No, eip doesn't change.

> +		else if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0) {
> +			DPRINTF("jmp far: Failed to load CS descriptor\n");
> +			goto cannot_emulate;
> +		}
> +
> +		c->eip = eip;
> +		break;
> +	}
>  	case 0xeb: /* jmp rel short */
>  		jmp_rel(c, c->src.val);
>  		c->dst.type = OP_NONE; /* Disable writeback. */
> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
> index a71f3aa..fb85da4 100644

The patch should be split into many parts (enhancing emulator 
infrastructure, one patch per insn, vmx changes), but in essence it does 
the right things.


-- 
error compiling committee.c: too many arguments to function


  reply	other threads:[~2008-05-21 13:59 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-21  9:34 [RFC] Patch - Big real mode emulation Guillaume Thouvenin
2008-05-21 13:59 ` Avi Kivity [this message]
2008-05-21 14:10   ` Avi Kivity
2008-05-22  8:55     ` Guillaume Thouvenin
2008-05-21 17:19   ` Marcelo Tosatti
2008-05-21 15:32 ` Mohammed Gamal
2008-05-21 16:18 ` Marcelo Tosatti
2008-05-22  9:02   ` Guillaume Thouvenin
2008-05-21 23:18 ` Kamble, Nitin A
2008-05-22 22:52   ` Mohammed Gamal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48342AD2.40406@qumranet.com \
    --to=avi@qumranet.com \
    --cc=alex@csgraf.de \
    --cc=anthony@codemonkey.ws \
    --cc=guillaume.thouvenin@ext.bull.net \
    --cc=kvm@vger.kernel.org \
    --cc=m.gamal005@gmail.com \
    --cc=mtosatti@redhat.com \
    --cc=nitin.a.kamble@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox