From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: [RFC PATCH] Emulate MOVBE
Date: Wed, 10 Apr 2013 11:29:42 +0200
Message-ID: <20130410112942.07dfc167@slackpad>
References: <20130409234602.GI5077@pd.tnic>
Content-Type: text/plain; charset=ISO-8859-1
Cc: kvm@vger.kernel.org, Jörg Rödel, "H. Peter Anvin", x86-ml
To: Borislav Petkov
Received: from mail.andrep.de ([217.160.17.100]:38723 "EHLO mail.andrep.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750738Ab3DJJju convert rfc822-to-8bit (ORCPT ); Wed, 10 Apr 2013 05:39:50 -0400
In-Reply-To: <20130409234602.GI5077@pd.tnic>
Sender: kvm-owner@vger.kernel.org

On Wed, 10 Apr 2013 01:46:02 +0200
Borislav Petkov wrote:

> Hi guys,
>
> so I was trying to repro tglx's bug in smpboot.c and for some reason,
> the most reliable way to trigger it was to boot a 32-bit atom smp
> guest in kvm (don't ask :)).
>
> The problem, however, was that atom wants MOVBE and qemu doesn't
> emulate it yet (except Richard's patches, which I used in order to be
> able to actually even boot a guest).
>
> However, without hw acceleration, qemu is pretty slow, and waiting for
> an smp guest to boot in sw-only emulation is not fun. So, just for
> funsies, I decided to give the MOVBE emulation a try.
>
> Patch is below; an 8-core smp atom guest boots fine, and in 6-ish
> seconds with it. :-)
>
> I know, I know, it still needs cleaning up and proper rFLAGS handling,
> but that is for later. For now, I'd very much like to hear whether
> this approach even makes sense and, if so, what should be improved.
>
> Thanks, and thanks to Andre and Jörg for their help.
>
> Not-yet-signed-off-by: Borislav Petkov

You are missing a (hackish) older patch which enables the MOVBE CPUID
bit even if the host does not have it:

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -273,7 +273,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		cpuid_mask(&entry->ecx, 4);
 		/* we support x2apic emulation even if host does not support
 		 * it since we emulate x2apic in software */
-		entry->ecx |= F(X2APIC);
+		entry->ecx |= (F(X2APIC) | F(MOVBE));
 		break;
 	/* function 2 entries are STATEFUL. That is, repeated cpuid commands
 	 * may return different values. This forces us to get_cpu() before

> --
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index 15f960c06ff7..ae01c765cd77 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -281,6 +281,7 @@ struct x86_emulate_ctxt {
>
>  	/* decode cache */
>  	u8 twobyte;
> +	u8 thirdbyte;
>  	u8 b;
>  	u8 intercept;
>  	u8 lock_prefix;
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index a335cc6cde72..0ccff339359d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -152,6 +152,7 @@
>  #define Avx        ((u64)1 << 43) /* Advanced Vector Extensions */
>  #define Fastop     ((u64)1 << 44) /* Use opcode::u.fastop */
>  #define NoWrite    ((u64)1 << 45) /* No writeback */
> +#define ThreeByte  ((u64)1 << 46) /* Three byte opcodes 0F 38 and 0F 3A */
>
>  #define X2(x...) x, x
>  #define X3(x...) X2(x), x
> @@ -3107,6 +3108,34 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
>  	return X86EMUL_CONTINUE;
>  }
>
> +static int em_movbe(struct x86_emulate_ctxt *ctxt)
> +{
> +	char *valptr = ctxt->src.valptr;
> +
> +	switch (ctxt->op_bytes) {
> +	case 2:
> +		*(u16 *)valptr = swab16(*(u16 *)valptr);
> +		break;
> +	case 4:
> +		*(u32 *)valptr = swab32(*(u32 *)valptr);
> +
> +		/*
> +		 * clear upper dword for 32-bit operand size in 64-bit mode.
> +		 */
> +		if (ctxt->mode == X86EMUL_MODE_PROT64)
> +			*((u32 *)valptr + 1) = 0x0;
> +		break;
> +	case 8:
> +		*(u64 *)valptr = swab64(*(u64 *)valptr);
> +		break;
> +	default:
> +		return X86EMUL_PROPAGATE_FAULT;
> +	}
> +
> +	memcpy(ctxt->dst.valptr, ctxt->src.valptr, ctxt->op_bytes);
> +	return X86EMUL_CONTINUE;
> +}
> +

On 04/09/2013 05:03 PM, Borislav Petkov wrote:
>
> Note to self: this destroys the src operand but it shouldn't. Fix it
> tomorrow.

What about this one?

	*(u16 *)(ctxt->dst.valptr) = swab16(*(u16 *)(ctxt->src.valptr));

Respective versions for the other operand sizes, of course, and drop
the final memcpy.

>  static int em_cr_write(struct x86_emulate_ctxt *ctxt)
>  {
>  	if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> @@ -3974,7 +4003,8 @@ static const struct opcode twobyte_table[256] = {
>  	I(ImplicitOps | VendorSpecific, em_sysenter),
>  	I(ImplicitOps | Priv | VendorSpecific, em_sysexit),
>  	N, N,
> -	N, N, N, N, N, N, N, N,
> +	I(ModRM | Mov | ThreeByte | VendorSpecific, em_movbe),
> +	N, N, N, N, N, N, N,

In the real world, VendorSpecific should be replaced with something
more meaningful. It depends on KVM's intention to emulate instructions,
which is actually out of scope for a pure virtualizer. What is the
opinion of the KVM folks on this? Shall we start to emulate
instructions the host does not provide? In this particular case a
relatively simple patch fixes a real problem (starting Atom-optimized
kernels on non-Atom machines). (And if one can believe the AMD Fam16h
SWOG [1], PS4^Wfuture AMD processors have MOVBE, so it's not even
actually only one CPU anymore.)
>  	/* 0x40 - 0x4F */
>  	X16(D(DstReg | SrcMem | ModRM | Mov)),
>  	/* 0x50 - 0x5F */
> @@ -4323,6 +4353,15 @@ done_prefixes:
>  	}
>  	ctxt->d = opcode.flags;
>
> +	if (ctxt->d & ThreeByte) {
> +		ctxt->thirdbyte = insn_fetch(u8, ctxt);
> +
> +		if (ctxt->thirdbyte == 0xf0)
> +			ctxt->d |= DstReg | SrcMem;
> +		else
> +			ctxt->d |= DstMem | SrcReg;
> +	}
> +

As already agreed on IRC, we should properly check for the other
opcodes in the 0F 38/3A table. Or make it right and implement a real
sub-table.

Regards,
Andre.

[1] http://support.amd.com/us/Processor_TechDocs/52128_16h_Software_Opt_Guide.zip