From: Nitin A Kamble <nitin.a.kamble@intel.com>
To: Amit Shah <amit.shah@redhat.com>
Cc: Alexander Graf <agraf@suse.de>,
"kvm@vger.kernel.org list" <kvm@vger.kernel.org>,
Avi Kivity <avi@redhat.com>, Elsie Wahlig <elsie.wahlig@amd.com>,
Anthony Liguori <anthony@codemonkey.ws>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
Benjamin Serebrin <benjamin.serebrin@amd.com>
Subject: Re: Cross vendor migration ideas
Date: Fri, 14 Nov 2008 15:43:29 -0800 [thread overview]
Message-ID: <1226706209.18741.20.camel@lnitindesktop.sc.intel.com> (raw)
In-Reply-To: <200811141837.07981.amit.shah@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 19221 bytes --]
Amit, Alex,
Please see my comments below.
Avi,
Please have a look at the patches, and let me know the parts you think
can be done better.
On Fri, 2008-11-14 at 06:07 -0700, Amit Shah wrote:
> * On Thursday 13 Nov 2008 19:08:14 Alexander Graf wrote:
> > On 13.11.2008, at 05:35, Amit Shah wrote:
> > > * On Wednesday 12 Nov 2008 22:49:16 Alexander Graf wrote:
> > >> On 12.11.2008, at 17:52, Amit Shah wrote:
> > >>> Hi Alex,
> > >>>
> > >>> * On Wednesday 12 Nov 2008 21:09:43 Alexander Graf wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I was thinking a bit about cross vendor migration recently and
> > >>>> since
> > >>>> we're doing open source development, I figured it might be a good
> > >>>> idea
> > >>>> to talk to everyone about this.
> > >>>>
> > >>>> So why are we having a problem?
> > >>>>
> > >>>> In normal operation we don't. If we're running a 32-bit kernel, we
> > >>>> can
> > >>>> use SYSENTER to jump from kernel<->userspace. If we're on a 64-bit
> > >>>> kernel with 64-bit userspace, every CPU supports SYSCALL. At least
> > >>>> Linux is being smart on this and does use exactly these two
> > >>>> capabilities in these two cases.
> > >>>> But if we're running in compat mode (64-bit kernel with 32-bit
> > >>>> userspace), things differ. Intel supports only SYSENTER here, while
> > >>>> AMD only supports SYSCALL. Both can still use int80.
> > >>>>
> > >>>> Operating systems detect usage of SYSCALL or SYSENTER pretty
> > >>>> early on
> > >>>> (Linux does this on vdso). So when we boot up on an Intel machine,
> > >>>> Linux assumes that using SYSENTER in compat mode is fine. Migrating
> > >>>> that machine to an AMD machine breaks this assumption though, since
> > >>>> SYSENTER can't be used in compat mode.
> > >>>> On Linux, this detection is based on the CPU vendor string. If
> > >>>> Linux
> > >>>> finds a "GenuineIntel", SYSENTER is used in compat mode, if it's
> > >>>> "AuthenticAMD", SYSCALL is used and if none of these two is found,
> > >>>> int80 is used.
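The vendor-string dispatch described above can be sketched as a plain C helper. This is illustrative only: the enum and function names below are made up for the sketch and are not the actual kernel/vdso symbols.

```c
#include <string.h>

/* Sketch of the compat-mode fast-syscall selection Alex describes:
 * the 32-bit vdso picks an entry instruction based on the CPUID
 * vendor string seen at boot. Names here are hypothetical. */
enum vsyscall_mechanism { VSYSCALL_INT80, VSYSCALL_SYSENTER, VSYSCALL_SYSCALL };

static enum vsyscall_mechanism pick_compat_vsyscall(const char *vendor)
{
	if (strcmp(vendor, "GenuineIntel") == 0)
		return VSYSCALL_SYSENTER;  /* SYSENTER works in compat mode on Intel */
	if (strcmp(vendor, "AuthenticAMD") == 0)
		return VSYSCALL_SYSCALL;   /* SYSCALL works in compat mode on AMD */
	return VSYSCALL_INT80;             /* int80 is safe everywhere */
}
```

A guest that saw "GenuineIntel" at boot keeps using SYSENTER after migration to an AMD host, which is exactly the failure mode under discussion: the selection is made once, from the vendor string, and never revisited.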
> > >>>>
> > >>>> I tried modifying the vendor string, removed the "overwrite the
> > >>>> vendor
> > >>>> string with the native string" hack and things look like they work
> > >>>> just fine with Linux.
> > >>>>
> > >>>> Unfortunately right now I don't have a 64-bit Windows installation
> > >>>> around to check if that approach works there too, but if it does
> > >>>> and
> > >>>> no known OS breaks due to the invalid vendor string, we can just
> > >>>> create our own virtual CPU string, no?
> > >>>
> > >>> qemu has an option for that, -cpu qemu64 IIRC. As long as we expose
> > >>> practically correct cpuids and MSRs, this should be fine. I've not
> > >>> tested
> > >>> qemu64 with winxp x64 though. Also, last I knew, winxp x64
> > >>> installation
> > >>> didn't succeed with --no-kvm. qemu by default exposes an AMD CPU
> > >>> type.
> > >>
> > >> I wasn't talking about CPUID features, but the vendor string. Qemu64
> > >> provides the AuthenticAMD string, so we don't run into any issues I'm
> > >> presuming.
> > >
> > > Right -- the thing is, with the default AuthenticAMD string, winxp x64
> > > installation fails. That has to be because of some missing cpuids.
> > > That's one
> > > of the drawbacks of exposing a well-known CPU type. I was suggesting
> > > we
> > > should try out the -cpu qemu64 CPU type since it exposes a non-
> > > standard CPU
> > > to see if guests and most userspace programs work fine without any
> > > further
> > > tweaking -- see the 'cons' below for why this might be a problem.
> >
> > I still don't really understand what you're trying to say - qemu64 is
> > the default in KVM right now. You mean winxp64 installation doesn't
> No, the default for KVM is the host CPU type.
Amit, Alex is correct: the default CPU for KVM is qemu64, not the host.
I have sent patches to add a "-cpu host" option. Some of those patches
have gone in, but not all of them yet. Also, my patches do not make the
host option the default. I have attached the remaining two patches.
Alex, can you try these patches with the "-cpu host" option and see
whether you get the host vendor string in the guest on an AMD box? I
have already tested this on a recent Intel system.
>
> > work as is and we should fix it? This has nothing to do with the
> > migration problems, right?
>
> Solutions shouldn't involve adding known regressions. If our default cpu type
> changes to one that renders some of the OSes we support right now to become
> nonfunctional, such changes won't be accepted. Of course, we can improve the
> qemu64 cpu type to ensure the popular OS types work properly at the least.
>
> > >>> There are pros and cons to expose a custom vendor ID:
> > >>>
> > >>> pros:
> > >>> - We don't need to have all the cpuid features exposed which are
> > >>> expected of a
> > >>> physically available CPU in the market, for example, badly-coded
> > >>> applications
> > >>> might crash if we don't have SSSE3 on a Core2Duo. But badly-coded or
> > >>> not, not
> > >>> exposing what's actually available on every C2D out there is bad.
> > >>>
> > >>> cons:
> > >>> - To expose the "correct" set of feature bits for a known processor,
> > >>> we also
> > >>> need to check the family/model/stepping to support the exact same
> > >>> feature
> > >>> bits that were present in the CPU.
> > >>> - We might not get some optimizations that OSes might have based on
> > >>> CPU type,
> > >>> even if the host CPU qualifies for such optimizations
> > >>> - Standard programs like benchmarking tools, etc., might fail if
> > >>> they depend
> > >>> on the vendor string for their functionality
> > >>>
> > >>> For 32-bit guests, I think exposing a pentium4 or Athlon CPU type
> > >>> should be
> > >>> fine. For 64-bit guests, the newer the better.
> > >>
> > >> Well, we could create different CPU definitions:
> > >>
> > >> - migration safe (do what is safe for migration)
> > >
> > > There are multiple ways of approaching this: peg to a least-known
> > > good CPU
> > > type, all of whose instructions will work on processors from both
> > > the major
> > > vendors. However, you never know how the server pools change and
> > > you'd want
> > > to upgrade the CPU type once you know the CPUs that are installed in
> > > servers.
> > > This has to be dynamic and the management application has to take
> > > care of
> > > exposing a CPU that's of a "safe" type for the particular server
> > > pool. We
> > > have to provide ways to mask off CPUID bits as requested by the
> > > management
> > > application. (Each server sends its cpuid to the management
> > > application,
> > > which calculates the safest bits and then conveys this to each
> > > server before
> > > starting a VM.)
> >
> > IMHO we shouldn't really start to be smart here. There's only so much
> > benefit in using the least common denominator between all CPUs in the
> > datacenter vs. using the least common denominator between all possible
> > CPUs. You'll basically end up enabling some newer SSE instructions.
>
> I'm just saying the management application will do it. So it'll be local to
> the server pool the management app caters to. Not a common denominator for
> all deployments.
>
> > So I don't think we need to go through the hassle of making this
> > dynamic. If you want to migrate your machines - use the migrate
> > preset. That won't give you the 150% speed boost on video encoding,
> > but should not really be any slower on normal workloads. It does make
> > things a lot more transparent to us and the admin of a network though,
> > because you know what you'll end up with "-cpu migration".
>
> We hardly know what uses KVM will be put to. Server virtualisation, desktop
> virtualisation, combination, what not. If we provide with the flexibility to
> the admin to tune as necessary, it's not a bad option at all. All the
> userspace needs is one tool that can calculate the max. features supported by
> the current CPU and send it over to the management app when asked for. The
> management app does the rest. KVM is not involved at all.
>
> > >> - CPU specific (like a Core2Duo, necessary to run Mac OS X)
> > >
> > > This doesn't need any more work -- we already have the ability to
> > > select CPU
> > > types. If the management application has knowledge of the kind of OS
> > > being
> > > installed in a VM (which these days is true), exposing a Core2Duo
> > > for a
> > > Mac-based OS isn't difficult.
> >
> > There is no sysenter emulation for IA-32e on AMD yet, right? That's
> > the only issue I see here and your emulation patch should address that.
>
> As I've mentioned before, I've not yet been able to test my patch because I've
> not found the sysenter/sysexit calls being used at all. It's included at the
> end of this mail for review; hopefully someone finds a use-case and we can
> take it forward.
>
> > >> - host (fastest possible, but no migration)
> > >
> > > This should be the default.
> >
> > I'm not sure. Either host or migration should be the default. This
>
> I'm suggesting that 'host' should be default. Where do we disagree?
Well, I believe cross-vendor migration is not the common case, so IMHO
the host should be the default too. But I will not press for it.
>
> > actually depends on the workload you have on KVM. For servers you'll
> > probably want to have migration be the default. For desktop usage it's
> > host. I can't think of a way we can be smart about that on the KVM
> > level.
>
> For individual runs from the command line, I'd prefer the host to be the
> default. For a wider deployment, the management app will set the defaults as
> necessary (admin-chosen).
>
> > >> I don't think we could find one definition that fits all, so the user
> > >> would have to define what the usage pattern will be.
> > >>
> > >>>> I'd love to hear comments and suggestions on this and hope we'll
> > >>>> end
> > >>>> up in a fruitful discussion on how to improve the current
> > >>>> situation.
> > >>>
> > >>> I have a patch ready for emulating sysenter/sysexit on AMD systems
> > >>> (needs
> > >>> testing). Patching the guest was an option that was discouraged; I
> > >>> had a hack
> > >>> ready but it was quickly shelved (again, untested).
> > >>
> > >> That sounds useful for misbehaving guests or cases I haven't thought
> > >> of yet. Are you sure you're intercepting the SYSENTER MSRs on AMD, so
> > >> you don't end up only getting 32 bits?
> > >
> > > Can you elaborate?
> >
> > When you write to MSR_IA32_SYSENTER_EIP on AMD, that MSR will be
> > directly passed through to the hardware (search for that MSR in
> > svm.c). This is because SVM automatically writes the SYSENTER MSRs to
> > the SYSENTER fields in the VMCB.
>
> My patch just handles the case when a sysenter is attempted on a system which
> doesn't have that instruction. So I just emulate it. Accessing the MSR and
> > setting values is done at boot time by the OS, and any migration at that
> > instant is a corner case and not too critical.
>
> Now, the patch.
>
> From e1b760d8e596811081c282484621b49c674f1c22 Mon Sep 17 00:00:00 2001
> From: Amit Shah <amit.shah@redhat.com>
> Date: Wed, 12 Nov 2008 11:31:05 +0530
> Subject: [PATCH] KVM: SVM: Emulate SYSENTER/SYSEXIT on AMD processors
>
> This patch enables emulation of the sysenter/sysexit instructions in
> AMD long mode. This will enable a guest started on an Intel machine to
> be migrated to an AMD machine.
>
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
> arch/x86/kvm/svm.c | 13 ++++
> arch/x86/kvm/x86_emulate.c | 137
> +++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 149 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index f0ad4d4..4e6e1dc 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1155,6 +1155,19 @@ static int vmmcall_interception(struct vcpu_svm *svm,
> struct kvm_run *kvm_run)
> static int invalid_op_interception(struct vcpu_svm *svm,
> struct kvm_run *kvm_run)
> {
> + /*
> + * If we're running in long mode on x86_64, check if we can
> + * emulate sysenter / sysexit
> + */
> + if (!is_long_mode(&svm->vcpu))
> + goto out;
> +
> + if (emulate_instruction(&svm->vcpu, NULL, 0, 0, 0) == EMULATE_DONE) {
> + /* We could emulate it. */
> + return 1;
> + }
> +
> + out:
> kvm_queue_exception(&svm->vcpu, UD_VECTOR);
> return 1;
> }
> diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
> index 8f60ace..c8afd20 100644
> --- a/arch/x86/kvm/x86_emulate.c
> +++ b/arch/x86/kvm/x86_emulate.c
> @@ -205,7 +205,9 @@ static u16 twobyte_table[256] = {
> ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0,
> /* 0x30 - 0x3F */
> - ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + ImplicitOps, 0, ImplicitOps, 0,
> + ImplicitOps, ImplicitOps, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0,
> /* 0x40 - 0x47 */
> DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
> DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
> @@ -305,8 +307,11 @@ static u16 group2_table[] = {
> };
>
> /* EFLAGS bit definitions. */
> +#define EFLG_VM (1<<17)
> +#define EFLG_RF (1<<16)
> #define EFLG_OF (1<<11)
> #define EFLG_DF (1<<10)
> +#define EFLG_IF (1<<9)
> #define EFLG_SF (1<<7)
> #define EFLG_ZF (1<<6)
> #define EFLG_AF (1<<4)
> @@ -1959,6 +1964,136 @@ twobyte_insn:
> rc = X86EMUL_CONTINUE;
> c->dst.type = OP_NONE;
> break;
> + case 0x34: { /* sysenter */
> + /* Vol 2b */
> + unsigned long cr0 = ctxt->vcpu->arch.cr0;
> + struct kvm_segment cs, ss;
> + u64 data;
> +
> > + if (!(cr0 & X86_CR0_PE)) { /* SYSENTER #GPs outside protected mode */
> + kvm_inject_gp(ctxt->vcpu, 0);
> + goto cannot_emulate;
> + }
> +
> + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &data);
> + if (!(data & 0xFFFC)) {
> + kvm_inject_gp(ctxt->vcpu, 0);
> + goto cannot_emulate;
> + }
> +
> + ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
> +
> + kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
> + cs.selector = (__u16) data;
> + cs.base = 0;
> + cs.limit = 0xfffff;
> + cs.g = 1;
> + cs.s = 1;
> + cs.type = 0x0b;
> + cs.db = 1;
> + cs.dpl = 0;
> + cs.selector &= ~SELECTOR_RPL_MASK;
> + cs.present = 1;
> + /* The CPL should be set to 0 */
> +
> + if (ctxt->mode == X86EMUL_MODE_PROT64) {
> + cs.l = 1;
> + cs.limit = 0xffffffff;
> + }
> +
> + ss.selector = cs.selector + 8;
> + ss.base = 0;
> + ss.limit = 0xfffff;
> + ss.g = 1;
> + ss.s = 1;
> + ss.type = 0x03;
> + ss.db = 1;
> + ss.dpl = 0;
> + ss.selector &= ~SELECTOR_RPL_MASK;
> + ss.present = 1;
> + if (ctxt->mode == X86EMUL_MODE_PROT64) {
> + ss.limit = 0xffffffff;
> + }
> +
> + kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
> + kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
> +
> + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &data);
> + c->eip = data;
> +
> + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &data);
> + c->regs[VCPU_REGS_RSP] = data;
> +
> + goto writeback;
> + break;
> + }
> + case 0x35: { /* sysexit */
> + /* Vol 2b */
> + u64 data;
> + unsigned long cr0 = ctxt->vcpu->arch.cr0;
> + struct kvm_segment cs, ss;
> +
> > + if (!(cr0 & X86_CR0_PE)) { /* SYSEXIT #GPs outside protected mode */
> + kvm_inject_gp(ctxt->vcpu, 0);
> + goto cannot_emulate;
> + }
> +
> + kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &data);
> + if (!(data & 0xFFFC) ||
> + ((ctxt->mode == X86EMUL_MODE_PROT64) && !data)) {
> + kvm_inject_gp(ctxt->vcpu, 0);
> + goto cannot_emulate;
> + }
> +
> + /* Check if CPL is 0. If not, inject_gp */
> +
> + kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
> + cs.selector = (u16)(data +
> + (ctxt->mode == X86EMUL_MODE_PROT64 ? 32 : 16));
> + cs.base = 0;
> + cs.limit = 0xfffff;
> + cs.g = 1;
> + cs.s = 1;
> + cs.type = 0x0b;
> + cs.db = 1;
> + cs.dpl = 3;
> + cs.selector |= SELECTOR_RPL_MASK;
> + cs.present = 1;
> + cs.l = 0; /* For return to compatibility mode */
> + /* The CPL should be set to 3 */
> +
> + if (ctxt->mode == X86EMUL_MODE_PROT64) {
> + cs.l = 1;
> + /* The manual doesn't talk about CS limit */
> + }
> +
> + ss.selector = cs.selector +
> + (ctxt->mode == X86EMUL_MODE_PROT64 ? 16 : 8);
> + ss.base = 0;
> + ss.limit = 0xfffff;
> + ss.g = 1;
> + ss.s = 1;
> + ss.type = 0x03;
> + ss.db = 1;
> + ss.dpl = 3;
> + ss.selector |= SELECTOR_RPL_MASK;
> + ss.present = 1;
> + if (ctxt->mode == X86EMUL_MODE_PROT64) {
> + ss.base = 0;
> + ss.limit = 0xffffffff;
> + }
> +
> + kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
> + kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
> +
> + c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX];
> + c->regs[VCPU_REGS_RSP] = c->regs[VCPU_REGS_RCX];
> +
> + /* TODO: Check if rip and rsp are canonical. inject_gp() if not */
> +
> + goto writeback;
> + break;
> + }
> case 0x40 ... 0x4f: /* cmov */
> c->dst.val = c->dst.orig_val = c->src.val;
> if (!test_cc(c->b, ctxt->eflags))
> --
> 1.5.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Thanks & Regards,
Nitin
Open Source Technology Center, Intel Corporation
-----------------------------------------------------------------
The mind is like a parachute; it works much better when it's open
[-- Attachment #2: kvm_kernel_cpuid_patch3.diff --]
[-- Type: text/x-patch, Size: 2656 bytes --]
commit 70e4e65bc591eb9cf25c1cbc0d16b2cbdb089a6f
Author: Nitin A Kamble <nitin.a.kamble@intel.com>
Date: Wed Nov 5 16:17:46 2008 -0800
Change the KVM_GET_SUPPORTED_CPUID ioctl such that it returns the
number of entries in the list when the requested number of entries
(nent) is 0.
Also add another KVM_CHECK_EXTENSION capability, KVM_CAP_CPUID_SIZER,
to determine whether the running kernel supports the changed ABI.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09e6c56..e50db11 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -86,7 +86,7 @@
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
#define KVM_MIN_FREE_MMU_PAGES 5
#define KVM_REFILL_PAGES 25
-#define KVM_MAX_CPUID_ENTRIES 40
+#define KVM_MAX_CPUID_ENTRIES 100
#define KVM_NR_FIXED_MTRR_REGION 88
#define KVM_NR_VAR_MTRR 8
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf7461b..52e6207 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -969,6 +969,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_NOP_IO_DELAY:
case KVM_CAP_MP_STATE:
case KVM_CAP_SYNC_MMU:
+ case KVM_CAP_CPUID_SIZER:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -1303,10 +1304,14 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
{
struct kvm_cpuid_entry2 *cpuid_entries;
int limit, nent = 0, r = -E2BIG;
+ int sizer = 0;
u32 func;
- if (cpuid->nent < 1)
- goto out;
+ if (cpuid->nent == 0) {
+ sizer = 1;
+ cpuid->nent = KVM_MAX_CPUID_ENTRIES;
+ }
+
r = -ENOMEM;
cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
if (!cpuid_entries)
@@ -1327,9 +1332,11 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
do_cpuid_ent(&cpuid_entries[nent], func, 0,
&nent, cpuid->nent);
r = -EFAULT;
- if (copy_to_user(entries, cpuid_entries,
+ if (!sizer) {
+ if (copy_to_user(entries, cpuid_entries,
nent * sizeof(struct kvm_cpuid_entry2)))
- goto out_free;
+ goto out_free;
+ }
cpuid->nent = nent;
r = 0;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 44fd7fa..d4cb8b1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -392,6 +392,9 @@ struct kvm_trace_rec {
#endif
#define KVM_CAP_IOMMU 18
#define KVM_CAP_NMI 19
+#define KVM_CAP_CPUID_SIZER 20 /* return of 1 means the KVM_GET_SUPPORTED_CPUID */
+ /* ioctl will return the size of list when input */
+ /* list size (nent) is 0 */
/*
* ioctls for VM fds
[-- Attachment #3: kvm_userspace_patch10.diff --]
[-- Type: text/x-patch, Size: 11297 bytes --]
diff --git a/libkvm/libkvm-x86.c b/libkvm/libkvm-x86.c
index a8cca15..7aafa20 100644
--- a/libkvm/libkvm-x86.c
+++ b/libkvm/libkvm-x86.c
@@ -379,6 +379,34 @@ int kvm_set_msrs(kvm_context_t kvm, int vcpu, struct kvm_msr_entry *msrs,
return r;
}
+/*
+ * Returns available host cpuid entries. User must free.
+ */
+struct kvm_cpuid2 *kvm_get_host_cpuid_entries(kvm_context_t kvm)
+{
+ struct kvm_cpuid2 sizer, *cpuids;
+ int r, e;
+
+ sizer.nent = 0;
+ r = ioctl(kvm->fd, KVM_GET_SUPPORTED_CPUID, &sizer);
+ if (r == -1 && errno != E2BIG)
+ return NULL;
+ cpuids = malloc(sizeof *cpuids + sizer.nent * sizeof *cpuids->entries);
+ if (!cpuids) {
+ errno = ENOMEM;
+ return NULL;
+ }
+ cpuids->nent = sizer.nent;
+ r = ioctl(kvm->fd, KVM_GET_SUPPORTED_CPUID, cpuids);
+ if (r == -1) {
+ e = errno;
+ free(cpuids);
+ errno = e;
+ return NULL;
+ }
+ return cpuids;
+}
+
static void print_seg(FILE *file, const char *name, struct kvm_segment *seg)
{
fprintf(stderr,
@@ -458,9 +486,9 @@ __u64 kvm_get_cr8(kvm_context_t kvm, int vcpu)
}
int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent,
- struct kvm_cpuid_entry *entries)
+ struct kvm_cpuid_entry2 *entries)
{
- struct kvm_cpuid *cpuid;
+ struct kvm_cpuid2 *cpuid;
int r;
cpuid = malloc(sizeof(*cpuid) + nent * sizeof(*entries));
@@ -469,7 +497,7 @@ int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent,
cpuid->nent = nent;
memcpy(cpuid->entries, entries, nent * sizeof(*entries));
- r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID, cpuid);
+ r = ioctl(kvm->vcpu_fd[vcpu], KVM_SET_CPUID2, cpuid);
free(cpuid);
return r;
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 423ce31..f84d524 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -27,6 +27,9 @@ typedef struct kvm_context *kvm_context_t;
struct kvm_msr_list *kvm_get_msr_list(kvm_context_t);
int kvm_get_msrs(kvm_context_t, int vcpu, struct kvm_msr_entry *msrs, int n);
int kvm_set_msrs(kvm_context_t, int vcpu, struct kvm_msr_entry *msrs, int n);
+struct kvm_cpuid2 *kvm_get_host_cpuid_entries(kvm_context_t);
+void get_host_cpuid_entry(uint32_t function, uint32_t index,
+ struct kvm_cpuid_entry2 * e);
#endif
/*!
@@ -374,7 +377,7 @@ int kvm_guest_debug(kvm_context_t, int vcpu, struct kvm_debug_guest *dbg);
* \return 0 on success, or -errno on error
*/
int kvm_setup_cpuid(kvm_context_t kvm, int vcpu, int nent,
- struct kvm_cpuid_entry *entries);
+ struct kvm_cpuid_entry2 *entries);
/*!
* \brief Setting the number of shadow pages to be allocated to the vm
diff --git a/qemu/qemu-kvm-x86.c b/qemu/qemu-kvm-x86.c
index bf62e18..b776ebf 100644
--- a/qemu/qemu-kvm-x86.c
+++ b/qemu/qemu-kvm-x86.c
@@ -21,6 +21,7 @@
#define MSR_IA32_TSC 0x10
static struct kvm_msr_list *kvm_msr_list;
+static struct kvm_cpuid2 *kvm_host_cpuid_entries;
extern unsigned int kvm_shadow_memory;
extern kvm_context_t kvm_context;
static int kvm_has_msr_star;
@@ -52,11 +53,17 @@ int kvm_arch_qemu_create_context(void)
kvm_msr_list = kvm_get_msr_list(kvm_context);
if (!kvm_msr_list)
- return -1;
+ return -1;
+
for (i = 0; i < kvm_msr_list->nmsrs; ++i)
- if (kvm_msr_list->indices[i] == MSR_STAR)
- kvm_has_msr_star = 1;
- return 0;
+ if (kvm_msr_list->indices[i] == MSR_STAR)
+ kvm_has_msr_star = 1;
+
+ kvm_host_cpuid_entries = kvm_get_host_cpuid_entries(kvm_context);
+ if (!kvm_host_cpuid_entries)
+ return -1;
+
+ return 0;
}
static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
@@ -476,13 +483,61 @@ static void host_cpuid(uint32_t function, uint32_t *eax, uint32_t *ebx,
*edx = vec[3];
}
+void get_host_cpuid_entry(uint32_t function, uint32_t index,
+ struct kvm_cpuid_entry2 * e)
+{
+ int i;
+ struct kvm_cpuid_entry2 *entries;
+
+ memset(e, 0, (sizeof *e));
+ e->function = function;
+ e->index = index;
+
+ if (!kvm_host_cpuid_entries)
+ return;
+
+ entries = kvm_host_cpuid_entries->entries;
+
+ for (i=0; i<kvm_host_cpuid_entries->nent; i++) {
+ struct kvm_cpuid_entry2 *ent = &entries[i];
+ if (ent->function != function)
+ continue;
+ if ((ent->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) &&
+ (ent->index != index))
+ continue;
+ if ((ent->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) &&
+ !(ent->flags & KVM_CPUID_FLAG_STATE_READ_NEXT))
+ continue;
+
+ memcpy(e, ent, sizeof (*e));
+
+ if (ent->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) {
+ int j;
+ ent->flags &= ~KVM_CPUID_FLAG_STATE_READ_NEXT;
+ for (j=i+1; ; j=(j+1)%(kvm_host_cpuid_entries->nent)) {
+ struct kvm_cpuid_entry2 *entj = &entries[j];
+ if (entj->function == ent->function) {
+ entj->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
+ break;
+ }
+ }
+ }
+ break;
+ }
+}
-static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function,
- CPUState *env)
+static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function,
+ uint32_t index, CPUState *env)
{
+ if (env->cpuid_host_cpu) {
+ get_host_cpuid_entry(function, index, e);
+ return;
+ }
+ e->function = function;
+ e->index = index;
env->regs[R_EAX] = function;
+ env->regs[R_ECX] = index;
qemu_kvm_cpuid_on_env(env);
- e->function = function;
e->eax = env->regs[R_EAX];
e->ebx = env->regs[R_EBX];
e->ecx = env->regs[R_ECX];
@@ -521,6 +576,11 @@ static void do_cpuid_ent(struct kvm_cpuid_entry *e, uint32_t function,
if (function == 1)
e->ecx |= (1u << 31);
+ if ((function == 4) || (function == 0xb))
+ e->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+ else
+ e->flags = 0;
+
// 3dnow isn't properly emulated yet
if (function == 0x80000001)
e->edx &= ~0xc0000000;
@@ -559,17 +619,30 @@ static int get_para_features(kvm_context_t kvm_context)
int kvm_arch_qemu_init_env(CPUState *cenv)
{
- struct kvm_cpuid_entry cpuid_ent[100];
+ struct kvm_cpuid_entry2 *cpuid_ent, entry, *e;
+ int cpuid_nent = 0, malloc_size = 0;
+ CPUState copy;
+ uint32_t i, limit;
#ifdef KVM_CPUID_SIGNATURE
- struct kvm_cpuid_entry *pv_ent;
+ struct kvm_cpuid_entry2 *pv_ent;
uint32_t signature[3];
+
+ malloc_size += 2;
#endif
- int cpuid_nent = 0;
- CPUState copy;
- uint32_t i, limit;
copy = *cenv;
+ if (copy.cpuid_host_cpu) {
+ if (!kvm_host_cpuid_entries)
+ return -EINVAL;
+ malloc_size += kvm_host_cpuid_entries->nent;
+ } else
+ malloc_size += 100;
+
+ cpuid_ent = malloc(malloc_size * sizeof (struct kvm_cpuid_entry2));
+ if (!cpuid_ent)
+ return -ENOMEM;
+
#ifdef KVM_CPUID_SIGNATURE
/* Paravirtualization CPUIDs */
memcpy(signature, "KVMKVMKVM", 12);
@@ -587,21 +660,48 @@ int kvm_arch_qemu_init_env(CPUState *cenv)
pv_ent->eax = get_para_features(kvm_context);
#endif
- copy.regs[R_EAX] = 0;
- qemu_kvm_cpuid_on_env(©);
- limit = copy.regs[R_EAX];
+ limit = copy.cpuid_level;
+ for (i=0; ((i<2) && (i<limit)) ; i++) {
+ e = &cpuid_ent[cpuid_nent++];
+ do_cpuid_ent(e, i, 0, ©);
+ }
+
+ if (limit >= 2) { /* get the multiple stateful leaf values */
+ do_cpuid_ent(&entry, 2, 0, ©);
+ cpuid_ent[cpuid_nent++] = entry;
+ for (i = 1; i<(entry.eax & 0xff); i++) {
+ e = &cpuid_ent[cpuid_nent++];
+ do_cpuid_ent(e, 2, 0, ©);
+ }
+ }
- for (i = 0; i <= limit; ++i)
- do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, ©);
+ for (i = 3; i <= limit; i++) {
+ e = &cpuid_ent[cpuid_nent++];
+ do_cpuid_ent(e, i, 0, ©);
+ }
- copy.regs[R_EAX] = 0x80000000;
- qemu_kvm_cpuid_on_env(©);
- limit = copy.regs[R_EAX];
+ if (limit >= 4) { /* get the per index values */
+ int i = 1;
+ do {
+ e = &cpuid_ent[cpuid_nent++];
+ do_cpuid_ent(e, 4, i++, ©);
+ } while(e->eax & 0x1f); /* until the last index */
+ }
+
+ if (limit >= 0xb) { /* get the per index values */
+ int i = 1;
+ do {
+ e = &cpuid_ent[cpuid_nent++];
+ do_cpuid_ent(e, 0xb, i++, ©);
+ } while(e->ecx & 0xff00); /* until the last index */
+ }
+ limit = copy.cpuid_xlevel;
for (i = 0x80000000; i <= limit; ++i)
- do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, ©);
+ do_cpuid_ent(&cpuid_ent[cpuid_nent++], i, 0, ©);
kvm_setup_cpuid(kvm_context, cenv->cpu_index, cpuid_nent, cpuid_ent);
+ free(cpuid_ent);
return 0;
}
diff --git a/qemu/target-i386/cpu.h b/qemu/target-i386/cpu.h
index 11bc2c1..42d646a 100644
--- a/qemu/target-i386/cpu.h
+++ b/qemu/target-i386/cpu.h
@@ -612,6 +612,7 @@ typedef struct CPUX86State {
uint32_t cpuid_ext2_features;
uint32_t cpuid_ext3_features;
uint32_t cpuid_apic_id;
+ uint32_t cpuid_host_cpu;
#ifdef USE_KQEMU
int kqemu_enabled;
diff --git a/qemu/target-i386/helper.c b/qemu/target-i386/helper.c
index 68efd4d..c23e16e 100644
--- a/qemu/target-i386/helper.c
+++ b/qemu/target-i386/helper.c
@@ -152,6 +152,9 @@ typedef struct x86_def_t {
static x86_def_t x86_defs[] = {
#ifdef TARGET_X86_64
{
+ .name = "host",
+ },
+ {
.name = "qemu64",
.level = 2,
.vendor1 = CPUID_VENDOR_AMD_1,
@@ -405,10 +408,59 @@ void x86_cpu_list (FILE *f, int (*cpu_fprintf)(FILE *f, const char *fmt, ...))
(*cpu_fprintf)(f, "x86 %16s\n", x86_defs[i].name);
}
+int fill_x86_defs_for_host(CPUX86State *env, x86_def_t * def)
+{
+ struct kvm_cpuid_entry2 e;
+
+ get_host_cpuid_entry(0, 0, &e);
+ env->cpuid_level = e.eax;
+ env->cpuid_vendor1 = e.ebx;
+ env->cpuid_vendor2 = e.ecx;
+ env->cpuid_vendor3 = e.edx;
+
+ get_host_cpuid_entry(1, 0, &e);
+ env->cpuid_version = e.eax;
+ env->cpuid_features = e.edx;
+ env->cpuid_ext_features = e.ecx;
+
+ get_host_cpuid_entry(0x80000000, 0, &e);
+ env->cpuid_xlevel = e.eax;
+
+ get_host_cpuid_entry(0x80000001, 0, &e);
+ env->cpuid_ext3_features = e.ecx;
+ env->cpuid_ext2_features = e.edx;
+
+ get_host_cpuid_entry(0x80000002, 0, &e);
+ env->cpuid_model[0] = e.eax;
+ env->cpuid_model[1] = e.ebx;
+ env->cpuid_model[2] = e.ecx;
+ env->cpuid_model[3] = e.edx;
+
+ get_host_cpuid_entry(0x80000003, 0, &e);
+ env->cpuid_model[4] = e.eax;
+ env->cpuid_model[5] = e.ebx;
+ env->cpuid_model[6] = e.ecx;
+ env->cpuid_model[7] = e.edx;
+
+ get_host_cpuid_entry(0x80000004, 0, &e);
+ env->cpuid_model[8] = e.eax;
+ env->cpuid_model[9] = e.ebx;
+ env->cpuid_model[10] = e.ecx;
+ env->cpuid_model[11] = e.edx;
+
+ return 0;
+}
+
static int cpu_x86_register (CPUX86State *env, const char *cpu_model)
{
x86_def_t def1, *def = &def1;
+ if (strcmp(cpu_model, "host") == 0) {
+ env->cpuid_host_cpu = 1;
+ fill_x86_defs_for_host(env, def);
+ return 0;
+ } /* else follow through */
+ env->cpuid_host_cpu = 0;
if (cpu_x86_find_by_name(def, cpu_model) < 0)
return -1;
if (def->vendor1) {