From: "Radim Krčmář" <rkrcmar@redhat.com>
To: "Gabriel L. Somlo" <gsomlo@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Joerg Roedel <joro@8bytes.org>,
kvm@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
Date: Thu, 16 Mar 2017 16:35:18 +0100 [thread overview]
Message-ID: <20170316153517.GL14081@potion> (raw)
In-Reply-To: <20170316145819.GC4085@HEDWIG.INI.CMU.EDU>
2017-03-16 10:58-0400, Gabriel L. Somlo:
> On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote:
> > > After studying your patch a bit more carefully (sorry, it's crazy
> > > around here right now :) ) I realized you're simply trying to
> > > (selectively) decide when to exit L1 and emulate as NOP vs. when to
> > > just allow L1 to execute MONITOR & MWAIT natively.
> > >
> > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are
> > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT
> > > natively was one of the options Alex Graf and Rene Rebe used back in
> > > the very early days of OS X on QEMU, at the time I got involved with
> > > that project. Here's part of an out of tree patch against 3.4 which did
> > > just that, and worked as far as I remember on *any* MWAIT capable
> > > intel chip I had access to back in 2010:
> > >
> > > ##############################################################################
> > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27
> > > ##############################################################################
> > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c
> > > --- linux-3.4/arch/x86/kvm/cpuid.c 2012-05-20 18:29:13.000000000 -0400
> > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c 2012-10-09 11:42:59.921215750 -0400
> > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid
> > > f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > > F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp |
> > > 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
> > > /* cpuid 1.ecx */
> > > const u32 kvm_supported_word4_x86_features =
> > > - F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> > > + F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ |
> > > 0 /* DS-CPL, VMX, SMX, EST */ |
> > > 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> > > F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
> > > 0 /* Reserved, DCA */ | F(XMM4_1) |
> > > F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c
> > > --- linux-3.4/arch/x86/kvm/svm.c 2012-05-20 18:29:13.000000000 -0400
> > > +++ linux-3.4-mac/arch/x86/kvm/svm.c 2012-10-09 11:44:41.598997481 -0400
> > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s
> > > set_intercept(svm, INTERCEPT_VMSAVE);
> > > set_intercept(svm, INTERCEPT_STGI);
> > > set_intercept(svm, INTERCEPT_CLGI);
> > > set_intercept(svm, INTERCEPT_SKINIT);
> > > set_intercept(svm, INTERCEPT_WBINVD);
> > > - set_intercept(svm, INTERCEPT_MONITOR);
> > > - set_intercept(svm, INTERCEPT_MWAIT);
> > > set_intercept(svm, INTERCEPT_XSETBV);
> > >
> > > control->iopm_base_pa = iopm_base;
> > > control->msrpm_base_pa = __pa(svm->msrpm);
> > > control->int_ctl = V_INTR_MASKING_MASK;
> > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c
> > > --- linux-3.4/arch/x86/kvm/vmx.c 2012-05-20 18:29:13.000000000 -0400
> > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c 2012-10-09 11:42:59.925215977 -0400
> > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls
> > > nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
> > > nested_vmx_procbased_ctls_low = 0;
> > > nested_vmx_procbased_ctls_high &=
> > > CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING |
> > > CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
> > > - CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
> > > + CPU_BASED_CR3_LOAD_EXITING |
> > > CPU_BASED_CR3_STORE_EXITING |
> > > #ifdef CONFIG_X86_64
> > > CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING |
> > > #endif
> > > CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
> > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru
> > > CPU_BASED_CR3_LOAD_EXITING |
> > > CPU_BASED_CR3_STORE_EXITING |
> > > CPU_BASED_USE_IO_BITMAPS |
> > > CPU_BASED_MOV_DR_EXITING |
> > > CPU_BASED_USE_TSC_OFFSETING |
> > > - CPU_BASED_MWAIT_EXITING |
> > > - CPU_BASED_MONITOR_EXITING |
> > > CPU_BASED_INVLPG_EXITING |
> > > CPU_BASED_RDPMC_EXITING;
> > >
> > > opt = CPU_BASED_TPR_SHADOW |
> > > CPU_BASED_USE_MSR_BITMAPS |
> > >
> > > If all you're trying to do is (selectively) revert to this behavior,
> > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly
> > > confused at this point :)
> >
> > Yes. Me too. Want to try that other patch and see what happens?
>
> You mean the old 3.4 patch against current KVM ? I'll try to do that,
> might take me a while :)
Michael's patch already did most of that, you just need to add
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc50875..b12f07d4ce17 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
const u32 kvm_cpuid_1_ecx_x86_features =
/* NOTE: MONITOR (and MWAIT) are emulated as NOP,
* but *not* advertised to guests via CPUID ! */
- F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
+ F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ |
0 /* DS-CPL, VMX, SMX, EST */ |
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
Note: this will never be upstream, because mwait isn't what we want by
default. :)
>> > Back in 2010, running MWAIT in L>=1 behaved 100% exactly like a NOP,
>> > didn't power down the physical CPU, just immediately moved on to the
>> > next instruction. As such, there was no power saving and no
>> > opportunity to yield to another L0 thread either, unlike with NOP
>> > emulation at L0.
>> >
>> > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now
>> > doing something smarter than just acting as a guest-mode NOP) ?
>> >
>> > Thanks,
>> > --Gabriel
>>
>> Interesting. What it seems to say is this:
>>
>> MWAIT. Behavior of the MWAIT instruction (which always causes an invalid-
>> opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT
>> exiting” VM-execution control:
>> — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit
>> (see Section 22.1.3).
>> — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if
>> any of the following is true: (1) the “interrupt-window exiting” VM-execution
>> control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1.
>> — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window
>> exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT
>> does not cause the processor to enter an implementation-dependent
>> optimized state; instead, control passes to the instruction following the
>> MWAIT instruction.
>>
>>
>> And since interrupt-window exiting is 0 most of the time for KVM,
>> I would expect MWAIT to behave normally.
>
> The intel manual said the same thing back in 2010 as well. However,
> regardless of how any flags were set, interrupt-window exiting or not,
> "normal" L1 MWAIT behavior was that it woke up immediately regardless.
> Remember, never going to sleep is still correct ("normal" ?) behavior
> per the ISA definition of MWAIT :)
I'll write a simple kvm-unit-test to better understand why it is broken
for you ...
> Also, when I tested your patch on the macbook air (where it worked),
> not only was the host reporting 400% CPU for qemu (which is to be
> expected), but the thermal fan/cooling thing also shifted up into high
> gear, which means the physical CPU got hot, which it shouldn't have if
> the guest-mode MWAIT actually did put the host CPU into low power.
I tested MWAIT with basically the same kernel patch and the qemu patch
with Linux guest on Haswell and Nehalem. Running the guest took 100% of
the host CPUs, but it still had the same temperature as when the host
was idle.
That reminds me that you to pass '-cpu host' for QEMU reasons.
next prev parent reply other threads:[~2017-03-16 15:35 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-15 21:22 [PATCH v5 untested] kvm: better MWAIT emulation for guests Michael S. Tsirkin
2017-03-15 23:35 ` Gabriel L. Somlo
2017-03-15 23:41 ` Michael S. Tsirkin
2017-03-16 13:24 ` Gabriel L. Somlo
2017-03-16 14:04 ` Michael S. Tsirkin
2017-03-16 14:58 ` Gabriel L. Somlo
2017-03-16 15:23 ` Michael S. Tsirkin
2017-03-16 15:35 ` Radim Krčmář [this message]
2017-03-16 16:01 ` Radim Krčmář
2017-03-16 16:47 ` Gabriel L. Somlo
2017-03-16 17:22 ` Radim Krčmář
2017-03-16 17:39 ` Gabriel L. Somlo
2017-03-16 17:27 ` Michael S. Tsirkin
2017-03-16 17:41 ` Gabriel L. Somlo
2017-03-16 18:29 ` Michael S. Tsirkin
2017-03-16 19:24 ` Gabriel L. Somlo
2017-03-16 19:27 ` Michael S. Tsirkin
2017-03-16 20:17 ` Gabriel L. Somlo
2017-03-16 21:14 ` Gabriel L. Somlo
2017-03-17 2:03 ` Michael S. Tsirkin
2017-03-17 13:23 ` Gabriel L. Somlo
2017-03-21 3:22 ` Michael S. Tsirkin
2017-03-21 16:58 ` Radim Krčmář
2017-03-21 17:29 ` Nadav Amit
2017-03-21 17:29 ` Nadav Amit
2017-03-21 19:22 ` Radim Krčmář
2017-03-21 22:51 ` Gabriel Somlo
2017-03-22 0:02 ` Nadav Amit
2017-03-22 13:35 ` Michael S. Tsirkin
2017-03-22 14:10 ` Gabriel L. Somlo
2017-03-22 14:15 ` Michael S. Tsirkin
2017-03-16 16:16 ` Gabriel L. Somlo
2017-03-16 16:45 ` Michael S. Tsirkin
2017-03-16 16:52 ` Gabriel L. Somlo
2017-03-16 16:54 ` Gabriel L. Somlo
2017-03-16 17:14 ` Michael S. Tsirkin
2017-03-16 17:38 ` Radim Krčmář
2017-03-16 14:08 ` Radim Krčmář
2017-03-16 15:44 ` Gabriel L. Somlo
2017-03-16 15:54 ` Radim Krčmář
2017-03-16 16:26 ` Gabriel L. Somlo
2017-03-21 16:16 ` Joerg Roedel
2017-03-21 18:45 ` Michael S. Tsirkin
2017-03-27 13:34 ` Alexander Graf
2017-03-28 14:28 ` Radim Krčmář
2017-03-28 20:35 ` Jim Mattson
2017-03-29 12:11 ` Radim Krčmář
2017-04-03 10:04 ` Alexander Graf
2017-04-04 12:39 ` Radim Krčmář
2017-04-04 12:51 ` Alexander Graf
2017-04-04 13:13 ` Radim Krčmář
2017-04-04 13:15 ` Alexander Graf
2017-04-04 13:44 ` Radim Krčmář
2017-04-04 13:44 ` [Qemu-devel] " Radim Krčmář
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170316153517.GL14081@potion \
--to=rkrcmar@redhat.com \
--cc=corbet@lwn.net \
--cc=gsomlo@gmail.com \
--cc=hpa@zytor.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.