linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Radim Krčmář" <rkrcmar@redhat.com>
To: "Gabriel L. Somlo" <gsomlo@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
Date: Thu, 16 Mar 2017 16:35:18 +0100	[thread overview]
Message-ID: <20170316153517.GL14081@potion> (raw)
In-Reply-To: <20170316145819.GC4085@HEDWIG.INI.CMU.EDU>

2017-03-16 10:58-0400, Gabriel L. Somlo:
> On Thu, Mar 16, 2017 at 04:04:12PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote:
> > > After studying your patch a bit more carefully (sorry, it's crazy
> > > around here right now :) ) I realized you're simply trying to
> > > (selectively) decide when to exit L1 and emulate as NOP vs. when to
> > > just allow L1 to execute MONITOR & MWAIT natively.
> > > 
> > > Is that right ? Because if so, the issues I saw on my MacPro1,1 are
> > > weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT
> > > natively was one of the options Alex Graf and Rene Rebe used back in
> > > the very early days of OS X on QEMU, at the time I got involved with
> > > that project. Here's part of an out of tree patch against 3.4 which did
> > > just that, and worked as far as I remember on *any* MWAIT capable
> > > intel chip I had access to back in 2010:
> > > 
> > > ##############################################################################
> > > # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27
> > > ##############################################################################
> > > diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c
> > > --- linux-3.4/arch/x86/kvm/cpuid.c	2012-05-20 18:29:13.000000000 -0400
> > > +++ linux-3.4-mac/arch/x86/kvm/cpuid.c	2012-10-09 11:42:59.921215750 -0400
> > > @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid
> > >  		f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > >  		F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp |
> > >  		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
> > >  	/* cpuid 1.ecx */
> > >  	const u32 kvm_supported_word4_x86_features =
> > > -		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> > > +		F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ |
> > >  		0 /* DS-CPL, VMX, SMX, EST */ |
> > >  		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> > >  		F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
> > >  		0 /* Reserved, DCA */ | F(XMM4_1) |
> > >  		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> > > diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c
> > > --- linux-3.4/arch/x86/kvm/svm.c	2012-05-20 18:29:13.000000000 -0400
> > > +++ linux-3.4-mac/arch/x86/kvm/svm.c	2012-10-09 11:44:41.598997481 -0400
> > > @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s
> > >  	set_intercept(svm, INTERCEPT_VMSAVE);
> > >  	set_intercept(svm, INTERCEPT_STGI);
> > >  	set_intercept(svm, INTERCEPT_CLGI);
> > >  	set_intercept(svm, INTERCEPT_SKINIT);
> > >  	set_intercept(svm, INTERCEPT_WBINVD);
> > > -	set_intercept(svm, INTERCEPT_MONITOR);
> > > -	set_intercept(svm, INTERCEPT_MWAIT);
> > >  	set_intercept(svm, INTERCEPT_XSETBV);
> > >  
> > >  	control->iopm_base_pa = iopm_base;
> > >  	control->msrpm_base_pa = __pa(svm->msrpm);
> > >  	control->int_ctl = V_INTR_MASKING_MASK;
> > > diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c
> > > --- linux-3.4/arch/x86/kvm/vmx.c	2012-05-20 18:29:13.000000000 -0400
> > > +++ linux-3.4-mac/arch/x86/kvm/vmx.c	2012-10-09 11:42:59.925215977 -0400
> > > @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls
> > >  		nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
> > >  	nested_vmx_procbased_ctls_low = 0;
> > >  	nested_vmx_procbased_ctls_high &=
> > >  		CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING |
> > >  		CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
> > > -		CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
> > > +		CPU_BASED_CR3_LOAD_EXITING |
> > >  		CPU_BASED_CR3_STORE_EXITING |
> > >  #ifdef CONFIG_X86_64
> > >  		CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING |
> > >  #endif
> > >  		CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
> > > @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru
> > >  	      CPU_BASED_CR3_LOAD_EXITING |
> > >  	      CPU_BASED_CR3_STORE_EXITING |
> > >  	      CPU_BASED_USE_IO_BITMAPS |
> > >  	      CPU_BASED_MOV_DR_EXITING |
> > >  	      CPU_BASED_USE_TSC_OFFSETING |
> > > -	      CPU_BASED_MWAIT_EXITING |
> > > -	      CPU_BASED_MONITOR_EXITING |
> > >  	      CPU_BASED_INVLPG_EXITING |
> > >  	      CPU_BASED_RDPMC_EXITING;
> > >  
> > >  	opt = CPU_BASED_TPR_SHADOW |
> > >  	      CPU_BASED_USE_MSR_BITMAPS |
> > > 
> > > If all you're trying to do is (selectively) revert to this behavior,
> > > that "shouldn't" mess it up for the MacPro either, so I'm thoroughly
> > > confused at this point :)
> > 
> > Yes.  Me too. Want to try that other patch and see what happens?
> 
> You mean the old 3.4 patch against current KVM ? I'll try to do that,
> might take me a while :)

Michael's patch already did most of that, you just need to add

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efde6cc50875..b12f07d4ce17 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -348,7 +348,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	const u32 kvm_cpuid_1_ecx_x86_features =
 		/* NOTE: MONITOR (and MWAIT) are emulated as NOP,
 		 * but *not* advertised to guests via CPUID ! */
-		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
+		F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ |
 		0 /* DS-CPL, VMX, SMX, EST */ |
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
 		F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |

Note: this will never be upstream, because mwait isn't what we want by
default. :)

>> > Back in 2010, running MWAIT in L>=1  behaved 100% exactly like a NOP,
>> > didn't power down the physical CPU, just immediately moved on to the
>> > next instruction. As such, there was no power saving and no
>> > opportunity to yield to another L0 thread either, unlike with NOP
>> > emulation at L0.
>> > 
>> > Did that change on newer Intel chips (i.e., is guest-mode MWAIT now
>> > doing something smarter than just acting as a guest-mode NOP) ?
>> > 
>> > Thanks,
>> > --Gabriel
>> 
>> Interesting.  What it seems to say is this:
>> 
>> MWAIT. Behavior of the MWAIT instruction (which always causes an invalid-
>> opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT
>> exiting” VM-execution control:
>> — If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit
>> (see Section 22.1.3).
>> — If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if
>> any of the following is true: (1) the “interrupt-window exiting” VM-execution
>> control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1.
>> — If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window
>> exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT
>> does not cause the processor to enter an implementation-dependent
>> optimized state; instead, control passes to the instruction following the
>> MWAIT instruction.
>> 
>> 
>> And since interrupt-window exiting is 0 most of the time for KVM,
>> I would expect MWAIT to behave normally.
> 
> The intel manual said the same thing back in 2010 as well. However,
> regardless of how any flags were set, interrupt-window exiting or not,
> "normal" L1 MWAIT behavior was that it woke up immediately regardless.
> Remember, never going to sleep is still correct ("normal" ?) behavior
> per the ISA definition of MWAIT :)

I'll write a simple kvm-unit-test to better understand why it is broken
for you ...

> Also, when I tested your patch on the macbook air (where it worked),
> not only was the host reporting 400% CPU for qemu (which is to be
> expected), but the thermal fan/cooling thing also shifted up into high
> gear, which means the physical CPU got hot, which it shouldn't have if
> the guest-mode MWAIT actually did put the host CPU into low power.

I tested MWAIT with basically the same kernel patch and the qemu patch
with Linux guest on Haswell and Nehalem.  Running the guest took 100% of
the host CPUs, but it still had the same temperature as when the host
was idle.

That reminds me that you to pass '-cpu host' for QEMU reasons.

  parent reply	other threads:[~2017-03-16 15:35 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-15 21:22 [PATCH v5 untested] kvm: better MWAIT emulation for guests Michael S. Tsirkin
2017-03-15 23:35 ` Gabriel L. Somlo
2017-03-15 23:41   ` Michael S. Tsirkin
2017-03-16 13:24     ` Gabriel L. Somlo
2017-03-16 14:04       ` Michael S. Tsirkin
2017-03-16 14:58         ` Gabriel L. Somlo
2017-03-16 15:23           ` Michael S. Tsirkin
2017-03-16 15:35           ` Radim Krčmář [this message]
2017-03-16 16:01             ` Radim Krčmář
2017-03-16 16:47               ` Gabriel L. Somlo
2017-03-16 17:22                 ` Radim Krčmář
2017-03-16 17:39                   ` Gabriel L. Somlo
2017-03-16 17:27                 ` Michael S. Tsirkin
2017-03-16 17:41                   ` Gabriel L. Somlo
2017-03-16 18:29                     ` Michael S. Tsirkin
2017-03-16 19:24                       ` Gabriel L. Somlo
2017-03-16 19:27                         ` Michael S. Tsirkin
2017-03-16 20:17                           ` Gabriel L. Somlo
2017-03-16 21:14                             ` Gabriel L. Somlo
2017-03-17  2:03                               ` Michael S. Tsirkin
2017-03-17 13:23                                 ` Gabriel L. Somlo
2017-03-21  3:22                                   ` Michael S. Tsirkin
2017-03-21 16:58                                     ` Radim Krčmář
2017-03-21 17:29                                       ` Nadav Amit
2017-03-21 19:22                                         ` Radim Krčmář
2017-03-21 22:51                                           ` Gabriel Somlo
2017-03-22  0:02                                             ` Nadav Amit
2017-03-22 13:35                                               ` Michael S. Tsirkin
2017-03-22 14:10                                                 ` Gabriel L. Somlo
2017-03-22 14:15                                                   ` Michael S. Tsirkin
2017-03-16 16:16             ` Gabriel L. Somlo
2017-03-16 16:45               ` Michael S. Tsirkin
2017-03-16 16:52                 ` Gabriel L. Somlo
2017-03-16 16:54                   ` Gabriel L. Somlo
2017-03-16 17:14                     ` Michael S. Tsirkin
2017-03-16 17:38                       ` Radim Krčmář
2017-03-16 14:08       ` Radim Krčmář
2017-03-16 15:44         ` Gabriel L. Somlo
2017-03-16 15:54           ` Radim Krčmář
2017-03-16 16:26             ` Gabriel L. Somlo
2017-03-21 16:16 ` Joerg Roedel
2017-03-21 18:45   ` Michael S. Tsirkin
2017-03-27 13:34 ` Alexander Graf
2017-03-28 14:28   ` Radim Krčmář
2017-03-28 20:35     ` Jim Mattson
2017-03-29 12:11       ` Radim Krčmář
2017-04-03 10:04         ` Alexander Graf
2017-04-04 12:39           ` Radim Krčmář
2017-04-04 12:51             ` Alexander Graf
2017-04-04 13:13               ` Radim Krčmář
2017-04-04 13:15                 ` Alexander Graf
2017-04-04 13:44                   ` Radim Krčmář

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170316153517.GL14081@potion \
    --to=rkrcmar@redhat.com \
    --cc=corbet@lwn.net \
    --cc=gsomlo@gmail.com \
    --cc=hpa@zytor.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).