linux-kernel.vger.kernel.org archive mirror
From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Jim Mattson <jmattson@google.com>
Cc: Alexander Graf <agraf@suse.de>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"Gabriel L. Somlo" <gsomlo@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Joerg Roedel <joro@8bytes.org>, kvm list <kvm@vger.kernel.org>,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
Date: Wed, 29 Mar 2017 14:11:47 +0200
Message-ID: <20170329121147.GA5129@potion>
In-Reply-To: <CALMp9eRs+WMt+8+wgWk7+H8Kd5zU0_U+On0O_G4cp_7xEffrGQ@mail.gmail.com>

2017-03-28 13:35-0700, Jim Mattson:
> On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> 2017-03-27 15:34+0200, Alexander Graf:
>>> On 15/03/2017 22:22, Michael S. Tsirkin wrote:
>>>> Guests running Mac OS X 10.5, 10.6, and 10.7 (Leopard through Lion) have a
>>>> problem: unless explicitly provided with kernel command line argument
>>>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
>>>> without checking CPUID.
>>>>
>>>> We currently emulate that as a NOP but on VMX we can do better: let
>>>> guest stop the CPU until timer, IPI or memory change.  CPU will be busy
>>>> but that isn't any worse than a NOP emulation.
>>>>
>>>> Note that mwait within guests is not the same as on real hardware
>>>> because halt causes an exit while mwait doesn't.  For this reason it
>>>> might not be a good idea to use the regular MWAIT flag in CPUID to
>>>> signal this capability.  Add a flag in the hypervisor leaf instead.
>>>
>>> So imagine we had proper MWAIT emulation capabilities based on page faults.
>>> In that case, we could do something as fancy as
>>>
>>> Treat MWAIT as pass-through by default
>>>
>>> Have a per-vcpu monitor timer 10 times a second in the background that
>>> checks which instruction we're in
>>>
>>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT,
>>> if $IP was in non-mwait within that time, reset counter.
>>
>> Or we could reuse external interrupts for sampling.  Exits triggered by
>> them would check for current instruction (probably would be best to
>> limit just to timer tick) and a sufficient ratio (> 0?) of other exits
>> would imply that MWAIT is not used.
>>
>>> Or instead maybe just reuse the adapter hlt logic?
>>
>> Emulated MWAIT is very similar to emulated HLT, so reusing the logic
>> makes sense.  We would just add new wakeup methods.
>>
>>> Either way, with that we should be able to get super low latency IPIs
>>> running while still maintaining some sanity on systems which don't have
>>> dedicated CPUs for workloads.
>>>
>>> And we wouldn't need guest modifications, which is a great plus. So older
>>> guests (and Windows?) could benefit from mwait as well.
>>
>> There is no need for guest modifications -- it could be exposed as standard
>> MWAIT feature to the guest, with responsibilities for guest/host-impact
>> on the user.
>>
>> I think that the page-fault based MWAIT would require paravirt if it
>> should be enabled by default, because of performance concerns:
>> Enabling write protection on a page needs a VM exit on all other VCPUs
>> when beginning monitoring (to reload page permissions and prevent missed
>> writes).
>> We'd want to keep trapping writes to the page all the time because
>> toggling is slow, but this could regress performance for an OS that has
>> other data accessed by other VCPUs in that page.
>> No current interface can tell the guest that it should reserve the whole
>> page instead of what CPUID[5] says and that writes to the monitored page
>> are not "cheap", but can trigger a VM exit ...
> 
> CPUID.05H:EBX is supposed to address the false sharing issue. IIRC,
> VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX
> when running Mac OS X guests. Per Intel's SDM volume 3, section
> 8.10.5, "To avoid false wake-ups; use the largest monitor line size to
> pad the data structure used to monitor writes. Software must make sure
> that beyond the data structure, no unrelated data variable exists in
> the triggering area for MWAIT. A pad may be needed to avoid this
> situation." Unfortunately, most operating systems do not follow this
> advice.

Right, EBX provides what we need to expose that the whole page is
monitored, thanks!
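
To illustrate the padding that the quoted SDM passage asks for, here is a
minimal sketch, assuming the guest is shown 4096 in CPUID.05H:EBX (the value
mentioned above for VMware Fusion).  MONITOR_LINE_MAX, struct mwait_target and
shares_monitor_line() are hypothetical names made up for this example; they
are not existing kernel or KVM symbols.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: largest monitor line size, as read from CPUID.05H:EBX[15:0]. */
#define MONITOR_LINE_MAX 4096u

/* Hypothetical wakeup flag that owns its entire monitor line. */
struct mwait_target {
	alignas(MONITOR_LINE_MAX) volatile uint32_t flag;
	/* Padding so that no unrelated variable lands in the triggering area. */
	unsigned char pad[MONITOR_LINE_MAX - sizeof(uint32_t)];
};

static_assert(sizeof(struct mwait_target) == MONITOR_LINE_MAX,
	      "target must fill exactly one monitor line");

/* Returns 1 if addresses a and b fall into the same monitor line. */
static int shares_monitor_line(uintptr_t a, uintptr_t b)
{
	return (a / MONITOR_LINE_MAX) == (b / MONITOR_LINE_MAX);
}
```

With this layout a write to any neighbouring object cannot hit the monitored
line, so it would neither wake the waiter nor trap under a page-based
emulation.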

> Unfortunately, most operating systems do not follow this
> advice.

Yeah ... KVM could add yet another heuristic that drops MWAIT emulation and
falls back to hardware MWAIT if there were many write traps while the target
was not MWAITING.  It's getting over-complicated, though :/
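
A rough sketch of how such a heuristic might look, purely for illustration.
Everything here is an assumption: struct mwait_page_stats, both functions and
the two thresholds are hypothetical and do not correspond to any existing KVM
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page bookkeeping; not a KVM structure. */
struct mwait_page_stats {
	uint32_t traps_while_waiting;	/* write traps that hit a VCPU in MWAIT */
	uint32_t traps_while_running;	/* write traps while nobody was waiting */
};

/* Assumed tuning knobs, chosen only for illustration. */
#define MIN_SAMPLES		64u
#define MAX_WASTED_PERCENT	75u

/* Account one trapped write to the monitored page. */
static void record_write_trap(struct mwait_page_stats *s, bool target_in_mwait)
{
	if (target_in_mwait)
		s->traps_while_waiting++;
	else
		s->traps_while_running++;
}

/* True when emulation should be dropped in favour of hardware MWAIT. */
static bool should_use_hardware_mwait(const struct mwait_page_stats *s)
{
	uint64_t total = (uint64_t)s->traps_while_waiting +
			 s->traps_while_running;

	if (total < MIN_SAMPLES)
		return false;	/* not enough data to decide yet */

	/* Too many traps were pure overhead: the target was not waiting. */
	return (uint64_t)s->traps_while_running * 100 >
	       total * MAX_WASTED_PERCENT;
}
```

The caller would invoke record_write_trap() from the write-protection fault
path and check should_use_hardware_mwait() before re-arming the protection.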
