From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: SVM: vmload/vmsave-free VM exits?
Date: Mon, 13 Apr 2015 20:41:04 +0300
Message-ID: <552BFFB0.9020008@gmail.com>
References: <5520F2C8.7090102@web.de> <55216CE5.9000504@gmail.com> <55236E6F.7090705@web.de> <552B69C7.5040205@siemens.com> <552BFCF8.3080607@gmail.com> <552BFE51.3000908@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Cc: Valentine Sinitsyn, kvm, Jailhouse
To: Jan Kiszka, Joel Schopp
In-Reply-To: <552BFE51.3000908@siemens.com>
List-Id: kvm.vger.kernel.org

On 04/13/2015 08:35 PM, Jan Kiszka wrote:
> On 2015-04-13 19:29, Avi Kivity wrote:
>> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>>> Hi Jan,
>>>>>
>>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>>>> these instructions unconditionally. However, I think both only need
>>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>>>>> user space exit or no CPU migration is involved (both is always
>>>>>> true for Jailhouse). Xen avoids vmload/vmsave on lightweight exits
>>>>>> but it also still uses rsp-based per-cpu variables.
>>>>>>
>>>>>> So the question boils down to what is generally faster:
>>>>>>
>>>>>> A) vmload
>>>>>>    vmrun
>>>>>>    vmsave
>>>>>>
>>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>    vmrun
>>>>>>    rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>
>>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>>> still require vmload/vmsave, thus become more expensive with B) due to
>>>>>> the additional MSR accesses.
>>>>>>
>>>>>> Any thoughts or results of previous experiments?
>>>>> That's a good question, I also thought about it when I was finalizing
>>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>>>>> didn't seem to affect the latency in any noticeable way. That's why I
>>>>> decided not to push the patch (in fact, I was even unable to find it
>>>>> now).
>>>>>
>>>>> Note however that how AMD chips store host state during VM switches are
>>>>> implementation-specific. I did my quick experiments on one CPU only, so
>>>>> your mileage may vary.
>>>>>
>>>>> Regarding your question, I feel B will be faster anyways but again I'm
>>>>> afraid that the gain could be within statistical error of the
>>>>> experiment.
>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>> test).
>>>>
>>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
>>>> be adjustable in a similar way. Attached the benchmark, patch will be in
>>>> the Jailhouse next branch soon. We need to check more CPU types, though.
>>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>>> happen to remember if it was never completed for a technical reason?
>> IIRC, I came to the conclusion that it was impossible. Something about
>> TR.size not receiving a reasonable value. Let me see.
> To my understanding, TR doesn't play a role until we leave ring 0 again.
> Or what could make the CPU look for any of the fields in the 64-bit TSS
> before that?

Exceptions that utilize the IST.

I found a writeup [17] that describes this, but I think it's even more
impossible than that writeup implies.

[17] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/26712/

> Jan
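The trade-off raised at the start of the thread (variant B adds GS.base MSR accesses to every exit, while heavyweight exits still pay for vmload/vmsave) can be put into a rough cost model. The sketch below is only an illustration: the struct, the function names and any cycle counts fed into it are hypothetical and not taken from the Jailhouse or KVM sources, and real costs depend on the CPU model.

```c
/*
 * Rough per-exit cost model for the two variants discussed in the thread.
 * All names and numbers are illustrative assumptions, not measurements.
 */
struct exit_costs {
    double vmload_vmsave; /* combined cycles for vmload + vmsave, must be > 0 */
    double gs_base_msrs;  /* combined cycles for wrmsrl + rdmsrl of MSR_GS_BASE */
};

/* Variant A: vmload/vmsave around every vmrun, regardless of exit type. */
static double cost_a(const struct exit_costs *c, double heavy_fraction)
{
    (void)heavy_fraction; /* A pays the same on every exit */
    return c->vmload_vmsave;
}

/*
 * Variant B: GS.base MSR accesses on every exit, plus vmload/vmsave
 * only on the fraction of exits that are heavyweight.
 */
static double cost_b(const struct exit_costs *c, double heavy_fraction)
{
    return c->gs_base_msrs + heavy_fraction * c->vmload_vmsave;
}

/*
 * Fraction of heavyweight exits below which B is cheaper on average than A.
 * Derived from: gs_base_msrs + f * vmload_vmsave < vmload_vmsave.
 * A result <= 0 means B never wins under this model.
 */
static double break_even_heavy_fraction(const struct exit_costs *c)
{
    return 1.0 - c->gs_base_msrs / c->vmload_vmsave;
}
```

With made-up inputs of 400 cycles for the vmload/vmsave pair and 240 cycles for the two MSR accesses, break_even_heavy_fraction() comes out at about 0.4, so under this model B would pay off only while fewer than roughly 40% of exits are heavyweight. For Jailhouse, where neither user space exits nor CPU migration occur, the heavyweight fraction is 0 and only the direct comparison of the two costs matters.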