From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: SVM: vmload/vmsave-free VM exits?
Date: Mon, 13 Apr 2015 09:01:27 +0200
Message-ID: <552B69C7.5040205@siemens.com>
References: <5520F2C8.7090102@web.de> <55216CE5.9000504@gmail.com> <55236E6F.7090705@web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>,
	kvm <kvm@vger.kernel.org>,
	Jailhouse <jailhouse-dev@googlegroups.com>
To: Joel Schopp <joel.schopp@amd.com>,
	Avi Kivity <avi.kivity@gmail.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from david.siemens.de ([192.35.17.14]:47698 "EHLO david.siemens.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752194AbbDMHBd (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 13 Apr 2015 03:01:33 -0400
In-Reply-To: <55236E6F.7090705@web.de>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 2015-04-07 07:43, Jan Kiszka wrote:
> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>> Hi Jan,
>>
>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>> these instructions unconditionally. However, I think both only need
>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>> user space exit or no CPU migration is involved (both are always true for
>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>> still uses rsp-based per-cpu variables.
>>>
>>> So the question boils down to what is generally faster:
>>>
>>> A) vmload
>>>     vmrun
>>>     vmsave
>>>
>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>     vmrun
>>>     rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>
>>> Of course, KVM also has to take into account that heavyweight exits
>>> still require vmload/vmsave, thus become more expensive with B) due to
>>> the additional MSR accesses.
>>>
>>> Any thoughts or results of previous experiments?
>> That's a good question, I also thought about it when I was finalizing
>> the Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>> didn't seem to affect the latency in any noticeable way. That's why I
>> decided not to push the patch (in fact, I was even unable to find it now).
>>
>> Note, however, that how AMD chips store host state during VM switches is
>> implementation-specific. I did my quick experiments on one CPU only, so
>> your mileage may vary.
>>
>> Regarding your question, I feel B will be faster anyways but again I'm
>> afraid that the gain could be within statistical error of the experiment.
> 
> B is faster, by at least 160 cycles with hot caches on an AMD A6-5200 APU
> and closer to 600 if they are colder (I added some usleep to each loop in
> the test).
> 
> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
> be adjustable in a similar way. The benchmark is attached; the patch will be in
> the Jailhouse next branch soon. We need to check more CPU types, though.

Avi, I found some preparatory patches of yours from 2010 [1]. Do you
happen to remember whether that work was left unfinished for a technical
reason?

Joel, can you comment on the benefit of variant B) for the various AMD
CPUs? Is it always positive?
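
For KVM, whether B pays off depends on the mix of lightweight and
heavyweight exits, since heavyweight exits keep vmload/vmsave and pay for
the MSR accesses on top. A rough way to frame that trade-off is the sketch
below. It is only a cost model under stated assumptions: the function names
are made up for illustration, and the per-exit penalty for heavyweight exits
is a value the caller has to supply from measurement. The only measured
number in this thread is the saving of roughly 160 cycles per lightweight
exit on the A6-5200.

```c
/*
 * Hypothetical cost model for variant B, not code from Jailhouse or KVM.
 *
 * saving_light:  cycles saved per lightweight exit by skipping
 *                vmload/vmsave (about 160 on the A6-5200, hot caches).
 * penalty_heavy: extra cycles per heavyweight exit, which still needs
 *                vmload/vmsave plus the GS.base MSR accesses.
 *                ASSUMPTION: must be measured, no figure exists yet.
 * heavy_fraction: share of all exits that are heavyweight, in [0, 1].
 */

/* Average cycles gained per exit; a negative result means B is slower. */
static double net_gain_per_exit(double heavy_fraction,
                                double saving_light,
                                double penalty_heavy)
{
    return (1.0 - heavy_fraction) * saving_light
           - heavy_fraction * penalty_heavy;
}

/* Heavyweight-exit share at which variant B stops being a win. */
static double break_even_heavy_fraction(double saving_light,
                                        double penalty_heavy)
{
    return saving_light / (saving_light + penalty_heavy);
}
```

With a placeholder penalty of 100 cycles, for example,
break_even_heavy_fraction(160.0, 100.0) is 160/260, so B would remain a
net win as long as fewer than about 62% of exits are heavyweight. For
Jailhouse, heavy_fraction is 0 and the full saving applies.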

Thanks,
Jan

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61455

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux