From mboxrd@z Thu Jan  1 00:00:00 1970
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Subject: Re: [PATCH 1/2] KVM: MMU: Mark sp mmio cached when creating mmio
 spte
Date: Thu, 14 Mar 2013 13:13:30 +0800
Message-ID: <51415C7A.402@linux.vnet.ibm.com>
References: <20130312174333.7f76148e.yoshikawa_takuya_b1@lab.ntt.co.jp> <20130312174440.5d5199ee.yoshikawa_takuya_b1@lab.ntt.co.jp> <5140094F.5080700@linux.vnet.ibm.com> <20130313162816.c62899dc.yoshikawa_takuya_b1@lab.ntt.co.jp> <51402DDA.607@linux.vnet.ibm.com> <20130313123358.GM11223@redhat.com> <51407441.4020200@linux.vnet.ibm.com> <20130313224056.8c9c87f4d95b332d2273a685@gmail.com> <514087A0.1000704@linux.vnet.ibm.com> <20130314015821.GA13261@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>,
	Gleb Natapov <gleb@redhat.com>,
	Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>,
	kvm@vger.kernel.org
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from e23smtp05.au.ibm.com ([202.81.31.147]:53650 "EHLO
	e23smtp05.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751933Ab3CNFNq (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 14 Mar 2013 01:13:46 -0400
Received: from /spool/local
	by e23smtp05.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
	for <kvm@vger.kernel.org> from <xiaoguangrong@linux.vnet.ibm.com>;
	Thu, 14 Mar 2013 15:09:21 +1000
Received: from d23relay03.au.ibm.com (d23relay03.au.ibm.com [9.190.235.21])
	by d23dlp02.au.ibm.com (Postfix) with ESMTP id 1525E2BB0051
	for <kvm@vger.kernel.org>; Thu, 14 Mar 2013 16:13:36 +1100 (EST)
Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.235.139])
	by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id r2E5DV7u58916924
	for <kvm@vger.kernel.org>; Thu, 14 Mar 2013 16:13:32 +1100
Received: from d23av04.au.ibm.com (loopback [127.0.0.1])
	by d23av04.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id r2E5DYSg016082
	for <kvm@vger.kernel.org>; Thu, 14 Mar 2013 16:13:34 +1100
In-Reply-To: <20130314015821.GA13261@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 03/14/2013 09:58 AM, Marcelo Tosatti wrote:
> On Wed, Mar 13, 2013 at 10:05:20PM +0800, Xiao Guangrong wrote:
>> On 03/13/2013 09:40 PM, Takuya Yoshikawa wrote:
>>> On Wed, 13 Mar 2013 20:42:41 +0800
>>> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
>>>
>>>>>>>> How about save all mmio spte into a mmio-rmap?
>>>>>>>
>>>>>>> The problem is that other mmu code would need to care about the pointers
>>>>>>> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
>>>>>>
>>>>>> It is not hard... all the codes have been wrapped by *zap_spte*.
>>>>>>
>>>>> So are you going to send a patch? What do you think about applying this
>>>>> as temporary solution?
>>>>
>>>> Hi Gleb,
>>>>
>>>> Since it only needs small change based on this patch, I think we can directly
>>>> apply the rmap-based way.
>>>>
>>>> Takuya, could you please do this? ;)
>>>
>>> Though I'm fine with my making the patch better, I'm still thinking
>>> about the bad side of it, though.
>>>
>>> In zap_spte, don't we need to search the pointer to be removed from the
>>> global mmio-rmap list?  How long can that list be?
>>
>> It is not bad. On softmmu, the rmap list has already been long more than 300.
>> On hardmmu, normally the mmio spte is not frequently zapped (just set not clear).
>>
>> The worst case is zap-all-mmio-spte that removes all mmio-spte. This operation
>> can be speed up after applying my previous patch:
>> KVM: MMU: fast drop all spte on the pte_list
>>
>>>
>>> Implementing it will/may not be difficult but I'm not sure if we would
>>> get pure improvement.  Unless it becomes 99% sure, I think we should
>>> first take a basic approach. 
>>
>> I definitely sure zapping all mmio-sptes is fast than zapping mmio shadow
>> pages. ;)
> 
> With a huge number of shadow pages (think 512GB guest, 262144 pte-level
> shadow pages to map), it might be a problem.

That is one of the reasons why I think zapping mmio shadow pages is not a
good approach. ;)

This patch needs to walk all shadow pages to find the mmio shadow pages
and zap them, so its cost depends on how much memory the guest uses (huge
memory means a huge number of shadow pages, as you said). But the time
needed to zap mmio sptes does not depend on how much guest memory is used.
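
To illustrate the difference, here is a toy sketch in plain C. It is not
KVM code: toy_sp, toy_mmio_spte and both functions are hypothetical names
made up for this mail, and the "mmio-rmap" is modelled as a simple array.
It only shows how much each approach has to walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, none is a KVM symbol. */
struct toy_sp {
	bool mmio_cached;	/* page contains at least one mmio spte */
	bool zapped;
};

struct toy_mmio_spte {
	bool present;
};

/*
 * Approach A: scan every shadow page and zap the ones marked mmio_cached.
 * Returns the number of pages visited, which is always n_pages.
 */
static size_t zap_mmio_shadow_pages(struct toy_sp *pages, size_t n_pages)
{
	size_t visited = 0;

	assert(pages != NULL || n_pages == 0);
	for (size_t i = 0; i < n_pages; i++) {
		visited++;
		if (pages[i].mmio_cached)
			pages[i].zapped = true;
	}
	return visited;
}

/*
 * Approach B: walk only the entries recorded in the mmio-rmap.
 * Returns the number of entries visited, which is n_mmio and does not
 * depend on the total number of shadow pages.
 */
static size_t zap_mmio_rmap(struct toy_mmio_spte *rmap, size_t n_mmio)
{
	size_t visited = 0;

	assert(rmap != NULL || n_mmio == 0);
	for (size_t i = 0; i < n_mmio; i++) {
		visited++;
		rmap[i].present = false;
	}
	return visited;
}
```

In this model approach A always visits every shadow page, while approach B
visits only as many entries as there are mmio sptes.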

> 
>>> What do you think?
>>
>> I am considering if zap all shadow page is faster enough (after my patchset), do
>> we really need to care it?
> 
> Still needed: your patch reduces kvm_mmu_zap_all() time, but as you can
> see with huge memory sized guests 100% improvement over the current
> situation will be a bottleneck (and as you noted the deletion case is
> still unsolved).	

The improvement can be greater if more memory is used. (I only used 2G of
memory in the guest since my test case is a 32-bit program which cannot use
huge memory, and there is no lock contention in my test case.)

Actually, the time complexity of the current kvm_mmu_zap_all is the same as
that of zapping mmio shadow pages under the mmu-lock (O(n), where n is the
number of shadow page tables). Both of them walk all shadow page tables.
The rest of the work in kvm_mmu_zap is constant.

And this is a TODO item:
(2): free shadow pages by using generation-number
After that, kvm_mmu_zap no longer needs to walk all shadow pages.
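
A rough sketch of the generation-number idea, again as a toy model in plain
C and not the actual patch: toy_mmu, toy_gen_sp and the helper names are
hypothetical. The assumption is that each shadow page records the generation
it was created in, and that "zap all" only bumps the current generation, so
older pages become obsolete without being walked and can be freed later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names are hypothetical and not taken from KVM. */
struct toy_mmu {
	unsigned long valid_gen;	/* current generation number */
};

struct toy_gen_sp {
	unsigned long gen;		/* generation the page was created in */
};

/* A newly created shadow page is stamped with the current generation. */
static void toy_sp_init(const struct toy_mmu *mmu, struct toy_gen_sp *sp)
{
	assert(mmu != NULL && sp != NULL);
	sp->gen = mmu->valid_gen;
}

/* "Zap all" is O(1) here: bump the generation, walk nothing. */
static void toy_zap_all(struct toy_mmu *mmu)
{
	mmu->valid_gen++;
}

/*
 * Lookup paths would treat a page from an older generation as obsolete;
 * such pages could then be freed lazily, outside the hot path.
 */
static bool toy_sp_is_obsolete(const struct toy_mmu *mmu,
			       const struct toy_gen_sp *sp)
{
	return sp->gen != mmu->valid_gen;
}
```

In this model a page created before toy_zap_all() reports as obsolete
afterwards, while a page created after it does not.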

> 
> Suppose another improvement angle is to zap only whats necessary for the
> given operation (say there is the memslot hint available, but unused for
> x86).

Yes, I agree on this point. Zapping all shadow pages makes vcpus fault
on all memory accesses. That is the drawback.