From mboxrd@z Thu Jan  1 00:00:00 1970
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Subject: Re: KVM: MMU: improve n_max_mmu_pages calculation with TDP
Date: Fri, 22 Mar 2013 11:00:28 +0800
Message-ID: <514BC94C.8070802@linux.vnet.ibm.com>
References: <20130320201420.GA17347@amt.cnet> <514A9DA7.10702@linux.vnet.ibm.com> <20130321142919.GA30837@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: kvm <kvm@vger.kernel.org>, Ulrich Obergfell <uobergfe@redhat.com>,
	Takuya Yoshikawa <takuya.yoshikawa@gmail.com>,
	Avi Kivity <avi.kivity@gmail.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
In-Reply-To: <20130321142919.GA30837@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 03/21/2013 10:29 PM, Marcelo Tosatti wrote:
> On Thu, Mar 21, 2013 at 01:41:59PM +0800, Xiao Guangrong wrote:
>> On 03/21/2013 04:14 AM, Marcelo Tosatti wrote:
>>>
>>> kvm_mmu_calculate_mmu_pages numbers, 
>>>
>>> maximum number of shadow pages = 2% of mapped guest pages
>>>
>>> Does not make sense for TDP guests where mapping all of guest
>>> memory with 4k pages cannot exceed "mapped guest pages / 512"
>>> (not counting root pages).
>>>
>>> Allow that maximum for TDP, forcing the guest to recycle otherwise.
>>>
>>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>>
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 956ca35..a9694a8d7 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -4293,7 +4293,7 @@ nomem:
>>>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
>>>  {
>>>  	unsigned int nr_mmu_pages;
>>> -	unsigned int  nr_pages = 0;
>>> +	unsigned int i, nr_pages = 0;
>>>  	struct kvm_memslots *slots;
>>>  	struct kvm_memory_slot *memslot;
>>>
>>> @@ -4302,7 +4302,19 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
>>>  	kvm_for_each_memslot(memslot, slots)
>>>  		nr_pages += memslot->npages;
>>>
>>> -	nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
>>> +	if (tdp_enabled) {
>>> +		/* one root page */
>>> +		nr_mmu_pages = 1;
>>> +		/* nr_pages / (512^i) per level, due to
>>> +		 * guest RAM map being linear */
>>> +		for (i = 1; i < 4; i++) {
>>> +			int nr_pages_round = nr_pages + (1 << (9*i));
>>> +			nr_mmu_pages += nr_pages_round >> (9*i);
>>> +		}
>>
>> Marcelo,
>>
>> Can it work if nested guest is used? Did you see any problem in practice (direct guest
>> uses more memory than your calculation)?
> 
> Direct guest can use more than the calculation by switching between
> different paging modes.

I mean a guest that runs on the hardware MMU (TDP is used, but there is no
nested guest). It uses only one page table, so it seems it cannot use more
memory than your calculation allows (except for some MMIO page tables).

So, is your calculation only meant to limit memory usage for TDP + nested guest?
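
To make the comparison concrete, here is a rough stand-alone sketch of the two
limits, not the kernel code itself. limit_permille() and limit_tdp() are names
made up for illustration, and PERMILLE_MMU_PAGES = 20 is an assumption taken
from the "2%" quoted above; the TDP loop copies the arithmetic from the hunk
in your patch.

```c
/* Assumption: 20 permille, matching "2% of mapped guest pages" above. */
#define PERMILLE_MMU_PAGES 20

/* Existing limit: a fixed fraction of the mapped guest pages. */
static unsigned int limit_permille(unsigned int nr_pages)
{
	return nr_pages * PERMILLE_MMU_PAGES / 1000;
}

/*
 * Proposed TDP limit (hypothetical helper mirroring the quoted hunk):
 * one root page plus nr_pages / 512^i page-table pages per level,
 * each level rounded up by one page.
 */
static unsigned int limit_tdp(unsigned int nr_pages)
{
	unsigned int i;
	unsigned int nr_mmu_pages = 1;	/* one root page */

	for (i = 1; i < 4; i++) {
		unsigned int nr_pages_round = nr_pages + (1u << (9 * i));

		nr_mmu_pages += nr_pages_round >> (9 * i);
	}
	return nr_mmu_pages;
}
```

For nr_pages = 1048576 (4GB of guest RAM in 4k pages), limit_permille() gives
20971 and limit_tdp() gives 2056, so the new limit is roughly ten times lower
than the current one.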

> 
> About nested guest: at one point in time the working set cannot exceed 
> the number of physical pages visible by the guest.

But it can cause lots of #PFs, which is a nightmare for performance, no?

> 
> Allowing an excessively high number of shadow pages is a security

Does the security concern mean "optimizing memory usage"? Or something else?

> concern, also, as unpreemptable long operations are necessary to tear
> down the pages.

You mean limiting the shadow pages so that some paths, like remove-write-access
and zap-all-sp, run faster? If so, we can optimize those paths directly, which
I think is more effective.

> 
>> And mmio also can build some page table that looks like not considered
>> in this patch.
> 
> Right, but its only a few pages. Same argument as above: working set at
> one given time is smaller than total RAM. Do you see any potential
> problem?

Marcelo, I am just unsure whether the limit is reasonable. As I said, the
limit is not effective enough for a hardmmu-only guest (no nested guest),
and it seems too low for nested guests.