From: Avi Kivity
Subject: Re: [patch 01/13] x86/mm: get_user_pages_fast_atomic
Date: Mon, 08 Sep 2008 17:20:42 +0300
Message-ID: <48C534BA.1030107@qumranet.com>
References: <20080906184822.560099087@localhost.localdomain> <20080906192430.514756352@localhost.localdomain> <48C393EA.2080707@qumranet.com> <20080908061005.GB1014@dmt.cnet>
In-Reply-To: <20080908061005.GB1014@dmt.cnet>
To: Marcelo Tosatti
Cc: kvm@vger.kernel.org

Marcelo Tosatti wrote:
> On Sun, Sep 07, 2008 at 11:42:18AM +0300, Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>> From: Nick Piggin
>>>
>>> Provide a lockless pagetable walk function without fallback to mmap_sem
>>> on error.
>>>
>> I would like to avoid this if possible. Not only is this a change to
>> the core (with backporting headaches),
>
> Chris mentioned that the backport could use down_read_trylock(mmap_sem),
> and zap the page on failure. It's a simple solution, and it should be rare
> for mmap_sem to be acquired in write mode.

Yes. Clever.

>> if we resync in atomic context
>> this can mean a long time spent with preemption disabled.
>
> The resync time for a single page is comparable to prefetch_page (note
> that prefetch_page with direct access via gfn_to_page_atomic is about
> 50% faster than the current one) plus the gfn->pfn pagetable walks.

These could be very expensive, as the gfn->pfn mapping is essentially
random and too big to be cached. 512 cache misses is a lot of time --
perhaps upwards of 50 microseconds.

Snapshotting is a must here IMO -- and it avoids the need for a walk
completely.
> It could simply resched based on need_resched after each page synced.
> Would that cover your concern?

I guess it's better than nothing.

> BTW, it might be interesting to spin_needbreak after resyncing a certain
> number of pages.
>
>> We might get around the need by dropping the lock when we resync, fetch
>> the gfns without the lock, and after reacquiring it check whether we can
>> proceed or whether we need to abort and let the guest retry. We can
>> probably proceed unless one of two things has happened: an mmu page was
>> zapped, or our page was oos'ed while we were resyncing it.
>
> This sounds more complicated. First you have to grab the lock twice for
> each page synced. Secondly, the abort case due to being oos'ed while
> resyncing means the page has to be zapped.

It is complicated, yes.

-- 
error compiling committee.c: too many arguments to function