public inbox for kvm@vger.kernel.org
From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
Cc: kvm-devel
	<kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [patch 3/5] KVM: add kvm_follow_page()
Date: Sun, 23 Dec 2007 09:03:23 +0200	[thread overview]
Message-ID: <476E083B.5040600@qumranet.com> (raw)
In-Reply-To: <20071223052513.GA19072@dmt>

Marcelo Tosatti wrote:
> Hi Avi,
>
> Andrew, please see the comments on the need for an atomic get_user_pages() 
> below.
>
> On Sat, Dec 22, 2007 at 09:42:27PM +0200, Avi Kivity wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> In preparation for a mmu spinlock, avoid schedules in mmu_set_spte()
>>> by using follow_page() instead of get_user_pages().
>>>
>>> The fault handling work by get_user_pages() now happens outside the lock.
>>>
>>> Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>
>>> Index: kvm.quilt/arch/x86/kvm/mmu.c
>>> @@ -983,8 +985,11 @@ static int nonpaging_map(struct kvm_vcpu
>>> table = __va(table_addr);
>>>
>>> if (level == 1) {
>>> + struct page *page;
>>> + page = gfn_to_page(vcpu->kvm, gfn);
>>> mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
>>> 0, write, 1, &pt_write, gfn);
>>> + kvm_release_page_clean(page);
>>> return pt_write || is_io_pte(table[index]);
>>> }
>>>       
>> Is this hunk necessary? It will swap in the page in case it's swapped
>> out, but I'm not sure we care about performance in that case.
>>     
>
> The issue is that the mmap call done by QEMU won't instantiate
> the pte's of the guest memory area. So it needs to either do that
> get_user_pages()->handle_mm_fault() sequence outside mmu_set_spte() 

But that's only on the very first access to the page.  Once the page has 
been accessed by the guest, it will be mapped in the host page table 
(unless it's subsequently swapped out).

Oh, but the page will never be faulted in this way because the path 
that's supposed to map it is mmu_set_spte().  In that case, your 
original implementation of passing in page directly is better (instead 
of being forced to pass it indirectly via the page tables).  Sorry about 
generating this confusion.

> or
> mmap() has to pass MAP_POPULATE, prefaulting all guest pages (which is
> highly suboptimal).
>   

Actually, MAP_POPULATE is optimal since it neatly batches all of the 
work.  I don't think we should do it since it gives the appearance of 
being slower (esp. with lots of guest memory).

>   
>>> n.c
>>> @@ -453,6 +453,25 @@ static unsigned long gfn_to_hva(struct k
>>> return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
>>> }
>>>
>>> +struct page *kvm_follow_page(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> + unsigned long addr;
>>> + struct vm_area_struct *vma;
>>> +
>>> + addr = gfn_to_hva(kvm, gfn);
>>> + /* MMIO access */
>>> + if (kvm_is_error_hva(addr)) {
>>> + get_page(bad_page);
>>> + return bad_page;
>>> + }
>>> +
>>> + vma = find_vma(current->mm, addr);
>>> + if (!vma)
>>> + return NULL;
>>> +
>>> + return follow_page(vma, addr, FOLL_GET|FOLL_TOUCH);
>>> +}
>>>       
>> It may be better to create get_user_page_inatomic() as the last four
>> lines rather than exporting follow_page(). Maybe an mm maintainer can
>> comment.
>>     
>
> Andrew, KVM needs to find the struct page which maps to a certain       
> address in the QEMU process without sleeping since that lookup is       
> performed with a spinlock held.
>
> So it needs to either export follow_page() or use a get_user_pages() variant 
> which doesn't call cond_resched() or handle_mm_fault(), simply failing 
> if the page is not pagetable mapped.
>
> What do you think? 
>
>   

[We need to pass FOLL_WRITE to follow_page()]
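For the record, the get_user_page_inatomic() suggested above would be roughly the last four lines of kvm_follow_page(). A sketch only (the name and exact flags are not settled; FOLL_WRITE added per the note above):

```c
/* Non-sleeping variant of get_user_pages() for a single page:
 * returns the page with a reference held if it is already mapped
 * in the page tables, NULL otherwise.  Never faults, so it is
 * safe to call under a spinlock. */
struct page *get_user_page_inatomic(struct mm_struct *mm,
				    unsigned long addr)
{
	struct vm_area_struct *vma;

	vma = find_vma(mm, addr);
	if (!vma || vma->vm_start > addr)
		return NULL;

	return follow_page(vma, addr, FOLL_GET | FOLL_TOUCH | FOLL_WRITE);
}
```

The caller is then responsible for falling back to a sleeping get_user_pages() outside the lock when this returns NULL.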

>>> --- kvm.quilt.orig/arch/x86/kvm/paging_tmpl.h
>>> +++ kvm.quilt/arch/x86/kvm/paging_tmpl.h
>>> @@ -67,6 +67,7 @@ struct guest_walker {
>>> gfn_t table_gfn[PT_MAX_FULL_LEVELS];
>>> pt_element_t ptes[PT_MAX_FULL_LEVELS];
>>> gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
>>> + struct page *page;
>>> unsigned pt_access;
>>> unsigned pte_access;
>>> gfn_t gfn;
>>> @@ -203,14 +204,18 @@ walk:
>>> --walker->level;
>>> }
>>>
>>> + walker->page = gfn_to_page(vcpu->kvm, walker->gfn);
>>> +
>>> if (write_fault && !is_dirty_pte(pte)) {
>>> bool ret;
>>>
>>> mark_page_dirty(vcpu->kvm, table_gfn);
>>> ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
>>> pte|PT_DIRTY_MASK);
>>> - if (ret)
>>> + if (ret) {
>>> + kvm_release_page_clean(walker->page);
>>> goto walk;
>>> + }
>>> pte |= PT_DIRTY_MASK;
>>> mutex_lock(&vcpu->kvm->lock);
>>> kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte));
>>> @@ -323,8 +328,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
>>> r = kvm_read_guest_atomic(vcpu->kvm,
>>> walker->pte_gpa[level - 2],
>>> &curr_pte, sizeof(curr_pte));
>>> - if (r || curr_pte != walker->ptes[level - 2])
>>> - return NULL;
>>> + if (r || curr_pte != walker->ptes[level - 2]) {
>>> + shadow_ent = NULL;
>>> + goto out;
>>> + }
>>> }
>>> shadow_addr = __pa(shadow_page->spt);
>>> shadow_pte = shadow_addr | PT_PRESENT_MASK | PT_ACCESSED_MASK
>>> @@ -336,7 +343,8 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
>>> user_fault, write_fault,
>>> walker->ptes[walker->level-1] & PT_DIRTY_MASK,
>>> ptwrite, walker->gfn);
>>> -
>>> +out:
>>> + kvm_release_page_clean(walker->page);
>>> return shadow_ent;
>>> }
>>>
>>>       
>> walker->page is never used, so I think it can be completely dropped.
>> mmu_set_spte() already does the right thing without its help.
>>     
>
> The page being mapped by the guest needs to be faulted in by someone...
> So while walker->page is unused, it guarantees that the handler will
> properly fault in the pagetables to know which physical address the page
> resides at. I.e., it is the place that guarantees that the page gets
> mapped if it's already in memory, or swapped in and then mapped.

Exactly.  But it is better to be explicit about it and pass the page 
directly like you did before.  I hate to make you go back and forth, 
but I did not understand the issue completely before.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.



  reply	other threads:[~2007-12-23  7:03 UTC|newest]

Thread overview: 26+ messages
2007-12-21  0:18 [patch 0/5] KVM shadow scalability enhancements Marcelo Tosatti
2007-12-21  0:18 ` [patch 1/5] KVM: concurrent guest walkers Marcelo Tosatti
2007-12-21  0:18 ` [patch 2/5] KVM: add kvm_read_guest_atomic() Marcelo Tosatti
2007-12-21  0:18 ` [patch 3/5] KVM: add kvm_follow_page() Marcelo Tosatti
     [not found]   ` <20071221002024.345682789-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-12-22 19:42     ` Avi Kivity
     [not found]       ` <476D68A3.3000200-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23  5:25         ` Marcelo Tosatti
2007-12-23  7:03           ` Avi Kivity [this message]
     [not found]             ` <476E083B.5040600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23  8:41               ` Avi Kivity
     [not found]                 ` <476E1F23.3020609-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23  8:59                   ` Avi Kivity
     [not found]                     ` <476E236A.9000800-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23  9:26                       ` Andrew Morton
     [not found]                         ` <20071223012618.e8cf8320.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2007-12-23 10:35                           ` Avi Kivity
     [not found]                             ` <476E39F2.4030202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 10:52                               ` Andrew Morton
     [not found]                                 ` <20071223025245.d839c86d.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2007-12-23 11:01                                   ` Avi Kivity
     [not found]                                     ` <476E4005.90003-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 20:15                                       ` Marcelo Tosatti
2007-12-23 20:41                                         ` Andrew Morton
2007-12-24  6:56                                         ` Avi Kivity
     [not found]                                           ` <476F5801.2030002-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-24 14:26                                             ` Marcelo Tosatti
2007-12-24 14:36                                               ` Avi Kivity
2007-12-23 19:14                   ` Marcelo Tosatti
2007-12-24  6:50                     ` Avi Kivity
     [not found]                       ` <476F56A5.2020607-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-24 14:28                         ` Marcelo Tosatti
2007-12-24 14:34                           ` Avi Kivity
2007-12-21  0:18 ` [patch 4/5] KVM: export follow_page() Marcelo Tosatti
     [not found]   ` <20071221002024.423895760-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-12-22 19:43     ` Avi Kivity
2007-12-21  0:18 ` [patch 5/5] KVM: switch to mmu spinlock Marcelo Tosatti
     [not found] ` <20071221001821.029994250-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-12-22 19:46   ` [patch 0/5] KVM shadow scalability enhancements Avi Kivity
