From: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
To: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Cc: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
kvm-devel
<kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Matt Mackall <mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ@public.gmane.org>
Subject: Re: [patch 3/5] KVM: add kvm_follow_page()
Date: Sun, 23 Dec 2007 15:15:25 -0500 [thread overview]
Message-ID: <20071223201525.GA14259@dmt> (raw)
In-Reply-To: <476E4005.90003-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
On Sun, Dec 23, 2007 at 01:01:25PM +0200, Avi Kivity wrote:
> Andrew Morton wrote:
> >On Sun, 23 Dec 2007 12:35:30 +0200 Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> >
> >
> >>Andrew Morton wrote:
> >>
> >>>On Sun, 23 Dec 2007 10:59:22 +0200 Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> >>>
> >>>
> >>>
> >>>>Avi Kivity wrote:
> >>>>
> >>>>
> >>>>>Avi Kivity wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Exactly. But it is better to be explicit about it and pass the page
> >>>>>>directly like you did before. I hate to make you go back-and-fourth,
> >>>>>>but I did not understand the issue completely before.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>btw, the call to gfn_to_page() can happen in page_fault() instead of
> >>>>>walk_addr(); that will reduce the amount of error handling, and will
> >>>>>simplify the callers to walk_addr() that don't need the page.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>Note further that all this doesn't obviate the need for follow_page()
> >>>>(or get_user_pages_inatomic()); we still need something in update_pte()
> >>>>for the demand paging case.
> >>>>
> >>>>
> >>>Please review -mm's mm/pagewalk.c for suitability.
> >>>
> >>>If is is unsuitable but repairable then please cc Matt Mackall
> >>><mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ@public.gmane.org> on the review.
> >>>
> >>>
> >>>
> >>The "no locks are taken" comment is very worrying. We need accurate
> >>results.
> >>
> >
> >take down_read(mm->mmap_sem) before calling it..
> >
> >You have to do that anyway for its results to be meaningful in the caller.
> >Ditto get_user_pages().
> >
> >
>
> Yes, but what about the page table locks?
>
> follow_page() is much more thorough.
It can acquire the pagetablelock in the callback handler. But then,
vm_normal_page() must also be exported.
Are you guys OK with this ?
Modular KVM needs walk_page_range(), and also vm_normal_page() to be
used on pagewalk callback.
Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Index: kvm.quilt/mm/pagewalk.c
===================================================================
--- kvm.quilt.orig/mm/pagewalk.c
+++ kvm.quilt/mm/pagewalk.c
@@ -1,6 +1,7 @@
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/sched.h>
+#include <linux/module.h>
static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
const struct mm_walk *walk, void *private)
@@ -129,3 +130,4 @@ int walk_page_range(const struct mm_stru
return err;
}
+EXPORT_SYMBOL(walk_page_range);
Index: kvm.quilt/mm/memory.c
===================================================================
--- kvm.quilt.orig/mm/memory.c
+++ kvm.quilt/mm/memory.c
@@ -412,6 +412,7 @@ struct page *vm_normal_page(struct vm_ar
*/
return pfn_to_page(pfn);
}
+EXPORT_SYMBOL(vm_normal_page);
/*
* copy one vm_area from one task to the other. Assumes the page tables
[PATCH] KVM: add kvm_follow_page()
In preparation for a mmu spinlock, avoid schedules in mmu_set_spte()
by using walk_page_range() instead of get_user_pages().
The fault handling work by get_user_pages() now happens outside the lock.
Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Index: kvm.quilt/arch/x86/kvm/mmu.c
===================================================================
--- kvm.quilt.orig/arch/x86/kvm/mmu.c
+++ kvm.quilt/arch/x86/kvm/mmu.c
@@ -882,14 +882,18 @@ struct page *gva_to_page(struct kvm_vcpu
return gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
}
+/*
+ * mmu_set_spte must be called with an additional reference on
+ * struct page *page, and it will release that if necessary (so
+ * the callers should not worry about it).
+ */
static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
unsigned pt_access, unsigned pte_access,
int user_fault, int write_fault, int dirty,
- int *ptwrite, gfn_t gfn)
+ int *ptwrite, gfn_t gfn, struct page *page)
{
u64 spte;
int was_rmapped = is_rmap_pte(*shadow_pte);
- struct page *page;
pgprintk("%s: spte %llx access %x write_fault %d"
" user_fault %d gfn %lx\n",
@@ -907,8 +911,6 @@ static void mmu_set_spte(struct kvm_vcpu
if (!(pte_access & ACC_EXEC_MASK))
spte |= PT64_NX_MASK;
- page = gfn_to_page(vcpu->kvm, gfn);
-
spte |= PT_PRESENT_MASK;
if (pte_access & ACC_USER_MASK)
spte |= PT_USER_MASK;
@@ -983,8 +985,10 @@ static int nonpaging_map(struct kvm_vcpu
table = __va(table_addr);
if (level == 1) {
+ struct page *page;
+ page = gfn_to_page(vcpu->kvm, gfn);
mmu_set_spte(vcpu, &table[index], ACC_ALL, ACC_ALL,
- 0, write, 1, &pt_write, gfn);
+ 0, write, 1, &pt_write, gfn, page);
return pt_write || is_io_pte(table[index]);
}
Index: kvm.quilt/include/linux/kvm_host.h
===================================================================
--- kvm.quilt.orig/include/linux/kvm_host.h
+++ kvm.quilt/include/linux/kvm_host.h
@@ -163,6 +163,7 @@ int kvm_arch_set_memory_region(struct kv
int user_alloc);
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *kvm_find_page(struct kvm *kvm, gfn_t gfn);
void kvm_release_page_clean(struct page *page);
void kvm_release_page_dirty(struct page *page);
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
Index: kvm.quilt/virt/kvm/kvm_main.c
===================================================================
--- kvm.quilt.orig/virt/kvm/kvm_main.c
+++ kvm.quilt/virt/kvm/kvm_main.c
@@ -453,6 +453,54 @@ static unsigned long gfn_to_hva(struct k
return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
}
+static int kvm_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+ void *private)
+{
+ struct page **page = private;
+ struct vm_area_struct *vma;
+ pte_t *ptep, pte;
+ spinlock_t *ptl;
+ int err = -EFAULT;
+
+ vma = find_vma(current->mm, addr);
+ if (!vma)
+ return err;
+
+ ptep = pte_offset_map_lock(current->mm, pmd, addr, &ptl);
+ pte = *ptep;
+ if (!pte_present(pte))
+ goto unlock;
+
+ *page = vm_normal_page(vma, addr, pte);
+ if (!*page)
+ goto unlock;
+
+ get_page(*page);
+ err = 0;
+unlock:
+ pte_unmap_unlock(ptep, ptl);
+ return err;
+}
+
+static struct mm_walk kvm_mm_walk = { .pmd_entry = kvm_pte_range };
+
+struct page *kvm_find_page(struct kvm *kvm, gfn_t gfn)
+{
+ unsigned long addr;
+ struct page *page = NULL;
+
+ addr = gfn_to_hva(kvm, gfn);
+ /* MMIO access */
+ if (kvm_is_error_hva(addr)) {
+ get_page(bad_page);
+ return bad_page;
+ }
+
+ walk_page_range(current->mm, addr, addr+PAGE_SIZE, &kvm_mm_walk,
+ &page);
+ return page;
+}
+
/*
* Requires current->mm->mmap_sem to be held
*/
Index: kvm.quilt/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.quilt.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm.quilt/arch/x86/kvm/paging_tmpl.h
@@ -67,6 +67,7 @@ struct guest_walker {
gfn_t table_gfn[PT_MAX_FULL_LEVELS];
pt_element_t ptes[PT_MAX_FULL_LEVELS];
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
+ struct page *page;
unsigned pt_access;
unsigned pte_access;
gfn_t gfn;
@@ -203,14 +204,18 @@ walk:
--walker->level;
}
+ walker->page = gfn_to_page(vcpu->kvm, walker->gfn);
+
if (write_fault && !is_dirty_pte(pte)) {
bool ret;
mark_page_dirty(vcpu->kvm, table_gfn);
ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
pte|PT_DIRTY_MASK);
- if (ret)
+ if (ret) {
+ kvm_release_page_clean(walker->page);
goto walk;
+ }
pte |= PT_DIRTY_MASK;
mutex_lock(&vcpu->kvm->lock);
kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte));
@@ -241,12 +246,13 @@ err:
return 0;
}
-static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
+static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *spage,
u64 *spte, const void *pte, int bytes,
int offset_in_pte)
{
pt_element_t gpte;
unsigned pte_access;
+ struct page *page;
gpte = *(const pt_element_t *)pte;
if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
@@ -256,10 +262,15 @@ static void FNAME(update_pte)(struct kvm
}
if (bytes < sizeof(pt_element_t))
return;
+
+ page = kvm_find_page(vcpu->kvm, gpte_to_gfn(gpte));
+ if (!page)
+ return;
+
pgprintk("%s: gpte %llx spte %p\n", __FUNCTION__, (u64)gpte, spte);
- pte_access = page->role.access & FNAME(gpte_access)(vcpu, gpte);
- mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
- gpte & PT_DIRTY_MASK, NULL, gpte_to_gfn(gpte));
+ pte_access = spage->role.access & FNAME(gpte_access)(vcpu, gpte);
+ mmu_set_spte(vcpu, spte, spage->role.access, pte_access, 0, 0,
+ gpte & PT_DIRTY_MASK, NULL, gpte_to_gfn(gpte), page);
}
/*
@@ -323,8 +334,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
r = kvm_read_guest_atomic(vcpu->kvm,
walker->pte_gpa[level - 2],
&curr_pte, sizeof(curr_pte));
- if (r || curr_pte != walker->ptes[level - 2])
+ if (r || curr_pte != walker->ptes[level - 2]) {
+ kvm_release_page_clean(walker->page);
return NULL;
+ }
}
shadow_addr = __pa(shadow_page->spt);
shadow_pte = shadow_addr | PT_PRESENT_MASK | PT_ACCESSED_MASK
@@ -335,7 +348,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu
mmu_set_spte(vcpu, shadow_ent, access, walker->pte_access & access,
user_fault, write_fault,
walker->ptes[walker->level-1] & PT_DIRTY_MASK,
- ptwrite, walker->gfn);
+ ptwrite, walker->gfn, walker->page);
return shadow_ent;
}
@@ -425,6 +438,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kv
if (r) {
gpa = gfn_to_gpa(walker.gfn);
gpa |= vaddr & ~PAGE_MASK;
+ kvm_release_page_clean(walker.page);
}
return gpa;
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
next prev parent reply other threads:[~2007-12-23 20:15 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-12-21 0:18 [patch 0/5] KVM shadow scalability enhancements Marcelo Tosatti
2007-12-21 0:18 ` [patch 1/5] KVM: concurrent guest walkers Marcelo Tosatti
2007-12-21 0:18 ` [patch 2/5] KVM: add kvm_read_guest_atomic() Marcelo Tosatti
2007-12-21 0:18 ` [patch 3/5] KVM: add kvm_follow_page() Marcelo Tosatti
[not found] ` <20071221002024.345682789-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-12-22 19:42 ` Avi Kivity
[not found] ` <476D68A3.3000200-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 5:25 ` Marcelo Tosatti
2007-12-23 7:03 ` Avi Kivity
[not found] ` <476E083B.5040600-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 8:41 ` Avi Kivity
[not found] ` <476E1F23.3020609-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 8:59 ` Avi Kivity
[not found] ` <476E236A.9000800-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 9:26 ` Andrew Morton
[not found] ` <20071223012618.e8cf8320.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2007-12-23 10:35 ` Avi Kivity
[not found] ` <476E39F2.4030202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 10:52 ` Andrew Morton
[not found] ` <20071223025245.d839c86d.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2007-12-23 11:01 ` Avi Kivity
[not found] ` <476E4005.90003-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-23 20:15 ` Marcelo Tosatti [this message]
2007-12-23 20:41 ` Andrew Morton
2007-12-24 6:56 ` Avi Kivity
[not found] ` <476F5801.2030002-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-24 14:26 ` Marcelo Tosatti
2007-12-24 14:36 ` Avi Kivity
2007-12-23 19:14 ` Marcelo Tosatti
2007-12-24 6:50 ` Avi Kivity
[not found] ` <476F56A5.2020607-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-24 14:28 ` Marcelo Tosatti
2007-12-24 14:34 ` Avi Kivity
2007-12-21 0:18 ` [patch 4/5] KVM: export follow_page() Marcelo Tosatti
[not found] ` <20071221002024.423895760-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-12-22 19:43 ` Avi Kivity
2007-12-21 0:18 ` [patch 5/5] KVM: switch to mmu spinlock Marcelo Tosatti
[not found] ` <20071221001821.029994250-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-12-22 19:46 ` [patch 0/5] KVM shadow scalability enhancements Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071223201525.GA14259@dmt \
--to=marcelo-bw31mazkks3ytjvyw6ydsg@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org \
--cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
--cc=mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.