From: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
To: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Cc: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
kvm-devel
<kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [PATCH] Use cmpxchg for pte updates on walk_addr()
Date: Thu, 6 Dec 2007 21:32:37 -0500 [thread overview]
Message-ID: <20071207023237.GA2841@dmt> (raw)
In-Reply-To: <47581418.8000506-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
On Thu, Dec 06, 2007 at 05:24:08PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
> > In preparation for multi-threaded guest pte walking, use cmpxchg()
> > when updating guest pte's. This guarantees that the assignment of the
> > dirty bit can't be lost if two CPU's are faulting the same address
> > simultaneously.
> >
> > In case of pte update via write emulation, no synchronization is
> > necessary since its the guest responsability.
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >
> >
> > diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
> > index b24bc7c..593073a 100644
> > --- a/drivers/kvm/paging_tmpl.h
> > +++ b/drivers/kvm/paging_tmpl.h
> > @@ -78,6 +78,23 @@ static gfn_t gpte_to_gfn_pde(pt_element_t gpte)
> > return (gpte & PT_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
> > }
> >
> > +static inline void FNAME(kvm_cmpxchg_guest_pte)(struct kvm *kvm,
> > + gfn_t table_gfn, unsigned index, pt_element_t new_bits)
> >
>
> You can drop the kvm_ prefix since this is static.
>
> > +{
> > + pt_element_t pte;
> > + pt_element_t *table;
> > + struct page *page;
> > +
> > + page = gfn_to_page(kvm, table_gfn);
> > + table = kmap_atomic(page, KM_USER0);
> > +
> > + do {
> > + pte = table[index];
> > + } while (cmpxchg(&table[index], pte, pte | new_bits) != pte);
> > +
> > + kunmap_atomic(page, KM_USER0);
> > +}
> >
>
> I don't think cmpxchg() handles 64-bits on i386; you need cmpxchg64().
>
> What happens if the guest changed the pte under our feet (say, one vcpu
> accesses the page, the other vcpu swaps it out, setting the pte to
> zero)? We end up modifying the new pte. So we need to compare against
> the original pte, and abort the walk if we fail to set the bit.
Right, patch at end of the message restarts the process if the pte
changes under the walker. The goto is pretty ugly, but I fail to see any
elegant way of doing that. Ideas?
> > diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
> > index c70ac33..edba8ce 100644
> > --- a/drivers/kvm/x86.c
> > +++ b/drivers/kvm/x86.c
> > @@ -1510,6 +1510,9 @@ static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
> > {
> > int ret;
> >
> > + /* No need for kvm_cmpxchg_guest_pte here, its the guest
> > + * responsability to synchronize pte updates and page faults.
> > + */
> > ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
> > if (ret < 0)
> > return 0;
>
> Hmm. What if an i386 pae guest carefully uses cmpxchg8b to atomically
> set a pte? kvm_write_guest() doesn't guarantee atomicity, so an
> intended atomic write can be seen splitted by the guest walker doing a
> concurrent walk.
True, an atomic write is needed... a separate patch for that seems more
appropriate.
diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
index b24bc7c..a39f8a6 100644
--- a/drivers/kvm/paging_tmpl.h
+++ b/drivers/kvm/paging_tmpl.h
@@ -34,7 +34,9 @@
#define PT_LEVEL_BITS PT64_LEVEL_BITS
#ifdef CONFIG_X86_64
#define PT_MAX_FULL_LEVELS 4
+ #define CMPXCHG cmpxchg
#else
+ #define CMPXCHG cmpxchg64
#define PT_MAX_FULL_LEVELS 2
#endif
#elif PTTYPE == 32
@@ -48,6 +50,7 @@
#define PT_LEVEL_MASK(level) PT32_LEVEL_MASK(level)
#define PT_LEVEL_BITS PT32_LEVEL_BITS
#define PT_MAX_FULL_LEVELS 2
+ #define CMPXCHG cmpxchg
#else
#error Invalid PTTYPE value
#endif
@@ -78,6 +81,24 @@ static gfn_t gpte_to_gfn_pde(pt_element_t gpte)
return (gpte & PT_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
}
+static inline bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
+ gfn_t table_gfn, unsigned index,
+ pt_element_t orig_pte, pt_element_t new_pte)
+{
+ pt_element_t ret;
+ pt_element_t *table;
+ struct page *page;
+
+ page = gfn_to_page(kvm, table_gfn);
+ table = kmap_atomic(page, KM_USER0);
+
+ ret = CMPXCHG(&table[index], orig_pte, new_pte);
+
+ kunmap_atomic(page, KM_USER0);
+
+ return (ret != orig_pte);
+}
+
/*
* Fetch a guest pte for a guest virtual address
*/
@@ -91,6 +112,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
gpa_t pte_gpa;
pgprintk("%s: addr %lx\n", __FUNCTION__, addr);
+walk:
walker->level = vcpu->mmu.root_level;
pte = vcpu->cr3;
#if PTTYPE == 64
@@ -135,8 +157,9 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
if (!(pte & PT_ACCESSED_MASK)) {
mark_page_dirty(vcpu->kvm, table_gfn);
- pte |= PT_ACCESSED_MASK;
- kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
+ if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
+ index, pte, pte|PT_ACCESSED_MASK))
+ goto walk;
}
if (walker->level == PT_PAGE_TABLE_LEVEL) {
@@ -159,9 +182,13 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
}
if (write_fault && !is_dirty_pte(pte)) {
+ bool ret;
mark_page_dirty(vcpu->kvm, table_gfn);
- pte |= PT_DIRTY_MASK;
- kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
+ ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
+ pte|PT_DIRTY_MASK);
+ if (ret)
+ goto walk;
+
kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte));
}
-------------------------------------------------------------------------
SF.Net email is sponsored by:
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
next prev parent reply other threads:[~2007-12-07 2:32 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-12-06 15:04 [PATCH] Use cmpxchg for pte updates on walk_addr() Marcelo Tosatti
2007-12-06 15:24 ` Avi Kivity
[not found] ` <47581418.8000506-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-07 2:32 ` Marcelo Tosatti [this message]
2007-12-07 5:06 ` Avi Kivity
[not found] ` <4758D4D2.8090208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-07 12:56 ` Marcelo Tosatti
2007-12-07 17:54 ` Andrea Arcangeli
2007-12-09 8:38 ` Avi Kivity
2007-12-07 22:47 ` Marcelo Tosatti
2007-12-09 8:47 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071207023237.GA2841@dmt \
--to=marcelo-bw31mazkks3ytjvyw6ydsg@public.gmane.org \
--cc=avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org \
--cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox