From: Zachary Amsden <zach@vmware.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
xen-devel <xen-devel@lists.xensource.com>,
Thomas Gleixner <tglx@linutronix.de>,
Hugh Dickins <hugh@veritas.com>,
kvm-devel <kvm-devel@lists.sourceforge.net>,
Virtualization Mailing List <virtualization@lists.osdl.org>,
Rusty Russell <rusty@rustcorp.com.au>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction
Date: Fri, 23 May 2008 16:25:22 -0700
Message-ID: <1211585122.7465.70.camel@bodhitayantram.eng.vmware.com>
In-Reply-To: <483729E7.9010002@goop.org>
On Fri, 2008-05-23 at 21:32 +0100, Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
> > I'm a bit skeptical you can get such a semantic to work without a very
> > heavyweight method in the hypervisor. How do you guarantee no other CPU
> > is fizzling the A/D bits in the page table (it can be done by hardware
> > with direct page tables), unless you use some kind of IPI? Is this why
> > it is still 7x?
> >
>
> No, you just use cmpxchg. It's pretty lightweight really. Xen holds a
> lock internally to stop other cpus from updating the pte in software, so
> the only source of modification is the hardware itself; the cmpxchg loop
> is guaranteed to terminate because the A/D bits can only transition from
> 0->1.
Ah yes, you're not worried about invalidations. You can actually do
better using a lock; xor combination, which will allow you to flip any
of the protection bits without looping (you are guaranteed on Linux not
to have concurrent updates by the guest holding the pagetable lock). It
might fail for other guests though, and I'm not sure it's any cheaper on
modern processors (in fact, it wouldn't surprise me if Intel optimized
cmpxchg so it was cheaper).
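To make the comparison concrete, here is a minimal user-space sketch of the two approaches using C11 atomics rather than kernel or hypervisor code. The bit layout, the pte_slot_t type and the function names are all hypothetical, and atomic_fetch_xor merely stands in for a lock; xor on the PTE. The xor variant is only correct under the assumption stated above: nothing but hardware A/D updates can race with us, so the protection bits we read are still current when we flip them.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout, loosely modelled on the low x86 PTE bits. */
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

typedef _Atomic(uint64_t) pte_slot_t;

/*
 * cmpxchg approach: retry whenever the entry changed between the load
 * and the exchange (for example, hardware set A or D in the meantime).
 * Returns the value that was replaced.
 */
static uint64_t pte_clear_bits_cmpxchg(pte_slot_t *pte, uint64_t mask)
{
    uint64_t old = atomic_load(pte);

    /* On failure, 'old' is refreshed with the current contents. */
    while (!atomic_compare_exchange_weak(pte, &old, old & ~mask))
        ;
    return old;
}

/*
 * xor approach: one atomic read-modify-write, no loop. Only the bits
 * in prot_mask are flipped, so A/D bits set concurrently by hardware
 * are preserved. Assumes no other software writer touches prot_mask
 * bits (i.e. the caller holds the pagetable lock).
 */
static uint64_t pte_set_prot_xor(pte_slot_t *pte, uint64_t prot_mask,
                                 uint64_t new_prot)
{
    uint64_t cur = atomic_load(pte);
    uint64_t flip = (cur ^ new_prot) & prot_mask;

    return atomic_fetch_xor(pte, flip);
}
```

If another software writer could change the protection bits between the load and the xor, the flip mask would be stale, which is why this may not hold for guests that lack an equivalent of the pagetable lock.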
> >> I believe that other virtualization systems, whether they use direct
> >> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
> >> make use of this interface to improve the performance.
> >>
> >
> > On VMI, we don't trap the xchg of the pte, thus we don't have any
> > bottleneck here to begin with.
>
> If you're doing code rewriting then I guess you can effectively do the
> same trick at that point. If not, then presumably you take a fault for
> the first pte updated in the mprotect and then sync the shadow up when
> the tlb flush happens; batching that trap and the tlb flush would give
> you some benefit for small mprotects.
We don't fault. We write directly to the primary page tables, and clear
the pte just like native. We just issue all mprotect updates in the
queue, and flush the queue when leaving lazy mmu mode. You can't wait
for the TLB flush, you must flush the updates before releasing the
pagetable lock, or you could get misordered updates in an SMP system.
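A rough model of that queueing scheme, as a plain C sketch and not the actual VMI code: the primary PTE is written directly, the matching shadow update is queued, and the queue is drained when lazy mmu mode ends, which the caller must do before dropping the pagetable lock. The struct layout, the queue capacity, and every function name here are hypothetical, and the store to the shadow entry stands in for whatever call really notifies the hypervisor.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_QUEUE_CAP 32   /* arbitrary capacity for the sketch */

struct pte_update {
    uint64_t *shadow;      /* shadow entry to bring up to date */
    uint64_t  val;         /* value already written to the primary */
};

struct pte_queue {
    struct pte_update ent[PTE_QUEUE_CAP];
    size_t len;
    int lazy;              /* nonzero while in lazy mmu mode */
};

/* Apply every pending update in order, then empty the queue. */
static void pte_queue_flush(struct pte_queue *q)
{
    for (size_t i = 0; i < q->len; i++)
        *q->ent[i].shadow = q->ent[i].val;
    q->len = 0;
}

static void enter_lazy_mmu(struct pte_queue *q)
{
    q->lazy = 1;
}

/* Must be called before the pagetable lock is released. */
static void leave_lazy_mmu(struct pte_queue *q)
{
    pte_queue_flush(q);
    q->lazy = 0;
}

/* Write the primary directly; defer the shadow update if lazy. */
static void queue_set_pte(struct pte_queue *q, uint64_t *primary,
                          uint64_t *shadow, uint64_t val)
{
    *primary = val;
    if (!q->lazy) {
        *shadow = val;
        return;
    }
    if (q->len == PTE_QUEUE_CAP)
        pte_queue_flush(q);
    q->ent[q->len].shadow = shadow;
    q->ent[q->len].val = val;
    q->len++;
}
```

Draining in leave_lazy_mmu rather than at TLB flush time keeps the shadow updates ordered with respect to the lock, which is the property the SMP ordering argument depends on.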
A/D bits are propagated from shadow to primary by getting page faults on
an access that would set an A/D bit in hardware; if we get a page fault
for what would be an A/D bit update in the window where the primary PTE
has been cleared, we convert it to a guest fault (just as native
hardware would). Linux is already prepared to handle these spurious
faults by revalidating the mapping.
Zach