Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction

From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Zachary Amsden <zach@vmware.com>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
	xen-devel <xen-devel@lists.xensource.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Hugh Dickins <hugh@veritas.com>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction
Date: Fri, 23 May 2008 21:32:39 +0100
Message-ID: <483729E7.9010002@goop.org>
In-Reply-To: <1211567273.7465.36.camel@bodhitayantram.eng.vmware.com>

Zachary Amsden wrote:
> I'm a bit skeptical you can get such a semantic to work without a very
> heavyweight method in the hypervisor.  How do you guarantee no other CPU
> is fizzling the A/D bits in the page table (it can be done by hardware
> with direct page tables), unless you use some kind of IPI?  Is this why
> it is still 7x?
>   

No, you just use cmpxchg.  It's pretty lightweight really.  Xen holds a 
lock internally to stop other cpus from updating the pte in software, so 
the only source of modification is the hardware itself; the cmpxchg loop 
is guaranteed to terminate because the A/D bits can only transition from 
0->1.
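
To make the termination argument concrete, here is a minimal userspace
sketch of such a loop using C11 atomics. It is not the Xen code: the
function name pte_update_preserve_ad() is made up for illustration, and
the bit positions are simply the usual x86 Accessed/Dirty/RW bits. Each
failed compare-exchange means hardware set another A/D bit in the
meantime, and since there are only two such bits and they only go 0->1,
the loop can fail at most a couple of times.

```c
#include <stdatomic.h>
#include <stdint.h>

/* x86 pte flag bits used by this sketch. */
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)
#define PTE_AD_MASK  (PTE_ACCESSED | PTE_DIRTY)

/*
 * Hypothetical helper: install newval into *ptep, carrying over any
 * Accessed/Dirty bits that are set in the live pte.  Assumes the only
 * concurrent writer is hardware setting A/D bits (software writers are
 * excluded by a lock held elsewhere).  Returns the value stored.
 */
static uint64_t pte_update_preserve_ad(_Atomic uint64_t *ptep, uint64_t newval)
{
    uint64_t old = atomic_load(ptep);
    uint64_t desired;

    do {
        /* Recompute from the latest observed pte on every attempt. */
        desired = newval | (old & PTE_AD_MASK);
        /* On failure, 'old' is refreshed with the current pte value. */
    } while (!atomic_compare_exchange_weak(ptep, &old, desired));

    return desired;
}
```

For example, write-protecting a pte that is already dirty keeps the
dirty bit even though the caller's new value does not include it.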

I haven't really gone into depth as to exactly where the 7x number comes 
from.  I could increase the batch size (currently max of 32 pte 
updates/hypercall), and some of it is plain overhead from the in-kernel 
infrastructure.  A simpler and more hackish approach which basically 
pastes the Xen hypercall directly into the mprotect loop gets the 
overhead down to about 5.5x.
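
As a rough illustration of what the batching amounts to, the sketch
below queues pte updates and issues one flush per 32 entries. All names
here (struct pte_batch, pte_batch_queue(), pte_batch_flush()) are
hypothetical, and the "hypercall" is simulated by a plain loop that
counts how often it runs; the real multicall machinery is more
involved.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_BATCH_MAX 32        /* current per-hypercall limit mentioned above */

struct pte_update {
    uint64_t *ptep;             /* pte to modify */
    uint64_t  val;              /* new pte value */
};

struct pte_batch {
    struct pte_update ops[PTE_BATCH_MAX];
    size_t        count;        /* queued updates not yet flushed */
    unsigned long flushes;      /* number of simulated hypercalls issued */
};

/* Stand-in for a single hypercall that applies every queued update. */
static void pte_batch_flush(struct pte_batch *b)
{
    if (b->count == 0)
        return;
    for (size_t i = 0; i < b->count; i++)
        *b->ops[i].ptep = b->ops[i].val;
    b->flushes++;
    b->count = 0;
}

/* Queue one update, flushing first if the batch is already full. */
static void pte_batch_queue(struct pte_batch *b, uint64_t *ptep, uint64_t val)
{
    if (b->count == PTE_BATCH_MAX)
        pte_batch_flush(b);
    b->ops[b->count].ptep = ptep;
    b->ops[b->count].val = val;
    b->count++;
}
```

With this shape, a 100-page mprotect costs four flushes rather than 100
trapped updates; raising PTE_BATCH_MAX trades a larger queue for fewer
flushes.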

> Still, a 7x gain from asynchronous batching is very nice.  I wonder if
> that means the average mprotect size in your benchmark is 7 pages.
>   

Yeah, it's around 7x.  The batching pays off even for single page 
mprotects, because the trap and emulate of xchg is so expensive.

>> I believe that other virtualization systems, whether they use direct
>> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
>> make use of this interface to improve the performance.
>>     
>
> On VMI, we don't trap the xchg of the pte, thus we don't have any
> bottleneck here to begin with.

If you're doing code rewriting then I guess you can effectively do the 
same trick at that point.  If not, then presumably you take a fault for 
the first pte updated in the mprotect and then sync the shadow up when 
the tlb flush happens; batching that trap and the tlb flush would give 
you some benefit for small mprotects.

    J

Thread overview: 14+ messages
2008-05-23 14:20 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 1 of 4] mm: add a pte_rmw transaction abstraction Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 2 of 4] paravirt: add hooks for pte_rmw_start/commit Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 3 of 4] xen: implement pte_rmw_start/commit Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 4 of 4] xen: add mechanism to extend existing multicalls Jeremy Fitzhardinge
2008-05-23 18:27 ` [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction Zachary Amsden
2008-05-23 20:32   ` Jeremy Fitzhardinge [this message]
2008-05-23 23:25     ` Zachary Amsden
2008-05-31  0:13       ` Jeremy Fitzhardinge
2008-06-02 20:09         ` Zachary Amsden
2008-05-23 18:57 ` Linus Torvalds
2008-05-23 20:42   ` Jeremy Fitzhardinge
2008-05-24 17:25     ` Linus Torvalds
2008-05-24 20:44       ` Jeremy Fitzhardinge
