Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Zachary Amsden <zach@vmware.com>
Cc: xen-devel <xen-devel@lists.xensource.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Rusty Russell <rusty@rustcorp.com.au>,
	LKML <linux-kernel@vger.kernel.org>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	Hugh Dickins <hugh@veritas.com>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write	abstraction
Date: Fri, 23 May 2008 21:32:39 +0100	[thread overview]
Message-ID: <483729E7.9010002@goop.org> (raw)
In-Reply-To: <1211567273.7465.36.camel@bodhitayantram.eng.vmware.com>

Zachary Amsden wrote:
> I'm a bit skeptical you can get such a semantic to work without a very
> heavyweight method in the hypervisor.  How do you guarantee no other CPU
> is fizzling the A/D bits in the page table (it can be done by hardware
> with direct page tables), unless you use some kind of IPI?  Is this why
> it is still 7x?
>   

No, you just use cmpxchg.  It's pretty lightweight really.  Xen holds a 
lock internally to stop other cpus from updating the pte in software, so 
the only source of modification is the hardware itself; the cmpxchg loop 
is guaranteed to terminate because the A/D bits can only transition from 
0->1.

I haven't really gone into depth as to exactly where the 7x number comes 
from.  I could increase the batch size (currently max of 32 pte 
updates/hypercall), and some of it is plain overhead from the in-kernel 
infrastructure.  A simpler and more hackish approach which basically 
pastes the Xen hypercall directly into the mprotect loop gets the 
overhead down to about 5.5x.

> Still, a 7x gain from asynchronous batching is very nice.  I wonder if
> that means the average mprotect size in your benchmark is 7 pages.
>   

Yeah, it's around 7x.  The batching pays off even for single page 
mprotects, because the trap and emulate of xchg is so expensive.

>> I believe that other virtualization systems, whether they use direct
>> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
>> make use of this interface to improve the performance.
>>     
>
> On VMI, we don't trap the xchg of the pte, thus we don't have any
> bottleneck here to begin with.

If you're doing code rewriting then I guess you can effectively do the 
same trick at that point.  If not, then presumably you take a fault for 
the first pte updated in the mprotect and then sync the shadow up when 
the tlb flush happens; batching that trap and the tlb flush would give 
you some benefit for small mprotects.

    J

WARNING: multiple messages have this Message-ID (diff)

From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Zachary Amsden <zach@vmware.com>
Cc: Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
	xen-devel <xen-devel@lists.xensource.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Hugh Dickins <hugh@veritas.com>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write	abstraction
Date: Fri, 23 May 2008 21:32:39 +0100	[thread overview]
Message-ID: <483729E7.9010002@goop.org> (raw)
In-Reply-To: <1211567273.7465.36.camel@bodhitayantram.eng.vmware.com>

Zachary Amsden wrote:
> I'm a bit skeptical you can get such a semantic to work without a very
> heavyweight method in the hypervisor.  How do you guarantee no other CPU
> is fizzling the A/D bits in the page table (it can be done by hardware
> with direct page tables), unless you use some kind of IPI?  Is this why
> it is still 7x?
>   

No, you just use cmpxchg.  It's pretty lightweight really.  Xen holds a 
lock internally to stop other cpus from updating the pte in software, so 
the only source of modification is the hardware itself; the cmpxchg loop 
is guaranteed to terminate because the A/D bits can only transition from 
0->1.

I haven't really gone into depth as to exactly where the 7x number comes 
from.  I could increase the batch size (currently max of 32 pte 
updates/hypercall), and some of it is plain overhead from the in-kernel 
infrastructure.  A simpler and more hackish approach which basically 
pastes the Xen hypercall directly into the mprotect loop gets the 
overhead down to about 5.5x.

> Still, a 7x gain from asynchronous batching is very nice.  I wonder if
> that means the average mprotect size in your benchmark is 7 pages.
>   

Yeah, it's around 7x.  The batching pays off even for single page 
mprotects, because the trap and emulate of xchg is so expensive.

>> I believe that other virtualization systems, whether they use direct
>> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
>> make use of this interface to improve the performance.
>>     
>
> On VMI, we don't trap the xchg of the pte, thus we don't have any
> bottleneck here to begin with.

If you're doing code rewriting then I guess you can effectively do the 
same trick at that point.  If not, then presumably you take a fault for 
the first pte updated in the mprotect and then sync the shadow up when 
the tlb flush happens; batching that trap and the tlb flush would give 
you some benefit for small mprotects.

    J

next prev parent reply	other threads:[~2008-05-23 20:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-23 14:20 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction Jeremy Fitzhardinge
2008-05-23 14:20 ` Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 1 of 4] mm: add a pte_rmw transaction abstraction Jeremy Fitzhardinge
2008-05-23 14:20   ` Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 2 of 4] paravirt: add hooks for pte_rmw_start/commit Jeremy Fitzhardinge
2008-05-23 14:20   ` Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 3 of 4] xen: implement pte_rmw_start/commit Jeremy Fitzhardinge
2008-05-23 14:20 ` [PATCH 4 of 4] xen: add mechanism to extend existing multicalls Jeremy Fitzhardinge
2008-05-23 18:27 ` [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction Zachary Amsden
2008-05-23 20:32   ` Jeremy Fitzhardinge [this message]
2008-05-23 20:32     ` Jeremy Fitzhardinge
2008-05-23 23:25     ` Zachary Amsden
2008-05-31  0:13       ` Jeremy Fitzhardinge
2008-06-02 20:09         ` Zachary Amsden
2008-05-23 18:57 ` Linus Torvalds
2008-05-23 18:57   ` Linus Torvalds
2008-05-23 20:42   ` Jeremy Fitzhardinge
2008-05-23 20:42     ` Jeremy Fitzhardinge
2008-05-24 17:25     ` Linus Torvalds
2008-05-24 20:44       ` Jeremy Fitzhardinge

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=483729E7.9010002@goop.org \
    --to=jeremy@goop.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=hugh@veritas.com \
    --cc=kvm-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.osdl.org \
    --cc=xen-devel@lists.xensource.com \
    --cc=zach@vmware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.