virtualization.lists.linux-foundation.org archive mirror
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>,
	xen-devel <xen-devel@lists.xensource.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Rusty Russell <rusty@rustcorp.com.au>,
	LKML <linux-kernel@vger.kernel.org>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	Hugh Dickins <hugh@veritas.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction
Date: Fri, 23 May 2008 15:20:48 +0100
Message-ID: <patchbomb.1211552448@localhost>

Hi all,

This little series adds a new transaction-like abstraction for doing
RMW updates to a pte, hooks it into paravirt_ops, and then makes use
of it in Xen.

The basic problem is that mprotect is very slow under Xen (up to 50x
slower than native), primarily because of the

	ptent = ptep_get_and_clear(mm, addr, pte);
	ptent = pte_modify(ptent, newprot);
	/* ... */
	set_pte_at(mm, addr, pte, ptent);

sequence in mm/mprotect.c:change_pte_range().

This is bad for Xen for two reasons:

  1: ptep_get_and_clear() ends up being a xchg on the pte.  Since the
     pte page is read-only (as it must be, because Xen needs to
     control all pte updates), this traps into Xen, which then
     emulates the instruction.  Trapping into the instruction emulator
     is inherently fairly expensive.  And,

  2: because ptep_get_and_clear has atomic-fetch-and-update semantics,
     it's impossible to implement in a way which can be batched to amortize
     the cost of faulting into the hypervisor.

This series adds the pte_rmw_start() and pte_rmw_commit() operations,
which change this sequence to:

	ptent = pte_rmw_start(mm, addr, pte);
	ptent = pte_modify(ptent, newprot);
	/* ... */
	pte_rmw_commit(mm, addr, pte, ptent);

Which looks very familiar.  And, indeed, when compiled without
CONFIG_PARAVIRT (or on a non-x86 architecture), it will end up doing
precisely the same thing as before.

However, the effective semantics are a bit different.  pte_rmw_start()
means "I'm reading this pte with the intention of updating it; please
don't lose any hardware pte changes in the meantime".  And
pte_rmw_commit() means "Here's a new value for the pte, but make sure
you don't lose any hardware changes".

The default implementation achieves these semantics by making
pte_rmw_start() set the pte to non-present, which prevents any async
hardware changes to the pte.  The pte_rmw_commit() can then just write
the new value into place without having to worry about preserving any
changes, because it knows there are none.
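As a rough userspace illustration of the default behaviour described above (not the kernel code from this series): the pte is modelled as a plain 64-bit word, and hw_touch(), model_rmw_start() and model_rmw_commit() are hypothetical names invented for this sketch. Only the x86 flag bit values are real. __atomic_exchange_n() is a GCC/Clang builtin standing in for the xchg.

```c
#include <assert.h>
#include <stdint.h>

/* x86 pte flag bits (real values); everything else is a toy model. */
#define PTE_PRESENT  0x001ULL
#define PTE_RW       0x002ULL
#define PTE_ACCESSED 0x020ULL
#define PTE_DIRTY    0x040ULL

typedef uint64_t pte_t;

/* Hypothetical stand-in for the MMU: it only sets A/D on a present pte. */
static void hw_touch(pte_t *ptep, pte_t bits)
{
	if (*ptep & PTE_PRESENT)
		*ptep |= bits & (PTE_ACCESSED | PTE_DIRTY);
}

/* Default start: atomically fetch the old value and leave the pte
 * non-present, in the spirit of ptep_get_and_clear(). */
static pte_t model_rmw_start(pte_t *ptep)
{
	return __atomic_exchange_n(ptep, 0, __ATOMIC_SEQ_CST);
}

/* Default commit: a plain store, in the spirit of set_pte_at().  No
 * merge is needed because the pte was non-present in the meantime. */
static void model_rmw_commit(pte_t *ptep, pte_t newval)
{
	assert(!(*ptep & PTE_PRESENT));
	*ptep = newval;
}
```

In this model a hw_touch() issued between start and commit is a no-op, which is why the commit can overwrite the slot unconditionally.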

Xen implements pte_rmw_start() as a simple read of the pte.  This
leaves the pte unchanged in memory, and the hardware may make
asynchronous changes to it.  It implements pte_rmw_commit() using a
hypercall which updates the pte while preserving the state of the
Accessed/Dirty bits.  This allows the whole change_pte_range() loop
to be run without any synchronous unbatched traps into the
hypervisor.  With this change in place, an mprotect microbenchmark
goes from being 50x worse than native to around 7x, which is
acceptable.
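The Xen-style variant can be modelled in userspace along the same lines. This is only a sketch under assumptions: the batch queue, the traps counter and the xen_like_* / batch_flush() names are made up for illustration and do not correspond to the actual Xen multicall interface. The point is that start is a plain read, commit only queues an update, and the simulated hypervisor merges in whatever Accessed/Dirty bits are in memory at flush time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_ACCESSED 0x020ULL	/* real x86 bit values */
#define PTE_DIRTY    0x040ULL
#define PTE_AD       (PTE_ACCESSED | PTE_DIRTY)
#define BATCH_MAX    32		/* arbitrary size for this model */

typedef uint64_t pte_t;

struct pte_update {
	pte_t *ptep;
	pte_t val;
};

static struct pte_update batch[BATCH_MAX];
static size_t batch_len;
static unsigned long traps;	/* simulated entries into the hypervisor */

/* Simulated hypervisor side: apply every queued update in one trap,
 * keeping any A/D bits the "hardware" set since the pte was read. */
static void batch_flush(void)
{
	if (batch_len == 0)
		return;
	traps++;
	for (size_t i = 0; i < batch_len; i++) {
		pte_t cur = *batch[i].ptep;
		*batch[i].ptep = batch[i].val | (cur & PTE_AD);
	}
	batch_len = 0;
}

/* Start is a plain read; the pte stays live in memory. */
static pte_t xen_like_rmw_start(const pte_t *ptep)
{
	return *ptep;
}

/* Commit queues the update instead of trapping immediately. */
static void xen_like_rmw_commit(pte_t *ptep, pte_t newval)
{
	if (batch_len == BATCH_MAX)
		batch_flush();
	assert(batch_len < BATCH_MAX);
	batch[batch_len].ptep = ptep;
	batch[batch_len].val = newval;
	batch_len++;
}
```

A caller would run start/modify/commit over a range of ptes and call batch_flush() once at the end, so the trap count grows with the number of batches rather than the number of ptes.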

I believe that other virtualization systems, whether they use direct
paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
make use of this interface to improve the performance.

Unfortunately (or fortunately) there aren't very many other areas of
the kernel which can really take advantage of this.  There are only a
couple of other instances of ptep_get_and_clear() in mm/, and they're
being used in a similar way; but I don't think they're very
performance critical (though zap_pte_range might be interesting).

In general, mprotect is rarely a performance bottleneck.  But some
debugging libraries (such as electric fence) and garbage collectors
can be very heavy users of mprotect, and this change could materially
benefit them.
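For anyone wanting to exercise this path from userspace, a loop of the following shape drives change_pte_range() over a populated mapping. It is an illustrative sketch only, not the microbenchmark behind the numbers quoted above; mprotect_toggle_ns() and its parameters are invented for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Toggle write permission on npages of anonymous memory, rounds times.
 * Returns the elapsed time in nanoseconds, or -1 on failure. */
static long long mprotect_toggle_ns(size_t npages, int rounds)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	size_t len;
	char *buf;

	assert(pagesz > 0);
	len = npages * (size_t)pagesz;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch every page so there are real ptes to rewrite. */
	for (size_t i = 0; i < npages; i++)
		buf[i * (size_t)pagesz] = 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int r = 0; r < rounds; r++) {
		if (mprotect(buf, len, PROT_READ) != 0 ||
		    mprotect(buf, len, PROT_READ | PROT_WRITE) != 0) {
			munmap(buf, len);
			return -1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, len);
	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

Comparing the result for the same npages and rounds on native hardware and in a guest gives a rough slowdown ratio. Absolute numbers depend heavily on the machine and kernel configuration.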

Thanks,
	J
