From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, xen-devel <xen-devel@lists.xensource.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Hugh Dickins <hugh@veritas.com>, Zachary Amsden <zach@vmware.com>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
Date: Sat, 31 May 2008 01:04:29 +0100
Message-ID: <4deead25e96efc8ca783.1212192269@localhost>
In-Reply-To: <patchbomb.1212192268@localhost>

This patch adds an API for doing read-modify-write updates to a pte's
protection bits which may race against hardware updates to the pte.
After reading the pte, the hardware may asynchronously set the accessed
or dirty bits on a pte, which would be lost when writing back the
modified pte value.

The existing technique to handle this race is to use
ptep_get_and_clear() to atomically fetch the old pte value and clear it
in memory.  This has the effect of marking the pte as non-present,
which will prevent the hardware from updating its state.  When the new
value is written back, the pte will be present again, and the hardware
can resume updating the access/dirty flags.
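
The race and the get-and-clear remedy can be illustrated with a small
user-space model. This is only a sketch: the toy_* names, the bit
layout and the simulated "hardware" function are all made up for
illustration and are not kernel code. The real ptep_get_and_clear()
does the fetch and the clear as one atomic step; the model is
single-threaded, so it uses two plain statements.

```c
#include <assert.h>
#include <stdint.h>

/* Toy pte: the bit layout is illustrative, not any real architecture's. */
typedef uint64_t toy_pte_t;
#define TOY_PRESENT	((toy_pte_t)1 << 0)
#define TOY_WRITE	((toy_pte_t)1 << 1)
#define TOY_DIRTY	((toy_pte_t)1 << 6)

/* Stand-in for the MMU: it only sets the dirty bit on a present entry. */
static void toy_hw_write_access(toy_pte_t *ptep)
{
	if (*ptep & TOY_PRESENT)
		*ptep |= TOY_DIRTY;
}

/* Plain read-modify-write.  Returns 1 if a hardware-set dirty bit was lost. */
static int toy_plain_rmw_loses_dirty(void)
{
	toy_pte_t slot = TOY_PRESENT | TOY_WRITE;
	toy_pte_t pte;
	int hw_dirtied;

	pte = slot;				/* read */
	toy_hw_write_access(&slot);		/* hardware update in the window */
	hw_dirtied = (slot & TOY_DIRTY) != 0;
	pte &= ~TOY_WRITE;			/* modify the stale copy */
	slot = pte;				/* write back, overwriting dirty */
	return hw_dirtied && !(slot & TOY_DIRTY);
}

/* Get-and-clear.  Returns 1 if a hardware-set dirty bit was lost. */
static int toy_get_and_clear_loses_dirty(void)
{
	toy_pte_t slot = TOY_PRESENT | TOY_WRITE;
	toy_pte_t pte;
	int hw_dirtied;

	pte = slot;				/* fetch and ... */
	slot = 0;				/* ... clear (atomic in the kernel) */
	toy_hw_write_access(&slot);		/* no effect: entry is not present */
	hw_dirtied = (slot & TOY_DIRTY) != 0;
	pte &= ~TOY_WRITE;
	slot = pte;				/* entry becomes present again */
	return hw_dirtied && !(slot & TOY_DIRTY);
}
```

In this model toy_plain_rmw_loses_dirty() reports a lost update, while
toy_get_and_clear_loses_dirty() does not, because the simulated
hardware never touches the non-present entry.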

When running in a virtualized environment, pagetable updates are
relatively expensive, since they generally involve some trap into the
hypervisor.  To mitigate the cost of these updates, we tend to batch
them.
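
As a rough picture of what batching means here, the sketch below queues
pte writes and applies the whole queue in one simulated trap. It is a
user-space toy under stated assumptions: struct toy_batch, the toy_*
functions and the queue size are hypothetical, and the "trap" is just a
counter, not a real hypercall interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

#define TOY_BATCH_MAX 8		/* arbitrary queue depth for the sketch */

struct toy_update {
	toy_pte_t *ptep;	/* entry to update */
	toy_pte_t val;		/* value to store */
};

struct toy_batch {
	struct toy_update q[TOY_BATCH_MAX];
	size_t n;		/* queued updates */
	unsigned int traps;	/* simulated hypervisor entries so far */
};

/* Apply every queued update; the whole queue costs a single trap. */
static void toy_batch_flush(struct toy_batch *b)
{
	size_t i;

	if (b->n == 0)
		return;
	b->traps++;
	for (i = 0; i < b->n; i++)
		*b->q[i].ptep = b->q[i].val;
	b->n = 0;
}

/* Queue one update, flushing first if the queue is full. */
static void toy_batch_set_pte(struct toy_batch *b, toy_pte_t *ptep,
			      toy_pte_t val)
{
	if (b->n == TOY_BATCH_MAX)
		toy_batch_flush(b);
	b->q[b->n].ptep = ptep;
	b->q[b->n].val = val;
	b->n++;
}

/* Queue four updates, flush once, and return the number of traps taken. */
static unsigned int toy_batch_demo(void)
{
	toy_pte_t table[4] = { 0 };
	struct toy_batch b = { .n = 0, .traps = 0 };
	size_t i;

	for (i = 0; i < 4; i++)
		toy_batch_set_pte(&b, &table[i], (toy_pte_t)(i + 1));
	assert(table[0] == 0);	/* nothing is written until the flush */
	toy_batch_flush(&b);
	assert(table[3] == 4);
	return b.traps;
}
```

Four queued updates cost one simulated trap instead of four. Note that
a queued write is invisible until the flush, which is why an operation
that must take effect immediately cannot simply be queued.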

However, because of the atomic nature of ptep_get_and_clear(), it is
inherently non-batchable.  This new interface allows batching by
giving the underlying implementation enough information to open a
transaction between the read and write phases:

ptep_modify_prot_start() returns the current pte value, and puts the
  pte entry into a state where either the hardware will not update the
  pte, or if it does, the updates will be preserved on commit.

ptep_modify_prot_commit() writes back the updated pte, making sure that
  any hardware updates made since ptep_modify_prot_start() are
  preserved.

ptep_modify_prot_start() and _commit() must be exactly paired, and
used while holding the appropriate pte lock.  They do not protect
against other software updates of the pte in any way.
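
The second option allowed by this contract (the hardware keeps updating
the pte, and commit preserves what it did) can be sketched in user
space with C11 atomics. This is a hypothetical model only: the toy_*
names and bit layout are invented, the compare-and-swap loop is just
one conceivable way to honour the contract, and it is not meant to
describe how any implementation in this series works.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy pte: the bit layout is illustrative, not any real architecture's. */
typedef uint64_t toy_pte_t;
#define TOY_PRESENT	((toy_pte_t)1 << 0)
#define TOY_WRITE	((toy_pte_t)1 << 1)
#define TOY_ACCESSED	((toy_pte_t)1 << 5)
#define TOY_DIRTY	((toy_pte_t)1 << 6)
#define TOY_HW_BITS	(TOY_ACCESSED | TOY_DIRTY)

/* Start: read the entry but leave it present, so hardware may update it. */
static toy_pte_t toy_modify_prot_start(_Atomic toy_pte_t *ptep)
{
	return atomic_load(ptep);
}

/* Commit: store the new value, carrying over hardware bits set since start. */
static void toy_modify_prot_commit(_Atomic toy_pte_t *ptep, toy_pte_t pte)
{
	toy_pte_t cur = atomic_load(ptep);

	/* On failure, cur is refreshed with the latest value and we retry. */
	while (!atomic_compare_exchange_weak(ptep, &cur,
					     pte | (cur & TOY_HW_BITS)))
		;
}

/* Write-protect an entry while a simulated hardware update races with us. */
static toy_pte_t toy_wrprotect_with_hw_race(void)
{
	_Atomic toy_pte_t slot = TOY_PRESENT | TOY_WRITE;
	toy_pte_t pte;

	pte = toy_modify_prot_start(&slot);
	assert(pte & TOY_PRESENT);		/* only present ptes are modified */
	atomic_fetch_or(&slot, TOY_DIRTY);	/* simulated hardware update */
	pte &= ~TOY_WRITE;			/* change protection on our copy */
	toy_modify_prot_commit(&slot, pte);
	return atomic_load(&slot);
}
```

The final entry is write-protected and still carries the dirty bit the
simulated hardware set between start and commit. As with the real
interface, nothing here protects against other software writers; a
caller would still need the pte lock.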

The current implementations of ptep_modify_prot_start and _commit are
functionally unchanged from before: _start() uses ptep_get_and_clear()
to fetch the pte and zero the entry, preventing any hardware updates.
_commit() simply writes the new pte value back knowing that the
hardware has not updated the pte in the meantime.

The only current user of this interface is mprotect.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/asm-generic/pgtable.h |   53 +++++++++++++++++++++++++++++++++++++++++
 mm/mprotect.c                 |   10 +++----
 2 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -197,6 +197,59 @@
 }
 #endif /* CONFIG_MMU */
 
+static inline pte_t __ptep_modify_prot_start(struct mm_struct *mm,
+					     unsigned long addr,
+					     pte_t *ptep)
+{
+	/* Get the current pte state, but zero it out to make it
+	   non-present, preventing the hardware from asynchronously
+	   updating it. */
+	return ptep_get_and_clear(mm, addr, ptep);
+}
+
+static inline void __ptep_modify_prot_commit(struct mm_struct *mm,
+					     unsigned long addr,
+					     pte_t *ptep, pte_t pte)
+{
+	/* The pte is non-present, so there's no hardware state to
+	   preserve. */
+	set_pte_at(mm, addr, ptep, pte);
+}
+
+#ifndef __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+/*
+ * Start a pte protection read-modify-write transaction, which
+ * protects against asynchronous hardware modifications to the pte.
+ * The intention is not to prevent the hardware from making pte
+ * updates, but to prevent any updates it may make from being lost.
+ *
+ * This does not protect against other software modifications of the
+ * pte; the appropriate pte lock must be held over the transaction.
+ *
+ * Note that this interface is intended to be batchable, meaning that
+ * ptep_modify_prot_commit may not actually update the pte, but merely
+ * queue the update to be done at some later time.  The update must be
+ * actually committed before the pte lock is released, however.
+ */
+static inline pte_t ptep_modify_prot_start(struct mm_struct *mm,
+					   unsigned long addr,
+					   pte_t *ptep)
+{
+	return __ptep_modify_prot_start(mm, addr, ptep);
+}
+
+/*
+ * Commit an update to a pte, leaving any hardware-controlled bits in
+ * the PTE unmodified.
+ */
+static inline void ptep_modify_prot_commit(struct mm_struct *mm,
+					   unsigned long addr,
+					   pte_t *ptep, pte_t pte)
+{
+	__ptep_modify_prot_commit(mm, addr, ptep, pte);
+}
+#endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
 /*
  * A facility to provide lazy MMU batching.  This allows PTE updates and
  * page invalidations to be delayed until a call to leave lazy MMU mode
diff --git a/mm/mprotect.c b/mm/mprotect.c
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -47,19 +47,17 @@
 		if (pte_present(oldpte)) {
 			pte_t ptent;
 
-			/* Avoid an SMP race with hardware updated dirty/clean
-			 * bits by wiping the pte and then setting the new pte
-			 * into place.
-			 */
-			ptent = ptep_get_and_clear(mm, addr, pte);
+			ptent = ptep_modify_prot_start(mm, addr, pte);
 			ptent = pte_modify(ptent, newprot);
+
 			/*
 			 * Avoid taking write faults for pages we know to be
 			 * dirty.
 			 */
 			if (dirty_accountable && pte_dirty(ptent))
 				ptent = pte_mkwrite(ptent);
-			set_pte_at(mm, addr, pte, ptent);
+
+			ptep_modify_prot_commit(mm, addr, pte, ptent);
 #ifdef CONFIG_MIGRATION
 		} else if (!pte_file(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
