* [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
2008-05-31 0:04 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
@ 2008-05-31 0:04 ` Jeremy Fitzhardinge
2008-06-02 11:13 ` Ingo Molnar
2008-05-31 0:04 ` [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit Jeremy Fitzhardinge
2008-05-31 0:04 ` [PATCH 3 of 4] xen: implement ptep_modify_prot_start/commit Jeremy Fitzhardinge
2 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-31 0:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, x86, xen-devel, Thomas Gleixner, Hugh Dickins,
Zachary Amsden, kvm-devel, Virtualization Mailing List,
Rusty Russell, Peter Zijlstra, Linus Torvalds
This patch adds an API for doing read-modify-write updates to a pte's
protection bits which may race against hardware updates to the pte.
After reading the pte, the hardware may asynchonously set the accessed
or dirty bits on a pte, which would be lost when writing back the
modified pte value.
The existing technique to handle this race is to use
ptep_get_and_clear() atomically fetch the old pte value and clear it
in memory. This has the effect of marking the pte as non-present,
which will prevent the hardware from updating its state. When the new
value is written back, the pte will be present again, and the hardware
can resume updating the access/dirty flags.
When running in a virtualized environment, pagetable updates are
relatively expensive, since they generally involve some trap into the
hypervisor. To mitigate the cost of these updates, we tend to batch
them.
However, because of the atomic nature of ptep_get_and_clear(), it is
inherently non-batchable. This new interface allows batching by
giving the underlying implementation enough information to open a
transaction between the read and write phases:
ptep_modify_prot_start() returns the current pte value, and puts the
pte entry into a state where either the hardware will not update the
pte, or if it does, the updates will be preserved on commit.
ptep_modify_prot_commit() writes back the updated pte, makes sure that
any hardware updates made since ptep_modify_prot_start() are
preserved.
ptep_modify_prot_start() and _commit() must be exactly paired, and
used while holding the appropriate pte lock. They do not protect
against other software updates of the pte in any way.
The current implementations of ptep_modify_prot_start and _commit are
functionally unchanged from before: _start() uses ptep_get_and_clear()
fetch the pte and zero the entry, preventing any hardware updates.
_commit() simply writes the new pte value back knowing that the
hardware has not updated the pte in the meantime.
The only current user of this interface is mprotect
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
include/asm-generic/pgtable.h | 53 +++++++++++++++++++++++++++++++++++++++++
mm/mprotect.c | 10 +++----
2 files changed, 57 insertions(+), 6 deletions(-)
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -197,6 +197,59 @@
}
#endif /* CONFIG_MMU */
+static inline pte_t __ptep_modify_prot_start(struct mm_struct *mm,
+ unsigned long addr,
+ pte_t *ptep)
+{
+ /* Get the current pte state, but zero it out to make it
+ non-present, preventing the hardware from asynchronously
+ updating it. */
+ return ptep_get_and_clear(mm, addr, ptep);
+}
+
+static inline void __ptep_modify_prot_commit(struct mm_struct *mm,
+ unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ /* The pte is non-present, so there's no hardware state to
+ preserve. */
+ set_pte_at(mm, addr, ptep, pte);
+}
+
+#ifndef __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+/*
+ * Start a pte protection read-modify-write transaction, which
+ * protects against asynchronous hardware modifications to the pte.
+ * The intention is not to prevent the hardware from making pte
+ * updates, but to prevent any updates it may make from being lost.
+ *
+ * This does not protect against other software modifications of the
+ * pte; the appropriate pte lock must be held over the transation.
+ *
+ * Note that this interface is intended to be batchable, meaning that
+ * ptep_modify_prot_commit may not actually update the pte, but merely
+ * queue the update to be done at some later time. The update must be
+ * actually committed before the pte lock is released, however.
+ */
+static inline pte_t ptep_modify_prot_start(struct mm_struct *mm,
+ unsigned long addr,
+ pte_t *ptep)
+{
+ return __ptep_modify_prot_start(mm, addr, ptep);
+}
+
+/*
+ * Commit an update to a pte, leaving any hardware-controlled bits in
+ * the PTE unmodified.
+ */
+static inline void ptep_modify_prot_commit(struct mm_struct *mm,
+ unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ __ptep_modify_prot_commit(mm, addr, ptep, pte);
+}
+#endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
/*
* A facility to provide lazy MMU batching. This allows PTE updates and
* page invalidations to be delayed until a call to leave lazy MMU mode
diff --git a/mm/mprotect.c b/mm/mprotect.c
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -47,19 +47,17 @@
if (pte_present(oldpte)) {
pte_t ptent;
- /* Avoid an SMP race with hardware updated dirty/clean
- * bits by wiping the pte and then setting the new pte
- * into place.
- */
- ptent = ptep_get_and_clear(mm, addr, pte);
+ ptent = ptep_modify_prot_start(mm, addr, pte);
ptent = pte_modify(ptent, newprot);
+
/*
* Avoid taking write faults for pages we know to be
* dirty.
*/
if (dirty_accountable && pte_dirty(ptent))
ptent = pte_mkwrite(ptent);
- set_pte_at(mm, addr, pte, ptent);
+
+ ptep_modify_prot_commit(mm, addr, pte, ptent);
#ifdef CONFIG_MIGRATION
} else if (!pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
2008-05-31 0:04 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
@ 2008-06-02 11:13 ` Ingo Molnar
2008-06-02 11:57 ` Jeremy Fitzhardinge
2008-06-02 13:02 ` [PATCH] mm: fix comment formatting in asm-generic/pgtable.h:__ptep_modify_prot_ Jeremy Fitzhardinge
0 siblings, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 2008-06-02 11:13 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> + /* Get the current pte state, but zero it out to make it
> + non-present, preventing the hardware from asynchronously
> + updating it. */
please use standard and consistent comment style, similar to:
> +/*
> + * Start a pte protection read-modify-write transaction, which
> + * protects against asynchronous hardware modifications to the pte.
> + * The intention is not to prevent the hardware from making pte
> + * updates, but to prevent any updates it may make from being lost.
Ingo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
2008-06-02 11:13 ` Ingo Molnar
@ 2008-06-02 11:57 ` Jeremy Fitzhardinge
2008-06-02 13:02 ` [PATCH] mm: fix comment formatting in asm-generic/pgtable.h:__ptep_modify_prot_ Jeremy Fitzhardinge
1 sibling, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-02 11:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>
>> + /* Get the current pte state, but zero it out to make it
>> + non-present, preventing the hardware from asynchronously
>> + updating it. */
>>
>
> please use standard and consistent comment style, similar to:
>
OK.
J
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH] mm: fix comment formatting in asm-generic/pgtable.h:__ptep_modify_prot_
2008-06-02 11:13 ` Ingo Molnar
2008-06-02 11:57 ` Jeremy Fitzhardinge
@ 2008-06-02 13:02 ` Jeremy Fitzhardinge
2008-06-02 23:45 ` Rusty Russell
1 sibling, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-02 13:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
include/asm-generic/pgtable.h | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
===================================================================
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -201,9 +201,11 @@
unsigned long addr,
pte_t *ptep)
{
- /* Get the current pte state, but zero it out to make it
- non-present, preventing the hardware from asynchronously
- updating it. */
+ /*
+ * Get the current pte state, but zero it out to make it
+ * non-present, preventing the hardware from asynchronously
+ * updating it.
+ */
return ptep_get_and_clear(mm, addr, ptep);
}
@@ -211,8 +213,10 @@
unsigned long addr,
pte_t *ptep, pte_t pte)
{
- /* The pte is non-present, so there's no hardware state to
- preserve. */
+ /*
+ * The pte is non-present, so there's no hardware state to
+ * preserve.
+ */
set_pte_at(mm, addr, ptep, pte);
}
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] mm: fix comment formatting in asm-generic/pgtable.h:__ptep_modify_prot_
2008-06-02 13:02 ` [PATCH] mm: fix comment formatting in asm-generic/pgtable.h:__ptep_modify_prot_ Jeremy Fitzhardinge
@ 2008-06-02 23:45 ` Rusty Russell
2008-06-02 23:53 ` Jeremy Fitzhardinge
2008-06-13 7:18 ` Ingo Molnar
0 siblings, 2 replies; 12+ messages in thread
From: Rusty Russell @ 2008-06-02 23:45 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Ingo Molnar, LKML, x86, xen-devel, Thomas Gleixner, Hugh Dickins,
Zachary Amsden, kvm-devel, Virtualization Mailing List,
Peter Zijlstra, Linus Torvalds
On Monday 02 June 2008 23:02:34 Jeremy Fitzhardinge wrote:
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
> include/asm-generic/pgtable.h | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> ===================================================================
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -201,9 +201,11 @@
> unsigned long addr,
> pte_t *ptep)
> {
> - /* Get the current pte state, but zero it out to make it
> - non-present, preventing the hardware from asynchronously
> - updating it. */
> + /*
> + * Get the current pte state, but zero it out to make it
> + * non-present, preventing the hardware from asynchronously
> + * updating it.
> + */
Since there is debate over whether winged comments are a feature, I'm not sure
this can be termed a "fix".
> {
> - /* The pte is non-present, so there's no hardware state to
> - preserve. */
> + /*
> + * The pte is non-present, so there's no hardware state to
> + * preserve.
> + */
> set_pte_at(mm, addr, ptep, pte);
> }
This will fit in one line, no?
Rusty.
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] mm: fix comment formatting in asm-generic/pgtable.h:__ptep_modify_prot_
2008-06-02 23:45 ` Rusty Russell
@ 2008-06-02 23:53 ` Jeremy Fitzhardinge
2008-06-13 7:18 ` Ingo Molnar
1 sibling, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-02 23:53 UTC (permalink / raw)
To: Rusty Russell
Cc: Zachary Amsden, xen-devel, Peter Zijlstra, kvm-devel, x86, LKML,
Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
Linus Torvalds, Thomas Gleixner
Rusty Russell wrote:
> On Monday 02 June 2008 23:02:34 Jeremy Fitzhardinge wrote:
>
>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> ---
>> include/asm-generic/pgtable.h | 14 +++++++++-----
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> ===================================================================
>> --- a/include/asm-generic/pgtable.h
>> +++ b/include/asm-generic/pgtable.h
>> @@ -201,9 +201,11 @@
>> unsigned long addr,
>> pte_t *ptep)
>> {
>> - /* Get the current pte state, but zero it out to make it
>> - non-present, preventing the hardware from asynchronously
>> - updating it. */
>> + /*
>> + * Get the current pte state, but zero it out to make it
>> + * non-present, preventing the hardware from asynchronously
>> + * updating it.
>> + */
>>
>
> Since there is debate over whether winged comments are a feature, I'm not sure
> this can be termed a "fix".
>
I don't feel all that strongly about it in this case. I'm not a huge
fan of winged style for inline comments like this, but it is consistent
with the rest of the file, and the comment is (just) long enough to make
it not look completely stupid.
>> {
>> - /* The pte is non-present, so there's no hardware state to
>> - preserve. */
>> + /*
>> + * The pte is non-present, so there's no hardware state to
>> + * preserve.
>> + */
>> set_pte_at(mm, addr, ptep, pte);
>> }
>>
>
> This will fit in one line, no?
>
Yes, but changing it would mean going to the effort of regenerating the
patch.
J
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] mm: fix comment formatting in asm-generic/pgtable.h:__ptep_modify_prot_
2008-06-02 23:45 ` Rusty Russell
2008-06-02 23:53 ` Jeremy Fitzhardinge
@ 2008-06-13 7:18 ` Ingo Molnar
1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2008-06-13 7:18 UTC (permalink / raw)
To: Rusty Russell
Cc: Zachary Amsden, Jeremy Fitzhardinge, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
* Rusty Russell <rusty@rustcorp.com.au> wrote:
> > - /* Get the current pte state, but zero it out to make it
> > - non-present, preventing the hardware from asynchronously
> > - updating it. */
> > + /*
> > + * Get the current pte state, but zero it out to make it
> > + * non-present, preventing the hardware from asynchronously
> > + * updating it.
> > + */
>
> Since there is debate over whether winged comments are a feature, I'm
> not sure this can be termed a "fix".
Well, if you compare the two variants above Jeremy's solution looks
visually more pleasing, so yes it is an improvement and a fix.
( And it's even very obvious in this case, the vertical line gives a
clear delineation of the information and separates it from the code
sections. )
Also, according to Documentation/CodingStyle:
| The preferred style for long (multi-line) comments is:
|
| /*
| * This is the preferred style for multi-line
| * comments in the Linux kernel source code.
| * Please use it consistently.
| *
| * Description: A column of asterisks on the left side,
| * with beginning and ending almost-blank lines.
| */
arch/x86 and include/asm-x86 follows that rule. (And we'd follow it even
if the issue was typographically debatable [which it isnt] because
consistency is visual rule #0 )
Ingo
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit
2008-05-31 0:04 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
2008-05-31 0:04 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
@ 2008-05-31 0:04 ` Jeremy Fitzhardinge
2008-06-02 11:12 ` Ingo Molnar
2008-05-31 0:04 ` [PATCH 3 of 4] xen: implement ptep_modify_prot_start/commit Jeremy Fitzhardinge
2 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-31 0:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and
ptep_modify_prot_commit. This allows the hypervisor-specific backends to
implement these in some more efficient way.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/kernel/paravirt.c | 3 +++
arch/x86/xen/enlighten.c | 3 +++
include/asm-x86/paravirt.h | 28 ++++++++++++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -383,6 +383,9 @@
.pte_update = paravirt_nop,
.pte_update_defer = paravirt_nop,
+ .ptep_modify_prot_start = __ptep_modify_prot_start,
+ .ptep_modify_prot_commit = __ptep_modify_prot_commit,
+
#ifdef CONFIG_HIGHPTE
.kmap_atomic_pte = kmap_atomic,
#endif
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1103,6 +1103,9 @@
.set_pte_at = xen_set_pte_at,
.set_pmd = xen_set_pmd,
+ .ptep_modify_prot_start = __ptep_modify_prot_start,
+ .ptep_modify_prot_commit = __ptep_modify_prot_commit,
+
.pte_val = xen_pte_val,
.pte_flags = native_pte_val,
.pgd_val = xen_pgd_val,
diff --git a/include/asm-x86/paravirt.h b/include/asm-x86/paravirt.h
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -238,6 +238,11 @@
pte_t *ptep);
void (*pte_update_defer)(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
+
+ pte_t (*ptep_modify_prot_start)(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep);
+ void (*ptep_modify_prot_commit)(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
pteval_t (*pte_val)(pte_t);
pteval_t (*pte_flags)(pte_t);
@@ -1040,6 +1045,29 @@
return ret;
}
+#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+static inline pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep)
+{
+ pteval_t ret;
+
+ ret = PVOP_CALL3(pteval_t, pv_mmu_ops.ptep_modify_prot_start,
+ mm, addr, ptep);
+
+ return (pte_t) { .pte = ret };
+}
+
+static inline void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ if (sizeof(pteval_t) > sizeof(long))
+ /* 5 arg words */
+ pv_mmu_ops.ptep_modify_prot_commit(mm, addr, ptep, pte);
+ else
+ PVOP_VCALL4(pv_mmu_ops.ptep_modify_prot_commit,
+ mm, addr, ptep, pte.pte);
+}
+
static inline void set_pte(pte_t *ptep, pte_t pte)
{
if (sizeof(pteval_t) > sizeof(long))
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit
2008-05-31 0:04 ` [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit Jeremy Fitzhardinge
@ 2008-06-02 11:12 ` Ingo Molnar
2008-06-02 11:57 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2008-06-02 11:12 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> +#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
please turn this into CONFIG_ARCH_HAS_PTEP_MODIFY_PROT_TRANSACTION
instead.
Ingo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit
2008-06-02 11:12 ` Ingo Molnar
@ 2008-06-02 11:57 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-02 11:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>
>> +#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
>>
>
> please turn this into CONFIG_ARCH_HAS_PTEP_MODIFY_PROT_TRANSACTION
> instead.
The __HAVE_ARCH_ form is consistent with all the other conditional
things in asm/pgtable.h, like __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS,
__HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG, etc. I don't think adding
CONFIG_ARCH_HAS_ for this one case is a good idea.
J
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 3 of 4] xen: implement ptep_modify_prot_start/commit
2008-05-31 0:04 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
2008-05-31 0:04 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
2008-05-31 0:04 ` [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit Jeremy Fitzhardinge
@ 2008-05-31 0:04 ` Jeremy Fitzhardinge
2 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-31 0:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
Thomas Gleixner, Linus Torvalds
Xen has a pte update function which will update a pte while preserving
its accessed and dirty bits. This means that ptep_modify_prot_start() can be
implemented as a simple read of the pte value. The hardware may
update the pte in the meantime, but ptep_modify_prot_commit() updates it while
preserving any changes that may have happened in the meantime.
The updates in ptep_modify_prot_commit() are batched if we're currently in lazy
mmu mode.
The mmu_update hypercall can take a batch of updates to perform, but
this code doesn't make particular use of that feature, in favour of
using generic multicall batching to get them all into the hypervisor.
The net effect of this is that each mprotect pte update turns from two
expensive trap-and-emulate faults into they hypervisor into a single
hypercall whose cost is amortized in a batched multicall.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/xen/enlighten.c | 13 ++++++++++---
arch/x86/xen/mmu.c | 21 +++++++++++++++++++++
arch/x86/xen/mmu.h | 4 ++++
include/xen/interface/features.h | 3 +++
include/xen/interface/xen.h | 9 +++++++--
5 files changed, 45 insertions(+), 5 deletions(-)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -140,7 +140,9 @@
{
printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
pv_info.name);
- printk(KERN_INFO "Hypervisor signature: %s\n", xen_start_info->magic);
+ printk(KERN_INFO "Hypervisor signature: %s%s\n",
+ xen_start_info->magic,
+ xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
}
static void xen_cpuid(unsigned int *ax, unsigned int *bx,
@@ -1208,6 +1210,8 @@
BUG_ON(memcmp(xen_start_info->magic, "xen-3", 5) != 0);
+ xen_setup_features();
+
/* Install Xen paravirt ops */
pv_info = xen_info;
pv_init_ops = xen_init_ops;
@@ -1217,13 +1221,16 @@
pv_apic_ops = xen_apic_ops;
pv_mmu_ops = xen_mmu_ops;
+ if (xen_feature(XENFEAT_mmu_pt_update_preserve_ad)) {
+ pv_mmu_ops.ptep_modify_prot_start = xen_ptep_modify_prot_start;
+ pv_mmu_ops.ptep_modify_prot_commit = xen_ptep_modify_prot_commit;
+ }
+
machine_ops = xen_machine_ops;
#ifdef CONFIG_SMP
smp_ops = xen_smp_ops;
#endif
-
- xen_setup_features();
/* Get mfn list */
if (!xen_feature(XENFEAT_auto_translated_physmap))
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -298,6 +298,27 @@
out:
if (mm == &init_mm)
preempt_enable();
+}
+
+pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+ /* Just return the pte as-is. We preserve the bits on commit */
+ return *ptep;
+}
+
+void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ struct multicall_space mcs;
+ struct mmu_update *u;
+
+ mcs = xen_mc_entry(sizeof(*u));
+ u = mcs.args;
+ u->ptr = virt_to_machine(ptep).maddr | MMU_PT_UPDATE_PRESERVE_AD;
+ u->val = pte_val_ma(pte);
+ MULTI_mmu_update(mcs.mc, u, 1, NULL, DOMID_SELF);
+
+ xen_mc_issue(PARAVIRT_LAZY_MMU);
}
pteval_t xen_pte_val(pte_t pte)
diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
--- a/arch/x86/xen/mmu.h
+++ b/arch/x86/xen/mmu.h
@@ -52,4 +52,8 @@
void xen_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
void xen_pmd_clear(pmd_t *pmdp);
+pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
+void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
+
#endif /* _XEN_MMU_H */
diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
--- a/include/xen/interface/features.h
+++ b/include/xen/interface/features.h
@@ -38,6 +38,9 @@
*/
#define XENFEAT_pae_pgdir_above_4gb 4
+/* x86: Does this Xen host support the MMU_PT_UPDATE_PRESERVE_AD hypercall? */
+#define XENFEAT_mmu_pt_update_preserve_ad 5
+
#define XENFEAT_NR_SUBMAPS 1
#endif /* __XEN_PUBLIC_FEATURES_H__ */
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -113,9 +113,14 @@
* ptr[:2] -- Machine address within the frame whose mapping to modify.
* The frame must belong to the FD, if one is specified.
* val -- Value to write into the mapping entry.
+ *
+ * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD:
+ * As MMU_NORMAL_PT_UPDATE above, but A/D bits currently in the PTE are ORed
+ * with those in @val.
*/
-#define MMU_NORMAL_PT_UPDATE 0 /* checked '*ptr = val'. ptr is MA. */
-#define MMU_MACHPHYS_UPDATE 1 /* ptr = MA of frame to modify entry for */
+#define MMU_NORMAL_PT_UPDATE 0 /* checked '*ptr = val'. ptr is MA. */
+#define MMU_MACHPHYS_UPDATE 1 /* ptr = MA of frame to modify entry for */
+#define MMU_PT_UPDATE_PRESERVE_AD 2 /* atomically: *ptr = val | (*ptr&(A|D)) */
/*
* MMU EXTENDED OPERATIONS
^ permalink raw reply [flat|nested] 12+ messages in thread