virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-05-31  0:04 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
@ 2008-05-31  0:04 ` Jeremy Fitzhardinge
  2008-06-02 11:13   ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-31  0:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, x86, xen-devel, Thomas Gleixner, Hugh Dickins,
	Zachary Amsden, kvm-devel, Virtualization Mailing List,
	Rusty Russell, Peter Zijlstra, Linus Torvalds

This patch adds an API for doing read-modify-write updates to a pte's
protection bits which may race against hardware updates to the pte.
After reading the pte, the hardware may asynchonously set the accessed
or dirty bits on a pte, which would be lost when writing back the
modified pte value.

The existing technique to handle this race is to use
ptep_get_and_clear() atomically fetch the old pte value and clear it
in memory.  This has the effect of marking the pte as non-present,
which will prevent the hardware from updating its state.  When the new
value is written back, the pte will be present again, and the hardware
can resume updating the access/dirty flags.

When running in a virtualized environment, pagetable updates are
relatively expensive, since they generally involve some trap into the
hypervisor.  To mitigate the cost of these updates, we tend to batch
them.

However, because of the atomic nature of ptep_get_and_clear(), it is
inherently non-batchable.  This new interface allows batching by
giving the underlying implementation enough information to open a
transaction between the read and write phases:

ptep_modify_prot_start() returns the current pte value, and puts the
  pte entry into a state where either the hardware will not update the
  pte, or if it does, the updates will be preserved on commit.

ptep_modify_prot_commit() writes back the updated pte, makes sure that
  any hardware updates made since ptep_modify_prot_start() are
  preserved.

ptep_modify_prot_start() and _commit() must be exactly paired, and
used while holding the appropriate pte lock.  They do not protect
against other software updates of the pte in any way.

The current implementations of ptep_modify_prot_start and _commit are
functionally unchanged from before: _start() uses ptep_get_and_clear()
fetch the pte and zero the entry, preventing any hardware updates.
_commit() simply writes the new pte value back knowing that the
hardware has not updated the pte in the meantime.

The only current user of this interface is mprotect

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/asm-generic/pgtable.h |   53 +++++++++++++++++++++++++++++++++++++++++
 mm/mprotect.c                 |   10 +++----
 2 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -197,6 +197,59 @@
 }
 #endif /* CONFIG_MMU */
 
+static inline pte_t __ptep_modify_prot_start(struct mm_struct *mm,
+					     unsigned long addr,
+					     pte_t *ptep)
+{
+	/* Get the current pte state, but zero it out to make it
+	   non-present, preventing the hardware from asynchronously
+	   updating it. */
+	return ptep_get_and_clear(mm, addr, ptep);
+}
+
+static inline void __ptep_modify_prot_commit(struct mm_struct *mm,
+					     unsigned long addr,
+					     pte_t *ptep, pte_t pte)
+{
+	/* The pte is non-present, so there's no hardware state to
+	   preserve. */
+	set_pte_at(mm, addr, ptep, pte);
+}
+
+#ifndef __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+/*
+ * Start a pte protection read-modify-write transaction, which
+ * protects against asynchronous hardware modifications to the pte.
+ * The intention is not to prevent the hardware from making pte
+ * updates, but to prevent any updates it may make from being lost.
+ *
+ * This does not protect against other software modifications of the
+ * pte; the appropriate pte lock must be held over the transation.
+ *
+ * Note that this interface is intended to be batchable, meaning that
+ * ptep_modify_prot_commit may not actually update the pte, but merely
+ * queue the update to be done at some later time.  The update must be
+ * actually committed before the pte lock is released, however.
+ */
+static inline pte_t ptep_modify_prot_start(struct mm_struct *mm,
+					   unsigned long addr,
+					   pte_t *ptep)
+{
+	return __ptep_modify_prot_start(mm, addr, ptep);
+}
+
+/*
+ * Commit an update to a pte, leaving any hardware-controlled bits in
+ * the PTE unmodified.
+ */
+static inline void ptep_modify_prot_commit(struct mm_struct *mm,
+					   unsigned long addr,
+					   pte_t *ptep, pte_t pte)
+{
+	__ptep_modify_prot_commit(mm, addr, ptep, pte);
+}
+#endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
 /*
  * A facility to provide lazy MMU batching.  This allows PTE updates and
  * page invalidations to be delayed until a call to leave lazy MMU mode
diff --git a/mm/mprotect.c b/mm/mprotect.c
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -47,19 +47,17 @@
 		if (pte_present(oldpte)) {
 			pte_t ptent;
 
-			/* Avoid an SMP race with hardware updated dirty/clean
-			 * bits by wiping the pte and then setting the new pte
-			 * into place.
-			 */
-			ptent = ptep_get_and_clear(mm, addr, pte);
+			ptent = ptep_modify_prot_start(mm, addr, pte);
 			ptent = pte_modify(ptent, newprot);
+
 			/*
 			 * Avoid taking write faults for pages we know to be
 			 * dirty.
 			 */
 			if (dirty_accountable && pte_dirty(ptent))
 				ptent = pte_mkwrite(ptent);
-			set_pte_at(mm, addr, pte, ptent);
+
+			ptep_modify_prot_commit(mm, addr, pte, ptent);
 #ifdef CONFIG_MIGRATION
 		} else if (!pte_file(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-05-31  0:04 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
@ 2008-06-02 11:13   ` Ingo Molnar
  2008-06-02 11:57     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2008-06-02 11:13 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner, Linus Torvalds


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> +	/* Get the current pte state, but zero it out to make it
> +	   non-present, preventing the hardware from asynchronously
> +	   updating it. */

please use standard and consistent comment style, similar to:

> +/*
> + * Start a pte protection read-modify-write transaction, which
> + * protects against asynchronous hardware modifications to the pte.
> + * The intention is not to prevent the hardware from making pte
> + * updates, but to prevent any updates it may make from being lost.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-02 11:13   ` Ingo Molnar
@ 2008-06-02 11:57     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-02 11:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner, Linus Torvalds

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> +	/* Get the current pte state, but zero it out to make it
>> +	   non-present, preventing the hardware from asynchronously
>> +	   updating it. */
>>     
>
> please use standard and consistent comment style, similar to:
>   

OK.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2)
@ 2008-06-16 11:29 Jeremy Fitzhardinge
  2008-06-16 11:30 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-16 11:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner, Linus Torvalds

Hi all,

[ Change since last post: change name to ptep_modify_prot_, on the
  grounds that it isn't really a general pte-modification interface. ]

This little series adds a new transaction-like abstraction for doing
RMW updates to a pte, hooks it into paravirt_ops, and then makes use
of it in Xen.

The basic problem is that mprotect is very slow under Xen (up to 50x
slower than native), primarily because of the

	ptent = ptep_get_and_clear(mm, addr, pte);
	ptent = pte_modify(ptent, newprot);
	/* ... */
	set_pte_at(mm, addr, pte, ptent);

sequence in mm/mprotect.c:change_pte_range().

This is bad for Xen for two reasons:

  1: ptep_get_and_clear() ends up being a xchg on the pte.  Since the
     pte page is read-only (as it must be, because Xen needs to
     control all pte updates), this traps into Xen, which then
     emulates the instruction.  Trapping into the instruction emulator
     is inherently expensive.  And,

  2: because ptep_get_and_clear has atomic-fetch-and-update semantics,
     it's impossible to implement in a way which can be batched to
     amortize the cost of trapping into the hypervisor.

This series adds the ptep_modify_prot_start() and
ptep_modify_prot_commit() operations, which change this sequence to:

	ptent = ptep_modify_prot_start(mm, addr, pte);
	ptent = pte_modify(ptent, newprot);
	/* ... */
	ptep_modify_prot_commit(mm, addr, pte, ptent);

Which looks very familiar.  And, indeed, when compiled without
CONFIG_PARAVIRT (or on a non-x86 architecture), it will end up doing
precisely the same thing as before.

However, the effective semantics are a bit different.
ptep_modify_prot_start() means "I'm reading this pte with the
intention of updating it; please don't lose any hardware pte changes
in the meantime".  And ptep_modify_prot_commit() means "Here's a new
value for the pte, but make sure you don't lose any hardware changes".

The default implementation achieves these semantics by making
ptep_modify_prot_start() set the pte to non-present, which prevents
any async hardware changes to the pte.  The ptep_modify_prot_commit()
can then just write the new value into place without having to worry
about preserving any changes, because it knows there are none.

Xen implements ptep_modify_prot_start() as a simple read of the pte.
This leaves the pte unchanged in memory, and the hardware may make
asynchronous changes to it.  It implements ptep_modify_prot_commit()
using a batched hypercall which preserves the state of the
Access/Dirty bits when updating the pte.  This allows the whole
change_pte_range() loop to be run without any synchronous unbatched
traps into the hypervisor.  With this change in place, an mprotect
microbenchmark goes from being 50x worse than native to around 7x,
which is acceptible.

I believe that other virtualization systems, whether they use direct
paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
make use of this interface to improve the performance.

Unfortunately (or fortunately) there aren't very many other areas of
the kernel which can really take advantage of this.  There's only a
couple of other instances of ptep_get_and_clear() in mm/, and they're
being used in a similar way; but I don't think they're very
performance critical (though zap_pte_range might be interesting).

In general, mprotect is rarely a performance bottleneck.  But some
debugging libraries (such as electric fence) and garbage collectors
can be very heavy users of mprotect, and this change could materially
benefit them.

Thanks,
	J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-16 11:29 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
@ 2008-06-16 11:30 ` Jeremy Fitzhardinge
  2008-06-16 17:29   ` Linus Torvalds
  2008-06-18 23:23   ` Benjamin Herrenschmidt
  2008-06-16 11:30 ` [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-16 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner, Linus Torvalds

This patch adds an API for doing read-modify-write updates to a pte's
protection bits which may race against hardware updates to the pte.
After reading the pte, the hardware may asynchonously set the accessed
or dirty bits on a pte, which would be lost when writing back the
modified pte value.

The existing technique to handle this race is to use
ptep_get_and_clear() atomically fetch the old pte value and clear it
in memory.  This has the effect of marking the pte as non-present,
which will prevent the hardware from updating its state.  When the new
value is written back, the pte will be present again, and the hardware
can resume updating the access/dirty flags.

When running in a virtualized environment, pagetable updates are
relatively expensive, since they generally involve some trap into the
hypervisor.  To mitigate the cost of these updates, we tend to batch
them.

However, because of the atomic nature of ptep_get_and_clear(), it is
inherently non-batchable.  This new interface allows batching by
giving the underlying implementation enough information to open a
transaction between the read and write phases:

ptep_modify_prot_start() returns the current pte value, and puts the
  pte entry into a state where either the hardware will not update the
  pte, or if it does, the updates will be preserved on commit.

ptep_modify_prot_commit() writes back the updated pte, makes sure that
  any hardware updates made since ptep_modify_prot_start() are
  preserved.

ptep_modify_prot_start() and _commit() must be exactly paired, and
used while holding the appropriate pte lock.  They do not protect
against other software updates of the pte in any way.

The current implementations of ptep_modify_prot_start and _commit are
functionally unchanged from before: _start() uses ptep_get_and_clear()
fetch the pte and zero the entry, preventing any hardware updates.
_commit() simply writes the new pte value back knowing that the
hardware has not updated the pte in the meantime.

The only current user of this interface is mprotect

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/asm-generic/pgtable.h |   57 +++++++++++++++++++++++++++++++++++++++++
 mm/mprotect.c                 |   10 ++-----
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -197,6 +197,63 @@
 }
 #endif /* CONFIG_MMU */
 
+static inline pte_t __ptep_modify_prot_start(struct mm_struct *mm,
+					     unsigned long addr,
+					     pte_t *ptep)
+{
+	/*
+	 * Get the current pte state, but zero it out to make it
+	 * non-present, preventing the hardware from asynchronously
+	 * updating it.
+	 */
+	return ptep_get_and_clear(mm, addr, ptep);
+}
+
+static inline void __ptep_modify_prot_commit(struct mm_struct *mm,
+					     unsigned long addr,
+					     pte_t *ptep, pte_t pte)
+{
+	/*
+	 * The pte is non-present, so there's no hardware state to
+	 * preserve.
+	 */
+	set_pte_at(mm, addr, ptep, pte);
+}
+
+#ifndef __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+/*
+ * Start a pte protection read-modify-write transaction, which
+ * protects against asynchronous hardware modifications to the pte.
+ * The intention is not to prevent the hardware from making pte
+ * updates, but to prevent any updates it may make from being lost.
+ *
+ * This does not protect against other software modifications of the
+ * pte; the appropriate pte lock must be held over the transation.
+ *
+ * Note that this interface is intended to be batchable, meaning that
+ * ptep_modify_prot_commit may not actually update the pte, but merely
+ * queue the update to be done at some later time.  The update must be
+ * actually committed before the pte lock is released, however.
+ */
+static inline pte_t ptep_modify_prot_start(struct mm_struct *mm,
+					   unsigned long addr,
+					   pte_t *ptep)
+{
+	return __ptep_modify_prot_start(mm, addr, ptep);
+}
+
+/*
+ * Commit an update to a pte, leaving any hardware-controlled bits in
+ * the PTE unmodified.
+ */
+static inline void ptep_modify_prot_commit(struct mm_struct *mm,
+					   unsigned long addr,
+					   pte_t *ptep, pte_t pte)
+{
+	__ptep_modify_prot_commit(mm, addr, ptep, pte);
+}
+#endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
 /*
  * A facility to provide lazy MMU batching.  This allows PTE updates and
  * page invalidations to be delayed until a call to leave lazy MMU mode
diff --git a/mm/mprotect.c b/mm/mprotect.c
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -47,19 +47,17 @@
 		if (pte_present(oldpte)) {
 			pte_t ptent;
 
-			/* Avoid an SMP race with hardware updated dirty/clean
-			 * bits by wiping the pte and then setting the new pte
-			 * into place.
-			 */
-			ptent = ptep_get_and_clear(mm, addr, pte);
+			ptent = ptep_modify_prot_start(mm, addr, pte);
 			ptent = pte_modify(ptent, newprot);
+
 			/*
 			 * Avoid taking write faults for pages we know to be
 			 * dirty.
 			 */
 			if (dirty_accountable && pte_dirty(ptent))
 				ptent = pte_mkwrite(ptent);
-			set_pte_at(mm, addr, pte, ptent);
+
+			ptep_modify_prot_commit(mm, addr, pte, ptent);
 #ifdef CONFIG_MIGRATION
 		} else if (!pte_file(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit
  2008-06-16 11:29 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
  2008-06-16 11:30 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
@ 2008-06-16 11:30 ` Jeremy Fitzhardinge
  2008-06-16 11:30 ` [PATCH 3 of 4] xen: implement ptep_modify_prot_start/commit Jeremy Fitzhardinge
  2008-06-16 11:30 ` [PATCH 4 of 4] xen: add mechanism to extend existing multicalls Jeremy Fitzhardinge
  3 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-16 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, x86, xen-devel, Thomas Gleixner, Hugh Dickins,
	Zachary Amsden, kvm-devel, Virtualization Mailing List,
	Rusty Russell, Peter Zijlstra, Linus Torvalds

This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and
ptep_modify_prot_commit.  This allows the hypervisor-specific backends to
implement these in some more efficient way.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/paravirt.c |    3 +++
 arch/x86/xen/enlighten.c   |    3 +++
 include/asm-x86/paravirt.h |   28 ++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -383,6 +383,9 @@
 	.pte_update = paravirt_nop,
 	.pte_update_defer = paravirt_nop,
 
+	.ptep_modify_prot_start = __ptep_modify_prot_start,
+	.ptep_modify_prot_commit = __ptep_modify_prot_commit,
+
 #ifdef CONFIG_HIGHPTE
 	.kmap_atomic_pte = kmap_atomic,
 #endif
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1139,6 +1139,9 @@
 	.set_pte_at = xen_set_pte_at,
 	.set_pmd = xen_set_pmd_hyper,
 
+	.ptep_modify_prot_start = __ptep_modify_prot_start,
+	.ptep_modify_prot_commit = __ptep_modify_prot_commit,
+
 	.pte_val = xen_pte_val,
 	.pte_flags = native_pte_val,
 	.pgd_val = xen_pgd_val,
diff --git a/include/asm-x86/paravirt.h b/include/asm-x86/paravirt.h
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -238,6 +238,11 @@
 			   pte_t *ptep);
 	void (*pte_update_defer)(struct mm_struct *mm,
 				 unsigned long addr, pte_t *ptep);
+
+	pte_t (*ptep_modify_prot_start)(struct mm_struct *mm, unsigned long addr,
+					pte_t *ptep);
+	void (*ptep_modify_prot_commit)(struct mm_struct *mm, unsigned long addr,
+					pte_t *ptep, pte_t pte);
 
 	pteval_t (*pte_val)(pte_t);
 	pteval_t (*pte_flags)(pte_t);
@@ -1040,6 +1045,29 @@
 	return ret;
 }
 
+#define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
+static inline pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
+					   pte_t *ptep)
+{
+	pteval_t ret;
+
+	ret = PVOP_CALL3(pteval_t, pv_mmu_ops.ptep_modify_prot_start,
+			 mm, addr, ptep);
+
+	return (pte_t) { .pte = ret };
+}
+
+static inline void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+					   pte_t *ptep, pte_t pte)
+{
+	if (sizeof(pteval_t) > sizeof(long))
+		/* 5 arg words */
+		pv_mmu_ops.ptep_modify_prot_commit(mm, addr, ptep, pte);
+	else
+		PVOP_VCALL4(pv_mmu_ops.ptep_modify_prot_commit,
+			    mm, addr, ptep, pte.pte);
+}
+
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
 	if (sizeof(pteval_t) > sizeof(long))

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 3 of 4] xen: implement ptep_modify_prot_start/commit
  2008-06-16 11:29 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
  2008-06-16 11:30 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
  2008-06-16 11:30 ` [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit Jeremy Fitzhardinge
@ 2008-06-16 11:30 ` Jeremy Fitzhardinge
  2008-06-16 11:30 ` [PATCH 4 of 4] xen: add mechanism to extend existing multicalls Jeremy Fitzhardinge
  3 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-16 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner, Linus Torvalds

Xen has a pte update function which will update a pte while preserving
its accessed and dirty bits.  This means that ptep_modify_prot_start() can be
implemented as a simple read of the pte value.  The hardware may
update the pte in the meantime, but ptep_modify_prot_commit() updates it while
preserving any changes that may have happened in the meantime.

The updates in ptep_modify_prot_commit() are batched if we're currently in lazy
mmu mode.

The mmu_update hypercall can take a batch of updates to perform, but
this code doesn't make particular use of that feature, in favour of
using generic multicall batching to get them all into the hypervisor.

The net effect of this is that each mprotect pte update turns from two
expensive trap-and-emulate faults into they hypervisor into a single
hypercall whose cost is amortized in a batched multicall.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/enlighten.c         |   13 ++++++++++---
 arch/x86/xen/mmu.c               |   21 +++++++++++++++++++++
 arch/x86/xen/mmu.h               |    4 ++++
 include/xen/interface/features.h |    3 +++
 include/xen/interface/xen.h      |    9 +++++++--
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -168,7 +168,9 @@
 {
 	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
 	       pv_info.name);
-	printk(KERN_INFO "Hypervisor signature: %s\n", xen_start_info->magic);
+	printk(KERN_INFO "Hypervisor signature: %s%s\n",
+	       xen_start_info->magic,
+	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
 }
 
 static void xen_cpuid(unsigned int *ax, unsigned int *bx,
@@ -1244,6 +1246,8 @@
 
 	BUG_ON(memcmp(xen_start_info->magic, "xen-3", 5) != 0);
 
+	xen_setup_features();
+
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_init_ops = xen_init_ops;
@@ -1253,13 +1257,16 @@
 	pv_apic_ops = xen_apic_ops;
 	pv_mmu_ops = xen_mmu_ops;
 
+	if (xen_feature(XENFEAT_mmu_pt_update_preserve_ad)) {
+		pv_mmu_ops.ptep_modify_prot_start = xen_ptep_modify_prot_start;
+		pv_mmu_ops.ptep_modify_prot_commit = xen_ptep_modify_prot_commit;
+	}
+
 	machine_ops = xen_machine_ops;
 
 #ifdef CONFIG_SMP
 	smp_ops = xen_smp_ops;
 #endif
-
-	xen_setup_features();
 
 	/* Get mfn list */
 	if (!xen_feature(XENFEAT_auto_translated_physmap))
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -318,6 +318,27 @@
 out:
 	if (mm == &init_mm)
 		preempt_enable();
+}
+
+pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+	/* Just return the pte as-is.  We preserve the bits on commit */
+	return *ptep;
+}
+
+void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+				 pte_t *ptep, pte_t pte)
+{
+	struct multicall_space mcs;
+	struct mmu_update *u;
+
+	mcs = xen_mc_entry(sizeof(*u));
+	u = mcs.args;
+	u->ptr = virt_to_machine(ptep).maddr | MMU_PT_UPDATE_PRESERVE_AD;
+	u->val = pte_val_ma(pte);
+	MULTI_mmu_update(mcs.mc, u, 1, NULL, DOMID_SELF);
+
+	xen_mc_issue(PARAVIRT_LAZY_MMU);
 }
 
 /* Assume pteval_t is equivalent to all the other *val_t types. */
diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
--- a/arch/x86/xen/mmu.h
+++ b/arch/x86/xen/mmu.h
@@ -52,4 +52,8 @@
 void xen_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
 void xen_pmd_clear(pmd_t *pmdp);
 
+pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
+void  xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
+				  pte_t *ptep, pte_t pte);
+
 #endif	/* _XEN_MMU_H */
diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
--- a/include/xen/interface/features.h
+++ b/include/xen/interface/features.h
@@ -38,6 +38,9 @@
  */
 #define XENFEAT_pae_pgdir_above_4gb        4
 
+/* x86: Does this Xen host support the MMU_PT_UPDATE_PRESERVE_AD hypercall? */
+#define XENFEAT_mmu_pt_update_preserve_ad  5
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -113,9 +113,14 @@
  * ptr[:2]  -- Machine address within the frame whose mapping to modify.
  *             The frame must belong to the FD, if one is specified.
  * val      -- Value to write into the mapping entry.
+ *
+ * ptr[1:0] == MMU_PT_UPDATE_PRESERVE_AD:
+ * As MMU_NORMAL_PT_UPDATE above, but A/D bits currently in the PTE are ORed
+ * with those in @val.
  */
-#define MMU_NORMAL_PT_UPDATE     0 /* checked '*ptr = val'. ptr is MA.       */
-#define MMU_MACHPHYS_UPDATE      1 /* ptr = MA of frame to modify entry for  */
+#define MMU_NORMAL_PT_UPDATE      0 /* checked '*ptr = val'. ptr is MA.       */
+#define MMU_MACHPHYS_UPDATE       1 /* ptr = MA of frame to modify entry for  */
+#define MMU_PT_UPDATE_PRESERVE_AD 2 /* atomically: *ptr = val | (*ptr&(A|D)) */
 
 /*
  * MMU EXTENDED OPERATIONS

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 4 of 4] xen: add mechanism to extend existing multicalls
  2008-06-16 11:29 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2008-06-16 11:30 ` [PATCH 3 of 4] xen: implement ptep_modify_prot_start/commit Jeremy Fitzhardinge
@ 2008-06-16 11:30 ` Jeremy Fitzhardinge
  3 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-16 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner, Linus Torvalds

Some Xen hypercalls accept an array of operations to work on.  In
general this is because its more efficient for the hypercall to the
work all at once rather than as separate hypercalls (even batched as a
multicall).

This patch adds a mechanism (xen_mc_extend_args()) to allocate more
argument space to the last-issued multicall, in order to extend its
argument list.

The user of this mechanism is xen/mmu.c, which uses it to extend the
args array of mmu_update.  This is particularly valuable when doing
the update for a large mprotect, which goes via
ptep_modify_prot_commit(), but it also manages to batch updates to
pgd/pmds as well.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/mmu.c        |   55 ++++++++++++++++++++++++++++-----------------
 arch/x86/xen/multicalls.c |   40 +++++++++++++++++++++++++++-----
 arch/x86/xen/multicalls.h |   12 +++++++++
 3 files changed, 81 insertions(+), 26 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -227,18 +227,35 @@
 	return PagePinned(page);
 }
 
-void xen_set_pmd_hyper(pmd_t *ptr, pmd_t val)
+static void extend_mmu_update(const struct mmu_update *update)
 {
 	struct multicall_space mcs;
 	struct mmu_update *u;
 
+	mcs = xen_mc_extend_args(__HYPERVISOR_mmu_update, sizeof(*u));
+
+	if (mcs.mc != NULL)
+		mcs.mc->args[1]++;
+	else {
+		mcs = __xen_mc_entry(sizeof(*u));
+		MULTI_mmu_update(mcs.mc, mcs.args, 1, NULL, DOMID_SELF);
+	}
+
+	u = mcs.args;
+	*u = *update;
+}
+
+void xen_set_pmd_hyper(pmd_t *ptr, pmd_t val)
+{
+	struct mmu_update u;
+
 	preempt_disable();
 
-	mcs = xen_mc_entry(sizeof(*u));
-	u = mcs.args;
-	u->ptr = virt_to_machine(ptr).maddr;
-	u->val = pmd_val_ma(val);
-	MULTI_mmu_update(mcs.mc, u, 1, NULL, DOMID_SELF);
+	xen_mc_batch();
+
+	u.ptr = virt_to_machine(ptr).maddr;
+	u.val = pmd_val_ma(val);
+	extend_mmu_update(&u);
 
 	xen_mc_issue(PARAVIRT_LAZY_MMU);
 
@@ -329,14 +346,13 @@
 void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
 				 pte_t *ptep, pte_t pte)
 {
-	struct multicall_space mcs;
-	struct mmu_update *u;
+	struct mmu_update u;
 
-	mcs = xen_mc_entry(sizeof(*u));
-	u = mcs.args;
-	u->ptr = virt_to_machine(ptep).maddr | MMU_PT_UPDATE_PRESERVE_AD;
-	u->val = pte_val_ma(pte);
-	MULTI_mmu_update(mcs.mc, u, 1, NULL, DOMID_SELF);
+	xen_mc_batch();
+
+	u.ptr = virt_to_machine(ptep).maddr | MMU_PT_UPDATE_PRESERVE_AD;
+	u.val = pte_val_ma(pte);
+	extend_mmu_update(&u);
 
 	xen_mc_issue(PARAVIRT_LAZY_MMU);
 }
@@ -393,16 +409,15 @@
 
 void xen_set_pud_hyper(pud_t *ptr, pud_t val)
 {
-	struct multicall_space mcs;
-	struct mmu_update *u;
+	struct mmu_update u;
 
 	preempt_disable();
 
-	mcs = xen_mc_entry(sizeof(*u));
-	u = mcs.args;
-	u->ptr = virt_to_machine(ptr).maddr;
-	u->val = pud_val_ma(val);
-	MULTI_mmu_update(mcs.mc, u, 1, NULL, DOMID_SELF);
+	xen_mc_batch();
+
+	u.ptr = virt_to_machine(ptr).maddr;
+	u.val = pud_val_ma(val);
+	extend_mmu_update(&u);
 
 	xen_mc_issue(PARAVIRT_LAZY_MMU);
 
diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c
--- a/arch/x86/xen/multicalls.c
+++ b/arch/x86/xen/multicalls.c
@@ -29,14 +29,14 @@
 #define MC_DEBUG	1
 
 #define MC_BATCH	32
-#define MC_ARGS		(MC_BATCH * 16 / sizeof(u64))
+#define MC_ARGS		(MC_BATCH * 16)
 
 struct mc_buffer {
 	struct multicall_entry entries[MC_BATCH];
 #if MC_DEBUG
 	struct multicall_entry debug[MC_BATCH];
 #endif
-	u64 args[MC_ARGS];
+	unsigned char args[MC_ARGS];
 	struct callback {
 		void (*fn)(void *);
 		void *data;
@@ -108,20 +108,48 @@
 {
 	struct mc_buffer *b = &__get_cpu_var(mc_buffer);
 	struct multicall_space ret;
-	unsigned argspace = (args + sizeof(u64) - 1) / sizeof(u64);
+	unsigned argidx = roundup(b->argidx, sizeof(u64));
 
 	BUG_ON(preemptible());
-	BUG_ON(argspace > MC_ARGS);
+	BUG_ON(b->argidx > MC_ARGS);
 
 	if (b->mcidx == MC_BATCH ||
-	    (b->argidx + argspace) > MC_ARGS)
+	    (argidx + args) > MC_ARGS) {
 		xen_mc_flush();
+		argidx = roundup(b->argidx, sizeof(u64));
+	}
 
 	ret.mc = &b->entries[b->mcidx];
 	b->mcidx++;
+	ret.args = &b->args[argidx];
+	b->argidx = argidx + args;
+
+	BUG_ON(b->argidx > MC_ARGS);
+	return ret;
+}
+
+struct multicall_space xen_mc_extend_args(unsigned long op, size_t size)
+{
+	struct mc_buffer *b = &__get_cpu_var(mc_buffer);
+	struct multicall_space ret = { NULL, NULL };
+
+	BUG_ON(preemptible());
+	BUG_ON(b->argidx > MC_ARGS);
+
+	if (b->mcidx == 0)
+		return ret;
+
+	if (b->entries[b->mcidx - 1].op != op)
+		return ret;
+
+	if ((b->argidx + size) > MC_ARGS)
+		return ret;
+
+	ret.mc = &b->entries[b->mcidx - 1];
 	ret.args = &b->args[b->argidx];
-	b->argidx += argspace;
+	b->argidx += size;
 
+	BUG_ON(b->argidx > MC_ARGS);
 	return ret;
 }
 
diff --git a/arch/x86/xen/multicalls.h b/arch/x86/xen/multicalls.h
--- a/arch/x86/xen/multicalls.h
+++ b/arch/x86/xen/multicalls.h
@@ -45,4 +45,16 @@
 /* Set up a callback to be called when the current batch is flushed */
 void xen_mc_callback(void (*fn)(void *), void *data);
 
+/*
+ * Try to extend the arguments of the previous multicall command.  The
+ * previous command's op must match.  If it does, then it attempts to
+ * extend the argument space allocated to the multicall entry by
+ * arg_size bytes.
+ *
+ * The returned multicall_space will return with mc pointing to the
+ * command on success, or NULL on failure, and args pointing to the
+ * newly allocated space.
+ */
+struct multicall_space xen_mc_extend_args(unsigned long op, size_t arg_size);
+
 #endif /* _XEN_MULTICALLS_H */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-16 11:30 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
@ 2008-06-16 17:29   ` Linus Torvalds
  2008-06-16 18:13     ` Hugh Dickins
  2008-06-18 23:23   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-06-16 17:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel, Peter Zijlstra, kvm-devel, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner



On Mon, 16 Jun 2008, Jeremy Fitzhardinge wrote:
>
> ptep_modify_prot_start() returns the current pte value, and puts the
>   pte entry into a state where either the hardware will not update the
>   pte, or if it does, the updates will be preserved on commit.
> 
> ptep_modify_prot_commit() writes back the updated pte, makes sure that
>   any hardware updates made since ptep_modify_prot_start() are
>   preserved.

Ok, I'm fine with this now that it's renamed to be clearly about just 
protection bits.

So 

	Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

for what it's worth.

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-16 17:29   ` Linus Torvalds
@ 2008-06-16 18:13     ` Hugh Dickins
  2008-06-16 18:49       ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Hugh Dickins @ 2008-06-16 18:13 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Ingo Molnar, LKML, x86, xen-devel,
	Thomas Gleixner, Zachary Amsden, kvm-devel,
	Virtualization Mailing List, Rusty Russell, Peter Zijlstra

On Mon, 16 Jun 2008, Linus Torvalds wrote:
> On Mon, 16 Jun 2008, Jeremy Fitzhardinge wrote:
> >
> > ptep_modify_prot_start() returns the current pte value, and puts the
> >   pte entry into a state where either the hardware will not update the
> >   pte, or if it does, the updates will be preserved on commit.
> > 
> > ptep_modify_prot_commit() writes back the updated pte, makes sure that
> >   any hardware updates made since ptep_modify_prot_start() are
> >   preserved.
> 
> Ok, I'm fine with this now that it's renamed to be clearly about just 
> protection bits.
> 
> So 
> 
> 	Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

And seems very reasonable (and exceptionally well described) to me too.

	Acked-by: Hugh Dickins <hugh@veritas.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-16 18:13     ` Hugh Dickins
@ 2008-06-16 18:49       ` Ingo Molnar
  0 siblings, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2008-06-16 18:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Zachary Amsden, Jeremy Fitzhardinge, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Rusty Russell, Virtualization Mailing List,
	Thomas Gleixner, Andrew Morton, Linus Torvalds


* Hugh Dickins <hugh@veritas.com> wrote:

> On Mon, 16 Jun 2008, Linus Torvalds wrote:
> > On Mon, 16 Jun 2008, Jeremy Fitzhardinge wrote:
> > >
> > > ptep_modify_prot_start() returns the current pte value, and puts the
> > >   pte entry into a state where either the hardware will not update the
> > >   pte, or if it does, the updates will be preserved on commit.
> > > 
> > > ptep_modify_prot_commit() writes back the updated pte, makes sure that
> > >   any hardware updates made since ptep_modify_prot_start() are
> > >   preserved.
> > 
> > Ok, I'm fine with this now that it's renamed to be clearly about just 
> > protection bits.
> > 
> > So 
> > 
> > 	Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> And seems very reasonable (and exceptionally well described) to me too.
> 
> 	Acked-by: Hugh Dickins <hugh@veritas.com>

thanks guys, i've added the Acked-by's and added a new -tip topic for 
this.

The dependencies are a bit tricky and the changes contain mm/ and 
include/asm-generic/ bits so lets try a new Git trick here to keep it 
all tidy and disciplined for v2.6.27 merging.

So i've created a new tip/mm/xen topic branch (the 85th -tip topic 
branch ;-), which is COW-ed off the current tip/x86/xen topic branch [on 
which branch these changes have some dependencies], and added these 4 
changes to that. The tip/x86/xen (append-only-) topic will continue to 
advance as usual, and we likely wont have dependencies on the stuff in 
tip/mm/xen. (if there will be such dependencies we can handle that too)

In v2.6.27 we can then offer up the two branches separately for upstream 
merge, and tip/x86/xen will still only have x86 and xen changes, not any 
mm changes. (Obviously tip/mm/xen will be offered after tip/x86/xen has 
gone upstream - so that it will only contain these 4 patches ontop of 
already-upstream changes)

it will all be auto-merged into linux-next so there this internal 
structure is not visible, all the changes are available for wide testing 
of course.

i've added these mm/ changes to auto-core-next (not auto-x86-next), if 
that is fine by Andrew.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-16 11:30 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
  2008-06-16 17:29   ` Linus Torvalds
@ 2008-06-18 23:23   ` Benjamin Herrenschmidt
  2008-06-18 23:59     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-06-18 23:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, LKML, x86, xen-devel, Thomas Gleixner, Hugh Dickins,
	Zachary Amsden, kvm-devel, Virtualization Mailing List,
	Rusty Russell, Peter Zijlstra, Linus Torvalds

On Mon, 2008-06-16 at 04:30 -0700, Jeremy Fitzhardinge wrote:
> 
> The only current user of this interface is mprotect

Do you plan to use it with fork ultimately ?

Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-18 23:23   ` Benjamin Herrenschmidt
@ 2008-06-18 23:59     ` Jeremy Fitzhardinge
  2008-06-19  0:15       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-18 23:59 UTC (permalink / raw)
  To: benh
  Cc: Zachary Amsden, Rusty Russell, xen-devel, Peter Zijlstra,
	kvm-devel, x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Ingo Molnar, Linus Torvalds, Thomas Gleixner

Benjamin Herrenschmidt wrote:
> On Mon, 2008-06-16 at 04:30 -0700, Jeremy Fitzhardinge wrote:
>   
>> The only current user of this interface is mprotect
>>     
>
> Do you plan to use it with fork ultimately ?
>   

Good point, I'd overlooked that.  I guess that means using it in 
ptep_set_wrprotect().

At present the x86 ptep_set_wrprotect() just uses clear_bit on the pte, 
which is a locked cycle.  Is that significantly cheaper than an xchg + 
set?  (Same number of locked operations...)

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-18 23:59     ` Jeremy Fitzhardinge
@ 2008-06-19  0:15       ` Jeremy Fitzhardinge
  2008-06-19  0:24         ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-19  0:15 UTC (permalink / raw)
  To: benh
  Cc: xen-devel, Peter Zijlstra, kvm-devel, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Linus Torvalds, Thomas Gleixner

Jeremy Fitzhardinge wrote:
> Benjamin Herrenschmidt wrote:
>   
>> On Mon, 2008-06-16 at 04:30 -0700, Jeremy Fitzhardinge wrote:
>>   
>>     
>>> The only current user of this interface is mprotect
>>>     
>>>       
>> Do you plan to use it with fork ultimately ?
>>   
>>     
>
> Good point, I'd overlooked that.  I guess that means using it in 
> ptep_set_wrprotect().
Along the lines of:

--- a/include/asm-x86/pgtable.h	Wed Jun 18 13:54:49 2008 -0700
+++ b/include/asm-x86/pgtable.h	Wed Jun 18 17:03:39 2008 -0700
@@ -491,7 +491,11 @@
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
-	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+	pte_t pte = ptep_modify_prot_start(mm, addr, ptep);
+	
+	pte = pte_wrprotect(pte);
+
+	ptep_modify_prot_commit(mm, addr, ptep, pte);
 	pte_update(mm, addr, ptep);
 }
 

	J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  0:15       ` Jeremy Fitzhardinge
@ 2008-06-19  0:24         ` Linus Torvalds
  2008-06-19  0:37           ` Jeremy Fitzhardinge
  2008-06-19  0:39           ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2008-06-19  0:24 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner



On Wed, 18 Jun 2008, Jeremy Fitzhardinge wrote:
>
> Along the lines of:

Hell no. There's a reason we have a special set_wrprotect() thing. We can 
do it more efficiently on native hardware by just clearing the bit 
atomically. No need to do the cmpxchg games.

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  0:24         ` Linus Torvalds
@ 2008-06-19  0:37           ` Jeremy Fitzhardinge
  2008-06-19  0:49             ` Linus Torvalds
  2008-06-19  0:39           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-19  0:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner

Linus Torvalds wrote:
> On Wed, 18 Jun 2008, Jeremy Fitzhardinge wrote:
>   
>> Along the lines of:
>>     
>
> Hell no. There's a reason we have a special set_wrprotect() thing. We can 
> do it more efficiently on native hardware by just clearing the bit 
> atomically. No need to do the cmpxchg games.
>   

It's not cmpxchg, just xchg. 

In other words, is:

	lock btr $_PAGE_BIT_RW, (%rbx)

much cheaper than

	mov	$0, %rax
	xchg	%rax, (%rbx)
	and	$~_PAGE_RW, %rax
	mov	%rax, (%rbx)

?

It's the same number of locked RMW operations, so aside from being a few 
instructions longer, I think it would be much the same.

I guess the correct answer is "lmbench".

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  0:24         ` Linus Torvalds
  2008-06-19  0:37           ` Jeremy Fitzhardinge
@ 2008-06-19  0:39           ` Benjamin Herrenschmidt
  2008-06-19  5:03             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-06-19  0:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, xen-devel, Peter Zijlstra, kvm-devel, x86,
	LKML, Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner

On Wed, 2008-06-18 at 17:24 -0700, Linus Torvalds wrote:
> 
> On Wed, 18 Jun 2008, Jeremy Fitzhardinge wrote:
> >
> > Along the lines of:
> 
> Hell no. There's a reason we have a special set_wrprotect() thing. We can 
> do it more efficiently on native hardware by just clearing the bit 
> atomically. No need to do the cmpxchg games.

But we can't batch ... 

Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  0:37           ` Jeremy Fitzhardinge
@ 2008-06-19  0:49             ` Linus Torvalds
  2008-06-19  4:03               ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-06-19  0:49 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: benh, xen-devel, Peter Zijlstra, kvm-devel, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner



On Wed, 18 Jun 2008, Jeremy Fitzhardinge wrote:
>
> It's not cmpxchg, just xchg. 
> In other words, is:
> 
> 	lock btr $_PAGE_BIT_RW, (%rbx)

Well, we can actually do it as

	lock andl $~_PAGE_RW,(%rbx)

although we haven't bothered (I've wanted several times to make 
clear_bit() do that, but have never gotten around to it - mainly because 
old gcc versions didn't work with __builtin_constant_p() in inline 
functions - so you have to do the macro from hell)

And yes, the "lock andl" should be noticeably faster than the xchgl.

			Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  0:49             ` Linus Torvalds
@ 2008-06-19  4:03               ` Linus Torvalds
  2008-06-19 11:58                 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-06-19  4:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner



On Wed, 18 Jun 2008, Linus Torvalds wrote:
> 
> And yes, the "lock andl" should be noticeably faster than the xchgl.

I dunno. Here's a untested (!!) patch that turns constant-bit 
set/clear_bit ops into byte mask ops (lock orb/andb).

It's not exactly pretty. The reason for using the byte versions is that a 
locked op is serialized in the memory pipeline anyway, so there are no 
forwarding issues (that could slow down things when we access things with 
different sizes), and the byte ops are a lot smaller than 32-bit and 
particularly 64-bit ops (big constants, and the 64-bit ops need the REX 
prefix byte too).

[ Side note: I wonder if we should turn the "test_bit()" C version into a 
  "char *" version too.. It could actually help with alias analysis, since 
  char pointers can alias anything. So it might be the RightThing(tm) to 
  do for multiple reasons. I dunno. It's a separate issue. ]

It does actually shrink the kernel image a bit (a couple of hundred bytes 
on the text segment for my everything-compiled-in image), and while it's 
totally untested the (admittedly few) code generation points I looked at 
seemed sane. And "lock orb" should be noticeably faster than "lock bts".

If somebody wants to play with it, go wild. I didn't do "change_bit()", 
because nobody sane uses that thing anyway. I guarantee nothing. And if it 
breaks, nobody saw me do anything.  You can't prove this email wasn't sent 
by somebody who is good at forging smtp.

This does require a gcc that is recent enough for "__builtin_constant_p()" 
to work in an inline function, but I suspect our kernel requirements are 
already higher than that. And if you do have an old gcc that is supported, 
the worst that would happen is that the optimization doesn't trigger.

		Linus

---
 include/asm-x86/bitops.h |   27 ++++++++++++++++++++++-----
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index ee4b3ea..c1b7f91 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -23,11 +23,22 @@
 #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
 /* Technically wrong, but this avoids compilation errors on some gcc
    versions. */
-#define ADDR "=m" (*(volatile long *) addr)
+#define BITOP_ADDR(x) "=m" (*(volatile long *) (x))
 #else
-#define ADDR "+m" (*(volatile long *) addr)
+#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
 #endif
 
+#define ADDR BITOP_ADDR(addr)
+
+/*
+ * We do the locked ops that don't return the old value as
+ * a mask operation on a byte.
+ */
+#define IS_IMMEDIATE(nr) \
+	(__builtin_constant_p(nr))
+#define CONST_MASK_ADDR BITOP_ADDR(addr + (nr>>3))
+#define CONST_MASK (1 << (nr & 7))
+
 /**
  * set_bit - Atomically set a bit in memory
  * @nr: the bit to set
@@ -43,9 +54,12 @@
  * Note that @nr may be almost arbitrarily large; this function is not
  * restricted to acting on a single-word quantity.
  */
-static inline void set_bit(int nr, volatile void *addr)
+static inline void set_bit(unsigned int nr, volatile void *addr)
 {
-	asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory");
+	if (IS_IMMEDIATE(nr))
+		asm volatile(LOCK_PREFIX "orb %1,%0" : CONST_MASK_ADDR : "i" (CONST_MASK) : "memory");
+	else
+		asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory");
 }
 
 /**
@@ -74,7 +88,10 @@ static inline void __set_bit(int nr, volatile void *addr)
  */
 static inline void clear_bit(int nr, volatile void *addr)
 {
-	asm volatile(LOCK_PREFIX "btr %1,%0" : ADDR : "Ir" (nr));
+	if (IS_IMMEDIATE(nr))
+		asm volatile(LOCK_PREFIX "andb %1,%0" : CONST_MASK_ADDR : "i" (~CONST_MASK));
+	else
+		asm volatile(LOCK_PREFIX "btr %1,%0" : ADDR : "Ir" (nr));
 }
 
 /*

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  0:39           ` Benjamin Herrenschmidt
@ 2008-06-19  5:03             ` Jeremy Fitzhardinge
  2008-06-19  7:20               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-19  5:03 UTC (permalink / raw)
  To: benh
  Cc: Linus Torvalds, xen-devel, Peter Zijlstra, kvm-devel, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner

Benjamin Herrenschmidt wrote:
> On Wed, 2008-06-18 at 17:24 -0700, Linus Torvalds wrote:
>   
>> On Wed, 18 Jun 2008, Jeremy Fitzhardinge wrote:
>>     
>>> Along the lines of:
>>>       
>> Hell no. There's a reason we have a special set_wrprotect() thing. We can 
>> do it more efficiently on native hardware by just clearing the bit 
>> atomically. No need to do the cmpxchg games.
>>     
>
> But we can't batch ... 
>   

Which architecture are you interested in?  If it isn't x86, you can 
probably get anything past Linus ;)

I'll do some measurements to see what effect the batchable 
ptep_set_wrprotect() has on native.  If it's significant, I'll propose 
making it conditional on CONFIG_PARAVIRT.

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  5:03             ` Jeremy Fitzhardinge
@ 2008-06-19  7:20               ` Benjamin Herrenschmidt
  2008-06-19 17:57                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2008-06-19  7:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, xen-devel, Peter Zijlstra, kvm-devel, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner


> Which architecture are you interested in?  If it isn't x86, you can 
> probably get anything past Linus ;)
> 
> I'll do some measurements to see what effect the batchable 
> ptep_set_wrprotect() has on native.  If it's significant, I'll propose 
> making it conditional on CONFIG_PARAVIRT.

Oh, I mostly think about powerpc, I just wondered if I could re-use
your new stuff in that context. Mostly idle thoughts at this stage, I
haven't looked seriously.

I have an old patch set to batch forks (and mprotect etc...) TLB
invalidations (which is what I really want to batch on powerpc, more
than the actual PTE changes) that involves subtle changes to the
batching mechanisms etc... but it has issues and would need to be
reworked before merging.

It's something that's been in the back of my mind or the bottom of my
TODO list for some time, but I never quite found the bandwidth to go
back to it.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  4:03               ` Linus Torvalds
@ 2008-06-19 11:58                 ` Ingo Molnar
  2008-06-19 12:03                   ` Ingo Molnar
                                     ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Ingo Molnar @ 2008-06-19 11:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, xen-devel, Peter Zijlstra, kvm-devel, benh,
	x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 18 Jun 2008, Linus Torvalds wrote:
> > 
> > And yes, the "lock andl" should be noticeably faster than the xchgl.
> 
> I dunno. Here's a untested (!!) patch that turns constant-bit 
> set/clear_bit ops into byte mask ops (lock orb/andb).
> 
> It's not exactly pretty. The reason for using the byte versions is 
> that a locked op is serialized in the memory pipeline anyway, so there 
> are no forwarding issues (that could slow down things when we access 
> things with different sizes), and the byte ops are a lot smaller than 
> 32-bit and particularly 64-bit ops (big constants, and the 64-bit ops 
> need the REX prefix byte too).
> 
> [ Side note: I wonder if we should turn the "test_bit()" C version into a 
>   "char *" version too.. It could actually help with alias analysis, since 
>   char pointers can alias anything. So it might be the RightThing(tm) to 
>   do for multiple reasons. I dunno. It's a separate issue. ]
> 
> It does actually shrink the kernel image a bit (a couple of hundred bytes 
> on the text segment for my everything-compiled-in image), and while it's 
> totally untested the (admittedly few) code generation points I looked at 
> seemed sane. And "lock orb" should be noticeably faster than "lock bts".
> 
> If somebody wants to play with it, go wild. I didn't do 
> "change_bit()", because nobody sane uses that thing anyway. I 
> guarantee nothing. And if it breaks, nobody saw me do anything.  You 
> can't prove this email wasn't sent by somebody who is good at forging 
> smtp.

i stuck this into tip/x86/bitops branch for kicks - and it blew up very 
quickly on 32-bit ;-)

The crash manifests itself as a flood of spurious irqs:

[    0.997179] irq 96, desc: 788ddd80, depth: 1, count: 0, unhandled: 0
[    1.003414] ->handle_irq():  7814c888, handle_bad_irq+0x0/0x1b0
[    1.003414] ->chip(): 78867174, no_irq_chip+0x0/0x40
[    1.008350] ->action(): 00000000
[    1.008350]   IRQ_DISABLED set
[    1.010339] unexpected IRQ trap at vector 60

unappying that change makes the system boot up fine.

a quick guess would be:

   io_apic_32.c:   if (test_and_set_bit(vector, used_vectors))

   io_apic_32.c:           set_bit(i, used_vectors);

config and crashlog can be found at:

  http://redhat.com/~mingo/misc/config-Thu_Jun_19_13_45_21_CEST_2008.bad
  http://redhat.com/~mingo/misc/crash-Thu_Jun_19_13_45_21_CEST_2008.log

Below is the commit, it needed a small amount of massaging to apply the 
void * -> unsigned long * change in the x86/bitops topic.

	Ingo

-------------->
commit 1a750e0cd7a30c478723ecfa1df685efcdd38a90
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 18 21:03:26 2008 -0700

    x86, bitops: make constant-bit set/clear_bit ops faster
    
    On Wed, 18 Jun 2008, Linus Torvalds wrote:
    >
    > And yes, the "lock andl" should be noticeably faster than the xchgl.
    
    I dunno. Here's a untested (!!) patch that turns constant-bit
    set/clear_bit ops into byte mask ops (lock orb/andb).
    
    It's not exactly pretty. The reason for using the byte versions is that a
    locked op is serialized in the memory pipeline anyway, so there are no
    forwarding issues (that could slow down things when we access things with
    different sizes), and the byte ops are a lot smaller than 32-bit and
    particularly 64-bit ops (big constants, and the 64-bit ops need the REX
    prefix byte too).
    
    [ Side note: I wonder if we should turn the "test_bit()" C version into a
      "char *" version too.. It could actually help with alias analysis, since
      char pointers can alias anything. So it might be the RightThing(tm) to
      do for multiple reasons. I dunno. It's a separate issue. ]
    
    It does actually shrink the kernel image a bit (a couple of hundred bytes
    on the text segment for my everything-compiled-in image), and while it's
    totally untested the (admittedly few) code generation points I looked at
    seemed sane. And "lock orb" should be noticeably faster than "lock bts".
    
    If somebody wants to play with it, go wild. I didn't do "change_bit()",
    because nobody sane uses that thing anyway. I guarantee nothing. And if it
    breaks, nobody saw me do anything.  You can't prove this email wasn't sent
    by somebody who is good at forging smtp.
    
    This does require a gcc that is recent enough for "__builtin_constant_p()"
    to work in an inline function, but I suspect our kernel requirements are
    already higher than that. And if you do have an old gcc that is supported,
    the worst that would happen is that the optimization doesn't trigger.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index 7d2494b..ab7635a 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -23,11 +23,22 @@
 #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1)
 /* Technically wrong, but this avoids compilation errors on some gcc
    versions. */
-#define ADDR "=m" (*(volatile long *) addr)
+#define BITOP_ADDR(x) "=m" (*(volatile long *) (x))
 #else
-#define ADDR "+m" (*(volatile long *) addr)
+#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
 #endif
 
+#define ADDR BITOP_ADDR(addr)
+
+/*
+ * We do the locked ops that don't return the old value as
+ * a mask operation on a byte.
+ */
+#define IS_IMMEDIATE(nr) \
+	(__builtin_constant_p(nr))
+#define CONST_MASK_ADDR BITOP_ADDR(addr + (nr>>3))
+#define CONST_MASK (1 << (nr & 7))
+
 /**
  * set_bit - Atomically set a bit in memory
  * @nr: the bit to set
@@ -43,11 +54,15 @@
  * Note that @nr may be almost arbitrarily large; this function is not
  * restricted to acting on a single-word quantity.
  */
-static inline void set_bit(int nr, volatile unsigned long *addr)
+static inline void set_bit(unsigned int nr, volatile unsigned long *addr)
 {
-	asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory");
+	if (IS_IMMEDIATE(nr))
+		asm volatile(LOCK_PREFIX "orb %1,%0" : CONST_MASK_ADDR : "i" (CONST_MASK) : "memory");
+	else
+		asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory");
 }
 
+
 /**
  * __set_bit - Set a bit in memory
  * @nr: the bit to set
@@ -74,7 +89,10 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
  */
 static inline void clear_bit(int nr, volatile unsigned long *addr)
 {
-	asm volatile(LOCK_PREFIX "btr %1,%0" : ADDR : "Ir" (nr));
+	if (IS_IMMEDIATE(nr))
+		asm volatile(LOCK_PREFIX "andb %1,%0" : CONST_MASK_ADDR : "i" (~CONST_MASK));
+	else
+		asm volatile(LOCK_PREFIX "btr %1,%0" : ADDR : "Ir" (nr));
 }
 
 /*

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19 11:58                 ` Ingo Molnar
@ 2008-06-19 12:03                   ` Ingo Molnar
  2008-06-19 12:20                   ` Akinobu Mita
  2008-06-19 16:30                   ` Linus Torvalds
  2 siblings, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2008-06-19 12:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, benh, xen-devel, Peter Zijlstra, kvm-devel,
	x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner


* Ingo Molnar <mingo@elte.hu> wrote:

> config and crashlog can be found at:
> 
>   http://redhat.com/~mingo/misc/config-Thu_Jun_19_13_45_21_CEST_2008.bad
>   http://redhat.com/~mingo/misc/crash-Thu_Jun_19_13_45_21_CEST_2008.log

just in case it helps, and for completeness, a 64-bit box blew up too:

  http://redhat.com/~mingo/misc/config-Thu_Jun_19_13_48_55_CEST_2008.bad

and another 32-bit box:

  http://redhat.com/~mingo/misc/config-Thu_Jun_19_13_48_32_CEST_2008.bad

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19 11:58                 ` Ingo Molnar
  2008-06-19 12:03                   ` Ingo Molnar
@ 2008-06-19 12:20                   ` Akinobu Mita
  2008-06-19 16:30                   ` Linus Torvalds
  2 siblings, 0 replies; 36+ messages in thread
From: Akinobu Mita @ 2008-06-19 12:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Jeremy Fitzhardinge, benh, xen-devel,
	Peter Zijlstra, kvm-devel, x86, LKML, Virtualization Mailing List,
	Hugh Dickins, Thomas Gleixner

> Below is the commit, it needed a small amount of massaging to apply the
> void * -> unsigned long * change in the x86/bitops topic.

So you need to change this line

> +#define CONST_MASK_ADDR BITOP_ADDR(addr + (nr>>3))

to be

#define CONST_MASK_ADDR BITOP_ADDR((void *)addr + (nr>>3))

or something like this. Otherwise it will get wrong address in set_bit

> -static inline void set_bit(int nr, volatile unsigned long *addr)
> +static inline void set_bit(unsigned int nr, volatile unsigned long *addr)
>  {
> -       asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory");
> +       if (IS_IMMEDIATE(nr))
> +               asm volatile(LOCK_PREFIX "orb %1,%0" : CONST_MASK_ADDR : "i" (CONST_MASK) : "memory");
> +       else
> +               asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory");
>  }

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19 11:58                 ` Ingo Molnar
  2008-06-19 12:03                   ` Ingo Molnar
  2008-06-19 12:20                   ` Akinobu Mita
@ 2008-06-19 16:30                   ` Linus Torvalds
  2008-06-19 16:47                     ` Ingo Molnar
  2 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-06-19 16:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Thomas Gleixner



On Thu, 19 Jun 2008, Ingo Molnar wrote:
> 
> Below is the commit, it needed a small amount of massaging to apply the 
> void * -> unsigned long * change in the x86/bitops topic.

Well, that's your bug right there.

The macros very much depended on the pointers being "void *", due to the 
pointer arithmetic (which is a gcc extension that we use extensively - 
"void *" arithmetic works as if it was a byte pointer).

When you changed "addr" to "volatile unsigned long *", now the arithmetic 
ended up multiplying the offset by the size of "long" before adding it.

That said, the _original_ patch wasn't tested either, but quite frankly, 
looking over the patch one more time, I can't really see how it could 
break.

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19 16:30                   ` Linus Torvalds
@ 2008-06-19 16:47                     ` Ingo Molnar
  2008-06-20 10:10                       ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2008-06-19 16:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, xen-devel, Peter Zijlstra, kvm-devel, benh,
	x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, 19 Jun 2008, Ingo Molnar wrote:
> > 
> > Below is the commit, it needed a small amount of massaging to apply the 
> > void * -> unsigned long * change in the x86/bitops topic.
> 
> Well, that's your bug right there.
> 
> The macros very much depended on the pointers being "void *", due to 
> the pointer arithmetic (which is a gcc extension that we use 
> extensively - "void *" arithmetic works as if it was a byte pointer).

duh, yeah - of course. Will retry with that fixed :)

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19  7:20               ` Benjamin Herrenschmidt
@ 2008-06-19 17:57                 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-19 17:57 UTC (permalink / raw)
  To: benh
  Cc: xen-devel, Peter Zijlstra, kvm-devel, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Linus Torvalds, Thomas Gleixner

Benjamin Herrenschmidt wrote:
>> Which architecture are you interested in?  If it isn't x86, you can 
>> probably get anything past Linus ;)
>>
>> I'll do some measurements to see what effect the batchable 
>> ptep_set_wrprotect() has on native.  If it's significant, I'll propose 
>> making it conditional on CONFIG_PARAVIRT.
>>     
>
> Oh, I mostly think about powerpc, I just wondered if I could re-use
> your new stuff in that context. Mostly idle thoughts at this stage, I
> haven't looked seriously.
>   

There are general-purpose hooks in the common code which architectures 
can implement however they like.  In x86-land we hook them up to pvops, 
but you can handle them however you want.

> I have an old patch set to batch forks (and mprotect etc...) TLB
> invalidations (which is what I really want to batch on powerpc, more
> than the actual PTE changes) that involves subtle changes to the
> batching mechanisms etc...
>   

Do you mean setting up batches of per-page tlb shootdowns rather than 
going a global tlb flush at the end?

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-19 16:47                     ` Ingo Molnar
@ 2008-06-20 10:10                       ` Ingo Molnar
  2008-06-20 19:06                         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2008-06-20 10:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, xen-devel, Peter Zijlstra, kvm-devel, benh,
	x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Thu, 19 Jun 2008, Ingo Molnar wrote:
> > > 
> > > Below is the commit, it needed a small amount of massaging to apply the 
> > > void * -> unsigned long * change in the x86/bitops topic.
> > 
> > Well, that's your bug right there.
> > 
> > The macros very much depended on the pointers being "void *", due to 
> > the pointer arithmetic (which is a gcc extension that we use 
> > extensively - "void *" arithmetic works as if it was a byte 
> > pointer).
> 
> duh, yeah - of course. Will retry with that fixed :)

yep, the patch below got it all going and it passed 5 hours of testing 
already. Thanks,

	Ingo

------------------->
commit 7dbceaf9bb68919651901b101f44edd5391ee489
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jun 20 07:28:24 2008 +0200

    x86, bitops: make constant-bit set/clear_bit ops faster, adapt, clean up
    
    fix integration bug introduced by "x86: bitops take an unsigned long *"
    which turned "(void *) + x" into "(long *) + x".
    
    small cleanups to make it more apparent which value get propagated where.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index ab7635a..6c50548 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -28,16 +28,15 @@
 #define BITOP_ADDR(x) "+m" (*(volatile long *) (x))
 #endif
 
-#define ADDR BITOP_ADDR(addr)
+#define ADDR				BITOP_ADDR(addr)
 
 /*
  * We do the locked ops that don't return the old value as
  * a mask operation on a byte.
  */
-#define IS_IMMEDIATE(nr) \
-	(__builtin_constant_p(nr))
-#define CONST_MASK_ADDR BITOP_ADDR(addr + (nr>>3))
-#define CONST_MASK (1 << (nr & 7))
+#define IS_IMMEDIATE(nr)		(__builtin_constant_p(nr))
+#define CONST_MASK_ADDR(nr, addr)	BITOP_ADDR((void *)(addr) + ((nr)>>3))
+#define CONST_MASK(nr)			(1 << ((nr) & 7))
 
 /**
  * set_bit - Atomically set a bit in memory
@@ -56,13 +55,17 @@
  */
 static inline void set_bit(unsigned int nr, volatile unsigned long *addr)
 {
-	if (IS_IMMEDIATE(nr))
-		asm volatile(LOCK_PREFIX "orb %1,%0" : CONST_MASK_ADDR : "i" (CONST_MASK) : "memory");
-	else
-		asm volatile(LOCK_PREFIX "bts %1,%0" : ADDR : "Ir" (nr) : "memory");
+	if (IS_IMMEDIATE(nr)) {
+		asm volatile(LOCK_PREFIX "orb %1,%0"
+			: CONST_MASK_ADDR(nr, addr)
+			: "i" (CONST_MASK(nr))
+			: "memory");
+	} else {
+		asm volatile(LOCK_PREFIX "bts %1,%0"
+			: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
+	}
 }
 
-
 /**
  * __set_bit - Set a bit in memory
  * @nr: the bit to set
@@ -89,10 +92,15 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
  */
 static inline void clear_bit(int nr, volatile unsigned long *addr)
 {
-	if (IS_IMMEDIATE(nr))
-		asm volatile(LOCK_PREFIX "andb %1,%0" : CONST_MASK_ADDR : "i" (~CONST_MASK));
-	else
-		asm volatile(LOCK_PREFIX "btr %1,%0" : ADDR : "Ir" (nr));
+	if (IS_IMMEDIATE(nr)) {
+		asm volatile(LOCK_PREFIX "andb %1,%0"
+			: CONST_MASK_ADDR(nr, addr)
+			: "i" (~CONST_MASK(nr)));
+	} else {
+		asm volatile(LOCK_PREFIX "btr %1,%0"
+			: BITOP_ADDR(addr)
+			: "Ir" (nr));
+	}
 }
 
 /*

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 10:10                       ` Ingo Molnar
@ 2008-06-20 19:06                         ` Jeremy Fitzhardinge
  2008-06-20 19:15                           ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-20 19:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Thomas Gleixner,
	Linus Torvalds

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>   
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>     
>>> On Thu, 19 Jun 2008, Ingo Molnar wrote:
>>>       
>>>> Below is the commit, it needed a small amount of massaging to apply the 
>>>> void * -> unsigned long * change in the x86/bitops topic.
>>>>         
>>> Well, that's your bug right there.
>>>
>>> The macros very much depended on the pointers being "void *", due to 
>>> the pointer arithmetic (which is a gcc extension that we use 
>>> extensively - "void *" arithmetic works as if it was a byte 
>>> pointer).
>>>       
>> duh, yeah - of course. Will retry with that fixed :)
>>     
>
> yep, the patch below got it all going and it passed 5 hours of testing 
> already. Thanks,
>   

Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":

  CC      init/main.o
include2/asm/bitops.h: In function `start_kernel':
include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
include2/asm/bitops.h:59: error: impossible constraint in `asm'
include2/asm/bitops.h:59: error: impossible constraint in `asm'
include2/asm/bitops.h:59: error: impossible constraint in `asm'

	J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 19:06                         ` Jeremy Fitzhardinge
@ 2008-06-20 19:15                           ` Linus Torvalds
  2008-06-20 19:56                             ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-06-20 19:15 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner



On Fri, 20 Jun 2008, Jeremy Fitzhardinge wrote:
> 
> Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":

Yeah, I was a bit worried about that. Gcc sometimes does insane things.

We literally just tested that the asm should only _ever_ be generated with 
a constant value, but if some gcc dead-code removal thing doesn't work, it 
will then screw up and try to generate the asm even for a non-constant 
thing.

The fairly trivial fix is probably to just change the "i" to "ir", safe in 
the knowledge that any _sane_ case will never use the "r" possibility. I 
suspect even your insane case will end up then killing the bad choice 
later.

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 19:15                           ` Linus Torvalds
@ 2008-06-20 19:56                             ` Ingo Molnar
  2008-06-20 20:03                               ` Linus Torvalds
  2008-06-20 20:05                               ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 36+ messages in thread
From: Ingo Molnar @ 2008-06-20 19:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, xen-devel, Peter Zijlstra, kvm-devel, benh,
	x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 20 Jun 2008, Jeremy Fitzhardinge wrote:
> > 
> > Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":
> 
> Yeah, I was a bit worried about that. Gcc sometimes does insane 
> things.
> 
> We literally just tested that the asm should only _ever_ be generated 
> with a constant value, but if some gcc dead-code removal thing doesn't 
> work, it will then screw up and try to generate the asm even for a 
> non-constant thing.
> 
> The fairly trivial fix is probably to just change the "i" to "ir", 
> safe in the knowledge that any _sane_ case will never use the "r" 
> possibility. I suspect even your insane case will end up then killing 
> the bad choice later.

okay - Jeremy, could you try the fix below? (or tip/master, i just 
pushed this out)

(i dont use gcc 3.x myself to build the kernel, had way too many 
miscompilations in randconfig tests in the past.)

	Ingo

-------------->
commit b68b80b8ab39c42707dc126c41e87d46edc97c27
Author: Ingo Molnar <mingo@elte.hu>
Date:   Fri Jun 20 21:50:20 2008 +0200

    x86, bitops: make constant-bit set/clear_bit ops faster, gcc workaround
    
    Jeremy Fitzhardinge reported this compiler bug:
    
    Suggestion from Linus: add "r" to the input constraint of the
    set_bit()/clear_bit()'s constant 'nr' branch:
    
    Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":
    
     CC      init/main.o
    include2/asm/bitops.h: In function `start_kernel':
    include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
    include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
    include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
    include2/asm/bitops.h:59: error: impossible constraint in `asm'
    include2/asm/bitops.h:59: error: impossible constraint in `asm'
    include2/asm/bitops.h:59: error: impossible constraint in `asm'
    
    Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index 6c50548..4575de4 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -58,7 +58,7 @@ static inline void set_bit(unsigned int nr, volatile unsigned long *addr)
 	if (IS_IMMEDIATE(nr)) {
 		asm volatile(LOCK_PREFIX "orb %1,%0"
 			: CONST_MASK_ADDR(nr, addr)
-			: "i" (CONST_MASK(nr))
+			: "ir" (CONST_MASK(nr))
 			: "memory");
 	} else {
 		asm volatile(LOCK_PREFIX "bts %1,%0"
@@ -95,7 +95,7 @@ static inline void clear_bit(int nr, volatile unsigned long *addr)
 	if (IS_IMMEDIATE(nr)) {
 		asm volatile(LOCK_PREFIX "andb %1,%0"
 			: CONST_MASK_ADDR(nr, addr)
-			: "i" (~CONST_MASK(nr)));
+			: "ir" (~CONST_MASK(nr)));
 	} else {
 		asm volatile(LOCK_PREFIX "btr %1,%0"
 			: BITOP_ADDR(addr)

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 19:56                             ` Ingo Molnar
@ 2008-06-20 20:03                               ` Linus Torvalds
  2008-06-20 20:16                                 ` Jeremy Fitzhardinge
  2008-06-20 20:05                               ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2008-06-20 20:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, benh, xen-devel, Peter Zijlstra, kvm-devel,
	x86, LKML, Virtualization Mailing List, Hugh Dickins,
	Thomas Gleixner



On Fri, 20 Jun 2008, Ingo Molnar wrote:
> 
> okay - Jeremy, could you try the fix below? (or tip/master, i just 
> pushed this out)

Actually, don't try that one.

It needs to be a _byte_ registers, so "ir" was wrong. You need "iq".

		Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 19:56                             ` Ingo Molnar
  2008-06-20 20:03                               ` Linus Torvalds
@ 2008-06-20 20:05                               ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-20 20:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Thomas Gleixner,
	Linus Torvalds

Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>   
>> On Fri, 20 Jun 2008, Jeremy Fitzhardinge wrote:
>>     
>>> Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":
>>>       
>> Yeah, I was a bit worried about that. Gcc sometimes does insane 
>> things.
>>
>> We literally just tested that the asm should only _ever_ be generated 
>> with a constant value, but if some gcc dead-code removal thing doesn't 
>> work, it will then screw up and try to generate the asm even for a 
>> non-constant thing.
>>
>> The fairly trivial fix is probably to just change the "i" to "ir", 
>> safe in the knowledge that any _sane_ case will never use the "r" 
>> possibility. I suspect even your insane case will end up then killing 
>> the bad choice later.
>>     
>
> okay - Jeremy, could you try the fix below? (or tip/master, i just 
> pushed this out)
>
>   

Hm.  On 32-bit I get these, but they're warnings.  On 64-bit they're errors.

{standard input}: Assembler messages:
{standard input}:1609: Warning: using `%dl' instead of `%edx' due to `b' 
suffix

vs

{standard input}: Assembler messages:
{standard input}:20511: Error: Incorrect register `%eax' used with `b' 
suffix


> (i dont use gcc 3.x myself to build the kernel, had way too many 
> miscompilations in randconfig tests in the past.)
>   

I do it mainly to pick up these kinds of problems.


> 	Ingo
>
> -------------->
> commit b68b80b8ab39c42707dc126c41e87d46edc97c27
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Fri Jun 20 21:50:20 2008 +0200
>
>     x86, bitops: make constant-bit set/clear_bit ops faster, gcc workaround
>     
>     Jeremy Fitzhardinge reported this compiler bug:
>     
>     Suggestion from Linus: add "r" to the input constraint of the
>     set_bit()/clear_bit()'s constant 'nr' branch:
>     
>     Blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":
>     
>      CC      init/main.o
>     include2/asm/bitops.h: In function `start_kernel':
>     include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
>     include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
>     include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
>     include2/asm/bitops.h:59: error: impossible constraint in `asm'
>     include2/asm/bitops.h:59: error: impossible constraint in `asm'
>     include2/asm/bitops.h:59: error: impossible constraint in `asm'
>     
>     Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>
> diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
> index 6c50548..4575de4 100644
> --- a/include/asm-x86/bitops.h
> +++ b/include/asm-x86/bitops.h
> @@ -58,7 +58,7 @@ static inline void set_bit(unsigned int nr, volatile unsigned long *addr)
>  	if (IS_IMMEDIATE(nr)) {
>  		asm volatile(LOCK_PREFIX "orb %1,%0"
>  			: CONST_MASK_ADDR(nr, addr)
> -			: "i" (CONST_MASK(nr))
> +			: "ir" (CONST_MASK(nr))
>  			: "memory");
>  	} else {
>  		asm volatile(LOCK_PREFIX "bts %1,%0"
> @@ -95,7 +95,7 @@ static inline void clear_bit(int nr, volatile unsigned long *addr)
>  	if (IS_IMMEDIATE(nr)) {
>  		asm volatile(LOCK_PREFIX "andb %1,%0"
>  			: CONST_MASK_ADDR(nr, addr)
> -			: "i" (~CONST_MASK(nr)));
> +			: "ir" (~CONST_MASK(nr)));
>  	} else {
>  		asm volatile(LOCK_PREFIX "btr %1,%0"
>  			: BITOP_ADDR(addr)
>   

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 20:03                               ` Linus Torvalds
@ 2008-06-20 20:16                                 ` Jeremy Fitzhardinge
  2008-06-20 20:22                                   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-20 20:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner

Linus Torvalds wrote:
> On Fri, 20 Jun 2008, Ingo Molnar wrote:
>   
>> okay - Jeremy, could you try the fix below? (or tip/master, i just 
>> pushed this out)
>>     
>
> Actually, don't try that one.
>
> It needs to be a _byte_ registers, so "ir" was wrong. You need "iq".
>   

Doesn't work, unfortunately:
{standard input}:20511: Error: Incorrect register `%eax' used with `b' 
suffix

        lock; orb %eax,1(%rdi)  # tmp64,

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 20:16                                 ` Jeremy Fitzhardinge
@ 2008-06-20 20:22                                   ` Jeremy Fitzhardinge
  2008-06-21  6:06                                     ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Jeremy Fitzhardinge @ 2008-06-20 20:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Ingo Molnar,
	Thomas Gleixner

Jeremy Fitzhardinge wrote:
> Linus Torvalds wrote:
>   
>> On Fri, 20 Jun 2008, Ingo Molnar wrote:
>>   
>>     
>>> okay - Jeremy, could you try the fix below? (or tip/master, i just 
>>> pushed this out)
>>>     
>>>       
>> Actually, don't try that one.
>>
>> It needs to be a _byte_ registers, so "ir" was wrong. You need "iq".
>>   
>>     
>
> Doesn't work, unfortunately:
> {standard input}:20511: Error: Incorrect register `%eax' used with `b' 
> suffix
>
>         lock; orb %eax,1(%rdi)  # tmp64,
>   

This does work:

                asm volatile(LOCK_PREFIX "orb %1,%0"
                        : CONST_MASK_ADDR(nr, addr)
                        : "iq" ((u8)CONST_MASK(nr))
                        : "memory");

(ie, explicitly casting the mask to u8)

    J

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction
  2008-06-20 20:22                                   ` Jeremy Fitzhardinge
@ 2008-06-21  6:06                                     ` Ingo Molnar
  0 siblings, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2008-06-21  6:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel, Peter Zijlstra, kvm-devel, benh, x86, LKML,
	Virtualization Mailing List, Hugh Dickins, Thomas Gleixner,
	Linus Torvalds


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Jeremy Fitzhardinge wrote:
>> Linus Torvalds wrote:
>>   
>>> On Fri, 20 Jun 2008, Ingo Molnar wrote:
>>>       
>>>> okay - Jeremy, could you try the fix below? (or tip/master, i just  
>>>> pushed this out)
>>>>           
>>> Actually, don't try that one.
>>>
>>> It needs to be a _byte_ registers, so "ir" was wrong. You need "iq".
>>>       
>>
>> Doesn't work, unfortunately:
>> {standard input}:20511: Error: Incorrect register `%eax' used with `b'  
>> suffix
>>
>>         lock; orb %eax,1(%rdi)  # tmp64,
>>   
>
> This does work:
>
>                asm volatile(LOCK_PREFIX "orb %1,%0"
>                        : CONST_MASK_ADDR(nr, addr)
>                        : "iq" ((u8)CONST_MASK(nr))
>                        : "memory");
>
> (ie, explicitly casting the mask to u8)

ok, i've pushed out the fix with this.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2008-06-21  6:06 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-16 11:29 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
2008-06-16 11:30 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
2008-06-16 17:29   ` Linus Torvalds
2008-06-16 18:13     ` Hugh Dickins
2008-06-16 18:49       ` Ingo Molnar
2008-06-18 23:23   ` Benjamin Herrenschmidt
2008-06-18 23:59     ` Jeremy Fitzhardinge
2008-06-19  0:15       ` Jeremy Fitzhardinge
2008-06-19  0:24         ` Linus Torvalds
2008-06-19  0:37           ` Jeremy Fitzhardinge
2008-06-19  0:49             ` Linus Torvalds
2008-06-19  4:03               ` Linus Torvalds
2008-06-19 11:58                 ` Ingo Molnar
2008-06-19 12:03                   ` Ingo Molnar
2008-06-19 12:20                   ` Akinobu Mita
2008-06-19 16:30                   ` Linus Torvalds
2008-06-19 16:47                     ` Ingo Molnar
2008-06-20 10:10                       ` Ingo Molnar
2008-06-20 19:06                         ` Jeremy Fitzhardinge
2008-06-20 19:15                           ` Linus Torvalds
2008-06-20 19:56                             ` Ingo Molnar
2008-06-20 20:03                               ` Linus Torvalds
2008-06-20 20:16                                 ` Jeremy Fitzhardinge
2008-06-20 20:22                                   ` Jeremy Fitzhardinge
2008-06-21  6:06                                     ` Ingo Molnar
2008-06-20 20:05                               ` Jeremy Fitzhardinge
2008-06-19  0:39           ` Benjamin Herrenschmidt
2008-06-19  5:03             ` Jeremy Fitzhardinge
2008-06-19  7:20               ` Benjamin Herrenschmidt
2008-06-19 17:57                 ` Jeremy Fitzhardinge
2008-06-16 11:30 ` [PATCH 2 of 4] paravirt: add hooks for ptep_modify_prot_start/commit Jeremy Fitzhardinge
2008-06-16 11:30 ` [PATCH 3 of 4] xen: implement ptep_modify_prot_start/commit Jeremy Fitzhardinge
2008-06-16 11:30 ` [PATCH 4 of 4] xen: add mechanism to extend existing multicalls Jeremy Fitzhardinge
  -- strict thread matches above, loose matches on Subject: below --
2008-05-31  0:04 [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction (take 2) Jeremy Fitzhardinge
2008-05-31  0:04 ` [PATCH 1 of 4] mm: add a ptep_modify_prot transaction abstraction Jeremy Fitzhardinge
2008-06-02 11:13   ` Ingo Molnar
2008-06-02 11:57     ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).