virtualization.lists.linux-foundation.org archive mirror
* [PATCH 3/4] Pte xchg optimization.patch
From: Zachary Amsden @ 2007-04-12  5:30 UTC
  To: Andrew Morton, Andi Kleen, Jeremy Fitzhardinge, Rusty Russell,
	Chris Wright, Hugh Dickins, David Rientjes, Michel Lespinasse,
	Virtualization Mailing List, Linux Kernel Mailing List,
	Zachary Amsden

In situations where page table updates need only be made locally, and there
are no cross-processor A/D bit races involved, we need not use the heavyweight
xchg instruction to atomically fetch and clear page table entries.  Instead,
we can just read and clear them directly.

This introduces a neat optimization for non-SMP kernels: dropping the atomic
xchg operations from page table updates.
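
As a rough illustration of the two strategies (a userspace sketch only, not
the kernel code in the diff below): on x86, xchg with a memory operand is
implicitly locked, which is what makes it heavyweight. The demo_* names and
the C11 stdatomic calls are assumptions chosen for the example; the patch
itself uses the kernel's xchg() and native_pte_clear().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 2-level (32-bit) page table entry. */
typedef struct {
	_Atomic uint32_t pte_low;
} demo_pte_t;

/*
 * SMP-style: a single atomic read-modify-write, so an update made by
 * another processor between the fetch and the clear cannot be lost.
 */
static inline uint32_t demo_get_and_clear_atomic(demo_pte_t *p)
{
	return atomic_exchange(&p->pte_low, 0);
}

/*
 * Local-style: a separate load and store. This is only correct when
 * nothing else can modify the entry between the two steps.
 */
static inline uint32_t demo_get_and_clear_local(demo_pte_t *p)
{
	uint32_t old = atomic_load_explicit(&p->pte_low, memory_order_relaxed);

	atomic_store_explicit(&p->pte_low, 0, memory_order_relaxed);
	return old;
}

/* With no concurrent writer, both variants give the same result. */
static inline int demo_selfcheck(void)
{
	demo_pte_t a, b;

	atomic_init(&a.pte_low, 0x1067u);
	atomic_init(&b.pte_low, 0x1067u);
	assert(demo_get_and_clear_atomic(&a) == demo_get_and_clear_local(&b));
	return 1;
}
```

The two functions differ only when a concurrent writer exists, which is the
case the CONFIG_SMP guard in the diff is meant to separate.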

Thanks to Michel Lespinasse for noting this potential optimization.

Signed-off-by: Zachary Amsden <zach@vmware.com>

diff -r 47495b2532b3 include/asm-i386/pgtable-2level.h
--- a/include/asm-i386/pgtable-2level.h	Wed Apr 11 18:23:01 2007 -0700
+++ b/include/asm-i386/pgtable-2level.h	Wed Apr 11 18:23:39 2007 -0700
@@ -41,10 +41,24 @@ static inline void native_pte_clear(stru
 	*xp = __pte(0);
 }
 
+/* local pte updates need not use xchg for locking */
+static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
+{
+	pte_t res;
+
+	res = *ptep;
+	native_pte_clear(NULL, 0, ptep);
+	return res;
+}
+
+#ifdef CONFIG_SMP
 static inline pte_t native_ptep_get_and_clear(pte_t *xp)
 {
 	return __pte(xchg(&xp->pte_low, 0));
 }
+#else
+#define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
+#endif
 
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
 #define pte_none(x)		(!(x).pte_low)
diff -r 47495b2532b3 include/asm-i386/pgtable-3level.h
--- a/include/asm-i386/pgtable-3level.h	Wed Apr 11 18:23:01 2007 -0700
+++ b/include/asm-i386/pgtable-3level.h	Wed Apr 11 18:23:05 2007 -0700
@@ -139,6 +139,17 @@ static inline void pud_clear (pud_t * pu
 #define pmd_offset(pud, address) ((pmd_t *) pud_page(*(pud)) + \
 			pmd_index(address))
 
+/* local pte updates need not use xchg for locking */
+static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
+{
+	pte_t res;
+	
+	res = *ptep;
+	native_pte_clear(NULL, 0, ptep);
+	return res;
+}
+
+#ifdef CONFIG_SMP
 static inline pte_t native_ptep_get_and_clear(pte_t *ptep)
 {
 	pte_t res;
@@ -150,6 +161,9 @@ static inline pte_t native_ptep_get_and_
 
 	return res;
 }
+#else
+#define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
+#endif
 
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t a, pte_t b)

* Re: [PATCH 3/4] Pte xchg optimization.patch
From: Dave Jones @ 2007-04-12 18:18 UTC
  To: Zachary Amsden
  Cc: Andrew Morton, Andi Kleen, Virtualization Mailing List,
	Chris Wright, David Rientjes, Hugh Dickins,
	Linux Kernel Mailing List

On Wed, Apr 11, 2007 at 10:30:58PM -0700, Zachary Amsden wrote:
 > In situations where page table updates need only be made locally, and there
 > are no cross-processor A/D bit races involved, we need not use the heavyweight
 > xchg instruction to atomically fetch and clear page table entries.  Instead,
 > we can just read and clear them directly.
 > 
 > This introduces a neat optimization for non-SMP kernels: dropping the atomic
 > xchg operations from page table updates.

Would it be feasible to do this using the smp_alternatives stuff?
Given many distros are now shipping SMP-only kernels, it'd be nice
to get that win without users having to rebuild their kernels.

	Dave

-- 
http://www.codemonkey.org.uk

* Re: [PATCH 3/4] Pte xchg optimization.patch
From: Zachary Amsden @ 2007-04-12 18:44 UTC
  To: Dave Jones, Zachary Amsden, Andrew Morton, Andi Kleen,
	Jeremy Fitzhardinge, Rusty Russell, Chris Wright, Hugh Dickins,
	David Rientjes, Michel Lespinasse, Virtualization Mailing List,
	Linux Kernel Mailing List

Dave Jones wrote:
> On Wed, Apr 11, 2007 at 10:30:58PM -0700, Zachary Amsden wrote:
>  > In situations where page table updates need only be made locally, and there
>  > are no cross-processor A/D bit races involved, we need not use the heavyweight
>  > xchg instruction to atomically fetch and clear page table entries.  Instead,
>  > we can just read and clear them directly.
>  > 
>  > This introduces a neat optimization for non-SMP kernels: dropping the atomic
>  > xchg operations from page table updates.
>
> Would it be feasible to do this using the smp_alternatives stuff?
> Given many distros are now shipping SMP-only kernels, it'd be nice
> to get that win without users having to rebuild their kernels.
>   

That's a good point. It is a fairly straightforward substitution for the 
static case:

xchg reg, mem

to

mov mem, reg
mov imm, mem

I'll give it a spin and see how it drives. It is, though, less obvious 
how easy it will be for PAE kernels.

Zach
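
One way to picture why the PAE case is less obvious: a PAE entry is 64 bits
wide, so on a 32-bit CPU fetch-and-clear is a short sequence over two words
rather than a single xchg that could be swapped for a pair of movs. The
sketch below is a userspace model under that assumption; the demo_* names
are hypothetical and the ordering shown is illustrative, not a copy of the
kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a PAE entry: 64 bits stored as two 32-bit halves. */
typedef struct {
	_Atomic uint32_t pte_low;	/* low word; bit 0 is the present bit */
	uint32_t pte_high;		/* high word */
} demo_pae_pte_t;

typedef struct {
	uint32_t low;
	uint32_t high;
} demo_pae_val_t;

/*
 * Fetch and clear both halves. The low word is taken with an atomic
 * exchange first; once it reads as zero the entry is not present, so
 * (by assumption in this sketch) the high word can then be read and
 * cleared with plain accesses.
 */
static inline demo_pae_val_t demo_pae_get_and_clear(demo_pae_pte_t *p)
{
	demo_pae_val_t res;

	res.low = atomic_exchange(&p->pte_low, 0);
	res.high = p->pte_high;
	p->pte_high = 0;
	return res;
}

/* Single-threaded sanity check of the sketch. */
static inline int demo_pae_selfcheck(void)
{
	demo_pae_pte_t e;
	demo_pae_val_t v;

	atomic_init(&e.pte_low, 0x67u);
	e.pte_high = 0x1u;
	v = demo_pae_get_and_clear(&e);
	assert(v.low == 0x67u && v.high == 0x1u);
	return 1;
}
```

Any runtime-patched replacement would have to cover this whole sequence, not
just one instruction, which is where the extra difficulty comes from.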