[PATCH 0/4] i386 - pte update optimizations

virtualization.lists.linux-foundation.org archive mirror

* [PATCH 0/4] i386 - pte update optimizations
@ 2007-04-12  5:30 Zachary Amsden
  2007-04-13  1:25 ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Zachary Amsden @ 2007-04-12  5:30 UTC
  To: Andrew Morton, Andi Kleen, Jeremy Fitzhardinge, Rusty Russell,
	Chris Wright, Hugh Dickins, David Rientjes, Michel Lespinasse,
	Virtualization Mailing List, Linux Kernel Mailing List,
	Zachary Amsden

Some PTE optimizations for native and paravirt-ops kernels; this
provides a huge win for shadow mode hypervisors and gets rid of
some unnecessary atomic instructions in native kernels, saving
even more on UP by getting rid of implicit LOCK on xchg instruction.

Zach

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-12  5:30 [PATCH 0/4] i386 - pte update optimizations Zachary Amsden
@ 2007-04-13  1:25 ` H. Peter Anvin
  2007-04-13  2:24   ` Zachary Amsden
  0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2007-04-13  1:25 UTC
  To: Zachary Amsden
  Cc: Andrew Morton, Andi Kleen, Virtualization Mailing List,
	Chris Wright, David Rientjes, Hugh Dickins,
	Linux Kernel Mailing List

Zachary Amsden wrote:
> Some PTE optimizations for native and paravirt-ops kernels; this
> provides a huge win for shadow mode hypervisors and gets rid of
> some unnecessary atomic instructions in native kernels, saving
> even more on UP by getting rid of implicit LOCK on xchg instruction.

You do know that P6 and higher don't do locked bus references as long as 
the value is in the cache, right?

	-hpa


* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-13  1:25 ` H. Peter Anvin
@ 2007-04-13  2:24   ` Zachary Amsden
  2007-04-13  6:00     ` Eric Dumazet
  2007-04-13  9:31     ` Keir Fraser
  0 siblings, 2 replies; 9+ messages in thread
From: Zachary Amsden @ 2007-04-13  2:24 UTC
  To: H. Peter Anvin
  Cc: Andrew Morton, Andi Kleen, Jeremy Fitzhardinge, Rusty Russell,
	Chris Wright, Hugh Dickins, David Rientjes, Michel Lespinasse,
	Virtualization Mailing List, Linux Kernel Mailing List

H. Peter Anvin wrote:
> Zachary Amsden wrote:
>> Some PTE optimizations for native and paravirt-ops kernels; this
>> provides a huge win for shadow mode hypervisors and gets rid of
>> some unnecessary atomic instructions in native kernels, saving
>> even more on UP by getting rid of implicit LOCK on xchg instruction.
>
> You do know that P6 and higher don't do locked bus references as long 
> as the value is in the cache, right?

Yes.  Even so, the last time I clocked instructions, xchg was still slower
than a plain read / write, although I could be misremembering.  It's also
not certain that the values will always be in the cache, and for SMP we
still want to drop the implicit lock in cases where the processor might
not know the line is cached exclusive but we know there are no other
racing users.  And there are still plenty of old processors out there
to make it worthwhile.

Zach
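
The trade-off under discussion can be sketched in userspace as two ways of
reading and clearing an entry. This is a minimal sketch assuming GCC/Clang
`__atomic` builtins; `pte_word_t` and both function names are hypothetical
and are not the helpers in Zach's patches. On x86 the atomic variant
compiles to an xchg, which locks implicitly whenever an operand is in
memory; the plain variant avoids that, but is only correct when, as Zach
says, the caller knows there are no other racing users.

```c
#include <stdint.h>

/* Hypothetical 32-bit entry word (non-PAE i386 layout assumed). */
typedef uint32_t pte_word_t;

/*
 * Atomic read-and-clear: an xchg on x86, which carries an implicit
 * LOCK because one operand is in memory.
 */
static pte_word_t get_and_clear_atomic(pte_word_t *ptep)
{
    return __atomic_exchange_n(ptep, 0, __ATOMIC_SEQ_CST);
}

/*
 * Plain read then write: no locked instruction, but only safe when
 * nothing else can modify *ptep between the load and the store.
 */
static pte_word_t get_and_clear_plain(pte_word_t *ptep)
{
    pte_word_t old = *ptep;
    *ptep = 0;
    return old;
}
```

Both return the previous value and leave the word zeroed; they differ only
in whether another agent writing the word in between can be lost.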


* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-13  2:24   ` Zachary Amsden
@ 2007-04-13  6:00     ` Eric Dumazet
  2007-04-13  6:25       ` H. Peter Anvin
  2007-04-13  9:31     ` Keir Fraser
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2007-04-13  6:00 UTC
  To: Zachary Amsden
  Cc: H. Peter Anvin, Andrew Morton, Andi Kleen, Jeremy Fitzhardinge,
	Rusty Russell, Chris Wright, Hugh Dickins, David Rientjes,
	Michel Lespinasse, Virtualization Mailing List,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 771 bytes --]

Zachary Amsden wrote:
> 
> Yes.  Even so, the last time I clocked instructions, xchg was still slower
> than a plain read / write, although I could be misremembering.  It's also
> not certain that the values will always be in the cache, and for SMP we
> still want to drop the implicit lock in cases where the processor might
> not know the line is cached exclusive but we know there are no other
> racing users.  And there are still plenty of old processors out there
> to make it worthwhile.
> 

Is there any processor that benefits from this patch, then?

I couldn't get a win on my test machines; maybe they are not old enough ;)

umask() doesn't need xchg()'s atomic semantics. If several threads call
umask() concurrently, the results are not guaranteed anyway.

[-- Attachment #2: umask.patch --]
[-- Type: text/plain, Size: 441 bytes --]

--- linux-2.6.21-rc6/kernel/sys.c
+++ linux-2.6.21-rc6-ed/kernel/sys.c
@@ -2138,8 +2138,10 @@ asmlinkage long sys_getrusage(int who, s
 
 asmlinkage long sys_umask(int mask)
 {
-	mask = xchg(&current->fs->umask, mask & S_IRWXUGO);
-	return mask;
+	struct fs_struct *fs = current->fs;
+	int old = fs->umask;
+	fs->umask = mask & S_IRWXUGO;
+	return old;
 }
     
 asmlinkage long sys_prctl(int option, unsigned long arg2, unsigned long arg3,
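
Whichever way sys_umask() is written, it must keep umask(2)'s user-visible
contract: install the new mask and return the previous one. A small
single-threaded userspace check of that contract, assuming a POSIX system;
`umask_roundtrip` is a hypothetical helper, and it exercises the system
call rather than the kernel code directly.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: install newmask, report the mask the kernel
 * stored, then restore the caller's original mask.
 * Returns the original mask.
 */
static mode_t umask_roundtrip(mode_t newmask, mode_t *stored)
{
    mode_t prev = umask(newmask);   /* returns the previous mask */
    *stored = umask(prev);          /* restoring returns the mask just set */
    return prev;
}
```

Because the check is single-threaded, it holds with either the xchg-based
or the plain read/write version; as Eric notes, concurrent callers get no
ordering guarantee in either case.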


* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-13  6:00     ` Eric Dumazet
@ 2007-04-13  6:25       ` H. Peter Anvin
  0 siblings, 0 replies; 9+ messages in thread
From: H. Peter Anvin @ 2007-04-13  6:25 UTC
  To: Eric Dumazet
  Cc: Zachary Amsden, Andrew Morton, Andi Kleen, Jeremy Fitzhardinge,
	Rusty Russell, Chris Wright, Hugh Dickins, David Rientjes,
	Michel Lespinasse, Virtualization Mailing List,
	Linux Kernel Mailing List

Eric Dumazet wrote:
> 
> Is there any processor that benefits from this patch, then?
> 

At least P5 systems should benefit.

	-hpa


* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-13  2:24   ` Zachary Amsden
  2007-04-13  6:00     ` Eric Dumazet
@ 2007-04-13  9:31     ` Keir Fraser
  2007-04-13 12:27       ` Andi Kleen
  1 sibling, 1 reply; 9+ messages in thread
From: Keir Fraser @ 2007-04-13  9:31 UTC
  To: Zachary Amsden, H. Peter Anvin
  Cc: Andrew Morton, Andi Kleen, Virtualization Mailing List,
	Chris Wright, David Rientjes, Hugh Dickins,
	Linux Kernel Mailing List

On 13/4/07 03:24, "Zachary Amsden" <zach@vmware.com> wrote:

>> You do know that P6 and higher don't do locked bus references as long
>> as the value is in the cache, right?
> 
> Yes.  Even so, the last time I clocked instructions, xchg was still slower
> than a plain read / write, although I could be misremembering.  It's also
> not certain that the values will always be in the cache, and for SMP we
> still want to drop the implicit lock in cases where the processor might
> not know the line is cached exclusive but we know there are no other
> racing users.  And there are still plenty of old processors out there
> to make it worthwhile.

LOCKed instructions suck really badly on the NetBurst microarchitecture (a
factor of 10, or not far off). I think that's probably due to their side
effect of serialising memory accesses, which causes horrible pipeline stalls.

 -- Keir
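
Claims like this are easy to check per machine with a rough single-threaded
microbenchmark. The sketch below is not from the thread and assumes POSIX
`clock_gettime` and GCC/Clang `__atomic` builtins; the function names are
hypothetical. A loop this simple keeps the word hot in the cache, which is
exactly the case hpa describes, so it measures the best case for xchg
rather than contended or uncached behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static int64_t elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000
           + (b->tv_nsec - a->tv_nsec);
}

/* Time n locked exchanges (xchg on x86) against one cached word. */
static int64_t time_xchg(volatile uint32_t *word, long n)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++)
        (void)__atomic_exchange_n(word, (uint32_t)i, __ATOMIC_SEQ_CST);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(&t0, &t1);
}

/* Time n plain read/write pairs; volatile keeps the accesses in the loop. */
static int64_t time_plain(volatile uint32_t *word, long n)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++) {
        uint32_t old = *word;       /* plain load */
        *word = (uint32_t)i;        /* plain store, no LOCK */
        (void)old;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(&t0, &t1);
}
```

Calling both with the same large `n` and comparing the totals gives a ratio
for that CPU; no particular ratio is implied here.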


* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-13 12:27       ` Andi Kleen
@ 2007-04-13 11:31         ` Keir Fraser
  2007-04-13 15:34           ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Keir Fraser @ 2007-04-13 11:31 UTC
  To: Andi Kleen
  Cc: Zachary Amsden, H. Peter Anvin, Andrew Morton,
	Virtualization Mailing List, Chris Wright, David Rientjes,
	Hugh Dickins, Linux Kernel Mailing List

On 13/4/07 13:27, "Andi Kleen" <andi@firstfloor.org> wrote:

>> LOCKed instructions suck really badly on the NetBurst microarchitecture (a
>> factor of 10, or not far off). I think that's probably due to their side
>> effect of serialising memory accesses, which causes horrible pipeline stalls.
> 
> Unfortunately they are usually HyperThreaded (except for early ones
> and Celerons) and need the LOCK anyway.

Fair point, although quite a few people disable HT.

 -- Keir


* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-13  9:31     ` Keir Fraser
@ 2007-04-13 12:27       ` Andi Kleen
  2007-04-13 11:31         ` Keir Fraser
  0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2007-04-13 12:27 UTC
  To: Keir Fraser
  Cc: Zachary Amsden, H. Peter Anvin, Andrew Morton,
	Virtualization Mailing List, Chris Wright, David Rientjes,
	Hugh Dickins, Linux Kernel Mailing List

Keir Fraser <keir@xensource.com> writes:

> On 13/4/07 03:24, "Zachary Amsden" <zach@vmware.com> wrote:
> 
> >> You do know that P6 and higher don't do locked bus references as long
> >> as the value is in the cache, right?
> > 
> > Yes.  Even so, the last time I clocked instructions, xchg was still slower
> > than a plain read / write, although I could be misremembering.  It's also
> > not certain that the values will always be in the cache, and for SMP we
> > still want to drop the implicit lock in cases where the processor might
> > not know the line is cached exclusive but we know there are no other
> > racing users.  And there are still plenty of old processors out there
> > to make it worthwhile.
> 
> LOCKed instructions suck really badly on the NetBurst microarchitecture (a
> factor of 10, or not far off). I think that's probably due to their side
> effect of serialising memory accesses, which causes horrible pipeline stalls.

Unfortunately they are usually HyperThreaded (except for early ones
and Celerons) and need the LOCK anyway.

-Andi


* Re: [PATCH 0/4] i386 - pte update optimizations
  2007-04-13 11:31         ` Keir Fraser
@ 2007-04-13 15:34           ` H. Peter Anvin
  0 siblings, 0 replies; 9+ messages in thread
From: H. Peter Anvin @ 2007-04-13 15:34 UTC
  To: Keir Fraser
  Cc: Andrew Morton, Andi Kleen, Chris Wright, David Rientjes,
	Virtualization Mailing List, Hugh Dickins,
	Linux Kernel Mailing List

Keir Fraser wrote:
> On 13/4/07 13:27, "Andi Kleen" <andi@firstfloor.org> wrote:
> 
>>> LOCKed instructions suck really badly on the NetBurst microarchitecture (a
>>> factor of 10, or not far off). I think that's probably due to their side
>>> effect of serialising memory accesses, which causes horrible pipeline stalls.
>> Unfortunately they are usually HyperThreaded (except for early ones
>> and Celerons) and need the LOCK anyway.
> 
> Fair point, although quite a few people disable HT.

Note we're talking about a UP-only hack.

	-hpa


end of thread

Thread overview: 9+ messages
2007-04-12  5:30 [PATCH 0/4] i386 - pte update optimizations Zachary Amsden
2007-04-13  1:25 ` H. Peter Anvin
2007-04-13  2:24   ` Zachary Amsden
2007-04-13  6:00     ` Eric Dumazet
2007-04-13  6:25       ` H. Peter Anvin
2007-04-13  9:31     ` Keir Fraser
2007-04-13 12:27       ` Andi Kleen
2007-04-13 11:31         ` Keir Fraser
2007-04-13 15:34           ` H. Peter Anvin
