* [PATCH 0/4] i386 - pte update optimizations
From: Zachary Amsden @ 2007-04-12 5:30 UTC
To: Andrew Morton, Andi Kleen, Jeremy Fitzhardinge, Rusty Russell, Chris Wright, Hugh Dickins, David Rientjes, Michel Lespinasse, Virtualization Mailing List, Linux Kernel Mailing List, Zachary Amsden

Some PTE optimizations for native and paravirt-ops kernels; this provides a huge win for shadow mode hypervisors and gets rid of some unnecessary atomic instructions in native kernels, saving even more on UP by getting rid of implicit LOCK on xchg instruction.

Zach
* Re: [PATCH 0/4] i386 - pte update optimizations
From: H. Peter Anvin @ 2007-04-13 1:25 UTC
To: Zachary Amsden
Cc: Andrew Morton, Andi Kleen, Virtualization Mailing List, Chris Wright, David Rientjes, Hugh Dickins, Linux Kernel Mailing List

Zachary Amsden wrote:
> Some PTE optimizations for native and paravirt-ops kernels; this
> provides a huge win for shadow mode hypervisors and gets rid of
> some unnecessary atomic instructions in native kernels, saving
> even more on UP by getting rid of implicit LOCK on xchg instruction.

You do know that P6 and higher don't do locked bus references as long as the value is in the cache, right?

-hpa
* Re: [PATCH 0/4] i386 - pte update optimizations
From: Zachary Amsden @ 2007-04-13 2:24 UTC
To: H. Peter Anvin
Cc: Andrew Morton, Andi Kleen, Jeremy Fitzhardinge, Rusty Russell, Chris Wright, Hugh Dickins, David Rientjes, Michel Lespinasse, Virtualization Mailing List, Linux Kernel Mailing List

H. Peter Anvin wrote:
> Zachary Amsden wrote:
>> Some PTE optimizations for native and paravirt-ops kernels; this
>> provides a huge win for shadow mode hypervisors and gets rid of
>> some unnecessary atomic instructions in native kernels, saving
>> even more on UP by getting rid of implicit LOCK on xchg instruction.
>
> You do know that P6 and higher don't do locked bus references as long
> as the value is in the cache, right?

Yes. Even then, last time I clocked instructions, xchg was still slower than read / write, although I could be misremembering. And it's not totally clear that they will always be in cached state, however, and for SMP, we still want to drop the implicit lock in cases where the processor might not know they are cached exclusive, but we know there are no other racing users. And there are plenty of old processors out there to still make it worthwhile.

Zach
* Re: [PATCH 0/4] i386 - pte update optimizations
From: Eric Dumazet @ 2007-04-13 6:00 UTC
To: Zachary Amsden
Cc: H. Peter Anvin, Andrew Morton, Andi Kleen, Jeremy Fitzhardinge, Rusty Russell, Chris Wright, Hugh Dickins, David Rientjes, Michel Lespinasse, Virtualization Mailing List, Linux Kernel Mailing List

Zachary Amsden wrote:
>
> Yes. Even then, last time I clocked instructions, xchg was still slower
> than read / write, although I could be misremembering. And it's not
> totally clear that they will always be in cached state, however, and for
> SMP, we still want to drop the implicit lock in cases where the
> processor might not know they are cached exclusive, but we know there
> are no other racing users. And there are plenty of old processors out
> there to still make it worthwhile.
>

Is there one processor that benefits from this patch, then? I couldn't get a win on my test machines; maybe they are not old enough ;)

umask() doesn't need xchg() atomic semantics. If several threads are using umask() concurrently, results are not guaranteed anyway.

[-- Attachment #2: umask.patch --]
[-- Type: text/plain, Size: 441 bytes --]

--- linux-2.6.21-rc6/kernel/sys.c
+++ linux-2.6.21-rc6-ed/kernel/sys.c
@@ -2138,8 +2138,10 @@ asmlinkage long sys_getrusage(int who, s
 
 asmlinkage long sys_umask(int mask)
 {
-	mask = xchg(&current->fs->umask, mask & S_IRWXUGO);
-	return mask;
+	struct fs_struct *fs = current->fs;
+	int old = fs->umask;
+	fs->umask = mask & S_IRWXUGO;
+	return old;
 }
 
 asmlinkage long sys_prctl(int option, unsigned long arg2, unsigned long arg3,
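The difference between the two versions of sys_umask() in the patch above can be illustrated in user space. The following is only a rough sketch under stated assumptions: it uses C11 atomic_exchange() as a stand-in for the kernel's xchg(), and the names sketch_umask_xchg, sketch_umask_plain and SKETCH_S_IRWXUGO are hypothetical, not kernel identifiers. Both variants return the old mask and store the new one; only the first does so as a single indivisible operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Same numeric value as the kernel's S_IRWXUGO (rwx for user, group, other). */
#define SKETCH_S_IRWXUGO 0777

/* Atomic variant, analogous to the xchg()-based sys_umask():
 * the old value is fetched and the new one stored in one indivisible step. */
static int sketch_umask_xchg(_Atomic int *umask_p, int mask)
{
    assert(umask_p != NULL);
    return atomic_exchange(umask_p, mask & SKETCH_S_IRWXUGO);
}

/* Plain variant, analogous to the patched sys_umask():
 * a separate load and store, so a concurrent writer can slip in between. */
static int sketch_umask_plain(int *umask_p, int mask)
{
    int old;

    assert(umask_p != NULL);
    old = *umask_p;
    *umask_p = mask & SKETCH_S_IRWXUGO;
    return old;
}
```

With a single caller the two functions behave identically, which is the case the patch relies on. They differ only when two threads race, and in that case the result of umask() is unspecified for the caller either way.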
* Re: [PATCH 0/4] i386 - pte update optimizations
From: H. Peter Anvin @ 2007-04-13 6:25 UTC
To: Eric Dumazet
Cc: Zachary Amsden, Andrew Morton, Andi Kleen, Jeremy Fitzhardinge, Rusty Russell, Chris Wright, Hugh Dickins, David Rientjes, Michel Lespinasse, Virtualization Mailing List, Linux Kernel Mailing List

Eric Dumazet wrote:
>
> Is there one processor that benefits from this patch, then?
>

At least P5 systems should benefit.

-hpa
* Re: [PATCH 0/4] i386 - pte update optimizations
From: Keir Fraser @ 2007-04-13 9:31 UTC
To: Zachary Amsden, H. Peter Anvin
Cc: Andrew Morton, Andi Kleen, Virtualization Mailing List, Chris Wright, David Rientjes, Hugh Dickins, Linux Kernel Mailing List

On 13/4/07 03:24, "Zachary Amsden" <zach@vmware.com> wrote:

>> You do know that P6 and higher don't do locked bus references as long
>> as the value is in the cache, right?
>
> Yes. Even then, last time I clocked instructions, xchg was still slower
> than read / write, although I could be misremembering. And it's not
> totally clear that they will always be in cached state, however, and for
> SMP, we still want to drop the implicit lock in cases where the
> processor might not know they are cached exclusive, but we know there
> are no other racing users. And there are plenty of old processors out
> there to still make it worthwhile.

LOCKed instruction suck really badly on the netburst microarchitecture (like factor of 10x, or not far off). I think it's probably because of their side effect of serialising memory accesses, causing horrible pipeline stalls.

-- Keir
* Re: [PATCH 0/4] i386 - pte update optimizations
From: Andi Kleen @ 2007-04-13 12:27 UTC
To: Keir Fraser
Cc: Zachary Amsden, H. Peter Anvin, Andrew Morton, Virtualization Mailing List, Chris Wright, David Rientjes, Hugh Dickins, Linux Kernel Mailing List

Keir Fraser <keir@xensource.com> writes:

> On 13/4/07 03:24, "Zachary Amsden" <zach@vmware.com> wrote:
>
> >> You do know that P6 and higher don't do locked bus references as long
> >> as the value is in the cache, right?
> >
> > Yes. Even then, last time I clocked instructions, xchg was still slower
> > than read / write, although I could be misremembering. And it's not
> > totally clear that they will always be in cached state, however, and for
> > SMP, we still want to drop the implicit lock in cases where the
> > processor might not know they are cached exclusive, but we know there
> > are no other racing users. And there are plenty of old processors out
> > there to still make it worthwhile.
>
> LOCKed instruction suck really badly on the netburst microarchitecture (like
> factor of 10x, or not far off). I think it's probably because of their side
> effect of serialising memory accesses, causing horrible pipeline stalls.

Unfortunately they tend to be HyperThreaded usually (except for early ones and Celerons) and need the LOCK anyways.

-Andi
* Re: [PATCH 0/4] i386 - pte update optimizations
From: Keir Fraser @ 2007-04-13 11:31 UTC
To: Andi Kleen
Cc: Zachary Amsden, H. Peter Anvin, Andrew Morton, Virtualization Mailing List, Chris Wright, David Rientjes, Hugh Dickins, Linux Kernel Mailing List

On 13/4/07 13:27, "Andi Kleen" <andi@firstfloor.org> wrote:

>> LOCKed instruction suck really badly on the netburst microarchitecture (like
>> factor of 10x, or not far off). I think it's probably because of their side
>> effect of serialising memory accesses, causing horrible pipeline stalls.
>
> Unfortunately they tend to be HyperThreaded usually (except for early ones
> and Celerons) and need the LOCK anyways.

Fair point, although quite a few people disable HT.

-- Keir
* Re: [PATCH 0/4] i386 - pte update optimizations
From: H. Peter Anvin @ 2007-04-13 15:34 UTC
To: Keir Fraser
Cc: Andrew Morton, Andi Kleen, Chris Wright, David Rientjes, Virtualization Mailing List, Hugh Dickins, Linux Kernel Mailing List

Keir Fraser wrote:
> On 13/4/07 13:27, "Andi Kleen" <andi@firstfloor.org> wrote:
>
>>> LOCKed instruction suck really badly on the netburst microarchitecture (like
>>> factor of 10x, or not far off). I think it's probably because of their side
>>> effect of serialising memory accesses, causing horrible pipeline stalls.
>> Unfortunately they tend to be HyperThreaded usually (except for early ones
>> and Celerons) and need the LOCK anyways.
>
> Fair point, although quite a few people disable HT.

Note we're talking a UP-only hack.

-hpa
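The "UP-only" point can be pictured as a build-time choice between an atomic exchange and a plain read followed by a write. The sketch below is an illustration under assumptions, not the code from the patch series: SKETCH_SMP is a hypothetical build flag standing in for the kernel's CONFIG_SMP, sketch_pte_t and sketch_pte_get_and_clear are made-up names, and C11 atomic_exchange() stands in for the x86 xchg instruction with its implicit LOCK.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#ifdef SKETCH_SMP
/* SMP build: another CPU may touch the entry, so fetch-and-clear must be
 * one indivisible operation (xchg with its implicit LOCK on x86). */
typedef _Atomic unsigned long sketch_pte_t;

static unsigned long sketch_pte_get_and_clear(sketch_pte_t *ptep)
{
    assert(ptep != NULL);
    return atomic_exchange(ptep, 0UL);
}
#else
/* UP build: no other CPU can race with us, so a plain load followed by
 * a plain store is assumed to be sufficient and avoids the locked access. */
typedef unsigned long sketch_pte_t;

static unsigned long sketch_pte_get_and_clear(sketch_pte_t *ptep)
{
    unsigned long old;

    assert(ptep != NULL);
    old = *ptep;
    *ptep = 0UL;
    return old;
}
#endif
```

Both branches return the previous value and leave the entry cleared, so callers do not need to know which one was compiled in. Whether the plain path is actually faster depends on the processor, as the replies about P5, P6 and NetBurst in this thread suggest.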