From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <andi@firstfloor.org>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
mingo@redhat.com
Subject: Re: [PATCH] SLUB use cmpxchg_local
Date: Tue, 21 Aug 2007 21:12:25 -0400
Message-ID: <20070822011225.GA4124@Krystal>
In-Reply-To: <Pine.LNX.4.64.0708211757360.3118@schroedinger.engr.sgi.com>
* Christoph Lameter (clameter@sgi.com) wrote:
> Ok. Measurements vs. simple cmpxchg on an Intel(R) Pentium(R) 4 CPU 3.20GHz
> (hyperthreading enabled). Test runs with your module show only minor
> performance improvements and lots of regressions. So we must have
> cmpxchg_local to see any improvements? Some kind of a recent optimization
> of cmpxchg performance that we do not see on older cpus?
>
I did not expect the cmpxchg with LOCK prefix to be faster than irq
save/restore. You will need to run these tests using cmpxchg_local to
see an improvement.
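
To make the difference concrete, here is a minimal userspace sketch of the
two variants and of the fast-path loop visible in the disassembly quoted
below (load the freelist head, read the next pointer at the stored offset,
cmpxchg, retry on mismatch). The names cmpxchg_sketch, cmpxchg_local_sketch,
struct cpu_slab_sketch and pop_free_object are made up for illustration and
are not the kernel or SLUB code. The only intended difference between the
two helpers is the LOCK prefix: without it the instruction is atomic only
with respect to the CPU executing it (interrupts included), not with respect
to other CPUs. On non-x86 builds the sketch falls back to a compiler builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cmpxchg(): LOCK-prefixed on x86, atomic across CPUs.
 * Returns the value that was found at *ptr. */
static inline unsigned long cmpxchg_sketch(volatile unsigned long *ptr,
                                           unsigned long old,
                                           unsigned long new_val)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned long prev;
    __asm__ __volatile__("lock; cmpxchg %2, %1"
                         : "=a"(prev), "+m"(*ptr)
                         : "r"(new_val), "0"(old)
                         : "memory", "cc");
    return prev;
#else
    /* Assumption: portable fallback, not what the kernel does. */
    __atomic_compare_exchange_n(ptr, &old, new_val, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return old;
#endif
}

/* Stand-in for cmpxchg_local(): same instruction without the LOCK prefix. */
static inline unsigned long cmpxchg_local_sketch(volatile unsigned long *ptr,
                                                 unsigned long old,
                                                 unsigned long new_val)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned long prev;
    __asm__ __volatile__("cmpxchg %2, %1"
                         : "=a"(prev), "+m"(*ptr)
                         : "r"(new_val), "0"(old)
                         : "memory", "cc");
    return prev;
#else
    /* Assumption: relaxed builtin used only so the sketch builds elsewhere. */
    __atomic_compare_exchange_n(ptr, &old, new_val, 0,
                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
    return old;
#endif
}

/* Hypothetical per-CPU slab state: head of the free list (stored as an
 * address in an unsigned long) and the word offset of the next pointer. */
struct cpu_slab_sketch {
    volatile unsigned long freelist;
    unsigned int offset;
};

/* Pop one object from the free list; returns NULL when the list is empty
 * (where the real allocator would take its slow path). */
static void *pop_free_object(struct cpu_slab_sketch *c)
{
    void **object;
    unsigned long next;

    assert(c != NULL);
    do {
        object = (void **)c->freelist;
        if (!object)
            return NULL;
        next = (unsigned long)object[c->offset];
    } while (cmpxchg_local_sketch(&c->freelist, (unsigned long)object,
                                  next) != (unsigned long)object);
    return object;
}
```

Swapping cmpxchg_local_sketch for cmpxchg_sketch in the loop gives the
LOCK-prefixed form that was measured; a userspace test like this cannot
reproduce the per-CPU guarantees, it only shows the shape of the code.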
Mathieu
>
> Code of kmem_cache_alloc (to show you that there are no debug options on):
>
> Dump of assembler code for function kmem_cache_alloc:
> 0x4015cfa9 <kmem_cache_alloc+0>: push %ebp
> 0x4015cfaa <kmem_cache_alloc+1>: mov %esp,%ebp
> 0x4015cfac <kmem_cache_alloc+3>: push %edi
> 0x4015cfad <kmem_cache_alloc+4>: push %esi
> 0x4015cfae <kmem_cache_alloc+5>: push %ebx
> 0x4015cfaf <kmem_cache_alloc+6>: sub $0x10,%esp
> 0x4015cfb2 <kmem_cache_alloc+9>: mov %eax,%esi
> 0x4015cfb4 <kmem_cache_alloc+11>: mov %edx,0xffffffe8(%ebp)
> 0x4015cfb7 <kmem_cache_alloc+14>: mov 0x4(%ebp),%eax
> 0x4015cfba <kmem_cache_alloc+17>: mov %eax,0xfffffff0(%ebp)
> 0x4015cfbd <kmem_cache_alloc+20>: mov %fs:0x404af008,%eax
> 0x4015cfc3 <kmem_cache_alloc+26>: mov 0x90(%esi,%eax,4),%edi
> 0x4015cfca <kmem_cache_alloc+33>: mov (%edi),%ecx
> 0x4015cfcc <kmem_cache_alloc+35>: test %ecx,%ecx
> 0x4015cfce <kmem_cache_alloc+37>: je 0x4015d00a <kmem_cache_alloc+97>
> 0x4015cfd0 <kmem_cache_alloc+39>: mov 0xc(%edi),%eax
> 0x4015cfd3 <kmem_cache_alloc+42>: mov (%ecx,%eax,4),%eax
> 0x4015cfd6 <kmem_cache_alloc+45>: mov %eax,%edx
> 0x4015cfd8 <kmem_cache_alloc+47>: mov %ecx,%eax
> 0x4015cfda <kmem_cache_alloc+49>: lock cmpxchg %edx,(%edi)
> 0x4015cfde <kmem_cache_alloc+53>: mov %eax,%ebx
> 0x4015cfe0 <kmem_cache_alloc+55>: cmp %ecx,%eax
> 0x4015cfe2 <kmem_cache_alloc+57>: jne 0x4015cfbd <kmem_cache_alloc+20>
> 0x4015cfe4 <kmem_cache_alloc+59>: cmpw $0x0,0xffffffe8(%ebp)
> 0x4015cfe9 <kmem_cache_alloc+64>: jns 0x4015d006 <kmem_cache_alloc+93>
> 0x4015cfeb <kmem_cache_alloc+66>: mov 0x10(%edi),%edx
> 0x4015cfee <kmem_cache_alloc+69>: xor %eax,%eax
> 0x4015cff0 <kmem_cache_alloc+71>: mov %edx,%ecx
> 0x4015cff2 <kmem_cache_alloc+73>: shr $0x2,%ecx
> 0x4015cff5 <kmem_cache_alloc+76>: mov %ebx,%edi
>
> Base
>
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 332 cycles kfree -> 422 cycles
> 10000 times kmalloc(16) -> 218 cycles kfree -> 360 cycles
> 10000 times kmalloc(32) -> 214 cycles kfree -> 368 cycles
> 10000 times kmalloc(64) -> 244 cycles kfree -> 390 cycles
> 10000 times kmalloc(128) -> 320 cycles kfree -> 417 cycles
> 10000 times kmalloc(256) -> 438 cycles kfree -> 550 cycles
> 10000 times kmalloc(512) -> 527 cycles kfree -> 626 cycles
> 10000 times kmalloc(1024) -> 678 cycles kfree -> 775 cycles
> 10000 times kmalloc(2048) -> 748 cycles kfree -> 822 cycles
> 10000 times kmalloc(4096) -> 641 cycles kfree -> 650 cycles
> 10000 times kmalloc(8192) -> 741 cycles kfree -> 817 cycles
> 10000 times kmalloc(16384) -> 872 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 332 cycles
> 10000 times kmalloc(16)/kfree -> 327 cycles
> 10000 times kmalloc(32)/kfree -> 323 cycles
> 10000 times kmalloc(64)/kfree -> 320 cycles
> 10000 times kmalloc(128)/kfree -> 320 cycles
> 10000 times kmalloc(256)/kfree -> 333 cycles
> 10000 times kmalloc(512)/kfree -> 332 cycles
> 10000 times kmalloc(1024)/kfree -> 330 cycles
> 10000 times kmalloc(2048)/kfree -> 334 cycles
> 10000 times kmalloc(4096)/kfree -> 674 cycles
> 10000 times kmalloc(8192)/kfree -> 1155 cycles
> 10000 times kmalloc(16384)/kfree -> 1226 cycles
>
> Slub cmpxchg.
>
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 296 cycles kfree -> 515 cycles
> 10000 times kmalloc(16) -> 193 cycles kfree -> 412 cycles
> 10000 times kmalloc(32) -> 188 cycles kfree -> 422 cycles
> 10000 times kmalloc(64) -> 222 cycles kfree -> 441 cycles
> 10000 times kmalloc(128) -> 292 cycles kfree -> 476 cycles
> 10000 times kmalloc(256) -> 414 cycles kfree -> 589 cycles
> 10000 times kmalloc(512) -> 513 cycles kfree -> 673 cycles
> 10000 times kmalloc(1024) -> 694 cycles kfree -> 825 cycles
> 10000 times kmalloc(2048) -> 739 cycles kfree -> 878 cycles
> 10000 times kmalloc(4096) -> 636 cycles kfree -> 653 cycles
> 10000 times kmalloc(8192) -> 715 cycles kfree -> 799 cycles
> 10000 times kmalloc(16384) -> 855 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 354 cycles
> 10000 times kmalloc(16)/kfree -> 336 cycles
> 10000 times kmalloc(32)/kfree -> 335 cycles
> 10000 times kmalloc(64)/kfree -> 337 cycles
> 10000 times kmalloc(128)/kfree -> 337 cycles
> 10000 times kmalloc(256)/kfree -> 355 cycles
> 10000 times kmalloc(512)/kfree -> 354 cycles
> 10000 times kmalloc(1024)/kfree -> 337 cycles
> 10000 times kmalloc(2048)/kfree -> 339 cycles
> 10000 times kmalloc(4096)/kfree -> 674 cycles
> 10000 times kmalloc(8192)/kfree -> 1128 cycles
> 10000 times kmalloc(16384)/kfree -> 1240 cycles
>
>
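
For reading the two tables side by side, a small helper along these lines
can compute the relative change per size for test 1. The cycle counts are
copied from the quoted output above; struct bench_row, pct_change and
count_regressions are names invented for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of test 1: cycles for the base kernel and for SLUB with cmpxchg. */
struct bench_row {
    unsigned int size;
    int base_alloc, base_free;
    int cx_alloc, cx_free;
};

static const struct bench_row rows[] = {
    {     8, 332, 422, 296, 515 },
    {    16, 218, 360, 193, 412 },
    {    32, 214, 368, 188, 422 },
    {    64, 244, 390, 222, 441 },
    {   128, 320, 417, 292, 476 },
    {   256, 438, 550, 414, 589 },
    {   512, 527, 626, 513, 673 },
    {  1024, 678, 775, 694, 825 },
    {  2048, 748, 822, 739, 878 },
    {  4096, 641, 650, 636, 653 },
    {  8192, 741, 817, 715, 799 },
    { 16384, 872, 927, 855, 927 },
};

#define NR_ROWS (sizeof(rows) / sizeof(rows[0]))

/* Relative change in percent; positive means more cycles (a regression). */
static double pct_change(int base, int patched)
{
    return 100.0 * (double)(patched - base) / (double)base;
}

/* Count rows that got slower; want_free selects the kfree column. */
static int count_regressions(int want_free)
{
    int n = 0;
    for (size_t i = 0; i < NR_ROWS; i++) {
        int base = want_free ? rows[i].base_free : rows[i].base_alloc;
        int cx = want_free ? rows[i].cx_free : rows[i].cx_alloc;
        if (cx > base)
            n++;
    }
    return n;
}

/* Print one line per allocation size with both deltas. */
static void print_deltas(void)
{
    for (size_t i = 0; i < NR_ROWS; i++)
        printf("kmalloc(%u): alloc %+.1f%%, kfree %+.1f%%\n",
               rows[i].size,
               pct_change(rows[i].base_alloc, rows[i].cx_alloc),
               pct_change(rows[i].base_free, rows[i].cx_free));
}
```

Calling print_deltas() lists the per-size deltas; count_regressions(1) and
count_regressions(0) give the number of sizes where kfree and kmalloc,
respectively, took more cycles with the LOCK-prefixed cmpxchg.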
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68