From: Eric Dumazet <dada1@cosmosbay.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Christoph Lameter <christoph@lameter.com>,
Alok N Kataria <alokk@calsoftinc.com>,
Shobhit Dayal <shobhit@calsoftinc.com>,
Shai Fultheim <shai@scalex86.org>, Matt Mackall <mpm@selenic.com>,
Andrew Morton <akpm@osdl.org>, john stultz <johnstul@us.ibm.com>,
Gunter Ohrner <G.Ohrner@post.rwth-aachen.de>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RT 00/02] SLOB optimizations
Date: Wed, 21 Dec 2005 09:02:15 +0100
Message-ID: <43A90C07.4000003@cosmosbay.com>
In-Reply-To: <20051221074346.GA2398@elte.hu>
Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>
>>>while it could possibly be cleaned up a bit, it's one of the
>>>best-optimized subsystems Linux has. Most of the "unnecessary
>>>complexity" in SLAB is related to a performance or a debugging feature.
>>>Many times i have looked at the SLAB code in a disassembler, right next
>>>to profile output from some hot workload, and have concluded: 'I couldnt
>>>do this any better even with hand-coded assembly'.
>>
>>Well, I miss a version of kmem_cache_alloc()/kmem_cache_free() that
>>wont play with IRQ masking.
>
>
> sure, but adding this sure wont reduce complexity ;)
>
>
>>The local_irq_save()/local_irq_restore() pair is quite expensive and
>>could be avoided for several caches that are exclusively used in
>>process context.
>
>
> in any case, on sane platforms (i386, x86_64) an irq-disable is
> well-optimized in hardware, and is just as fast as a preempt_disable().
>
I'm afraid that's not the case on current hardware.
The irq disable/enable pair accounts for more than 50% of the CPU time spent in
kmem_cache_alloc()/kmem_cache_free()/kfree().
oprofile results on a dual Opteron 246:
You can see the high sample counts right after the cli and popf (sti)
instructions, popf being VERY expensive.
CPU: Hammer, speed 1993.39 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 50000
29993 1.9317 kfree
18654 1.2014 kmem_cache_alloc
12962 0.8348 kmem_cache_free
ffffffff8015c370 <kfree>: /* kfree total: 30334 1.9335 */
770 0.0491 :ffffffff8015c370: push %rbp
2477 0.1579 :ffffffff8015c371: mov %rdi,%rbp
:ffffffff8015c374: push %rbx
:ffffffff8015c375: sub $0x8,%rsp
1792 0.1142 :ffffffff8015c379: test %rdi,%rdi
:ffffffff8015c37c: je ffffffff8015c452 <kfree+0xe2>
122 0.0078 :ffffffff8015c382: pushfq
1001 0.0638 :ffffffff8015c383: popq (%rsp)
1456 0.0928 :ffffffff8015c386: cli
2489 0.1586 :ffffffff8015c387: mov $0xffffffff7fffffff,%rax <<
...
72 0.0046 :ffffffff8015c44e: pushq (%rsp)
1080 0.0688 :ffffffff8015c451: popfq
13934 0.8882 :ffffffff8015c452: add $0x8,%rsp << HERE >>
290 0.0185 :ffffffff8015c456: pop %rbx
:ffffffff8015c457: pop %rbp
124 0.0079 :ffffffff8015c458: retq
ffffffff8015c460 <kmem_cache_free>: /* kmem_cache_free total: 13084 0.8340 */
388 0.0247 :ffffffff8015c460: sub $0x18,%rsp
365 0.0233 :ffffffff8015c464: mov %rbp,0x10(%rsp)
:ffffffff8015c469: mov %rbx,0x8(%rsp)
121 0.0077 :ffffffff8015c46e: mov %rsi,%rbp
262 0.0167 :ffffffff8015c471: pushfq
549 0.0350 :ffffffff8015c472: popq (%rsp)
351 0.0224 :ffffffff8015c475: cli
2478 0.1579 :ffffffff8015c476: mov %gs:0x34,%eax
592 0.0377 :ffffffff8015c47e: cltq
:ffffffff8015c480: mov (%rdi,%rax,8),%rbx
7 4.5e-04 :ffffffff8015c484: mov (%rbx),%eax
200 0.0127 :ffffffff8015c486: cmp 0x4(%rbx),%eax
:ffffffff8015c489: jae ffffffff8015c48f <kmem_cache_free+0x2f>
:ffffffff8015c48b: mov %eax,%eax
766 0.0488 :ffffffff8015c48d: jmp ffffffff8015c4a0 <kmem_cache_free+0x40>
:ffffffff8015c48f: mov %rbx,%rsi
71 0.0045 :ffffffff8015c492: callq ffffffff8015c810 <cache_flusharray>
:ffffffff8015c497: mov (%rbx),%eax
1 6.4e-05 :ffffffff8015c499: data16
:ffffffff8015c49a: data16
:ffffffff8015c49b: data16
:ffffffff8015c49c: nop
:ffffffff8015c49d: data16
:ffffffff8015c49e: data16
:ffffffff8015c49f: nop
:ffffffff8015c4a0: mov %rbp,0x10(%rbx,%rax,8)
20 0.0013 :ffffffff8015c4a5: incl (%rbx)
176 0.0112 :ffffffff8015c4a7: pushq (%rsp)
7 4.5e-04 :ffffffff8015c4aa: popfq
6187 0.3944 :ffffffff8015c4ab: mov 0x8(%rsp),%rbx << HERE>>
543 0.0346 :ffffffff8015c4b0: mov 0x10(%rsp),%rbp
:ffffffff8015c4b5: add $0x18,%rsp
:ffffffff8015c4b9: retq
ffffffff8015bd70 <kmem_cache_alloc>: /* kmem_cache_alloc total: 18803 1.1985 */
549 0.0350 :ffffffff8015bd70: sub $0x8,%rsp
700 0.0446 :ffffffff8015bd74: pushfq
1427 0.0910 :ffffffff8015bd75: popq (%rsp)
226 0.0144 :ffffffff8015bd78: cli
2399 0.1529 :ffffffff8015bd79: mov %gs:0x34,%eax <<HERE>>
416 0.0265 :ffffffff8015bd81: cltq
:ffffffff8015bd83: mov (%rdi,%rax,8),%rdx
21 0.0013 :ffffffff8015bd87: mov (%rdx),%eax
172 0.0110 :ffffffff8015bd89: test %eax,%eax
:ffffffff8015bd8b: je ffffffff8015bda1 <kmem_cache_alloc+0x31>
8 5.1e-04 :ffffffff8015bd8d: dec %eax
1338 0.0853 :ffffffff8015bd8f: movl $0x1,0xc(%rdx)
9 5.7e-04 :ffffffff8015bd96: mov %eax,(%rdx)
9 5.7e-04 :ffffffff8015bd98: mov %eax,%eax
1146 0.0730 :ffffffff8015bd9a: mov 0x10(%rdx,%rax,8),%rax
4 2.5e-04 :ffffffff8015bd9f: jmp ffffffff8015bda6 <kmem_cache_alloc+0x36>
:ffffffff8015bda1: callq ffffffff8015c160 <cache_alloc_refill>
154 0.0098 :ffffffff8015bda6: pushq (%rsp)
241 0.0154 :ffffffff8015bda9: popfq
9222 0.5878 :ffffffff8015bdaa: prefetchw (%rax) <<HERE>>
758 0.0483 :ffffffff8015bdad: add $0x8,%rsp
4 2.5e-04 :ffffffff8015bdb1: retq
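The "more than 50%" figure can be checked against the listings above with a little arithmetic. The sketch below is only an illustration: it assumes, as the message suggests, that the samples recorded on the instruction immediately following cli and popfq are the cost of those instructions, and it uses the per-function totals from the annotation headers (which differ slightly from the summary lines). The struct and function names are hypothetical, not part of oprofile or the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Sample counts copied from the annotated listings above. */
struct irq_profile {
	const char *name;
	unsigned long total;      /* per-function total from the annotation header */
	unsigned long after_cli;  /* samples on the instruction following cli */
	unsigned long after_popf; /* samples on the instruction following popfq */
};

const struct irq_profile profiles[] = {
	{ "kfree",            30334UL, 2489UL, 13934UL },
	{ "kmem_cache_free",  13084UL, 2478UL,  6187UL },
	{ "kmem_cache_alloc", 18803UL, 2399UL,  9222UL },
};

const size_t nr_profiles = sizeof(profiles) / sizeof(profiles[0]);

/* Share of one function's samples attributed to the cli/popfq pair, in percent. */
double irq_share_percent(const struct irq_profile *p)
{
	return 100.0 * (double)(p->after_cli + p->after_popf) / (double)p->total;
}

/* Same share computed over all three functions together. */
double combined_irq_share_percent(void)
{
	unsigned long irq = 0, total = 0;
	size_t i;

	for (i = 0; i < nr_profiles; i++) {
		irq += profiles[i].after_cli + profiles[i].after_popf;
		total += profiles[i].total;
	}
	return 100.0 * (double)irq / (double)total;
}

/* Print one line per function plus the combined figure. */
void print_irq_shares(void)
{
	size_t i;

	for (i = 0; i < nr_profiles; i++)
		printf("%-18s %5.1f%%\n", profiles[i].name,
		       irq_share_percent(&profiles[i]));
	printf("%-18s %5.1f%%\n", "combined", combined_irq_share_percent());
}
```

Calling print_irq_shares() from a small main() gives roughly 54% for kfree, 66% for kmem_cache_free and 62% for kmem_cache_alloc, about 59% combined, under the attribution assumption stated above.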
Eric
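For readers who prefer C to disassembly, here is a rough userspace model of the fast paths that the kmem_cache_alloc and kmem_cache_free listings appear to implement: a per-CPU array of object pointers, popped or pushed between an irq save and an irq restore. It is a sketch under assumptions, not the kernel source. Every model_* name is hypothetical, the field layout is inferred from the offsets in the listing (0x0, 0x4, 0xc, 0x10), the irq helpers are stubs that only count calls, and the slow paths (cache_alloc_refill, cache_flusharray) are reduced to a failure return.

```c
#include <stddef.h>

#define MODEL_LIMIT 4

/* Layout inferred from the listing: avail at 0x0, limit at 0x4,
 * touched at 0xc, object pointers starting at 0x10. */
struct model_array_cache {
	unsigned int avail;      /* number of cached object pointers */
	unsigned int limit;      /* capacity of entry[] */
	unsigned int batchcount; /* unused in this model */
	unsigned int touched;    /* set on every allocation hit */
	void *entry[MODEL_LIMIT];
};

/* Counts stubbed irq save/restore calls; stands in for pushfq+cli / popfq. */
unsigned long model_irq_ops;

unsigned long model_irq_save(void)
{
	model_irq_ops++;
	return 0;
}

void model_irq_restore(unsigned long flags)
{
	(void)flags;
	model_irq_ops++;
}

/* Allocation fast path: pop the most recently freed object, if any. */
void *model_alloc(struct model_array_cache *ac)
{
	void *obj = NULL;
	unsigned long flags = model_irq_save();

	if (ac->avail) {
		ac->touched = 1;
		obj = ac->entry[--ac->avail];
	}
	/* else: the kernel would call cache_alloc_refill(); the model returns NULL */
	model_irq_restore(flags);
	return obj;
}

/* Free fast path: push the object if there is room. Returns 1 when stored. */
int model_free(struct model_array_cache *ac, void *obj)
{
	int stored = 0;
	unsigned long flags = model_irq_save();

	if (ac->avail < ac->limit) {
		ac->entry[ac->avail++] = obj;
		stored = 1;
	}
	/* else: the kernel would call cache_flusharray() and then store */
	model_irq_restore(flags);
	return stored;
}

/* Objects for trying the model out. */
struct model_array_cache model_cache = { .avail = 0, .limit = MODEL_LIMIT };
int model_object;
```

The point of the model is how little work sits between the save and the restore: one compare, one or two stores and a load. That is consistent with the irq masking pair dominating the sample counts in the profile.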