From: Eric Dumazet <dada1@cosmosbay.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Christoph Lameter <christoph@lameter.com>,
	Alok N Kataria <alokk@calsoftinc.com>,
	Shobhit Dayal <shobhit@calsoftinc.com>,
	Shai Fultheim <shai@scalex86.org>, Matt Mackall <mpm@selenic.com>,
	Andrew Morton <akpm@osdl.org>, john stultz <johnstul@us.ibm.com>,
	Gunter Ohrner <G.Ohrner@post.rwth-aachen.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RT 00/02] SLOB optimizations
Date: Wed, 21 Dec 2005 09:02:15 +0100
Message-ID: <43A90C07.4000003@cosmosbay.com>
In-Reply-To: <20051221074346.GA2398@elte.hu>

Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
> 
>>>while it could possibly be cleaned up a bit, it's one of the 
>>>best-optimized subsystems Linux has. Most of the "unnecessary 
>>>complexity" in SLAB is related to a performance or a debugging feature.  
>>>Many times i have looked at the SLAB code in a disassembler, right next 
>>>to profile output from some hot workload, and have concluded: 'I couldn't 
>>>do this any better even with hand-coded assembly'.
>>
>>Well, I miss a version of kmem_cache_alloc()/kmem_cache_free() that 
>>won't play with IRQ masking.
> 
> 
> sure, but adding this sure won't reduce complexity ;)
> 
> 
>>The local_irq_save()/local_irq_restore() pair is quite expensive and 
>>could be avoided for several caches that are exclusively used in 
>>process context.
> 
> 
> in any case, on sane platforms (i386, x86_64) an irq-disable is 
> well-optimized in hardware, and is just as fast as a preempt_disable().
> 

I'm afraid it's not the case on current hardware.

The IRQ disable/enable pair accounts for more than 50% of the CPU time spent in
kmem_cache_alloc(), kmem_cache_free() and kfree().

oprofile results on a dual Opteron 246:

Note the high sample counts right after the cli and popf (acting as sti)
instructions; popf is VERY expensive.

CPU: Hammer, speed 1993.39 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 50000

29993     1.9317  kfree
18654     1.2014  kmem_cache_alloc
12962     0.8348  kmem_cache_free

ffffffff8015c370 <kfree>: /* kfree total:  30334  1.9335 */
    770  0.0491 :ffffffff8015c370:       push   %rbp
   2477  0.1579 :ffffffff8015c371:       mov    %rdi,%rbp
                :ffffffff8015c374:       push   %rbx
                :ffffffff8015c375:       sub    $0x8,%rsp
   1792  0.1142 :ffffffff8015c379:       test   %rdi,%rdi
                :ffffffff8015c37c:       je     ffffffff8015c452 <kfree+0xe2>
    122  0.0078 :ffffffff8015c382:       pushfq
   1001  0.0638 :ffffffff8015c383:       popq   (%rsp)
   1456  0.0928 :ffffffff8015c386:       cli
   2489  0.1586 :ffffffff8015c387:       mov    $0xffffffff7fffffff,%rax    <<

...
     72  0.0046 :ffffffff8015c44e:       pushq  (%rsp)
   1080  0.0688 :ffffffff8015c451:       popfq
  13934  0.8882 :ffffffff8015c452:       add    $0x8,%rsp      << HERE >>
    290  0.0185 :ffffffff8015c456:       pop    %rbx
                :ffffffff8015c457:       pop    %rbp
    124  0.0079 :ffffffff8015c458:       retq


ffffffff8015c460 <kmem_cache_free>: /* kmem_cache_free total:  13084  0.8340 */
    388  0.0247 :ffffffff8015c460:       sub    $0x18,%rsp
    365  0.0233 :ffffffff8015c464:       mov    %rbp,0x10(%rsp)
                :ffffffff8015c469:       mov    %rbx,0x8(%rsp)
    121  0.0077 :ffffffff8015c46e:       mov    %rsi,%rbp
    262  0.0167 :ffffffff8015c471:       pushfq
    549  0.0350 :ffffffff8015c472:       popq   (%rsp)
    351  0.0224 :ffffffff8015c475:       cli
   2478  0.1579 :ffffffff8015c476:       mov    %gs:0x34,%eax
    592  0.0377 :ffffffff8015c47e:       cltq
                :ffffffff8015c480:       mov    (%rdi,%rax,8),%rbx
      7 4.5e-04 :ffffffff8015c484:       mov    (%rbx),%eax
    200  0.0127 :ffffffff8015c486:       cmp    0x4(%rbx),%eax
                :ffffffff8015c489:       jae    ffffffff8015c48f <kmem_cache_free+0x2f>
                :ffffffff8015c48b:       mov    %eax,%eax
    766  0.0488 :ffffffff8015c48d:       jmp    ffffffff8015c4a0 <kmem_cache_free+0x40>
                :ffffffff8015c48f:       mov    %rbx,%rsi
     71  0.0045 :ffffffff8015c492:       callq  ffffffff8015c810 <cache_flusharray>
                :ffffffff8015c497:       mov    (%rbx),%eax
      1 6.4e-05 :ffffffff8015c499:       data16
                :ffffffff8015c49a:       data16
                :ffffffff8015c49b:       data16
                :ffffffff8015c49c:       nop
                :ffffffff8015c49d:       data16
                :ffffffff8015c49e:       data16
                :ffffffff8015c49f:       nop
                :ffffffff8015c4a0:       mov    %rbp,0x10(%rbx,%rax,8)
     20  0.0013 :ffffffff8015c4a5:       incl   (%rbx)
    176  0.0112 :ffffffff8015c4a7:       pushq  (%rsp)
      7 4.5e-04 :ffffffff8015c4aa:       popfq
   6187  0.3944 :ffffffff8015c4ab:       mov    0x8(%rsp),%rbx << HERE>>
    543  0.0346 :ffffffff8015c4b0:       mov    0x10(%rsp),%rbp
                :ffffffff8015c4b5:       add    $0x18,%rsp
                :ffffffff8015c4b9:       retq
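The work done between cli and popfq is tiny. Read against the listing, kmem_cache_free compares a per-CPU count with a limit, calls cache_flusharray only when the array is full, then stores the object pointer and increments the count; kmem_cache_alloc (next listing) is the mirror image, popping the most recently freed object and setting a flag, or calling cache_alloc_refill when the array is empty. The following is a user-space Python model of those two fast paths only, not kernel code. The class and method names are hypothetical; the field names avail, limit and touched are assumed to follow struct array_cache in mm/slab.c, and both slow paths are reduced to trivial stand-ins. Interrupt masking is left out on purpose, since in the kernel it only brackets this code.

```python
# Hypothetical user-space model of the per-CPU array cache fast paths seen in
# the kmem_cache_free / kmem_cache_alloc listings. Not the kernel implementation.

class ArrayCacheModel:
    def __init__(self, limit):
        self.avail = 0                 # number of cached object pointers
        self.limit = limit             # capacity of entry[]
        self.touched = 0               # set on every allocation hit
        self.entry = [None] * limit    # per-CPU stack of free objects
        self.flushes = 0
        self.refills = 0

    def _flush(self):
        # Stand-in for cache_flusharray: the real slow path hands a batch of
        # objects back to the slab lists; this model just empties the array.
        self.flushes += 1
        self.avail = 0

    def _refill(self):
        # Stand-in for cache_alloc_refill: the model has no slabs to refill from.
        self.refills += 1
        return None

    def free(self, obj):
        # Listing: cmp avail,limit; jae cache_flusharray; entry[avail] = obj; avail++
        if self.avail >= self.limit:
            self._flush()
        self.entry[self.avail] = obj
        self.avail += 1

    def alloc(self):
        # Listing: test avail; je cache_alloc_refill; touched = 1; return entry[--avail]
        if self.avail:
            self.touched = 1
            self.avail -= 1
            return self.entry[self.avail]
        return self._refill()


ac = ArrayCacheModel(limit=4)
for obj in ["a", "b", "c", "d", "e", "f"]:
    ac.free(obj)                  # the 5th free finds the array full and flushes it
print(ac.avail, ac.flushes)       # 2 1
print(ac.alloc(), ac.alloc())     # f e  (LIFO: most recently freed first)
print(ac.alloc(), ac.refills)     # None 1
```

Because free pushes and alloc pops, the most recently freed (likely cache-hot) object is handed out first. A hit is a compare, one load or store and a counter update, which helps explain why the flag save/restore around it can dominate the sampled cycles.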


ffffffff8015bd70 <kmem_cache_alloc>: /* kmem_cache_alloc total:  18803  1.1985 */
    549  0.0350 :ffffffff8015bd70:       sub    $0x8,%rsp
    700  0.0446 :ffffffff8015bd74:       pushfq
   1427  0.0910 :ffffffff8015bd75:       popq   (%rsp)
    226  0.0144 :ffffffff8015bd78:       cli
   2399  0.1529 :ffffffff8015bd79:       mov    %gs:0x34,%eax  <<HERE>>
    416  0.0265 :ffffffff8015bd81:       cltq
                :ffffffff8015bd83:       mov    (%rdi,%rax,8),%rdx
     21  0.0013 :ffffffff8015bd87:       mov    (%rdx),%eax
    172  0.0110 :ffffffff8015bd89:       test   %eax,%eax
                :ffffffff8015bd8b:       je     ffffffff8015bda1 <kmem_cache_alloc+0x31>
      8 5.1e-04 :ffffffff8015bd8d:       dec    %eax
   1338  0.0853 :ffffffff8015bd8f:       movl   $0x1,0xc(%rdx)
      9 5.7e-04 :ffffffff8015bd96:       mov    %eax,(%rdx)
      9 5.7e-04 :ffffffff8015bd98:       mov    %eax,%eax
   1146  0.0730 :ffffffff8015bd9a:       mov    0x10(%rdx,%rax,8),%rax
      4 2.5e-04 :ffffffff8015bd9f:       jmp    ffffffff8015bda6 <kmem_cache_alloc+0x36>
                :ffffffff8015bda1:       callq  ffffffff8015c160 <cache_alloc_refill>
    154  0.0098 :ffffffff8015bda6:       pushq  (%rsp)
    241  0.0154 :ffffffff8015bda9:       popfq
   9222  0.5878 :ffffffff8015bdaa:       prefetchw (%rax) <<HERE>>
    758  0.0483 :ffffffff8015bdad:       add    $0x8,%rsp
      4 2.5e-04 :ffffffff8015bdb1:       retq
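The "more than 50%" figure can be checked against the three listings. The small Python calculation below assumes, as the HERE markers suggest, that the samples on the instruction right after cli and right after popfq are really the cost of those two instructions (sampled cycles are usually charged to the instruction following the one that stalled). The per-listing totals from the opannotate headers are used.

```python
# (listing total, samples on insn after cli, samples on insn after popfq)
profile = {
    "kfree":            (30334, 2489, 13934),
    "kmem_cache_free":  (13084, 2478, 6187),
    "kmem_cache_alloc": (18803, 2399, 9222),
}

def irq_share(total, after_cli, after_popf):
    """Fraction of a function's samples attributed to the cli/popfq pair."""
    return (after_cli + after_popf) / total

for name, counts in profile.items():
    print(f"{name:17s} {irq_share(*counts):6.1%}")

all_total = sum(c[0] for c in profile.values())
all_irq = sum(c[1] + c[2] for c in profile.values())
print(f"{'combined':17s} {all_irq / all_total:6.1%}")
```

This prints about 54% for kfree, 66% for kmem_cache_free and 62% for kmem_cache_alloc, or 59% of the 62221 samples across all three. The flag-save instructions (pushfq/popq) are not counted.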

Eric
