From: Eric Dumazet <dada1@cosmosbay.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [rfc][patch] SLQB slab allocator
Date: Fri, 12 Dec 2008 08:07:23 +0100
Message-ID: <49420DAB.7090604@cosmosbay.com>
In-Reply-To: <20081212055051.GE15804@wotan.suse.de>

Nick Piggin wrote:
> On Fri, Dec 12, 2008 at 06:38:26AM +0100, Eric Dumazet wrote:
>> Nick Piggin wrote:
>>> I'm going to continue working on this as I get time, and I plan to soon ask
>>> to have it merged. It would be great if people could comment or test it.
>>>
>> It seems really good, but will need some hours to review :)
>>
>> Minor nit : You spelled Qeued instead of Queued in init/Kconfig
>>
>> +config SLQB
>> + bool "SLQB (Qeued allocator)"
>
> OK, thanks.
>
>
>> One of the problems I see with SLAB & SLUB is the irq masking stuff.
>> Some (many ???) kmem_caches are only used in process context; I see no point in
>> disabling irqs for them.
>
> That's a very good point actually, and something I want to look at...
>
> I'm thinking it will make most sense to provide a
> kmem_cache_alloc/free_irqsafe for callers who either don't do any
> interrupt context allocations, or already have irqs off (another
> slab flag will just add another branch in the fastpaths).
>
> And then also a kmalloc/kfree_irqsoff for code which already has
> irqs off.
>
> That's something which benefits all slab allocators roughly equally,
> so at the moment I'm concentrating on the core code. But it's a very
> good idea.
>
>
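
To make the idea of a "no irq masking" entry point concrete, here is a toy
userspace sketch, not kernel code. It assumes signals as a stand-in for
interrupts and one global freelist as a stand-in for a per-CPU queue. Every
name in it (toy_alloc, toy_alloc_nomask, TOY_OBJ_SIZE, ...) is hypothetical
and is not the API proposed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: signals play the role of interrupts, and a single
 * global LIFO freelist plays the role of a per-CPU object queue.
 * All names are hypothetical; none of this is a kernel interface. */
#define TOY_OBJ_SIZE 64

struct toy_obj {
	struct toy_obj *next;
};

static struct toy_obj *toy_freelist;

/* Pop the most recently freed object, or malloc() a fresh one. */
static void *toy_pop(void)
{
	struct toy_obj *obj = toy_freelist;

	if (obj) {
		toy_freelist = obj->next;
		return obj;
	}
	return malloc(TOY_OBJ_SIZE);
}

/* Push an object back on the freelist. */
static void toy_push(void *p)
{
	struct toy_obj *obj = p;

	assert(p != NULL);
	obj->next = toy_freelist;
	toy_freelist = obj;
}

/* Default entry points: block all signals around the queue update,
 * as if a handler might allocate from the same queue. */
void *toy_alloc(void)
{
	sigset_t all, old;
	void *p;

	sigfillset(&all);
	sigprocmask(SIG_BLOCK, &all, &old);
	p = toy_pop();
	sigprocmask(SIG_SETMASK, &old, NULL);
	return p;
}

void toy_free(void *p)
{
	sigset_t all, old;

	sigfillset(&all);
	sigprocmask(SIG_BLOCK, &all, &old);
	toy_push(p);
	sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Variants for callers that never touch this queue from a handler,
 * or that already run with signals blocked: no mask/unmask pair. */
void *toy_alloc_nomask(void)
{
	return toy_pop();
}

void toy_free_nomask(void *p)
{
	toy_push(p);
}
```

Both pairs operate on the same queue, so the masking decision is made per
call site and costs no extra branch in either path. That matches the
trade-off described above between separate entry points and another slab
flag.
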
>> I tested your patch on my 8-way HP BL460c G1, on top
>> of my last patch series. (linux-2.6, not net-next-2.6)
>>
>> # time ./socketallocbench
>>
>> real 0m1.300s
>> user 0m0.078s
>> sys 0m1.207s
>> # time ./socketallocbench -n 8
>>
>> real 0m1.686s
>> user 0m0.614s
>> sys 0m12.737s
>>
>> So no bad effect (same as SLUB).
>
> Cool, thanks.
>
>
>> For the record, SLAB is really really bad for this workload
>>
>> CPU: Core 2, speed 3000.1 MHz (estimated)
>> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
>> samples  cum. samples  %        cum. %   symbol name
>> 136537   136537        10.8300  10.8300  kmem_cache_alloc
>> 129380   265917        10.2623  21.0924  tcp_close
>> 79696    345613        6.3214   27.4138  tcp_v4_init_sock
>> 73873    419486        5.8596   33.2733  tcp_v4_destroy_sock
>> 63436    482922        5.0317   38.3050  sysenter_past_esp
>> 62140    545062        4.9289   43.2339  inet_csk_destroy_sock
>> 56565    601627        4.4867   47.7206  kmem_cache_free
>> 40430    642057        3.2069   50.9275  __percpu_counter_add
>> 35742    677799        2.8350   53.7626  init_timer
>> 35611    713410        2.8246   56.5872  copy_from_user
>> 21616    735026        1.7146   58.3018  d_alloc
>> 20821    755847        1.6515   59.9533  alloc_inode
>> 19645    775492        1.5582   61.5115  alloc_fd
>> 18935    794427        1.5019   63.0134  __fput
>> 18922    813349        1.5009   64.5143  inet_create
>> 18919    832268        1.5006   66.0149  sys_close
>> 16074    848342        1.2750   67.2899  release_sock
>> 15337    863679        1.2165   68.5064  lock_sock_nested
>> 15172    878851        1.2034   69.7099  sock_init_data
>> 14196    893047        1.1260   70.8359  fd_install
>> 13677    906724        1.0849   71.9207  drop_file_write_access
>> 13195    919919        1.0466   72.9673  dput
>> 12768    932687        1.0127   73.9801  inotify_d_instantiate
>> 11404    944091        0.9046   74.8846  init_waitqueue_head
>> 11228    955319        0.8906   75.7752  sysenter_do_call
>> 11213    966532        0.8894   76.6647  local_bh_enable_ip
>> 10948    977480        0.8684   77.5330  __sock_create
>> 10912    988392        0.8655   78.3986  local_bh_enable
>> 10665    999057        0.8459   79.2445  __new_inode
>> 10579    1009636       0.8391   80.0836  inet_release
>> 9665     1019301       0.7666   80.8503  iput_single
>> 9545     1028846       0.7571   81.6074  fput
>> 7950     1036796       0.6306   82.2379  sock_release
>> 7236     1044032       0.5740   82.8119  local_bh_disable
>>
>>
>> We can see most of the time is taken by the memset() to clear the object,
>> then the irq masking stuff...
>
> Yep, it's difficult to make the local alloc/free fastpath much more
> optimal as-is.
>
> Is SLAB still bad at the test with the slab-rcu patch in place?
> SLAB has a pretty optimal fastpath as well, although if its queues
> start overflowing, it can run into contention quite easily.

Yes, I forgot that I had applied Christoph's patch (SLAB_DESTROY_BY_RCU for struct file)
in the meantime, silly me. This was with v2 of my series, with only 5 patches.

With SLAB, the results are quite good!

# time ./socketallocbench
real 0m1.201s
user 0m0.071s
sys 0m1.122s
# time ./socketallocbench -n8
real 0m1.616s
user 0m0.578s
sys 0m12.220s
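
For anyone who wants to reproduce something similar: the source of
socketallocbench is not included in this mail, so the following is only a
minimal sketch of what such a loop might look like, assuming it does little
more than socket()/close() on TCP sockets (which is consistent with the
tcp_v4_init_sock, inet_create and sys_close entries in the profile). The
function names are hypothetical, and whatever the -n option does (presumably
run several instances in parallel) is not modeled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the benchmark loop: each iteration makes
 * the kernel allocate and free the socket, inode, dentry and file
 * objects, which is what exercises the slab allocator. */
long socket_alloc_loop(long iterations)
{
	long ok = 0;

	assert(iterations >= 0);
	for (long i = 0; i < iterations; i++) {
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0)
			continue;	/* count only successful pairs */
		close(fd);
		ok++;
	}
	return ok;
}

/* Wall-clock seconds spent in `iterations` socket()/close() pairs;
 * the number of successful pairs is returned through *ok. */
double time_socket_alloc_loop(long iterations, long *ok)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	*ok = socket_alloc_loop(iterations);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_socket_alloc_loop() with a large iteration count from a small
main() and running it under time(1) should give real/user/sys figures of the
same kind as above, though the absolute numbers will of course depend on the
machine and kernel.
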
>
>> c0281e10 <kmem_cache_alloc>: /* kmem_cache_alloc total: 140659 10.8277 */
>
> I guess you're compiling with -Os? I find gcc can pack the fastpath
> much better with -O2, and actually decrease the effective icache
> footprint size even if the total text size increases...
No, I don't use -Os, unless something went wrong:
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
# CONFIG_OPTIMIZE_INLINING is not set