From: Eric Dumazet <dada1@cosmosbay.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [rfc][patch] SLQB slab allocator
Date: Fri, 12 Dec 2008 06:38:26 +0100
Message-ID: <4941F8D2.4060807@cosmosbay.com>
In-Reply-To: <20081212002518.GH8294@wotan.suse.de>
Nick Piggin wrote:
> (Re)introducing SLQB allocator. Q for queued, but in reality, SLAB and
> SLUB also have queues of things as well, so "Q" is just a meaningless
> differentiator :)
>
> I've kept working on SLQB slab allocator because I don't agree with the
> design choices in SLUB, and I'm worried about the push to make it the
> one true allocator.
>
> My primary goal in SLQB is performance, secondarily are order-0 page
> allocations, and memory consumption.
>
> I have worked with the Linux guys at Intel to ensure that SLQB is comparable
> to SLAB in their OLTP performance benchmark. Recently that goal has been
> reached -- so SLQB performs comparably well to SLAB on that test (it's
> within the noise).
>
> I've also been comparing SLQB with SLAB and SLUB in other benchmarks, and
> trying to ensure it is as good or better. I don't know if that's always
> the case, but nothing obvious has gone wrong (it's sometimes hard to find
> meaningful benchmarks that exercise slab in interesting ways).
>
> Now it isn't exactly complete -- debugging, tracking, stats, etc. code is
> not always in the best shape, however I have been focusing on performance
> of the core allocator. No matter how good the rest is if the core code is
> poor... But it boots, works, is pretty stable.
>
> SLQB, like SLUB and unlike SLAB, doesn't have greater than linear memory
> consumption growth with the number of CPUs or nodes.
>
> SLQB tries to be very page-size agnostic. And it tries very hard to use
> order-0 pages. This is good for both page allocator fragmentation, and
> slab fragmentation. I don't like that SLUB performs significantly worse
> with order-0 pages in some workloads.
>
> SLQB goes to some lengths to optimise remote-freeing cases (allocate on
> one CPU, free on another). It seems to work well, but there are a *lot*
> of possible ways this can be implemented especially when NUMA comes into
> play, so I'd like to know of workloads where remote freeing happens a
> lot, and perhaps look at alternative ways to do it.
>
> SLQB initialisation code attempts to be as simple and un-clever as possible.
> There are no multiple phases where different things come up. There is no
> weird self bootstrapping stuff. It just statically allocates the structures
> required to create the slabs that allocate other slab structures.
>
> I'm going to continue working on this as I get time, and I plan to soon ask
> to have it merged. It would be great if people could comment or test it.
>
It looks really good, but I will need some hours to review it :)
Minor nit: you spelled "Qeued" instead of "Queued" in init/Kconfig:
+config SLQB
+ bool "SLQB (Qeued allocator)"
One of the problems I see with SLAB & SLUB is the irq masking.
Some (many?) kmem_caches are only used in process context, and I see no point
in disabling irqs for them.
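
To make the idea concrete, here is a minimal userspace sketch, not kernel
code. It assumes a hypothetical per-cache flag (TOY_CACHE_PROCESS_ONLY, which
does not exist in SLAB, SLUB or SLQB) telling the allocator that the cache is
never used from irq context, so the save/restore pair can be skipped. The irq
helpers are stubs that only count how often masking would happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, invented for this sketch: cache is never used from
 * irq context. No such flag exists in the allocators discussed here. */
#define TOY_CACHE_PROCESS_ONLY 0x1u

/* Counts how many times the sketch would have masked irqs. */
static unsigned long irq_toggle_count;

/* Stubs standing in for local_irq_save()/local_irq_restore(). */
static unsigned long toy_irq_save(void) { irq_toggle_count++; return 0; }
static void toy_irq_restore(unsigned long flags) { (void)flags; }

struct toy_object { struct toy_object *next; };

struct toy_cache {
    unsigned int flags;
    struct toy_object *freelist;
};

/* Pop one object, masking irqs only for caches that may be used from
 * irq context. A real kernel would still have to disable preemption
 * around per-CPU data in the unmasked case. */
static void *toy_cache_alloc(struct toy_cache *c)
{
    unsigned long irqflags = 0;
    struct toy_object *obj;
    bool mask;

    assert(c != NULL);
    mask = !(c->flags & TOY_CACHE_PROCESS_ONLY);

    if (mask)
        irqflags = toy_irq_save();
    obj = c->freelist;
    if (obj)
        c->freelist = obj->next;
    if (mask)
        toy_irq_restore(irqflags);
    return obj;
}
```

Whether the extra test on the cache flags is cheaper than the masking it
avoids would have to be measured.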
I tested your patch on my 8-way HP BL460c G1, on top
of my latest patch series (linux-2.6, not net-next-2.6).
# time ./socketallocbench
real 0m1.300s
user 0m0.078s
sys 0m1.207s
# time ./socketallocbench -n 8
real 0m1.686s
user 0m0.614s
sys 0m12.737s
So no bad effect (same as SLUB).
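
The source of socketallocbench is not included in this mail. As a rough
sketch only, and assuming the benchmark does little more than create and
close TCP sockets in a loop (which is what the symbols in the profile
suggest), the core could look like the function below. The name
socket_alloc_loop and its return convention are made up for illustration, and
the meaning of -n 8 (presumably eight such loops in parallel) is also an
assumption.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the benchmark loop: create a TCP socket and
 * close it again, `iterations` times. Returns the number of completed
 * iterations; it stops at the first failure.
 */
static long socket_alloc_loop(long iterations)
{
    long done = 0;

    assert(iterations >= 0);
    while (done < iterations) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            break;      /* e.g. EMFILE, or no IPv4 support */
        if (close(fd) != 0)
            break;
        done++;
    }
    return done;
}
```

Each iteration allocates and frees a socket, an inode, a dentry and a file,
so almost all of the time is spent in the kernel.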
For the record, SLAB is really, really bad for this workload.
CPU: Core 2, speed 3000.1 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
136537 136537 10.8300 10.8300 kmem_cache_alloc
129380 265917 10.2623 21.0924 tcp_close
79696 345613 6.3214 27.4138 tcp_v4_init_sock
73873 419486 5.8596 33.2733 tcp_v4_destroy_sock
63436 482922 5.0317 38.3050 sysenter_past_esp
62140 545062 4.9289 43.2339 inet_csk_destroy_sock
56565 601627 4.4867 47.7206 kmem_cache_free
40430 642057 3.2069 50.9275 __percpu_counter_add
35742 677799 2.8350 53.7626 init_timer
35611 713410 2.8246 56.5872 copy_from_user
21616 735026 1.7146 58.3018 d_alloc
20821 755847 1.6515 59.9533 alloc_inode
19645 775492 1.5582 61.5115 alloc_fd
18935 794427 1.5019 63.0134 __fput
18922 813349 1.5009 64.5143 inet_create
18919 832268 1.5006 66.0149 sys_close
16074 848342 1.2750 67.2899 release_sock
15337 863679 1.2165 68.5064 lock_sock_nested
15172 878851 1.2034 69.7099 sock_init_data
14196 893047 1.1260 70.8359 fd_install
13677 906724 1.0849 71.9207 drop_file_write_access
13195 919919 1.0466 72.9673 dput
12768 932687 1.0127 73.9801 inotify_d_instantiate
11404 944091 0.9046 74.8846 init_waitqueue_head
11228 955319 0.8906 75.7752 sysenter_do_call
11213 966532 0.8894 76.6647 local_bh_enable_ip
10948 977480 0.8684 77.5330 __sock_create
10912 988392 0.8655 78.3986 local_bh_enable
10665 999057 0.8459 79.2445 __new_inode
10579 1009636 0.8391 80.0836 inet_release
9665 1019301 0.7666 80.8503 iput_single
9545 1028846 0.7571 81.6074 fput
7950 1036796 0.6306 82.2379 sock_release
7236 1044032 0.5740 82.8119 local_bh_disable
We can see that most of the time is taken by the memset() that clears the
object, then by the irq masking...
c0281e10 <kmem_cache_alloc>: /* kmem_cache_alloc total: 140659 10.8277 */
2414 0.1858 :c0281e10: push %ebp
7 5.4e-04 :c0281e11: mov %esp,%ebp
:c0281e13: push %edi
1454 0.1119 :c0281e14: push %esi
310 0.0239 :c0281e15: mov %eax,%esi
:c0281e17: push %ebx
368 0.0283 :c0281e18: sub $0x10,%esp
949 0.0731 :c0281e1b: mov %edx,-0x18(%ebp)
383 0.0295 :c0281e1e: mov 0x4(%ebp),%eax
1189 0.0915 :c0281e21: mov %eax,-0x14(%ebp)
1240 0.0955 :c0281e24: jmp c0281e6e <kmem_cache_alloc+0x5e>
:c0281e26: lea 0x0(%esi),%esi
:c0281e29: lea 0x0(%edi,%eiz,1),%edi
1188 0.0915 :c0281e30: mov 0x10(%esi),%eax
:c0281e33: mov (%edx,%eax,1),%eax
1483 0.1142 :c0281e36: decl (%ebx)
898 0.0691 :c0281e38: mov %eax,0x4(%ebx)
586 0.0451 :c0281e3b: mov %edx,-0x1c(%ebp)
1 7.7e-05 :c0281e3e: pushl -0x10(%ebp)
1226 0.0944 :c0281e41: popf
26385 2.0311 :c0281e42: testl $0x210d00,(%esi)
1188 0.0915 :c0281e48: je c0281ef8 <kmem_cache_alloc+0xe8>
:c0281e4e: mov -0x1c(%ebp),%eax
:c0281e51: test %eax,%eax
:c0281e53: je c0281ef8 <kmem_cache_alloc+0xe8>
:c0281e59: mov -0x14(%ebp),%ecx
:c0281e5c: mov -0x1c(%ebp),%edx
:c0281e5f: mov %esi,%eax
:c0281e61: call c0280d60 <alloc_debug_processing>
:c0281e66: test %eax,%eax
:c0281e68: jne c0281ef8 <kmem_cache_alloc+0xe8>
1205 0.0928 :c0281e6e: pushf
4888 0.3763 :c0281e6f: popl -0x10(%ebp)
319 0.0246 :c0281e72: cli
5975 0.4599 :c0281e73: nop
:c0281e74: lea 0x0(%esi,%eiz,1),%esi
:c0281e78: mov %fs:0xc068d004,%eax
1166 0.0898 :c0281e7e: mov 0x38(%esi,%eax,4),%ebx
26 0.0020 :c0281e82: mov 0x4(%ebx),%edx
662 0.0510 :c0281e85: test %edx,%edx
:c0281e87: jne c0281e30 <kmem_cache_alloc+0x20>
:c0281e89: mov 0xc(%ebx),%eax
:c0281e8c: test %eax,%eax
:c0281e8e: jne c0281ec8 <kmem_cache_alloc+0xb8>
:c0281e90: mov %ebx,%edx
:c0281e92: mov %esi,%eax
:c0281e94: call c0280010 <__cache_list_get_page>
:c0281e99: mov %eax,%edx
:c0281e9b: test %eax,%eax
:c0281e9d: jne c0281f31 <kmem_cache_alloc+0x121>
:c0281ea3: mov $0xffffffff,%ecx
1 7.7e-05 :c0281ea8: mov -0x18(%ebp),%edx
:c0281eab: mov %esi,%eax
:c0281ead: call c02815a0 <__slab_alloc_page>
:c0281eb2: test %eax,%eax
:c0281eb4: jne c0281e78 <kmem_cache_alloc+0x68>
:c0281eb6: movl $0x0,-0x1c(%ebp)
:c0281ebd: jmp c0281e3e <kmem_cache_alloc+0x2e>
:c0281ec2: lea 0x0(%esi),%esi
:c0281ec8: mov %esi,%eax
:c0281eca: mov %ebx,%edx
:c0281ecc: call c0280240 <claim_remote_free_list>
:c0281ed1: mov 0x4(%esi),%eax
:c0281ed4: shl $0x2,%eax
:c0281ed7: cmp %eax,(%ebx)
:c0281ed9: ja c0281f48 <kmem_cache_alloc+0x138>
:c0281edb: mov 0x4(%ebx),%edx
:c0281ede: mov %edx,-0x1c(%ebp)
:c0281ee1: test %edx,%edx
:c0281ee3: je c0281e90 <kmem_cache_alloc+0x80>
:c0281ee5: mov 0x10(%esi),%eax
:c0281ee8: mov (%edx,%eax,1),%eax
:c0281eeb: decl (%ebx)
:c0281eed: mov %eax,0x4(%ebx)
:c0281ef0: jmp c0281e3e <kmem_cache_alloc+0x2e>
:c0281ef5: lea 0x0(%esi),%esi
1261 0.0971 :c0281ef8: cmpw $0x0,-0x18(%ebp)
957 0.0737 :c0281efd: jns c0281f26 <kmem_cache_alloc+0x116>
627 0.0483 :c0281eff: mov -0x1c(%ebp),%eax
:c0281f02: test %eax,%eax
:c0281f04: je c0281f26 <kmem_cache_alloc+0x116>
82 0.0063 :c0281f06: mov 0xc(%esi),%esi
2 1.5e-04 :c0281f09: mov -0x1c(%ebp),%ebx
527 0.0406 :c0281f0c: mov %esi,%ecx
:c0281f0e: mov %ebx,%edi
86 0.0066 :c0281f10: shr $0x2,%ecx
602 0.0463 :c0281f13: xor %eax,%eax
1 7.7e-05 :c0281f15: mov %esi,%edx
74845 5.7614 :c0281f17: rep stos %eax,%es:(%edi)
1170 0.0901 :c0281f19: test $0x2,%dl
2 1.5e-04 :c0281f1c: je c0281f20 <kmem_cache_alloc+0x110>
:c0281f1e: stos %ax,%es:(%edi)
600 0.0462 :c0281f20: test $0x1,%dl
:c0281f23: je c0281f26 <kmem_cache_alloc+0x116>
:c0281f25: stos %al,%es:(%edi)
1171 0.0901 :c0281f26: mov -0x1c(%ebp),%eax
199 0.0153 :c0281f29: add $0x10,%esp
:c0281f2c: pop %ebx
2 1.5e-04 :c0281f2d: pop %esi
1215 0.0935 :c0281f2e: pop %edi
548 0.0422 :c0281f2f: leave
1251 0.0963 :c0281f30: ret
:c0281f31: mov 0x10(%edx),%ecx
:c0281f34: mov %ecx,-0x1c(%ebp)
:c0281f37: mov 0x10(%esi),%eax
:c0281f3a: mov (%ecx,%eax,1),%eax
:c0281f3d: mov %eax,0x10(%edx)
:c0281f40: jmp c0281e3e <kmem_cache_alloc+0x2e>
:c0281f45: lea 0x0(%esi),%esi
:c0281f48: mov %ebx,%edx
:c0281f4a: mov %esi,%eax
:c0281f4c: call c02811d0 <flush_free_list>
:c0281f51: jmp c0281edb <kmem_cache_alloc+0xcb>
:c0281f53: lea 0x0(%esi),%esi
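
Read back into C, the sampled path in the listing above is roughly the
following. This is a userspace model reconstructed from the disassembly, not
the patch's actual source: all toy_* names are invented, the irq helpers are
stubs for the pushf/cli and popf pairs, the debug check is left out, and the
slow paths (remote free list, page refill) are reduced to returning NULL. The
zeroing flag sits in bit 15 because the cmpw/jns pair tests the sign of the
low 16 bits of the gfp flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the zeroing gfp flag: bit 15, as tested by cmpw/jns. */
#define TOY_GFP_ZERO 0x8000u

struct toy_cpu_list {
    unsigned long nr;   /* objects on the per-CPU free list (decl (%ebx)) */
    void *head;         /* first free object, or NULL (0x4(%ebx)) */
};

struct toy_cache {
    size_t objsize;     /* bytes cleared by the rep stos */
    size_t offset;      /* where a free object stores its next pointer */
    struct toy_cpu_list cpu;
};

/* Stubs for local_irq_save()/local_irq_restore(). */
static unsigned long toy_irq_save(void) { return 0; }
static void toy_irq_restore(unsigned long flags) { (void)flags; }

static void *toy_cache_alloc(struct toy_cache *s, unsigned int gfpflags)
{
    unsigned long flags;
    void *object;

    assert(s != NULL);
    flags = toy_irq_save();                 /* pushf; popl; cli */
    object = s->cpu.head;
    if (object) {
        /* Pop: the next pointer lives inside the free object itself. */
        memcpy(&s->cpu.head, (char *)object + s->offset, sizeof(void *));
        s->cpu.nr--;
    }
    toy_irq_restore(flags);                 /* pushl; popf */

    if ((gfpflags & TOY_GFP_ZERO) && object)
        memset(object, 0, s->objsize);      /* the rep stos */
    return object;
}

/* Push an object back; only here so the model can be exercised. */
static void toy_cache_free(struct toy_cache *s, void *object)
{
    memcpy((char *)object + s->offset, &s->cpu.head, sizeof(void *));
    s->cpu.head = object;
    s->cpu.nr++;
}
```

In the listing, the rep stos alone carries 74845 of the 140659 samples in
kmem_cache_alloc (about 53%), and the 26385 samples on the testl right after
the popf are most likely the cost of the popf itself.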