public inbox for linux-kernel@vger.kernel.org
* [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3
@ 2011-08-01 16:28 Christoph Lameter
  2011-08-01 16:28 ` [slub p3 1/7] slub: free slabs without holding locks (V2) Christoph Lameter
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Christoph Lameter @ 2011-08-01 16:28 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, Andi Kleen, tj, Metathronius Galabant,
	Matt Mackall, Eric Dumazet, Adrian Drzewiecki, linux-kernel

V2->V3 : Worked through the todo list. Some work remains to reduce the
         code impact and make this all cleaner. (Pekka: patches 1-3
         are cleanup patches of general usefulness. You already have #1;
         2+3 could be picked up without any issue.)

The following patchset introduces per cpu partial lists, which allow
a performance increase of around 15% when there is contention for the
node lock (can be tested using hackbench).

These lists help avoid per node lock overhead. Allocator latency
could be further reduced by making these operations work without
disabling interrupts (as the fastpath and the free slowpath already do)
and by implementing better ways of handling the cpu array with partial pages.

I am still not satisfied with the cleanliness of the code after these
changes. Review and suggestions on how to restructure the code given
these changes in operation would be appreciated.

It is interesting to note that BSD has gone to a scheme with partial
pages kept only per cpu (source: Adrian). Transfer of cpu ownership is
done using IPIs. That is probably too much overhead for our taste, but
the use of a few per cpu partial pages looks to be beneficial.

Note that there is no performance gain when there is no contention.

Performance:

				Before		After
./hackbench 100 process 200000
				Time: 2299.072	1742.454
./hackbench 100 process 20000
				Time: 224.654	182.393
./hackbench 100 process 20000
				Time: 227.126	182.780
./hackbench 100 process 20000
				Time: 219.608	182.899
./hackbench 10 process 20000
				Time: 21.769	18.756
./hackbench 10 process 20000
				Time: 21.657	18.938
./hackbench 10 process 20000
				Time: 23.193	19.537
./hackbench 1 process 20000
				Time: 2.337	2.263
./hackbench 1 process 20000
				Time: 2.223	2.271
./hackbench 1 process 20000
				Time: 2.269	2.301




Thread overview: 13+ messages
2011-08-01 16:28 [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3 Christoph Lameter
2011-08-01 16:28 ` [slub p3 1/7] slub: free slabs without holding locks (V2) Christoph Lameter
2011-08-01 16:28 ` [slub p3 2/7] slub: Remove useless statements in __slab_alloc Christoph Lameter
2011-08-01 16:28 ` [slub p3 3/7] slub: Prepare inuse field in new_slab() Christoph Lameter
2011-08-01 16:28 ` [slub p3 4/7] slub: pass kmem_cache_cpu pointer to get_partial() Christoph Lameter
2011-08-01 16:28 ` [slub p3 5/7] slub: return object pointer from get_partial() / new_slab() Christoph Lameter
2011-08-01 16:28 ` [slub p3 6/7] slub: per cpu cache for partial pages Christoph Lameter
2011-08-02 17:24   ` Christoph Lameter
2011-08-01 16:28 ` [slub p3 7/7] slub: update slabinfo tools to report per cpu partial list statistics Christoph Lameter
2011-08-02  4:15 ` [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3 David Rientjes
2011-08-02 14:10   ` Christoph Lameter
2011-08-02 16:37     ` David Rientjes
2011-08-02 16:47       ` Christoph Lameter
