linux-arch.vger.kernel.org archive mirror
* New slab allocator SLUB
@ 2007-05-08 19:10 Christoph Lameter
  2007-05-09  1:04 ` Paul Mundt
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-05-08 19:10 UTC (permalink / raw)
  To: linux-arch; +Cc: akpm

The new slab allocator SLUB has been merged and it seems that we are 
heading towards replacing SLAB completely with SLUB. This means that we 
would like to be sure that SLUB runs reliably on all platforms. SLUB is 
first available upstream in 2.6.21-git9.

One issue is that SLUB requires the use of the entire page struct for the 
management of its objects. If arch code uses the page struct as well, 
disaster strikes. As a result, SLUB has been disabled on several platforms 
by setting ARCH_USES_SLAB_PAGE_STRUCT in the Kconfig. These are:

i386:
 Uses slab for pgd handling and modifies the page structs of those pages.
 A fix has been in Andrew's tree for a while now.

PowerPC:
 Uses the slab allocator for pte allocation / freeing. The page structs 
 of ptes are also used for splitting the page table lock in large cpu 
 configurations (well over 4 cpus), causing issues. There is a patch 
 by Hugh Dickins to address the issues, but it seems that the arch 
 maintainers have now decided on a different course of action.

FRV:
 Like i386: also uses slab for pgd handling and modifies page structs.
 A fix has been sent to David Howells. Hopefully we can get this working 
 soon.

I would appreciate it if you could test SLUB on your platform and make 
sure that everything works the right way. There is a slabinfo tool in 
Documentation/vm/slabinfo.c that allows monitoring of SLUB slabs. If there 
are problems, please boot with "slub_debug" specified, which should give 
you a detailed analysis of the issues encountered.
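
For example, the option could be appended to an illustrative boot loader 
entry like this (the kernel image path and root device are placeholders):

```
kernel /boot/vmlinuz-2.6.21-git9 root=/dev/sda1 ro slub_debug
```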

There are a lot of kernel config files around that have CONFIG_SLAB=y. 
This means that the kernel will be built with SLAB and not SLUB. In order 
to build a kernel with SLUB you will need CONFIG_SLUB=y in there.
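
For reference, a .config selecting SLUB would contain lines like the 
following (the option names follow the allocator choice in the Kconfig 
as I understand it):

```
# CONFIG_SLAB is not set
CONFIG_SLUB=y
```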

Differences in the treatment of power-of-two slabs:

SLUB has a higher packing density since the control fields are placed in 
the page struct. There is no need for a control structure in the slab 
itself or for control structures in a separate slab (OFF_SLAB). This is 
only possible because SLUB does not have to maintain a map of all objects 
like SLAB does; instead we use a linked list.

In order to manage objects with linked lists we need a pointer to the 
next free object for each object. This is no problem for slab 
configurations where the object state is irrelevant after kfree or before 
kmalloc. However, if the object cannot be touched at all 
(SLAB_DESTROY_BY_RCU or the use of constructors), then SLUB must place the 
free-list pointer after the object and therefore increase the object size. 
This is particularly bad if the object is also aligned to the same 
power of two, because it means that the object size has just doubled.

For page-sized allocations, quicklists are an alternative. Moving to 
quicklists also allows continued use of the page struct fields. The 
solution for most of the three platforms above was to switch to 
quicklists instead.

If you have smaller-than-page-sized allocations that are a power of 
two and that are to be aligned to the same power of two, then it may be 
advisable to make sure that the slab does not have a constructor. 
Otherwise some memory may be wasted.

I would expect that the experimental status of SLUB will be removed 
soon. SLUB will then become the default slab allocator. It may not be the 
default when we release 2.6.22, but it is scheduled to be the default for 
2.6.23.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: New slab allocator SLUB
  2007-05-08 19:10 New slab allocator SLUB Christoph Lameter
@ 2007-05-09  1:04 ` Paul Mundt
  2007-05-09 13:36 ` Andi Kleen
  2007-05-10 22:13 ` David Miller
  2 siblings, 0 replies; 14+ messages in thread
From: Paul Mundt @ 2007-05-09  1:04 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-arch, akpm

On Tue, May 08, 2007 at 12:10:32PM -0700, Christoph Lameter wrote:
> I would appreciate if you could test SLUB on your platform and make sure 
> that everything works the right way. There is a slabinfo tool that allows 
> monitoring of SLUB slabs in Documentation/vm/slabinfo.c. If there are 
> problems then please boot specifying "slub_debug" which should give you a 
> detailed analysis of the issues encountered.
> 
It seems to hold up on SH as expected, at least. I've been running it 
for a while with varying workloads and nothing out of the ordinary has 
popped up yet.


* Re: New slab allocator SLUB
  2007-05-08 19:10 New slab allocator SLUB Christoph Lameter
  2007-05-09  1:04 ` Paul Mundt
@ 2007-05-09 13:36 ` Andi Kleen
  2007-05-09 15:52   ` Christoph Lameter
  2007-05-10 22:13 ` David Miller
  2 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2007-05-09 13:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-arch, akpm


>
> i386:
>  Uses slab for pgd handling and modifies the page structs of those
>  Fix has been in Andrew's tree for awhile now.

Should be already upstream.


> I would expect that the experimental status of SLUB will be removed
> soon. SLUB then become the default slab allocator. It may not be the
> default when we release 2.6.22 but it is scheduled to be the default for
> 2.6.23.

Assuming you fix the performance regressions first?

-Andi


* Re: New slab allocator SLUB
  2007-05-09 13:36 ` Andi Kleen
@ 2007-05-09 15:52   ` Christoph Lameter
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-05-09 15:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-arch, akpm

On Wed, 9 May 2007, Andi Kleen wrote:

> Assuming you fix the performance regressions first?

There are no unfixed performance regressions. The netperf issue with 
Cloverton has a fix in mm.



* Re: New slab allocator SLUB
  2007-05-08 19:10 New slab allocator SLUB Christoph Lameter
  2007-05-09  1:04 ` Paul Mundt
  2007-05-09 13:36 ` Andi Kleen
@ 2007-05-10 22:13 ` David Miller
  2007-05-10 22:21   ` Christoph Lameter
  2 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2007-05-10 22:13 UTC (permalink / raw)
  To: clameter; +Cc: linux-arch, akpm

From: Christoph Lameter <clameter@sgi.com>
Date: Tue, 8 May 2007 12:10:32 -0700 (PDT)

> The new slab allocator SLUB was merged and it seems that we are heading 
> towards replacing SLAB completely by SLUB. This means that we would like 
> to be sure that SLUB runs reliably on all platforms. SLUB is first
> available upstream with 2.6.21-git9.

I found a new difference in SLUB, and it currently prevents sparc64
from booting :-)

What SLAB allows you to do is define LARGE_ALLOCS but not necessarily
set MAX_ORDER large enough for the largest kmalloc SLAB.  SLAB would
ignore the kmalloc cache creation failures for these largest ones that
are over MAX_ORDER.

SLUB instead panic()'s, which isn't so nice that early in the boot.

There are a few platforms that will trigger this problem, in
fact pretty much every one that specifies LARGE_ALLOCS currently
based upon a casual scan of platform Kconfig files.

To be honest I don't think I even need LARGE_ALLOCS on sparc64 so I
think I'll just see if I can delete that, but I would suggest one of
two courses of action:

1) Make SLUB ignore kmalloc cache creation failures at least for
   the higher order ones

or

2) Detect the (PAGE_SIZE << MAX_ORDER) < LARGEST_KMALLOC_SIZE
   at compile time so that nobody gets such an early panic.

Take care.


* Re: New slab allocator SLUB
  2007-05-10 22:13 ` David Miller
@ 2007-05-10 22:21   ` Christoph Lameter
  2007-05-10 22:30     ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-05-10 22:21 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch, akpm

On Thu, 10 May 2007, David Miller wrote:

> What SLAB allows you to do is define LARGE_ALLOCS but not necessarily
> set MAX_ORDER large enough for the largest kmalloc SLAB.  SLAB would
> ignore the kmalloc cache creation failures for these largest ones that
> are over MAX_ORDER.

Hmmm... How about limiting KMALLOC_SHIFT_HIGH to max order?

---
 include/linux/slub_def.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: slub/include/linux/slub_def.h
===================================================================
--- slub.orig/include/linux/slub_def.h	2007-05-10 15:19:39.000000000 -0700
+++ slub/include/linux/slub_def.h	2007-05-10 15:20:39.000000000 -0700
@@ -59,7 +59,7 @@ struct kmem_cache {
 #define KMALLOC_SHIFT_LOW 3
 
 #ifdef CONFIG_LARGE_ALLOCS
-#define KMALLOC_SHIFT_HIGH 25
+#define KMALLOC_SHIFT_HIGH (min(25, MAX_ORDER + PAGE_SHIFT))
 #else
 #if !defined(CONFIG_MMU) || NR_CPUS > 512 || MAX_NUMNODES > 256
 #define KMALLOC_SHIFT_HIGH 20


* Re: New slab allocator SLUB
  2007-05-10 22:21   ` Christoph Lameter
@ 2007-05-10 22:30     ` David Miller
  2007-05-10 22:33       ` Christoph Lameter
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2007-05-10 22:30 UTC (permalink / raw)
  To: clameter; +Cc: linux-arch, akpm

From: Christoph Lameter <clameter@sgi.com>
Date: Thu, 10 May 2007 15:21:52 -0700 (PDT)

> On Thu, 10 May 2007, David Miller wrote:
> 
> > What SLAB allows you to do is define LARGE_ALLOCS but not necessarily
> > set MAX_ORDER large enough for the largest kmalloc SLAB.  SLAB would
> > ignore the kmalloc cache creation failures for these largest ones that
> > are over MAX_ORDER.
> 
> Hmmm... How about limiting KMALLOC_SHIFT_HIGH to max order?

That should definitely do the trick too:

Signed-off-by: David S. Miller <davem@davemloft.net>

I just confirmed that I don't actually need LARGE_ALLOCS on sparc64.
I think I needed them for some reason back when I used kmalloc() to
allocate the per-address-space TLB miss hash tables.

I think the issue was that for Niagara and later really huge TLB
hash table sizes are allowed, and I wanted to experiment with those
and the sizes were large enough to require LARGE_ALLOCS.  But now
I use SLAB for this and I cap the size at the pre-Niagara limit
of 1MB because larger sizes showed no performance gains.



* Re: New slab allocator SLUB
  2007-05-10 22:30     ` David Miller
@ 2007-05-10 22:33       ` Christoph Lameter
  2007-05-10 22:35         ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-05-10 22:33 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch, akpm

On Thu, 10 May 2007, David Miller wrote:

> From: Christoph Lameter <clameter@sgi.com>
> Date: Thu, 10 May 2007 15:21:52 -0700 (PDT)
> 
> > On Thu, 10 May 2007, David Miller wrote:
> > 
> > > What SLAB allows you to do is define LARGE_ALLOCS but not necessarily
> > > set MAX_ORDER large enough for the largest kmalloc SLAB.  SLAB would
> > > ignore the kmalloc cache creation failures for these largest ones that
> > > are over MAX_ORDER.
> > 
> > Hmmm... How about limiting KMALLOC_SHIFT_HIGH to max order?
> 
> That should definitely do the trick too:

Could you verify that it indeed does the trick?


* Re: New slab allocator SLUB
  2007-05-10 22:33       ` Christoph Lameter
@ 2007-05-10 22:35         ` David Miller
  2007-05-10 22:38           ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2007-05-10 22:35 UTC (permalink / raw)
  To: clameter; +Cc: linux-arch, akpm

From: Christoph Lameter <clameter@sgi.com>
Date: Thu, 10 May 2007 15:33:30 -0700 (PDT)

> On Thu, 10 May 2007, David Miller wrote:
> 
> > From: Christoph Lameter <clameter@sgi.com>
> > Date: Thu, 10 May 2007 15:21:52 -0700 (PDT)
> > 
> > > On Thu, 10 May 2007, David Miller wrote:
> > > 
> > > > What SLAB allows you to do is define LARGE_ALLOCS but not necessarily
> > > > set MAX_ORDER large enough for the largest kmalloc SLAB.  SLAB would
> > > > ignore the kmalloc cache creation failures for these largest ones that
> > > > are over MAX_ORDER.
> > > 
> > > Hmmm... How about limiting KMALLOC_SHIFT_HIGH to max order?
> > 
> > That should definitely do the trick too:
> 
> Could you verify that it indeed does the trick?

Sure... give me a few minutes.


* Re: New slab allocator SLUB
  2007-05-10 22:35         ` David Miller
@ 2007-05-10 22:38           ` David Miller
  2007-05-10 22:44             ` Christoph Lameter
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2007-05-10 22:38 UTC (permalink / raw)
  To: clameter; +Cc: linux-arch, akpm

From: David Miller <davem@davemloft.net>
Date: Thu, 10 May 2007 15:35:31 -0700 (PDT)

> From: Christoph Lameter <clameter@sgi.com>
> Date: Thu, 10 May 2007 15:33:30 -0700 (PDT)
> 
> > On Thu, 10 May 2007, David Miller wrote:
> > 
> > > From: Christoph Lameter <clameter@sgi.com>
> > > Date: Thu, 10 May 2007 15:21:52 -0700 (PDT)
> > > 
> > > > On Thu, 10 May 2007, David Miller wrote:
> > > > 
> > > > > What SLAB allows you to do is define LARGE_ALLOCS but not necessarily
> > > > > set MAX_ORDER large enough for the largest kmalloc SLAB.  SLAB would
> > > > > ignore the kmalloc cache creation failures for these largest ones that
> > > > > are over MAX_ORDER.
> > > > 
> > > > Hmmm... How about limiting KMALLOC_SHIFT_HIGH to max order?
> > > 
> > > That should definitely do the trick too:
> > 
> > Could you verify that it indeed does the trick?
> 
> Sure... give me a few minutes.

Ugh, it won't build, you can't use min() because this is
evaluated at compile time to compute array sizes etc.

include/linux/slub_def.h:76: error: braced-group within expression allowed only inside a function


* Re: New slab allocator SLUB
  2007-05-10 22:38           ` David Miller
@ 2007-05-10 22:44             ` Christoph Lameter
  2007-05-10 23:01               ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-05-10 22:44 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch, akpm

On Thu, 10 May 2007, David Miller wrote:

> Ugh, it won't build, you can't use min() because this is
> evaluated at compile time to compute array sizes etc.
> 
> include/linux/slub_def.h:76: error: braced-group within expression allowed only inside a function

Rats. Then we have to do it by hand. This compiles here...


---
 include/linux/slub_def.h |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: slub/include/linux/slub_def.h
===================================================================
--- slub.orig/include/linux/slub_def.h	2007-05-10 15:19:39.000000000 -0700
+++ slub/include/linux/slub_def.h	2007-05-10 15:42:57.000000000 -0700
@@ -59,7 +59,8 @@ struct kmem_cache {
 #define KMALLOC_SHIFT_LOW 3
 
 #ifdef CONFIG_LARGE_ALLOCS
-#define KMALLOC_SHIFT_HIGH 25
+#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT) < 25 ? \
+				MAX_ORDER + PAGE_SHIFT : 25)
 #else
 #if !defined(CONFIG_MMU) || NR_CPUS > 512 || MAX_NUMNODES > 256
 #define KMALLOC_SHIFT_HIGH 20
@@ -86,6 +87,9 @@ static inline int kmalloc_index(int size
 	 */
 	WARN_ON_ONCE(size == 0);
 
+	if (size >= (1UL << KMALLOC_SHIFT_HIGH))
+		return -1;
+
 	if (size > 64 && size <= 96)
 		return 1;
 	if (size > 128 && size <= 192)
 


* Re: New slab allocator SLUB
  2007-05-10 22:44             ` Christoph Lameter
@ 2007-05-10 23:01               ` David Miller
  2007-05-10 23:07                 ` Christoph Lameter
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2007-05-10 23:01 UTC (permalink / raw)
  To: clameter; +Cc: linux-arch, akpm

From: Christoph Lameter <clameter@sgi.com>
Date: Thu, 10 May 2007 15:44:12 -0700 (PDT)

> On Thu, 10 May 2007, David Miller wrote:
> 
> > Ugh, it won't build, you can't use min() because this is
> > evaluated at compile time to compute array sizes etc.
> > 
> > include/linux/slub_def.h:76: error: braced-group within expression allowed only inside a function
> 
> Rats. Then we have to do it by hand. This compiles here...

Unfortunately, still no dice with LARGE_ALLOCS=y/SLUB=y on sparc64.
It still tries to create the 16MB kmalloc cache even though MAX_ORDER
is 11. :-)

I think this is an off-by-one error, the kmalloc cache builder
iterates to >= KMALLOC_SHIFT_HIGH but your min() on MAX_ORDER would
only work if it iterated to > KMALLOC_SHIFT_HIGH.


* Re: New slab allocator SLUB
  2007-05-10 23:01               ` David Miller
@ 2007-05-10 23:07                 ` Christoph Lameter
  2007-05-11  0:05                   ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-05-10 23:07 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch, akpm

On Thu, 10 May 2007, David Miller wrote:

> Unfortunately, still no dice with LARGE_ALLOCS=y/SLUB=y on sparc64.
> It still tries to create the 16MB kmalloc cache even though MAX_ORDER
> is 11. :-)
> 
> I think this is an off-by-one error, the kmalloc cache builder
> iterates to >= KMALLOC_SHIFT_HIGH but your min() on MAX_ORDER would
> only work if it iterated to > KMALLOC_SHIFT_HIGH.

Then subtract one?

---
 include/linux/slub_def.h |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: slub/include/linux/slub_def.h
===================================================================
--- slub.orig/include/linux/slub_def.h	2007-05-10 15:19:39.000000000 -0700
+++ slub/include/linux/slub_def.h	2007-05-10 16:06:15.000000000 -0700
@@ -59,7 +59,8 @@ struct kmem_cache {
 #define KMALLOC_SHIFT_LOW 3
 
 #ifdef CONFIG_LARGE_ALLOCS
-#define KMALLOC_SHIFT_HIGH 25
+#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT) <= 25 ? \
+				MAX_ORDER + PAGE_SHIFT - 1 : 25)
 #else
 #if !defined(CONFIG_MMU) || NR_CPUS > 512 || MAX_NUMNODES > 256
 #define KMALLOC_SHIFT_HIGH 20
@@ -86,6 +87,9 @@ static inline int kmalloc_index(int size
 	 */
 	WARN_ON_ONCE(size == 0);
 
+	if (size >= (1UL << KMALLOC_SHIFT_HIGH))
+		return -1;
+
 	if (size > 64 && size <= 96)
 		return 1;
 	if (size > 128 && size <= 192)


* Re: New slab allocator SLUB
  2007-05-10 23:07                 ` Christoph Lameter
@ 2007-05-11  0:05                   ` David Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2007-05-11  0:05 UTC (permalink / raw)
  To: clameter; +Cc: linux-arch, akpm

From: Christoph Lameter <clameter@sgi.com>
Date: Thu, 10 May 2007 16:07:16 -0700 (PDT)

> On Thu, 10 May 2007, David Miller wrote:
> 
> > Unfortunately, still no dice with LARGE_ALLOCS=y/SLUB=y on sparc64.
> > It still tries to create the 16MB kmalloc cache even though MAX_ORDER
> > is 11. :-)
> > 
> > I think this is an off-by-one error, the kmalloc cache builder
> > iterates to >= KMALLOC_SHIFT_HIGH but your min() on MAX_ORDER would
> > only work if it iterated to > KMALLOC_SHIFT_HIGH.
> 
> Then subtract one?

Yep, that works.

