* Re: [patch 7/8] slub: Make the order configurable for each slab cache
From: Pekka Enberg @ 2008-02-29 8:13 UTC
To: Christoph Lameter; +Cc: Mel Gorman, Matt Mackall, linux-mm
Christoph Lameter wrote:
> Makes /sys/kernel/slab/<slabname>/order writable. The allocation
> order of a slab cache can then be changed dynamically during runtime.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> @@ -3715,11 +3720,23 @@ static ssize_t objs_per_slab_show(struct
> }
> SLAB_ATTR_RO(objs_per_slab);
>
> +static ssize_t order_store(struct kmem_cache *s,
> + const char *buf, size_t length)
> +{
> + int order = simple_strtoul(buf, NULL, 10);
> +
> + if (order > slub_max_order || order < slub_min_order)
> + return -EINVAL;
> +
> + calculate_sizes(s, order);
> + return length;
> +}
I think we either want to check that the order is big enough to hold one
object for the given cache or add a comment explaining why it can never
happen (page allocator pass-through).
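For illustration, a minimal sketch of such a check (my own, not part of the patch; it assumes s->size already accounts for any per-object metadata) could look like this in the quoted order_store():

static ssize_t order_store(struct kmem_cache *s,
				const char *buf, size_t length)
{
	int order = simple_strtoul(buf, NULL, 10);

	if (order > slub_max_order || order < slub_min_order)
		return -EINVAL;

	/*
	 * Hypothetical extra check: refuse an order so small that not
	 * even a single object of this cache fits into the slab.
	 */
	if ((PAGE_SIZE << order) < s->size)
		return -EINVAL;

	calculate_sizes(s, order);
	return length;
}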
Pekka
* Re: [patch 7/8] slub: Make the order configurable for each slab cache
From: Christoph Lameter @ 2008-02-29 19:37 UTC
To: Pekka Enberg; +Cc: Mel Gorman, Matt Mackall, linux-mm
On Fri, 29 Feb 2008, Pekka Enberg wrote:
> I think we either want to check that the order is big enough to hold one
> object for the given cache or add a comment explaining why it can never happen
> (page allocator pass-through).
Calculate_sizes() will violate max_order if the object does not fit.
* Re: [patch 7/8] slub: Make the order configurable for each slab cache
From: Pekka Enberg @ 2008-03-01 9:47 UTC
To: Christoph Lameter; +Cc: Mel Gorman, Matt Mackall, linux-mm
Hi Christoph,
On Fri, 29 Feb 2008, Pekka Enberg wrote:
> > I think we either want to check that the order is big enough to hold one
> > object for the given cache or add a comment explaining why it can never happen
> > (page allocator pass-through).
On Fri, Feb 29, 2008 at 9:37 PM, Christoph Lameter <clameter@sgi.com> wrote:
> Calculate_sizes() will violate max_order if the object does not fit.
I am not sure I understand what you mean here. For example, for a
cache that requires a minimum order of 1 to fit any objects (which
doesn't happen now because of page allocator pass-through), the
order_store() function can call calculate_sizes() with forced_order
set to zero after which the cache becomes useless. That deserves a
code comment, I think.
Pekka
* Re: [patch 7/8] slub: Make the order configurable for each slab cache
From: Christoph Lameter @ 2008-03-03 17:49 UTC
To: Pekka Enberg; +Cc: Mel Gorman, Matt Mackall, linux-mm
On Sat, 1 Mar 2008, Pekka Enberg wrote:
> I am not sure I understand what you mean here. For example, for a
> cache that requires minimum order of 1 to fit any objects (which
> doesn't happen now because of page allocator pass-through), the
> order_store() function can call calculate_sizes() with forced_order
It does happen because the page allocator pass through is only possible
for kmalloc allocations.
> set to zero after which the cache becomes useless. That deserves a
> code comment, I think.
If the object does not fit into a page then calculate_sizes will violate
max_order (if necessary) in order to make sure that an allocation is
possible.
* Re: [patch 7/8] slub: Make the order configurable for each slab cache
From: Pekka Enberg @ 2008-03-03 22:56 UTC
To: Christoph Lameter; +Cc: Mel Gorman, Matt Mackall, linux-mm
Christoph Lameter wrote:
> On Sat, 1 Mar 2008, Pekka Enberg wrote:
>
>> I am not sure I understand what you mean here. For example, for a
>> cache that requires minimum order of 1 to fit any objects (which
>> doesn't happen now because of page allocator pass-through), the
>> order_store() function can call calculate_sizes() with forced_order
>
> It does happen because the page allocator pass through is only possible
> for kmalloc allocations.
>
>> set to zero after which the cache becomes useless. That deserves a
>> code comment, I think.
>
> If the object does not fit into a page then calculate_sizes will violate
> max_order (if necessary) in order to make sure that an allocation is
> possible.
Hmm, I seem to be missing something here. For page size of 4KB, object
size of 8KB, and min_order of zero, when I write zero order to
/sys/kernel/slab/<cache>/order the kernel won't crash because...?
Pekka
* Re: [patch 7/8] slub: Make the order configurable for each slab cache
From: Christoph Lameter @ 2008-03-03 23:36 UTC
To: Pekka Enberg; +Cc: Mel Gorman, Matt Mackall, linux-mm
On Tue, 4 Mar 2008, Pekka Enberg wrote:
> Hmm, I seem to be missing something here. For page size of 4KB, object size of
> 8KB, and min_order of zero, when I write zero order to
> /sys/kernel/slab/<cache>/order the kernel won't crash because...?
The slab allocator will override and force the use of order 1 allocations.
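A rough sketch of the override being described (my illustration, not the actual slub.c code): whatever order was requested, it is clamped upwards to the order a single object needs, so an 8KB object on a 4KB-page system always ends up with at least order 1:

static int effective_order(struct kmem_cache *s, int requested_order)
{
	/* get_order(8192) == 1 with 4KB pages */
	int min_order = get_order(s->size);

	return max(requested_order, min_order);
}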
* Re: [patch 6/8] slub: Adjust order boundaries and minimum objects per slab.
From: Pekka Enberg @ 2008-02-29 8:19 UTC
To: Christoph Lameter; +Cc: Mel Gorman, Matt Mackall, linux-mm
Christoph Lameter wrote:
> Since there is now no worry anymore about higher order allocs (hopefully)
> increase the minimum of objects per slab to 60 so that slub can reach a
> similar fastpath/slowpath ratio as slab. Set the max order to default to
> 4 (64k) and require slub to use a higher order if a certain object density
> cannot be reached.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
I can see why you want to change the defaults for big iron but why not
keep the existing PAGE_SHIFT check which leaves embedded and regular
desktop unchanged?
Pekka
* Re: [patch 6/8] slub: Adjust order boundaries and minimum objects per slab.
From: Christoph Lameter @ 2008-02-29 19:41 UTC
To: Pekka Enberg; +Cc: Mel Gorman, Matt Mackall, linux-mm
On Fri, 29 Feb 2008, Pekka Enberg wrote:
> I can see why you want to change the defaults for big iron but why not keep
> the existing PAGE_SHIFT check which leaves embedded and regular desktop
> unchanged?
The defaults for slab are also 60 objects per slab. The PAGE_SHIFT says
nothing about the big iron. Our new big irons have a page shift of 12 and
are x86_64.
We could drop the limit if CONFIG_EMBEDDED is set but then this may waste
space. A higher order allows slub to reach a higher object density (in
particular for objects of 500-2000 bytes in size).
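To make the density argument concrete (numbers are my own illustration and ignore per-object metadata): with 4096-byte pages, a 700-byte object gives 5 objects per order-0 slab with 596 bytes (about 14.5%) wasted, but 46 objects per order-3 slab (32768 bytes) with only 568 bytes (about 1.7%) wasted.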
* Re: [patch 6/8] slub: Adjust order boundaries and minimum objects per slab.
From: Pekka J Enberg @ 2008-03-01 9:58 UTC
To: Christoph Lameter; +Cc: Mel Gorman, Matt Mackall, linux-mm
Hi Christoph,
On Fri, 29 Feb 2008, Pekka Enberg wrote:
> > I can see why you want to change the defaults for big iron but why not keep
> > the existing PAGE_SHIFT check which leaves embedded and regular desktop
> > unchanged?
On Fri, 29 Feb 2008, Christoph Lameter wrote:
> The defaults for slab are also 60 objects per slab. The PAGE_SHIFT says
> nothing about the big iron. Our new big irons have a page shift of 12 and
> are x86_64.
Where is that objects per slab limit? I only see calculate_slab_order()
trying out a bunch of page orders until we hit "acceptable" internal
fragmentation. Also keep in mind how badly SLAB compares to SLUB and SLOB
in terms of memory efficiency.
Maybe we can use the total amount of memory as some sort of heuristic to
determine the defaults? That way boxes with lots of memory get to use
larger orders for better performance whereas smaller boxes are more
memory efficient.
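Such a heuristic could look roughly like this (purely a hypothetical sketch, not proposed code; the thresholds are made up):

static int __init default_slub_max_order(void)
{
	/* totalram_pages is in pages; convert to megabytes. */
	unsigned long mb = totalram_pages >> (20 - PAGE_SHIFT);

	if (mb < 256)		/* embedded / small boxes */
		return 1;
	if (mb < 4096)		/* typical desktops */
		return 2;
	return 4;		/* big iron */
}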
On Fri, 29 Feb 2008, Christoph Lameter wrote:
> We could drop the limit if CONFIG_EMBEDDED is set but then this may waste
> space. A higher order allows slub to reach a higher object density (in
> particular for objects 500-2000 bytes size).
I am more worried about memory allocated for objects that are not used
rather than memory wasted due to bad fitting.
Pekka
* Re: [patch 6/8] slub: Adjust order boundaries and minimum objects per slab.
From: Christoph Lameter @ 2008-03-03 17:52 UTC
To: Pekka J Enberg; +Cc: Mel Gorman, Matt Mackall, linux-mm
On Sat, 1 Mar 2008, Pekka J Enberg wrote:
> On Fri, 29 Feb 2008, Christoph Lameter wrote:
> > The defaults for slab are also 60 objects per slab. The PAGE_SHIFT says
> > nothing about the big iron. Our new big irons have a page shift of 12 and
> > are x86_64.
>
> Where is that objects per slab limit? I only see calculate_slab_order()
> trying out bunch of page orders until we hit "acceptable" internal
> fragmentation. Also keep in mind how badly SLAB compares to SLUB and SLOB
> in terms of memory efficiency.
slub_min_objects sets that limit.
> On Fri, 29 Feb 2008, Christoph Lameter wrote:
> > We could drop the limit if CONFIG_EMBEDDED is set but then this may waste
> > space. A higher order allows slub to reach a higher object density (in
> > particular for objects 500-2000 bytes size).
>
> I am more worried about memory allocated for objects that are not used
> rather than memory wasted due to bad fitting.
Is there any way to quantify this? This is likely only an effect that
mostly matters for rarely used slabs (the merging reduces that effect).
F.e. fitting more inodes or dentries into a single slab increases object
density.
* Re: [patch 6/8] slub: Adjust order boundaries and minimum objects per slab.
From: Matt Mackall @ 2008-03-03 21:34 UTC
To: Christoph Lameter; +Cc: Pekka J Enberg, Mel Gorman, linux-mm
On Mon, Mar 03, 2008 at 09:52:55AM -0800, Christoph Lameter wrote:
> On Sat, 1 Mar 2008, Pekka J Enberg wrote:
>
> > On Fri, 29 Feb 2008, Christoph Lameter wrote:
> > > The defaults for slab are also 60 objects per slab. The PAGE_SHIFT says
> > > nothing about the big iron. Our new big irons have a page shift of 12 and
> > > are x86_64.
> >
> > Where is that objects per slab limit? I only see calculate_slab_order()
> > trying out bunch of page orders until we hit "acceptable" internal
> > fragmentation. Also keep in mind how badly SLAB compares to SLUB and SLOB
> > in terms of memory efficiency.
>
> slub_min_objects sets that limit.
>
> > On Fri, 29 Feb 2008, Christoph Lameter wrote:
> > > We could drop the limit if CONFIG_EMBEDDED is set but then this may waste
> > > space. A higher order allows slub to reach a higher object density (in
> > > particular for objects 500-2000 bytes size).
> >
> > I am more worried about memory allocated for objects that are not used
> > rather than memory wasted due to bad fitting.
>
> Is there any way to quantify this? This is likely only an effect that
> mostly matters for rarely used slabs (the merging reduces that effect).
> F.e. fitting more inodes or dentries into a single slab increases object
> density.
On the other hand, a single object can now pin 64k in memory rather
than 4k. So when we collapse some cache under memory pressure, we're
not likely to free as much.
I know you've put a lot of effort into dealing with the dcache and
icache instances of this, but this could very well offset most of that.
Also, we might consider only allocating an order-1 slab if we've
filled an order-0, and so on. When we hit pressure, we kick our
order counter back to 0.
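Something along these lines, perhaps (purely a hypothetical sketch; current_order is an assumed per-cache field, not an existing kmem_cache member):

static struct page *allocate_slab_escalating(struct kmem_cache *s, gfp_t flags)
{
	struct page *page;

	/* Try the current escalation level without retrying hard. */
	page = alloc_pages(flags | __GFP_NORETRY | __GFP_NOWARN,
						s->current_order);
	if (page) {
		/* Success: allow the next slab to be one order larger. */
		if (s->current_order < slub_max_order)
			s->current_order++;
		return page;
	}

	/* Pressure: drop back to the smallest order that fits one object. */
	s->current_order = get_order(s->size);
	return alloc_pages(flags, s->current_order);
}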
--
Mathematics is the supreme nostalgia of our time.
* Re: [patch 6/8] slub: Adjust order boundaries and minimum objects per slab.
From: Christoph Lameter @ 2008-03-03 22:36 UTC
To: Matt Mackall; +Cc: Pekka J Enberg, Mel Gorman, linux-mm
On Mon, 3 Mar 2008, Matt Mackall wrote:
> On the other hand, a single object can now pin 64k in memory rather
> than 4k. So when we collapse some cache under memory pressure, we're
> not likely to free as much.
Right.
> I know you've put a lot of effort into dealing with the dcache and
> icache instances of this, but this could very well offset most of that.
I developed and tested the icache and dcache stuff with order 3 allocs
(when mm still had the initial higher order page use without fallbacks).
> Also, we might consider only allocating an order-1 slab if we've
> filled an order-0, and so on. When we hit pressure, we kick our
> order counter back to 0.
Hmmmm... Interesting idea. Is doable now since the size of the individual
slab is no longer fixed.
* Re: [patch 3/8] slub: Update statistics handling for variable order slabs
From: Pekka Enberg @ 2008-02-29 8:59 UTC
To: Christoph Lameter; +Cc: Mel Gorman, Matt Mackall, linux-mm
Christoph Lameter wrote:
> Change the statistics to consider that slabs of the same slabcache
> can have different number of objects in them since they may be of
> different order.
>
> Provide a new sysfs field
>
> total_objects
>
> which shows the total objects that the allocated slabs of a slabcache
> could hold.
>
> Update the description of the objects field in the kmem_cache structure.
> Its role is now to be the limit of the maximum number of objects per slab
> if a slab is allocated with the largest possible order.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
* Re: [patch 3/8] slub: Update statistics handling for variable order slabs
From: Pekka Enberg @ 2008-03-01 10:29 UTC
To: Christoph Lameter; +Cc: Mel Gorman, Matt Mackall, linux-mm
Hi Christoph,
On Fri, Feb 29, 2008 at 9:43 PM, Christoph Lameter <clameter@sgi.com> wrote:
> Hmmm... I get some weird numbers when I use slabinfo but cannot spot the
> issue. Could you look a bit closer at this? In particular at the slabinfo
> emulation?
What kind of weird numbers? Unfortunately the patch still looks
correct to me so it might be an integer overflow issue...
On Fri, Feb 29, 2008 at 6:48 AM, Christoph Lameter <clameter@sgi.com> wrote:
> @@ -4331,7 +4367,9 @@ static int s_show(struct seq_file *m, vo
> unsigned long nr_partials = 0;
nr_partials is no longer read so you can remove it.
> unsigned long nr_slabs = 0;
> unsigned long nr_inuse = 0;
No need to initialize nr_inuse to zero here.
> - unsigned long nr_objs;
> + unsigned long nr_objs = 0;
> + unsigned long nr_partial_inuse = 0;
> + unsigned long nr_partial_total = 0;
> struct kmem_cache *s;
> int node;
>
> @@ -4345,14 +4383,15 @@ static int s_show(struct seq_file *m, vo
>
> nr_partials += n->nr_partial;
> nr_slabs += atomic_long_read(&n->nr_slabs);
> - nr_inuse += count_partial(n);
> + nr_objs += atomic_long_read(&n->total_objects);
So does ->total_objects contain the total number of objects (not
necessarily in use) including the partial list or not? AFAICT it
_does_ include slabs in the partial list too so nr_objs is correct
here.
> + nr_partial_inuse += count_partial_inuse(n);
> + nr_partial_total += count_partial_total(s, n);
> }
>
> - nr_objs = nr_slabs * s->objects;
> - nr_inuse += (nr_slabs - nr_partials) * s->objects;
> + nr_inuse = nr_objs - (nr_partial_total - nr_partial_inuse);
So if nr_objs contains the total number of objects in all slabs
including those that are in the partial list, this looks correct also.
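As a quick sanity check of the arithmetic with made-up numbers: a cache with 10 slabs and nr_objs = 100 slots, of which 3 slabs sit on the partial lists with nr_partial_total = 30 slots and nr_partial_inuse = 12 allocated objects, has 7 full slabs accounting for 70 in-use objects. The true in-use count is 70 + 12 = 82, which matches nr_objs - (nr_partial_total - nr_partial_inuse) = 100 - (30 - 12) = 82.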
Pekka
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Mel Gorman @ 2008-03-04 12:20 UTC
To: Christoph Lameter; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On (28/02/08 20:48), Christoph Lameter didst pronounce:
> This is the patchset that was posted two weeks ago modified according
> to the feedback that Pekka gave. I would like to put these patches
> into mm.
>
I haven't reviewed the patches properly but I put them through a quick test
against 2.6.25-rc3 to see what the performance and superpage allocation
success rates were like. Performance-wise, it looked like:
                                  Loss to Gain
Kernbench Elapsed time           -0.64%    0.32%
Kernbench Total time             -0.61%    0.48%
Hackbench sockets-12 clients     -2.95%    5.13%
Hackbench pipes-12 clients      -16.95%    9.27%
TBench 4 clients                 -1.98%    8.2%
DBench 4 clients (ext2)          -5.9%     7.99%
So, running with the high orders is not a clear-cut win to my eyes. What
did you test to show that it was a general win justifying a high order by
default? From looking through, tbench seems to be the only obvious one to
gain; for the rest it is not clear at all. I'll try to give sysbench a spin
later to see if it is clear-cut.
However, in *all* cases, superpage allocations were less successful and in
some cases severely regressed (one machine went from an 81% success rate
to 36%). Sufficient statistics were not gathered to see in retrospect why
this happened, but my suspicion would be that high-order RECLAIMABLE and
UNMOVABLE slub allocations routinely fall back to the less fragmented
MOVABLE pageblocks with these patches - something that is normally a very
rare event. This change in assumption hurts fragmentation avoidance and
chances are the long-term behaviour of these patches is not great.
If this guess is correct, using a high-order size by default is a bad plan
and it should only be set when it is known that the target workload benefits
and superpage allocations are not a concern. Alternatively, set high-order by
default only for a limited number of caches that are RECLAIMABLE (or better
yet ones we know can be directly reclaimed with the slub-defrag patches).
As it is, this is painful from a fragmentation perspective and the
performance win is not clear-cut.
> This patchset makes slub capable of handling arbitrary sizes of pages.
> This means that a slab cache that currently uses order 1 because of
> packing density issues can fallback to order 0 allocations if memory
> becomes fragmented. All allocations for objects <= PAGE_SIZE can fall
> back like that. So a single slab may contain various sizes of pages
> that may contain more or less objects.
>
> On the other hand it also enables slub to use larger page orders by
> default since it is now no problem to fall back to an order 0 alloc.
> The default max order is set to 4 which means that 64K compound pages
> can be used in some situations for large objects that do not fit into smaller
> pages. This in turn increases the number of times slub can use its
> fastpath before a fallback to the page allocator has to occur.
>
> The patchset realizes the initial intend of providing a feature
> comparable with the per cpu queue size in slab. The order for
> each slab cache can be configured from user space while the system
> is running. Increasing the default allocation order can be used to
> tune slub like slab.
>
> The allocated sizes can then also be effectively controlled via boot
> parameters (slub_min_order and slub_max_order).
>
> The patchset is also available via git
>
> git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm.git slab-mm
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Christoph Lameter @ 2008-03-04 18:53 UTC
To: Mel Gorman; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On Tue, 4 Mar 2008, Mel Gorman wrote:
> Loss to Gain
> Kernbench Elapsed time -0.64% 0.32%
> Kernbench Total time -0.61% 0.48%
> Hackbench sockets-12 clients -2.95% 5.13%
> Hackbench pipes-12 clients -16.95% 9.27%
> TBench 4 clients -1.98% 8.2%
> DBench 4 clients (ext2) -5.9% 7.99%
>
> So, running with the high orders is not a clear-cut win to my eyes. What
> did you test to show that it was a general win justifying a high-order by
> default? From looking through, tbench seems to be the only obvious one to
> gain but the rest, it is not clear at all. I'll try give sysbench a spin
> later to see if it is clear-cut.
Hmmm... Interesting. The tests that I did a while ago were with max order
3. The patch as it is now has max order 4. Maybe we need to reduce the order?
Looks like this was mostly a gain except for hackbench, which is to be
expected since the benchmark hands out objects from the same slab round
robin to different cpus. The higher the number of objects in the slab, the
higher the chance of contention on the slab lock.
> However, in *all* cases, superpage allocations were less successful and in
> some cases it was severely regressed (one machine went from 81% success rate
> to 36%). Sufficient statistics are not gathered to see why this happened
> in retrospect but my suspicion would be that high-order RECLAIMABLE and
> UNMOVABLE slub allocations routinely fall back to the less fragmented
> MOVABLE pageblocks with these patches - something that is normally a very
> rare event. This change in assumption hurts fragmentation avoidance and
> chances are the long-term behaviour of these patches is not great.
Superpage allocations means huge page allocations? Enable slub statistics
and you will be able to see the number of fallbacks in
/sys/kernel/slab/xx/order_fallback to confirm your suspicions.
How would the allocator be able to get MOVABLE allocations? Is fallback
permitted for order 0 allocs to MOVABLE?
> If this guess is correct, using a high-order size by default is a bad plan
> and it should only be set when it is known that the target workload benefits
> and superpage allocations are not a concern. Alternative, set high-order by
> default only for a limited number of caches that are RECLAIMABLE (or better
> yet ones we know can be directly reclaimed with the slub-defrag patches).
>
> As it is, this is painful from a fragmentation perspective and the
> performance win is not clear-cut.
Could we reduce the max order to 3 and see what happens then?
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Mel Gorman @ 2008-03-05 18:28 UTC
To: Christoph Lameter; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On (04/03/08 10:53), Christoph Lameter didst pronounce:
> On Tue, 4 Mar 2008, Mel Gorman wrote:
>
> > Loss to Gain
> > Kernbench Elapsed time -0.64% 0.32%
> > Kernbench Total time -0.61% 0.48%
> > Hackbench sockets-12 clients -2.95% 5.13%
> > Hackbench pipes-12 clients -16.95% 9.27%
> > TBench 4 clients -1.98% 8.2%
> > DBench 4 clients (ext2) -5.9% 7.99%
> >
> > So, running with the high orders is not a clear-cut win to my eyes. What
> > did you test to show that it was a general win justifying a high-order by
> > default? From looking through, tbench seems to be the only obvious one to
> > gain but the rest, it is not clear at all. I'll try give sysbench a spin
> > later to see if it is clear-cut.
>
> Hmmm... Interesting. The tests that I did awhile ago were with max order
> 3. The patch as is now has max order 4. Maybe we need to reduce the order?
>
> Looks like this was mostly a gain except for hackbench. Which is to be
> expected since the benchmark shelves out objects from the same slab round
> robin to different cpus. The higher the number of objects in the slab the
> higher the chance of contention on the slab lock.
>
Ok, I'm officially a tool. I had named the patchsets wrong and tested slub-defrag
instead of slub-highorder. I didn't notice until I opened the diff file to
set the max_order. slub-highorder is being tested at the moment but it'll
be hours before it completes.
FWIW, the comments in that mail apply to slub-defrag instead. There are definite
performance alterations with the patches but that is hardly a surprise.
sysbench suffered in some cases but it wasn't clear why; with small
pages it might regress, with huge pages not at all. So there may be
alloc/free batch patterns that perform particularly badly.
What is a major surprise is that it hurt huge page allocations so severely
in some cases. That doesn't make a lot of sense.
> > However, in *all* cases, superpage allocations were less successful and in
> > some cases it was severely regressed (one machine went from 81% success rate
> > to 36%). Sufficient statistics are not gathered to see why this happened
> > in retrospect but my suspicion would be that high-order RECLAIMABLE and
> > UNMOVABLE slub allocations routinely fall back to the less fragmented
> > MOVABLE pageblocks with these patches - something that is normally a very
> > rare event. This change in assumption hurts fragmentation avoidance and
> > chances are the long-term behaviour of these patches is not great.
>
> Superpage allocations means huge page allocations?
yes
> Enable slub statistics
> and you will be able to see the number of fallbacks in
> /sys/kernel/slab/xx/order_fallback to confirm your suspicions.
>
> How would the allocator be able to get MOVABLE allocations? Is fallback
> permitted for order 0 allocs to MOVABLE?
>
Yes as the alternative may be failing allocations. It's avoided where
possible.
> > If this guess is correct, using a high-order size by default is a bad plan
> > and it should only be set when it is known that the target workload benefits
> > and superpage allocations are not a concern. Alternative, set high-order by
> > default only for a limited number of caches that are RECLAIMABLE (or better
> > yet ones we know can be directly reclaimed with the slub-defrag patches).
> >
> > As it is, this is painful from a fragmentation perspective and the
> > performance win is not clear-cut.
>
> Could we reduce the max order to 3 and see what happens then?
>
When the order-4 figures come through I'll post them. If they are
unexpected, I'll run with order 3. Unconditionally, I'll check order-1
as suggested by Matt.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Christoph Lameter @ 2008-03-05 18:52 UTC
To: Mel Gorman; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On Wed, 5 Mar 2008, Mel Gorman wrote:
> Ok, I'm offically a tool. I had named patchsets wrong and tested slub-defrag
> instead of slub-highorder. I didn't notice until I opened the diff file to
> set the max_order. slub-highorder is being tested at the moment but it'll
> be hours before it completes.
Tool? Never heard it before. Is that an Irish term? Do not worry. That
happens all the time in the computer industry. These days, I get
suspicious when people claim something is perfect (100% yes!).
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Mel Gorman @ 2008-03-06 22:04 UTC
To: Christoph Lameter; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On (05/03/08 10:52), Christoph Lameter didst pronounce:
> On Wed, 5 Mar 2008, Mel Gorman wrote:
>
> > Ok, I'm offically a tool. I had named patchsets wrong and tested slub-defrag
> > instead of slub-highorder. I didn't notice until I opened the diff file to
> > set the max_order. slub-highorder is being tested at the moment but it'll
> > be hours before it completes.
>
> Tool? Never heard it before. Is that an Irish term?
Possibly. It's not very complimentary either way :)
> Do not worry. That
> happens all the time in the computer industry. These days, I get
> suspicious when people claim something is perfect (100% yes!).
>
Let's try this again. The range of performance losses/gains with order 4
was:
Kernbench Elapsed time -0.16 to 6.23
Kernbench Total CPU -0.14 to 0.14
Hackbench pipes-1 -5.57 to 8.03
Hackbench pipes-4 -13.16 to 6.00
Hackbench pipes-8 -14.02 to 6.13
Hackbench pipes-16 -27.02 to 1.45
Hackbench sockets-1 0.00 to 12.90
Hackbench sockets-4 3.02 to 13.68
Hackbench sockets-8 2.57 to 13.40
Hackbench sockets-16 2.55 to 14.47
TBench clients-1 -3.96 to 3.10
TBench clients-2 -2.46 to 4.52
TBench clients-4 -3.65 to 5.03
TBench clients-8 -4.73 to 5.14
DBench clients-1-ext2 -1.82 to 8.12
DBench clients-2-ext2 -8.10 to 6.40
DBench clients-4-ext2 -7.96 to 3.97
DBench clients-8-ext2 0.89 to 7.54
At first glance, hackbench-pipes seemed to be hit hardest but in reality
it was one machine that showed wildly different values (bl6-13 from
TKO). When this machine is omitted, it looks like
Hackbench pipes-1 -3.75 to 8.03
Hackbench pipes-4 -5.26 to 6.00
Hackbench pipes-8 -4.56 to 6.13
Hackbench pipes-16 -5.28 to 1.45
Hackbench sockets-1 0.20 to 12.90
Hackbench sockets-4 3.02 to 11.92
Hackbench sockets-8 2.57 to 12.86
Hackbench sockets-16 2.55 to 14.47
Still a fairly wide variance but not as negative. DBench was kind of the
same. There was a different machine that appeared to suffer particularly.
With both machines omitted we get for DBench
DBench clients-1-ext2 -1.82 to 3.84
DBench clients-2-ext2 0.04 to 4.07
DBench clients-4-ext2 -0.17 to 3.97
DBench clients-8-ext2 0.89 to 4.21
(perversely, the machine with particularly bad hackbench results had
some of the best dbench results, go figure)
For huge page allocation success rates, the high order never helped the
situation but it was nowhere near as severe as it was for the slub-defrag
patches (ironically enough). Only one machine showed significantly worse
results. The rest were comparable for this set of tests at least but I would
still be wary of the long-lived behaviour of high-order slab allocations
slowly fragmenting memory due to pageblock fallbacks. Will think of how to
prove that in some way but just re-running the tests multiple times
without reboot may be enough.
It's more or less the same observation. Going with a higher order by
default wins heavily on some machines but occasionally loses badly as
well. Based on this, it's difficult to know which is more likely. I'll
start the sysbench tests to see what happens there.
Setting the order to 3 had vaguely similar results. The two outlier
machines had even worse negatives than order-4. With those machines
omitted the results were
Kernbench Elapsed time -0.28 to 0.16
Kernbench Total CPU -0.31 to 0.07
Hackbench pipes-1 -6.55 to 5.95
Hackbench pipes-4 -4.90 to 3.30
Hackbench pipes-8 -3.63 to 0.96
Hackbench pipes-16 -4.47 to 2.06
Hackbench sockets-1 0.80 to 10.04
Hackbench sockets-4 3.20 to 10.80
Hackbench sockets-8 1.61 to 13.69
Hackbench sockets-16 3.88 to 15.45
TBench clients-1 -3.97 to 1.44
TBench clients-2 -3.00 to 1.61
TBench clients-4 -1.83 to 3.26
TBench clients-8 -0.35 to 9.80
DBench clients-1-ext2 -0.47 to 2.99
DBench clients-2-ext2 -1.45 to 2.25
DBench clients-4-ext2 -0.48 to 5.09
With the two machines included, it's
Kernbench Elapsed time -0.28 to 7.77
Kernbench Total CPU -0.31 to 0.10
Hackbench pipes-1 -6.55 to 5.95
Hackbench pipes-4 -8.00 to 3.30
Hackbench pipes-8 -25.87 to 0.96
Hackbench pipes-16 -24.74 to 2.06
Hackbench sockets-1 0.80 to 10.04
Hackbench sockets-4 3.20 to 10.80
Hackbench sockets-8 1.61 to 14.17
Hackbench sockets-16 2.42 to 15.45
TBench clients-1 -3.97 to 1.44
TBench clients-2 -3.00 to 1.63
TBench clients-4 -1.83 to 3.26
TBench clients-8 -3.15 to 9.80
DBench clients-1-ext2 -0.47 to 3.22
DBench clients-2-ext2 -11.41 to 10.53
DBench clients-4-ext2 -26.95 to 5.09
DBench clients-8-ext2 -5.75 to 5.50
Same story, hackbench-pipes and dbench suffer badly on some machines.
It's a similar story for order-1. With the machines omitted it's
Kernbench Elapsed time -0.14 to 0.24
Kernbench Total CPU -0.13 to 0.11
Hackbench pipes-1 -11.90 to 5.39
Hackbench pipes-4 -7.01 to 2.06
Hackbench pipes-8 -5.49 to 1.66
Hackbench pipes-16 -6.08 to 2.72
Hackbench sockets-1 0.28 to 6.99
Hackbench sockets-4 0.63 to 5.50
Hackbench sockets-8 -10.95 to 7.70
Hackbench sockets-16 0.64 to 12.16
TBench clients-1 -3.94 to 1.05
TBench clients-2 -11.96 to 3.25
TBench clients-4 -12.48 to -1.12
TBench clients-8 -11.82 to -8.56
DBench clients-1-ext2 -12.20 to 2.27
DBench clients-2-ext2 -4.23 to 0.57
DBench clients-4-ext2 -2.31 to 3.96
DBench clients-8-ext2 -3.65 to 6.09
Included, it's
Kernbench Elapsed time -0.14 to 7.53
Kernbench Total CPU -0.14 to 0.53
Hackbench pipes-1 -18.85 to 5.39
Hackbench pipes-4 -18.93 to 2.06
Hackbench pipes-8 -14.24 to 1.66
Hackbench pipes-16 -12.82 to 2.72
Hackbench sockets-1 -4.89 to 6.99
Hackbench sockets-4 0.63 to 5.51
Hackbench sockets-8 -10.95 to 8.75
Hackbench sockets-16 -0.43 to 12.16
TBench clients-1 -4.72 to 1.05
TBench clients-2 -12.81 to 3.25
TBench clients-4 -19.15 to -1.12
TBench clients-8 -21.81 to -8.56
DBench clients-1-ext2 -12.20 to 2.27
DBench clients-2-ext2 -4.23 to 6.93
DBench clients-4-ext2 -7.13 to 3.96
DBench clients-8-ext2 -3.65 to 6.09
Based on this set of tests, it's clear that raising the order can be a big
win but setting it as default is less clear-cut.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Christoph Lameter @ 2008-03-06 22:18 UTC
To: Mel Gorman; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On Thu, 6 Mar 2008, Mel Gorman wrote:
> For huge page allocation success rates, the high order never helper the
> situation but it was nowhere near as severe as it was for the slub-defrag
> patches (ironically enough). Only one machine showed significantly worse
Well the slub-defrag tree is not really in shape for testing at this
point and I was working on it the last week. So not sure what tree was
picked up and thus not sure what to deduce from it. It may be too
aggressive in defragmentation attempts.
> results. The rest were comparable for this set of tests at least but I would
> still be wary of the long-lived behaviour of high-order slab allocations
> slowly fragmenting memory due to pageblock fallbacks. Will think of how to
> prove that in some way but just re-running the tests multiple times
> without reboot may be enough.
Well maybe we could tune the page allocator a bit? There is the order 0
issue. We could also make all slab allocations use the same slab order in
order to reduce fragmentation problems.
> Setting the order to 3 had vaguely similar results. The two outlier
> machines had even worse negatives than order-4. With those machines
> omitted the results were
Wonder what made them go worse.
> Same story, hackbench-pipes and dbench suffer badly on some machines.
> It's a similar story for order-1. With machine omitted it's
>
> Kernbench Elapsed time -0.14 to 0.24
> Kernbench Total CPU -0.13 to 0.11
> Hackbench pipes-1 -11.90 to 5.39
> Hackbench pipes-4 -7.01 to 2.06
> Hackbench pipes-8 -5.49 to 1.66
> Hackbench pipes-16 -6.08 to 2.72
> Hackbench sockets-1 0.28 to 6.99
> Hackbench sockets-4 0.63 to 5.50
> Hackbench sockets-8 -10.95 to 7.70
> Hackbench sockets-16 0.64 to 12.16
> TBench clients-1 -3.94 to 1.05
> TBench clients-2 -11.96 to 3.25
> TBench clients-4 -12.48 to -1.12
> TBench clients-8 -11.82 to -8.56
> DBench clients-1-ext2 -12.20 to 2.27
> DBench clients-2-ext2 -4.23 to 0.57
> DBench clients-4-ext2 -2.31 to 3.96
> DBench clients-8-ext2 -3.65 to 6.09
Well in that case there is something very strange going on performance
wise. The results should be equal to upstream since the same orders
are used. The only change in the hotpaths is another lookup which cannot
really account for the variances we see here. A 12% improvement because
logic was added to the hotpath? There should be a significant regression
in tbench (2%-4%) because the 4k slab cache must cause trouble.
> Based on this set of tests, it's clear that raising the order can be a big
> win but setting it as default is less clear-cut.
There is something wrong here and we need to figure out what it is. The
order-1 test should fairly accurately reproduce upstream performance
characteristics.
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Mel Gorman @ 2008-03-07 12:17 UTC
To: Christoph Lameter; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On (06/03/08 14:18), Christoph Lameter didst pronounce:
> On Thu, 6 Mar 2008, Mel Gorman wrote:
>
> > For huge page allocation success rates, the high order never helper the
> > situation but it was nowhere near as severe as it was for the slub-defrag
> > patches (ironically enough). Only one machine showed significantly worse
>
> Well the slub-defrag tree is not really in shape for testing at this
> point and I was working on it the last week. So not sure what tree was
> picked up and thus not sure what to deduce from it. It may be too
> aggressive in defragmentation attempts.
>
That sounds fair, I didn't make any attempt to figure out what was going
on. But minimally, what I tested didn't blow up so that in itself is a
plus. We'll pick it up again later.
> > results. The rest were comparable for this set of tests at least but I would
> > still be wary of the long-lived behaviour of high-order slab allocations
> > slowly fragmenting memory due to pageblock fallbacks. Will think of how to
> > prove that in some way but just re-running the tests multiple times
> > without reboot may be enough.
>
> Well maybe we could tune the page allocator a bit? There is the order 0
> issue. We could also make all slab allocations use the same slab order in
> order to reduce fragmentation problems.
>
I don't think it would reduce them unless everyone was always using the
same order. Once slub is using a higher order than everywhere else, it
is possible it will use an alternative pageblock type just for the high
order.
The only tuning of the page allocator I can think of is to teach
rmqueue_bulk() to use fewer, high-order allocations to batch-refill
the pcp queues. It's not very straightforward though; when I tried
this a bit over a year ago, it caused fragmentation problems of its own.
I'll see about trying again.
> > Setting the order to 3 had vaguely similar results. The two outlier
> > machines had even worse negatives than order-4. With those machines
> > omitted the results were
>
> Wonder what made them go worse.
>
No idea.
> > Same story, hackbench-pipes and dbench suffer badly on some machines.
> > It's a similar story for order-1. With machine omitted it's
> >
> > Kernbench Elapsed time -0.14 to 0.24
> > Kernbench Total CPU -0.13 to 0.11
> > Hackbench pipes-1 -11.90 to 5.39
> > Hackbench pipes-4 -7.01 to 2.06
> > Hackbench pipes-8 -5.49 to 1.66
> > Hackbench pipes-16 -6.08 to 2.72
> > Hackbench sockets-1 0.28 to 6.99
> > Hackbench sockets-4 0.63 to 5.50
> > Hackbench sockets-8 -10.95 to 7.70
> > Hackbench sockets-16 0.64 to 12.16
> > TBench clients-1 -3.94 to 1.05
> > TBench clients-2 -11.96 to 3.25
> > TBench clients-4 -12.48 to -1.12
> > TBench clients-8 -11.82 to -8.56
> > DBench clients-1-ext2 -12.20 to 2.27
> > DBench clients-2-ext2 -4.23 to 0.57
> > DBench clients-4-ext2 -2.31 to 3.96
> > DBench clients-8-ext2 -3.65 to 6.09
>
> Well in that case there is something going on very strange performance
> wise. The results should be equal to upstream since the same orders
> are used.
Really, order-1 is used by default by SLUB upstream? I missed that and
it doesn't appear to be the case on 2.6.25-rc2-mm1 at least according to
slabinfo. If it was the difference between order-0 and order-1, it may be
explained by the pcp allocator being bypassed.
> The only change in the hotpaths is another lookup which cannot
> really account for the variances we see here. An 12% improvement because
> logic was added to the hotpath?
Presuming you are referring to hackbench sockets-16, it could be because
the same objects were being reused again and the cache-hotness offset
the additional logic? Dunno, it's all handwaving. Unfortunately I don't
have what is needed in place to gather profiles automatically. It's on
the ever larger todo list :(
> There should be a significant regression
> tbench (2%-4%) because the 4k slab cache must cause trouble.
>
> > Based on this set of tests, it's clear that raising the order can be a big
> > win but setting it as default is less clear-cut.
>
> There is something wrong here and we need to figure out what it is. The
> order-1 test should fairly accurately reproduce upstream performance
> characteristics.
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Christoph Lameter @ 2008-03-07 19:50 UTC
To: Mel Gorman; +Cc: Pekka Enberg, Matt Mackall, linux-mm
On Fri, 7 Mar 2008, Mel Gorman wrote:
> I don't think it would reduce them unless everyone was always using the
> same order. Once slub is using a higher order than everywhere else, it
> is possible it will use an alternative pageblock type just for the high
> order.
Hmmm... Maybe just order 0 and huge page order?
> The only tuning of the page allocator I can think of is to teach
> rmqueue_bulk() to use the fewer high-order allocations to batch refill
> the pcp queues. It's not very straight-forward though as when I tried
> this a bit over a year ago, it cause fragmentation problems of its own.
> I'll see about trying again.
The simplest solution would be to remove the pcps and put something else
around the slow paths that does not check the limits etc.
> > Well in that case there is something going on very strange performance
> > wise. The results should be equal to upstream since the same orders
> > are used.
>
> Really, order-1 is used by default by SLUB upstream? I missed that and
> it doesn't appear to be the case on 2.6.25-rc2-mm1 at least according to
> slabinfo. If it was the difference between order-0 and order-1, it may be
> explained by the pcp allocator being bypassed.
Order 1 is the maximum that slub can use. We are not talking defaults
here but what orders slub is allowed to use. The overwhelming majority of
slab caches use order 0.
Even if you specify slub_max_order=4 there will still be lots of slab
caches that use order 0 allocations. The higher orders are only used if a
smaller order cannot fit slub_min_objects objects into one slab.
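As a concrete illustration (my numbers, ignoring per-object metadata): with 4KB pages and slub_min_objects=60, a 32-byte object fits 128 times into an order-0 slab so order 0 is kept, while a 256-byte object fits only 16 times at order 0, 32 at order 1 and 64 at order 2, so slub would settle on order 2 (still subject to the slub_max_order cap).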
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Matt Mackall @ 2008-03-04 19:01 UTC
To: Mel Gorman; +Cc: Christoph Lameter, Pekka Enberg, linux-mm
On Tue, Mar 04, 2008 at 12:20:08PM +0000, Mel Gorman wrote:
> On (28/02/08 20:48), Christoph Lameter didst pronounce:
> > This is the patchset that was posted two weeks ago modified according
> > to the feedback that Pekka gave. I would like to put these patches
> > into mm.
> >
>
> I haven't reviewed the patches properly but I put them through a quick test
> against 2.6.25-rc3 to see what the performnace was like and the superpage
> allocation success rates were like. Performance wise, it looked like
>
> Loss to Gain
> Kernbench Elapsed time -0.64% 0.32%
> Kernbench Total time -0.61% 0.48%
> Hackbench sockets-12 clients -2.95% 5.13%
> Hackbench pipes-12 clients -16.95% 9.27%
> TBench 4 clients -1.98% 8.2%
> DBench 4 clients (ext2) -5.9% 7.99%
>
> So, running with the high orders is not a clear-cut win to my eyes. What
> did you test to show that it was a general win justifying a high-order by
> default? From looking through, tbench seems to be the only obvious one to
> gain but the rest, it is not clear at all. I'll try give sysbench a spin
> later to see if it is clear-cut.
>
> However, in *all* cases, superpage allocations were less successful and in
> some cases it was severely regressed (one machine went from 81% success rate
> to 36%). Sufficient statistics are not gathered to see why this happened
> in retrospect but my suspicion would be that high-order RECLAIMABLE and
> UNMOVABLE slub allocations routinely fall back to the less fragmented
> MOVABLE pageblocks with these patches - something that is normally a very
> rare event. This change in assumption hurts fragmentation avoidance and
> chances are the long-term behaviour of these patches is not great.
>
> If this guess is correct, using a high-order size by default is a bad plan
> and it should only be set when it is known that the target workload benefits
> and superpage allocations are not a concern. Alternative, set high-order by
> default only for a limited number of caches that are RECLAIMABLE (or better
> yet ones we know can be directly reclaimed with the slub-defrag patches).
>
> As it is, this is painful from a fragmentation perspective and the
> performance win is not clear-cut.
Thanks for looking at this, Mel. Could you try testing.. umm...
slub_max_order=1? That's never going to get us more than one more
object per slab, but if we can go from 1 per page to 1.5 per page, it
might be worth it. Task structs are roughly in that size domain.
--
Mathematics is the supreme nostalgia of our time.
* Re: [patch 0/8] slub: Fallback to order 0 and variable order slab support
From: Christoph Lameter @ 2008-03-05 0:04 UTC
To: Matt Mackall; +Cc: Mel Gorman, Pekka Enberg, linux-mm
On Tue, 4 Mar 2008, Matt Mackall wrote:
> Thanks for looking at this, Mel. Could you try testing.. umm...
> slub_max_order=1? That's never going to get us more than one more
> object per slab, but if we can go from 1 per page to 1.5 per page, it
> might be worth it. Task structs are roughly in that size domain.
Note that you would also have to decrease the number of objects per slab.
good combinations:
slub_max_order=3 slub_min_objects=8
(Was the config used for mm with the earlier version of higher order alloc w/o fallback)
slub_max_order=1 slub_min_objects=4
(upstream config w/o fallback)
The default in mm is right now
slub_max_order=4 slub_min_objects=60