Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs

linux-mm.kvack.org archive mirror

* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
       [not found] ` <20080214040313.616551392@sgi.com>
@ 2008-02-14  7:04   ` Pekka Enberg
  2008-02-14  8:56   ` Pekka Enberg
  2008-02-14 14:06   ` Mel Gorman
  2 siblings, 0 replies; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14  7:04 UTC (permalink / raw)
  To: clameter; +Cc: linux-mm@kvack.org, npiggin

Hi Christoph,

On 2/14/2008, "Christoph Lameter" <clameter@sgi.com> wrote:
> We can use that handoff to avoid failing if a higher order kmalloc slab
> allocation cannot be satisfied by the page allocator. If we reach the
> out of memory path then simply try a kmalloc_large(). kfree() can
> already handle the case of an object that was allocated via the page
> allocator and so this will work just fine (apart from object
> accounting...).

Sorry, I didn't follow the discussion closely enough. Why are we doing
this? Is it fixing some real bug I am not aware of?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
       [not found] ` <20080214040314.118141086@sgi.com>
@ 2008-02-14  7:07   ` Pekka Enberg
  2008-02-14 19:04     ` Christoph Lameter
  2008-02-14  8:57   ` Pekka Enberg
  2008-02-14 14:14   ` Mel Gorman
  2 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14  7:07 UTC (permalink / raw)
  To: clameter; +Cc: linux-mm@kvack.org

Hi Christoph,

On 2/14/2008, "Christoph Lameter" <clameter@sgi.com> wrote:
> This is the same trick as done by the hugetlb support in the kernel.
> If we allocate a huge page use __GFP_MOVABLE because an allocation
> of a HUGE_PAGE size is the large allocation unit that cannot cause
> fragmentation.
> 
> This will make a system that was booted with
> 
> 	slub_min_order = 9
> 
> not have any reclaimable slab allocations anymore. All slab allocations
> will be of type MOVABLE (although they are not movable like huge pages
> are also not movable). This means that we only have MOVABLE and 
> UNMOVABLE sections of memory which reduces the types of sections 
> and therefore the danger of fragmenting memory.

Why does slub_min_order=9 matter? I suppose this is fixing some other
real bug?


* Re: [patch 5/5] slub: Large allocs for other slab sizes that do not fit in order 0
       [not found] ` <20080214040314.388752493@sgi.com>
@ 2008-02-14  7:14   ` Pekka Enberg
  2008-02-14 19:06     ` Christoph Lameter
  2008-02-14  8:55   ` Pekka Enberg
  1 sibling, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14  7:14 UTC (permalink / raw)
  To: clameter; +Cc: linux-mm@kvack.org

Hi,

On 2/14/2008, "Christoph Lameter" <clameter@sgi.com> wrote:
> Expand the scheme used for kmalloc-2048 and kmalloc-4096 to all slab
> caches. That means that kmem_cache_free() must now be able to 
> handle a fallback object that was allocated from the page allocator. This is
> touching the fastpath costing us 1/2 % of performance (pretty small
> so within variance). Kind of hacky though.

Looks good but are there any numbers that indicate this is an overall win?


* Re: [patch 1/5] slub: Determine gfpflags once and not every time a slab is allocated
       [not found] ` <20080214040313.318658830@sgi.com>
@ 2008-02-14  7:23   ` Pekka Enberg
  2008-02-14 13:55   ` Mel Gorman
  1 sibling, 0 replies; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14  7:23 UTC (permalink / raw)
  To: clameter; +Cc: linux-mm@kvack.org

On 2/14/2008, "Christoph Lameter" <clameter@sgi.com> wrote:
> Currently we determine the gfp flags to pass to the page allocator
> each time a slab is being allocated.
> 
> Determine the bits to be set at the time the slab is created. Store
> in a new allocflags field and add the flags in allocate_slab().

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>


* Re: [patch 5/5] slub: Large allocs for other slab sizes that do not fit in order 0
       [not found] ` <20080214040314.388752493@sgi.com>
  2008-02-14  7:14   ` [patch 5/5] slub: Large allocs for other slab sizes that do not fit in order 0 Pekka Enberg
@ 2008-02-14  8:55   ` Pekka Enberg
  1 sibling, 0 replies; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14  8:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

[Sorry for the duplicate. My email client started trimming cc's...]

On Thu, Feb 14, 2008 at 6:02 AM, Christoph Lameter <clameter@sgi.com> wrote:
> Expand the scheme used for kmalloc-2048 and kmalloc-4096 to all slab
>  caches. That means that kmem_cache_free() must now be able to handle
>  a fallback object that was allocated from the page allocator. This is
>  touching the fastpath costing us 1/2 % of performance (pretty small
>  so within variance). Kind of hacky though.

Looks good but are there any numbers that indicate this is an overall win?


* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
       [not found] ` <20080214040313.616551392@sgi.com>
  2008-02-14  7:04   ` [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs Pekka Enberg
@ 2008-02-14  8:56   ` Pekka Enberg
  2008-02-14 19:07     ` Christoph Lameter
  2008-02-14 14:06   ` Mel Gorman
  2 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14  8:56 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

[Sorry for the duplicate. My email client started trimming cc's...]

On Thu, Feb 14, 2008 at 6:02 AM, Christoph Lameter <clameter@sgi.com> wrote:
> Slub already has two ways of allocating an object. One is via its own
>  logic and the other is via the call to kmalloc_large to hand off object
>  allocation to the page allocator. kmalloc_large is typically used
>  for objects >= PAGE_SIZE.
>
>  We can use that handoff to avoid failing if a higher order kmalloc slab
>  allocation cannot be satisfied by the page allocator. If we reach the
>  out of memory path then simply try a kmalloc_large(). kfree() can
>  already handle the case of an object that was allocated via the page
>  allocator and so this will work just fine (apart from object
>  accounting...).

Sorry, I didn't follow the discussion closely enough. Why are we doing
this? Is it fixing some real bug I am not aware of?


* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
       [not found] ` <20080214040314.118141086@sgi.com>
  2008-02-14  7:07   ` [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE Pekka Enberg
@ 2008-02-14  8:57   ` Pekka Enberg
  2008-02-14 19:07     ` Christoph Lameter
  2008-02-14 14:14   ` Mel Gorman
  2 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14  8:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

[Sorry for the duplicate. My email client started trimming cc's...]

On Thu, Feb 14, 2008 at 6:02 AM, Christoph Lameter <clameter@sgi.com> wrote:
> This is the same trick as done by the hugetlb support in the kernel.
>  If we allocate a huge page use __GFP_MOVABLE because an allocation
>  of a HUGE_PAGE size is the large allocation unit that cannot cause
>  fragmentation.
>
>  This will make a system that was booted with
>
>         slub_min_order = 9
>
>  not have any reclaimable slab allocations anymore. All slab allocations
>  will be of type MOVABLE (although they are not movable like huge pages
>  are also not movable). This means that we only have MOVABLE and UNMOVABLE
>  sections of memory which reduces the types of sections and therefore the
>  danger of fragmenting memory.

Why does slub_min_order=9 matter? I suppose this is fixing some other
real bug?


* Re: [patch 1/5] slub: Determine gfpflags once and not every time a slab is allocated
       [not found] ` <20080214040313.318658830@sgi.com>
  2008-02-14  7:23   ` [patch 1/5] slub: Determine gfpflags once and not every time a slab is allocated Pekka Enberg
@ 2008-02-14 13:55   ` Mel Gorman
  1 sibling, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2008-02-14 13:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On (13/02/08 20:02), Christoph Lameter didst pronounce:
> Currently we determine the gfp flags to pass to the page allocator
> each time a slab is being allocated.
> 
> Determine the bits to be set at the time the slab is created. Store
> in a new allocflags field and add the flags in allocate_slab().
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> ---
>  include/linux/slub_def.h |    1 +
>  mm/slub.c                |   19 +++++++++++--------
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> Index: linux-2.6/include/linux/slub_def.h
> ===================================================================
> --- linux-2.6.orig/include/linux/slub_def.h	2008-02-13 17:13:49.378744786 -0800
> +++ linux-2.6/include/linux/slub_def.h	2008-02-13 18:50:42.235907853 -0800
> @@ -71,6 +71,7 @@ struct kmem_cache {
>  
>  	/* Allocation and freeing of slabs */
>  	int objects;		/* Number of objects in slab */
> +	gfp_t allocflags;	/* gfp flags to use on each alloc */
>  	int refcount;		/* Refcount for slab cache destroy */
>  	void (*ctor)(struct kmem_cache *, void *);
>  	int inuse;		/* Offset to metadata */
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2008-02-13 17:13:49.386744784 -0800
> +++ linux-2.6/mm/slub.c	2008-02-13 18:53:49.612240235 -0800
> @@ -1078,14 +1078,7 @@ static struct page *allocate_slab(struct
>  	struct page *page;
>  	int pages = 1 << s->order;
>  
> -	if (s->order)
> -		flags |= __GFP_COMP;
> -
> -	if (s->flags & SLAB_CACHE_DMA)
> -		flags |= SLUB_DMA;
> -
> -	if (s->flags & SLAB_RECLAIM_ACCOUNT)
> -		flags |= __GFP_RECLAIMABLE;
> +	flags |= s->allocflags;
>  
>  	if (node == -1)
>  		page = alloc_pages(flags, s->order);
> @@ -2333,6 +2326,16 @@ static int calculate_sizes(struct kmem_c
>  	if (s->order < 0)
>  		return 0;
>  
> +	s->allocflags = 0;
> +	if (s->order)
> +		s->allocflags |= __GFP_COMP;
> +
> +	if (s->flags & SLAB_CACHE_DMA)
> +		s->allocflags |= SLUB_DMA;
> +
> +	if (s->flags & SLAB_RECLAIM_ACCOUNT)
> +		s->allocflags |= __GFP_RECLAIMABLE;
> +
>  	/*
>  	 * Determine the number of objects per slab
>  	 */
> 

Seems straightforward.

Acked-by: Mel Gorman <mel@csn.ul.ie>
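
The change quoted above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel code: the `demo_` names and the flag values are hypothetical stand-ins for `__GFP_COMP`, `SLUB_DMA`, `__GFP_RECLAIMABLE` and the `SLAB_*` cache flags. It only shows the idea of deriving the bits once at cache setup and OR-ing them in on each slab allocation.

```c
#include <stdint.h>

/* Illustrative values only; the real kernel constants differ. */
typedef uint32_t demo_gfp_t;
#define DEMO_GFP_COMP              0x1u
#define DEMO_GFP_DMA               0x2u
#define DEMO_GFP_RECLAIMABLE       0x4u

#define DEMO_SLAB_CACHE_DMA        0x1u
#define DEMO_SLAB_RECLAIM_ACCOUNT  0x2u

struct demo_cache {
	int order;             /* page order of one slab */
	unsigned int flags;    /* cache creation flags */
	demo_gfp_t allocflags; /* derived once, reused for every slab */
};

/* Runs once at cache setup (the role calculate_sizes() plays in the patch). */
static void demo_calculate_allocflags(struct demo_cache *s)
{
	s->allocflags = 0;
	if (s->order)
		s->allocflags |= DEMO_GFP_COMP;
	if (s->flags & DEMO_SLAB_CACHE_DMA)
		s->allocflags |= DEMO_GFP_DMA;
	if (s->flags & DEMO_SLAB_RECLAIM_ACCOUNT)
		s->allocflags |= DEMO_GFP_RECLAIMABLE;
}

/* Runs for every new slab (the role of allocate_slab()): a single OR. */
static demo_gfp_t demo_slab_gfp(const struct demo_cache *s, demo_gfp_t flags)
{
	return flags | s->allocflags;
}
```

The three conditionals move from the per-slab path to the setup path; the per-slab path keeps only the OR.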

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
       [not found] ` <20080214040313.616551392@sgi.com>
  2008-02-14  7:04   ` [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs Pekka Enberg
  2008-02-14  8:56   ` Pekka Enberg
@ 2008-02-14 14:06   ` Mel Gorman
  2008-02-14 19:10     ` Christoph Lameter
  2 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2008-02-14 14:06 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On (13/02/08 20:02), Christoph Lameter didst pronounce:
> Slub already has two ways of allocating an object. One is via its own
> logic and the other is via the call to kmalloc_large to hand off object
> allocation to the page allocator. kmalloc_large is typically used
> for objects >= PAGE_SIZE.
> 
> We can use that handoff to avoid failing if a higher order kmalloc slab
> allocation cannot be satisfied by the page allocator.  If we reach the
> out of memory path then simply try a kmalloc_large(). kfree() can
> already handle the case of an object that was allocated via the page
> allocator and so this will work just fine (apart from object
> accounting...).
> 

This patch is depending on another patchset I haven't read so take any
comments with a grain of salt. But, if a kmalloc slab allocation fails and
it ultimately uses the page allocator, I do not see how calling the page
allocator directly makes a difference.

> For any kmalloc slab that already requires higher order allocs (which
> makes it impossible to use the page allocator fastpath!)
> we just use PAGE_ALLOC_COSTLY_ORDER to get the largest number of
> objects in one go from the page allocator slowpath.
> 
> On a 4k platform this patch will lead to the following use of higher
> order pages for the following kmalloc slabs:
> 
> 8 ... 1024	order 0
> 2048 .. 4096	order 3 (4k slab only after the next patch)
> 
> We may waste some space if fallback occurs on a 2k slab but we
> are always able to fallback to an order 0 alloc. I hope that
> satisfies Nick's concerns?
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> ---
>  mm/slub.c |   43 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 38 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2008-02-13 18:54:58.360385977 -0800
> +++ linux-2.6/mm/slub.c	2008-02-13 19:28:59.906913253 -0800
> @@ -211,6 +211,8 @@ static inline void ClearSlabDebug(struct
>  /* Internal SLUB flags */
>  #define __OBJECT_POISON		0x80000000 /* Poison object */
>  #define __SYSFS_ADD_DEFERRED	0x40000000 /* Not yet visible via sysfs */
> +#define __KMALLOC_CACHE		0x20000000 /* objects freed using kfree */
> +#define __PAGE_ALLOC_FALLBACK	0x10000000 /* Allow fallback to page alloc */
>  
>  /* Not all arches define cache_line_size */
>  #ifndef cache_line_size
> @@ -1539,7 +1541,6 @@ load_freelist:
>  unlock_out:
>  	slab_unlock(c->page);
>  	stat(c, ALLOC_SLOWPATH);
> -out:
>  #ifdef SLUB_FASTPATH
>  	local_irq_restore(flags);
>  #endif
> @@ -1574,8 +1575,24 @@ new_slab:
>  		c->page = new;
>  		goto load_freelist;
>  	}
> -	object = NULL;
> -	goto out;
> +#ifdef SLUB_FASTPATH
> +	local_irq_restore(flags);
> +#endif
> +	/*
> +	 * No memory available.
> +	 *
> +	 * If the slab uses higher order allocs but the object is
> +	 * smaller than a page size then we can fallback in emergencies
> +	 * to the page allocator via kmalloc_large. The page allocator may
> +	 * have failed to obtain a higher order page and we can try to
> +	 * allocate a single page if the object fits into a single page.
> +	 * That is only possible if certain conditions are met that are being
> +	 * checked when a slab is created.
> +	 */
> +	if (!(gfpflags & __GFP_THISNODE) && (s->flags & __PAGE_ALLOC_FALLBACK))
> +		return kmalloc_large(s->objsize, gfpflags);
> +
> +	return NULL;
>  debug:
>  	object = c->page->freelist;
>  	if (!alloc_debug_processing(s, c->page, object, addr))
> @@ -2322,7 +2339,20 @@ static int calculate_sizes(struct kmem_c
>  	size = ALIGN(size, align);
>  	s->size = size;
>  
> -	s->order = calculate_order(size);
> +	if ((flags & __KMALLOC_CACHE) &&
> +			PAGE_SIZE / size < slub_min_objects) {
> +		/*
> +		 * Kmalloc cache that would not have enough objects in
> +		 * an order 0 page. Kmalloc slabs can fallback to
> +		 * page allocator order 0 allocs so take a reasonably large
> +		 * order that will allow us a good number of objects.
> +		 */
> +		s->order = max(slub_max_order, PAGE_ALLOC_COSTLY_ORDER);
> +		s->flags |= __PAGE_ALLOC_FALLBACK;
> +		s->allocflags |= __GFP_NOWARN;

Here, it would make more sense to call buffered_rmqueue() for the number
of pages you want. That function does not know how to properly batch
allocations yet and work is needed to make it batch properly without
impacting anti-fragmentation. However, fixing it there means that the
PCP-refill would benefit as well as SLUB.

> +	} else
> +		s->order = calculate_order(size);
> +
>  	if (s->order < 0)
>  		return 0;
>  
> @@ -2539,7 +2569,7 @@ static struct kmem_cache *create_kmalloc
>  
>  	down_write(&slub_lock);
>  	if (!kmem_cache_open(s, gfp_flags, name, size, ARCH_KMALLOC_MINALIGN,
> -			flags, NULL))
> +			flags | __KMALLOC_CACHE, NULL))
>  		goto panic;
>  
>  	list_add(&s->list, &slab_caches);
> @@ -3058,6 +3088,9 @@ static int slab_unmergeable(struct kmem_
>  	if (slub_nomerge || (s->flags & SLUB_NEVER_MERGE))
>  		return 1;
>  
> +	if (s->flags & __PAGE_ALLOC_FALLBACK)
> +		return 1;
> +
>  	if (s->ctor)
>  		return 1;
>  
> 
> -- 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
       [not found] ` <20080214040314.118141086@sgi.com>
  2008-02-14  7:07   ` [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE Pekka Enberg
  2008-02-14  8:57   ` Pekka Enberg
@ 2008-02-14 14:14   ` Mel Gorman
  2008-02-14 19:18     ` Christoph Lameter
  2 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2008-02-14 14:14 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On (13/02/08 20:02), Christoph Lameter didst pronounce:
> This is the same trick as done by the hugetlb support in the kernel.
> If we allocate a huge page use __GFP_MOVABLE because an allocation
> of a HUGE_PAGE size is the large allocation unit that cannot cause
> fragmentation.
> 
> This will make a system that was booted with
> 
> 	slub_min_order = 9
> 
> not have any reclaimable slab allocations anymore. All slab allocations
> will be of type MOVABLE (although they are not movable like huge pages
> are also not movable). This means that we only have MOVABLE and UNMOVABLE
> sections of memory which reduces the types of sections and therefore the
> danger of fragmenting memory.

hmmm.

The only reason to have an allocation like this set as MOVABLE is so it can
make use of the partition created by movablecore= which has a few specific
purposes. One of them is that on a shared system, a partition can be created
that is of the same size as the largest hugepage pool required for any job. As
jobs run, they can grow or shrink the pool as desired.  When the jobs complete,
the hugepages are no longer in use and the partition becomes essentially free.

SLAB pages do not have the same property. Even with all processes exited,
there will be slab allocations lying around, probably in this partition
preventing the hugepage pool being resized (or memory hot-remove for that
matter which can work on a section-boundary on POWER).

If the administrator has created a partition for memory hot-remove or
for having a known quantity when resizing the hugepage pool, it is
unlikely they want SLAB pages to be allocated from the same place
putting a spanner in the works. Without the partition and
slub_min_order==hugepage_size, this patch does nothing so;

NACK.

> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> ---
>  mm/slub.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2008-02-13 18:57:16.036710088 -0800
> +++ linux-2.6/mm/slub.c	2008-02-13 18:59:08.561004851 -0800
> @@ -2363,6 +2363,10 @@ static int calculate_sizes(struct kmem_c
>  	if (s->flags & SLAB_CACHE_DMA)
>  		s->allocflags |= SLUB_DMA;
>  
> +	if (s->order && s->order == get_order(HPAGE_SIZE))
> +		/* Huge pages are always allocated as movable */
> +		s->allocflags |= __GFP_MOVABLE;
> +	else
>  	if (s->flags & SLAB_RECLAIM_ACCOUNT)
>  		s->allocflags |= __GFP_RECLAIMABLE;
>  
> 
> -- 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
  2008-02-14  7:07   ` [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE Pekka Enberg
@ 2008-02-14 19:04     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:04 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-mm@kvack.org

On Thu, 14 Feb 2008, Pekka Enberg wrote:

> > This will make a system that was booted with
> > 
> > 	slub_min_order = 9
> > 
> > not have any reclaimable slab allocations anymore. All slab allocations
> > will be of type MOVABLE (although they are not movable like huge pages
> > are also not movable). This means that we only have MOVABLE and 
> > UNMOVABLE sections of memory which reduces the types of sections 
> > and therefore the danger of fragmenting memory.
> 
> Why does slub_min_order=9 matter? I suppose this is fixing some other
> real bug?

Because some people run slub with huge page allocations. It makes a lot of 
sense on systems that have more than 4 - 8G of RAM per cpu. The 2M pages 
for 100 slab caches (usually we have only 70) take 200M which is just a 
small fraction of the memory for one processor.
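
The arithmetic behind these figures can be checked with a small userspace sketch. It assumes 4 KiB base pages and 2 MiB huge pages (so the huge page order is 9), and it counts one huge-page slab per cache, which matches the 100 x 2M estimate above but ignores per-cpu and per-node slabs. The `demo_` names are hypothetical and are not kernel functions.

```c
#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB base pages */

/* Smallest order such that (page size << order) >= size. */
static int demo_get_order(unsigned long size)
{
	int order = 0;
	unsigned long span = 1UL << DEMO_PAGE_SHIFT;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Bytes used if each of nr_caches holds exactly one slab of this order. */
static unsigned long long demo_min_footprint(unsigned int nr_caches, int order)
{
	return (unsigned long long)nr_caches << (DEMO_PAGE_SHIFT + order);
}
```

With these assumptions `demo_get_order(2UL << 20)` is 9, and `demo_min_footprint(100, 9)` is 209715200 bytes, that is 200 MiB; 70 caches come to 140 MiB.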


* Re: [patch 5/5] slub: Large allocs for other slab sizes that do not fit in order 0
  2008-02-14  7:14   ` [patch 5/5] slub: Large allocs for other slab sizes that do not fit in order 0 Pekka Enberg
@ 2008-02-14 19:06     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:06 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-mm@kvack.org

On Thu, 14 Feb 2008, Pekka Enberg wrote:

> On 2/14/2008, "Christoph Lameter" <clameter@sgi.com> wrote:
> > Expand the scheme used for kmalloc-2048 and kmalloc-4096 to all slab
> > caches. That means that kmem_cache_free() must now be able to 
> > handle a fallback object that was allocated from the page allocator. This is
> > touching the fastpath costing us 1/2 % of performance (pretty small
> > so within variance). Kind of hacky though.
> 
> Looks good but are there any numbers that indicate this is an overall win?

I ran tbench tests that show the performance to be on par with before. Nick 
was concerned about not being able to fallback to order 0 allocs and this 
patch does allow that for most slabs that currently use order 1 allocs. 
 


* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14  8:56   ` Pekka Enberg
@ 2008-02-14 19:07     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:07 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Pekka Enberg wrote:

> >  We can use that handoff to avoid failing if a higher order kmalloc slab
> >  allocation cannot be satisfied by the page allocator. If we reach the
> >  out of memory path then simply try a kmalloc_large(). kfree() can
> >  already handle the case of an object that was allocated via the page
> >  allocator and so this will work just fine (apart from object
> >  accounting...).
> 
> Sorry, I didn't follow the discussion closely enough. Why are we doing
> this? Is it fixing some real bug I am not aware of?

It addresses Nick's concern about higher order allocations. It allows 
fallback to an order 0 alloc should memory become so fragmented that order 
3 allocs can no longer be satisfied.
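
A toy model of that fallback, as a userspace sketch only: `demo_alloc_pages()` is a hypothetical stand-in for the page allocator that refuses every request above order 0 once `demo_fragmented` is set. The sketch models just the page request; it leaves out the `__PAGE_ALLOC_FALLBACK` and `__GFP_THISNODE` checks from the patch and the fact that a real slab holds many objects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Toy page allocator: when memory is "fragmented", only order 0 succeeds. */
static bool demo_fragmented;

static void *demo_alloc_pages(int order)
{
	if (order > 0 && demo_fragmented)
		return NULL;
	return malloc((size_t)DEMO_PAGE_SIZE << order);
}

struct demo_result {
	void *mem;
	int order;  /* order actually obtained, -1 on failure */
};

/*
 * Ask for a slab of slab_order first. If that fails and the object fits
 * into a single page, retry with an order-0 request.
 */
static struct demo_result demo_alloc_object(size_t objsize, int slab_order)
{
	struct demo_result r = { demo_alloc_pages(slab_order), slab_order };

	if (!r.mem && slab_order > 0 && objsize <= DEMO_PAGE_SIZE) {
		r.mem = demo_alloc_pages(0);
		r.order = 0;
	}
	if (!r.mem)
		r.order = -1;
	return r;
}
```

With `demo_fragmented` clear, a 2048-byte request on an order-3 cache gets order 3; with it set, the same request still succeeds but comes back as order 0.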


* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
  2008-02-14  8:57   ` Pekka Enberg
@ 2008-02-14 19:07     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:07 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Pekka Enberg wrote:

> Why does slub_min_order=9 matter? I suppose this is fixing some other
> real bug?

No, it's just making the behavior of slub running with huge pages better.


* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 14:06   ` Mel Gorman
@ 2008-02-14 19:10     ` Christoph Lameter
  2008-02-14 19:23       ` Pekka Enberg
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:10 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Mel Gorman wrote:

> comments with a grain of salt. But, if a kmalloc slab allocation fails and
> it ultimately uses the page allocator, I do not see how calling the page
> allocator directly makes a difference.

The kmalloc slab allocation will use order 3. The allocation for an 
individual object via the page allocator only uses order 0. The order 0 
alloc will succeed even if memory is extremely fragmented. It's a safety 
valve that Nick probably finds important.
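
Which kmalloc slabs end up at order 3 follows from the test in the patch, `PAGE_SIZE / size < slub_min_objects`. A rough userspace sketch of that rule is given here under stated assumptions: 4 KiB pages, a `slub_min_objects` value of 4 (chosen because it reproduces the 8..1024 / 2048..4096 split listed in the patch description), and `calculate_order()` simplified to return 0 for the small sizes. The `DEMO_`/`demo_` names are hypothetical.

```c
#define DEMO_PAGE_SIZE     4096u  /* assumption: 4 KiB pages */
#define DEMO_MIN_OBJECTS   4u     /* assumption: slub_min_objects == 4 */
#define DEMO_COSTLY_ORDER  3      /* PAGE_ALLOC_COSTLY_ORDER */

/*
 * Order picked for a kmalloc cache of the given object size.
 * If fewer than DEMO_MIN_OBJECTS objects fit into one page, use the
 * costly order (and rely on the order-0 fallback); otherwise order 0.
 */
static int demo_kmalloc_order(unsigned int size)
{
	if (DEMO_PAGE_SIZE / size < DEMO_MIN_OBJECTS)
		return DEMO_COSTLY_ORDER;
	return 0;  /* simplification of calculate_order(size) */
}
```

Under these assumptions sizes up to 1024 stay at order 0 (four 1024-byte objects fit into a page), while 2048 and 4096 go to order 3.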


* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
  2008-02-14 14:14   ` Mel Gorman
@ 2008-02-14 19:18     ` Christoph Lameter
  2008-02-14 20:08       ` Mel Gorman
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:18 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Mel Gorman wrote:

> The only reason to have an allocation like this set as MOVABLE is so it can
> make use of the partition created by movablecore= which has a few specific
> purposes. One of them is that on a shared system, a partition can be created
> that is of the same size as the largest hugepage pool required for any job. As
> jobs run, they can grow or shrink the pool as desired.  When the jobs complete,
> the hugepages are no longer in use and the partition becomes essentially free.

Doesn't it mean that the allocations can occur in MAX_ORDER blocks 
marked MOVABLE? I thought movablecore= is no longer necessary after the 
rest of the antifrag stuff was merged?

> SLAB pages do not have the same property. Even with all processes exited,
> there will be slab allocations lying around, probably in this partition
> preventing the hugepage pool being resized (or memory hot-remove for that
> matter which can work on a section-boundary on POWER).

echo 2 >/proc/sys/vm/drop_caches will usually allow a significant shrinkage
of the slab caches. In many ways it is the same.

> If the administrator has created a partition for memory hot-remove or
> for having a known quantity when resizing the hugepage pool, it is
> unlikely they want SLAB pages to be allocated from the same place
> putting a spanner in the works. Without the partition and
> slub_min_order==hugepage_size, this patch does nothing so;
> 
> NACK.

This is a feature enabled by a special command line boot option. So it's 
something that the admin did *intentionally*. Slab allocation will *not* 
take away from the huge page pool but will take pages from the page 
allocator.

A system with huge amounts of memory has a large amount of huge 
pages. It is typical at this point to have 4G per cpu in a system and we 
may go higher. 4G means up to 2048 huge pages per cpu! Huge page 
allocation will be quite common and it's good to reduce page allocator 
overhead.


* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 19:10     ` Christoph Lameter
@ 2008-02-14 19:23       ` Pekka Enberg
  2008-02-14 19:32         ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14 19:23 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

Christoph Lameter wrote:
> The kmalloc slab allocation will use order 3. The allocation for an 
> individual object via the page allocator only uses order 0. The order 0 
> alloc will succeed even if memory is extremely fragmented. It's a safety 
> valve that Nick probably finds important.

Hmm, shouldn't we then just fix calculate_order() to not try so hard
to find better fitting higher orders?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 19:23       ` Pekka Enberg
@ 2008-02-14 19:32         ` Christoph Lameter
  2008-02-14 19:47           ` Pekka Enberg
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:32 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Pekka Enberg wrote:

> Christoph Lameter wrote:
> > The kmalloc slab allocation will use order 3. The allocation for an
> > individual object via the page allocator only uses order 0. The order 0
> > alloc will succeed even if memory is extremely fragmented. It's a safety
> > valve that Nick probably finds important.
> 
> Hmm, shouldn't we then just fix calculate_order() to not try so hard to
> find better fitting higher orders?

That would mean reducing the number of objects that can be allocated from 
the fastpath before we have to go to the page allocator again. Increasing 
the number of fastpath uses vs slowpath increases the overall performance 
of a slab.

If we would use order 0 slab allocs for 4k slabs then every call to 
slab_alloc would lead to a corresponding call to the page allocator. The 
regression would not be fixed. We just add slab_alloc overhead to an 
already bad page allocator call.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 19:32         ` Christoph Lameter
@ 2008-02-14 19:47           ` Pekka Enberg
  2008-02-14 19:57             ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14 19:47 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

Christoph Lameter wrote:
> That would mean reducing the number of objects that can be allocated from 
> the fastpath before we have to go to the page allocator again. Increasing 
> the number of fastpath uses vs slowpath increases the overall performance 
> of a slab.
> 
> If we would use order 0 slab allocs for 4k slabs then every call to 
> slab_alloc would lead to a corresponding call to the page allocator. The 
> regression would not be fixed. We just add slab_alloc overhead to an 
> already bad page allocator call.

Aah, I see. I wonder if we can fix up allocate_slab() to try with a 
smaller order as long as the size allows that? The only problem I can 
see is s->objects but I think we can just move that to be a per-slab 
variable. So sort of variable-order slabs kind of a thing.

What do you think?

			Pekka


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 19:47           ` Pekka Enberg
@ 2008-02-14 19:57             ` Christoph Lameter
  2008-02-14 20:02               ` Pekka Enberg
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 19:57 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Pekka Enberg wrote:

> Aah, I see. I wonder if we can fix up allocate_slab() to try with a smaller
> order as long as the size allows that? The only problem I can see is
> s->objects but I think we can just move that to be a per-slab variable. So
> sort of variable-order slabs kind of a thing.

Urgh. This is going to require a count of the maximum number of objects
per individual slab page. It adds more overhead to the fast path and means
that not all the slabs may have the same order, which may in turn result
in a mix of order 3, 2 and 1 pages. Not good for fragmentation. I think
doing order 3 always, and then order 0 if we are in a bad fragmentation
state, is the best compromise. In particular because these badly fragmented
memory scenarios seem to be very difficult to produce and occur only in
specialized situations (e.g. minimal RAM with lots of pages pinned by I/O,
stuff like that).


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 19:57             ` Christoph Lameter
@ 2008-02-14 20:02               ` Pekka Enberg
  2008-02-14 20:08                 ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14 20:02 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

Christoph Lameter wrote:
>> Aah, I see. I wonder if we can fix up allocate_slab() to try with a smaller
>> order as long as the size allows that? The only problem I can see is
>> s->objects but I think we can just move that to be a per-slab variable. So
>> sort of variable-order slabs kind of a thing.
> 
> Urgh. This is going to require a count of the maximum number of objects
> per individual slab page. It adds more overhead to the fast path and means
> that not all the slabs may have the same order, which may in turn result
> in a mix of order 3, 2 and 1 pages. Not good for fragmentation. I think
> doing order 3 always, and then order 0 if we are in a bad fragmentation
> state, is the best compromise. In particular because these badly fragmented
> memory scenarios seem to be very difficult to produce and occur only in
> specialized situations (e.g. minimal RAM with lots of pages pinned by I/O,
> stuff like that).

Ok, makes sense.

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>

to this patch and the kmem_cache_alloc equivalent (which you might as 
well fold into one patch).

Thanks Christoph.

			Pekka


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 20:02               ` Pekka Enberg
@ 2008-02-14 20:08                 ` Christoph Lameter
  2008-02-14 20:13                   ` Pekka Enberg
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 20:08 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Pekka Enberg wrote:

> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
> 
> to this patch and the kmem_cache_alloc equivalent (which you might as well
> fold into one patch).

I would like to merge this patch into 2.6.25 and keep the other for mm to 
maybe merge in 2.6.26. Not sure how safe the general use of the fallback 
is. Definitely no problem for the kmalloc array.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
  2008-02-14 19:18     ` Christoph Lameter
@ 2008-02-14 20:08       ` Mel Gorman
  2008-02-14 20:14         ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2008-02-14 20:08 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On (14/02/08 11:18), Christoph Lameter didst pronounce:
> On Thu, 14 Feb 2008, Mel Gorman wrote:
> 
> > The only reason to have an allocation like this set as MOVABLE is so it can
> > make use of the partition created by movablecore= which has a few specific
> > purposes. One of them is that on a shared system, a partition can be created
> > that is of the same size as the largest hugepage pool required for any job. As
> > jobs run, they can grow or shrink the pool as desired.  When the jobs complete,
> > the hugepages are no longer in use and the partition becomes essentially free.
> 
> Doesn't it mean that the allocations can occur in MAX_ORDER blocks
> marked MOVABLE?

Blocks aren't MAX_ORDER in size, they are pageblock_order in size and that
value can be thought of as HUGETLB_PAGE_ORDER. No matter what you mark them
as, it is still one pageblock. If you leave them as RECLAIMABLE or UNMOVABLE,
a MOVABLE block may still get reclaimed and given to slab instead without
this patch. Marking them movable when no partition exists gains nothing at all.

> I thought movablecore= is no longer necessary after the 
> rest of the antifrag stuff was merged?
> 

It's still used. movablecore= provides guarantees on how many movable blocks
will exist and what size the huge page pool can be guaranteed to be resized
to. It would be used in a situation where a workload was found to fragment
memory or in situations where the guarantee must be made but the administrator
still needs to be able to get that memory as small pages if necessary.

I wrote this about partitioning a while ago
http://www.csn.ul.ie/~mel/docs/poolmanagement/ 

> > SLAB pages do not have the same property. Even with all processes exited,
> > there will be slab allocations lying around, probably in this partition
> > preventing the hugepage pool being resized (or memory hot-remove for that
> > matter which can work on a section-boundary on POWER).
> 
> echo 2 >/proc/sys/vm/drop_caches will usually allow a significant shrinkage
> of the slab caches. In many ways it is the same.
> 

Except it doesn't work for all slabs. As part of the high-order allocation
stress tests I run, a final part is allocating pages when no other process is
running and /proc/sys/vm/drop_caches has been used. There are still remaining
pages left in odd places.

> > If the administrator has created a partition for memory hot-remove or
> > for having a known quantity when resizing the hugepage pool, it is
> > unlikely they want SLAB pages to be allocated from the same place
> > putting a spanner in the works. Without the partition and
> > slub_min_order==hugepage_size, this patch does nothing so;
> > 
> > NACK.
> 
> This is a feature enabled by a special command line boot option. So it's
> something that the admin did *intentionally*.

Not quite. What they asked for is slub_min_order=HUGE_PAGESIZE, not
slub_use_zone_movable. In a situation where they wanted to have a hugepage
pool that reliably resized and slub_min_order == HUGE_PAGESIZE, they would
find that the two collide for no obvious reason.

If you really want to open the possibility that slub uses the movable
partition, then the parameter should indicate that.

> Slab allocation will *not* 
> take away from the huge page pool but will take pages from the page 
> allocator.
> 

I understand that.

> A system with huge amounts of memory has a large number of huge
> pages. It is typical at this point to have 4G per cpu in a system and we
> may go higher. 4G means up to 2048 huge pages per cpu! Huge page
> allocation will be quite common and it's good to reduce page allocator
> overhead.
> 

Marking them movable makes no difference to that assertion.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs
  2008-02-14 20:08                 ` Christoph Lameter
@ 2008-02-14 20:13                   ` Pekka Enberg
  0 siblings, 0 replies; 27+ messages in thread
From: Pekka Enberg @ 2008-02-14 20:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mel Gorman, Nick Piggin, Andrew Morton, linux-mm

Christoph Lameter wrote:
> I would like to merge this patch into 2.6.25 and keep the other for mm to 
> maybe merge in 2.6.26. Not sure how safe the general use of the fallback 
> is. Definitely no problem for the kmalloc array.

Works for me.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
  2008-02-14 20:08       ` Mel Gorman
@ 2008-02-14 20:14         ` Christoph Lameter
  2008-02-14 20:25           ` Mel Gorman
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 20:14 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Mel Gorman wrote:

> > Doesn't it mean that the allocations can occur in MAX_ORDER blocks
> > marked MOVABLE?
> 
> Blocks aren't MAX_ORDER in size, they are pageblock_order in size and that
> value can be thought of as HUGETLB_PAGE_ORDER. No matter what you mark them
> as, it is still one pageblock. If you leave them as RECLAIMABLE or UNMOVABLE,
> a MOVABLE block may still get reclaimed and given to slab instead without
> this patch. Marking them movable when no partition exists gains nothing at all.

Ok so order 9 allocs of slub could occur in pageblock_order blocks like 
what happens for huge pages today. AFAICT this makes the handling of 
slab pages consistent with huge pages?

> > echo 2 >/proc/sys/vm/drop_caches will usually allow a significant shrinkage
> > of the slab caches. In many ways it is the same.
> > 
> 
> Except it doesn't work for all slabs. As part of the high-order allocation
> stress tests I run, a final part is allocating pages when no other process is
> running and /proc/sys/vm/drop_caches has been used. There are still remaining
> pages left in odd places.

Nor does it work for huge pages which cannot be moved but they are marked 
__GFP_MOVABLE anyways.

> > This is a feature enabled by a special command line boot option. So it's
> > something that the admin did *intentionally*.
> 
> Not quite. What they asked for is slub_min_order=HUGE_PAGESIZE, not
> slub_use_zone_movable. In a situation where they wanted to have a hugepage
> pool that reliably resized and slub_min_order == HUGE_PAGESIZE, they would
> find that the two collide for no obvious reason.

Hmm.... So they would use the size of the movable area to size the hugetlb 
area? 

> > A system with huge amounts of memory has a large number of huge
> > pages. It is typical at this point to have 4G per cpu in a system and we
> > may go higher. 4G means up to 2048 huge pages per cpu! Huge page
> > allocation will be quite common and it's good to reduce page allocator
> > overhead.
> Marking them movable makes no difference to that assertion.

Hmmmm... Okay if pages are managed in pageblock_size chunks that are of 
HUGE_PAGE_SIZE then this patch makes no difference whatsoever.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
  2008-02-14 20:14         ` Christoph Lameter
@ 2008-02-14 20:25           ` Mel Gorman
  2008-02-14 20:32             ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2008-02-14 20:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On (14/02/08 12:14), Christoph Lameter didst pronounce:
> On Thu, 14 Feb 2008, Mel Gorman wrote:
> 
> > > Doesn't it mean that the allocations can occur in MAX_ORDER blocks
> > > marked MOVABLE?
> > 
> > Blocks aren't MAX_ORDER in size, they are pageblock_order in size and that
> > value can be thought of as HUGETLB_PAGE_ORDER. No matter what you mark them
> > as, it is still one pageblock. If you leave them as RECLAIMABLE or UNMOVABLE,
> > a MOVABLE block may still get reclaimed and given to slab instead without
> > this patch. Marking them movable when no partition exists gains nothing at all.
> 
> Ok so order 9 allocs of slub could occur in pageblock_order blocks like 
> what happens for huge pages today. AFAICT this makes the handling of 
> slab pages consistent with huge pages?
> 

No. Huge pages are not marked MOVABLE unless it is specifically requested
for the situation where the partition is being used to guarantee the hugepage
pool can grow to that size.

        if (hugepages_treat_as_movable)
                htlb_alloc_mask = GFP_HIGHUSER_MOVABLE;
        else
                htlb_alloc_mask = GFP_HIGHUSER;

> > > echo 2 >/proc/sys/vm/drop_caches will usually allow a significant shrinkage
> > > of the slab caches. In many ways it is the same.
> > > 
> > 
> > Except it doesn't work for all slabs. As part of the high-order allocation
> > stress tests I run, a final part is allocating pages when no other process is
> > running and /proc/sys/vm/drop_caches has been used. There are still remaining
> > pages left in odd places.
> 
> Nor does it work for huge pages which cannot be moved but they are marked 
> __GFP_MOVABLE anyways.
> 

They are marked when it is specifically requested.

> > > This is a feature enabled by a special command line boot option. So it's
> > > something that the admin did *intentionally*.
> > 
> > Not quite. What they asked for is slub_min_order=HUGE_PAGESIZE, not
> > slub_use_zone_movable. In a situation where they wanted to have a hugepage
> > pool that reliably resized and slub_min_order == HUGE_PAGESIZE, they would
> > find that the two collide for no obvious reason.
> 
> Hmm.... So they would use the size of the movable area to size the hugetlb 
> area? 
> 

I'm not sure what you mean by that question. The situation is simple;

If an administrator knows that they need to have a pool of 200 huge pages
at some unknown time in the future, they can say movablecore=N (where N ==
200 hugepages worth of bytes) and set hugepages_treat_as_movable and they
can be reasonably sure it'll work (mlock being the obvious problem as memory
compaction was not merged).

If you wanted to have something similar available for SLUB for some reason,
then the parameter should be similarly named and obvious.

> > > A system with huge amounts of memory has a large number of huge
> > > pages. It is typical at this point to have 4G per cpu in a system and we
> > > may go higher. 4G means up to 2048 huge pages per cpu! Huge page
> > > allocation will be quite common and it's good to reduce page allocator
> > > overhead.
> > Marking them movable makes no difference to that assertion.
> 
> Hmmmm... Okay if pages are managed in pageblock_size chunks that are of 
> HUGE_PAGE_SIZE then this patch makes no difference whatsoever.
> 

Yes it does - it means that slub pages can be allocated from the movablecore=
partition if slub_min_order is set to a magic value. What it does not do at
all is help SLUB in a meaningful fashion.

Still NACK.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE
  2008-02-14 20:25           ` Mel Gorman
@ 2008-02-14 20:32             ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2008-02-14 20:32 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Pekka Enberg, Nick Piggin, Andrew Morton, linux-mm

On Thu, 14 Feb 2008, Mel Gorman wrote:

> > Hmmmm... Okay if pages are managed in pageblock_size chunks that are of 
> > HUGE_PAGE_SIZE then this patch makes no difference whatsoever.
> > 
> 
> Yes it does - it means that slub pages can be allocated from the movablecore=
> partition if slub_min_order is set to a magic value. What it does not do at
> all is help SLUB in a meaningful fashion.

No one that I know of is using this esoteric option. Did not even think 
about it when writing the patch.

> Still NACK.

Well it's useless then. I will drop it then.



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-02-14 20:32 UTC | newest]

Thread overview: 27+ messages
     [not found] <20080214040245.915842795@sgi.com>
     [not found] ` <20080214040313.318658830@sgi.com>
2008-02-14  7:23   ` [patch 1/5] slub: Determine gfpflags once and not every time a slab is allocated Pekka Enberg
2008-02-14 13:55   ` Mel Gorman
     [not found] ` <20080214040314.388752493@sgi.com>
2008-02-14  7:14   ` [patch 5/5] slub: Large allocs for other slab sizes that do not fit in order 0 Pekka Enberg
2008-02-14 19:06     ` Christoph Lameter
2008-02-14  8:55   ` Pekka Enberg
     [not found] ` <20080214040313.616551392@sgi.com>
2008-02-14  7:04   ` [patch 2/5] slub: Fallback to kmalloc_large for failing higher order allocs Pekka Enberg
2008-02-14  8:56   ` Pekka Enberg
2008-02-14 19:07     ` Christoph Lameter
2008-02-14 14:06   ` Mel Gorman
2008-02-14 19:10     ` Christoph Lameter
2008-02-14 19:23       ` Pekka Enberg
2008-02-14 19:32         ` Christoph Lameter
2008-02-14 19:47           ` Pekka Enberg
2008-02-14 19:57             ` Christoph Lameter
2008-02-14 20:02               ` Pekka Enberg
2008-02-14 20:08                 ` Christoph Lameter
2008-02-14 20:13                   ` Pekka Enberg
     [not found] ` <20080214040314.118141086@sgi.com>
2008-02-14  7:07   ` [patch 4/5] slub: Use __GFP_MOVABLE for slabs of HPAGE_SIZE Pekka Enberg
2008-02-14 19:04     ` Christoph Lameter
2008-02-14  8:57   ` Pekka Enberg
2008-02-14 19:07     ` Christoph Lameter
2008-02-14 14:14   ` Mel Gorman
2008-02-14 19:18     ` Christoph Lameter
2008-02-14 20:08       ` Mel Gorman
2008-02-14 20:14         ` Christoph Lameter
2008-02-14 20:25           ` Mel Gorman
2008-02-14 20:32             ` Christoph Lameter
