linux-mm.kvack.org archive mirror
* [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
@ 2011-02-23 15:10 Michal Hocko
  2011-02-23 18:19 ` Dave Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2011-02-23 15:10 UTC (permalink / raw)
  To: linux-mm; +Cc: KAMEZAWA Hiroyuki, linux-kernel

Hi,

I have just noticed that memory cgroups consume a lot of 2MB slab
objects (8 objects per 1GB of RAM on x86_64 with SPARSEMEM). It turned
out that this memory is allocated for the per-memory-section
page_cgroup arrays.

Considering that the array itself consumes something above 1MB (but
still doesn't fit into the 1MB kmalloc cache), this is a rather big
waste of (contiguous) memory: ~6MB per 1GB of RAM.
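
For concreteness, the arithmetic behind those numbers (a back-of-the-envelope
sketch assuming the x86_64 values: 40B per struct page_cgroup, 32768 pages
per 128MB memory section):

        /* back-of-the-envelope check, x86_64 values as above */
        static size_t pcg_waste_per_gb(void)
        {
                size_t table = 32768 * 40;      /* 1310720B, just above 1MB */
                size_t cache = 2 * 1024 * 1024; /* next kmalloc cache: 2MB */

                /* 786432B (~37.5%) wasted per section,
                 * 8 x 128MB sections per 1GB -> ~6MB per 1GB */
                return 8 * (cache - table);
        }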


The patch below tries to fix this up. Any thoughts?
---


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
  2011-02-23 15:10 [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM Michal Hocko
@ 2011-02-23 18:19 ` Dave Hansen
  2011-02-23 23:52   ` KAMEZAWA Hiroyuki
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Dave Hansen @ 2011-02-23 18:19 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, KAMEZAWA Hiroyuki, linux-kernel

On Wed, 2011-02-23 at 16:10 +0100, Michal Hocko wrote:
> We can reduce this internal fragmentation by splitting the single
> page_cgroup array into more arrays where each one is well kmalloc
> aligned. This patch implements this idea. 

How about using alloc_pages_exact()?  These things aren't allocated
often enough to really get most of the benefits of being in a slab.
That'll at least get you down to a maximum of about PAGE_SIZE wasted.  
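
Roughly like this (a sketch, not code from the patch; table_size is the
existing per-section size computation in mm/page_cgroup.c):

        struct page_cgroup *base;
        size_t table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;

        base = alloc_pages_exact(table_size, GFP_KERNEL | __GFP_NOWARN);
        /*
         * alloc_pages_exact() takes a power-of-two block from the buddy
         * allocator and frees the unused tail pages back, so at most one
         * partially used page is wasted per section.
         */

        /* ... use the array, and on teardown: */
        free_pages_exact(base, table_size);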

-- Dave


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
  2011-02-23 18:19 ` Dave Hansen
@ 2011-02-23 23:52   ` KAMEZAWA Hiroyuki
  2011-02-24  9:35     ` Michal Hocko
  2011-02-24  9:33   ` Michal Hocko
  2011-02-24 13:40   ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v2 Michal Hocko
  2 siblings, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-23 23:52 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Michal Hocko, linux-mm, linux-kernel

On Wed, 23 Feb 2011 10:19:22 -0800
Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On Wed, 2011-02-23 at 16:10 +0100, Michal Hocko wrote:
> > We can reduce this internal fragmentation by splitting the single
> > page_cgroup array into more arrays where each one is well kmalloc
> > aligned. This patch implements this idea. 
> 
> How about using alloc_pages_exact()?  These things aren't allocated
> often enough to really get most of the benefits of being in a slab.
> That'll at least get you down to a maximum of about PAGE_SIZE wasted.  
> 

yes, alloc_pages_exact() is much better.

packing page_cgroups for multiple sections causes breakage in memory hotplug logic.

Thanks,
-Kame


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
  2011-02-23 18:19 ` Dave Hansen
  2011-02-23 23:52   ` KAMEZAWA Hiroyuki
@ 2011-02-24  9:33   ` Michal Hocko
  2011-02-24 13:40   ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v2 Michal Hocko
  2 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2011-02-24  9:33 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, KAMEZAWA Hiroyuki, linux-kernel

On Wed 23-02-11 10:19:22, Dave Hansen wrote:
> On Wed, 2011-02-23 at 16:10 +0100, Michal Hocko wrote:
> > We can reduce this internal fragmentation by splitting the single
> > page_cgroup array into more arrays where each one is well kmalloc
> > aligned. This patch implements this idea. 
> 
> How about using alloc_pages_exact()?  These things aren't allocated
> often enough to really get most of the benefits of being in a slab.

Yes, you are right. alloc_pages_exact (which I wasn't aware of) is a
much simpler solution and gives comparable results. I will prepare the
patch for review.

Thanks for pointing this out.
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
  2011-02-23 23:52   ` KAMEZAWA Hiroyuki
@ 2011-02-24  9:35     ` Michal Hocko
  2011-02-24 10:02       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2011-02-24  9:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Dave Hansen, linux-mm, linux-kernel

On Thu 24-02-11 08:52:27, KAMEZAWA Hiroyuki wrote:
> On Wed, 23 Feb 2011 10:19:22 -0800
> Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> 
> > On Wed, 2011-02-23 at 16:10 +0100, Michal Hocko wrote:
> > > We can reduce this internal fragmentation by splitting the single
> > > page_cgroup array into more arrays where each one is well kmalloc
> > > aligned. This patch implements this idea. 
> > 
> > How about using alloc_pages_exact()?  These things aren't allocated
> > often enough to really get most of the benefits of being in a slab.
> > That'll at least get you down to a maximum of about PAGE_SIZE wasted.  
> > 
> 
> yes, alloc_pages_exact() is much better.
> 
> packing page_cgroups for multiple sections causes breakage in memory hotplug logic.

I am not sure I understand this. What do you mean by packing
page_cgroups for multiple sections? The patch I have posted doesn't do
any packing. Or do you mean that using a double array can break hotplug?
Not that this would change anything; alloc_pages_exact is really the
better solution. I am just curious ;)

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
  2011-02-24  9:35     ` Michal Hocko
@ 2011-02-24 10:02       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-24 10:02 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Dave Hansen, linux-mm, linux-kernel

On Thu, 24 Feb 2011 10:35:19 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 24-02-11 08:52:27, KAMEZAWA Hiroyuki wrote:
> > On Wed, 23 Feb 2011 10:19:22 -0800
> > Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> > 
> > > On Wed, 2011-02-23 at 16:10 +0100, Michal Hocko wrote:
> > > > We can reduce this internal fragmentation by splitting the single
> > > > page_cgroup array into more arrays where each one is well kmalloc
> > > > aligned. This patch implements this idea. 
> > > 
> > > How about using alloc_pages_exact()?  These things aren't allocated
> > > often enough to really get most of the benefits of being in a slab.
> > > That'll at least get you down to a maximum of about PAGE_SIZE wasted.  
> > > 
> > 
> > yes, alloc_pages_exact() is much better.
> > 
> > packing page_cgroups for multiple sections causes breakage in memory hotplug logic.
> 
> I am not sure I understand this. What do you mean by packing
> page_cgroups for multiple sections? The patch I have posted doesn't do
> any packing. Or do you mean that using a double array can break hotplug?
> Not that this would change anything; alloc_pages_exact is really the
> better solution. I am just curious ;)
> 

Sorry, it seems I failed to read the code correctly.
You just implemented a 2-level table.
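
A minimal sketch of such a 2-level table, reconstructed from the cover
letter's description (the actual v1 patch is not shown here; the chunk
size and the names are assumptions):

        /* hypothetical: a small first-level table of kmalloc-friendly
         * second-level chunks, instead of one huge per-section array */
        #define ENTRIES_PER_CHUNK       1024    /* assumed chunk size */
        #define NR_CHUNKS       (PAGES_PER_SECTION / ENTRIES_PER_CHUNK)

        struct page_cgroup *chunk[NR_CHUNKS];   /* each kmalloc'ed */

        static inline struct page_cgroup *pcg_lookup(unsigned long idx)
        {
                return &chunk[idx / ENTRIES_PER_CHUNK][idx % ENTRIES_PER_CHUNK];
        }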


Thanks,
-Kame



* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v2
  2011-02-23 18:19 ` Dave Hansen
  2011-02-23 23:52   ` KAMEZAWA Hiroyuki
  2011-02-24  9:33   ` Michal Hocko
@ 2011-02-24 13:40   ` Michal Hocko
  2011-02-25  3:25     ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2011-02-24 13:40 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, KAMEZAWA Hiroyuki, linux-kernel

Here is the second version of the patch. I have used alloc_pages_exact
instead of the complex double array approach.

I still fall back to kmalloc/vmalloc because hotplug can happen quite
some time after boot and we can end up not having enough contiguous
pages at that time.

I am also thinking about whether it would make sense to introduce an
alloc_pages_exact_node function which would allocate pages from the
given node.
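
Something like this, perhaps (hypothetical: the name, signature and body
are made up here, just mirroring what alloc_pages_exact() does on top of
alloc_pages_node()):

        static void *alloc_pages_exact_node(int nid, size_t size, gfp_t gfp_mask)
        {
                unsigned int order = get_order(size);
                struct page *p = alloc_pages_node(nid, gfp_mask, order);
                unsigned long addr, used, end;

                if (!p)
                        return NULL;

                addr = (unsigned long)page_address(p);
                split_page(p, order);
                /* give back the unused tail of the power-of-two block */
                used = addr + PAGE_ALIGN(size);
                end = addr + (PAGE_SIZE << order);
                while (used < end) {
                        free_page(used);
                        used += PAGE_SIZE;
                }
                return (void *)addr;
        }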

Any thoughts?
---


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v2
  2011-02-24 13:40   ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v2 Michal Hocko
@ 2011-02-25  3:25     ` KAMEZAWA Hiroyuki
  2011-02-25  9:53       ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v3 Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-25  3:25 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Dave Hansen, linux-mm, linux-kernel

On Thu, 24 Feb 2011 14:40:45 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> Here is the second version of the patch. I have used alloc_pages_exact
> instead of the complex double array approach.
> 
> I still fall back to kmalloc/vmalloc because hotplug can happen quite
> some time after boot and we can end up not having enough contiguous
> pages at that time.
> 
> I am also thinking about whether it would make sense to introduce an
> alloc_pages_exact_node function which would allocate pages from the
> given node.
> 
> Any thoughts?

The patch itself is fine but please update the description.

But I have some comments below.

> ---
> From e8909bbd1d759de274a6ed7812530e576ad8bc44 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Thu, 24 Feb 2011 11:25:44 +0100
> Subject: [PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
> 
> Currently we are allocating a single page_cgroup array per memory
> section (stored in mem_section->base) when CONFIG_SPARSEMEM is selected.
> This is a correct but memory-inefficient solution because the allocated
> memory (unless we fall back to vmalloc) is not kmalloc friendly:
>         - 32b - 16384 entries (20B per entry) need 327680B so the
>           524288B slab cache is used
>         - 32b with PAE - 131072 entries need 2621440B which fits into
>           the 4194304B slab cache
>         - 64b - 32768 entries (40B per entry) need 1310720B so the
>           2097152B slab cache is used
> 
> This is ~37% wasted space per memory section and it sums up over the
> whole memory. On an x86_64 machine it is something like 6MB per 1GB of
> RAM.
> 
> We can reduce the internal fragmentation either by implementing a
> 2-dimensional array and allocating kmalloc-aligned sizes for each entry
> (as suggested in https://lkml.org/lkml/2011/2/23/232) or we can get rid
> of kmalloc altogether and allocate directly from the buddy allocator
> (using alloc_pages_exact) as suggested by Dave Hansen.
> 
> The latter solution is much simpler and the internal fragmentation is
> comparable (~1 page per section).
> 
> We still need a fallback to kmalloc/vmalloc because we have no
> guarantees that we will have contiguous memory of that size (order-10)
> available later, during hotplug events.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/page_cgroup.c |   62 ++++++++++++++++++++++++++++++++++--------------------
>  1 files changed, 39 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 5bffada..eaae7de 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -105,7 +105,41 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
>  	return section->page_cgroup + pfn;
>  }
>  
> -/* __alloc_bootmem...() is protected by !slab_available() */
> +static void *__init_refok alloc_mcg_table(size_t size, int nid)
> +{
> +	void *addr = NULL;
> +	if((addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN)))
> +		return addr;
> +
> +	if (node_state(nid, N_HIGH_MEMORY)) {
> +		addr = kmalloc_node(size, GFP_KERNEL | __GFP_NOWARN, nid);
> +		if (!addr)
> +			addr = vmalloc_node(size, nid);
> +	} else {
> +		addr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> +		if (!addr)
> +			addr = vmalloc(size);
> +	}
> +
> +	return addr;
> +}

In what case do we need to call kmalloc_node() when alloc_pages_exact()
fails? vmalloc() may need to be called when the size of the chunk is
larger than MAX_ORDER allows, or when there is fragmentation.
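
(A quick order check against the sizes in the changelog above, for
reference:)

        /* get_order(1310720) == 9  -> a 2MB block on 64b               */
        /* get_order(2621440) == 10 -> a 4MB block on 32b PAE, already  */
        /* the largest order the buddy allocator serves with the x86    */
        /* default MAX_ORDER of 11; anything bigger needs vmalloc()     */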

And I don't like the function name alloc_mcg_table(), because this is an
allocation for page_cgroup.

How about simply alloc_page_cgroup()?


Thanks,
-Kame


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v3
  2011-02-25  3:25     ` KAMEZAWA Hiroyuki
@ 2011-02-25  9:53       ` Michal Hocko
  2011-02-28  0:53         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2011-02-25  9:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Dave Hansen, linux-mm, linux-kernel

On Fri 25-02-11 12:25:22, KAMEZAWA Hiroyuki wrote:
> On Thu, 24 Feb 2011 14:40:45 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Here is the second version of the patch. I have used alloc_pages_exact
> > instead of the complex double array approach.
> > 
> > I still fall back to kmalloc/vmalloc because hotplug can happen quite
> > some time after boot and we can end up not having enough contiguous
> > pages at that time.
> > 
> > I am also thinking about whether it would make sense to introduce an
> > alloc_pages_exact_node function which would allocate pages from the
> > given node.
> > 
> > Any thoughts?
> 
> The patch itself is fine but please update the description.

I have updated the description but kept those parts which describe how
the memory is wasted for different configurations. Do you have any tips
on how it can be improved?

> 
> But I have some comments below.
[...]
> > -/* __alloc_bootmem...() is protected by !slab_available() */
> > +static void *__init_refok alloc_mcg_table(size_t size, int nid)
> > +{
> > +	void *addr = NULL;
> > +	if((addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN)))
> > +		return addr;
> > +
> > +	if (node_state(nid, N_HIGH_MEMORY)) {
> > +		addr = kmalloc_node(size, GFP_KERNEL | __GFP_NOWARN, nid);
> > +		if (!addr)
> > +			addr = vmalloc_node(size, nid);
> > +	} else {
> > +		addr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > +		if (!addr)
> > +			addr = vmalloc(size);
> > +	}
> > +
> > +	return addr;
> > +}
> 
> In what case do we need to call kmalloc_node() when alloc_pages_exact()
> fails? vmalloc() may need to be called when the size of the chunk is
> larger than MAX_ORDER allows, or when there is fragmentation.

I kept the original kmalloc with a fallback to vmalloc because vmalloc
is a scarcer resource (especially on i386, where we can have memory
hotplug configured as well).

> 
> And I don't like the function name alloc_mcg_table(), because this is an
> allocation for page_cgroup.
> 
> How about simply alloc_page_cgroup()?

OK, I have no preference about the name; alloc_page_cgroup sounds good
as well.

I have also added VM_BUG_ON(!slab_is_available()) back to the allocation
path.

Thanks for the review. The updated patch is below:

Changes since v2
- rename alloc_mcg_table to alloc_page_cgroup
- rename free_mcg_table to free_page_cgroup
- get VM_BUG_ON(!slab_is_available()) back into the allocation path
--- 


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v3
  2011-02-25  9:53       ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v3 Michal Hocko
@ 2011-02-28  0:53         ` KAMEZAWA Hiroyuki
  2011-02-28  9:12           ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4 Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-28  0:53 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Dave Hansen, linux-mm, linux-kernel

On Fri, 25 Feb 2011 10:53:57 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> On Fri 25-02-11 12:25:22, KAMEZAWA Hiroyuki wrote:
> > On Thu, 24 Feb 2011 14:40:45 +0100
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > Here is the second version of the patch. I have used alloc_pages_exact
> > > instead of the complex double array approach.
> > > 
> > > I still fall back to kmalloc/vmalloc because hotplug can happen quite
> > > some time after boot and we can end up not having enough contiguous
> > > pages at that time.
> > > 
> > > I am also thinking about whether it would make sense to introduce an
> > > alloc_pages_exact_node function which would allocate pages from the
> > > given node.
> > > 
> > > Any thoughts?
> > 
> > The patch itself is fine but please update the description.
> 
> I have updated the description but kept those parts which describe how
> the memory is wasted for different configurations. Do you have any tips
> on how it can be improved?
> 

This part was in your description:
==
We can reduce the internal fragmentation either by implementing a
2-dimensional array and allocating kmalloc-aligned sizes for each entry
(as suggested in https://lkml.org/lkml/2011/2/23/232) or we can get rid
of kmalloc altogether and allocate directly from the buddy allocator
(using alloc_pages_exact) as suggested by Dave Hansen.
==

Please remove the "2-dimensional array" part etc. That's just history.




> > 
> > But I have some comments below.
> [...]
> > > -/* __alloc_bootmem...() is protected by !slab_available() */
> > > +static void *__init_refok alloc_mcg_table(size_t size, int nid)
> > > +{
> > > +	void *addr = NULL;
> > > +	if((addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN)))
> > > +		return addr;
> > > +
> > > +	if (node_state(nid, N_HIGH_MEMORY)) {
> > > +		addr = kmalloc_node(size, GFP_KERNEL | __GFP_NOWARN, nid);
> > > +		if (!addr)
> > > +			addr = vmalloc_node(size, nid);
> > > +	} else {
> > > +		addr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > > +		if (!addr)
> > > +			addr = vmalloc(size);
> > > +	}
> > > +
> > > +	return addr;
> > > +}
> > 
> > In what case do we need to call kmalloc_node() when alloc_pages_exact()
> > fails? vmalloc() may need to be called when the size of the chunk is
> > larger than MAX_ORDER allows, or when there is fragmentation.
> 
> I kept the original kmalloc with a fallback to vmalloc because vmalloc
> is a scarcer resource (especially on i386, where we can have memory
> hotplug configured as well).
> 

My point is, if alloc_pages_exact() fails because of the order of the
page, kmalloc() will always fail. Please remove kmalloc().

> > 
> > And I don't like the function name alloc_mcg_table(), because this is an
> > allocation for page_cgroup.
> > 
> > How about simply alloc_page_cgroup()?
> 
> OK, I have no preference about the name; alloc_page_cgroup sounds good
> as well.
> 
> I have also added VM_BUG_ON(!slab_is_available()) back to the allocation
> path.
> 
> Thanks for the review. The updated patch is below:
> 

kmalloc() is unnecessary, again.


> Changes since v2
> - rename alloc_mcg_table to alloc_page_cgroup
> - rename free_mcg_table to free_page_cgroup
> - get VM_BUG_ON(!slab_is_available()) back into the allocation path
> --- 
> From 9b34baf57f410a628bf45f2b35b23fdf38feae79 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Thu, 24 Feb 2011 11:25:44 +0100
> Subject: [PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
> 
> Currently we are allocating a single page_cgroup array per memory
> section (stored in mem_section->base) when CONFIG_SPARSEMEM is selected.
> This is a correct but memory-inefficient solution because the allocated
> memory (unless we fall back to vmalloc) is not kmalloc friendly:
>         - 32b - 16384 entries (20B per entry) need 327680B so the
>           524288B slab cache is used
>         - 32b with PAE - 131072 entries need 2621440B which fits into
>           the 4194304B slab cache
>         - 64b - 32768 entries (40B per entry) need 1310720B so the
>           2097152B slab cache is used
> 
> This is ~37% wasted space per memory section and it sums up over the
> whole memory. On an x86_64 machine it is something like 6MB per 1GB of
> RAM.
> 
> We can reduce the internal fragmentation either by implementing a
> 2-dimensional array and allocating kmalloc-aligned sizes for each entry
> (as suggested in https://lkml.org/lkml/2011/2/23/232) or by allocating
> directly from the buddy allocator (using alloc_pages_exact) as suggested
> by Dave Hansen.
> 
> The latter solution is much simpler and the internal fragmentation is
> comparable (~1 page per section). The initialization is done during
> boot (unless we are doing memory hotplug) so we shouldn't have any
> issues with memory fragmentation and alloc_pages_exact should succeed.
> 
> We still need a fallback to kmalloc/vmalloc because we have no
> guarantees that we will have contiguous memory of that size (order-10)
> available later, during hotplug events.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> CC: Dave Hansen <dave@linux.vnet.ibm.com>
> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/page_cgroup.c |   62 ++++++++++++++++++++++++++++++++++-------------------
>  1 files changed, 40 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 5bffada..ae322dc 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -106,6 +106,42 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
>  }
>  
>  /* __alloc_bootmem...() is protected by !slab_available() */
> +static void *__init_refok alloc_page_cgroup(size_t size, int nid)
> +{
> +	void *addr = NULL;
> +	if((addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN)))
> +		return addr;
> +
> +	VM_BUG_ON(!slab_is_available());
> +	if (node_state(nid, N_HIGH_MEMORY)) {
> +		addr = kmalloc_node(size, GFP_KERNEL | __GFP_NOWARN, nid);
> +		if (!addr)
> +			addr = vmalloc_node(size, nid);
> +	} else {
> +		addr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> +		if (!addr)
> +			addr = vmalloc(size);
> +	}
> +
> +	return addr;
> +}
> +
> +static void *free_page_cgroup(void *addr)
> +{
> +	if (is_vmalloc_addr(addr)) {
> +		vfree(addr);
> +	} else {
> +		struct page *page = virt_to_page(addr);
> +		if (!PageReserved(page)) { /* Is bootmem ? */
> +			if (!PageSlab(page)) {
> +				size_t table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
> +				free_pages_exact(addr, table_size);
> +			} else
> +				kfree(addr);
> +		}
> +	}
> +}
> +
>  static int __init_refok init_section_page_cgroup(unsigned long pfn)
>  {
>  	struct mem_section *section = __pfn_to_section(pfn);
> @@ -114,19 +150,9 @@ static int __init_refok init_section_page_cgroup(unsigned long pfn)
>  	int nid, index;
>  
>  	if (!section->page_cgroup) {
> -		nid = page_to_nid(pfn_to_page(pfn));
>  		table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
> -		VM_BUG_ON(!slab_is_available());
> -		if (node_state(nid, N_HIGH_MEMORY)) {
> -			base = kmalloc_node(table_size,
> -				GFP_KERNEL | __GFP_NOWARN, nid);
> -			if (!base)
> -				base = vmalloc_node(table_size, nid);
> -		} else {
> -			base = kmalloc(table_size, GFP_KERNEL | __GFP_NOWARN);
> -			if (!base)
> -				base = vmalloc(table_size);
> -		}
> +		nid = page_to_nid(pfn_to_page(pfn));
> +		base = alloc_page_cgroup(table_size, nid);
>  		/*
>  		 * The value stored in section->page_cgroup is (base - pfn)
>  		 * and it does not point to the memory block allocated above,
> @@ -170,16 +196,8 @@ void __free_page_cgroup(unsigned long pfn)
>  	if (!ms || !ms->page_cgroup)
>  		return;
>  	base = ms->page_cgroup + pfn;
> -	if (is_vmalloc_addr(base)) {
> -		vfree(base);
> -		ms->page_cgroup = NULL;
> -	} else {
> -		struct page *page = virt_to_page(base);
> -		if (!PageReserved(page)) { /* Is bootmem ? */
> -			kfree(base);
> -			ms->page_cgroup = NULL;
> -		}
> -	}
> +	free_page_cgroup(base);
> +	ms->page_cgroup = NULL;
>  }
>  
>  int __meminit online_page_cgroup(unsigned long start_pfn,
> -- 
> 1.7.2.3
> 
> -- 
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9    
> Czech Republic
> 


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4
  2011-02-28  0:53         ` KAMEZAWA Hiroyuki
@ 2011-02-28  9:12           ` Michal Hocko
  2011-02-28  9:23             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2011-02-28  9:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Dave Hansen, linux-mm, linux-kernel

On Mon 28-02-11 09:53:47, KAMEZAWA Hiroyuki wrote:
> On Fri, 25 Feb 2011 10:53:57 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Fri 25-02-11 12:25:22, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 24 Feb 2011 14:40:45 +0100
[...]
> > > The patch itself is fine but please update the description.
> > 
> > I have updated the description but kept those parts which describe how
> > the memory is wasted for different configurations. Do you have any tips
> > on how it can be improved?
> > 
> 
> This part was in your description:
> ==
> We can reduce the internal fragmentation either by implementing a
> 2-dimensional array and allocating kmalloc-aligned sizes for each entry
> (as suggested in https://lkml.org/lkml/2011/2/23/232) or we can get rid
> of kmalloc altogether and allocate directly from the buddy allocator
> (using alloc_pages_exact) as suggested by Dave Hansen.
> ==
> 
> Please remove the "2-dimensional array" part etc. That's just history.

I just wanted to mention both approaches. OK, I can remove that, of
course.

> > > 
> > > But I have some comments below.
> > [...]
> > > > -/* __alloc_bootmem...() is protected by !slab_available() */
> > > > +static void *__init_refok alloc_mcg_table(size_t size, int nid)
> > > > +{
> > > > +	void *addr = NULL;
> > > > +	if((addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN)))
> > > > +		return addr;
> > > > +
> > > > +	if (node_state(nid, N_HIGH_MEMORY)) {
> > > > +		addr = kmalloc_node(size, GFP_KERNEL | __GFP_NOWARN, nid);
> > > > +		if (!addr)
> > > > +			addr = vmalloc_node(size, nid);
> > > > +	} else {
> > > > +		addr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > > > +		if (!addr)
> > > > +			addr = vmalloc(size);
> > > > +	}
> > > > +
> > > > +	return addr;
> > > > +}
> > > 
> > > What is the case we need to call kmalloc_node() even when alloc_pages_exact() fails ?
> > > vmalloc() may need to be called when the size of chunk is larger than
> > > MAX_ORDER or there is fragmentation.....
> > 
> > I kept the original kmalloc with fallback to vmalloc because vmalloc is
> > more scarce resource (especially on i386 where we can have memory
> > hotplug configured as well).
> > 
> 
> My point is, if alloc_pages_exact() fails because of the order of the
> page, kmalloc() will always fail.

You are right. I thought that kmalloc could make a difference due to
reclaim, but reclaim is already triggered by alloc_pages_exact, and if
that doesn't succeed there is not much chance of having those pages
available for kmalloc.

> Please remove kmalloc().

OK.

Thanks again for the review; the updated patch is below:

Changes since v3
- updated changelog to not mention the 2-dimensional solution
- get rid of the kmalloc fallback based on Kame's suggestion
- free_page_cgroup accidentally returned void* (we do not need any return
  value there)

Changes since v2
- rename alloc_mcg_table to alloc_page_cgroup
- rename free_mcg_table to free_page_cgroup
- get VM_BUG_ON(!slab_is_available()) back into the allocation path

---


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4
  2011-02-28  9:12           ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4 Michal Hocko
@ 2011-02-28  9:23             ` KAMEZAWA Hiroyuki
  2011-02-28  9:53               ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-28  9:23 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Dave Hansen, linux-mm, linux-kernel

On Mon, 28 Feb 2011 10:12:56 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> On Mon 28-02-11 09:53:47, KAMEZAWA Hiroyuki wrote:
> > On Fri, 25 Feb 2011 10:53:57 +0100
> > Michal Hocko <mhocko@suse.cz> wrote:
> > 
> > > On Fri 25-02-11 12:25:22, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 24 Feb 2011 14:40:45 +0100
> [...]
> > > > The patch itself is fine but please update the description.
> > > 
> > > I have updated the description but kept those parts which describe how
> > > the memory is wasted for different configurations. Do you have any tips
> > > on how it can be improved?
> > > 
> > 
> > This part was in your description:
> > ==
> > We can reduce the internal fragmentation either by implementing a
> > 2-dimensional array and allocating kmalloc-aligned sizes for each entry
> > (as suggested in https://lkml.org/lkml/2011/2/23/232) or we can get rid
> > of kmalloc altogether and allocate directly from the buddy allocator
> > (using alloc_pages_exact) as suggested by Dave Hansen.
> > ==
> > 
> > Please remove the "2-dimensional array" part etc. That's just history.
> 
> I just wanted to mention both approaches. OK, I can remove that, of
> course.
> 
> > > > 
> > > > But I have some comments below.
> > > [...]
> > > > > -/* __alloc_bootmem...() is protected by !slab_available() */
> > > > > +static void *__init_refok alloc_mcg_table(size_t size, int nid)
> > > > > +{
> > > > > +	void *addr = NULL;
> > > > > +	if((addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN)))
> > > > > +		return addr;
> > > > > +
> > > > > +	if (node_state(nid, N_HIGH_MEMORY)) {
> > > > > +		addr = kmalloc_node(size, GFP_KERNEL | __GFP_NOWARN, nid);
> > > > > +		if (!addr)
> > > > > +			addr = vmalloc_node(size, nid);
> > > > > +	} else {
> > > > > +		addr = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> > > > > +		if (!addr)
> > > > > +			addr = vmalloc(size);
> > > > > +	}
> > > > > +
> > > > > +	return addr;
> > > > > +}
> > > > 
> > > > In what case do we need to call kmalloc_node() when alloc_pages_exact()
> > > > fails? vmalloc() may need to be called when the size of the chunk is
> > > > larger than MAX_ORDER allows, or when there is fragmentation.
> > > 
> > > I kept the original kmalloc with a fallback to vmalloc because vmalloc
> > > is a scarcer resource (especially on i386, where we can have memory
> > > hotplug configured as well).
> > > 
> > 
> > My point is, if alloc_pages_exact() fails because of the order of the
> > page, kmalloc() will always fail.
> 
> You are right. I thought that kmalloc could make a difference due to
> reclaim, but reclaim is already triggered by alloc_pages_exact, and if
> that doesn't succeed there is not much chance of having those pages
> available for kmalloc.
> 
> > Please remove kmalloc().
> 
> OK.
> 
> Thanks again for the review; the updated patch is below:
> 
> Changes since v3
> - updated changelog to not mention the 2-dimensional solution
> - get rid of the kmalloc fallback based on Kame's suggestion
> - free_page_cgroup accidentally returned void* (we do not need any return
>   value there)
> 
> Changes since v2
> - rename alloc_mcg_table to alloc_page_cgroup
> - rename free_mcg_table to free_page_cgroup
> - get VM_BUG_ON(!slab_is_available()) back into the allocation path
> 
> ---
> From 84a9555741b59cb2a0a67b023e4bd0f92c670ca1 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Thu, 24 Feb 2011 11:25:44 +0100
> Subject: [PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
> 
> Currently we are allocating a single page_cgroup array per memory
> section (stored in mem_section->base) when CONFIG_SPARSEMEM is selected.
> This is a correct but memory-inefficient solution because the allocated
> memory (unless we fall back to vmalloc) is not kmalloc friendly:
>         - 32b - 16384 entries (20B per entry) need 327680B so the
>           524288B slab cache is used
>         - 32b with PAE - 131072 entries need 2621440B which fits into
>           the 4194304B slab cache
>         - 64b - 32768 entries (40B per entry) need 1310720B so the
>           2097152B slab cache is used
> 
> This is ~37% wasted space per memory section and it sums up over the
> whole memory. On an x86_64 machine it is something like 6MB per 1GB of
> RAM.
> 
> We can reduce the internal fragmentation by using alloc_pages_exact,
> which allocates PAGE_SIZE-aligned blocks, so we get down to <4kB of
> wasted memory per section, which is much better.
> 
> We still need a fallback to vmalloc because we have no guarantees that
> we will have contiguous memory of that size (order-10) available later,
> during hotplug events.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> CC: Dave Hansen <dave@linux.vnet.ibm.com>
> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

But... one nitpick; it may be my fault.



> ---
>  mm/page_cgroup.c |   54 +++++++++++++++++++++++++++++++-----------------------
>  1 files changed, 31 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 5bffada..eae3cd2 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -105,7 +105,33 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
>  	return section->page_cgroup + pfn;
>  }
>  
> -/* __alloc_bootmem...() is protected by !slab_available() */
> +static void *__init_refok alloc_page_cgroup(size_t size, int nid)
> +{
> +	void *addr = NULL;
> +	if((addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN)))
> +		return addr;
> +
> +	if (node_state(nid, N_HIGH_MEMORY))
> +		addr = vmalloc_node(size, nid);
> +	else
> +		addr = vmalloc(size);
> +
> +	return addr;
> +}
> +
> +static void free_page_cgroup(void *addr)
> +{
> +	if (is_vmalloc_addr(addr)) {
> +		vfree(addr);
> +	} else {
> +		struct page *page = virt_to_page(addr);
> +		if (!PageReserved(page)) { /* Is bootmem ? */

I think we never see PageReserved if we just use alloc_pages_exact()/vmalloc().
Maybe my old patch was not enough and this kind of junk remains in
the original code.


Thanks,
-Kame


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4
  2011-02-28  9:53               ` Michal Hocko
@ 2011-02-28  9:48                 ` KAMEZAWA Hiroyuki
  2011-02-28 10:12                   ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-02-28  9:48 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Dave Hansen, linux-mm, linux-kernel

On Mon, 28 Feb 2011 10:53:16 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> On Mon 28-02-11 18:23:22, KAMEZAWA Hiroyuki wrote:
> [...]
> > > From 84a9555741b59cb2a0a67b023e4bd0f92c670ca1 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.cz>
> > > Date: Thu, 24 Feb 2011 11:25:44 +0100
> > > Subject: [PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
> > > 
> > > Currently we are allocating a single page_cgroup array per memory
> > > section (stored in mem_section->base) when CONFIG_SPARSEMEM is selected.
> > > This is correct but memory inefficient solution because the allocated
> > > memory (unless we fall back to vmalloc) is not kmalloc friendly:
> > >         - 32b - 16384 entries (20B per entry) fit into 327680B so the
> > >           524288B slab cache is used
> > >         - 32b with PAE - 131072 entries with 2621440B fit into 4194304B
> > >         - 64b - 32768 entries (40B per entry) fit into 2097152 cache
> > > 
> > > This is ~37% wasted space per memory section and it sumps up for the
> > > whole memory. On a x86_64 machine it is something like 6MB per 1GB of
> > > RAM.
> > > 
> > > We can reduce the internal fragmentation by using alloc_pages_exact
> > > which allocates PAGE_SIZE aligned blocks so we will get down to <4kB
> > > wasted memory per section which is much better.
> > > 
> > > We still need a fallback to vmalloc because we have no guarantees that
> > > we will have a continuous memory of that size (order-10) later on during
> > > the hotplug events.
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > CC: Dave Hansen <dave@linux.vnet.ibm.com>
> > > CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Thanks. I will repost it with Andrew CCed.
> 
> > 
> > But... one nitpick; it may be my fault.
> [...]
> > > +static void free_page_cgroup(void *addr)
> > > +{
> > > +	if (is_vmalloc_addr(addr)) {
> > > +		vfree(addr);
> > > +	} else {
> > > +		struct page *page = virt_to_page(addr);
> > > +		if (!PageReserved(page)) { /* Is bootmem ? */
> > 
> > I think we never see PageReserved if we just use alloc_pages_exact()/vmalloc().
> 
> I have checked that and we really do not (unless I am missing some
> subtle side effects). Anyway, I think we should still at least BUG_ON
> that.
> 
> > Maybe my old patch was not enough and this kind of junk remains in
> > the original code.
> 
> Should I incorporate it into the patch? I think that a separate one
> would be better for readability.
> 
> ---
> From e7a897a42b526620eb4afada2d036e1c9ff9e62a Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Mon, 28 Feb 2011 10:43:12 +0100
> Subject: [PATCH] page_cgroup array is never stored on reserved pages
> 
> KAMEZAWA Hiroyuki noted that free_page_cgroup doesn't have to check for
> PageReserved because we never store the array on reserved pages
> (neither alloc_pages_exact nor vmalloc uses those pages).
> 
> So we can replace the check with a BUG_ON.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thank you.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4
  2011-02-28  9:23             ` KAMEZAWA Hiroyuki
@ 2011-02-28  9:53               ` Michal Hocko
  2011-02-28  9:48                 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2011-02-28  9:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Dave Hansen, linux-mm, linux-kernel

On Mon 28-02-11 18:23:22, KAMEZAWA Hiroyuki wrote:
[...]
> > From 84a9555741b59cb2a0a67b023e4bd0f92c670ca1 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Thu, 24 Feb 2011 11:25:44 +0100
> > Subject: [PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM
> > 
> > Currently we are allocating a single page_cgroup array per memory
> > section (stored in mem_section->base) when CONFIG_SPARSEMEM is selected.
> > This is a correct but memory-inefficient solution because the allocated
> > memory (unless we fall back to vmalloc) is not kmalloc friendly:
> >         - 32b - 16384 entries (20B per entry) need 327680B so the
> >           524288B slab cache is used
> >         - 32b with PAE - 131072 entries need 2621440B which fits into
> >           the 4194304B slab cache
> >         - 64b - 32768 entries (40B per entry) need 1310720B so the
> >           2097152B slab cache is used
> > 
> > This is ~37% wasted space per memory section and it sums up over the
> > whole memory. On an x86_64 machine it is something like 6MB per 1GB of
> > RAM.
> > 
> > We can reduce the internal fragmentation by using alloc_pages_exact,
> > which allocates PAGE_SIZE-aligned blocks, so we get down to <4kB of
> > wasted memory per section, which is much better.
> > 
> > We still need a fallback to vmalloc because we have no guarantees that
> > we will have contiguous memory of that size (order-10) available later,
> > during hotplug events.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > CC: Dave Hansen <dave@linux.vnet.ibm.com>
> > CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks. I will repost it with Andrew CCed.

> 
> But... one nitpick; it may be my fault.
[...]
> > +static void free_page_cgroup(void *addr)
> > +{
> > +	if (is_vmalloc_addr(addr)) {
> > +		vfree(addr);
> > +	} else {
> > +		struct page *page = virt_to_page(addr);
> > +		if (!PageReserved(page)) { /* Is bootmem ? */
> 
> I think we never see PageReserved if we just use alloc_pages_exact()/vmalloc().

I have checked that and we really do not (unless I am missing some
subtle side effects). Anyway, I think we should still at least BUG_ON
that.

> Maybe my old patch was not enough and this kind of junk remains in
> the original code.

Should I incorporate it into the patch? I think that a separate one
would be better for readability.
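
Roughly, the check turns into something like this (a sketch
reconstructed from the changelog, not the actual hunk):

        static void free_page_cgroup(void *addr)
        {
                if (is_vmalloc_addr(addr)) {
                        vfree(addr);
                } else {
                        struct page *page = virt_to_page(addr);
                        size_t table_size =
                                sizeof(struct page_cgroup) * PAGES_PER_SECTION;

                        BUG_ON(PageReserved(page));
                        free_pages_exact(addr, table_size);
                }
        }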

---


* Re: [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4
  2011-02-28  9:48                 ` KAMEZAWA Hiroyuki
@ 2011-02-28 10:12                   ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2011-02-28 10:12 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Dave Hansen, linux-mm, linux-kernel

On Mon 28-02-11 18:48:21, KAMEZAWA Hiroyuki wrote:
> On Mon, 28 Feb 2011 10:53:16 +0100
> > From e7a897a42b526620eb4afada2d036e1c9ff9e62a Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Mon, 28 Feb 2011 10:43:12 +0100
> > Subject: [PATCH] page_cgroup array is never stored on reserved pages
> > 
> > KAMEZAWA Hiroyuki noted that free_page_cgroup doesn't have to check for
> > PageReserved because we never store the array on reserved pages
> > (neither alloc_pages_exact nor vmalloc uses those pages).
> > 
> > So we can replace the check with a BUG_ON.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Thank you.
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks!

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


Thread overview: 15+ messages
2011-02-23 15:10 [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM Michal Hocko
2011-02-23 18:19 ` Dave Hansen
2011-02-23 23:52   ` KAMEZAWA Hiroyuki
2011-02-24  9:35     ` Michal Hocko
2011-02-24 10:02       ` KAMEZAWA Hiroyuki
2011-02-24  9:33   ` Michal Hocko
2011-02-24 13:40   ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v2 Michal Hocko
2011-02-25  3:25     ` KAMEZAWA Hiroyuki
2011-02-25  9:53       ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v3 Michal Hocko
2011-02-28  0:53         ` KAMEZAWA Hiroyuki
2011-02-28  9:12           ` [RFC PATCH] page_cgroup: Reduce allocation overhead for page_cgroup array for CONFIG_SPARSEMEM v4 Michal Hocko
2011-02-28  9:23             ` KAMEZAWA Hiroyuki
2011-02-28  9:53               ` Michal Hocko
2011-02-28  9:48                 ` KAMEZAWA Hiroyuki
2011-02-28 10:12                   ` Michal Hocko
