* [PATCH] mm/buddy: fix default NUMA nodes
From: Gavin Shan @ 2012-06-09 15:11 UTC
To: linux-mm; +Cc: hannes, akpm, Gavin Shan

In __alloc_pages_nodemask(), the core function of the buddy allocator,
the NUMA nodes used would be the allowed nodes of the current process, or
the online high memory nodes, if the nodemask passed into the function
is NULL.

However, the current implementation of __alloc_pages_nodemask() only
uses that default to retrieve the preferred zone. The allocation calls
that follow are still passed the NULL nodemask, so the default is never
applied to them.

The patch fixes that: when the nodemask passed into
__alloc_pages_nodemask() is NULL, we always use the nodemask of the
nodes allowed for the current process, or the online high memory nodes.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
mm/page_alloc.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7892f84..dda83c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2474,6 +2474,7 @@ struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 			struct zonelist *zonelist, nodemask_t *nodemask)
 {
+	nodemask_t *preferred_nodemask = nodemask ? : &cpuset_current_mems_allowed;
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	struct zone *preferred_zone;
 	struct page *page = NULL;
@@ -2501,19 +2502,18 @@ retry_cpuset:
 	cpuset_mems_cookie = get_mems_allowed();
 
 	/* The preferred zone is used for statistics later */
-	first_zones_zonelist(zonelist, high_zoneidx,
-				nodemask ? : &cpuset_current_mems_allowed,
+	first_zones_zonelist(zonelist, high_zoneidx, preferred_nodemask,
 				&preferred_zone);
 	if (!preferred_zone)
 		goto out;
 
 	/* First allocation attempt */
-	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-			zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET,
+	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, preferred_nodemask,
+			order, zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET,
 			preferred_zone, migratetype);
 	if (unlikely(!page))
-		page = __alloc_pages_slowpath(gfp_mask, order,
-				zonelist, high_zoneidx, nodemask,
+		page = __alloc_pages_slowpath(gfp_mask, order, zonelist,
+				high_zoneidx, preferred_nodemask,
 				preferred_zone, migratetype);
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
--
1.7.9.5
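
Editorial sketch (not part of the posted patch): the line added at the top of __alloc_pages_nodemask() uses the GNU C conditional with an omitted middle operand. The snippet below is a minimal standalone illustration of that expression only. toy_nodemask_t, effective_nodemask() and the fallback parameter are hypothetical names introduced here; in the patch the fallback is &cpuset_current_mems_allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's nodemask_t: one bit per node. */
typedef struct {
	unsigned long bits;
} toy_nodemask_t;

/*
 * Same shape as the expression the patch adds. "a ? : b" is a GNU C
 * extension: it yields a when a is non-NULL (evaluating a only once)
 * and b otherwise.
 */
static toy_nodemask_t *effective_nodemask(toy_nodemask_t *nodemask,
					  toy_nodemask_t *fallback)
{
	assert(fallback != NULL);
	return nodemask ? : fallback;
}
```

Calling effective_nodemask() with a non-NULL mask returns that mask unchanged; calling it with NULL returns the fallback. This relies on a compiler that accepts the extension, such as GCC or Clang.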
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] mm/buddy: fix default NUMA nodes
From: David Rientjes @ 2012-06-10 20:52 UTC
To: Gavin Shan; +Cc: linux-mm, hannes, akpm
On Sun, 10 Jun 2012, Gavin Shan wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7892f84..dda83c5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2474,6 +2474,7 @@ struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  			struct zonelist *zonelist, nodemask_t *nodemask)
>  {
> +	nodemask_t *preferred_nodemask = nodemask ? : &cpuset_current_mems_allowed;
>  	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
>  	struct zone *preferred_zone;
>  	struct page *page = NULL;
> @@ -2501,19 +2502,18 @@ retry_cpuset:
>  	cpuset_mems_cookie = get_mems_allowed();
>
>  	/* The preferred zone is used for statistics later */
> -	first_zones_zonelist(zonelist, high_zoneidx,
> -				nodemask ? : &cpuset_current_mems_allowed,
> +	first_zones_zonelist(zonelist, high_zoneidx, preferred_nodemask,
>  				&preferred_zone);
>  	if (!preferred_zone)
>  		goto out;
>
>  	/* First allocation attempt */
> -	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> -			zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET,
> +	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, preferred_nodemask,
> +			order, zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET,
>  			preferred_zone, migratetype);
>  	if (unlikely(!page))
> -		page = __alloc_pages_slowpath(gfp_mask, order,
> -				zonelist, high_zoneidx, nodemask,
> +		page = __alloc_pages_slowpath(gfp_mask, order, zonelist,
> +				high_zoneidx, preferred_nodemask,
>  				preferred_zone, migratetype);
>
>  	trace_mm_page_alloc(page, order, gfp_mask, migratetype);

Nack, this is wrong. The nodemask passed to first_zones_zonelist() is
only for statistics and is correct as written. The nodemask passed to
get_page_from_freelist() constrains the iteration to only those nodes.
With your patch, that iteration would be done over
cpuset_current_mems_allowed whenever a NULL nodemask is passed into the
page allocator (meaning it has a default mempolicy). Allocations on
non-cpuset nodes are allowed in some contexts, see
cpuset_zone_allowed_softwall(), so this would cause a regression.
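
Editorial sketch (not kernel code): a toy model of the regression described above. It assumes a zonelist reduced to an ordered array of node ids, a nodemask reduced to a bitmask, and a single hypothetical relaxed_context flag standing in for the contexts in which the softwall check lets an allocation leave the cpuset. All names prefixed with toy_ are invented for this illustration, and the real checks in cpuset_zone_allowed_softwall() are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Toy nodemask: bit n set means node n is included. */
typedef unsigned long toy_nodemask_t;

static bool node_in(toy_nodemask_t mask, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return (mask >> node) & 1UL;
}

/*
 * Toy version of the cpuset check: nodes in the task's cpuset always pass,
 * and any node passes when the (hypothetical) relaxed_context flag is set.
 */
static bool toy_cpuset_allowed(toy_nodemask_t cpuset, int node,
			       bool relaxed_context)
{
	return relaxed_context || node_in(cpuset, node);
}

/*
 * Walk the zonelist in order and return the first usable node, or -1.
 * A non-NULL nodemask is a hard filter applied before the cpuset check.
 */
static int toy_pick_node(const int *zonelist, size_t len,
			 const toy_nodemask_t *nodemask,
			 toy_nodemask_t cpuset, bool relaxed_context,
			 toy_nodemask_t has_free_pages)
{
	for (size_t i = 0; i < len; i++) {
		int node = zonelist[i];

		if (nodemask && !node_in(*nodemask, node))
			continue;
		if (!toy_cpuset_allowed(cpuset, node, relaxed_context))
			continue;
		if (node_in(has_free_pages, node))
			return node;
	}
	return -1;
}
```

Take a zonelist of {0, 1}, a cpuset containing only node 0, free pages only on node 1, and relaxed_context set. Passing NULL as the nodemask lets the walk fall through to node 1. Passing the cpuset mask as the nodemask, which is what the patch would do for a NULL caller mask, filters node 1 out before the cpuset check is reached, so the function returns -1.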
* Re: [PATCH] mm/buddy: fix default NUMA nodes
From: Gavin Shan @ 2012-06-11 3:41 UTC
To: David Rientjes; +Cc: Gavin Shan, linux-mm, hannes, akpm
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7892f84..dda83c5 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2474,6 +2474,7 @@ struct page *
>>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>>  			struct zonelist *zonelist, nodemask_t *nodemask)
>>  {
>> +	nodemask_t *preferred_nodemask = nodemask ? : &cpuset_current_mems_allowed;
>>  	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
>>  	struct zone *preferred_zone;
>>  	struct page *page = NULL;
>> @@ -2501,19 +2502,18 @@ retry_cpuset:
>>  	cpuset_mems_cookie = get_mems_allowed();
>>
>>  	/* The preferred zone is used for statistics later */
>> -	first_zones_zonelist(zonelist, high_zoneidx,
>> -				nodemask ? : &cpuset_current_mems_allowed,
>> +	first_zones_zonelist(zonelist, high_zoneidx, preferred_nodemask,
>>  				&preferred_zone);
>>  	if (!preferred_zone)
>>  		goto out;
>>
>>  	/* First allocation attempt */
>> -	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
>> -			zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET,
>> +	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, preferred_nodemask,
>> +			order, zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET,
>>  			preferred_zone, migratetype);
>>  	if (unlikely(!page))
>> -		page = __alloc_pages_slowpath(gfp_mask, order,
>> -				zonelist, high_zoneidx, nodemask,
>> +		page = __alloc_pages_slowpath(gfp_mask, order, zonelist,
>> +				high_zoneidx, preferred_nodemask,
>>  				preferred_zone, migratetype);
>>
>>  	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
>
>Nack, this is wrong. The nodemask passed to first_zones_zonelist() is
>only for statistics and is correct as written. The nodemask passed to
>get_page_from_freelist() constrains the iteration to only those nodes.
>With your patch, that iteration would be done over
>cpuset_current_mems_allowed whenever a NULL nodemask is passed into the
>page allocator (meaning it has a default mempolicy). Allocations on
>non-cpuset nodes are allowed in some contexts, see
>cpuset_zone_allowed_softwall(), so this would cause a regression.
>
Thanks, David. I think you're correct. Please ignore/drop the code change :-)
Thanks,
Gavin