* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-18 23:34 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-18 23:34 UTC (permalink / raw)
To: Michal Hocko; +Cc: Anton Blanchard, linuxppc-dev, LKML, linux-mm
Hi Michal,
On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> Hi,
> I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> enable zone reclaim). The commit message suggests that the zone reclaim
> is desirable for all NUMA configurations.
>
> History has shown that the zone reclaim is more often harmful than
> helpful and leads to performance problems. The default RECLAIM_DISTANCE
> for generic case has been increased from 20 to 30 around 3.0
> (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
Interesting.
> I strongly suspect that the patch is incorrect and it should be
> reverted. Before I will send a revert I would like to understand what
> led to the patch in the first place. I do not see why would PPC use only
> LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> seen use different values.
>
> Anton, could you comment please?
I'll let Anton comment here, but in looking into this issue in working
on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
memoryless nodes will set zone_reclaim_mode to 1. I think we want to
ignore memoryless nodes when we set up the reclaim mode like the
following? I'll send it as a proper patch if you agree?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5de4337..4f6ff6f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
{
int i;
- for_each_online_node(i)
- if (node_distance(nid, i) <= RECLAIM_DISTANCE)
+ for_each_online_node(i) {
+ if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
+ local_memory_node(nid) != nid)
node_set(i, NODE_DATA(nid)->reclaim_nodes);
else
zone_reclaim_mode = 1;
Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
not set, but if it is, I think semantically it will indicate that
memoryless nodes *have* to reclaim remotely.
And actually the above won't work, because the callpath is
start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
free_area_init_node -> init_zone_allows_reclaim] which is called before
build_all_zonelists. This is a similar ordering problem as I'm having
with the MEMORYLESS_NODE support, will work on it.
Thanks,
Nish
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-18 23:34 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-18 23:34 UTC (permalink / raw)
To: Michal Hocko; +Cc: Anton Blanchard, linuxppc-dev, LKML, linux-mm
Hi Michal,
On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> Hi,
> I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> enable zone reclaim). The commit message suggests that the zone reclaim
> is desirable for all NUMA configurations.
>
> History has shown that the zone reclaim is more often harmful than
> helpful and leads to performance problems. The default RECLAIM_DISTANCE
> for generic case has been increased from 20 to 30 around 3.0
> (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
Interesting.
> I strongly suspect that the patch is incorrect and it should be
> reverted. Before I will send a revert I would like to understand what
> led to the patch in the first place. I do not see why would PPC use only
> LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> seen use different values.
>
> Anton, could you comment please?
I'll let Anton comment here, but in looking into this issue in working
on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
memoryless nodes will set zone_reclaim_mode to 1. I think we want to
ignore memoryless nodes when we set up the reclaim mode like the
following? I'll send it as a proper patch if you agree?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5de4337..4f6ff6f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
{
int i;
- for_each_online_node(i)
- if (node_distance(nid, i) <= RECLAIM_DISTANCE)
+ for_each_online_node(i) {
+ if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
+ local_memory_node(nid) != nid)
node_set(i, NODE_DATA(nid)->reclaim_nodes);
else
zone_reclaim_mode = 1;
Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
not set, but if it is, I think semantically it will indicate that
memoryless nodes *have* to reclaim remotely.
And actually the above won't work, because the callpath is
start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
free_area_init_node -> init_zone_allows_reclaim] which is called before
build_all_zonelists. This is a similar ordering problem as I'm having
with the MEMORYLESS_NODE support, will work on it.
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-18 23:34 ` Nishanth Aravamudan
@ 2014-02-18 23:58 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-18 23:58 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 18.02.2014 [15:34:05 -0800], Nishanth Aravamudan wrote:
> Hi Michal,
>
> On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > Hi,
> > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > enable zone reclaim). The commit message suggests that the zone reclaim
> > is desirable for all NUMA configurations.
> >
> > History has shown that the zone reclaim is more often harmful than
> > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > for generic case has been increased from 20 to 30 around 3.0
> > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
>
> Interesting.
>
> > I strongly suspect that the patch is incorrect and it should be
> > reverted. Before I will send a revert I would like to understand what
> > led to the patch in the first place. I do not see why would PPC use only
> > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > seen use different values.
> >
> > Anton, could you comment please?
>
> I'll let Anton comment here, but in looking into this issue in working
> on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> ignore memoryless nodes when we set up the reclaim mode like the
> following? I'll send it as a proper patch if you agree?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..4f6ff6f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + for_each_online_node(i) {
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + local_memory_node(nid) != nid)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
>
> Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> not set, but if it is, I think semantically it will indicate that
> memoryless nodes *have* to reclaim remotely.
>
> And actually the above won't work, because the callpath is
>
> start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> free_area_init_node -> init_zone_allows_reclaim] which is called before
> build_all_zonelists. This is a similar ordering problem as I'm having
> with the MEMORYLESS_NODE support, will work on it.
How about the following?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5de4337..1a0eced 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
int i;
for_each_online_node(i)
- if (node_distance(nid, i) <= RECLAIM_DISTANCE)
+ if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
+ !NODE_DATA(nid)->node_present_pages)
node_set(i, NODE_DATA(nid)->reclaim_nodes);
else
zone_reclaim_mode = 1;
@@ -4901,13 +4902,13 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
- init_zone_allows_reclaim(nid);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
#endif
calculate_node_totalpages(pgdat, start_pfn, end_pfn,
zones_size, zholes_size);
+ init_zone_allows_reclaim(nid);
alloc_node_mem_map(pgdat);
#ifdef CONFIG_FLAT_NODE_MEM_MAP
printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
I think it's safe to move init_zone_allows_reclaim, because I don't
think any allocates are occurring here that could cause us to reclaim
anyways, right? Moving it allows us to safely reference
node_present_pages.
Thanks,
Nish
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-18 23:58 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-18 23:58 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 18.02.2014 [15:34:05 -0800], Nishanth Aravamudan wrote:
> Hi Michal,
>
> On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > Hi,
> > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > enable zone reclaim). The commit message suggests that the zone reclaim
> > is desirable for all NUMA configurations.
> >
> > History has shown that the zone reclaim is more often harmful than
> > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > for generic case has been increased from 20 to 30 around 3.0
> > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
>
> Interesting.
>
> > I strongly suspect that the patch is incorrect and it should be
> > reverted. Before I will send a revert I would like to understand what
> > led to the patch in the first place. I do not see why would PPC use only
> > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > seen use different values.
> >
> > Anton, could you comment please?
>
> I'll let Anton comment here, but in looking into this issue in working
> on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> ignore memoryless nodes when we set up the reclaim mode like the
> following? I'll send it as a proper patch if you agree?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..4f6ff6f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + for_each_online_node(i) {
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + local_memory_node(nid) != nid)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
>
> Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> not set, but if it is, I think semantically it will indicate that
> memoryless nodes *have* to reclaim remotely.
>
> And actually the above won't work, because the callpath is
>
> start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> free_area_init_node -> init_zone_allows_reclaim] which is called before
> build_all_zonelists. This is a similar ordering problem as I'm having
> with the MEMORYLESS_NODE support, will work on it.
How about the following?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5de4337..1a0eced 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
int i;
for_each_online_node(i)
- if (node_distance(nid, i) <= RECLAIM_DISTANCE)
+ if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
+ !NODE_DATA(nid)->node_present_pages)
node_set(i, NODE_DATA(nid)->reclaim_nodes);
else
zone_reclaim_mode = 1;
@@ -4901,13 +4902,13 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
- init_zone_allows_reclaim(nid);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
#endif
calculate_node_totalpages(pgdat, start_pfn, end_pfn,
zones_size, zholes_size);
+ init_zone_allows_reclaim(nid);
alloc_node_mem_map(pgdat);
#ifdef CONFIG_FLAT_NODE_MEM_MAP
printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
I think it's safe to move init_zone_allows_reclaim, because I don't
think any allocates are occurring here that could cause us to reclaim
anyways, right? Moving it allows us to safely reference
node_present_pages.
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-18 23:58 ` Nishanth Aravamudan
@ 2014-02-19 0:40 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 0:40 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 18.02.2014 [15:58:00 -0800], Nishanth Aravamudan wrote:
> On 18.02.2014 [15:34:05 -0800], Nishanth Aravamudan wrote:
> > Hi Michal,
> >
> > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > > Hi,
> > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > > enable zone reclaim). The commit message suggests that the zone reclaim
> > > is desirable for all NUMA configurations.
> > >
> > > History has shown that the zone reclaim is more often harmful than
> > > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > > for generic case has been increased from 20 to 30 around 3.0
> > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
> >
> > Interesting.
> >
> > > I strongly suspect that the patch is incorrect and it should be
> > > reverted. Before I will send a revert I would like to understand what
> > > led to the patch in the first place. I do not see why would PPC use only
> > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > > seen use different values.
> > >
> > > Anton, could you comment please?
> >
> > I'll let Anton comment here, but in looking into this issue in working
> > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> > memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> > ignore memoryless nodes when we set up the reclaim mode like the
> > following? I'll send it as a proper patch if you agree?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..4f6ff6f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + for_each_online_node(i) {
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + local_memory_node(nid) != nid)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
> >
> > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> > not set, but if it is, I think semantically it will indicate that
> > memoryless nodes *have* to reclaim remotely.
> >
> > And actually the above won't work, because the callpath is
> >
> > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> > free_area_init_node -> init_zone_allows_reclaim] which is called before
> > build_all_zonelists. This is a similar ordering problem as I'm having
> > with the MEMORYLESS_NODE support, will work on it.
>
> How about the following?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..1a0eced 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> int i;
>
> for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + !NODE_DATA(nid)->node_present_pages)
err s/nid/i/ above.
-Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 0:40 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 0:40 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 18.02.2014 [15:58:00 -0800], Nishanth Aravamudan wrote:
> On 18.02.2014 [15:34:05 -0800], Nishanth Aravamudan wrote:
> > Hi Michal,
> >
> > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > > Hi,
> > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > > enable zone reclaim). The commit message suggests that the zone reclaim
> > > is desirable for all NUMA configurations.
> > >
> > > History has shown that the zone reclaim is more often harmful than
> > > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > > for generic case has been increased from 20 to 30 around 3.0
> > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
> >
> > Interesting.
> >
> > > I strongly suspect that the patch is incorrect and it should be
> > > reverted. Before I will send a revert I would like to understand what
> > > led to the patch in the first place. I do not see why would PPC use only
> > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > > seen use different values.
> > >
> > > Anton, could you comment please?
> >
> > I'll let Anton comment here, but in looking into this issue in working
> > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> > memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> > ignore memoryless nodes when we set up the reclaim mode like the
> > following? I'll send it as a proper patch if you agree?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..4f6ff6f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + for_each_online_node(i) {
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + local_memory_node(nid) != nid)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
> >
> > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> > not set, but if it is, I think semantically it will indicate that
> > memoryless nodes *have* to reclaim remotely.
> >
> > And actually the above won't work, because the callpath is
> >
> > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> > free_area_init_node -> init_zone_allows_reclaim] which is called before
> > build_all_zonelists. This is a similar ordering problem as I'm having
> > with the MEMORYLESS_NODE support, will work on it.
>
> How about the following?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..1a0eced 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> int i;
>
> for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + !NODE_DATA(nid)->node_present_pages)
err s/nid/i/ above.
-Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-18 23:58 ` Nishanth Aravamudan
@ 2014-02-19 1:43 ` David Rientjes
-1 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2014-02-19 1:43 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> How about the following?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..1a0eced 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> int i;
>
> for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + !NODE_DATA(i)->node_present_pages)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
[ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
so we're looking at the right code. ]
That can't be right, it would allow reclaiming from a memoryless node. I
think what you want is
for_each_online_node(i) {
if (!node_present_pages(i))
continue;
if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
node_set(i, NODE_DATA(nid)->reclaim_nodes);
continue;
}
/* Always try to reclaim locally */
zone_reclaim_mode = 1;
}
but we really should be able to do for_each_node_state(i, N_MEMORY) here
and memoryless nodes should already be excluded from that mask.
> @@ -4901,13 +4902,13 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> pgdat->node_id = nid;
> pgdat->node_start_pfn = node_start_pfn;
> - init_zone_allows_reclaim(nid);
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> #endif
> calculate_node_totalpages(pgdat, start_pfn, end_pfn,
> zones_size, zholes_size);
>
> + init_zone_allows_reclaim(nid);
> alloc_node_mem_map(pgdat);
> #ifdef CONFIG_FLAT_NODE_MEM_MAP
> printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
>
> I think it's safe to move init_zone_allows_reclaim, because I don't
> think any allocates are occurring here that could cause us to reclaim
> anyways, right? Moving it allows us to safely reference
> node_present_pages.
>
Yeah, this is fine.
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 1:43 ` David Rientjes
0 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2014-02-19 1:43 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> How about the following?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..1a0eced 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> int i;
>
> for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + !NODE_DATA(i)->node_present_pages)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
[ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
so we're looking at the right code. ]
That can't be right, it would allow reclaiming from a memoryless node. I
think what you want is
for_each_online_node(i) {
if (!node_present_pages(i))
continue;
if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
node_set(i, NODE_DATA(nid)->reclaim_nodes);
continue;
}
/* Always try to reclaim locally */
zone_reclaim_mode = 1;
}
but we really should be able to do for_each_node_state(i, N_MEMORY) here
and memoryless nodes should already be excluded from that mask.
> @@ -4901,13 +4902,13 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> pgdat->node_id = nid;
> pgdat->node_start_pfn = node_start_pfn;
> - init_zone_allows_reclaim(nid);
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> #endif
> calculate_node_totalpages(pgdat, start_pfn, end_pfn,
> zones_size, zholes_size);
>
> + init_zone_allows_reclaim(nid);
> alloc_node_mem_map(pgdat);
> #ifdef CONFIG_FLAT_NODE_MEM_MAP
> printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
>
> I think it's safe to move init_zone_allows_reclaim, because I don't
> think any allocates are occurring here that could cause us to reclaim
> anyways, right? Moving it allows us to safely reference
> node_present_pages.
>
Yeah, this is fine.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-19 1:43 ` David Rientjes
(?)
@ 2014-02-19 8:33 ` Michal Hocko
-1 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 8:33 UTC (permalink / raw)
To: David Rientjes
Cc: linux-mm, Nishanth Aravamudan, linuxppc-dev, Anton Blanchard,
LKML
On Tue 18-02-14 17:43:38, David Rientjes wrote:
> On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
>
> > How about the following?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..1a0eced 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > int i;
> >
> > for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + !NODE_DATA(i)->node_present_pages)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
>
> [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> so we're looking at the right code. ]
>
> That can't be right, it would allow reclaiming from a memoryless node. I
> think what you want is
>
> for_each_online_node(i) {
> if (!node_present_pages(i))
> continue;
> if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> continue;
> }
> /* Always try to reclaim locally */
> zone_reclaim_mode = 1;
> }
>
> but we really should be able to do for_each_node_state(i, N_MEMORY) here
> and memoryless nodes should already be excluded from that mask.
Agreed! Actually the code I am currently interested in is based on 3.0
kernel where zone_reclaim_mode is set in build_zonelists which relies on
find_next_best_node which iterates only N_HIGH_MEMORY nodes which should
have non 0 present pages.
[...]
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 8:33 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 8:33 UTC (permalink / raw)
To: David Rientjes
Cc: Nishanth Aravamudan, linux-mm, linuxppc-dev, Anton Blanchard,
LKML
On Tue 18-02-14 17:43:38, David Rientjes wrote:
> On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
>
> > How about the following?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..1a0eced 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > int i;
> >
> > for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + !NODE_DATA(i)->node_present_pages)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
>
> [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> so we're looking at the right code. ]
>
> That can't be right, it would allow reclaiming from a memoryless node. I
> think what you want is
>
> for_each_online_node(i) {
> if (!node_present_pages(i))
> continue;
> if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> continue;
> }
> /* Always try to reclaim locally */
> zone_reclaim_mode = 1;
> }
>
> but we really should be able to do for_each_node_state(i, N_MEMORY) here
> and memoryless nodes should already be excluded from that mask.
Agreed! Actually the code I am currently interested in is based on 3.0
kernel where zone_reclaim_mode is set in build_zonelists which relies on
find_next_best_node which iterates only N_HIGH_MEMORY nodes which should
have non 0 present pages.
[...]
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 8:33 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 8:33 UTC (permalink / raw)
To: David Rientjes
Cc: Nishanth Aravamudan, linux-mm, linuxppc-dev, Anton Blanchard,
LKML
On Tue 18-02-14 17:43:38, David Rientjes wrote:
> On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
>
> > How about the following?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..1a0eced 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > int i;
> >
> > for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + !NODE_DATA(i)->node_present_pages)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
>
> [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> so we're looking at the right code. ]
>
> That can't be right, it would allow reclaiming from a memoryless node. I
> think what you want is
>
> for_each_online_node(i) {
> if (!node_present_pages(i))
> continue;
> if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> continue;
> }
> /* Always try to reclaim locally */
> zone_reclaim_mode = 1;
> }
>
> but we really should be able to do for_each_node_state(i, N_MEMORY) here
> and memoryless nodes should already be excluded from that mask.
Agreed! Actually the code I am currently interested in is based on 3.0
kernel where zone_reclaim_mode is set in build_zonelists which relies on
find_next_best_node which iterates only N_HIGH_MEMORY nodes which should
have non 0 present pages.
[...]
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-19 1:43 ` David Rientjes
@ 2014-02-19 16:24 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 16:24 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 18.02.2014 [17:43:38 -0800], David Rientjes wrote:
> On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
>
> > How about the following?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..1a0eced 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > int i;
> >
> > for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + !NODE_DATA(i)->node_present_pages)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
>
> [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> so we're looking at the right code. ]
>
> That can't be right, it would allow reclaiming from a memoryless node. I
> think what you want is
Gah, you're right.
> for_each_online_node(i) {
> if (!node_present_pages(i))
> continue;
> if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> continue;
> }
> /* Always try to reclaim locally */
> zone_reclaim_mode = 1;
> }
>
> but we really should be able to do for_each_node_state(i, N_MEMORY) here
> and memoryless nodes should already be excluded from that mask.
Yep, I found that afterwards, which simplifies the logic. I'll add this
to my series :)
<snip>
> > I think it's safe to move init_zone_allows_reclaim, because I don't
> > think any allocates are occurring here that could cause us to reclaim
> > anyways, right? Moving it allows us to safely reference
> > node_present_pages.
> >
>
> Yeah, this is fine.
Thanks,
Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 16:24 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 16:24 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 18.02.2014 [17:43:38 -0800], David Rientjes wrote:
> On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
>
> > How about the following?
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..1a0eced 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > int i;
> >
> > for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + !NODE_DATA(i)->node_present_pages)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
>
> [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> so we're looking at the right code. ]
>
> That can't be right, it would allow reclaiming from a memoryless node. I
> think what you want is
Gah, you're right.
> for_each_online_node(i) {
> if (!node_present_pages(i))
> continue;
> if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> continue;
> }
> /* Always try to reclaim locally */
> zone_reclaim_mode = 1;
> }
>
> but we really should be able to do for_each_node_state(i, N_MEMORY) here
> and memoryless nodes should already be excluded from that mask.
Yep, I found that afterwards, which simplifies the logic. I'll add this
to my series :)
<snip>
> > I think it's safe to move init_zone_allows_reclaim, because I don't
> > think any allocates are occurring here that could cause us to reclaim
> > anyways, right? Moving it allows us to safely reference
> > node_present_pages.
> >
>
> Yeah, this is fine.
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-19 16:24 ` Nishanth Aravamudan
@ 2014-02-19 16:33 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 16:33 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 19.02.2014 [08:24:38 -0800], Nishanth Aravamudan wrote:
> On 18.02.2014 [17:43:38 -0800], David Rientjes wrote:
> > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> >
> > > How about the following?
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 5de4337..1a0eced 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > int i;
> > >
> > > for_each_online_node(i)
> > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > > + !NODE_DATA(i)->node_present_pages)
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > else
> > > zone_reclaim_mode = 1;
> >
> > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> > so we're looking at the right code. ]
> >
> > That can't be right, it would allow reclaiming from a memoryless node. I
> > think what you want is
>
> Gah, you're right.
>
> > for_each_online_node(i) {
> > if (!node_present_pages(i))
> > continue;
> > if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > continue;
> > }
> > /* Always try to reclaim locally */
> > zone_reclaim_mode = 1;
> > }
> >
> > but we really should be able to do for_each_node_state(i, N_MEMORY) here
> > and memoryless nodes should already be excluded from that mask.
>
> Yep, I found that afterwards, which simplifies the logic. I'll add this
> to my series :)
In looking at the code, I am wondering about the following:
init_zone_allows_reclaim() is called for each nid from
free_area_init_node(). Which means that calculate_node_totalpages for
other "later" nids and check_for_memory() [which sets up the N_MEMORY
nodemask] hasn't been called yet.
So, would it make sense to pull up the
/* Any memory on that node */
if (pgdat->node_present_pages)
node_set_state(nid, N_MEMORY);
check_for_memory(pgdat, nid);
into free_area_init_node()?
Thanks,
Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 16:33 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 16:33 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 19.02.2014 [08:24:38 -0800], Nishanth Aravamudan wrote:
> On 18.02.2014 [17:43:38 -0800], David Rientjes wrote:
> > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> >
> > > How about the following?
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 5de4337..1a0eced 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > int i;
> > >
> > > for_each_online_node(i)
> > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > > + !NODE_DATA(i)->node_present_pages)
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > else
> > > zone_reclaim_mode = 1;
> >
> > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> > so we're looking at the right code. ]
> >
> > That can't be right, it would allow reclaiming from a memoryless node. I
> > think what you want is
>
> Gah, you're right.
>
> > for_each_online_node(i) {
> > if (!node_present_pages(i))
> > continue;
> > if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > continue;
> > }
> > /* Always try to reclaim locally */
> > zone_reclaim_mode = 1;
> > }
> >
> > but we really should be able to do for_each_node_state(i, N_MEMORY) here
> > and memoryless nodes should already be excluded from that mask.
>
> Yep, I found that afterwards, which simplifies the logic. I'll add this
> to my series :)
In looking at the code, I am wondering about the following:
init_zone_allows_reclaim() is called for each nid from
free_area_init_node(). Which means that calculate_node_totalpages for
other "later" nids and check_for_memory() [which sets up the N_MEMORY
nodemask] hasn't been called yet.
So, would it make sense to pull up the
/* Any memory on that node */
if (pgdat->node_present_pages)
node_set_state(nid, N_MEMORY);
check_for_memory(pgdat, nid);
into free_area_init_node()?
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-19 16:33 ` Nishanth Aravamudan
(?)
@ 2014-02-20 9:55 ` Michal Hocko
-1 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-20 9:55 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML, David Rientjes
On Wed 19-02-14 08:33:45, Nishanth Aravamudan wrote:
> On 19.02.2014 [08:24:38 -0800], Nishanth Aravamudan wrote:
> > On 18.02.2014 [17:43:38 -0800], David Rientjes wrote:
> > > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> > >
> > > > How about the following?
> > > >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 5de4337..1a0eced 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > > int i;
> > > >
> > > > for_each_online_node(i)
> > > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > > > + !NODE_DATA(i)->node_present_pages)
> > > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > > else
> > > > zone_reclaim_mode = 1;
> > >
> > > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> > > so we're looking at the right code. ]
> > >
> > > That can't be right, it would allow reclaiming from a memoryless node. I
> > > think what you want is
> >
> > Gah, you're right.
> >
> > > for_each_online_node(i) {
> > > if (!node_present_pages(i))
> > > continue;
> > > if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > continue;
> > > }
> > > /* Always try to reclaim locally */
> > > zone_reclaim_mode = 1;
> > > }
> > >
> > > but we really should be able to do for_each_node_state(i, N_MEMORY) here
> > > and memoryless nodes should already be excluded from that mask.
> >
> > Yep, I found that afterwards, which simplifies the logic. I'll add this
> > to my series :)
>
> In looking at the code, I am wondering about the following:
>
> init_zone_allows_reclaim() is called for each nid from
> free_area_init_node(). Which means that calculate_node_totalpages for
> other "later" nids and check_for_memory() [which sets up the N_MEMORY
> nodemask] hasn't been called yet.
>
> So, would it make sense to pull up the
> /* Any memory on that node */
> if (pgdat->node_present_pages)
> node_set_state(nid, N_MEMORY);
> check_for_memory(pgdat, nid);
> into free_area_init_node()?
Dunno, but it shouldn't be needed because nodes are set N_MEMORY earlier
in early_calculate_totalpages as mentioned in other email.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-20 9:55 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-20 9:55 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: David Rientjes, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On Wed 19-02-14 08:33:45, Nishanth Aravamudan wrote:
> On 19.02.2014 [08:24:38 -0800], Nishanth Aravamudan wrote:
> > On 18.02.2014 [17:43:38 -0800], David Rientjes wrote:
> > > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> > >
> > > > How about the following?
> > > >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 5de4337..1a0eced 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > > int i;
> > > >
> > > > for_each_online_node(i)
> > > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > > > + !NODE_DATA(i)->node_present_pages)
> > > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > > else
> > > > zone_reclaim_mode = 1;
> > >
> > > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> > > so we're looking at the right code. ]
> > >
> > > That can't be right, it would allow reclaiming from a memoryless node. I
> > > think what you want is
> >
> > Gah, you're right.
> >
> > > for_each_online_node(i) {
> > > if (!node_present_pages(i))
> > > continue;
> > > if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > continue;
> > > }
> > > /* Always try to reclaim locally */
> > > zone_reclaim_mode = 1;
> > > }
> > >
> > > but we really should be able to do for_each_node_state(i, N_MEMORY) here
> > > and memoryless nodes should already be excluded from that mask.
> >
> > Yep, I found that afterwards, which simplifies the logic. I'll add this
> > to my series :)
>
> In looking at the code, I am wondering about the following:
>
> init_zone_allows_reclaim() is called for each nid from
> free_area_init_node(). Which means that calculate_node_totalpages for
> other "later" nids and check_for_memory() [which sets up the N_MEMORY
> nodemask] hasn't been called yet.
>
> So, would it make sense to pull up the
> /* Any memory on that node */
> if (pgdat->node_present_pages)
> node_set_state(nid, N_MEMORY);
> check_for_memory(pgdat, nid);
> into free_area_init_node()?
Dunno, but it shouldn't be needed because nodes are set N_MEMORY earlier
in early_calculate_totalpages as mentioned in other email.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-20 9:55 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-20 9:55 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: David Rientjes, linux-mm, linuxppc-dev, Anton Blanchard, LKML
On Wed 19-02-14 08:33:45, Nishanth Aravamudan wrote:
> On 19.02.2014 [08:24:38 -0800], Nishanth Aravamudan wrote:
> > On 18.02.2014 [17:43:38 -0800], David Rientjes wrote:
> > > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote:
> > >
> > > > How about the following?
> > > >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 5de4337..1a0eced 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > > int i;
> > > >
> > > > for_each_online_node(i)
> > > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > > > + !NODE_DATA(i)->node_present_pages)
> > > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > > else
> > > > zone_reclaim_mode = 1;
> > >
> > > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught
> > > so we're looking at the right code. ]
> > >
> > > That can't be right, it would allow reclaiming from a memoryless node. I
> > > think what you want is
> >
> > Gah, you're right.
> >
> > > for_each_online_node(i) {
> > > if (!node_present_pages(i))
> > > continue;
> > > if (node_distance(nid, i) <= RECLAIM_DISTANCE) {
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > continue;
> > > }
> > > /* Always try to reclaim locally */
> > > zone_reclaim_mode = 1;
> > > }
> > >
> > > but we really should be able to do for_each_node_state(i, N_MEMORY) here
> > > and memoryless nodes should already be excluded from that mask.
> >
> > Yep, I found that afterwards, which simplifies the logic. I'll add this
> > to my series :)
>
> In looking at the code, I am wondering about the following:
>
> init_zone_allows_reclaim() is called for each nid from
> free_area_init_node(). Which means that calculate_node_totalpages for
> other "later" nids and check_for_memory() [which sets up the N_MEMORY
> nodemask] hasn't been called yet.
>
> So, would it make sense to pull up the
> /* Any memory on that node */
> if (pgdat->node_present_pages)
> node_set_state(nid, N_MEMORY);
> check_for_memory(pgdat, nid);
> into free_area_init_node()?
Dunno, but it shouldn't be needed because nodes are set N_MEMORY earlier
in early_calculate_totalpages as mentioned in other email.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-18 23:34 ` Nishanth Aravamudan
(?)
@ 2014-02-19 8:23 ` Michal Hocko
-1 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 8:23 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML
On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote:
> Hi Michal,
>
> On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > Hi,
> > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > enable zone reclaim). The commit message suggests that the zone reclaim
> > is desirable for all NUMA configurations.
> >
> > History has shown that the zone reclaim is more often harmful than
> > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > for generic case has been increased from 20 to 30 around 3.0
> > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
>
> Interesting.
>
> > I strongly suspect that the patch is incorrect and it should be
> > reverted. Before I will send a revert I would like to understand what
> > led to the patch in the first place. I do not see why would PPC use only
> > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > seen use different values.
> >
> > Anton, could you comment please?
>
> I'll let Anton comment here, but in looking into this issue in working
> on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> ignore memoryless nodes when we set up the reclaim mode like the
> following? I'll send it as a proper patch if you agree?
Funny enough, ppc memoryless node setup is what led me to this code.
We had a setup like this:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node 0 2
0: 10 40
2: 40 10
Which ends up enabling zone_reclaim although there is only a single node
with memory. Not that RECLAIM_DISTANCE would make any difference here as
the distance is even above default RECLAIM_DISTANCE.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..4f6ff6f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + for_each_online_node(i) {
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + local_memory_node(nid) != nid)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
>
> Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> not set, but if it is, I think semantically it will indicate that
> memoryless nodes *have* to reclaim remotely.
>
> And actually the above won't work, because the callpath is
>
> start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> free_area_init_node -> init_zone_allows_reclaim] which is called before
> build_all_zonelists. This is a similar ordering problem as I'm having
> with the MEMORYLESS_NODE support, will work on it.
I think you just want for_each_node_state(nid, N_MEMORY) and skip all
the memory less nodes, no?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 8:23 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 8:23 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Anton Blanchard, linuxppc-dev, LKML, linux-mm
On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote:
> Hi Michal,
>
> On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > Hi,
> > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > enable zone reclaim). The commit message suggests that the zone reclaim
> > is desirable for all NUMA configurations.
> >
> > History has shown that the zone reclaim is more often harmful than
> > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > for generic case has been increased from 20 to 30 around 3.0
> > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
>
> Interesting.
>
> > I strongly suspect that the patch is incorrect and it should be
> > reverted. Before I will send a revert I would like to understand what
> > led to the patch in the first place. I do not see why would PPC use only
> > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > seen use different values.
> >
> > Anton, could you comment please?
>
> I'll let Anton comment here, but in looking into this issue in working
> on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> ignore memoryless nodes when we set up the reclaim mode like the
> following? I'll send it as a proper patch if you agree?
Funny enough, ppc memoryless node setup is what led me to this code.
We had a setup like this:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node 0 2
0: 10 40
2: 40 10
Which ends up enabling zone_reclaim although there is only a single node
with memory. Not that RECLAIM_DISTANCE would make any difference here as
the distance is even above default RECLAIM_DISTANCE.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..4f6ff6f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + for_each_online_node(i) {
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + local_memory_node(nid) != nid)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
>
> Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> not set, but if it is, I think semantically it will indicate that
> memoryless nodes *have* to reclaim remotely.
>
> And actually the above won't work, because the callpath is
>
> start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> free_area_init_node -> init_zone_allows_reclaim] which is called before
> build_all_zonelists. This is a similar ordering problem as I'm having
> with the MEMORYLESS_NODE support, will work on it.
I think you just want for_each_node_state(nid, N_MEMORY) and skip all
the memory less nodes, no?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 8:23 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 8:23 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Anton Blanchard, linuxppc-dev, LKML, linux-mm
On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote:
> Hi Michal,
>
> On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > Hi,
> > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > enable zone reclaim). The commit message suggests that the zone reclaim
> > is desirable for all NUMA configurations.
> >
> > History has shown that the zone reclaim is more often harmful than
> > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > for generic case has been increased from 20 to 30 around 3.0
> > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
>
> Interesting.
>
> > I strongly suspect that the patch is incorrect and it should be
> > reverted. Before I will send a revert I would like to understand what
> > led to the patch in the first place. I do not see why would PPC use only
> > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > seen use different values.
> >
> > Anton, could you comment please?
>
> I'll let Anton comment here, but in looking into this issue in working
> on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> ignore memoryless nodes when we set up the reclaim mode like the
> following? I'll send it as a proper patch if you agree?
Funny enough, ppc memoryless node setup is what led me to this code.
We had a setup like this:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node 0 2
0: 10 40
2: 40 10
Which ends up enabling zone_reclaim although there is only a single node
with memory. Not that RECLAIM_DISTANCE would make any difference here as
the distance is even above default RECLAIM_DISTANCE.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..4f6ff6f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + for_each_online_node(i) {
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + local_memory_node(nid) != nid)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
>
> Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> not set, but if it is, I think semantically it will indicate that
> memoryless nodes *have* to reclaim remotely.
>
> And actually the above won't work, because the callpath is
>
> start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> free_area_init_node -> init_zone_allows_reclaim] which is called before
> build_all_zonelists. This is a similar ordering problem as I'm having
> with the MEMORYLESS_NODE support, will work on it.
I think you just want for_each_node_state(nid, N_MEMORY) and skip all
the memory less nodes, no?
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
2014-02-19 8:23 ` Michal Hocko
(?)
@ 2014-02-19 16:26 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 16:26 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML
On 19.02.2014 [09:23:13 +0100], Michal Hocko wrote:
> On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote:
> > Hi Michal,
> >
> > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > > Hi,
> > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > > enable zone reclaim). The commit message suggests that the zone reclaim
> > > is desirable for all NUMA configurations.
> > >
> > > History has shown that the zone reclaim is more often harmful than
> > > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > > for generic case has been increased from 20 to 30 around 3.0
> > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
> >
> > Interesting.
> >
> > > I strongly suspect that the patch is incorrect and it should be
> > > reverted. Before I will send a revert I would like to understand what
> > > led to the patch in the first place. I do not see why would PPC use only
> > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > > seen use different values.
> > >
> > > Anton, could you comment please?
> >
> > I'll let Anton comment here, but in looking into this issue in working
> > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> > memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> > ignore memoryless nodes when we set up the reclaim mode like the
> > following? I'll send it as a proper patch if you agree?
>
> Funny enough, ppc memoryless node setup is what led me to this code.
> We had a setup like this:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node 0 2
> 0: 10 40
> 2: 40 10
Yeah, I think this happens fairly often ... and we didn't properly
support it (particularly with SLUB) on powerpc. I'll cc you on my
patchset.
> Which ends up enabling zone_reclaim although there is only a single node
> with memory. Not that RECLAIM_DISTANCE would make any difference here as
> the distance is even above default RECLAIM_DISTANCE.
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..4f6ff6f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + for_each_online_node(i) {
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + local_memory_node(nid) != nid)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
> >
> > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> > not set, but if it is, I think semantically it will indicate that
> > memoryless nodes *have* to reclaim remotely.
> >
> > And actually the above won't work, because the callpath is
> >
> > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> > free_area_init_node -> init_zone_allows_reclaim] which is called before
> > build_all_zonelists. This is a similar ordering problem as I'm having
> > with the MEMORYLESS_NODE support, will work on it.
>
> I think you just want for_each_node_state(nid, N_MEMORY) and skip all
> the memory less nodes, no?
Yep, thanks!
-Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 16:26 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 16:26 UTC (permalink / raw)
To: Michal Hocko; +Cc: Anton Blanchard, linuxppc-dev, LKML, linux-mm
On 19.02.2014 [09:23:13 +0100], Michal Hocko wrote:
> On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote:
> > Hi Michal,
> >
> > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > > Hi,
> > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > > enable zone reclaim). The commit message suggests that the zone reclaim
> > > is desirable for all NUMA configurations.
> > >
> > > History has shown that the zone reclaim is more often harmful than
> > > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > > for generic case has been increased from 20 to 30 around 3.0
> > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
> >
> > Interesting.
> >
> > > I strongly suspect that the patch is incorrect and it should be
> > > reverted. Before I will send a revert I would like to understand what
> > > led to the patch in the first place. I do not see why would PPC use only
> > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > > seen use different values.
> > >
> > > Anton, could you comment please?
> >
> > I'll let Anton comment here, but in looking into this issue in working
> > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> > memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> > ignore memoryless nodes when we set up the reclaim mode like the
> > following? I'll send it as a proper patch if you agree?
>
> Funny enough, ppc memoryless node setup is what led me to this code.
> We had a setup like this:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node 0 2
> 0: 10 40
> 2: 40 10
Yeah, I think this happens fairly often ... and we didn't properly
support it (particularly with SLUB) on powerpc. I'll cc you on my
patchset.
> Which ends up enabling zone_reclaim although there is only a single node
> with memory. Not that RECLAIM_DISTANCE would make any difference here as
> the distance is even above default RECLAIM_DISTANCE.
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..4f6ff6f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + for_each_online_node(i) {
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + local_memory_node(nid) != nid)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
> >
> > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> > not set, but if it is, I think semantically it will indicate that
> > memoryless nodes *have* to reclaim remotely.
> >
> > And actually the above won't work, because the callpath is
> >
> > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> > free_area_init_node -> init_zone_allows_reclaim] which is called before
> > build_all_zonelists. This is a similar ordering problem as I'm having
> > with the MEMORYLESS_NODE support, will work on it.
>
> I think you just want for_each_node_state(nid, N_MEMORY) and skip all
> the memory less nodes, no?
Yep, thanks!
-Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: ppc: RECLAIM_DISTANCE 10?
@ 2014-02-19 16:26 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 16:26 UTC (permalink / raw)
To: Michal Hocko; +Cc: Anton Blanchard, linuxppc-dev, LKML, linux-mm
On 19.02.2014 [09:23:13 +0100], Michal Hocko wrote:
> On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote:
> > Hi Michal,
> >
> > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > > Hi,
> > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > > enable zone reclaim). The commit message suggests that the zone reclaim
> > > is desirable for all NUMA configurations.
> > >
> > > History has shown that the zone reclaim is more often harmful than
> > > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > > for generic case has been increased from 20 to 30 around 3.0
> > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
> >
> > Interesting.
> >
> > > I strongly suspect that the patch is incorrect and it should be
> > > reverted. Before I will send a revert I would like to understand what
> > > led to the patch in the first place. I do not see why would PPC use only
> > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > > seen use different values.
> > >
> > > Anton, could you comment please?
> >
> > I'll let Anton comment here, but in looking into this issue in working
> > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> > memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> > ignore memoryless nodes when we set up the reclaim mode like the
> > following? I'll send it as a proper patch if you agree?
>
> Funny enough, ppc memoryless node setup is what led me to this code.
> We had a setup like this:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node 0 2
> 0: 10 40
> 2: 40 10
Yeah, I think this happens fairly often ... and we didn't properly
support it (particularly with SLUB) on powerpc. I'll cc you on my
patchset.
> Which ends up enabling zone_reclaim although there is only a single node
> with memory. Not that RECLAIM_DISTANCE would make any difference here as
> the distance is even above default RECLAIM_DISTANCE.
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5de4337..4f6ff6f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > + for_each_online_node(i) {
> > + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> > + local_memory_node(nid) != nid)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > zone_reclaim_mode = 1;
> >
> > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> > not set, but if it is, I think semantically it will indicate that
> > memoryless nodes *have* to reclaim remotely.
> >
> > And actually the above won't work, because the callpath is
> >
> > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> > free_area_init_node -> init_zone_allows_reclaim] which is called before
> > build_all_zonelists. This is a similar ordering problem as I'm having
> > with the MEMORYLESS_NODE support, will work on it.
>
> I think you just want for_each_node_state(nid, N_MEMORY) and skip all
> the memory less nodes, no?
Yep, thanks!
-Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 8:23 ` Michal Hocko
@ 2014-02-19 17:03 ` Michal Hocko
-1 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 17:03 UTC (permalink / raw)
To: linux-mm; +Cc: David Rientjes, Nishanth Aravamudan, LKML
We had a report about strange OOM killer strikes on a PPC machine
although there was a lot of swap free and a tons of anonymous memory
which could be swapped out. In the end it turned out that the OOM was
a side effect of zone reclaim which wasn't doesn't unmap and swapp out
and so the system was pushed to the OOM. Although this sounds like a bug
somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
numactl on the said hardware suggests that the zone reclaim should
have been set in the first place:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node 0 2
0: 10 40
2: 40 10
So all the CPUs are associated with Node0 which doesn't have any memory
while Node2 contains all the available memory. Node distances cause an
automatic zone_reclaim_mode enabling.
Zone reclaim is intended to keep the allocations local but this doesn't
make any sense on the memory less nodes. So let's exlcude such nodes
for init_zone_allows_reclaim which evaluates zone reclaim behavior and
suitable reclaim_nodes.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
I haven't got to testing this so I am sending this as an RFC for now.
But does this look reasonable?
mm/page_alloc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e953f07edb0..4a44bdc7a8cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
{
int i;
- for_each_online_node(i)
+ for_each_node_state(i, N_HIGH_MEMORY)
if (node_distance(nid, i) <= RECLAIM_DISTANCE)
node_set(i, NODE_DATA(nid)->reclaim_nodes);
else
@@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
- init_zone_allows_reclaim(nid);
+ if (node_state(nid, N_HIGH_MEMORY))
+ init_zone_allows_reclaim(nid);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
#endif
--
1.9.0.rc3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 17:03 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 17:03 UTC (permalink / raw)
To: linux-mm; +Cc: David Rientjes, Nishanth Aravamudan, LKML
We had a report about strange OOM killer strikes on a PPC machine
although there was a lot of swap free and a tons of anonymous memory
which could be swapped out. In the end it turned out that the OOM was
a side effect of zone reclaim which wasn't doesn't unmap and swapp out
and so the system was pushed to the OOM. Although this sounds like a bug
somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
numactl on the said hardware suggests that the zone reclaim should
have been set in the first place:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node 0 2
0: 10 40
2: 40 10
So all the CPUs are associated with Node0 which doesn't have any memory
while Node2 contains all the available memory. Node distances cause an
automatic zone_reclaim_mode enabling.
Zone reclaim is intended to keep the allocations local but this doesn't
make any sense on the memory less nodes. So let's exlcude such nodes
for init_zone_allows_reclaim which evaluates zone reclaim behavior and
suitable reclaim_nodes.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
I haven't got to testing this so I am sending this as an RFC for now.
But does this look reasonable?
mm/page_alloc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e953f07edb0..4a44bdc7a8cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
{
int i;
- for_each_online_node(i)
+ for_each_node_state(i, N_HIGH_MEMORY)
if (node_distance(nid, i) <= RECLAIM_DISTANCE)
node_set(i, NODE_DATA(nid)->reclaim_nodes);
else
@@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
- init_zone_allows_reclaim(nid);
+ if (node_state(nid, N_HIGH_MEMORY))
+ init_zone_allows_reclaim(nid);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
#endif
--
1.9.0.rc3
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 17:03 ` Michal Hocko
@ 2014-02-19 17:16 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 17:16 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, David Rientjes, LKML
On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node 0 2
> 0: 10 40
> 2: 40 10
>
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
>
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exlcude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> I haven't got to testing this so I am sending this as an RFC for now.
> But does this look reasonable?
>
> mm/page_alloc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e953f07edb0..4a44bdc7a8cf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> + for_each_node_state(i, N_HIGH_MEMORY)
> if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> pgdat->node_id = nid;
> pgdat->node_start_pfn = node_start_pfn;
> - init_zone_allows_reclaim(nid);
> + if (node_state(nid, N_HIGH_MEMORY))
> + init_zone_allows_reclaim(nid);
I don't think this will work, because what sets N_HIGH_MEMORY (and
shouldn't it be N_MEMORY?) is check_for_memory() (free_area_init_nodes()
for N_MEMORY), which is run *after* init_zone_allows_reclaim(). Further,
the for_each_node_state() loop doesn't make sense at this point, becuase
we are actually setting up the nids as we go. So node 0, will only see
node 0 in the N_HIGH_MEMORY mask (if any). Node 1, will only see nodes 0
and 1, etc.
I'm working on testing a patch that reorders some of this in hopefully a
safe way.
Thanks,
Nish
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> #endif
> --
> 1.9.0.rc3
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 17:16 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 17:16 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, David Rientjes, LKML
On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node 0 2
> 0: 10 40
> 2: 40 10
>
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
>
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exlcude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> I haven't got to testing this so I am sending this as an RFC for now.
> But does this look reasonable?
>
> mm/page_alloc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e953f07edb0..4a44bdc7a8cf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> + for_each_node_state(i, N_HIGH_MEMORY)
> if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> pgdat->node_id = nid;
> pgdat->node_start_pfn = node_start_pfn;
> - init_zone_allows_reclaim(nid);
> + if (node_state(nid, N_HIGH_MEMORY))
> + init_zone_allows_reclaim(nid);
I don't think this will work, because what sets N_HIGH_MEMORY (and
shouldn't it be N_MEMORY?) is check_for_memory() (free_area_init_nodes()
for N_MEMORY), which is run *after* init_zone_allows_reclaim(). Further,
the for_each_node_state() loop doesn't make sense at this point, becuase
we are actually setting up the nids as we go. So node 0, will only see
node 0 in the N_HIGH_MEMORY mask (if any). Node 1, will only see nodes 0
and 1, etc.
I'm working on testing a patch that reorders some of this in hopefully a
safe way.
Thanks,
Nish
> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> #endif
> --
> 1.9.0.rc3
>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 17:16 ` Nishanth Aravamudan
@ 2014-02-19 17:32 ` Michal Hocko
-1 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 17:32 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: linux-mm, David Rientjes, LKML
On Wed 19-02-14 09:16:28, Nishanth Aravamudan wrote:
> On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> > We had a report about strange OOM killer strikes on a PPC machine
> > although there was a lot of swap free and a tons of anonymous memory
> > which could be swapped out. In the end it turned out that the OOM was
> > a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> > and so the system was pushed to the OOM. Although this sounds like a bug
> > somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> > numactl on the said hardware suggests that the zone reclaim should
> > have been set in the first place:
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> > node 0 size: 0 MB
> > node 0 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7168 MB
> > node 2 free: 6019 MB
> > node distances:
> > node 0 2
> > 0: 10 40
> > 2: 40 10
> >
> > So all the CPUs are associated with Node0 which doesn't have any memory
> > while Node2 contains all the available memory. Node distances cause an
> > automatic zone_reclaim_mode enabling.
> >
> > Zone reclaim is intended to keep the allocations local but this doesn't
> > make any sense on the memory less nodes. So let's exlcude such nodes
> > for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> > suitable reclaim_nodes.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > ---
> > I haven't got to testing this so I am sending this as an RFC for now.
> > But does this look reasonable?
> >
> > mm/page_alloc.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e953f07edb0..4a44bdc7a8cf 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > + for_each_node_state(i, N_HIGH_MEMORY)
> > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> >
> > pgdat->node_id = nid;
> > pgdat->node_start_pfn = node_start_pfn;
> > - init_zone_allows_reclaim(nid);
> > + if (node_state(nid, N_HIGH_MEMORY))
> > + init_zone_allows_reclaim(nid);
>
> I don't think this will work, because what sets N_HIGH_MEMORY (and
> shouldn't it be N_MEMORY?)
This should be the same thing AFAIU.
> is check_for_memory() (free_area_init_nodes() for N_MEMORY), which is
> run *after* init_zone_allows_reclaim().
early_calculate_totalpages sets the memory state before we get here.
> Further, the for_each_node_state() loop doesn't make sense at this
> point, becuase we are actually setting up the nids as we go. So node
> 0, will only see node 0 in the N_HIGH_MEMORY mask (if any). Node 1,
> will only see nodes 0 and 1, etc.
I am not sure I understand what you are saying here. Why would se
consider distance to a memoryless node in the first place.
for_each_node_state just makes sure we are comparing only to a node
which has some memory.
> I'm working on testing a patch that reorders some of this in hopefully a
> safe way.
Although it might make some sense to reorganize the code (it's a mess if
you ask me), but I do not think it is necessary.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 17:32 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 17:32 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: linux-mm, David Rientjes, LKML
On Wed 19-02-14 09:16:28, Nishanth Aravamudan wrote:
> On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> > We had a report about strange OOM killer strikes on a PPC machine
> > although there was a lot of swap free and a tons of anonymous memory
> > which could be swapped out. In the end it turned out that the OOM was
> > a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> > and so the system was pushed to the OOM. Although this sounds like a bug
> > somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> > numactl on the said hardware suggests that the zone reclaim should
> > have been set in the first place:
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> > node 0 size: 0 MB
> > node 0 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7168 MB
> > node 2 free: 6019 MB
> > node distances:
> > node 0 2
> > 0: 10 40
> > 2: 40 10
> >
> > So all the CPUs are associated with Node0 which doesn't have any memory
> > while Node2 contains all the available memory. Node distances cause an
> > automatic zone_reclaim_mode enabling.
> >
> > Zone reclaim is intended to keep the allocations local but this doesn't
> > make any sense on the memory less nodes. So let's exlcude such nodes
> > for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> > suitable reclaim_nodes.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > ---
> > I haven't got to testing this so I am sending this as an RFC for now.
> > But does this look reasonable?
> >
> > mm/page_alloc.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e953f07edb0..4a44bdc7a8cf 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > + for_each_node_state(i, N_HIGH_MEMORY)
> > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> >
> > pgdat->node_id = nid;
> > pgdat->node_start_pfn = node_start_pfn;
> > - init_zone_allows_reclaim(nid);
> > + if (node_state(nid, N_HIGH_MEMORY))
> > + init_zone_allows_reclaim(nid);
>
> I don't think this will work, because what sets N_HIGH_MEMORY (and
> shouldn't it be N_MEMORY?)
This should be the same thing AFAIU.
> is check_for_memory() (free_area_init_nodes() for N_MEMORY), which is
> run *after* init_zone_allows_reclaim().
early_calculate_totalpages sets the memory state before we get here.
> Further, the for_each_node_state() loop doesn't make sense at this
> point, becuase we are actually setting up the nids as we go. So node
> 0, will only see node 0 in the N_HIGH_MEMORY mask (if any). Node 1,
> will only see nodes 0 and 1, etc.
I am not sure I understand what you are saying here. Why would se
consider distance to a memoryless node in the first place.
for_each_node_state just makes sure we are comparing only to a node
which has some memory.
> I'm working on testing a patch that reorders some of this in hopefully a
> safe way.
Although it might make some sense to reorganize the code (it's a mess if
you ask me), but I do not think it is necessary.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 17:32 ` Michal Hocko
@ 2014-02-19 17:49 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 17:49 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, David Rientjes, LKML
On 19.02.2014 [18:32:59 +0100], Michal Hocko wrote:
> On Wed 19-02-14 09:16:28, Nishanth Aravamudan wrote:
> > On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> > > We had a report about strange OOM killer strikes on a PPC machine
> > > although there was a lot of swap free and a tons of anonymous memory
> > > which could be swapped out. In the end it turned out that the OOM was
> > > a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> > > and so the system was pushed to the OOM. Although this sounds like a bug
> > > somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> > > numactl on the said hardware suggests that the zone reclaim should
> > > have been set in the first place:
> > > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> > > node 0 size: 0 MB
> > > node 0 free: 0 MB
> > > node 2 cpus:
> > > node 2 size: 7168 MB
> > > node 2 free: 6019 MB
> > > node distances:
> > > node 0 2
> > > 0: 10 40
> > > 2: 40 10
> > >
> > > So all the CPUs are associated with Node0 which doesn't have any memory
> > > while Node2 contains all the available memory. Node distances cause an
> > > automatic zone_reclaim_mode enabling.
> > >
> > > Zone reclaim is intended to keep the allocations local but this doesn't
> > > make any sense on the memory less nodes. So let's exlcude such nodes
> > > for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> > > suitable reclaim_nodes.
> > >
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > ---
> > > I haven't got to testing this so I am sending this as an RFC for now.
> > > But does this look reasonable?
> > >
> > > mm/page_alloc.c | 5 +++--
> > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 3e953f07edb0..4a44bdc7a8cf 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > {
> > > int i;
> > >
> > > - for_each_online_node(i)
> > > + for_each_node_state(i, N_HIGH_MEMORY)
> > > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > else
> > > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > >
> > > pgdat->node_id = nid;
> > > pgdat->node_start_pfn = node_start_pfn;
> > > - init_zone_allows_reclaim(nid);
> > > + if (node_state(nid, N_HIGH_MEMORY))
> > > + init_zone_allows_reclaim(nid);
> >
> > I don't think this will work, because what sets N_HIGH_MEMORY (and
> > shouldn't it be N_MEMORY?)
>
> This should be the same thing AFAIU.
I don't think they are guaranteed to be? And, in any case, semantically,
we care if a node has MEMORY, not if it has HIGH_MEMORY?
> > is check_for_memory() (free_area_init_nodes() for N_MEMORY), which is
> > run *after* init_zone_allows_reclaim().
>
> early_calculate_totalpages sets the memory state before we get here.
Ah, I did not see this, thanks! But that does only
node_set_state(nid, N_MEMORY);
which to me means we should only rely on that being set in the
afore-mentioned loop.
> > Further, the for_each_node_state() loop doesn't make sense at this
> > point, becuase we are actually setting up the nids as we go. So node
> > 0, will only see node 0 in the N_HIGH_MEMORY mask (if any). Node 1,
> > will only see nodes 0 and 1, etc.
>
> I am not sure I understand what you are saying here. Why would se
> consider distance to a memoryless node in the first place.
> for_each_node_state just makes sure we are comparing only to a node
> which has some memory.
I apologize, I missed the call to early_calculate_totalpages(), so I was
simply saying that looping over the N_MEMORY/N_HIGH_MEMORY mask wouldn't
necessarily be right if we are setting up that mask on a node-by-node
basis. But early_calculate_totalpages ensures the N_MEMORY iteration
will be fine.
>
> > I'm working on testing a patch that reorders some of this in hopefully a
> > safe way.
>
> Although it might make some sense to reorganize the code (it's a mess if
> you ask me), but I do not think it is necessary.
Agreed.
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 17:49 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 17:49 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, David Rientjes, LKML
On 19.02.2014 [18:32:59 +0100], Michal Hocko wrote:
> On Wed 19-02-14 09:16:28, Nishanth Aravamudan wrote:
> > On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> > > We had a report about strange OOM killer strikes on a PPC machine
> > > although there was a lot of swap free and a tons of anonymous memory
> > > which could be swapped out. In the end it turned out that the OOM was
> > > a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> > > and so the system was pushed to the OOM. Although this sounds like a bug
> > > somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> > > numactl on the said hardware suggests that the zone reclaim should
> > > have been set in the first place:
> > > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> > > node 0 size: 0 MB
> > > node 0 free: 0 MB
> > > node 2 cpus:
> > > node 2 size: 7168 MB
> > > node 2 free: 6019 MB
> > > node distances:
> > > node 0 2
> > > 0: 10 40
> > > 2: 40 10
> > >
> > > So all the CPUs are associated with Node0 which doesn't have any memory
> > > while Node2 contains all the available memory. Node distances cause an
> > > automatic zone_reclaim_mode enabling.
> > >
> > > Zone reclaim is intended to keep the allocations local but this doesn't
> > > make any sense on the memory less nodes. So let's exlcude such nodes
> > > for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> > > suitable reclaim_nodes.
> > >
> > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > ---
> > > I haven't got to testing this so I am sending this as an RFC for now.
> > > But does this look reasonable?
> > >
> > > mm/page_alloc.c | 5 +++--
> > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 3e953f07edb0..4a44bdc7a8cf 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > {
> > > int i;
> > >
> > > - for_each_online_node(i)
> > > + for_each_node_state(i, N_HIGH_MEMORY)
> > > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > else
> > > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > >
> > > pgdat->node_id = nid;
> > > pgdat->node_start_pfn = node_start_pfn;
> > > - init_zone_allows_reclaim(nid);
> > > + if (node_state(nid, N_HIGH_MEMORY))
> > > + init_zone_allows_reclaim(nid);
> >
> > I don't think this will work, because what sets N_HIGH_MEMORY (and
> > shouldn't it be N_MEMORY?)
>
> This should be the same thing AFAIU.
I don't think they are guaranteed to be? And, in any case, semantically,
we care if a node has MEMORY, not if it has HIGH_MEMORY?
> > is check_for_memory() (free_area_init_nodes() for N_MEMORY), which is
> > run *after* init_zone_allows_reclaim().
>
> early_calculate_totalpages sets the memory state before we get here.
Ah, I did not see this, thanks! But that does only
node_set_state(nid, N_MEMORY);
which to me means we should only rely on that being set in the
afore-mentioned loop.
> > Further, the for_each_node_state() loop doesn't make sense at this
> > point, becuase we are actually setting up the nids as we go. So node
> > 0, will only see node 0 in the N_HIGH_MEMORY mask (if any). Node 1,
> > will only see nodes 0 and 1, etc.
>
> I am not sure I understand what you are saying here. Why would se
> consider distance to a memoryless node in the first place.
> for_each_node_state just makes sure we are comparing only to a node
> which has some memory.
I apologize, I missed the call to early_calculate_totalpages(), so I was
simply saying that looping over the N_MEMORY/N_HIGH_MEMORY mask wouldn't
necessarily be right if we are setting up that mask on a node-by-node
basis. But early_calculate_totalpages ensures the N_MEMORY iteration
will be fine.
>
> > I'm working on testing a patch that reorders some of this in hopefully a
> > safe way.
>
> Although it might make some sense to reorganize the code (it's a mess if
> you ask me), but I do not think it is necessary.
Agreed.
Thanks,
Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 17:49 ` Nishanth Aravamudan
@ 2014-02-19 19:40 ` Michal Hocko
-1 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 19:40 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: linux-mm, David Rientjes, LKML
On Wed 19-02-14 09:49:41, Nishanth Aravamudan wrote:
> On 19.02.2014 [18:32:59 +0100], Michal Hocko wrote:
> > On Wed 19-02-14 09:16:28, Nishanth Aravamudan wrote:
[...]
> > > I don't think this will work, because what sets N_HIGH_MEMORY (and
> > > shouldn't it be N_MEMORY?)
> >
> > This should be the same thing AFAIU.
>
> I don't think they are guaranteed to be? And, in any case, semantically,
> we care if a node has MEMORY, not if it has HIGH_MEMORY?
I don't know. The whole MEMORY vs HIGH_MEMORY thing is really
confusing. But my understanding was that HIGH_MEMORY is superset of the
other one. But now that I look at the code again it seems that N_MEMORY
is the right thing to use here. I will repost the patch tomorrow if
other parts are good.
[...]
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 19:40 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-19 19:40 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: linux-mm, David Rientjes, LKML
On Wed 19-02-14 09:49:41, Nishanth Aravamudan wrote:
> On 19.02.2014 [18:32:59 +0100], Michal Hocko wrote:
> > On Wed 19-02-14 09:16:28, Nishanth Aravamudan wrote:
[...]
> > > I don't think this will work, because what sets N_HIGH_MEMORY (and
> > > shouldn't it be N_MEMORY?)
> >
> > This should be the same thing AFAIU.
>
> I don't think they are guaranteed to be? And, in any case, semantically,
> we care if a node has MEMORY, not if it has HIGH_MEMORY?
I don't know. The whole MEMORY vs HIGH_MEMORY thing is really
confusing. But my understanding was that HIGH_MEMORY is superset of the
other one. But now that I look at the code again it seems that N_MEMORY
is the right thing to use here. I will repost the patch tomorrow if
other parts are good.
[...]
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 17:03 ` Michal Hocko
@ 2014-02-19 17:53 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 17:53 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, David Rientjes, LKML
On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node 0 2
> 0: 10 40
> 2: 40 10
>
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
>
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exlcude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> I haven't got to testing this so I am sending this as an RFC for now.
> But does this look reasonable?
>
> mm/page_alloc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e953f07edb0..4a44bdc7a8cf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> + for_each_node_state(i, N_HIGH_MEMORY)
> if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> pgdat->node_id = nid;
> pgdat->node_start_pfn = node_start_pfn;
> - init_zone_allows_reclaim(nid);
> + if (node_state(nid, N_HIGH_MEMORY))
> + init_zone_allows_reclaim(nid);
I'm still new to this code, but isn't this saying that if a node has no
memory, then it shouldn't reclaim from any node? But, for a memoryless
node to ensure progress later if reclaim is necessary, it *must* reclaim
from other nodes? So wouldn't we want to set reclaim_nodes() in that
case to node_states[N_MEMORY]?
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 17:53 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 17:53 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, David Rientjes, LKML
On 19.02.2014 [18:03:03 +0100], Michal Hocko wrote:
> We had a report about strange OOM killer strikes on a PPC machine
> although there was a lot of swap free and a tons of anonymous memory
> which could be swapped out. In the end it turned out that the OOM was
> a side effect of zone reclaim which wasn't doesn't unmap and swapp out
> and so the system was pushed to the OOM. Although this sounds like a bug
> somewhere in the kswapd vs. zone reclaim vs. direct reclaim interaction
> numactl on the said hardware suggests that the zone reclaim should
> have been set in the first place:
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 7168 MB
> node 2 free: 6019 MB
> node distances:
> node 0 2
> 0: 10 40
> 2: 40 10
>
> So all the CPUs are associated with Node0 which doesn't have any memory
> while Node2 contains all the available memory. Node distances cause an
> automatic zone_reclaim_mode enabling.
>
> Zone reclaim is intended to keep the allocations local but this doesn't
> make any sense on the memory less nodes. So let's exlcude such nodes
> for init_zone_allows_reclaim which evaluates zone reclaim behavior and
> suitable reclaim_nodes.
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> I haven't got to testing this so I am sending this as an RFC for now.
> But does this look reasonable?
>
> mm/page_alloc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e953f07edb0..4a44bdc7a8cf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> + for_each_node_state(i, N_HIGH_MEMORY)
> if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> pgdat->node_id = nid;
> pgdat->node_start_pfn = node_start_pfn;
> - init_zone_allows_reclaim(nid);
> + if (node_state(nid, N_HIGH_MEMORY))
> + init_zone_allows_reclaim(nid);
I'm still new to this code, but isn't this saying that if a node has no
memory, then it shouldn't reclaim from any node? But, for a memoryless
node to ensure progress later if reclaim is necessary, it *must* reclaim
from other nodes? So wouldn't we want to set reclaim_nodes() in that
case to node_states[N_MEMORY]?
Thanks,
Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 17:53 ` Nishanth Aravamudan
@ 2014-02-19 21:56 ` David Rientjes
-1 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2014-02-19 21:56 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Michal Hocko, linux-mm, LKML
On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e953f07edb0..4a44bdc7a8cf 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > + for_each_node_state(i, N_HIGH_MEMORY)
> > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> >
> > pgdat->node_id = nid;
> > pgdat->node_start_pfn = node_start_pfn;
> > - init_zone_allows_reclaim(nid);
> > + if (node_state(nid, N_HIGH_MEMORY))
> > + init_zone_allows_reclaim(nid);
>
> I'm still new to this code, but isn't this saying that if a node has no
> memory, then it shouldn't reclaim from any node? But, for a memoryless
> node to ensure progress later if reclaim is necessary, it *must* reclaim
> from other nodes? So wouldn't we want to set reclaim_nodes() in that
> case to node_states[N_MEMORY]?
>
The only time when pgdat->reclaim_nodes or zone_reclaim_mode matters is
when iterating through a zonelist for page allocation and a memoryless
node should never appear in a zonelist for page allocation, so this is
just preventing setting zone_reclaim_mode unnecessarily because the only
nodes with > RECLAIM_DISTANCE to another node are memoryless. So this
patch is fine as long as it gets s/N_HIGH_MEMORY/N_MEMORY/.
Acked-by: David Rientjes <rientjes@google.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 21:56 ` David Rientjes
0 siblings, 0 replies; 61+ messages in thread
From: David Rientjes @ 2014-02-19 21:56 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: Michal Hocko, linux-mm, LKML
On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e953f07edb0..4a44bdc7a8cf 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > {
> > int i;
> >
> > - for_each_online_node(i)
> > + for_each_node_state(i, N_HIGH_MEMORY)
> > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > else
> > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> >
> > pgdat->node_id = nid;
> > pgdat->node_start_pfn = node_start_pfn;
> > - init_zone_allows_reclaim(nid);
> > + if (node_state(nid, N_HIGH_MEMORY))
> > + init_zone_allows_reclaim(nid);
>
> I'm still new to this code, but isn't this saying that if a node has no
> memory, then it shouldn't reclaim from any node? But, for a memoryless
> node to ensure progress later if reclaim is necessary, it *must* reclaim
> from other nodes? So wouldn't we want to set reclaim_nodes() in that
> case to node_states[N_MEMORY]?
>
The only time when pgdat->reclaim_nodes or zone_reclaim_mode matters is
when iterating through a zonelist for page allocation and a memoryless
node should never appear in a zonelist for page allocation, so this is
just preventing setting zone_reclaim_mode unnecessarily because the only
nodes with > RECLAIM_DISTANCE to another node are memoryless. So this
patch is fine as long as it gets s/N_HIGH_MEMORY/N_MEMORY/.
Acked-by: David Rientjes <rientjes@google.com>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 21:56 ` David Rientjes
@ 2014-02-19 23:05 ` Nishanth Aravamudan
-1 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 23:05 UTC (permalink / raw)
To: David Rientjes; +Cc: Michal Hocko, linux-mm, LKML
On 19.02.2014 [13:56:00 -0800], David Rientjes wrote:
> On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
>
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 3e953f07edb0..4a44bdc7a8cf 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > {
> > > int i;
> > >
> > > - for_each_online_node(i)
> > > + for_each_node_state(i, N_HIGH_MEMORY)
> > > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > else
> > > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > >
> > > pgdat->node_id = nid;
> > > pgdat->node_start_pfn = node_start_pfn;
> > > - init_zone_allows_reclaim(nid);
> > > + if (node_state(nid, N_HIGH_MEMORY))
> > > + init_zone_allows_reclaim(nid);
> >
> > I'm still new to this code, but isn't this saying that if a node has no
> > memory, then it shouldn't reclaim from any node? But, for a memoryless
> > node to ensure progress later if reclaim is necessary, it *must* reclaim
> > from other nodes? So wouldn't we want to set reclaim_nodes() in that
> > case to node_states[N_MEMORY]?
> >
>
> The only time when pgdat->reclaim_nodes or zone_reclaim_mode matters is
> when iterating through a zonelist for page allocation and a memoryless
> node should never appear in a zonelist for page allocation, so this is
> just preventing setting zone_reclaim_mode unnecessarily because the only
> nodes with > RECLAIM_DISTANCE to another node are memoryless. So this
> patch is fine as long as it gets s/N_HIGH_MEMORY/N_MEMORY/.
Ah yes, sorry, I've been looking at this code perhaps too much and going
a bit cross-eyed!
I wonder if we should also put some comments in? But
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-19 23:05 ` Nishanth Aravamudan
0 siblings, 0 replies; 61+ messages in thread
From: Nishanth Aravamudan @ 2014-02-19 23:05 UTC (permalink / raw)
To: David Rientjes; +Cc: Michal Hocko, linux-mm, LKML
On 19.02.2014 [13:56:00 -0800], David Rientjes wrote:
> On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
>
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 3e953f07edb0..4a44bdc7a8cf 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > {
> > > int i;
> > >
> > > - for_each_online_node(i)
> > > + for_each_node_state(i, N_HIGH_MEMORY)
> > > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > else
> > > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > >
> > > pgdat->node_id = nid;
> > > pgdat->node_start_pfn = node_start_pfn;
> > > - init_zone_allows_reclaim(nid);
> > > + if (node_state(nid, N_HIGH_MEMORY))
> > > + init_zone_allows_reclaim(nid);
> >
> > I'm still new to this code, but isn't this saying that if a node has no
> > memory, then it shouldn't reclaim from any node? But, for a memoryless
> > node to ensure progress later if reclaim is necessary, it *must* reclaim
> > from other nodes? So wouldn't we want to set reclaim_nodes() in that
> > case to node_states[N_MEMORY]?
> >
>
> The only time when pgdat->reclaim_nodes or zone_reclaim_mode matters is
> when iterating through a zonelist for page allocation and a memoryless
> node should never appear in a zonelist for page allocation, so this is
> just preventing setting zone_reclaim_mode unnecessarily because the only
> nodes with > RECLAIM_DISTANCE to another node are memoryless. So this
> patch is fine as long as it gets s/N_HIGH_MEMORY/N_MEMORY/.
Ah yes, sorry, I've been looking at this code perhaps too much and going
a bit cross-eyed!
I wonder if we should also put some comments in? But
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Thanks,
Nish
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
2014-02-19 23:05 ` Nishanth Aravamudan
@ 2014-02-20 9:50 ` Michal Hocko
-1 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-20 9:50 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: David Rientjes, linux-mm, LKML
On Wed 19-02-14 15:05:58, Nishanth Aravamudan wrote:
> On 19.02.2014 [13:56:00 -0800], David Rientjes wrote:
> > On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 3e953f07edb0..4a44bdc7a8cf 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > > {
> > > > int i;
> > > >
> > > > - for_each_online_node(i)
> > > > + for_each_node_state(i, N_HIGH_MEMORY)
> > > > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > > else
> > > > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > > >
> > > > pgdat->node_id = nid;
> > > > pgdat->node_start_pfn = node_start_pfn;
> > > > - init_zone_allows_reclaim(nid);
> > > > + if (node_state(nid, N_HIGH_MEMORY))
> > > > + init_zone_allows_reclaim(nid);
> > >
> > > I'm still new to this code, but isn't this saying that if a node has no
> > > memory, then it shouldn't reclaim from any node? But, for a memoryless
> > > node to ensure progress later if reclaim is necessary, it *must* reclaim
> > > from other nodes? So wouldn't we want to set reclaim_nodes() in that
> > > case to node_states[N_MEMORY]?
> > >
> >
> > The only time when pgdat->reclaim_nodes or zone_reclaim_mode matters is
> > when iterating through a zonelist for page allocation and a memoryless
> > node should never appear in a zonelist for page allocation, so this is
> > just preventing setting zone_reclaim_mode unnecessarily because the only
> > nodes with > RECLAIM_DISTANCE to another node are memoryless. So this
> > patch is fine as long as it gets s/N_HIGH_MEMORY/N_MEMORY/.
>
> Ah yes, sorry, I've been looking at this code perhaps too much and going
> a bit cross-eyed!
>
> I wonder if we should also put some comments in? But
>
> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Thanks both of you for acks and testing. I will submit the updated patch
and include Andrew to pick it up.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [RFC PATCH] mm: exclude memory less nodes from zone_reclaim
@ 2014-02-20 9:50 ` Michal Hocko
0 siblings, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2014-02-20 9:50 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: David Rientjes, linux-mm, LKML
On Wed 19-02-14 15:05:58, Nishanth Aravamudan wrote:
> On 19.02.2014 [13:56:00 -0800], David Rientjes wrote:
> > On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 3e953f07edb0..4a44bdc7a8cf 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -1855,7 +1855,7 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> > > > {
> > > > int i;
> > > >
> > > > - for_each_online_node(i)
> > > > + for_each_node_state(i, N_HIGH_MEMORY)
> > > > if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> > > > node_set(i, NODE_DATA(nid)->reclaim_nodes);
> > > > else
> > > > @@ -4901,7 +4901,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > > >
> > > > pgdat->node_id = nid;
> > > > pgdat->node_start_pfn = node_start_pfn;
> > > > - init_zone_allows_reclaim(nid);
> > > > + if (node_state(nid, N_HIGH_MEMORY))
> > > > + init_zone_allows_reclaim(nid);
> > >
> > > I'm still new to this code, but isn't this saying that if a node has no
> > > memory, then it shouldn't reclaim from any node? But, for a memoryless
> > > node to ensure progress later if reclaim is necessary, it *must* reclaim
> > > from other nodes? So wouldn't we want to set reclaim_nodes() in that
> > > case to node_states[N_MEMORY]?
> > >
> >
> > The only time when pgdat->reclaim_nodes or zone_reclaim_mode matters is
> > when iterating through a zonelist for page allocation and a memoryless
> > node should never appear in a zonelist for page allocation, so this is
> > just preventing setting zone_reclaim_mode unnecessarily because the only
> > nodes with > RECLAIM_DISTANCE to another node are memoryless. So this
> > patch is fine as long as it gets s/N_HIGH_MEMORY/N_MEMORY/.
>
> Ah yes, sorry, I've been looking at this code perhaps too much and going
> a bit cross-eyed!
>
> I wonder if we should also put some comments in? But
>
> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Thanks both of you for acks and testing. I will submit the updated patch
and include Andrew to pick it up.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 61+ messages in thread