* ppc: RECLAIM_DISTANCE 10? @ 2014-02-18 9:06 Michal Hocko 2014-02-18 22:27 ` David Rientjes 2014-02-18 23:34 ` Nishanth Aravamudan 0 siblings, 2 replies; 16+ messages in thread From: Michal Hocko @ 2014-02-18 9:06 UTC (permalink / raw) To: Anton Blanchard; +Cc: linuxppc-dev, LKML, linux-mm Hi, I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim). The commit message suggests that the zone reclaim is desirable for all NUMA configurations. History has shown that the zone reclaim is more often harmful than helpful and leads to performance problems. The default RECLAIM_DISTANCE for generic case has been increased from 20 to 30 around 3.0 (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). I strongly suspect that the patch is incorrect and it should be reverted. Before I will send a revert I would like to understand what led to the patch in the first place. I do not see why would PPC use only LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have seen use different values. Anton, could you comment please? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-18 9:06 ppc: RECLAIM_DISTANCE 10? Michal Hocko @ 2014-02-18 22:27 ` David Rientjes 2014-02-19 8:16 ` Michal Hocko 2014-02-18 23:34 ` Nishanth Aravamudan 1 sibling, 1 reply; 16+ messages in thread From: David Rientjes @ 2014-02-18 22:27 UTC (permalink / raw) To: Michal Hocko; +Cc: linuxppc-dev, Anton Blanchard, LKML, linux-mm On Tue, 18 Feb 2014, Michal Hocko wrote: > Hi, > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to > enable zone reclaim). The commit message suggests that the zone reclaim > is desirable for all NUMA configurations. > > History has shown that the zone reclaim is more often harmful than > helpful and leads to performance problems. The default RECLAIM_DISTANCE > for generic case has been increased from 20 to 30 around 3.0 > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). > > I strongly suspect that the patch is incorrect and it should be > reverted. Before I will send a revert I would like to understand what > led to the patch in the first place. I do not see why would PPC use only > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have > seen use different values. > I strongly suspect that the patch is correct since powerpc node distances are different than the architectures you're talking about and get doubled for every NUMA domain that the hardware supports. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-18 22:27 ` David Rientjes @ 2014-02-19 8:16 ` Michal Hocko 2014-02-19 8:20 ` David Rientjes 0 siblings, 1 reply; 16+ messages in thread From: Michal Hocko @ 2014-02-19 8:16 UTC (permalink / raw) To: David Rientjes; +Cc: linuxppc-dev, Anton Blanchard, LKML, linux-mm On Tue 18-02-14 14:27:11, David Rientjes wrote: > On Tue, 18 Feb 2014, Michal Hocko wrote: > > > Hi, > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to > > enable zone reclaim). The commit message suggests that the zone reclaim > > is desirable for all NUMA configurations. > > > > History has shown that the zone reclaim is more often harmful than > > helpful and leads to performance problems. The default RECLAIM_DISTANCE > > for generic case has been increased from 20 to 30 around 3.0 > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). > > > > I strongly suspect that the patch is incorrect and it should be > > reverted. Before I will send a revert I would like to understand what > > led to the patch in the first place. I do not see why would PPC use only > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have > > seen use different values. > > > > I strongly suspect that the patch is correct since powerpc node distances > are different than the architectures you're talking about and get doubled > for every NUMA domain that the hardware supports. Even if the units of the distance is different on PPC should every NUMA machine have zone_reclaim enabled? That doesn't right to me. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 8:16 ` Michal Hocko @ 2014-02-19 8:20 ` David Rientjes 2014-02-19 9:19 ` Michal Hocko 0 siblings, 1 reply; 16+ messages in thread From: David Rientjes @ 2014-02-19 8:20 UTC (permalink / raw) To: Michal Hocko; +Cc: linuxppc-dev, Anton Blanchard, LKML, linux-mm On Wed, 19 Feb 2014, Michal Hocko wrote: > > I strongly suspect that the patch is correct since powerpc node distances > > are different than the architectures you're talking about and get doubled > > for every NUMA domain that the hardware supports. > > Even if the units of the distance is different on PPC should every NUMA > machine have zone_reclaim enabled? That doesn't right to me. > In my experience on powerpc it's very correct, there's typically a significant latency in remote access and we don't have the benefit of a SLIT that actually defines the locality between proximity domains like we do on other architectures. We have had significant issues with thp, for example, being allocated remotely instead of pages locally, much more drastic than on our x86 machines, particularly AMD machines. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 8:20 ` David Rientjes @ 2014-02-19 9:19 ` Michal Hocko 2014-02-19 21:45 ` David Rientjes 0 siblings, 1 reply; 16+ messages in thread From: Michal Hocko @ 2014-02-19 9:19 UTC (permalink / raw) To: David Rientjes; +Cc: linuxppc-dev, Anton Blanchard, LKML, linux-mm On Wed 19-02-14 00:20:21, David Rientjes wrote: > On Wed, 19 Feb 2014, Michal Hocko wrote: > > > > I strongly suspect that the patch is correct since powerpc node distances > > > are different than the architectures you're talking about and get doubled > > > for every NUMA domain that the hardware supports. > > > > Even if the units of the distance is different on PPC should every NUMA > > machine have zone_reclaim enabled? That doesn't right to me. > > > > In my experience on powerpc it's very correct, there's typically a > significant latency in remote access and we don't have the benefit of a > SLIT that actually defines the locality between proximity domains like we > do on other architectures. Interesting. So is the PPC NUMA basically about local vs. very distant? Should REMOTE_DISTANCE reflect that as well? Or can we have distance < REMOTE_DISTANCE and it would still make sense to have zone_reclaim enabled? [...] Thanks! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 9:19 ` Michal Hocko @ 2014-02-19 21:45 ` David Rientjes 0 siblings, 0 replies; 16+ messages in thread From: David Rientjes @ 2014-02-19 21:45 UTC (permalink / raw) To: Michal Hocko; +Cc: linuxppc-dev, Anton Blanchard, LKML, linux-mm On Wed, 19 Feb 2014, Michal Hocko wrote: > Interesting. So is the PPC NUMA basically about local vs. very distant? The point is that it's impossible to tell how distant they are from one NUMA domain to the next NUMA domain. > Should REMOTE_DISTANCE reflect that as well? Or can we have > distance < REMOTE_DISTANCE and it would still make sense to have > zone_reclaim enabled? > Ppc doesn't want to allocate in a different NUMA domain unless required, the latency of a remote access is too high. Everything that isn't in the same domain has a distance >10 and is setup as 2^(domain hops) * 10. We don't have the ability like with a SLIT to define remote nodes to have local latency vs remote or costly latency so the safe setting is a RECLAIM_DISTANCE of 10. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-18 9:06 ppc: RECLAIM_DISTANCE 10? Michal Hocko 2014-02-18 22:27 ` David Rientjes @ 2014-02-18 23:34 ` Nishanth Aravamudan 2014-02-18 23:58 ` Nishanth Aravamudan 2014-02-19 8:23 ` Michal Hocko 1 sibling, 2 replies; 16+ messages in thread From: Nishanth Aravamudan @ 2014-02-18 23:34 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML Hi Michal, On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote: > Hi, > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to > enable zone reclaim). The commit message suggests that the zone reclaim > is desirable for all NUMA configurations. > > History has shown that the zone reclaim is more often harmful than > helpful and leads to performance problems. The default RECLAIM_DISTANCE > for generic case has been increased from 20 to 30 around 3.0 > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). Interesting. > I strongly suspect that the patch is incorrect and it should be > reverted. Before I will send a revert I would like to understand what > led to the patch in the first place. I do not see why would PPC use only > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have > seen use different values. > > Anton, could you comment please? I'll let Anton comment here, but in looking into this issue in working on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with memoryless nodes will set zone_reclaim_mode to 1. I think we want to ignore memoryless nodes when we set up the reclaim mode like the following? I'll send it as a proper patch if you agree? diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5de4337..4f6ff6f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid) { int i; - for_each_online_node(i) - if (node_distance(nid, i) <= RECLAIM_DISTANCE) + for_each_online_node(i) { + if (node_distance(nid, i) <= RECLAIM_DISTANCE || + local_memory_node(nid) != nid) node_set(i, NODE_DATA(nid)->reclaim_nodes); else zone_reclaim_mode = 1; Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is not set, but if it is, I think semantically it will indicate that memoryless nodes *have* to reclaim remotely. And actually the above won't work, because the callpath is start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes -> free_area_init_node -> init_zone_allows_reclaim] which is called before build_all_zonelists. This is a similar ordering problem as I'm having with the MEMORYLESS_NODE support, will work on it. Thanks, Nish ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-18 23:34 ` Nishanth Aravamudan @ 2014-02-18 23:58 ` Nishanth Aravamudan 2014-02-19 0:40 ` Nishanth Aravamudan 2014-02-19 1:43 ` David Rientjes 2014-02-19 8:23 ` Michal Hocko 1 sibling, 2 replies; 16+ messages in thread From: Nishanth Aravamudan @ 2014-02-18 23:58 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML On 18.02.2014 [15:34:05 -0800], Nishanth Aravamudan wrote: > Hi Michal, > > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote: > > Hi, > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to > > enable zone reclaim). The commit message suggests that the zone reclaim > > is desirable for all NUMA configurations. > > > > History has shown that the zone reclaim is more often harmful than > > helpful and leads to performance problems. The default RECLAIM_DISTANCE > > for generic case has been increased from 20 to 30 around 3.0 > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). > > Interesting. > > > I strongly suspect that the patch is incorrect and it should be > > reverted. Before I will send a revert I would like to understand what > > led to the patch in the first place. I do not see why would PPC use only > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have > > seen use different values. > > > > Anton, could you comment please? > > I'll let Anton comment here, but in looking into this issue in working > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with > memoryless nodes will set zone_reclaim_mode to 1. I think we want to > ignore memoryless nodes when we set up the reclaim mode like the > following? I'll send it as a proper patch if you agree? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5de4337..4f6ff6f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid) > { > int i; > > - for_each_online_node(i) > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > + for_each_online_node(i) { > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > + local_memory_node(nid) != nid) > node_set(i, NODE_DATA(nid)->reclaim_nodes); > else > zone_reclaim_mode = 1; > > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is > not set, but if it is, I think semantically it will indicate that > memoryless nodes *have* to reclaim remotely. > > And actually the above won't work, because the callpath is > > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes -> > free_area_init_node -> init_zone_allows_reclaim] which is called before > build_all_zonelists. This is a similar ordering problem as I'm having > with the MEMORYLESS_NODE support, will work on it. How about the following? diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5de4337..1a0eced 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid) int i; for_each_online_node(i) - if (node_distance(nid, i) <= RECLAIM_DISTANCE) + if (node_distance(nid, i) <= RECLAIM_DISTANCE || + !NODE_DATA(nid)->node_present_pages) node_set(i, NODE_DATA(nid)->reclaim_nodes); else zone_reclaim_mode = 1; @@ -4901,13 +4902,13 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size, pgdat->node_id = nid; pgdat->node_start_pfn = node_start_pfn; - init_zone_allows_reclaim(nid); #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); #endif calculate_node_totalpages(pgdat, start_pfn, end_pfn, zones_size, zholes_size); + init_zone_allows_reclaim(nid); alloc_node_mem_map(pgdat); #ifdef CONFIG_FLAT_NODE_MEM_MAP printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n", I think it's safe to move init_zone_allows_reclaim, because I don't think any allocates are occurring here that could cause us to reclaim anyways, right? Moving it allows us to safely reference node_present_pages. Thanks, Nish ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-18 23:58 ` Nishanth Aravamudan @ 2014-02-19 0:40 ` Nishanth Aravamudan 2014-02-19 1:43 ` David Rientjes 1 sibling, 0 replies; 16+ messages in thread From: Nishanth Aravamudan @ 2014-02-19 0:40 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML On 18.02.2014 [15:58:00 -0800], Nishanth Aravamudan wrote: > On 18.02.2014 [15:34:05 -0800], Nishanth Aravamudan wrote: > > Hi Michal, > > > > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote: > > > Hi, > > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by > > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to > > > enable zone reclaim). The commit message suggests that the zone reclaim > > > is desirable for all NUMA configurations. > > > > > > History has shown that the zone reclaim is more often harmful than > > > helpful and leads to performance problems. The default RECLAIM_DISTANCE > > > for generic case has been increased from 20 to 30 around 3.0 > > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). > > > > Interesting. > > > > > I strongly suspect that the patch is incorrect and it should be > > > reverted. Before I will send a revert I would like to understand what > > > led to the patch in the first place. I do not see why would PPC use only > > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have > > > seen use different values. > > > > > > Anton, could you comment please? > > > > I'll let Anton comment here, but in looking into this issue in working > > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with > > memoryless nodes will set zone_reclaim_mode to 1. I think we want to > > ignore memoryless nodes when we set up the reclaim mode like the > > following? I'll send it as a proper patch if you agree? > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 5de4337..4f6ff6f 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid) > > { > > int i; > > > > - for_each_online_node(i) > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > > + for_each_online_node(i) { > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > > + local_memory_node(nid) != nid) > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > else > > zone_reclaim_mode = 1; > > > > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is > > not set, but if it is, I think semantically it will indicate that > > memoryless nodes *have* to reclaim remotely. > > > > And actually the above won't work, because the callpath is > > > > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes -> > > free_area_init_node -> init_zone_allows_reclaim] which is called before > > build_all_zonelists. This is a similar ordering problem as I'm having > > with the MEMORYLESS_NODE support, will work on it. > > How about the following? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5de4337..1a0eced 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid) > int i; > > for_each_online_node(i) > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > + !NODE_DATA(nid)->node_present_pages) err s/nid/i/ above. -Nish ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-18 23:58 ` Nishanth Aravamudan 2014-02-19 0:40 ` Nishanth Aravamudan @ 2014-02-19 1:43 ` David Rientjes 2014-02-19 8:33 ` Michal Hocko 2014-02-19 16:24 ` Nishanth Aravamudan 1 sibling, 2 replies; 16+ messages in thread From: David Rientjes @ 2014-02-19 1:43 UTC (permalink / raw) To: Nishanth Aravamudan Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML On Tue, 18 Feb 2014, Nishanth Aravamudan wrote: > How about the following? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5de4337..1a0eced 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid) > int i; > > for_each_online_node(i) > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > + !NODE_DATA(i)->node_present_pages) > node_set(i, NODE_DATA(nid)->reclaim_nodes); > else > zone_reclaim_mode = 1; [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught so we're looking at the right code. ] That can't be right, it would allow reclaiming from a memoryless node. I think what you want is for_each_online_node(i) { if (!node_present_pages(i)) continue; if (node_distance(nid, i) <= RECLAIM_DISTANCE) { node_set(i, NODE_DATA(nid)->reclaim_nodes); continue; } /* Always try to reclaim locally */ zone_reclaim_mode = 1; } but we really should be able to do for_each_node_state(i, N_MEMORY) here and memoryless nodes should already be excluded from that mask. > @@ -4901,13 +4902,13 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size, > > pgdat->node_id = nid; > pgdat->node_start_pfn = node_start_pfn; > - init_zone_allows_reclaim(nid); > #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > #endif > calculate_node_totalpages(pgdat, start_pfn, end_pfn, > zones_size, zholes_size); > > + init_zone_allows_reclaim(nid); > alloc_node_mem_map(pgdat); > #ifdef CONFIG_FLAT_NODE_MEM_MAP > printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n", > > I think it's safe to move init_zone_allows_reclaim, because I don't > think any allocates are occurring here that could cause us to reclaim > anyways, right? Moving it allows us to safely reference > node_present_pages. > Yeah, this is fine. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 1:43 ` David Rientjes @ 2014-02-19 8:33 ` Michal Hocko 2014-02-19 16:24 ` Nishanth Aravamudan 1 sibling, 0 replies; 16+ messages in thread From: Michal Hocko @ 2014-02-19 8:33 UTC (permalink / raw) To: David Rientjes Cc: linux-mm, Nishanth Aravamudan, linuxppc-dev, Anton Blanchard, LKML On Tue 18-02-14 17:43:38, David Rientjes wrote: > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote: > > > How about the following? > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 5de4337..1a0eced 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid) > > int i; > > > > for_each_online_node(i) > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > > + !NODE_DATA(i)->node_present_pages) > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > else > > zone_reclaim_mode = 1; > > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught > so we're looking at the right code. ] > > That can't be right, it would allow reclaiming from a memoryless node. I > think what you want is > > for_each_online_node(i) { > if (!node_present_pages(i)) > continue; > if (node_distance(nid, i) <= RECLAIM_DISTANCE) { > node_set(i, NODE_DATA(nid)->reclaim_nodes); > continue; > } > /* Always try to reclaim locally */ > zone_reclaim_mode = 1; > } > > but we really should be able to do for_each_node_state(i, N_MEMORY) here > and memoryless nodes should already be excluded from that mask. Agreed! Actually the code I am currently interested in is based on 3.0 kernel where zone_reclaim_mode is set in build_zonelists which relies on find_next_best_node which iterates only N_HIGH_MEMORY nodes which should have non 0 present pages. [...] -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 1:43 ` David Rientjes 2014-02-19 8:33 ` Michal Hocko @ 2014-02-19 16:24 ` Nishanth Aravamudan 2014-02-19 16:33 ` Nishanth Aravamudan 1 sibling, 1 reply; 16+ messages in thread From: Nishanth Aravamudan @ 2014-02-19 16:24 UTC (permalink / raw) To: David Rientjes Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML On 18.02.2014 [17:43:38 -0800], David Rientjes wrote: > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote: > > > How about the following? > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 5de4337..1a0eced 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid) > > int i; > > > > for_each_online_node(i) > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > > + !NODE_DATA(i)->node_present_pages) > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > else > > zone_reclaim_mode = 1; > > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught > so we're looking at the right code. ] > > That can't be right, it would allow reclaiming from a memoryless node. I > think what you want is Gah, you're right. > for_each_online_node(i) { > if (!node_present_pages(i)) > continue; > if (node_distance(nid, i) <= RECLAIM_DISTANCE) { > node_set(i, NODE_DATA(nid)->reclaim_nodes); > continue; > } > /* Always try to reclaim locally */ > zone_reclaim_mode = 1; > } > > but we really should be able to do for_each_node_state(i, N_MEMORY) here > and memoryless nodes should already be excluded from that mask. Yep, I found that afterwards, which simplifies the logic. I'll add this to my series :) <snip> > > I think it's safe to move init_zone_allows_reclaim, because I don't > > think any allocates are occurring here that could cause us to reclaim > > anyways, right? Moving it allows us to safely reference > > node_present_pages. > > > > Yeah, this is fine. Thanks, Nish ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 16:24 ` Nishanth Aravamudan @ 2014-02-19 16:33 ` Nishanth Aravamudan 2014-02-20 9:55 ` Michal Hocko 0 siblings, 1 reply; 16+ messages in thread From: Nishanth Aravamudan @ 2014-02-19 16:33 UTC (permalink / raw) To: David Rientjes Cc: Michal Hocko, linux-mm, linuxppc-dev, Anton Blanchard, LKML On 19.02.2014 [08:24:38 -0800], Nishanth Aravamudan wrote: > On 18.02.2014 [17:43:38 -0800], David Rientjes wrote: > > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote: > > > > > How about the following? > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index 5de4337..1a0eced 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid) > > > int i; > > > > > > for_each_online_node(i) > > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > > > + !NODE_DATA(i)->node_present_pages) > > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > > else > > > zone_reclaim_mode = 1; > > > > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught > > so we're looking at the right code. ] > > > > That can't be right, it would allow reclaiming from a memoryless node. I > > think what you want is > > Gah, you're right. > > > for_each_online_node(i) { > > if (!node_present_pages(i)) > > continue; > > if (node_distance(nid, i) <= RECLAIM_DISTANCE) { > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > continue; > > } > > /* Always try to reclaim locally */ > > zone_reclaim_mode = 1; > > } > > > > but we really should be able to do for_each_node_state(i, N_MEMORY) here > > and memoryless nodes should already be excluded from that mask. > > Yep, I found that afterwards, which simplifies the logic. I'll add this > to my series :) In looking at the code, I am wondering about the following: init_zone_allows_reclaim() is called for each nid from free_area_init_node(). Which means that calculate_node_totalpages for other "later" nids and check_for_memory() [which sets up the N_MEMORY nodemask] hasn't been called yet. So, would it make sense to pull up the /* Any memory on that node */ if (pgdat->node_present_pages) node_set_state(nid, N_MEMORY); check_for_memory(pgdat, nid); into free_area_init_node()? Thanks, Nish ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 16:33 ` Nishanth Aravamudan @ 2014-02-20 9:55 ` Michal Hocko 0 siblings, 0 replies; 16+ messages in thread From: Michal Hocko @ 2014-02-20 9:55 UTC (permalink / raw) To: Nishanth Aravamudan Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML, David Rientjes On Wed 19-02-14 08:33:45, Nishanth Aravamudan wrote: > On 19.02.2014 [08:24:38 -0800], Nishanth Aravamudan wrote: > > On 18.02.2014 [17:43:38 -0800], David Rientjes wrote: > > > On Tue, 18 Feb 2014, Nishanth Aravamudan wrote: > > > > > > > How about the following? > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index 5de4337..1a0eced 100644 > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid) > > > > int i; > > > > > > > > for_each_online_node(i) > > > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > > > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > > > > + !NODE_DATA(i)->node_present_pages) > > > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > > > else > > > > zone_reclaim_mode = 1; > > > > > > [ I changed the above from NODE_DATA(nid) -> NODE_DATA(i) as you caught > > > so we're looking at the right code. ] > > > > > > That can't be right, it would allow reclaiming from a memoryless node. I > > > think what you want is > > > > Gah, you're right. > > > > > for_each_online_node(i) { > > > if (!node_present_pages(i)) > > > continue; > > > if (node_distance(nid, i) <= RECLAIM_DISTANCE) { > > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > > continue; > > > } > > > /* Always try to reclaim locally */ > > > zone_reclaim_mode = 1; > > > } > > > > > > but we really should be able to do for_each_node_state(i, N_MEMORY) here > > > and memoryless nodes should already be excluded from that mask. > > > > Yep, I found that afterwards, which simplifies the logic. I'll add this > > to my series :) > > In looking at the code, I am wondering about the following: > > init_zone_allows_reclaim() is called for each nid from > free_area_init_node(). Which means that calculate_node_totalpages for > other "later" nids and check_for_memory() [which sets up the N_MEMORY > nodemask] hasn't been called yet. > > So, would it make sense to pull up the > /* Any memory on that node */ > if (pgdat->node_present_pages) > node_set_state(nid, N_MEMORY); > check_for_memory(pgdat, nid); > into free_area_init_node()? Dunno, but it shouldn't be needed because nodes are set N_MEMORY earlier in early_calculate_totalpages as mentioned in other email. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-18 23:34 ` Nishanth Aravamudan 2014-02-18 23:58 ` Nishanth Aravamudan @ 2014-02-19 8:23 ` Michal Hocko 2014-02-19 16:26 ` Nishanth Aravamudan 1 sibling, 1 reply; 16+ messages in thread From: Michal Hocko @ 2014-02-19 8:23 UTC (permalink / raw) To: Nishanth Aravamudan; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote: > Hi Michal, > > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote: > > Hi, > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to > > enable zone reclaim). The commit message suggests that the zone reclaim > > is desirable for all NUMA configurations. > > > > History has shown that the zone reclaim is more often harmful than > > helpful and leads to performance problems. The default RECLAIM_DISTANCE > > for generic case has been increased from 20 to 30 around 3.0 > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). > > Interesting. > > > I strongly suspect that the patch is incorrect and it should be > > reverted. Before I will send a revert I would like to understand what > > led to the patch in the first place. I do not see why would PPC use only > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have > > seen use different values. > > > > Anton, could you comment please? > > I'll let Anton comment here, but in looking into this issue in working > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with > memoryless nodes will set zone_reclaim_mode to 1. I think we want to > ignore memoryless nodes when we set up the reclaim mode like the > following? I'll send it as a proper patch if you agree? Funny enough, ppc memoryless node setup is what led me to this code. We had a setup like this: node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: node 2 size: 7168 MB node 2 free: 6019 MB node distances: node 0 2 0: 10 40 2: 40 10 Which ends up enabling zone_reclaim although there is only a single node with memory. Not that RECLAIM_DISTANCE would make any difference here as the distance is even above default RECLAIM_DISTANCE. > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5de4337..4f6ff6f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid) > { > int i; > > - for_each_online_node(i) > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > + for_each_online_node(i) { > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > + local_memory_node(nid) != nid) > node_set(i, NODE_DATA(nid)->reclaim_nodes); > else > zone_reclaim_mode = 1; > > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is > not set, but if it is, I think semantically it will indicate that > memoryless nodes *have* to reclaim remotely. > > And actually the above won't work, because the callpath is > > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes -> > free_area_init_node -> init_zone_allows_reclaim] which is called before > build_all_zonelists. This is a similar ordering problem as I'm having > with the MEMORYLESS_NODE support, will work on it. I think you just want for_each_node_state(nid, N_MEMORY) and skip all the memory less nodes, no? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ppc: RECLAIM_DISTANCE 10? 2014-02-19 8:23 ` Michal Hocko @ 2014-02-19 16:26 ` Nishanth Aravamudan 0 siblings, 0 replies; 16+ messages in thread From: Nishanth Aravamudan @ 2014-02-19 16:26 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, linuxppc-dev, Anton Blanchard, LKML On 19.02.2014 [09:23:13 +0100], Michal Hocko wrote: > On Tue 18-02-14 15:34:05, Nishanth Aravamudan wrote: > > Hi Michal, > > > > On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote: > > > Hi, > > > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by > > > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to > > > enable zone reclaim). The commit message suggests that the zone reclaim > > > is desirable for all NUMA configurations. > > > > > > History has shown that the zone reclaim is more often harmful than > > > helpful and leads to performance problems. The default RECLAIM_DISTANCE > > > for generic case has been increased from 20 to 30 around 3.0 > > > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30). > > > > Interesting. > > > > > I strongly suspect that the patch is incorrect and it should be > > > reverted. Before I will send a revert I would like to understand what > > > led to the patch in the first place. I do not see why would PPC use only > > > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have > > > seen use different values. > > > > > > Anton, could you comment please? > > > > I'll let Anton comment here, but in looking into this issue in working > > on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with > > memoryless nodes will set zone_reclaim_mode to 1. I think we want to > > ignore memoryless nodes when we set up the reclaim mode like the > > following? I'll send it as a proper patch if you agree? > > Funny enough, ppc memoryless node setup is what led me to this code. > We had a setup like this: > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 > node 0 size: 0 MB > node 0 free: 0 MB > node 2 cpus: > node 2 size: 7168 MB > node 2 free: 6019 MB > node distances: > node 0 2 > 0: 10 40 > 2: 40 10 Yeah, I think this happens fairly often ... and we didn't properly support it (particularly with SLUB) on powerpc. I'll cc you on my patchset. > Which ends up enabling zone_reclaim although there is only a single node > with memory. Not that RECLAIM_DISTANCE would make any difference here as > the distance is even above default RECLAIM_DISTANCE. > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 5de4337..4f6ff6f 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid) > > { > > int i; > > > > - for_each_online_node(i) > > - if (node_distance(nid, i) <= RECLAIM_DISTANCE) > > + for_each_online_node(i) { > > + if (node_distance(nid, i) <= RECLAIM_DISTANCE || > > + local_memory_node(nid) != nid) > > node_set(i, NODE_DATA(nid)->reclaim_nodes); > > else > > zone_reclaim_mode = 1; > > > > Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is > > not set, but if it is, I think semantically it will indicate that > > memoryless nodes *have* to reclaim remotely. > > > > And actually the above won't work, because the callpath is > > > > start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes -> > > free_area_init_node -> init_zone_allows_reclaim] which is called before > > build_all_zonelists. This is a similar ordering problem as I'm having > > with the MEMORYLESS_NODE support, will work on it. > > I think you just want for_each_node_state(nid, N_MEMORY) and skip all > the memory less nodes, no? Yep, thanks! -Nish ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-02-20 9:55 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-18 9:06 ppc: RECLAIM_DISTANCE 10? Michal Hocko 2014-02-18 22:27 ` David Rientjes 2014-02-19 8:16 ` Michal Hocko 2014-02-19 8:20 ` David Rientjes 2014-02-19 9:19 ` Michal Hocko 2014-02-19 21:45 ` David Rientjes 2014-02-18 23:34 ` Nishanth Aravamudan 2014-02-18 23:58 ` Nishanth Aravamudan 2014-02-19 0:40 ` Nishanth Aravamudan 2014-02-19 1:43 ` David Rientjes 2014-02-19 8:33 ` Michal Hocko 2014-02-19 16:24 ` Nishanth Aravamudan 2014-02-19 16:33 ` Nishanth Aravamudan 2014-02-20 9:55 ` Michal Hocko 2014-02-19 8:23 ` Michal Hocko 2014-02-19 16:26 ` Nishanth Aravamudan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).