From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
Anton Blanchard <anton@samba.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: ppc: RECLAIM_DISTANCE 10?
Date: Tue, 18 Feb 2014 15:58:00 -0800
Message-ID: <20140218235800.GC10844@linux.vnet.ibm.com>
In-Reply-To: <20140218233404.GB10844@linux.vnet.ibm.com>
On 18.02.2014 [15:34:05 -0800], Nishanth Aravamudan wrote:
> Hi Michal,
>
> On 18.02.2014 [10:06:58 +0100], Michal Hocko wrote:
> > Hi,
> > I have just noticed that ppc has RECLAIM_DISTANCE reduced to 10 set by
> > 56608209d34b (powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to
> > enable zone reclaim). The commit message suggests that the zone reclaim
> > is desirable for all NUMA configurations.
> >
> > History has shown that the zone reclaim is more often harmful than
> > helpful and leads to performance problems. The default RECLAIM_DISTANCE
> > for generic case has been increased from 20 to 30 around 3.0
> > (32e45ff43eaf mm: increase RECLAIM_DISTANCE to 30).
>
> Interesting.
>
> > I strongly suspect that the patch is incorrect and it should be
> > reverted. Before I will send a revert I would like to understand what
> > led to the patch in the first place. I do not see why would PPC use only
> > LOCAL_DISTANCE and REMOTE_DISTANCE distances and in fact machines I have
> > seen use different values.
> >
> > Anton, could you comment please?
>
> I'll let Anton comment here, but in looking into this issue in working
> on CONFIG_HAVE_MEMORYLESS_NODE support, I realized that any LPAR with
> memoryless nodes will set zone_reclaim_mode to 1. I think we want to
> ignore memoryless nodes when we set up the reclaim mode like the
> following? I'll send it as a proper patch if you agree?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5de4337..4f6ff6f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1853,8 +1853,9 @@ static void __paginginit init_zone_allows_reclaim(int nid)
> {
> int i;
>
> - for_each_online_node(i)
> - if (node_distance(nid, i) <= RECLAIM_DISTANCE)
> + for_each_online_node(i) {
> + if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
> + local_memory_node(nid) != nid)
> node_set(i, NODE_DATA(nid)->reclaim_nodes);
> else
> zone_reclaim_mode = 1;
>
> Note, this won't actually do anything if CONFIG_HAVE_MEMORYLESS_NODES is
> not set, but if it is, I think semantically it will indicate that
> memoryless nodes *have* to reclaim remotely.
>
> And actually the above won't work, because the callpath is
>
> start_kernel -> setup_arch -> paging_init [-> free_area_init_nodes ->
> free_area_init_node -> init_zone_allows_reclaim] which is called before
> build_all_zonelists. This is a similar ordering problem as I'm having
> with the MEMORYLESS_NODE support, will work on it.

How about the following?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5de4337..1a0eced 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1854,7 +1854,8 @@ static void __paginginit init_zone_allows_reclaim(int nid)
 	int i;

 	for_each_online_node(i)
-		if (node_distance(nid, i) <= RECLAIM_DISTANCE)
+		if (node_distance(nid, i) <= RECLAIM_DISTANCE ||
+		    !NODE_DATA(nid)->node_present_pages)
 			node_set(i, NODE_DATA(nid)->reclaim_nodes);
 		else
 			zone_reclaim_mode = 1;
@@ -4901,13 +4902,13 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 	pgdat->node_id = nid;
 	pgdat->node_start_pfn = node_start_pfn;
-	init_zone_allows_reclaim(nid);
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 #endif
 	calculate_node_totalpages(pgdat, start_pfn, end_pfn,
 				  zones_size, zholes_size);
+	init_zone_allows_reclaim(nid);
 	alloc_node_mem_map(pgdat);
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 	printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",

I think it's safe to move init_zone_allows_reclaim(), because I don't
think any allocations occur here that could cause us to reclaim
anyway, right? Moving it after calculate_node_totalpages() allows us
to safely reference node_present_pages, which is only filled in there.
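
To make the intended effect concrete, here is a minimal userspace sketch
of the check, not kernel code. Everything in it is a hypothetical
stand-in: struct sketch_node replaces pg_data_t, a bool array replaces
the reclaim_nodes nodemask, the two-node distance table is made up, and
the function returns the mode rather than writing the global
zone_reclaim_mode.

```c
#include <stdbool.h>

#define NR_NODES 2
#define RECLAIM_DISTANCE 10 /* the powerpc value discussed in this thread */

/* Hypothetical stand-in for pg_data_t: only the fields this check touches. */
struct sketch_node {
	unsigned long present_pages;   /* 0 means the node is memoryless */
	bool reclaim_nodes[NR_NODES];  /* stand-in for the reclaim_nodes nodemask */
};

/* Made-up topology: two nodes, local distance 10, remote distance 20. */
static const int sketch_distance[NR_NODES][NR_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/*
 * Mirrors the loop in the patch for a single node. Returns the resulting
 * reclaim mode (0 or 1) instead of setting a global. Note that only the
 * node being initialised (nid) is tested for present pages, as in the patch.
 */
static int sketch_allows_reclaim(struct sketch_node *nodes, int nid,
				 bool skip_memoryless)
{
	int mode = 0;

	for (int i = 0; i < NR_NODES; i++) {
		if (sketch_distance[nid][i] <= RECLAIM_DISTANCE ||
		    (skip_memoryless && !nodes[nid].present_pages))
			nodes[nid].reclaim_nodes[i] = true;
		else
			mode = 1;
	}
	return mode;
}
```

With node 1 memoryless, calling sketch_allows_reclaim(nodes, 1, false)
models the current behaviour and turns the mode on, while passing true
models the patched check: the mode stays off for that node and node 0
lands in its reclaim_nodes.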
Thanks,
Nish