From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>, Mel Gorman <mgorman@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Balbir Singh <bsingharora@gmail.com>,
Vlastimil Babka <vbabka@suse.cz>,
Minchan Kim <minchan@kernel.org>
Subject: Re: MPOL_BIND on memory only nodes
Date: Thu, 13 Oct 2016 15:24:54 +0530 [thread overview]
Message-ID: <57FF59EE.9050508@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161012131626.GL17128@dhcp22.suse.cz>
On 10/12/2016 06:46 PM, Michal Hocko wrote:
> On Wed 12-10-16 11:43:37, Michal Hocko wrote:
>> On Wed 12-10-16 14:55:24, Anshuman Khandual wrote:
> [...]
>>> Why we insist on __GFP_THISNODE ?
>>
>> AFAIU __GFP_THISNODE just overrides the given node to the policy
>> nodemask in case the current node is not part of that node mask. In
>> other words we are ignoring the given node and use what the policy says.
>> I can see how this can be confusing especially when confronting the
>> documentation:
>>
>> * __GFP_THISNODE forces the allocation to be satisified from the requested
>> * node with no fallbacks or placement policy enforcements.
>
> You made me think and look into this deeper. I came to the conclusion
> that this is actually a relict from the past. policy_zonelist is called
> only from 3 places:
> - huge_zonelist - never should do __GFP_THISNODE when going this path
> - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either
> - alloc_pages_current - which uses default_policy id __GFP_THISNODE is
> used
>
> So AFAICS this is essentially a dead code or I am missing something. Mel
> do you remember why we needed it in the past? I would be really tempted
> to just get rid of this confusing code and this instead:
> ---
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index ad1c96ac313c..98beec47bba9 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1679,25 +1679,17 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy,
> int nd)
> {
> - switch (policy->mode) {
> - case MPOL_PREFERRED:
> - if (!(policy->flags & MPOL_F_LOCAL))
> - nd = policy->v.preferred_node;
> - break;
> - case MPOL_BIND:
> + if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL))
> + nd = policy->v.preferred_node;
> + else {
> /*
> - * Normally, MPOL_BIND allocations are node-local within the
> - * allowed nodemask. However, if __GFP_THISNODE is set and the
> - * current node isn't part of the mask, we use the zonelist for
> - * the first node in the mask instead.
> + * __GFP_THISNODE shouldn't even be used with the bind policy because
> + * we might easily break the expectation to stay on the requested node
> + * and not break the policy.
> */
> - if (unlikely(gfp & __GFP_THISNODE) &&
> - unlikely(!node_isset(nd, policy->v.nodes)))
> - nd = first_node(policy->v.nodes);
> - break;
> - default:
> - BUG();
> + WARN_ON_ONCE(polic->mode == MPOL_BIND && (gfp && __GFP_THISNODE));
> }
> +
> return node_zonelist(nd, gfp);
> }
Which makes the function look like this. Even with these changes, MPOL_BIND is
still going to pick up the local node's zonelist instead of the first node in
policy->v.nodes nodemask. It completely ignores policy->v.nodes which it should
not.
/* Return a zonelist indicated by gfp for node representing a mempolicy */
static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy,
int nd)
{
if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL))
nd = policy->v.preferred_node;
else {
/*
* __GFP_THISNODE shouldn't even be used with the bind policy because
* we might easily break the expectation to stay on the requested node
* and not break the policy.
*/
WARN_ON_ONCE(polic->mode == MPOL_BIND && (gfp && __GFP_THISNODE));
}
return node_zonelist(nd, gfp);
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>, Mel Gorman <mgorman@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Balbir Singh <bsingharora@gmail.com>,
Vlastimil Babka <vbabka@suse.cz>,
Minchan Kim <minchan@kernel.org>
Subject: Re: MPOL_BIND on memory only nodes
Date: Thu, 13 Oct 2016 15:24:54 +0530 [thread overview]
Message-ID: <57FF59EE.9050508@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161012131626.GL17128@dhcp22.suse.cz>
On 10/12/2016 06:46 PM, Michal Hocko wrote:
> On Wed 12-10-16 11:43:37, Michal Hocko wrote:
>> On Wed 12-10-16 14:55:24, Anshuman Khandual wrote:
> [...]
>>> Why we insist on __GFP_THISNODE ?
>>
>> AFAIU __GFP_THISNODE just overrides the given node to the policy
>> nodemask in case the current node is not part of that node mask. In
>> other words we are ignoring the given node and use what the policy says.
>> I can see how this can be confusing especially when confronting the
>> documentation:
>>
>> * __GFP_THISNODE forces the allocation to be satisified from the requested
>> * node with no fallbacks or placement policy enforcements.
>
> You made me think and look into this deeper. I came to the conclusion
> that this is actually a relict from the past. policy_zonelist is called
> only from 3 places:
> - huge_zonelist - never should do __GFP_THISNODE when going this path
> - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either
> - alloc_pages_current - which uses default_policy id __GFP_THISNODE is
> used
>
> So AFAICS this is essentially a dead code or I am missing something. Mel
> do you remember why we needed it in the past? I would be really tempted
> to just get rid of this confusing code and this instead:
> ---
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index ad1c96ac313c..98beec47bba9 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1679,25 +1679,17 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy,
> int nd)
> {
> - switch (policy->mode) {
> - case MPOL_PREFERRED:
> - if (!(policy->flags & MPOL_F_LOCAL))
> - nd = policy->v.preferred_node;
> - break;
> - case MPOL_BIND:
> + if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL))
> + nd = policy->v.preferred_node;
> + else {
> /*
> - * Normally, MPOL_BIND allocations are node-local within the
> - * allowed nodemask. However, if __GFP_THISNODE is set and the
> - * current node isn't part of the mask, we use the zonelist for
> - * the first node in the mask instead.
> + * __GFP_THISNODE shouldn't even be used with the bind policy because
> + * we might easily break the expectation to stay on the requested node
> + * and not break the policy.
> */
> - if (unlikely(gfp & __GFP_THISNODE) &&
> - unlikely(!node_isset(nd, policy->v.nodes)))
> - nd = first_node(policy->v.nodes);
> - break;
> - default:
> - BUG();
> + WARN_ON_ONCE(polic->mode == MPOL_BIND && (gfp && __GFP_THISNODE));
> }
> +
> return node_zonelist(nd, gfp);
> }
Which makes the function look like this. Even with these changes, MPOL_BIND is
still going to pick up the local node's zonelist instead of the first node in
policy->v.nodes nodemask. It completely ignores policy->v.nodes which it should
not.
/* Return a zonelist indicated by gfp for node representing a mempolicy */
static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy,
int nd)
{
if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL))
nd = policy->v.preferred_node;
else {
/*
* __GFP_THISNODE shouldn't even be used with the bind policy because
* we might easily break the expectation to stay on the requested node
* and not break the policy.
*/
WARN_ON_ONCE(polic->mode == MPOL_BIND && (gfp && __GFP_THISNODE));
}
return node_zonelist(nd, gfp);
}
next prev parent reply other threads:[~2016-10-13 9:55 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-12 9:25 MPOL_BIND on memory only nodes Anshuman Khandual
2016-10-12 9:25 ` Anshuman Khandual
2016-10-12 9:43 ` Michal Hocko
2016-10-12 9:43 ` Michal Hocko
2016-10-12 10:38 ` Anshuman Khandual
2016-10-12 10:38 ` Anshuman Khandual
2016-10-12 11:01 ` Michal Hocko
2016-10-12 11:01 ` Michal Hocko
2016-10-12 13:16 ` Michal Hocko
2016-10-12 13:16 ` Michal Hocko
2016-10-13 9:54 ` Anshuman Khandual [this message]
2016-10-13 9:54 ` Anshuman Khandual
2016-10-13 10:07 ` Michal Hocko
2016-10-13 10:07 ` Michal Hocko
2016-10-13 10:58 ` Anshuman Khandual
2016-10-13 10:58 ` Anshuman Khandual
2016-10-13 12:51 ` Michal Hocko
2016-10-13 12:51 ` Michal Hocko
2016-10-13 10:24 ` Mel Gorman
2016-10-13 10:24 ` Mel Gorman
2016-10-13 12:38 ` Michal Hocko
2016-10-13 12:38 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57FF59EE.9050508@linux.vnet.ibm.com \
--to=khandual@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=bsingharora@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@kernel.org \
--cc=minchan@kernel.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.