From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] mm, mempolicy: clean up __GFP_THISNODE confusion in policy_zonelist
Date: Fri, 21 Oct 2016 17:04:50 +0530	[thread overview]
Message-ID: <877f92ue91.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161013125958.32155-1-mhocko@kernel.org>

Michal Hocko <mhocko@kernel.org> writes:

> From: Michal Hocko <mhocko@suse.com>
>
> __GFP_THISNODE is documented to enforce the allocation to be satisfied
> from the requested node with no fallbacks or placement policy
> enforcements. policy_zonelist seemingly breaks this semantic if the
> current policy is MPOL_BIND and, instead of taking the node, it will
> fall back to the first node in the mask if the requested one is not in
> the mask. This is confusing to say the least, because in fact we
> shouldn't ever go that path. First, tasks shouldn't be scheduled on CPUs
> with nodes outside of their mempolicy binding. And secondly,
> policy_zonelist is called from only 3 places:
> - huge_zonelist - never should do __GFP_THISNODE when going this path
> - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either
> - alloc_pages_current - which uses default_policy if __GFP_THISNODE is
>   used
>
> So we shouldn't even need to care about this possibility and can drop
> the confusing code. Let's keep a WARN_ON_ONCE in place to catch
> potential users and fix them up properly (aka use a different allocation
> function which ignores mempolicy).
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>
> Hi,
> I have noticed this while discussing this code [1]. The code as it stands
> is quite confusing and I think it is worth cleaning up. I decided to be
> conservative and keep at least WARN_ON_ONCE if we have some caller which
> relies on __GFP_THISNODE in a mempolicy context so that we can fix it up.
>
> [1] http://lkml.kernel.org/r/57FE0184.6030008@linux.vnet.ibm.com
>
>  mm/mempolicy.c | 24 ++++++++----------------
>  1 file changed, 8 insertions(+), 16 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index ad1c96ac313c..33a305397bd4 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1679,25 +1679,17 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
>  static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy,
>  	int nd)
>  {
> -	switch (policy->mode) {
> -	case MPOL_PREFERRED:
> -		if (!(policy->flags & MPOL_F_LOCAL))
> -			nd = policy->v.preferred_node;
> -		break;
> -	case MPOL_BIND:
> +	if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL))
> +		nd = policy->v.preferred_node;
> +	else {
>  		/*
> -		 * Normally, MPOL_BIND allocations are node-local within the
> -		 * allowed nodemask.  However, if __GFP_THISNODE is set and the
> -		 * current node isn't part of the mask, we use the zonelist for
> -		 * the first node in the mask instead.
> +		 * __GFP_THISNODE shouldn't even be used with the bind policy because
> +		 * we might easily break the expectation to stay on the requested node
> +		 * and not break the policy.
>  		 */
> -		if (unlikely(gfp & __GFP_THISNODE) &&
> -				unlikely(!node_isset(nd, policy->v.nodes)))
> -			nd = first_node(policy->v.nodes);
> -		break;
> -	default:
> -		BUG();
> +		WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
>  	}
> +
>  	return node_zonelist(nd, gfp);
>  }
>  
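A minimal sketch of the behavioral change in the diff above, written as a hypothetical userspace model: the `model_*` names, the single 64-bit nodemask, and the flag values are illustrations, not kernel APIs. Only MPOL_PREFERRED and MPOL_BIND are modelled (so the old `default: BUG()` arm has no counterpart), and WARN_ON_ONCE() is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the node choice in policy_zonelist().
 * None of these names are kernel APIs; they only mirror the patch's logic.
 */
#define MODEL_MAX_NODES 64        /* nodemask modelled as one 64-bit word */
#define MODEL_GFP_THISNODE 0x1u   /* stand-in for __GFP_THISNODE */
#define MODEL_F_LOCAL 0x1u        /* stand-in for MPOL_F_LOCAL */

enum model_mode { MODEL_PREFERRED, MODEL_BIND };

struct model_policy {
	enum model_mode mode;
	unsigned int flags;        /* MODEL_F_LOCAL means "preferred local" */
	int preferred_node;        /* used by MODEL_PREFERRED */
	unsigned long long nodes;  /* bit n set: node n allowed (MODEL_BIND) */
};

static bool model_node_isset(int nd, unsigned long long mask)
{
	assert(nd >= 0 && nd < MODEL_MAX_NODES);
	return (mask >> nd) & 1ULL;
}

static int model_first_node(unsigned long long mask)
{
	assert(mask != 0); /* the model requires a non-empty bind mask */
	for (int n = 0; n < MODEL_MAX_NODES; n++)
		if (model_node_isset(n, mask))
			return n;
	return -1; /* unreachable for a non-empty mask */
}

/* Before the patch: MPOL_BIND + __GFP_THISNODE silently redirects to the mask. */
static int model_pick_node_old(unsigned int gfp, const struct model_policy *pol,
			       int nd)
{
	switch (pol->mode) {
	case MODEL_PREFERRED:
		if (!(pol->flags & MODEL_F_LOCAL))
			nd = pol->preferred_node;
		break;
	case MODEL_BIND:
		if ((gfp & MODEL_GFP_THISNODE) && !model_node_isset(nd, pol->nodes))
			nd = model_first_node(pol->nodes);
		break;
	}
	return nd;
}

/* After the patch: the requested node is kept; the combination only warns. */
static int model_pick_node_new(unsigned int gfp, const struct model_policy *pol,
			       int nd, bool *warned)
{
	if (pol->mode == MODEL_PREFERRED && !(pol->flags & MODEL_F_LOCAL))
		nd = pol->preferred_node;
	else if (pol->mode == MODEL_BIND && (gfp & MODEL_GFP_THISNODE))
		*warned = true; /* stands in for WARN_ON_ONCE(); nd is unchanged */
	return nd;
}
```

With a bind mask of {2, 3}, a __GFP_THISNODE request for node 0 yields node 2 under the old logic, while the new logic keeps node 0 and sets `warned`, which is the case the WARN_ON_ONCE is meant to surface.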

For both MPOL_PREFERRED and MPOL_INTERLEAVE we pick the zonelist from
a node other than the currently running node. Why don't we do the same for
MPOL_BIND? I.e., if the current node is not part of the policy nodemask,
why are we not picking the first node from the policy nodemask for
MPOL_BIND?
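To make the question concrete, here is a hypothetical userspace sketch of the proposal: choose the zonelist's starting node uniformly for all three policies, with MPOL_BIND falling back to the first allowed node whenever the local node is outside the mask. All `model_*` names are invented for illustration, MPOL_INTERLEAVE is reduced to a simple round-robin cursor, and MPOL_F_LOCAL is ignored; this models the proposal, not what mm/mempolicy.c does.

```c
#include <assert.h>

/*
 * Hypothetical model of the proposal: pick the zonelist's starting node
 * the same way for all three policies. Not kernel code.
 */
#define MODEL_MAX_NODES 64 /* nodemask modelled as one 64-bit word */

enum model_mode { MODEL_PREFERRED, MODEL_INTERLEAVE, MODEL_BIND };

struct model_policy {
	enum model_mode mode;
	int preferred_node;        /* MODEL_PREFERRED (MPOL_F_LOCAL ignored) */
	unsigned long long nodes;  /* MODEL_INTERLEAVE and MODEL_BIND */
	int il_next;               /* MODEL_INTERLEAVE: next node to hand out */
};

/* Next set node after nd, wrapping around the mask like a round-robin cursor. */
static int model_next_node_in(int nd, unsigned long long mask)
{
	assert(mask != 0);
	assert(nd >= 0 && nd < MODEL_MAX_NODES);
	for (int i = 1; i <= MODEL_MAX_NODES; i++) {
		int cand = (nd + i) % MODEL_MAX_NODES;
		if ((mask >> cand) & 1ULL)
			return cand;
	}
	return -1; /* unreachable for a non-empty mask */
}

static int model_start_node(struct model_policy *pol, int local)
{
	assert(local >= 0 && local < MODEL_MAX_NODES);
	switch (pol->mode) {
	case MODEL_PREFERRED:
		return pol->preferred_node;
	case MODEL_INTERLEAVE: {
		int nd = pol->il_next;
		pol->il_next = model_next_node_in(nd, pol->nodes);
		return nd;
	}
	case MODEL_BIND:
		/* Proposed: outside the mask, start at the first allowed node. */
		if (!((pol->nodes >> local) & 1ULL))
			return model_next_node_in(MODEL_MAX_NODES - 1, pol->nodes);
		return local;
	}
	return local;
}
```

Under this model a task on node 0 bound to {2, 3} would start its zonelist at node 2, while a task already on node 3 keeps node 3. Assuming, as this sketch does, that the bind nodemask is still applied as a filter while the zonelist is walked, the proposal would change only where the fallback walk begins, not which nodes are eligible.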

-aneesh


Thread overview:
2016-10-13 12:59 [PATCH] mm, mempolicy: clean up __GFP_THISNODE confusion in policy_zonelist Michal Hocko
2016-10-18  9:44 ` Vlastimil Babka
2016-10-21 11:34 ` Aneesh Kumar K.V [this message]
2016-10-21 11:52   ` Michal Hocko
2016-10-21 12:08   ` Vlastimil Babka
2016-10-21 12:25     ` Aneesh Kumar K.V
