From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755552AbcJMMpU (ORCPT ); Thu, 13 Oct 2016 08:45:20 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:48810 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753268AbcJMMou (ORCPT ); Thu, 13 Oct 2016 08:44:50 -0400 Date: Thu, 13 Oct 2016 15:24:54 +0530 From: Anshuman Khandual User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Michal Hocko , Mel Gorman CC: Linux Kernel Mailing List , Linux Memory Management List , Andrew Morton , "Aneesh Kumar K.V" , Balbir Singh , Vlastimil Babka , Minchan Kim Subject: Re: MPOL_BIND on memory only nodes References: <57FE0184.6030008@linux.vnet.ibm.com> <20161012094337.GH17128@dhcp22.suse.cz> <20161012131626.GL17128@dhcp22.suse.cz> In-Reply-To: <20161012131626.GL17128@dhcp22.suse.cz> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16101309-0012-0000-0000-000001D662CB X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16101309-0013-0000-0000-0000063475A7 Message-Id: <57FF59EE.9050508@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-10-13_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1610130170 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/12/2016 06:46 PM, Michal Hocko wrote: > On Wed 12-10-16 11:43:37, Michal Hocko wrote: >> On Wed 12-10-16 14:55:24, Anshuman Khandual wrote: > [...] >>> Why we insist on __GFP_THISNODE ? >> >> AFAIU __GFP_THISNODE just overrides the given node to the policy >> nodemask in case the current node is not part of that node mask. In >> other words we are ignoring the given node and use what the policy says. >> I can see how this can be confusing especially when confronting the >> documentation: >> >> * __GFP_THISNODE forces the allocation to be satisified from the requested >> * node with no fallbacks or placement policy enforcements. > > You made me think and look into this deeper. I came to the conclusion > that this is actually a relict from the past. policy_zonelist is called > only from 3 places: > - huge_zonelist - never should do __GFP_THISNODE when going this path > - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either > - alloc_pages_current - which uses default_policy id __GFP_THISNODE is > used > > So AFAICS this is essentially a dead code or I am missing something. Mel > do you remember why we needed it in the past? I would be really tempted > to just get rid of this confusing code and this instead: > --- > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index ad1c96ac313c..98beec47bba9 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -1679,25 +1679,17 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) > static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy, > int nd) > { > - switch (policy->mode) { > - case MPOL_PREFERRED: > - if (!(policy->flags & MPOL_F_LOCAL)) > - nd = policy->v.preferred_node; > - break; > - case MPOL_BIND: > + if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) > + nd = policy->v.preferred_node; > + else { > /* > - * Normally, MPOL_BIND allocations are node-local within the > - * allowed nodemask. However, if __GFP_THISNODE is set and the > - * current node isn't part of the mask, we use the zonelist for > - * the first node in the mask instead. > + * __GFP_THISNODE shouldn't even be used with the bind policy because > + * we might easily break the expectation to stay on the requested node > + * and not break the policy. > */ > - if (unlikely(gfp & __GFP_THISNODE) && > - unlikely(!node_isset(nd, policy->v.nodes))) > - nd = first_node(policy->v.nodes); > - break; > - default: > - BUG(); > + WARN_ON_ONCE(polic->mode == MPOL_BIND && (gfp && __GFP_THISNODE)); > } > + > return node_zonelist(nd, gfp); > } Which makes the function look like this. Even with these changes, MPOL_BIND is still going to pick up the local node's zonelist instead of the first node in policy->v.nodes nodemask. It completely ignores policy->v.nodes which it should not. /* Return a zonelist indicated by gfp for node representing a mempolicy */ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy, int nd) { if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) nd = policy->v.preferred_node; else { /* * __GFP_THISNODE shouldn't even be used with the bind policy because * we might easily break the expectation to stay on the requested node * and not break the policy. */ WARN_ON_ONCE(polic->mode == MPOL_BIND && (gfp && __GFP_THISNODE)); } return node_zonelist(nd, gfp); }