Subject: Re: [Patch V3 3/9] sgi-xp: Replace cpu_to_node() with cpu_to_mem() to support memoryless node
To: David Rientjes
References: <1439781546-7217-1-git-send-email-jiang.liu@linux.intel.com> <1439781546-7217-4-git-send-email-jiang.liu@linux.intel.com> <55D43C63.7060802@linux.intel.com> <55D5755C.5060803@linux.intel.com>
Cc: Andrew Morton, Mel Gorman, Mike Galbraith, Peter Zijlstra, "Rafael J . Wysocki", Tang Chen, Tejun Heo, Cliff Whickman, Robin Holt, Tony Luck, linux-mm@kvack.org, linux-hotplug@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org
From: Jiang Liu
Organization: Intel
Message-ID: <56174AC9.4090104@linux.intel.com>
Date: Fri, 9 Oct 2015 13:04:09 +0800
In-Reply-To: <55D5755C.5060803@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/8/20 14:36, Jiang Liu wrote:
> On 2015/8/20 8:02, David Rientjes wrote:
>> On Wed, 19 Aug 2015, Jiang Liu wrote:
>>
>>>> Why not simply fix build_zonelists_node() so that the __GFP_THISNODE
>>>> zonelists are set up to reference the zones of cpu_to_mem() for memoryless
>>>> nodes?
>>>>
>>>> It seems much better than checking and maintaining every __GFP_THISNODE
>>>> user to determine if they are using a memoryless node or not. I don't
>>>> feel that this solution is maintainable in the longterm.
>>> Hi David,
>>> There are some usage cases, such as memory migration,
>>> expect the page allocator rejecting memory allocation requests
>>> if there is no memory on local node. So we have:
>>> 1) alloc_pages_node(cpu_to_node(), __GFP_THISNODE) to only allocate
>>> memory from local node.
>>> 2) alloc_pages_node(cpu_to_mem(), __GFP_THISNODE) to allocate memory
>>> from local node or from nearest node if local node is memoryless.
>>>
>>
>> Right, so do you think it would be better to make the default zonelists be
>> setup so that cpu_to_node()->zonelists == cpu_to_mem()->zonelists and then
>> individual callers that want to fail for memoryless nodes check
>> populated_zone() themselves?
> Hi David,
> Great idea:) I think that means we are going to kill the
> concept of memoryless node, and we only need to specially handle
> a few callers who really care about whether there is memory on
> local node.
> Then I need some time to audit all usages of __GFP_THISNODE
> and update you whether it's doable.

Hi David,
It seems that I was too optimistic :(. After auditing all
usages of __GFP_THISNODE and reading Documentation/vm/numa again,
I feel it would be better to keep cpu_to_mem()/numa_mem_id().
Things are clearer if we follow these rules:
1) cpu_to_node()/numa_node_id() for the scheduling domain
2) cpu_to_mem()/numa_mem_id() for the memory management domain
3) alloc_pages_node(cpu_to_node(cpu), __GFP_THISNODE) for special
   use cases that must fail when the local node has no memory.

Using alloc_pages_node(cpu_to_node(cpu), __GFP_THISNODE) for those
cases would also be easier to maintain than open-coding
populated_zone() checks in each caller.

Thanks!
Gerry
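
For illustration only, the three rules above can be sketched as a toy user-space model. This is not kernel code: every toy_* name, the two-node topology, and the page counts are assumptions made up for the example. toy_alloc_thisnode() only imitates the strict "this node or fail" behaviour that __GFP_THISNODE requests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. All toy_* names are hypothetical. */
#define TOY_NR_NODES 2
#define TOY_NR_CPUS  4
#define TOY_NO_NODE  (-1)

/* Free pages per node; node 1 is memoryless in this made-up topology. */
static int toy_free_pages[TOY_NR_NODES] = { 8, 0 };

/* CPU -> node the CPU sits on (the scheduling view, rule 1). */
static const int toy_cpu_node[TOY_NR_CPUS] = { 0, 0, 1, 1 };

/* Node -> nearest node that has memory (the memory-management view, rule 2). */
static const int toy_nearest_mem_node[TOY_NR_NODES] = { 0, 0 };

/* Models cpu_to_node(): where the CPU lives, memory or not. */
int toy_cpu_to_node(int cpu)
{
	return toy_cpu_node[cpu];
}

/* Models cpu_to_mem(): the nearest node with memory for this CPU. */
int toy_cpu_to_mem(int cpu)
{
	return toy_nearest_mem_node[toy_cpu_node[cpu]];
}

/*
 * Models a strict allocation (rule 3): take one page from exactly node
 * 'nid' and never fall back. Returns the node that served the page, or
 * TOY_NO_NODE if that node has no free page.
 */
int toy_alloc_thisnode(int nid)
{
	if (toy_free_pages[nid] == 0)
		return TOY_NO_NODE;
	toy_free_pages[nid]--;
	return nid;
}
```

In this model a CPU on the memoryless node gets a failure from toy_alloc_thisnode(toy_cpu_to_node(cpu)), which is what a caller such as memory migration wants, while toy_alloc_thisnode(toy_cpu_to_mem(cpu)) is served from the nearest node with memory.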