From: Jiang Liu <jiang.liu@linux.intel.com>
To: Tang Chen <tangchen@cn.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>,
	linux-mm@kvack.org, linux-hotplug@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [Patch V3 0/9] Enable memoryless node support for x86
Date: Wed, 19 Aug 2015 08:09:08 +0000
Message-ID: <55D439A4.6020407@linux.intel.com>
In-Reply-To: <55D302CA.9010703@cn.fujitsu.com>

On 2015/8/18 18:02, Tang Chen wrote:
> 
> On 08/17/2015 11:18 AM, Jiang Liu wrote:
>> This is the third version to enable memoryless node support on x86
>> platforms. The previous version (https://lkml.org/lkml/2014/7/11/75)
>> blindly replaces numa_node_id()/cpu_to_node() with numa_mem_id()/
>> cpu_to_mem(). That's not the right solution as pointed out by Tejun
>> and Peter due to:
>> 1) We shouldn't shift the burden to normal slab users.
>> 2) Details of memoryless node should be hidden in arch and mm code
>>     as much as possible.
>>
>> After digging into more code and documentation, we found the rules to
>> deal with memoryless node should be:
>> 1) Arch code should online corresponding NUMA node before onlining any
>>     CPU or memory, otherwise it may cause invalid memory access when
>>     accessing NODE_DATA(nid).
>> 2) For normal memory allocations without __GFP_THISNODE setting in the
>>     gfp_flags, we should prefer numa_node_id()/cpu_to_node() instead of
>>     numa_mem_id()/cpu_to_mem() because the latter loses hardware topology
>>     information as pointed out by Tejun:
>>        A - B - X - C - D
>>     Where X is the memless node.  numa_mem_id() on X would return
>>     either B or C, right?  If B or C can't satisfy the allocation,
>>     the allocator would fallback to A from B and D for C, both of
>>     which aren't optimal. It should first fall back to C or B
>>     respectively, which the allocator can't do anymore because the
>>     information is lost when the caller side performs numa_mem_id().
> 
> Hi Liu,
> 
> BTW, how is this A - B - X - C - D problem solved ?
> I don't quite follow this.
> 
> I cannot tell the difference between numa_node_id()/cpu_to_node() and
> numa_mem_id()/cpu_to_mem() on this point. Even with hardware topology
> info, how could it avoid this problem ?
> 
> Isn't it still possible falling back to A from B and D for C ?
Hi Chen,
Consider the hypothetical topology A<->B<->X<->C<->D, where A, B, C and D
have memory and X is memoryless.
The fallback lists, ordered by distance, would be:
B: [ B, A, C, D]
X: [ B, C, A, D]
C: [ C, D, B, A]

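For a CPU on node X, cpu_to_mem() returns either B or C. Let's assume it
returns B. The allocator then uses B's list, [ B, A, C, D], to allocate
memory for X. Once B is exhausted it tries A before C, even though C is
as close to X as B is, so this is not the optimal fallback list for X.
cpu_to_node() instead returns X itself, so the allocator uses X's own
list, [ B, C, A, D], which is the optimal fallback list for X.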
Thanks!
Gerry

Thread overview:
2015-08-17  3:18 [Patch V3 0/9] Enable memoryless node support for x86 Jiang Liu
2015-08-17  3:18 ` [Patch V3 1/9] x86, NUMA, ACPI: Online node earlier when doing CPU hot-addition Jiang Liu
2015-08-17  3:18 ` [Patch V3 2/9] kernel/profile.c: Replace cpu_to_mem() with cpu_to_node() Jiang Liu
2015-08-18  0:31   ` David Rientjes
2015-08-19  7:18     ` Jiang Liu
2015-08-20  0:00       ` David Rientjes
2015-10-09  2:35         ` Jiang Liu
2015-08-17  3:19 ` [Patch V3 3/9] sgi-xp: Replace cpu_to_node() with cpu_to_mem() to support memoryless node Jiang Liu
2015-08-18  0:25   ` David Rientjes
2015-08-19  8:20     ` Jiang Liu
2015-08-20  0:02       ` David Rientjes
2015-08-20  6:36         ` Jiang Liu
2015-10-09  5:04           ` Jiang Liu
2015-08-19 11:52   ` Robin Holt
2015-08-19 12:45     ` Jiang Liu
2015-08-17  3:19 ` [Patch V3 4/9] openvswitch: " Jiang Liu
2015-08-18  0:14   ` Pravin Shelar
2015-08-17  3:19 ` [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better " Jiang Liu
2015-08-18  0:35   ` [Intel-wired-lan] " David Rientjes
2015-08-19 17:04   ` [Intel-wired-lan] " Bowers, AndrewX
2015-08-19 22:38   ` Patil, Kiran
2015-08-20  0:18     ` David Rientjes
2015-10-08 20:20       ` Andrew Morton
2015-10-09  5:52         ` Jiang Liu
2015-10-09  9:08           ` Kamezawa Hiroyuki
2015-10-09  9:25             ` Jiang Liu
2015-08-17  3:19 ` [Intel-wired-lan] [Patch V3 6/9] i40evf: " Jiang Liu
2015-08-17 19:03   ` [Intel-wired-lan] " Patil, Kiran
2015-08-18 21:34     ` Jeff Kirsher
2015-08-17  3:19 ` [Patch V3 7/9] x86, numa: Kill useless code to improve code readability Jiang Liu
2015-08-17  3:19 ` [Patch V3 8/9] mm: Update _mem_id_[] for every possible CPU when memory configuration changes Jiang Liu
2015-08-17  3:19 ` [Patch V3 9/9] mm, x86: Enable memoryless node support to better support CPU/memory hotplug Jiang Liu
2015-08-18  6:11   ` Tang Chen
2015-08-18  6:59     ` Jiang Liu
2015-08-18 11:28       ` Tang Chen
2015-08-18  7:31   ` Ingo Molnar
2015-08-17 21:35 ` [Patch V3 0/9] Enable memoryless node support for x86 Andrew Morton
2015-08-18 10:02 ` Tang Chen
2015-08-19  8:09   ` Jiang Liu [this message]
