* [Patch V3 0/9] Enable memoryless node support for x86
From: Jiang Liu @ 2015-08-17  3:18 UTC
  To: Andrew Morton, Mel Gorman, David Rientjes, Mike Galbraith,
	Peter Zijlstra, Rafael J . Wysocki, Tang Chen, Tejun Heo
  Cc: Jiang Liu, Tony Luck, linux-mm, linux-hotplug, linux-kernel, x86

This is the third version of the patch set to enable memoryless node
support on x86 platforms. The previous version
(https://lkml.org/lkml/2014/7/11/75) blindly replaced
numa_node_id()/cpu_to_node() with numa_mem_id()/cpu_to_mem(). As Tejun
and Peter pointed out, that is not the right solution because:
1) We shouldn't shift the burden to normal slab users.
2) Details of memoryless node should be hidden in arch and mm code
   as much as possible.

After digging into more code and documentation, we found that the rules
for dealing with memoryless nodes should be:
1) Arch code should online the corresponding NUMA node before onlining any
   CPU or memory; otherwise accessing NODE_DATA(nid) may cause an invalid
   memory access.
2) For normal memory allocations without __GFP_THISNODE set in
   gfp_flags, we should prefer numa_node_id()/cpu_to_node() over
   numa_mem_id()/cpu_to_mem() because the latter loses hardware topology
   information, as pointed out by Tejun:
	   A - B - X - C - D
	Where X is the memless node.  numa_mem_id() on X would return
	either B or C, right?  If B or C can't satisfy the allocation,
	the allocator would fall back to A from B and D for C, both of
	which aren't optimal. It should first fall back to C or B
	respectively, which the allocator can't do anymore because the
	information is lost when the caller side performs numa_mem_id().
3) For memory allocations with __GFP_THISNODE set in gfp_flags,
   numa_node_id()/cpu_to_node() should be used if the caller only wants
   to allocate from local memory; numa_mem_id()/cpu_to_mem() should be
   used if the caller wants to allocate from the nearest node with
   memory.
4) numa_mem_id()/cpu_to_mem() should be used if the caller wants to check
   whether a page was allocated from the nearest node.
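
Tejun's A - B - X - C - D example in rule 2 can be sketched as a small
user-space model. This is only an illustration, not kernel code: the
node indices, the hop-count distance, and the helper names
(node_distance_model(), build_fallback_order(),
nearest_node_with_memory()) are all assumptions made for this sketch.

```c
#include <stdlib.h>

#define NR_NODES 5	/* A, B, X, C, D on a line; indices 0..4 */

/* Hypothetical model: only X (index 2) has no memory. */
static const int node_has_memory[NR_NODES] = { 1, 1, 0, 1, 1 };

/* Hop distance on the linear topology A - B - X - C - D. */
static int node_distance_model(int a, int b)
{
	return abs(a - b);
}

/*
 * Fill order[] with the nodes that have memory, nearest to 'start'
 * first (ties go to the lower index). Returns the number of entries.
 */
static int build_fallback_order(int start, int order[NR_NODES])
{
	int used[NR_NODES] = { 0 };
	int count = 0;

	for (;;) {
		int best = -1;

		for (int n = 0; n < NR_NODES; n++) {
			if (used[n] || !node_has_memory[n])
				continue;
			if (best < 0 ||
			    node_distance_model(start, n) <
			    node_distance_model(start, best))
				best = n;
		}
		if (best < 0)
			break;
		used[best] = 1;
		order[count++] = best;
	}
	return count;
}

/* Model of numa_mem_id(): the first entry of the fallback order. */
static int nearest_node_with_memory(int start)
{
	int order[NR_NODES];

	return build_fallback_order(start, order) > 0 ? order[0] : -1;
}
```

In this model, building the order from X itself gives B, C, A, D. If the
caller first collapses X to nearest_node_with_memory(X), which is B
here, the order becomes B, A, C, D, so A is tried before C even though C
is closer to X.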

Based on the above rules, this patch set is organized as follows:
1) Patch 1 is a bugfix that resolves a crash caused by socket hot-addition.
2) Patch 2 replaces numa_mem_id() with numa_node_id() when __GFP_THISNODE
   isn't set in gfp_flags.
3) Patches 3-6 replace numa_node_id()/cpu_to_node() with numa_mem_id()/
   cpu_to_mem() if the caller wants to allocate from the local node only.
4) Patches 7-9 enable support of memoryless nodes on x86.

With this patch set applied, on a system with two sockets enabled at boot,
one with memory and the other without, we got the following NUMA
topology after boot:
root@bkd04sdp:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
node 0 size: 15940 MB
node 0 free: 15397 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 1 size: 0 MB
node 1 free: 0 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

After hot-adding the third socket without memory, we got:
root@bkd04sdp:~# numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
node 0 size: 15940 MB
node 0 free: 15142 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus:
node 2 size: 0 MB
node 2 free: 0 MB
node distances:
node   0   1   2
  0:  10  21  21
  1:  21  10  21
  2:  21  21  10
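
For anyone who wants to spot memoryless nodes in output like the above
from a script or test harness, here is a minimal sketch. It assumes
only the "node N size: M MB" line format shown in these two dumps; the
function name find_memoryless_nodes() is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan `numactl --hardware` text for "node N size: M MB" lines and
 * record every node whose size is 0 MB. Returns the number of nodes
 * recorded (at most max_nodes).
 */
static int find_memoryless_nodes(const char *text, int *nodes, int max_nodes)
{
	int count = 0;

	for (const char *line = text; line && *line; ) {
		int nid;
		long size_mb;

		/* Only "size" lines convert both fields. */
		if (sscanf(line, "node %d size: %ld MB", &nid, &size_mb) == 2 &&
		    size_mb == 0 && count < max_nodes)
			nodes[count++] = nid;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return count;
}
```

Fed the second dump above, this should report nodes 1 and 2.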

Jiang Liu (9):
  x86, NUMA, ACPI: Online node earlier when doing CPU hot-addition
  kernel/profile.c: Replace cpu_to_mem() with cpu_to_node()
  sgi-xp: Replace cpu_to_node() with cpu_to_mem() to support memoryless
    node
  openvswitch: Replace cpu_to_node() with cpu_to_mem() to support
    memoryless node
  i40e: Use numa_mem_id() to better support memoryless node
  i40evf: Use numa_mem_id() to better support memoryless node
  x86, numa: Kill useless code to improve code readability
  mm: Update _mem_id_[] for every possible CPU when memory
    configuration changes
  mm, x86: Enable memoryless node support to better support CPU/memory
    hotplug

 arch/x86/Kconfig                              |    3 ++
 arch/x86/kernel/acpi/boot.c                   |    9 +++-
 arch/x86/kernel/smpboot.c                     |    2 +
 arch/x86/mm/numa.c                            |   59 +++++++++++++++----------
 drivers/misc/sgi-xp/xpc_uv.c                  |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |    2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |    2 +-
 kernel/profile.c                              |    2 +-
 mm/page_alloc.c                               |   10 ++---
 net/openvswitch/flow.c                        |    2 +-
 10 files changed, 59 insertions(+), 34 deletions(-)

-- 
1.7.10.4


* [Intel-wired-lan] [Patch V3 6/9] i40evf: Use numa_mem_id() to better support memoryless node
From: Bowers, AndrewX @ 2015-08-21 16:41 UTC
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Jiang Liu
> Sent: Sunday, August 16, 2015 8:19 PM
> To: Andrew Morton; Mel Gorman; David Rientjes; Mike Galbraith; Peter
> Zijlstra; Wysocki, Rafael J; Tang Chen; Tejun Heo; Kirsher, Jeffrey T;
> Brandeburg, Jesse; Nelson, Shannon; Wyborny, Carolyn; Skidmore, Donald C;
> Vick, Matthew; Ronciak, John; Williams, Mitch A
> Cc: Luck, Tony; netdev at vger.kernel.org; x86 at kernel.org; linux-
> hotplug at vger.kernel.org; linux-kernel at vger.kernel.org; linux-
> mm at kvack.org; intel-wired-lan at lists.osuosl.org; Jiang Liu
> Subject: [Intel-wired-lan] [Patch V3 6/9] i40evf: Use numa_mem_id() to
> better support memoryless node
> 
> Function i40e_clean_rx_irq() tries to reuse memory pages allocated from the
> nearest node. To better support memoryless node, use
> numa_mem_id() instead of numa_node_id() to get the nearest node with
> memory.
> 
> This change should only affect performance.
> 
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Present in git log, code changes present in tree, does not break base driver.
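
The page-reuse reasoning in the quoted patch description can be sketched
as a user-space model. This is not the driver's code: the two-node
table and every helper name below are assumptions for illustration,
mirroring the two-socket system from the cover letter where node 1 has
no memory.

```c
#include <stdbool.h>

#define NR_NODES 2

/* Hypothetical model: node 1 has no memory, so its nearest memory is node 0. */
static const int nearest_mem_node[NR_NODES] = { 0, 0 };

/* Model of numa_mem_id() for code running on 'cpu_node'. */
static int model_numa_mem_id(int cpu_node)
{
	return nearest_mem_node[cpu_node];
}

/* Reuse check keyed on the CPU's own node (model of numa_node_id()). */
static bool can_reuse_by_node_id(int page_nid, int cpu_node)
{
	return page_nid == cpu_node;
}

/* Reuse check keyed on the nearest node with memory. */
static bool can_reuse_by_mem_id(int page_nid, int cpu_node)
{
	return page_nid == model_numa_mem_id(cpu_node);
}
```

In this model every page lives on node 0. For a CPU on node 1 the first
check can never succeed, so no page would be reused, while the second
check accepts pages from node 0. That fits the patch's note that the
change should only affect performance.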

