xen-devel.lists.xenproject.org archive mirror
From: Elena Ufimtseva <ufimtseva@gmail.com>
To: xen-devel@lists.xen.org
Cc: keir@xen.org, Ian.Campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com,
	dario.faggioli@citrix.com, lccycc123@gmail.com,
	ian.jackson@eu.citrix.com, JBeulich@suse.com, sw@linux.com,
	Elena Ufimtseva <ufimtseva@gmail.com>
Subject: [PATCH RFC v2 0/7] xen: vNUMA introduction
Date: Fri, 13 Sep 2013 04:49:37 -0400
Message-ID: <1379062177-13681-1-git-send-email-ufimtseva@gmail.com>

This series of patches introduces vNUMA topology awareness and
provides interfaces and data structures to enable vNUMA for 
PV domU guests.

The PV guest kernel must support vNUMA topology, so the
corresponding Linux patches should be applied.

Introduction
-------------

vNUMA topology is exposed to the PV guest to improve performance when running
workloads on NUMA machines.
The XEN vNUMA implementation provides a way to create vNUMA-enabled guests on NUMA and UMA
machines and to map the vNUMA topology to the physical NUMA topology in an optimal way.

XEN vNUMA support

The current set of patches introduces a subop hypercall that is available to enlightened
PV guests with the vNUMA patches applied.

The domain structure was modified to hold the per-domain vNUMA topology for use by other
vNUMA-aware subsystems (e.g. ballooning).

libxc

libxc provides interfaces to build PV guests with vNUMA support and, on NUMA
machines, performs the initial memory allocation on physical NUMA nodes. This is implemented by
using the nodemap formed by automatic NUMA placement. Details are in patch #3.

libxl

libxl provides a way to predefine the vNUMA topology in the VM config: number of vnodes,
memory arrangement, vcpu-to-vnode assignment, and distance map.

PV guest

As of now, only PV guests can take advantage of vNUMA functionality. The vNUMA Linux patches
should be applied and NUMA support should be compiled into the kernel.

Example of booting a vNUMA-enabled PV domU:

NUMA machine:
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       1        0        0
  2:       2        0        0
  3:       3        0        0
  4:       0        1        1
  5:       1        1        1
  6:       2        1        1
  7:       3        1        1
numa_info              :
node:    memsize    memfree    distances
   0:     17664      12243      10,20
   1:     16384      11929      20,10

VM config:

memory = 16384
vcpus = 8
name = "rcbig"
vnodes = 8
vnumamem = "2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g"
vcpu_to_vnode ="5 6 7 4 3 2 1 0"
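
The config above has to be internally consistent: the per-vnode sizes should
add up to "memory", and every vcpu needs a valid vnode. A minimal sketch of
such a check follows. It is not the parser from this series; the helper names
are hypothetical, and it assumes "memory" is in MiB and that the "g"/"m"
suffixes mean GiB/MiB.

```python
def parse_size_mib(token):
    """Convert a size token such as '2g' or '512m' to MiB.

    Assumption: 'g' means GiB, 'm' means MiB, a bare number is MiB.
    """
    token = token.strip().lower()
    units = {"m": 1, "g": 1024}
    if token and token[-1] in units:
        return int(token[:-1]) * units[token[-1]]
    return int(token)


def check_vnuma_config(memory, vcpus, vnodes, vnumamem, vcpu_to_vnode):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    sizes = [parse_size_mib(t) for t in vnumamem.split(",")]
    mapping = [int(t) for t in vcpu_to_vnode.split()]

    if len(sizes) != vnodes:
        problems.append("vnumamem has %d entries, expected %d" % (len(sizes), vnodes))
    if sum(sizes) != memory:
        problems.append("vnumamem sums to %d MiB, memory is %d MiB" % (sum(sizes), memory))
    if len(mapping) != vcpus:
        problems.append("vcpu_to_vnode has %d entries, expected %d" % (len(mapping), vcpus))
    if any(not 0 <= v < vnodes for v in mapping):
        problems.append("vcpu_to_vnode references a vnode outside 0..%d" % (vnodes - 1))
    return problems


# Values taken from the VM config above.
print(check_vnuma_config(
    memory=16384, vcpus=8, vnodes=8,
    vnumamem="2g, 2g, 2g, 2g, 2g, 2g, 2g, 2g",
    vcpu_to_vnode="5 6 7 4 3 2 1 0",
))
```

For the example config this reports no problems.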


root@superpipe:~# xl list -n
Name                                        ID   Mem VCPUs  State   Time(s) NODE Affinity
Domain-0                                     0  4096     1     r-----     581.5 any node
r9                                           1  2048     1     -b----      19.9 0
rc9k1                                        2  2048     6     -b----      21.1 1
*rcbig                                        6 16384     8     -b----       4.9 any node

xl debug-keys u:
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 1048576):
(XEN)     Node 0: 510411
(XEN)     Node 1: 538165
(XEN) Domain 2 (total: 524288):
(XEN)     Node 0: 0
(XEN)     Node 1: 524288
(XEN) Domain 3 (total: 4194304):
(XEN)     Node 0: 2621440
(XEN)     Node 1: 1572864
(XEN)     Domain has 8 vnodes
(XEN)         pnode 0: vnodes: 0 (2048), 1 (2048), 2 (2048), 3 (2048), 4 (2048), 
(XEN)         pnode 1: vnodes: 5 (2048), 6 (2048), 7 (2048), 
(XEN)    Domain vcpu to vnode: 5 6 7 4 3 2 1 0 
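
The per-node page counts for domain 3 can be cross-checked against the vnode
placement printed above. The sketch below assumes 4 KiB pages and that the
numbers in parentheses after each vnode are sizes in MiB; the variable names
are illustrative only.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
MIB = 1024 * 1024


def pages_for_mib(mib):
    """Number of pages needed to back `mib` MiB of memory."""
    return mib * MIB // PAGE_SIZE


# vnode sizes (MiB) per physical node, copied from the debug-key output.
vnode_sizes_on_pnode = {
    0: [2048, 2048, 2048, 2048, 2048],   # vnodes 0-4
    1: [2048, 2048, 2048],               # vnodes 5-7
}

pages_per_pnode = {
    pnode: sum(pages_for_mib(size) for size in sizes)
    for pnode, sizes in vnode_sizes_on_pnode.items()
}

print(pages_per_pnode)                 # per-pnode page counts
print(sum(pages_per_pnode.values()))   # domain total
```

Under these assumptions the result is 2621440 pages on node 0 and 1572864
pages on node 1, 4194304 in total, which agrees with the "Domain 3" lines.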


PV Linux boot (domain 3):
[    0.000000] init_memory_mapping: [mem 0x00100000-0x37fffffff]
[    0.000000]  [mem 0x00100000-0x37fffffff] page 4k
[    0.000000] RAMDISK: [mem 0x01dd6000-0x0347dfff]
[    0.000000] vNUMA: memblk[0] - 0x0 0x80000000
[    0.000000] vNUMA: memblk[1] - 0x80000000 0x100000000
[    0.000000] vNUMA: memblk[2] - 0x100000000 0x180000000
[    0.000000] vNUMA: memblk[3] - 0x180000000 0x200000000
[    0.000000] vNUMA: memblk[4] - 0x200000000 0x280000000
[    0.000000] vNUMA: memblk[5] - 0x280000000 0x300000000
[    0.000000] vNUMA: memblk[6] - 0x300000000 0x380000000
[    0.000000] vNUMA: memblk[7] - 0x380000000 0x400000000
[    0.000000] NUMA: Initialized distance table, cnt=8
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x7fffffff]
[    0.000000]   NODE_DATA [mem 0x7ffd9000-0x7fffffff]
[    0.000000] Initmem setup node 1 [mem 0x80000000-0xffffffff]
[    0.000000]   NODE_DATA [mem 0xfffd9000-0xffffffff]
[    0.000000] Initmem setup node 2 [mem 0x100000000-0x17fffffff]
[    0.000000]   NODE_DATA [mem 0x17ffd9000-0x17fffffff]
[    0.000000] Initmem setup node 3 [mem 0x180000000-0x1ffffffff]
[    0.000000]   NODE_DATA [mem 0x1fffd9000-0x1ffffffff]
[    0.000000] Initmem setup node 4 [mem 0x200000000-0x27fffffff]
[    0.000000]   NODE_DATA [mem 0x27ffd9000-0x27fffffff]
[    0.000000] Initmem setup node 5 [mem 0x280000000-0x2ffffffff]
[    0.000000]   NODE_DATA [mem 0x2fffd9000-0x2ffffffff]
[    0.000000] Initmem setup node 6 [mem 0x300000000-0x37fffffff]
[    0.000000]   NODE_DATA [mem 0x37ffd9000-0x37fffffff]
[    0.000000] Initmem setup node 7 [mem 0x380000000-0x3ffffffff]
[    0.000000]   NODE_DATA [mem 0x3fdff7000-0x3fe01dfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x3ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009ffff]
[    0.000000]   node   0: [mem 0x00100000-0x7fffffff]
[    0.000000]   node   1: [mem 0x80000000-0xffffffff]
[    0.000000]   node   2: [mem 0x100000000-0x17fffffff]
[    0.000000]   node   3: [mem 0x180000000-0x1ffffffff]
[    0.000000]   node   4: [mem 0x200000000-0x27fffffff]
[    0.000000]   node   5: [mem 0x280000000-0x2ffffffff]
[    0.000000]   node   6: [mem 0x300000000-0x37fffffff]
[    0.000000]   node   7: [mem 0x380000000-0x3ffffffff]
[    0.000000] On node 0 totalpages: 524191
[    0.000000]   DMA zone: 56 pages used for memmap
[    0.000000]   DMA zone: 21 pages reserved
[    0.000000]   DMA zone: 3999 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 7112 pages used for memmap
[    0.000000]   DMA32 zone: 520192 pages, LIFO batch:31
[    0.000000] On node 1 totalpages: 524288
[    0.000000]   DMA32 zone: 7168 pages used for memmap
[    0.000000]   DMA32 zone: 524288 pages, LIFO batch:31
[    0.000000] On node 2 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 3 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 4 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 5 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 6 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] On node 7 totalpages: 524288
[    0.000000]   Normal zone: 7168 pages used for memmap
[    0.000000]   Normal zone: 524288 pages, LIFO batch:31
[    0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[    0.000000] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] APIC: switched to apic NOOP
[    0.000000] nr_irqs_gsi: 16
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.4-unstable (preserve-AD)
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:8 nr_node_ids:8
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007fc00000 s85120 r8192 d21376 u2097152
[    0.000000] pcpu-alloc: s85120 r8192 d21376 u2097152 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 0 [1] 1 [2] 2 [3] 3 [4] 4 [5] 5 [6] 6 [7] 7 
[    0.000000] Built 8 zonelists in Node order, mobility grouping on.  Total pages: 4136842
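
The "vNUMA: memblk[N]" ranges in the boot log above are consistent with the
vnodes being laid out back to back in guest physical address space. The
following sketch reproduces them from the vnode sizes; it is not the guest
kernel code, and the function name is made up for illustration.

```python
def memblk_ranges(sizes_mib):
    """Lay vnodes out contiguously from address 0; return (start, end) pairs.

    `end` is exclusive, matching the way the memblk lines are printed.
    """
    ranges = []
    start = 0
    for size in sizes_mib:
        end = start + size * 1024 * 1024
        ranges.append((start, end))
        start = end
    return ranges


# Eight vnodes of 2 GiB each, as in the example VM config.
for i, (start, end) in enumerate(memblk_ranges([2048] * 8)):
    print("vNUMA: memblk[%d] - %#x %#x" % (i, start, end))
```

The first range printed is 0x0 0x80000000 and the last is
0x380000000 0x400000000, the same as in the log.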

numactl within the running guest:
root@heatpipe:~# numactl --ha
available: 8 nodes (0-7)
node 0 cpus: 7
node 0 size: 2047 MB
node 0 free: 2001 MB
node 1 cpus: 6
node 1 size: 2048 MB
node 1 free: 2008 MB
node 2 cpus: 5
node 2 size: 2048 MB
node 2 free: 2010 MB
node 3 cpus: 4
node 3 size: 2048 MB
node 3 free: 2009 MB
node 4 cpus: 3
node 4 size: 2048 MB
node 4 free: 2009 MB
node 5 cpus: 0
node 5 size: 2048 MB
node 5 free: 1982 MB
node 6 cpus: 1
node 6 size: 2048 MB
node 6 free: 2008 MB
node 7 cpus: 2
node 7 size: 2048 MB
node 7 free: 1944 MB
node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  20  20  20  20  20  20  20 
  1:  20  10  20  20  20  20  20  20 
  2:  20  20  10  20  20  20  20  20 
  3:  20  20  20  10  20  20  20  20 
  4:  20  20  20  20  10  20  20  20 
  5:  20  20  20  20  20  10  20  20 
  6:  20  20  20  20  20  20  10  20 
  7:  20  20  20  20  20  20  20  10
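
The cpu lists and the distance table above follow from the VM config. The
sketch below inverts the vcpu_to_vnode list into per-node cpu lists and
builds a uniform distance table. The 10/20 values are simply the ones seen
in this output, not a statement about defaults in the patches, and both
function names are hypothetical.

```python
def cpus_per_vnode(vcpu_to_vnode, vnodes):
    """Invert a vcpu -> vnode list into a list of vcpus for each vnode."""
    nodes = [[] for _ in range(vnodes)]
    for vcpu, vnode in enumerate(vcpu_to_vnode):
        nodes[vnode].append(vcpu)
    return nodes


def uniform_distances(vnodes, local=10, remote=20):
    """Distance table with `local` on the diagonal and `remote` elsewhere."""
    return [[local if i == j else remote for j in range(vnodes)]
            for i in range(vnodes)]


mapping = [5, 6, 7, 4, 3, 2, 1, 0]   # vcpu_to_vnode from the VM config
for node, cpus in enumerate(cpus_per_vnode(mapping, 8)):
    print("node %d cpus: %s" % (node, " ".join(str(c) for c in cpus)))

for row in uniform_distances(8):
    print(" ".join("%3d" % d for d in row))
```

With this mapping vcpu 7 lands on node 0 and vcpu 0 on node 5, as numactl
reports.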

root@heatpipe:~# numastat -c

Per-node numastat info (in MBs):
                Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
                ------ ------ ------ ------ ------ ------ ------ ------ -----
Numa_Hit            37     43     35     42     43     97     45     58   401
Numa_Miss            0      0      0      0      0      0      0      0     0
Numa_Foreign         0      0      0      0      0      0      0      0     0
Interleave_Hit       7      7      7      7      7      7      7      7    56
Local_Node          28     34     26     33     34     97     36     49   336
Other_Node           9      9      9      9      9      0      9      9    65

Patchset applies to latest Xen tree
commit e008e9119d03852020b93e1d4da9a80ec1af9c75 
Available at http://git.gitorious.org/xenvnuma/xenvnuma.git

Elena Ufimtseva (7):
  Xen vNUMA for PV guests.
  Per-domain vNUMA initialization.
  vNUMA nodes allocation on NUMA nodes.
  vNUMA libxl supporting functionality.
  vNUMA VM config parsing functions
  xl.cgf documentation update for vNUMA.
  NUMA debug-key additional output for vNUMA

 docs/man/xl.cfg.pod.5        |   50 +++++++++++
 tools/libxc/xc_dom.h         |    9 ++
 tools/libxc/xc_dom_x86.c     |   77 ++++++++++++++--
 tools/libxc/xc_domain.c      |   57 ++++++++++++
 tools/libxc/xenctrl.h        |    9 ++
 tools/libxc/xg_private.h     |    1 +
 tools/libxl/libxl.c          |   19 ++++
 tools/libxl/libxl.h          |   20 ++++-
 tools/libxl/libxl_arch.h     |    5 ++
 tools/libxl/libxl_dom.c      |  105 +++++++++++++++++++++-
 tools/libxl/libxl_internal.h |    3 +
 tools/libxl/libxl_types.idl  |    5 +-
 tools/libxl/libxl_x86.c      |   86 ++++++++++++++++++
 tools/libxl/xl_cmdimpl.c     |  205 ++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/numa.c          |   23 ++++-
 xen/common/domain.c          |   25 +++++-
 xen/common/domctl.c          |   68 +++++++++++++-
 xen/common/memory.c          |   56 ++++++++++++
 xen/include/public/domctl.h  |   15 +++-
 xen/include/public/memory.h  |    9 +-
 xen/include/xen/domain.h     |   11 +++
 xen/include/xen/sched.h      |    1 +
 xen/include/xen/vnuma.h      |   27 ++++++
 23 files changed, 869 insertions(+), 17 deletions(-)
 create mode 100644 xen/include/xen/vnuma.h

-- 
1.7.10.4
