qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
@ 2013-12-04  7:58 Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 01/11] NUMA: move numa related code to new file numa.c Wanlong Gao
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

As you know, QEMU currently cannot direct its memory allocation, which may cause
cross-node access in the guest and a performance regression.
Worse, if PCI passthrough is used, the
direct-attached device uses DMA transfers between the device and the QEMU process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with a direct-attached device, every guest page's reference count is
incremented and page migration will not work; AutoNUMA will not work either.

So, we should set the guest nodes' memory allocation policy before
the pages are actually mapped.

With this patch set, we are able to set the guest nodes' memory policy
as follows:

 -numa node,nodeid=0,cpus=0 \
 -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
 -numa node,nodeid=1,cpus=1 \
 -numa mem,size=1024M,policy=interleave,host-nodes=1

The supported format is "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N".

A QMP command "query-numa" is added to show NUMA info through
this API.

The "info numa" monitor command is converted to use this
QMP command "query-numa".

This version temporarily removes the "set-mem-policy" QMP and HMP commands,
as Marcelo and Paolo suggested.


A simple test looks like the following:
=====================================================
Before:
# numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096  -smp 2 -numa node,nodeid=0,cpus=0,mem=2048 -numa node,nodeid=1,cpus=1,mem=2048 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
[1] 13320
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 5111 MB
node 0 free: 4653 MB
node 1 cpus: 1 3
node 1 size: 5120 MB
node 1 free: 4764 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 5111 MB
node 0 free: 4317 MB
node 1 cpus: 1 3
node 1 size: 5120 MB
node 1 free: 876 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 



After:
# numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 4 -numa node,nodeid=0,cpus=0,cpus=2 -numa mem,size=2048M,policy=membind,host-nodes=0 -numa node,nodeid=1,cpus=1,cpus=3 -numa mem,size=2048M,policy=membind,host-nodes=1 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
[1] 10862
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 5111 MB
node 0 free: 4718 MB
node 1 cpus: 1 3
node 1 size: 5120 MB
node 1 free: 4799 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 5111 MB
node 0 free: 2544 MB
node 1 cpus: 1 3
node 1 size: 5120 MB
node 1 free: 2725 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
===================================================


V1->V2:
    change to use QemuOpts in numa options (Paolo)
    handle Error in mpol parser (Paolo)
    change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
V2->V3:
    also handle Error in cpus parser (5/10)
    split out common parser from cpus and hostnode parser (Bandan 6/10)
V3->V4:
    rebase to request for comments
V4->V5:
    use OptVisitor and split -numa option (Paolo)
     - s/set-mpol/set-mem-policy (Andreas)
     - s/mem-policy/policy
     - s/mem-hostnode/host-nodes
    fix hmp command process after error (Luiz)
    add qmp command query-numa and convert info numa to it (Luiz)
V5->V6:
    remove tabs in json file (Laszlo, Paolo)
    add back "-numa node,mem=xxx" as legacy (Paolo)
    change cpus and host-nodes to array (Laszlo, Eric)
    change "nodeid" to "uint16"
    add NumaMemPolicy enum type (Eric)
    rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for repeating options" patch set, thanks for Laszlo's help
V6->V7:
    change UInt16 to uint16 (Laszlo)
    fix a typo in adding qmp command set-mem-policy
V7->V8:
    rebase to current master with Laszlo's V2 of OptsVisitor patch set
    fix an added blank-line whitespace error
V8->V9:
    rebase to current master
    check if total numa memory size is equal to ram_size (Paolo)
    add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
    replace the use of numa_num_configured_nodes() (Andrew)
    avoid abusing the fact i==nodeid (Andrew)
V9->V10:
    rebase to current master
    remove libnuma (Andrew)
    MAX_NODES=64 -> MAX_NODES=128 since libnuma selected 128 (Andrew)
    use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew)
    remove a useless clear_bit() operation (Andrew)
V10->V11:
    rebase to current master
    fix "maxnode" argument of mbind(2)
V11->V12:
    rebase to current master
    split patch 02/11 of V11 (Eduardo)
    add some max value check (Eduardo)
    split MAX_NODES change patch (Eduardo)
V12->V13:
    rebase to current master
    thanks for Luiz's review (Luiz)
    doc hmp command set-mem-policy (Luiz)
    rename: NUMAInfo -> NUMANode (Luiz)
V13->V14:
    remove "set-mem-policy" qmp and hmp commands (Marcelo, Paolo)
V14->V15:
    rebase to the current master
V15->V16:
    rebase to current master
    add more test log
V16->V17:
    use MemoryRegion to set policy instead of using "pc.ram" (Paolo)

Wanlong Gao (11):
  NUMA: move numa related code to new file numa.c
  NUMA: check if the total numa memory size is equal to ram_size
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: convert -numa option to use OptsVisitor
  NUMA: introduce NumaMemOptions
  NUMA: add "-numa mem," options
  NUMA: expand MAX_NODES from 64 to 128
  NUMA: parse guest numa nodes memory policy
  NUMA: set guest numa nodes memory policy
  NUMA: add qmp command query-numa
  NUMA: convert hmp command info_numa to use qmp command query_numa

 Makefile.target         |   2 +-
 cpus.c                  |  14 --
 hmp.c                   |  57 +++++++
 hmp.h                   |   1 +
 hw/i386/pc.c            |  21 ++-
 include/exec/memory.h   |  15 ++
 include/sysemu/cpus.h   |   1 -
 include/sysemu/sysemu.h |  18 ++-
 monitor.c               |  21 +--
 numa.c                  | 408 ++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json        | 112 +++++++++++++
 qemu-options.hx         |   6 +-
 qmp-commands.hx         |  49 ++++++
 vl.c                    | 160 +++----------------
 14 files changed, 698 insertions(+), 187 deletions(-)
 create mode 100644 numa.c

-- 
1.8.5


* [Qemu-devel] [PATCH V17 01/11] NUMA: move numa related code to new file numa.c
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-10 13:06   ` Eduardo Habkost
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size Wanlong Gao
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 Makefile.target         |   2 +-
 cpus.c                  |  14 ----
 include/sysemu/cpus.h   |   1 -
 include/sysemu/sysemu.h |   3 +
 numa.c                  | 182 ++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                    | 139 +-----------------------------------
 6 files changed, 187 insertions(+), 154 deletions(-)
 create mode 100644 numa.c

diff --git a/Makefile.target b/Makefile.target
index af6ac7e..0197c17 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -109,7 +109,7 @@ endif #CONFIG_BSD_USER
 #########################################################
 # System emulator target
 ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
 obj-y += qtest.o
 obj-y += hw/
 obj-$(CONFIG_FDT) += device_tree.o
diff --git a/cpus.c b/cpus.c
index 01d128d..53360b0 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1297,20 +1297,6 @@ static void tcg_exec_all(void)
     exit_request = 0;
 }
 
-void set_numa_modes(void)
-{
-    CPUState *cpu;
-    int i;
-
-    CPU_FOREACH(cpu) {
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
-                cpu->numa_node = i;
-            }
-        }
-    }
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 6502488..4f79081 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -23,7 +23,6 @@ extern int smp_threads;
 #define smp_threads 1
 #endif
 
-void set_numa_modes(void);
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 495dae8..2509649 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -136,6 +136,9 @@ extern QEMUClockType rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern unsigned long *node_cpumask[MAX_NODES];
+void numa_add(const char *optarg);
+void set_numa_nodes(void);
+void set_numa_modes(void);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
new file mode 100644
index 0000000..ce7736a
--- /dev/null
+++ b/numa.c
@@ -0,0 +1,182 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2013 Fujitsu Ltd.
+ * Author: Wanlong Gao <gaowanlong@cn.fujitsu.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "sysemu/sysemu.h"
+
+static void numa_node_parse_cpus(int nodenr, const char *cpus)
+{
+    char *endptr;
+    unsigned long long value, endvalue;
+
+    /* Empty CPU range strings will be considered valid, they will simply
+     * not set any bit in the CPU bitmap.
+     */
+    if (!*cpus) {
+        return;
+    }
+
+    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
+        goto error;
+    }
+    if (*endptr == '-') {
+        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
+            goto error;
+        }
+    } else if (*endptr == '\0') {
+        endvalue = value;
+    } else {
+        goto error;
+    }
+
+    if (endvalue >= MAX_CPUMASK_BITS) {
+        endvalue = MAX_CPUMASK_BITS - 1;
+        fprintf(stderr,
+            "qemu: NUMA: A max of %d VCPUs are supported\n",
+             MAX_CPUMASK_BITS);
+    }
+
+    if (endvalue < value) {
+        goto error;
+    }
+
+    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    return;
+
+error:
+    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
+    exit(1);
+}
+
+void numa_add(const char *optarg)
+{
+    char option[128];
+    char *endptr;
+    unsigned long long nodenr;
+
+    optarg = get_opt_name(option, 128, optarg, ',');
+    if (*optarg == ',') {
+        optarg++;
+    }
+    if (!strcmp(option, "node")) {
+
+        if (nb_numa_nodes >= MAX_NODES) {
+            fprintf(stderr, "qemu: too many NUMA nodes\n");
+            exit(1);
+        }
+
+        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
+            nodenr = nb_numa_nodes;
+        } else {
+            if (parse_uint_full(option, &nodenr, 10) < 0) {
+                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
+                exit(1);
+            }
+        }
+
+        if (nodenr >= MAX_NODES) {
+            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
+            exit(1);
+        }
+
+        if (get_param_value(option, 128, "mem", optarg) == 0) {
+            node_mem[nodenr] = 0;
+        } else {
+            int64_t sval;
+            sval = strtosz(option, &endptr);
+            if (sval < 0 || *endptr) {
+                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
+                exit(1);
+            }
+            node_mem[nodenr] = sval;
+        }
+        if (get_param_value(option, 128, "cpus", optarg) != 0) {
+            numa_node_parse_cpus(nodenr, option);
+        }
+        nb_numa_nodes++;
+    } else {
+        fprintf(stderr, "Invalid -numa option: %s\n", option);
+        exit(1);
+    }
+}
+
+void set_numa_nodes(void)
+{
+    if (nb_numa_nodes > 0) {
+        int i;
+
+        if (nb_numa_nodes > MAX_NODES) {
+            nb_numa_nodes = MAX_NODES;
+        }
+
+        /* If no memory size if given for any node, assume the default case
+         * and distribute the available memory equally across all nodes
+         */
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (node_mem[i] != 0)
+                break;
+        }
+        if (i == nb_numa_nodes) {
+            uint64_t usedmem = 0;
+
+            /* On Linux, the each node's border has to be 8MB aligned,
+             * the final node gets the rest.
+             */
+            for (i = 0; i < nb_numa_nodes - 1; i++) {
+                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
+                usedmem += node_mem[i];
+            }
+            node_mem[i] = ram_size - usedmem;
+        }
+
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+                break;
+            }
+        }
+        /* assigning the VCPUs round-robin is easier to implement, guest OSes
+         * must cope with this anyway, because there are BIOSes out there in
+         * real machines which also use this scheme.
+         */
+        if (i == nb_numa_nodes) {
+            for (i = 0; i < max_cpus; i++) {
+                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+            }
+        }
+    }
+}
+
+void set_numa_modes(void)
+{
+    CPUState *cpu;
+    int i;
+
+    CPU_FOREACH(cpu) {
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+                cpu->numa_node = i;
+            }
+        }
+    }
+}
diff --git a/vl.c b/vl.c
index 8d5d874..ce0782d 100644
--- a/vl.c
+++ b/vl.c
@@ -1260,102 +1260,6 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
-{
-    char *endptr;
-    unsigned long long value, endvalue;
-
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
-    }
-
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
-    } else {
-        goto error;
-    }
-
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
-    }
-
-    if (endvalue < value) {
-        goto error;
-    }
-
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
-    return;
-
-error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
-}
-
-static void numa_add(const char *optarg)
-{
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
-
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
-    }
-    if (!strcmp(option, "node")) {
-
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
-        }
-
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            node_mem[nodenr] = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
-        }
-        nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
-        exit(1);
-    }
-}
-
 static QemuOptsList qemu_smp_opts = {
     .name = "smp-opts",
     .implied_opt_name = "cpus",
@@ -4155,48 +4059,7 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
 
-    if (nb_numa_nodes > 0) {
-        int i;
-
-        if (nb_numa_nodes > MAX_NODES) {
-            nb_numa_nodes = MAX_NODES;
-        }
-
-        /* If no memory size if given for any node, assume the default case
-         * and distribute the available memory equally across all nodes
-         */
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
-                break;
-        }
-        if (i == nb_numa_nodes) {
-            uint64_t usedmem = 0;
-
-            /* On Linux, the each node's border has to be 8MB aligned,
-             * the final node gets the rest.
-             */
-            for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
-            }
-            node_mem[i] = ram_size - usedmem;
-        }
-
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
-                break;
-            }
-        }
-        /* assigning the VCPUs round-robin is easier to implement, guest OSes
-         * must cope with this anyway, because there are BIOSes out there in
-         * real machines which also use this scheme.
-         */
-        if (i == nb_numa_nodes) {
-            for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
-            }
-        }
-    }
+    set_numa_nodes();
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
         exit(1);
-- 
1.8.5


* [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 01/11] NUMA: move numa related code to new file numa.c Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-10 13:15   ` Eduardo Habkost
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 03/11] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

If the total memory size of the assigned NUMA nodes is not
equal to the assigned RAM size, wrong data will be written
to the ACPI table; the guest will then ignore the invalid ACPI
table and place all memory in one node. That is buggy, so we should
check for this to ensure that we write the right data to the ACPI table.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/numa.c b/numa.c
index ce7736a..beda80e 100644
--- a/numa.c
+++ b/numa.c
@@ -150,6 +150,16 @@ void set_numa_nodes(void)
             node_mem[i] = ram_size - usedmem;
         }
 
+        uint64_t numa_total = 0;
+        for (i = 0; i < nb_numa_nodes; i++) {
+            numa_total += node_mem[i];
+        }
+        if (numa_total != ram_size) {
+            fprintf(stderr, "qemu: numa nodes total memory size "
+                            "should equal to ram_size\n");
+            exit(1);
+        }
+
         for (i = 0; i < nb_numa_nodes; i++) {
             if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
                 break;
-- 
1.8.5


* [Qemu-devel] [PATCH V17 03/11] NUMA: Add numa_info structure to contain numa nodes info
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 01/11] NUMA: move numa related code to new file numa.c Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 04/11] NUMA: convert -numa option to use OptsVisitor Wanlong Gao
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Add the numa_info structure to contain the NUMA nodes' memory and
VCPU information, as well as the NUMA nodes' host memory policies
to be added later.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hw/i386/pc.c            | 12 ++++++++----
 include/sysemu/sysemu.h |  8 ++++++--
 monitor.c               |  2 +-
 numa.c                  | 23 ++++++++++++-----------
 vl.c                    |  7 +++----
 5 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 12c436e..74c1f16 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -670,14 +670,14 @@ static FWCfgState *bochs_bios_init(void)
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + apic_id_limit + nb_numa_nodes) *
@@ -1072,8 +1072,12 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
     guest_info->apic_id_limit = pc_apic_id_limit(max_cpus);
     guest_info->apic_xrupt_override = kvm_allows_irq0_override();
     guest_info->numa_nodes = nb_numa_nodes;
-    guest_info->node_mem = g_memdup(node_mem, guest_info->numa_nodes *
+    guest_info->node_mem = g_malloc0(guest_info->numa_nodes *
                                     sizeof *guest_info->node_mem);
+    for (i = 0; i < nb_numa_nodes; i++) {
+        guest_info->node_mem[i] = numa_info[i].node_mem;
+    }
+
     guest_info->node_cpu = g_malloc0(guest_info->apic_id_limit *
                                      sizeof *guest_info->node_cpu);
 
@@ -1081,7 +1085,7 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < guest_info->apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 guest_info->node_cpu[apic_id] = j;
                 break;
             }
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 2509649..d873b42 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 #include "qapi-types.h"
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
+#include "qemu/bitmap.h"
 
 /* vl.c */
 
@@ -134,8 +135,11 @@ extern QEMUClockType rtc_clock;
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
-extern uint64_t node_mem[MAX_NODES];
-extern unsigned long *node_cpumask[MAX_NODES];
+typedef struct node_info {
+    uint64_t node_mem;
+    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+} NodeInfo;
+extern NodeInfo numa_info[MAX_NODES];
 void numa_add(const char *optarg);
 void set_numa_nodes(void);
 void set_numa_modes(void);
diff --git a/monitor.c b/monitor.c
index 845f608..b97b7d3 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2004,7 +2004,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            node_mem[i] >> 20);
+            numa_info[i].node_mem >> 20);
     }
 }
 
diff --git a/numa.c b/numa.c
index beda80e..1bc0fad 100644
--- a/numa.c
+++ b/numa.c
@@ -61,7 +61,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
         goto error;
     }
 
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return;
 
 error:
@@ -101,7 +101,7 @@ void numa_add(const char *optarg)
         }
 
         if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
+            numa_info[nodenr].node_mem = 0;
         } else {
             int64_t sval;
             sval = strtosz(option, &endptr);
@@ -109,7 +109,7 @@ void numa_add(const char *optarg)
                 fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
                 exit(1);
             }
-            node_mem[nodenr] = sval;
+            numa_info[nodenr].node_mem = sval;
         }
         if (get_param_value(option, 128, "cpus", optarg) != 0) {
             numa_node_parse_cpus(nodenr, option);
@@ -134,7 +134,7 @@ void set_numa_nodes(void)
          * and distribute the available memory equally across all nodes
          */
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
+            if (numa_info[i].node_mem != 0)
                 break;
         }
         if (i == nb_numa_nodes) {
@@ -144,15 +144,16 @@ void set_numa_nodes(void)
              * the final node gets the rest.
              */
             for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
+                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
+                                        ~((1 << 23UL) - 1);
+                usedmem += numa_info[i].node_mem;
             }
-            node_mem[i] = ram_size - usedmem;
+            numa_info[i].node_mem = ram_size - usedmem;
         }
 
         uint64_t numa_total = 0;
         for (i = 0; i < nb_numa_nodes; i++) {
-            numa_total += node_mem[i];
+            numa_total += numa_info[i].node_mem;
         }
         if (numa_total != ram_size) {
             fprintf(stderr, "qemu: numa nodes total memory size "
@@ -161,7 +162,7 @@ void set_numa_nodes(void)
         }
 
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
                 break;
             }
         }
@@ -171,7 +172,7 @@ void set_numa_nodes(void)
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
             }
         }
     }
@@ -184,7 +185,7 @@ void set_numa_modes(void)
 
     CPU_FOREACH(cpu) {
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
                 cpu->numa_node = i;
             }
         }
diff --git a/vl.c b/vl.c
index ce0782d..404c16a 100644
--- a/vl.c
+++ b/vl.c
@@ -250,8 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
 int nb_numa_nodes;
-uint64_t node_mem[MAX_NODES];
-unsigned long *node_cpumask[MAX_NODES];
+NodeInfo numa_info[MAX_NODES];
 
 uint8_t qemu_uuid[16];
 bool qemu_uuid_set;
@@ -2812,8 +2811,8 @@ int main(int argc, char **argv, char **envp)
     translation = BIOS_ATA_TRANSLATION_AUTO;
 
     for (i = 0; i < MAX_NODES; i++) {
-        node_mem[i] = 0;
-        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
+        numa_info[i].node_mem = 0;
+        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
     }
 
     nb_numa_nodes = 0;
-- 
1.8.5


* [Qemu-devel] [PATCH V17 04/11] NUMA: convert -numa option to use OptsVisitor
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (2 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 03/11] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 05/11] NUMA: introduce NumaMemOptions Wanlong Gao
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |   3 +-
 numa.c                  | 148 +++++++++++++++++++++++-------------------------
 qapi-schema.json        |  30 ++++++++++
 vl.c                    |  11 +++-
 4 files changed, 114 insertions(+), 78 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d873b42..20b05a3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -140,9 +140,10 @@ typedef struct node_info {
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
-void numa_add(const char *optarg);
 void set_numa_nodes(void);
 void set_numa_modes(void);
+extern QemuOptsList qemu_numa_opts;
+int numa_init_func(QemuOpts *opts, void *opaque);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
index 1bc0fad..c4fa665 100644
--- a/numa.c
+++ b/numa.c
@@ -24,101 +24,97 @@
  */
 
 #include "sysemu/sysemu.h"
-
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
+#include "qapi-visit.h"
+#include "qapi/opts-visitor.h"
+#include "qapi/dealloc-visitor.h"
+QemuOptsList qemu_numa_opts = {
+    .name = "numa",
+    .implied_opt_name = "type",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
+    .desc = { { 0 } } /* validated with OptsVisitor */
+};
+
+static int numa_node_parse(NumaNodeOptions *opts)
 {
-    char *endptr;
-    unsigned long long value, endvalue;
-
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
-    }
+    uint16_t nodenr;
+    uint16List *cpus = NULL;
 
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
+    if (opts->has_nodeid) {
+        nodenr = opts->nodeid;
     } else {
-        goto error;
+        nodenr = nb_numa_nodes;
     }
 
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
+    if (nodenr >= MAX_NODES) {
+        fprintf(stderr, "qemu: Max number of NUMA nodes reached: %"
+                PRIu16 "\n", nodenr);
+        return -1;
     }
 
-    if (endvalue < value) {
-        goto error;
+    for (cpus = opts->cpus; cpus; cpus = cpus->next) {
+        if (cpus->value >= MAX_CPUMASK_BITS) {
+            fprintf(stderr, "qemu: cpu number %" PRIu16 " must be below %d\n",
+                    cpus->value, MAX_CPUMASK_BITS);
+            continue;
+        }
+        bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
     }
 
-    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
-    return;
+    if (opts->has_mem) {
+        int64_t mem_size;
+        char *endptr;
+        mem_size = strtosz(opts->mem, &endptr);
+        if (mem_size < 0 || *endptr) {
+            fprintf(stderr, "qemu: invalid numa mem size: %s\n", opts->mem);
+            return -1;
+        }
+        numa_info[nodenr].node_mem = mem_size;
+    }
 
-error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
+    return 0;
 }
 
-void numa_add(const char *optarg)
+int numa_init_func(QemuOpts *opts, void *opaque)
 {
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
-
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
+    NumaOptions *object = NULL;
+    Error *err = NULL;
+    int ret = 0;
+
+    {
+        OptsVisitor *ov = opts_visitor_new(opts);
+        visit_type_NumaOptions(opts_get_visitor(ov), &object, NULL, &err);
+        opts_visitor_cleanup(ov);
     }
-    if (!strcmp(option, "node")) {
-
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
 
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
-        }
-
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        ret = -1;
+        goto error;
+    }
 
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            numa_info[nodenr].node_mem = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            numa_info[nodenr].node_mem = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
+    switch (object->kind) {
+    case NUMA_OPTIONS_KIND_NODE:
+        ret = numa_node_parse(object->node);
+        if (ret) {
+            goto error;
         }
         nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
-        exit(1);
+        break;
+    default:
+        fprintf(stderr, "qemu: Invalid NUMA options type.\n");
+        ret = -1;
     }
+
+error:
+    if (object) {
+        QapiDeallocVisitor *dv = qapi_dealloc_visitor_new();
+        visit_type_NumaOptions(qapi_dealloc_get_visitor(dv),
+                               &object, NULL, NULL);
+        qapi_dealloc_visitor_cleanup(dv);
+    }
+
+    return ret;
 }
 
 void set_numa_nodes(void)
diff --git a/qapi-schema.json b/qapi-schema.json
index 83fa485..db539b6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4213,3 +4213,33 @@
 # Since: 1.7
 ##
 { 'command': 'blockdev-add', 'data': { 'options': 'BlockdevOptions' } }
+
+##
+# @NumaOptions
+#
+# A discriminated record of NUMA options. (for OptsVisitor)
+#
+# Since: 1.7
+##
+{ 'union': 'NumaOptions',
+  'data': {
+    'node': 'NumaNodeOptions' }}
+
+##
+# @NumaNodeOptions
+#
+# Create a guest NUMA node. (for OptsVisitor)
+#
+# @nodeid: #optional NUMA node ID
+#
+# @cpus: #optional VCPUs belonging to this node
+#
+# @mem: #optional memory size of this node (kept as legacy)
+#
+# Since: 1.7
+##
+{ 'type': 'NumaNodeOptions',
+  'data': {
+   '*nodeid': 'uint16',
+   '*cpus':   ['uint16'],
+   '*mem':    'str' }}
diff --git a/vl.c b/vl.c
index 404c16a..e67f34a 100644
--- a/vl.c
+++ b/vl.c
@@ -2791,6 +2791,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_tpmdev_opts);
     qemu_add_opts(&qemu_realtime_opts);
     qemu_add_opts(&qemu_msg_opts);
+    qemu_add_opts(&qemu_numa_opts);
 
     runstate_init();
 
@@ -2977,7 +2978,10 @@ int main(int argc, char **argv, char **envp)
                 }
                 break;
             case QEMU_OPTION_numa:
-                numa_add(optarg);
+                opts = qemu_opts_parse(qemu_find_opts("numa"), optarg, 1);
+                if (!opts) {
+                    exit(1);
+                }
                 break;
             case QEMU_OPTION_display:
                 display_type = select_display(optarg);
@@ -4058,6 +4062,11 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
 
+    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
+                          NULL, 1) != 0) {
+        exit(1);
+    }
+
     set_numa_nodes();
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
-- 
1.8.5


* [Qemu-devel] [PATCH V17 05/11] NUMA: introduce NumaMemOptions
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (3 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 04/11] NUMA: convert -numa option to use OptsVisitor Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 06/11] NUMA: add "-numa mem," options Wanlong Gao
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 qapi-schema.json | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index db539b6..1043e57 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4223,7 +4223,8 @@
 ##
 { 'union': 'NumaOptions',
   'data': {
-    'node': 'NumaNodeOptions' }}
+    'node': 'NumaNodeOptions',
+    'mem' : 'NumaMemOptions' }}
 
 ##
 # @NumaNodeOptions
@@ -4243,3 +4244,19 @@
    '*nodeid': 'uint16',
    '*cpus':   ['uint16'],
    '*mem':    'str' }}
+
+##
+# @NumaMemOptions
+#
+# Set memory information of guest NUMA node. (for OptsVisitor)
+#
+# @nodeid: #optional NUMA node ID
+#
+# @size: #optional memory size of this node
+#
+# Since: 1.7
+##
+{ 'type': 'NumaMemOptions',
+  'data': {
+   '*nodeid': 'uint16',
+   '*size':   'size' }}
-- 
1.8.5


* [Qemu-devel] [PATCH V17 06/11] NUMA: add "-numa mem," options
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (4 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 05/11] NUMA: introduce NumaMemOptions Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 07/11] NUMA: expand MAX_NODES from 64 to 128 Wanlong Gao
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Add a "-numa mem," option, as Paolo suggested, used like:

    -numa mem,nodeid=0,size=1G

This new option will make the upcoming memory hotplug support cleaner.

We will use the new option to specify per-node memory information,
and keep "-numa node,mem=xx" only as a legacy format.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  1 +
 numa.c                  | 36 ++++++++++++++++++++++++++++++++++++
 qemu-options.hx         |  6 ++++--
 vl.c                    |  2 ++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 20b05a3..291aa6a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -135,6 +135,7 @@ extern QEMUClockType rtc_clock;
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
+extern int nb_numa_mem_nodes;
 typedef struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
diff --git a/numa.c b/numa.c
index c4fa665..c676c5e 100644
--- a/numa.c
+++ b/numa.c
@@ -74,6 +74,31 @@ static int numa_node_parse(NumaNodeOptions *opts)
     return 0;
 }
 
+static int numa_mem_parse(NumaMemOptions *opts)
+{
+    uint16_t nodenr;
+    uint64_t mem_size;
+
+    if (opts->has_nodeid) {
+        nodenr = opts->nodeid;
+    } else {
+        nodenr = nb_numa_mem_nodes;
+    }
+
+    if (nodenr >= MAX_NODES) {
+        fprintf(stderr, "qemu: Max number of NUMA nodes reached: %"
+                PRIu16 "\n", nodenr);
+        return -1;
+    }
+
+    if (opts->has_size) {
+        mem_size = opts->size;
+        numa_info[nodenr].node_mem = mem_size;
+    }
+
+    return 0;
+}
+
 int numa_init_func(QemuOpts *opts, void *opaque)
 {
     NumaOptions *object = NULL;
@@ -101,6 +126,13 @@ int numa_init_func(QemuOpts *opts, void *opaque)
         }
         nb_numa_nodes++;
         break;
+    case NUMA_OPTIONS_KIND_MEM:
+        ret = numa_mem_parse(object->mem);
+        if (ret) {
+            goto error;
+        }
+        nb_numa_mem_nodes++;
+        break;
     default:
         fprintf(stderr, "qemu: Invalid NUMA options type.\n");
         ret = -1;
@@ -119,6 +151,10 @@ error:
 
 void set_numa_nodes(void)
 {
+    if (nb_numa_mem_nodes > nb_numa_nodes) {
+        nb_numa_nodes = nb_numa_mem_nodes;
+    }
+
     if (nb_numa_nodes > 0) {
         int i;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b94264..e6afb6f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -95,11 +95,13 @@ specifies the maximum number of hotpluggable CPUs.
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+    "-numa node[,nodeid=node][,cpus=cpu[-cpu]]\n"
+    "-numa mem[,nodeid=node][,size=size]\n"
+    , QEMU_ARCH_ALL)
 STEXI
 @item -numa @var{opts}
 @findex -numa
-Simulate a multi node NUMA system. If mem and cpus are omitted, resources
+Simulate a multi node NUMA system. If @var{size} and @var{cpus} are omitted, resources
 are split equally.
 ETEXI
 
diff --git a/vl.c b/vl.c
index e67f34a..064b821 100644
--- a/vl.c
+++ b/vl.c
@@ -250,6 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
 int nb_numa_nodes;
+int nb_numa_mem_nodes;
 NodeInfo numa_info[MAX_NODES];
 
 uint8_t qemu_uuid[16];
@@ -2817,6 +2818,7 @@ int main(int argc, char **argv, char **envp)
     }
 
     nb_numa_nodes = 0;
+    nb_numa_mem_nodes = 0;
     nb_nics = 0;
 
     bdrv_init_with_whitelist();
-- 
1.8.5


* [Qemu-devel] [PATCH V17 07/11] NUMA: expand MAX_NODES from 64 to 128
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (5 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 06/11] NUMA: add "-numa mem," options Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 08/11] NUMA: parse guest numa nodes memory policy Wanlong Gao
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

libnuma chose 128 for MAX_NODES, so we follow libnuma here.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 291aa6a..807619e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -132,7 +132,7 @@ extern size_t boot_splash_filedata_size;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 
-#define MAX_NODES 64
+#define MAX_NODES 128
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
 extern int nb_numa_mem_nodes;
-- 
1.8.5


* [Qemu-devel] [PATCH V17 08/11] NUMA: parse guest numa nodes memory policy
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (6 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 07/11] NUMA: expand MAX_NODES from 64 to 128 Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 09/11] NUMA: set " Wanlong Gao
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

The memory policy setting format is:
    policy={default|membind|interleave|preferred}[,relative=true],host-nodes=N-N
We add this setting as a suboption of "-numa mem,", so the memory
policy can then be set as follows:
    -numa node,nodeid=0,cpus=0 \
    -numa node,nodeid=1,cpus=1 \
    -numa mem,nodeid=0,size=1G,policy=membind,host-nodes=0-1 \
    -numa mem,nodeid=1,size=1G,policy=interleave,relative=true,host-nodes=1

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  3 +++
 numa.c                  | 18 ++++++++++++++++++
 qapi-schema.json        | 33 +++++++++++++++++++++++++++++++--
 vl.c                    |  3 +++
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 807619e..82f1447 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -139,6 +139,9 @@ extern int nb_numa_mem_nodes;
 typedef struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+    DECLARE_BITMAP(host_mem, MAX_NODES);
+    NumaNodePolicy policy;
+    bool relative;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
 void set_numa_nodes(void);
diff --git a/numa.c b/numa.c
index c676c5e..da4dbbd 100644
--- a/numa.c
+++ b/numa.c
@@ -78,6 +78,7 @@ static int numa_mem_parse(NumaMemOptions *opts)
 {
     uint16_t nodenr;
     uint64_t mem_size;
+    uint16List *nodes;
 
     if (opts->has_nodeid) {
         nodenr = opts->nodeid;
@@ -96,6 +97,23 @@ static int numa_mem_parse(NumaMemOptions *opts)
         numa_info[nodenr].node_mem = mem_size;
     }
 
+    if (opts->has_policy) {
+        numa_info[nodenr].policy = opts->policy;
+    }
+
+    if (opts->has_relative) {
+        numa_info[nodenr].relative = opts->relative;
+    }
+
+    for (nodes = opts->host_nodes; nodes; nodes = nodes->next) {
+        if (nodes->value >= MAX_NODES) {
+            fprintf(stderr, "qemu: node number %" PRIu16 " must be below %d\n",
+                    nodes->value, MAX_NODES);
+            continue;
+        }
+        bitmap_set(numa_info[nodenr].host_mem, nodes->value, 1);
+    }
+
     return 0;
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 1043e57..c0dad81 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4246,6 +4246,26 @@
    '*mem':    'str' }}
 
 ##
+# @NumaNodePolicy
+#
+# NUMA node policy types
+#
+# @default: restore default policy, remove any nondefault policy
+#
+# @preferred: set the preferred node for allocation
+#
+# @membind: a strict policy that restricts memory allocation to the
+#           nodes specified
+#
+# @interleave: page allocations are interleaved across the set
+#              of nodes specified
+#
+# Since: 1.7
+##
+{ 'enum': 'NumaNodePolicy',
+  'data': [ 'default', 'preferred', 'membind', 'interleave' ] }
+
+##
 # @NumaMemOptions
 #
 # Set memory information of guest NUMA node. (for OptsVisitor)
@@ -4254,9 +4274,18 @@
 #
 # @size: #optional memory size of this node
 #
+# @policy: #optional memory policy of this node
+#
+# @relative: #optional whether the nodes specified are relative
+#
+# @host-nodes: #optional host nodes for its memory policy
+#
 # Since: 1.7
 ##
 { 'type': 'NumaMemOptions',
   'data': {
-   '*nodeid': 'uint16',
-   '*size':   'size' }}
+   '*nodeid':     'uint16',
+   '*size':       'size',
+   '*policy':     'NumaNodePolicy',
+   '*relative':   'bool',
+   '*host-nodes': ['uint16'] }}
diff --git a/vl.c b/vl.c
index 064b821..95d03f5 100644
--- a/vl.c
+++ b/vl.c
@@ -2815,6 +2815,9 @@ int main(int argc, char **argv, char **envp)
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_mem = 0;
         bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
+        bitmap_zero(numa_info[i].host_mem, MAX_NODES);
+        numa_info[i].policy = NUMA_NODE_POLICY_DEFAULT;
+        numa_info[i].relative = false;
     }
 
     nb_numa_nodes = 0;
-- 
1.8.5


* [Qemu-devel] [PATCH V17 09/11] NUMA: set guest numa nodes memory policy
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (7 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 08/11] NUMA: parse guest numa nodes memory policy Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 10/11] NUMA: add qmp command query-numa Wanlong Gao
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Set the guest NUMA nodes' memory policies using the mbind(2)
system call, node by node.
After this patch, we are able to set guest node memory policies
through the QEMU options; this aims to solve the guest cross-node
memory access performance issue.
And as you all know, if PCI passthrough is used, a
direct-attached device uses DMA transfers between the device and
the qemu process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with a direct-attached device, every guest page's refcount is
incremented by one, and page migration will not work. Neither will
AutoNUMA.

So, we should set the guest nodes' memory allocation policies before
the pages are actually mapped.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hw/i386/pc.c          |  9 +++++
 include/exec/memory.h | 15 ++++++++
 numa.c                | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 123 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 74c1f16..07553f2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1178,6 +1178,10 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
     memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
                              0, below_4g_mem_size);
     memory_region_add_subregion(system_memory, 0, ram_below_4g);
+    if (memory_region_set_mem_policy(ram_below_4g, 0, below_4g_mem_size, 0)) {
+        fprintf(stderr, "qemu: set below 4g memory policy failed\n");
+        exit(1);
+    }
     e820_add_entry(0, below_4g_mem_size, E820_RAM);
     if (above_4g_mem_size > 0) {
         ram_above_4g = g_malloc(sizeof(*ram_above_4g));
@@ -1185,6 +1189,11 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
                                  below_4g_mem_size, above_4g_mem_size);
         memory_region_add_subregion(system_memory, 0x100000000ULL,
                                     ram_above_4g);
+        if (memory_region_set_mem_policy(ram_above_4g, 0, above_4g_mem_size,
+                                     below_4g_mem_size)) {
+            fprintf(stderr, "qemu: set above 4g memory policy failed\n");
+            exit(1);
+        }
         e820_add_entry(0x100000000ULL, above_4g_mem_size, E820_RAM);
     }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 480dfbf..33de50a 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -905,6 +905,21 @@ void memory_region_transaction_begin(void);
 void memory_region_transaction_commit(void);
 
 /**
+ * memory_region_set_mem_policy: Set memory policy
+ *
+ * Set the memory policy for the specified area.
+ *
+ * @mr: a MemoryRegion we are setting memory policy for
+ * @start: the start offset of the specific region in this MemoryRegion
+ * @length: the specific memory area length
+ * @offset: the start offset of the specific area in NUMA setting
+ */
+int memory_region_set_mem_policy(MemoryRegion *mr,
+                                 ram_addr_t start,
+                                 ram_addr_t length,
+                                 ram_addr_t offset);
+
+/**
  * memory_listener_register: register callbacks to be called when memory
  *                           sections are mapped or unmapped into an address
  *                           space
diff --git a/numa.c b/numa.c
index da4dbbd..43bba42 100644
--- a/numa.c
+++ b/numa.c
@@ -27,6 +27,16 @@
 #include "qapi-visit.h"
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
+#include "exec/memory.h"
+
+#ifdef __linux__
+#include <sys/syscall.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+#endif
+
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
     .implied_opt_name = "type",
@@ -228,6 +238,95 @@ void set_numa_nodes(void)
     }
 }
 
+#ifdef __linux__
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+    int bind_mode;
+
+    switch (numa_info[nodeid].policy) {
+    case NUMA_NODE_POLICY_DEFAULT:
+    case NUMA_NODE_POLICY_PREFERRED:
+    case NUMA_NODE_POLICY_MEMBIND:
+    case NUMA_NODE_POLICY_INTERLEAVE:
+        bind_mode = numa_info[nodeid].policy;
+        break;
+    default:
+        bind_mode = NUMA_NODE_POLICY_DEFAULT;
+        return bind_mode;
+    }
+
+    bind_mode |= numa_info[nodeid].relative ?
+        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+    return bind_mode;
+}
+
+static int node_set_mem_policy(void *ram_ptr, ram_addr_t length, int nodeid)
+{
+    int bind_mode = node_parse_bind_mode(nodeid);
+    unsigned long *nodes = numa_info[nodeid].host_mem;
+
+    /* This is a workaround for a long standing bug in Linux'
+     * mbind implementation, which cuts off the last specified
+     * node. To stay compatible should this bug be fixed, we
+     * specify one more node and zero this one out.
+     */
+    unsigned long maxnode = find_last_bit(nodes, MAX_NODES);
+    if (syscall(SYS_mbind, ram_ptr, length, bind_mode,
+                nodes, maxnode + 2, 0)) {
+        perror("mbind");
+        return -1;
+    }
+
+    return 0;
+}
+#endif
+
+int memory_region_set_mem_policy(MemoryRegion *mr,
+                                 ram_addr_t start, ram_addr_t length,
+                                 ram_addr_t offset)
+{
+#ifdef __linux__
+    ram_addr_t len = 0;
+    int i;
+    for (i = 0; i < nb_numa_nodes; i++) {
+        len += numa_info[i].node_mem;
+        if (offset < len) {
+            break;
+        }
+    }
+    if (i == nb_numa_nodes) {
+        return -1;
+    }
+
+    void *ptr = memory_region_get_ram_ptr(mr);
+    for (; i < nb_numa_nodes; i++) {
+        if (offset + length <= len) {
+            if (node_set_mem_policy(ptr + start, length, i)) {
+                return -1;
+            }
+            break;
+        } else {
+            ram_addr_t tmp_len = len - offset;
+            offset += tmp_len;
+            length -= tmp_len;
+            if (node_set_mem_policy(ptr + start, tmp_len, i)) {
+                return -1;
+            }
+            start += tmp_len;
+        }
+
+        len += numa_info[i].node_mem;
+    }
+
+    if (i == nb_numa_nodes) {
+        return -1;
+    }
+#endif
+
+    return 0;
+}
+
 void set_numa_modes(void)
 {
     CPUState *cpu;
-- 
1.8.5


* [Qemu-devel] [PATCH V17 10/11] NUMA: add qmp command query-numa
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (8 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 09/11] NUMA: set " Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Add qmp command query-numa to show guest NUMA information.

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c           | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 36 +++++++++++++++++++++++++++++++
 qmp-commands.hx  | 49 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)

diff --git a/numa.c b/numa.c
index 43bba42..2954709 100644
--- a/numa.c
+++ b/numa.c
@@ -28,6 +28,7 @@
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
 #include "exec/memory.h"
+#include "qmp-commands.h"
 
 #ifdef __linux__
 #include <sys/syscall.h>
@@ -340,3 +341,68 @@ void set_numa_modes(void)
         }
     }
 }
+
+NUMANodeList *qmp_query_numa(Error **errp)
+{
+    NUMANodeList *head = NULL, *cur_item = NULL;
+    CPUState *cpu;
+    int i;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        NUMANodeList *info;
+        uint16List *cur_cpu_item = NULL;
+        info = g_malloc0(sizeof(*info));
+        info->value = g_malloc0(sizeof(*info->value));
+        info->value->nodeid = i;
+        CPU_FOREACH(cpu) {
+            if (cpu->numa_node == i) {
+                uint16List *node_cpu = g_malloc0(sizeof(*node_cpu));
+                node_cpu->value = cpu->cpu_index;
+
+                if (!cur_cpu_item) {
+                    info->value->cpus = cur_cpu_item = node_cpu;
+                } else {
+                    cur_cpu_item->next = node_cpu;
+                    cur_cpu_item = node_cpu;
+                }
+            }
+        }
+        info->value->memory = numa_info[i].node_mem;
+
+#ifdef __linux__
+        info->value->policy = numa_info[i].policy;
+        info->value->relative = numa_info[i].relative;
+
+        unsigned long first, next;
+        next = first = find_first_bit(numa_info[i].host_mem, MAX_NODES);
+        if (first == MAX_NODES) {
+            goto end;
+        }
+        uint16List *cur_node_item = g_malloc0(sizeof(*cur_node_item));
+        cur_node_item->value = first;
+        info->value->host_nodes = cur_node_item;
+        do {
+            next = find_next_bit(numa_info[i].host_mem, MAX_NODES,
+                                 next + 1);
+            if (next == MAX_NODES) {
+                break;
+            }
+
+            uint16List *host_node = g_malloc0(sizeof(*host_node));
+            host_node->value = next;
+            cur_node_item->next = host_node;
+            cur_node_item = host_node;
+        } while (true);
+end:
+#endif
+
+        if (!cur_item) {
+            head = cur_item = info;
+        } else {
+            cur_item->next = info;
+            cur_item = info;
+        }
+    }
+
+    return head;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index c0dad81..af947e2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4289,3 +4289,39 @@
    '*policy':     'NumaNodePolicy',
    '*relative':   'bool',
    '*host-nodes': ['uint16'] }}
+
+##
+# @NUMANode:
+#
+# Information of guest NUMA node
+#
+# @nodeid: NUMA node ID
+#
+# @cpus: VCPUs contained in this node
+#
+# @memory: memory size of this node
+#
+# @policy: memory policy of this node
+#
+# @relative: whether host nodes are relative for the memory policy
+#
+# @host-nodes: host nodes for its memory policy
+#
+# Since: 1.7
+#
+##
+{ 'type': 'NUMANode',
+  'data': {'nodeid': 'uint16', 'cpus': ['uint16'], 'memory': 'uint64',
+           'policy': 'NumaNodePolicy', 'relative': 'bool',
+           'host-nodes': ['uint16'] }}
+
+##
+# @query-numa:
+#
+# Returns a list of information about each guest node.
+#
+# Returns: a list of @NUMANode for each guest node
+#
+# Since: 1.7
+##
+{ 'command': 'query-numa', 'returns': ['NUMANode'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index fba15cd..c2bc508 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3295,3 +3295,52 @@ Example (2):
 <- { "return": {} }
 
 EQMP
+
+    {
+        .name = "query-numa",
+        .args_type = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_numa,
+    },
+
+SQMP
+query-numa
+----------
+
+Show NUMA information.
+
+Return a json-array. Each NUMA node is represented by a json-object,
+which contains:
+
+- "nodeid": NUMA node ID (json-int)
+- "cpus": a json-array of VCPUs contained in the node
+- "memory": memory size of the node in bytes (json-int)
+- "policy": memory policy of the node (json-string)
+- "relative": whether host nodes are relative for the memory policy (json-bool)
+- "host-nodes": a json-array of host nodes for the memory policy
+
+Arguments: None.
+
+Example:
+
+-> { "execute": "query-numa" }
+<- { "return":[
+        {
+            "nodeid": 0,
+            "cpus": [0, 1],
+            "memory": 536870912,
+            "policy": "membind",
+            "relative": false,
+            "host-nodes": [0, 1]
+        },
+        {
+            "nodeid": 1,
+            "cpus": [2, 3],
+            "memory": 536870912,
+            "policy": "interleave",
+            "relative": false,
+            "host-nodes": [1]
+        }
+     ]
+   }
+
+EQMP
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH V17 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (9 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 10/11] NUMA: add qmp command query-numa Wanlong Gao
@ 2013-12-04  7:58 ` Wanlong Gao
  2013-12-06  9:06 ` [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
  2013-12-06  9:06 ` Paolo Bonzini
  12 siblings, 0 replies; 27+ messages in thread
From: Wanlong Gao @ 2013-12-04  7:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, peter.huangpeng,
	lcapitulino, bsd, anthony, y-goto, pbonzini, afaerber, gaowanlong

Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hmp.c     | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp.h     |  1 +
 monitor.c | 21 +--------------------
 3 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/hmp.c b/hmp.c
index 32ee285..d6dedd2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -24,6 +24,10 @@
 #include "ui/console.h"
 #include "block/qapi.h"
 #include "qemu-io.h"
+#include "qapi-visit.h"
+#include "qapi/opts-visitor.h"
+#include "qapi/dealloc-visitor.h"
+#include "sysemu/sysemu.h"
 
 static void hmp_handle_error(Monitor *mon, Error **errp)
 {
@@ -1564,3 +1568,56 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 
     hmp_handle_error(mon, &err);
 }
+
+void hmp_info_numa(Monitor *mon, const QDict *qdict)
+{
+    NUMANodeList *node_list, *node;
+    uint16List *head;
+    int nodeid;
+    char *policy_str = NULL;
+
+    node_list = qmp_query_numa(NULL);
+
+    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
+    for (node = node_list; node; node = node->next) {
+        nodeid = node->value->nodeid;
+        monitor_printf(mon, "node %d cpus:", nodeid);
+        head = node->value->cpus;
+        for (head = node->value->cpus; head != NULL; head = head->next) {
+            monitor_printf(mon, " %d", (int)head->value);
+        }
+        monitor_printf(mon, "\n");
+        monitor_printf(mon, "node %d size: %" PRId64 " MB\n",
+                       nodeid, node->value->memory >> 20);
+        switch (node->value->policy) {
+        case NUMA_NODE_POLICY_DEFAULT:
+            policy_str = g_strdup("default");
+            break;
+        case NUMA_NODE_POLICY_PREFERRED:
+            policy_str = g_strdup("preferred");
+            break;
+        case NUMA_NODE_POLICY_MEMBIND:
+            policy_str = g_strdup("membind");
+            break;
+        case NUMA_NODE_POLICY_INTERLEAVE:
+            policy_str = g_strdup("interleave");
+            break;
+        default:
+            break;
+        }
+        monitor_printf(mon, "node %d policy: %s\n",
+                       nodeid, policy_str ? : " ");
+        if (policy_str) {
+            g_free(policy_str);
+        }
+        monitor_printf(mon, "node %d relative: %s\n", nodeid,
+                       node->value->relative ? "true" : "false");
+        monitor_printf(mon, "node %d host-nodes:", nodeid);
+        for (head = node->value->host_nodes; head != NULL; head = head->next) {
+            monitor_printf(mon, " %d", (int)head->value);
+        }
+        monitor_printf(mon, "\n");
+    }
+
+    qapi_free_NUMANodeList(node_list);
+}
diff --git a/hmp.h b/hmp.h
index 54cf71f..4f8d39b 100644
--- a/hmp.h
+++ b/hmp.h
@@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict);
 void hmp_info_pci(Monitor *mon, const QDict *qdict);
 void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
 void hmp_info_tpm(Monitor *mon, const QDict *qdict);
+void hmp_info_numa(Monitor *mon, const QDict *qdict);
 void hmp_quit(Monitor *mon, const QDict *qdict);
 void hmp_stop(Monitor *mon, const QDict *qdict);
 void hmp_system_reset(Monitor *mon, const QDict *qdict);
diff --git a/monitor.c b/monitor.c
index b97b7d3..f747a48 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1989,25 +1989,6 @@ static void do_info_mtree(Monitor *mon, const QDict *qdict)
     mtree_info((fprintf_function)monitor_printf, mon);
 }
 
-static void do_info_numa(Monitor *mon, const QDict *qdict)
-{
-    int i;
-    CPUState *cpu;
-
-    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
-    for (i = 0; i < nb_numa_nodes; i++) {
-        monitor_printf(mon, "node %d cpus:", i);
-        CPU_FOREACH(cpu) {
-            if (cpu->numa_node == i) {
-                monitor_printf(mon, " %d", cpu->cpu_index);
-            }
-        }
-        monitor_printf(mon, "\n");
-        monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            numa_info[i].node_mem >> 20);
-    }
-}
-
 #ifdef CONFIG_PROFILER
 
 int64_t qemu_time;
@@ -2775,7 +2756,7 @@ static mon_cmd_t info_cmds[] = {
         .args_type  = "",
         .params     = "",
         .help       = "show NUMA information",
-        .mhandler.cmd = do_info_numa,
+        .mhandler.cmd = hmp_info_numa,
     },
     {
         .name       = "usb",
-- 
1.8.5


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (10 preceding siblings ...)
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
@ 2013-12-06  9:06 ` Paolo Bonzini
  2013-12-06  9:31   ` Wanlong Gao
  2013-12-06  9:06 ` Paolo Bonzini
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-12-06  9:06 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: drjones, ehabkost, lersek, mtosatti, qemu-devel, lcapitulino, bsd,
	anthony, hutao, y-goto, peter.huangpeng, afaerber

Il 04/12/2013 08:58, Wanlong Gao ha scritto:
> As you know, QEMU can't direct its memory allocation now; this may cause
> cross-node access performance regressions in the guest.
> Worse, if PCI passthrough is used, the direct-attached device uses
> DMA transfers between the device and the qemu process, so all pages
> of the guest will be pinned by get_user_pages().
> 
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     =>kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>            => kvm_pin_pages()
> 
> So, with a direct-attached device, every guest page's refcount will be
> bumped and page migration will not work; AutoNUMA won't work either.
> 
> So, we should set the guest nodes memory allocation policy before
> the pages are really mapped.
> 
> According to this patch set, we are able to set guest nodes memory policy
> like following:
> 
>  -numa node,nodeid=0,cpus=0, \
>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>  -numa node,nodeid=1,cpus=1 \
>  -numa mem,size=1024M,policy=interleave,host-nodes=1
> 
> This supports "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N" like format.
> 
> And add a QMP command "query-numa" to show numa info through
> this API.
> 
> And convert the "info numa" monitor command to use this
> QMP command "query-numa".
> 
> This version removes "set-mem-policy" qmp and hmp commands temporarily
> as Marcelo and Paolo suggested.
> 
> 
> The simple test is like following:
> =====================================================
> Before:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096  -smp 2 -numa node,nodeid=0,cpus=0,mem=2048 -numa node,nodeid=1,cpus=1,mem=2048 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
> [1] 13320
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4653 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4764 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4317 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 876 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> 
> 
> 
> After:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 4 -numa node,nodeid=0,cpus=0,cpus=2 -numa mem,size=2048M,policy=membind,host-nodes=0 -numa node,nodeid=0,cpus=1,cpus=3 -numa mem,size=2048M,policy=membind,host-nodes=1 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
> [1] 10862
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4718 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4799 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 2544 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 2725 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> ===================================================
> 
> 
> V1->V2:
>     change to use QemuOpts in numa options (Paolo)
>     handle Error in mpol parser (Paolo)
>     change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
> V2->V3:
>     also handle Error in cpus parser (5/10)
>     split out common parser from cpus and hostnode parser (Bandan 6/10)
> V3-V4:
>     rebase to request for comments
> V4->V5:
>     use OptVisitor and split -numa option (Paolo)
>      - s/set-mpol/set-mem-policy (Andreas)
>      - s/mem-policy/policy
>      - s/mem-hostnode/host-nodes
>     fix hmp command process after error (Luiz)
>     add qmp command query-numa and convert info numa to it (Luiz)
> V5->V6:
>     remove tabs in json file (Laszlo, Paolo)
>     add back "-numa node,mem=xxx" as legacy (Paolo)
>     change cpus and host-nodes to array (Laszlo, Eric)
>     change "nodeid" to "uint16"
>     add NumaMemPolicy enum type (Eric)
>     rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for repeating options" patch set, thanks for Laszlo's help
> V6-V7:
>     change UInt16 to uint16 (Laszlo)
>     fix a typo in adding qmp command set-mem-policy
> V7-V8:
>     rebase to current master with Laszlo's V2 of OptsVisitor patch set
>     fix an adding white space line error
> V8->V9:
>     rebase to current master
>     check if total numa memory size is equal to ram_size (Paolo)
>     add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
>     replace the use of numa_num_configured_nodes() (Andrew)
>     avoid abusing the fact i==nodeid (Andrew)
> V9->V10:
>     rebase to current master
>     remove libnuma (Andrew)
>     MAX_NODES=64 -> MAX_NODES=128 since libnuma selected 128 (Andrew)
>     use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew)
>     remove a useless clear_bit() operation (Andrew)
> V10->V11:
>     rebase to current master
>     fix "maxnode" argument of mbind(2)
> V11->V12:
>     rebase to current master
>     split patch 02/11 of V11 (Eduardo)
>     add some max value check (Eduardo)
>     split MAX_NODES change patch (Eduardo)
> V12->V13:
>     rebase to current master
>     thanks for Luiz's review (Luiz)
>     doc hmp command set-mem-policy (Luiz)
>     rename: NUMAInfo -> NUMANode (Luiz)
> V13->V14:
>     remove "set-mem-policy" qmp and hmp commands (Marcelo, Paolo)
> V14->V15:
>     rebase to the current master
> V15->V16:
>     rebase to current master
>     add more test log
> V16->V17:
>     use MemoryRegion to set policy instead of using "pc.ram" (Paolo)
> 
> Wanlong Gao (11):
>   NUMA: move numa related code to new file numa.c
>   NUMA: check if the total numa memory size is equal to ram_size
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: convert -numa option to use OptsVisitor
>   NUMA: introduce NumaMemOptions
>   NUMA: add "-numa mem," options
>   NUMA: expand MAX_NODES from 64 to 128
>   NUMA: parse guest numa nodes memory policy
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command query-numa
>   NUMA: convert hmp command info_numa to use qmp command query_numa
> 
>  Makefile.target         |   2 +-
>  cpus.c                  |  14 --
>  hmp.c                   |  57 +++++++
>  hmp.h                   |   1 +
>  hw/i386/pc.c            |  21 ++-
>  include/exec/memory.h   |  15 ++
>  include/sysemu/cpus.h   |   1 -
>  include/sysemu/sysemu.h |  18 ++-
>  monitor.c               |  21 +--
>  numa.c                  | 408 ++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json        | 112 +++++++++++++
>  qemu-options.hx         |   6 +-
>  qmp-commands.hx         |  49 ++++++
>  vl.c                    | 160 +++----------------
>  14 files changed, 698 insertions(+), 187 deletions(-)
>  create mode 100644 numa.c
> 

I think patches 1-4 and 7 are fine.  For the rest, I'd rather wait for
Igor's patches and try to integrate with Igor's memory hotplug patches.

Paolo


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (11 preceding siblings ...)
  2013-12-06  9:06 ` [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
@ 2013-12-06  9:06 ` Paolo Bonzini
  2013-12-06 18:49   ` Marcelo Tosatti
  12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-12-06  9:06 UTC (permalink / raw)
  To: Wanlong Gao, mtosatti
  Cc: drjones, ehabkost, lersek, qemu-devel, lcapitulino, bsd, anthony,
	hutao, y-goto, peter.huangpeng, afaerber

Il 04/12/2013 08:58, Wanlong Gao ha scritto:
> According to this patch set, we are able to set guest nodes memory policy
> like following:
> 
>  -numa node,nodeid=0,cpus=0, \
>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>  -numa node,nodeid=1,cpus=1 \
>  -numa mem,size=1024M,policy=interleave,host-nodes=1
> 
> This supports "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N" like format.
> 
> And add a QMP command "query-numa" to show numa info through
> this API.

Marcelo, I'm afraid that these NUMA settings complicate getting properly
aligned pages.  If you do something like this:

  -numa node,nodeid=0,cpus=0, \
  -numa mem,size=4096M,policy=membind,host-nodes=0 \
  -numa node,nodeid=1,cpus=1 \
  -numa mem,size=4096M,policy=membind,host-nodes=1

You'll have with your patches (without them it's worse of course):

   RAM offset    physical address   node 0
   0-3840M       0-3840M            host node 0
   4096M-4352M   4096M-4352M        host node 0
   4352M-8192M   4352M-8192M        host node 1
   3840M-4096M   8192M-8448M        host node 1

So only 0-3G and 5-8G are aligned, 3G-5G and 8G-8.25G cannot use
gigabyte pages because they are split across host nodes.

So rather than your patches, it seems simpler to just widen the PCI hole
to 1G for i440FX and 2G for q35.

What do you think?

Paolo


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-06  9:06 ` [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
@ 2013-12-06  9:31   ` Wanlong Gao
  2013-12-06  9:48     ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Wanlong Gao @ 2013-12-06  9:31 UTC (permalink / raw)
  To: Paolo Bonzini, Igor Mammedov
  Cc: drjones, ehabkost, lersek, mtosatti, qemu-devel, lcapitulino, bsd,
	anthony, hutao, y-goto, peter.huangpeng, afaerber, Wanlong Gao

On 12/06/2013 05:06 PM, Paolo Bonzini wrote:
> I think patches 1-4 and 7 are fine.  For the rest, I'd rather wait for
> Igor's patches and try to integrate with Igor's memory hotplug patches.

So, how about applying them first? Then I can help Igor rebase my
remaining patches for him.

Thanks,
Wanlong Gao

> 
> Paolo
> 


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-06  9:31   ` Wanlong Gao
@ 2013-12-06  9:48     ` Paolo Bonzini
  2013-12-09 18:16       ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-12-06  9:48 UTC (permalink / raw)
  To: gaowanlong
  Cc: drjones, ehabkost, lersek, hutao, mtosatti, qemu-devel,
	lcapitulino, bsd, anthony, y-goto, Igor Mammedov, peter.huangpeng,
	afaerber

Il 06/12/2013 10:31, Wanlong Gao ha scritto:
>> > I think patches 1-4 and 7 are fine.  For the rest, I'd rather wait for
>> > Igor's patches and try to integrate with Igor's memory hotplug patches.
> So, how about apply them first and then I can help Igor to rebase my
> remaining patches for him?

Yes.  We just need to find someone who sends a pull request.  Eduardo,
Andreas, Luiz, any takers?  Otherwise I can do that too.

Paolo


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-06  9:06 ` Paolo Bonzini
@ 2013-12-06 18:49   ` Marcelo Tosatti
  2013-12-09 17:33     ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2013-12-06 18:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: drjones, ehabkost, lersek, qemu-devel, lcapitulino, bsd, anthony,
	hutao, y-goto, peter.huangpeng, afaerber, Wanlong Gao

On Fri, Dec 06, 2013 at 10:06:37AM +0100, Paolo Bonzini wrote:
> Il 04/12/2013 08:58, Wanlong Gao ha scritto:
> > According to this patch set, we are able to set guest nodes memory policy
> > like following:
> > 
> >  -numa node,nodeid=0,cpus=0, \
> >  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
> >  -numa node,nodeid=1,cpus=1 \
> >  -numa mem,size=1024M,policy=interleave,host-nodes=1
> > 
> > This supports "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N" like format.
> > 
> > And add a QMP command "query-numa" to show numa info through
> > this API.
> 
> Marcelo, I'm afraid that these NUMA settings complicate getting properly
> aligned pages.  If you do something like this:
> 
>   -numa node,nodeid=0,cpus=0, \
>   -numa mem,size=4096M,policy=membind,host-nodes=0 \
>   -numa node,nodeid=1,cpus=1 \
>   -numa mem,size=4096M,policy=membind,host-nodes=1
> 
> You'll have with your patches (without them it's worse of course):
> 
>    RAM offset    physical address   node 0
>    0-3840M       0-3840M            host node 0
>    4096M-4352M   4096M-4352M        host node 0
>    4352M-8192M   4352M-8192M        host node 1
>    3840M-4096M   8192M-8448M        host node 1
> 
> So only 0-3G and 5-8G are aligned, 3G-5G and 8G-8.25G cannot use
> gigabyte pages because they are split across host nodes.

AFAIK the TLB caches virt->phys translations; why would the specifics
of a given phys address factor into TLB caching?

> So rather than your patches, it seems simpler to just widen the PCI hole
> to 1G for i440FX and 2G for q35.
> 
> What do you think?

Problem is it's a guest-visible change. To get 1GB TLB entries with
"legacy guest-visible machine types" (which require new machine types
on the host side, but are invisible to the guest), that won't work:
Windows registration invalidation etc.

But for q35, sure.


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-06 18:49   ` Marcelo Tosatti
@ 2013-12-09 17:33     ` Paolo Bonzini
  2013-12-09 18:10       ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-12-09 17:33 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: drjones, ehabkost, lersek, qemu-devel, lcapitulino, bsd, anthony,
	hutao, y-goto, peter.huangpeng, afaerber, Wanlong Gao

Il 06/12/2013 19:49, Marcelo Tosatti ha scritto:
>> > You'll have with your patches (without them it's worse of course):
>> > 
>> >    RAM offset    physical address   node 0
>> >    0-3840M       0-3840M            host node 0
>> >    4096M-4352M   4096M-4352M        host node 0
>> >    4352M-8192M   4352M-8192M        host node 1
>> >    3840M-4096M   8192M-8448M        host node 1
>> > 
>> > So only 0-3G and 5-8G are aligned, 3G-5G and 8G-8.25G cannot use
>> > gigabyte pages because they are split across host nodes.
> AFAIK the TLB caches virt->phys translations, why specifics of 
> a given phys address is a factor into TLB caching?

The problem is that "-numa mem" receives memory sizes and these do not
take into account the hole below 4G.

Thus, two adjacent host-physical addresses (two adjacent ram_addr_t-s)
map to very far guest-physical addresses, are assigned to different
guest nodes, and from there to different host nodes.  In the above
example this happens for 3G-5G.

On second thought, this is not particularly important, or at least not
yet.  It's not really possible to control the NUMA policy for
hugetlbfs-allocated memory, right?

>> > So rather than your patches, it seems simpler to just widen the PCI hole
>> > to 1G for i440FX and 2G for q35.
>> > 
>> > What do you think?
> 
> Problem is its a guest visible change. To get 1GB TLB entries with
> "legacy guest visible machine types" (which require new machine types
> at the host side, but invisible to guest), that won't work.
> Windows registration invalidation etc.

Yeah, that's a tradeoff to make.

Paolo

> But for q35, sure.
> 


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-09 17:33     ` Paolo Bonzini
@ 2013-12-09 18:10       ` Marcelo Tosatti
  2013-12-09 18:26         ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2013-12-09 18:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: drjones, ehabkost, lersek, qemu-devel, lcapitulino, bsd, anthony,
	hutao, y-goto, peter.huangpeng, afaerber, Wanlong Gao

On Mon, Dec 09, 2013 at 06:33:41PM +0100, Paolo Bonzini wrote:
> Il 06/12/2013 19:49, Marcelo Tosatti ha scritto:
> >> > You'll have with your patches (without them it's worse of course):
> >> > 
> >> >    RAM offset    physical address   node 0
> >> >    0-3840M       0-3840M            host node 0
> >> >    4096M-4352M   4096M-4352M        host node 0
> >> >    4352M-8192M   4352M-8192M        host node 1
> >> >    3840M-4096M   8192M-8448M        host node 1
> >> > 
> >> > So only 0-3G and 5-8G are aligned, 3G-5G and 8G-8.25G cannot use
> >> > gigabyte pages because they are split across host nodes.
> > AFAIK the TLB caches virt->phys translations, why specifics of 
> > a given phys address is a factor into TLB caching?
> 
> The problem is that "-numa mem" receives memory sizes and these do not
> take into account the hole below 4G.
> 
> Thus, two adjacent host-physical addresses (two adjacent ram_addr_t-s)
> map to very far guest-physical addresses, are assigned to different
> guest nodes, and from there to different host nodes.  In the above
> example this happens for 3G-5G.

The physical address, which is what the TLB uses, does not take node
information into account.

> On second thought, this is not particularly important, or at least not
> yet.  It's not really possible to control the NUMA policy for
> hugetlbfs-allocated memory, right?

It is possible. I don't know what happens if conflicting NUMA policies
are specified for different virtual address ranges that map to a single
huge page.

In whatever way that is resolved by the kernel, it is not relevant,
since the TLB caches virt->phys translations, not
virt->{phys, node info} translations.

> >> > So rather than your patches, it seems simpler to just widen the PCI hole
> >> > to 1G for i440FX and 2G for q35.
> >> > 
> >> > What do you think?
> > 
> > Problem is its a guest visible change. To get 1GB TLB entries with
> > "legacy guest visible machine types" (which require new machine types
> > at the host side, but invisible to guest), that won't work.
> > Windows registration invalidation etc.
> 
> Yeah, that's a tradeoff to make.

Perhaps increasing the PCI hole size should be done for other reasons?
Note that dropping the 1GB alignment piix.c patch requires the hole size
+ start to be 1G aligned.


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-06  9:48     ` Paolo Bonzini
@ 2013-12-09 18:16       ` Eduardo Habkost
  2013-12-09 18:26         ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2013-12-09 18:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: drjones, hutao, mtosatti, qemu-devel, lcapitulino, bsd, anthony,
	Igor Mammedov, y-goto, peter.huangpeng, lersek, afaerber,
	gaowanlong

On Fri, Dec 06, 2013 at 10:48:04AM +0100, Paolo Bonzini wrote:
> Il 06/12/2013 10:31, Wanlong Gao ha scritto:
> >> > I think patches 1-4 and 7 are fine.  For the rest, I'd rather wait for
> >> > Igor's patches and try to integrate with Igor's memory hotplug patches.
> > So, how about apply them first and then I can help Igor to rebase my
> > remaining patches for him?
> 
> Yes.  We just need to find someone who sends a pull request.  Eduardo,
> Andreas, Luiz, any takers?  Otherwise I can do that too.

As no other maintainer replied, would you take them, Paolo?

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-09 18:10       ` Marcelo Tosatti
@ 2013-12-09 18:26         ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2013-12-09 18:26 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: drjones, ehabkost, lersek, qemu-devel, lcapitulino, bsd, anthony,
	hutao, y-goto, peter.huangpeng, afaerber, Wanlong Gao

Il 09/12/2013 19:10, Marcelo Tosatti ha scritto:
> On Mon, Dec 09, 2013 at 06:33:41PM +0100, Paolo Bonzini wrote:
>> Il 06/12/2013 19:49, Marcelo Tosatti ha scritto:
>>>>> You'll have with your patches (without them it's worse of course):
>>>>>
>>>>>    RAM offset    physical address   node 0
>>>>>    0-3840M       0-3840M            host node 0
>>>>>    4096M-4352M   4096M-4352M        host node 0
>>>>>    4352M-8192M   4352M-8192M        host node 1
>>>>>    3840M-4096M   8192M-8448M        host node 1
>>>>>
>>>>> So only 0-3G and 5-8G are aligned, 3G-5G and 8G-8.25G cannot use
>>>>> gigabyte pages because they are split across host nodes.
>>> AFAIK the TLB caches virt->phys translations, why specifics of 
>>> a given phys address is a factor into TLB caching?
>>
>> The problem is that "-numa mem" receives memory sizes and these do not
>> take into account the hole below 4G.
>>
>> Thus, two adjacent host-physical addresses (two adjacent ram_addr_t-s)
>> map to very far guest-physical addresses, are assigned to different
>> guest nodes, and from there to different host nodes.  In the above
>> example this happens for 3G-5G.
> 
> Physical address which is what the TLB uses does not take node
> information into account.

Indeed.  What I should have written is "two adjacent host-virtual
addresses".

>> On second thought, this is not particularly important, or at least not
>> yet.  It's not really possible to control the NUMA policy for
>> hugetlbfs-allocated memory, right?
> 
> It is possible. I don't know what happens if conflicting NUMA policies
> are specified for different virtual address ranges that map to a single
> huge page.

So what will happen is that 3G-5G will use GB pages but it will not be
on the requested node.

Paolo


* Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-12-09 18:16       ` Eduardo Habkost
@ 2013-12-09 18:26         ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2013-12-09 18:26 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: drjones, hutao, mtosatti, qemu-devel, lcapitulino, bsd, anthony,
	Igor Mammedov, y-goto, peter.huangpeng, lersek, afaerber,
	gaowanlong

Il 09/12/2013 19:16, Eduardo Habkost ha scritto:
>> > Il 06/12/2013 10:31, Wanlong Gao ha scritto:
>>>>> > >> > I think patches 1-4 and 7 are fine.  For the rest, I'd rather wait for
>>>>> > >> > Igor's patches and try to integrate with Igor's memory hotplug patches.
>>> > > So, how about apply them first and then I can help Igor to rebase my
>>> > > remaining patches for him?
>> > 
>> > Yes.  We just need to find someone who sends a pull request.  Eduardo,
>> > Andreas, Luiz, any takers?  Otherwise I can do that too.
> As no other maintainer replied, would you take them, Paolo?

Sure, but I'd like a second review.

Paolo


* Re: [Qemu-devel] [PATCH V17 01/11] NUMA: move numa related code to new file numa.c
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 01/11] NUMA: move numa related code to new file numa.c Wanlong Gao
@ 2013-12-10 13:06   ` Eduardo Habkost
  0 siblings, 0 replies; 27+ messages in thread
From: Eduardo Habkost @ 2013-12-10 13:06 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: drjones, hutao, mtosatti, qemu-devel, peter.huangpeng, bsd,
	anthony, pbonzini, y-goto, lcapitulino, lersek, afaerber

On Wed, Dec 04, 2013 at 03:58:49PM +0800, Wanlong Gao wrote:
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size
  2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size Wanlong Gao
@ 2013-12-10 13:15   ` Eduardo Habkost
  2013-12-10 18:03     ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2013-12-10 13:15 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: drjones, jyang, libvir-list, hutao, mtosatti, qemu-devel,
	peter.huangpeng, bsd, anthony, pbonzini, mkletzan, y-goto,
	lcapitulino, lersek, afaerber

CCing libvir-list.

On Wed, Dec 04, 2013 at 03:58:50PM +0800, Wanlong Gao wrote:
> If the total size of the assigned NUMA nodes' memory is not
> equal to the assigned RAM size, QEMU will write wrong data to
> the ACPI table, and the guest will then ignore the broken table
> and place all memory in one node. We should check the sizes to
> ensure that we write correct data to the ACPI table.
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>

This will make configurations that could be running for years (except
that the guest OS was ignoring the NUMA data) suddenly stop running. I
just want to confirm: we really want that, right?

Does libvirt allow this kind of broken configuration to be generated, or
does it already ensure that the total NUMA node sizes match the RAM size?


> ---
>  numa.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/numa.c b/numa.c
> index ce7736a..beda80e 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -150,6 +150,16 @@ void set_numa_nodes(void)
>              node_mem[i] = ram_size - usedmem;
>          }
>  
> +        uint64_t numa_total = 0;
> +        for (i = 0; i < nb_numa_nodes; i++) {
> +            numa_total += node_mem[i];
> +        }
> +        if (numa_total != ram_size) {
> +            fprintf(stderr, "qemu: numa nodes total memory size "
> +                            "should equal to ram_size\n");
> +            exit(1);
> +        }
> +
>          for (i = 0; i < nb_numa_nodes; i++) {
>              if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
>                  break;
> -- 
> 1.8.5
> 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size
  2013-12-10 13:15   ` Eduardo Habkost
@ 2013-12-10 18:03     ` Paolo Bonzini
  2013-12-10 19:01       ` Eduardo Habkost
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2013-12-10 18:03 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: drjones, jyang, libvir-list, hutao, mtosatti, qemu-devel,
	peter.huangpeng, bsd, anthony, mkletzan, y-goto, lcapitulino,
	lersek, afaerber, Wanlong Gao

On 10/12/2013 14:15, Eduardo Habkost wrote:
>> > If the total size of the assigned NUMA nodes' memory is not
>> > equal to the assigned RAM size, QEMU will write wrong data to
>> > the ACPI table, and the guest will then ignore the broken table
>> > and place all memory in one node. We should check the sizes to
>> > ensure that we write correct data to the ACPI table.
>> > 
>> > Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> This will make configurations that could be running for years (except
> that the guest OS was ignoring the NUMA data) suddenly stop running. I
> just want to confirm: we really want that, right?
> 
> Does libvirt allow this kind of broken configuration to be generated, or
> does it already ensure that the total NUMA node sizes match the RAM size?

It allows this.  It just converts the <numa> XML to "-numa node".

Paolo


* Re: [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size
  2013-12-10 18:03     ` Paolo Bonzini
@ 2013-12-10 19:01       ` Eduardo Habkost
  2013-12-11 12:26         ` Daniel P. Berrange
  0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Habkost @ 2013-12-10 19:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: drjones, jyang, libvir-list, hutao, mtosatti, qemu-devel,
	peter.huangpeng, bsd, anthony, mkletzan, y-goto, lcapitulino,
	lersek, afaerber, Wanlong Gao

On Tue, Dec 10, 2013 at 07:03:50PM +0100, Paolo Bonzini wrote:
> On 10/12/2013 14:15, Eduardo Habkost wrote:
> >> > If the total size of the assigned NUMA nodes' memory is not
> >> > equal to the assigned RAM size, QEMU will write wrong data to
> >> > the ACPI table, and the guest will then ignore the broken table
> >> > and place all memory in one node. We should check the sizes to
> >> > ensure that we write correct data to the ACPI table.
> >> > 
> >> > Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > This will make configurations that could be running for years (except
> > that the guest OS was ignoring the NUMA data) suddenly stop running. I
> > just want to confirm: we really want that, right?
> > 
> > Does libvirt allow this kind of broken configuration to be generated, or
> > does it already ensure that the total NUMA node sizes match the RAM size?
> 
> It allows this.  It just converts the <numa> XML to "-numa node".

In that case, if we apply this patch we may want to make libvirt
validate the NUMA configuration instead of getting a cryptic "QEMU
aborted" error message with the actual problem buried in a log file.

(Well, even if we do not apply this patch, I believe it is a good idea to
make libvirt validate the NUMA configuration.)

-- 
Eduardo


* Re: [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size
  2013-12-10 19:01       ` Eduardo Habkost
@ 2013-12-11 12:26         ` Daniel P. Berrange
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel P. Berrange @ 2013-12-11 12:26 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: drjones, anthony, libvir-list, hutao, mtosatti, qemu-devel,
	peter.huangpeng, bsd, jyang, y-goto, mkletzan, Paolo Bonzini,
	lcapitulino, lersek, afaerber, Wanlong Gao

On Tue, Dec 10, 2013 at 05:01:02PM -0200, Eduardo Habkost wrote:
> On Tue, Dec 10, 2013 at 07:03:50PM +0100, Paolo Bonzini wrote:
> > On 10/12/2013 14:15, Eduardo Habkost wrote:
> > >> > If the total size of the assigned NUMA nodes' memory is not
> > >> > equal to the assigned RAM size, QEMU will write wrong data to
> > >> > the ACPI table, and the guest will then ignore the broken table
> > >> > and place all memory in one node. We should check the sizes to
> > >> > ensure that we write correct data to the ACPI table.
> > >> > 
> > >> > Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > > This will make configurations that could be running for years (except
> > > that the guest OS was ignoring the NUMA data) suddenly stop running. I
> > > just want to confirm: we really want that, right?
> > > 
> > > Does libvirt allow this kind of broken configuration to be generated, or
> > > does it already ensure that the total NUMA node sizes match the RAM size?
> > 
> > It allows this.  It just converts the <numa> XML to "-numa node".
> 
> In that case, if we apply this patch we may want to make libvirt
> validate the NUMA configuration instead of getting a cryptic "QEMU
> aborted" error message with the actual problem buried in a log file.
> 
> (Well, even if we do not apply this patch, I believe it is a good idea to
> make libvirt validate the NUMA configuration.)

Yes, libvirt really ought to validate this, since such inconsistency is
a bogus configuration. It would be desirable for libvirt to reject it
completely as an error, but we should first check whether any common apps
are (accidentally) relying on such broken configs already.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


end of thread, other threads:[~2013-12-11 12:26 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-04  7:58 [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 01/11] NUMA: move numa related code to new file numa.c Wanlong Gao
2013-12-10 13:06   ` Eduardo Habkost
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 02/11] NUMA: check if the total numa memory size is equal to ram_size Wanlong Gao
2013-12-10 13:15   ` Eduardo Habkost
2013-12-10 18:03     ` Paolo Bonzini
2013-12-10 19:01       ` Eduardo Habkost
2013-12-11 12:26         ` Daniel P. Berrange
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 03/11] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 04/11] NUMA: convert -numa option to use OptsVisitor Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 05/11] NUMA: introduce NumaMemOptions Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 06/11] NUMA: add "-numa mem," options Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 07/11] NUMA: expand MAX_NODES from 64 to 128 Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 08/11] NUMA: parse guest numa nodes memory policy Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 09/11] NUMA: set " Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 10/11] NUMA: add qmp command query-numa Wanlong Gao
2013-12-04  7:58 ` [Qemu-devel] [PATCH V17 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
2013-12-06  9:06 ` [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
2013-12-06  9:31   ` Wanlong Gao
2013-12-06  9:48     ` Paolo Bonzini
2013-12-09 18:16       ` Eduardo Habkost
2013-12-09 18:26         ` Paolo Bonzini
2013-12-06  9:06 ` Paolo Bonzini
2013-12-06 18:49   ` Marcelo Tosatti
2013-12-09 17:33     ` Paolo Bonzini
2013-12-09 18:10       ` Marcelo Tosatti
2013-12-09 18:26         ` Paolo Bonzini
