qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
@ 2013-08-20  1:07 Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
                   ` (12 more replies)
  0 siblings, 13 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

As you know, QEMU can't direct it's memory allocation now, this may cause
guest cross node access performance regression.
And, the worse thing is that if PCI-passthrough is used,
direct-attached-device uses DMA transfer between device and qemu process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with direct-attached-device, all guest page's page count will be +1 and
any page migration will not work. AutoNUMA won't too.

So, we should set the guest nodes memory allocation policy before
the pages are really mapped.

According to this patch set, we are able to set guest nodes memory policy
like following:

 -numa node,nodeid=0,cpus=0, \
 -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
 -numa node,nodeid=1,cpus=1 \
 -numa mem,size=1024M,policy=interleave,host-nodes=1

This supports "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N" like format.


Also add "set-mem-policy" QMP and hmp command to set memory policy.

And patch 10/11 adds a QMP command "query-numa" to show numa info through
this API.

And patch 11/11 converts the "info numa" monitor command to use this
QMP command "query-numa".


V1->V2:
    change to use QemuOpts in numa options (Paolo)
    handle Error in mpol parser (Paolo)
    change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
V2->V3:
    also handle Error in cpus parser (5/10)
    split out common parser from cpus and hostnode parser (Bandan 6/10)
V3-V4:
    rebase to request for comments
V4->V5:
    use OptVisitor and split -numa option (Paolo)
     - s/set-mpol/set-mem-policy (Andreas)
     - s/mem-policy/policy
     - s/mem-hostnode/host-nodes
    fix hmp command process after error (Luiz)
    add qmp command query-numa and convert info numa to it (Luiz)
V5->V6:
    remove tabs in json file (Laszlo, Paolo)
    add back "-numa node,mem=xxx" as legacy (Paolo)
    change cpus and host-nodes to array (Laszlo, Eric)
    change "nodeid" to "uint16"
    add NumaMemPolicy enum type (Eric)
    rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for repeating options" patch set, thanks for Laszlo's help
V6-V7:
    change UInt16 to uint16 (Laszlo)
    fix a typo in adding qmp command set-mem-policy
V7-V8:
    rebase to current master with Laszlo's V2 of OptsVisitor patch set
    fix an adding white space line error


Wanlong Gao (11):
  NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  NUMA: split -numa option
  NUMA: move numa related code to numa.c
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: Add Linux libnuma detection
  NUMA: parse guest numa nodes memory policy
  NUMA: set guest numa nodes memory policy
  NUMA: add qmp command set-mem-policy to set memory policy for NUMA
    node
  NUMA: add hmp command set-mem-policy
  NUMA: add qmp command query-numa
  NUMA: convert hmp command info_numa to use qmp command query_numa

 Makefile.target         |   2 +-
 configure               |  32 ++++
 cpus.c                  |  14 --
 hmp-commands.hx         |  16 ++
 hmp.c                   | 119 +++++++++++++
 hmp.h                   |   2 +
 hw/i386/pc.c            |   4 +-
 include/sysemu/cpus.h   |   1 -
 include/sysemu/sysemu.h |  16 +-
 monitor.c               |  21 +--
 numa.c                  | 444 ++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json        | 142 ++++++++++++++++
 qemu-options.hx         |   6 +-
 qmp-commands.hx         |  90 ++++++++++
 vl.c                    | 160 ++---------------
 15 files changed, 885 insertions(+), 184 deletions(-)
 create mode 100644 numa.c

-- 
1.8.3.3.754.g9c3c367

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-21 20:59   ` Eric Blake
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 02/11] NUMA: split -numa option Wanlong Gao
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 qapi-schema.json | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index a51f7d2..b9b18e4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3773,3 +3773,50 @@
 ##
 { 'command': 'query-rx-filter', 'data': { '*name': 'str' },
   'returns': ['RxFilterInfo'] }
+
+##
+# @NumaOptions
+#
+# A discriminated record of NUMA options.
+#
+# Since 1.7
+##
+{ 'union': 'NumaOptions',
+  'data': {
+    'node': 'NumaNodeOptions',
+    'mem':  'NumaMemOptions' }}
+
+##
+# @NumaNodeOptions
+#
+# Create a guest NUMA node.
+#
+# @nodeid: #optional NUMA node ID
+#
+# @cpus: #optional VCPUs belong to this node
+#
+# @mem: #optional memory size of this node (remain as legacy)
+#
+# Since: 1.7
+##
+{ 'type': 'NumaNodeOptions',
+  'data': {
+   '*nodeid': 'uint16',
+   '*cpus':   ['uint16'],
+   '*mem':    'str' }}
+
+##
+# @NumaMemOptions
+#
+# Set memory information of guest NUMA node.
+#
+# @nodeid: #optional NUMA node ID
+#
+# @size: #optional memory size of this node
+#
+# Since 1.7
+##
+{ 'type': 'NumaMemOptions',
+  'data': {
+   '*nodeid': 'uint16',
+   '*size':   'size' }}
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 02/11] NUMA: split -numa option
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 03/11] NUMA: move numa related code to numa.c Wanlong Gao
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Change -numa option like following as Paolo suggested:
    -numa node,nodeid=0,cpus=0-1 \
    -numa mem,nodeid=0,size=1G

This new option will make later coming memory hotplug better.
And this new option is implemented using OptsVisitor.

And just remain "-numa node,mem=xx" as legacy.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 Makefile.target         |   2 +-
 include/sysemu/sysemu.h |   3 +
 numa.c                  | 144 ++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx         |   6 +-
 vl.c                    | 113 ++++++-------------------------------
 5 files changed, 168 insertions(+), 100 deletions(-)
 create mode 100644 numa.c

diff --git a/Makefile.target b/Makefile.target
index 9a49852..7e1fddf 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -113,7 +113,7 @@ endif #CONFIG_BSD_USER
 #########################################################
 # System emulator target
 ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
 obj-y += qtest.o
 obj-y += hw/
 obj-$(CONFIG_FDT) += device_tree.o
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d7a77b6..474dd9e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -129,8 +129,11 @@ extern QEMUClock *rtc_clock;
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
+extern int nb_numa_mem_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern unsigned long *node_cpumask[MAX_NODES];
+extern QemuOptsList qemu_numa_opts;
+int numa_init_func(QemuOpts *opts, void *opaque);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
new file mode 100644
index 0000000..e6924f4
--- /dev/null
+++ b/numa.c
@@ -0,0 +1,144 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2013 Fujitsu Ltd.
+ * Author: Wanlong Gao <gaowanlong@cn.fujitsu.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "sysemu/sysemu.h"
+#include "qemu/bitmap.h"
+#include "qapi-visit.h"
+#include "qapi/opts-visitor.h"
+#include "qapi/dealloc-visitor.h"
+
+QemuOptsList qemu_numa_opts = {
+    .name = "numa",
+    .implied_opt_name = "type",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head),
+    .desc = { { 0 } } /* validated with OptsVisitor */
+};
+
+static int numa_node_parse(NumaNodeOptions *opts)
+{
+    uint16_t nodenr;
+    uint16List *cpus = NULL;
+
+    if (opts->has_nodeid) {
+        nodenr = opts->nodeid;
+        if (nodenr >= MAX_NODES) {
+            fprintf(stderr, "qemu: Max number of NUMA nodes reached: %"
+                    PRIu16 "\n", nodenr);
+            return -1;
+        }
+    } else {
+        nodenr = nb_numa_nodes;
+    }
+
+    for (cpus = opts->cpus; cpus; cpus = cpus->next) {
+        bitmap_set(node_cpumask[nodenr], cpus->value, 1);
+    }
+
+    if (opts->has_mem) {
+        int64_t mem_size;
+        char *endptr;
+        mem_size = strtosz(opts->mem, &endptr);
+        if (mem_size < 0 || *endptr) {
+            fprintf(stderr, "qemu: invalid numa mem size: %s\n", opts->mem);
+            return -1;
+        }
+        node_mem[nodenr] = mem_size;
+    }
+
+    return 0;
+}
+
+static int numa_mem_parse(NumaMemOptions *opts)
+{
+    uint16_t nodenr;
+    uint64_t mem_size;
+
+    if (opts->has_nodeid) {
+        nodenr = opts->nodeid;
+        if (nodenr >= MAX_NODES) {
+            fprintf(stderr, "qemu: Max number of NUMA nodes reached: %"
+                    PRIu16 "\n", nodenr);
+            return -1;
+        }
+    } else {
+        nodenr = nb_numa_mem_nodes;
+    }
+
+    if (opts->has_size) {
+        mem_size = opts->size;
+        node_mem[nodenr] = mem_size;
+    }
+
+    return 0;
+}
+
+int numa_init_func(QemuOpts *opts, void *opaque)
+{
+    NumaOptions *object = NULL;
+    Error *err = NULL;
+    int ret = 0;
+
+    {
+        OptsVisitor *ov = opts_visitor_new(opts);
+        visit_type_NumaOptions(opts_get_visitor(ov), &object, NULL, &err);
+        opts_visitor_cleanup(ov);
+    }
+
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        ret = -1;
+        goto error;
+    }
+
+    switch (object->kind) {
+    case NUMA_OPTIONS_KIND_NODE:
+        if (nb_numa_nodes >= MAX_NODES) {
+            fprintf(stderr, "qemu: too many NUMA nodes\n");
+            ret = -1;
+            goto error;
+        }
+        ret = numa_node_parse(object->node);
+        nb_numa_nodes++;
+        break;
+    case NUMA_OPTIONS_KIND_MEM:
+        ret = numa_mem_parse(object->mem);
+        nb_numa_mem_nodes++;
+        break;
+    default:
+        fprintf(stderr, "qemu: Invalid NUMA options type.\n");
+        ret = -1;
+    }
+
+error:
+    if (object) {
+        QapiDeallocVisitor *dv = qapi_dealloc_visitor_new();
+        visit_type_NumaOptions(qapi_dealloc_get_visitor(dv),
+                               &object, NULL, NULL);
+        qapi_dealloc_visitor_cleanup(dv);
+    }
+
+    return ret;
+}
diff --git a/qemu-options.hx b/qemu-options.hx
index d15338e..e9123b8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -95,11 +95,13 @@ specifies the maximum number of hotpluggable CPUs.
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+    "-numa node[,nodeid=node][,cpus=cpu[-cpu]]\n"
+    "-numa mem[,nodeid=node][,size=size]\n"
+    , QEMU_ARCH_ALL)
 STEXI
 @item -numa @var{opts}
 @findex -numa
-Simulate a multi node NUMA system. If mem and cpus are omitted, resources
+Simulate a multi node NUMA system. If @var{size} and @var{cpus} are omitted, resources
 are split equally.
 ETEXI
 
diff --git a/vl.c b/vl.c
index f422a1c..3d75f78 100644
--- a/vl.c
+++ b/vl.c
@@ -250,6 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
 int nb_numa_nodes;
+int nb_numa_mem_nodes;
 uint64_t node_mem[MAX_NODES];
 unsigned long *node_cpumask[MAX_NODES];
 
@@ -1330,102 +1331,6 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }
 
-static void numa_node_parse_cpus(int nodenr, const char *cpus)
-{
-    char *endptr;
-    unsigned long long value, endvalue;
-
-    /* Empty CPU range strings will be considered valid, they will simply
-     * not set any bit in the CPU bitmap.
-     */
-    if (!*cpus) {
-        return;
-    }
-
-    if (parse_uint(cpus, &value, &endptr, 10) < 0) {
-        goto error;
-    }
-    if (*endptr == '-') {
-        if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) {
-            goto error;
-        }
-    } else if (*endptr == '\0') {
-        endvalue = value;
-    } else {
-        goto error;
-    }
-
-    if (endvalue >= MAX_CPUMASK_BITS) {
-        endvalue = MAX_CPUMASK_BITS - 1;
-        fprintf(stderr,
-            "qemu: NUMA: A max of %d VCPUs are supported\n",
-             MAX_CPUMASK_BITS);
-    }
-
-    if (endvalue < value) {
-        goto error;
-    }
-
-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
-    return;
-
-error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
-}
-
-static void numa_add(const char *optarg)
-{
-    char option[128];
-    char *endptr;
-    unsigned long long nodenr;
-
-    optarg = get_opt_name(option, 128, optarg, ',');
-    if (*optarg == ',') {
-        optarg++;
-    }
-    if (!strcmp(option, "node")) {
-
-        if (nb_numa_nodes >= MAX_NODES) {
-            fprintf(stderr, "qemu: too many NUMA nodes\n");
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
-            nodenr = nb_numa_nodes;
-        } else {
-            if (parse_uint_full(option, &nodenr, 10) < 0) {
-                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
-                exit(1);
-            }
-        }
-
-        if (nodenr >= MAX_NODES) {
-            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
-            exit(1);
-        }
-
-        if (get_param_value(option, 128, "mem", optarg) == 0) {
-            node_mem[nodenr] = 0;
-        } else {
-            int64_t sval;
-            sval = strtosz(option, &endptr);
-            if (sval < 0 || *endptr) {
-                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
-                exit(1);
-            }
-            node_mem[nodenr] = sval;
-        }
-        if (get_param_value(option, 128, "cpus", optarg) != 0) {
-            numa_node_parse_cpus(nodenr, option);
-        }
-        nb_numa_nodes++;
-    } else {
-        fprintf(stderr, "Invalid -numa option: %s\n", option);
-        exit(1);
-    }
-}
-
 static QemuOptsList qemu_smp_opts = {
     .name = "smp-opts",
     .implied_opt_name = "cpus",
@@ -2961,6 +2866,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_tpmdev_opts);
     qemu_add_opts(&qemu_realtime_opts);
     qemu_add_opts(&qemu_msg_opts);
+    qemu_add_opts(&qemu_numa_opts);
 
     runstate_init();
 
@@ -2986,6 +2892,7 @@ int main(int argc, char **argv, char **envp)
     }
 
     nb_numa_nodes = 0;
+    nb_numa_mem_nodes = 0;
     nb_nics = 0;
 
     bdrv_init_with_whitelist();
@@ -3147,7 +3054,10 @@ int main(int argc, char **argv, char **envp)
                 }
                 break;
             case QEMU_OPTION_numa:
-                numa_add(optarg);
+                opts = qemu_opts_parse(qemu_find_opts("numa"), optarg, 1);
+                if (!opts) {
+                    exit(1);
+                }
                 break;
             case QEMU_OPTION_display:
                 display_type = select_display(optarg);
@@ -4226,6 +4136,15 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);
 
+    if (qemu_opts_foreach(qemu_find_opts("numa"), numa_init_func,
+                          NULL, 1) != 0) {
+        exit(1);
+    }
+
+    if (nb_numa_mem_nodes > nb_numa_nodes) {
+        nb_numa_nodes = nb_numa_mem_nodes;
+    }
+
     if (nb_numa_nodes > 0) {
         int i;
 
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 03/11] NUMA: move numa related code to numa.c
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 02/11] NUMA: split -numa option Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 04/11] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 cpus.c                  | 14 -----------
 include/sysemu/cpus.h   |  1 -
 include/sysemu/sysemu.h |  2 ++
 numa.c                  | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c                    | 47 +----------------------------------
 5 files changed, 69 insertions(+), 61 deletions(-)

diff --git a/cpus.c b/cpus.c
index 0f65e76..08b8ef1 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1201,20 +1201,6 @@ static void tcg_exec_all(void)
     exit_request = 0;
 }
 
-void set_numa_modes(void)
-{
-    CPUState *cpu;
-    int i;
-
-    for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
-                cpu->numa_node = i;
-            }
-        }
-    }
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 6502488..4f79081 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -23,7 +23,6 @@ extern int smp_threads;
 #define smp_threads 1
 #endif
 
-void set_numa_modes(void);
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 474dd9e..b42f4a1 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -134,6 +134,8 @@ extern uint64_t node_mem[MAX_NODES];
 extern unsigned long *node_cpumask[MAX_NODES];
 extern QemuOptsList qemu_numa_opts;
 int numa_init_func(QemuOpts *opts, void *opaque);
+void set_numa_nodes(void);
+void set_numa_modes(void);
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
index e6924f4..7b37a3b 100644
--- a/numa.c
+++ b/numa.c
@@ -142,3 +142,69 @@ error:
 
     return ret;
 }
+
+void set_numa_nodes(void)
+{
+    if (nb_numa_mem_nodes > nb_numa_nodes) {
+        nb_numa_nodes = nb_numa_mem_nodes;
+    }
+
+    if (nb_numa_nodes > 0) {
+        int i;
+
+        if (nb_numa_nodes > MAX_NODES) {
+            nb_numa_nodes = MAX_NODES;
+        }
+
+        /* If no memory size if given for any node, assume the default case
+         * and distribute the available memory equally across all nodes
+         */
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (node_mem[i] != 0) {
+                break;
+            }
+        }
+
+        if (i == nb_numa_nodes) {
+            uint64_t usedmem = 0;
+
+            /* On Linux, the each node's border has to be 8MB aligned,
+             * the final node gets the rest.
+             */
+            for (i = 0; i < nb_numa_nodes - 1; i++) {
+                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
+                usedmem += node_mem[i];
+            }
+            node_mem[i] = ram_size - usedmem;
+        }
+
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+                break;
+            }
+        }
+        /* assigning the VCPUs round-robin is easier to implement, guest OSes
+         * must cope with this anyway, because there are BIOSes out there in
+         * real machines which also use this scheme.
+         */
+        if (i == nb_numa_nodes) {
+            for (i = 0; i < max_cpus; i++) {
+                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+            }
+        }
+    }
+}
+
+void set_numa_modes(void)
+{
+    CPUState *cpu;
+    int i;
+
+    for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
+        for (i = 0; i < nb_numa_nodes; i++) {
+            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+                cpu->numa_node = i;
+            }
+        }
+    }
+}
diff --git a/vl.c b/vl.c
index 3d75f78..5f0a5e4 100644
--- a/vl.c
+++ b/vl.c
@@ -4141,52 +4141,7 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
-    if (nb_numa_mem_nodes > nb_numa_nodes) {
-        nb_numa_nodes = nb_numa_mem_nodes;
-    }
-
-    if (nb_numa_nodes > 0) {
-        int i;
-
-        if (nb_numa_nodes > MAX_NODES) {
-            nb_numa_nodes = MAX_NODES;
-        }
-
-        /* If no memory size if given for any node, assume the default case
-         * and distribute the available memory equally across all nodes
-         */
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
-                break;
-        }
-        if (i == nb_numa_nodes) {
-            uint64_t usedmem = 0;
-
-            /* On Linux, the each node's border has to be 8MB aligned,
-             * the final node gets the rest.
-             */
-            for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
-            }
-            node_mem[i] = ram_size - usedmem;
-        }
-
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
-                break;
-            }
-        }
-        /* assigning the VCPUs round-robin is easier to implement, guest OSes
-         * must cope with this anyway, because there are BIOSes out there in
-         * real machines which also use this scheme.
-         */
-        if (i == nb_numa_nodes) {
-            for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
-            }
-        }
-    }
+    set_numa_nodes();
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
         exit(1);
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 04/11] NUMA: Add numa_info structure to contain numa nodes info
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (2 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 03/11] NUMA: move numa related code to numa.c Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 05/11] NUMA: Add Linux libnuma detection Wanlong Gao
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Add the numa_info structure to contain the numa nodes memory,
VCPUs information and the future added numa nodes host memory
policies.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hw/i386/pc.c            |  4 ++--
 include/sysemu/sysemu.h |  8 ++++++--
 monitor.c               |  2 +-
 numa.c                  | 21 +++++++++++----------
 vl.c                    |  7 +++----
 5 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e8bc8ce..1a534b2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -653,14 +653,14 @@ static FWCfgState *bochs_bios_init(void)
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + apic_id_limit + nb_numa_nodes) *
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b42f4a1..b683d08 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 #include "qapi-types.h"
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
+#include "qemu/bitmap.h"
 
 /* vl.c */
 
@@ -130,8 +131,11 @@ extern QEMUClock *rtc_clock;
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
 extern int nb_numa_mem_nodes;
-extern uint64_t node_mem[MAX_NODES];
-extern unsigned long *node_cpumask[MAX_NODES];
+typedef struct node_info {
+    uint64_t node_mem;
+    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+} NodeInfo;
+extern NodeInfo numa_info[MAX_NODES];
 extern QemuOptsList qemu_numa_opts;
 int numa_init_func(QemuOpts *opts, void *opaque);
 void set_numa_nodes(void);
diff --git a/monitor.c b/monitor.c
index 5dc0aa9..513af25 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1826,7 +1826,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            node_mem[i] >> 20);
+            numa_info[i].node_mem >> 20);
     }
 }
 
diff --git a/numa.c b/numa.c
index 7b37a3b..fa2e4ea 100644
--- a/numa.c
+++ b/numa.c
@@ -53,7 +53,7 @@ static int numa_node_parse(NumaNodeOptions *opts)
     }
 
     for (cpus = opts->cpus; cpus; cpus = cpus->next) {
-        bitmap_set(node_cpumask[nodenr], cpus->value, 1);
+        bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
     }
 
     if (opts->has_mem) {
@@ -64,7 +64,7 @@ static int numa_node_parse(NumaNodeOptions *opts)
             fprintf(stderr, "qemu: invalid numa mem size: %s\n", opts->mem);
             return -1;
         }
-        node_mem[nodenr] = mem_size;
+        numa_info[nodenr].node_mem = mem_size;
     }
 
     return 0;
@@ -88,7 +88,7 @@ static int numa_mem_parse(NumaMemOptions *opts)
 
     if (opts->has_size) {
         mem_size = opts->size;
-        node_mem[nodenr] = mem_size;
+        numa_info[nodenr].node_mem = mem_size;
     }
 
     return 0;
@@ -160,7 +160,7 @@ void set_numa_nodes(void)
          * and distribute the available memory equally across all nodes
          */
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0) {
+            if (numa_info[i].node_mem != 0) {
                 break;
             }
         }
@@ -172,14 +172,15 @@ void set_numa_nodes(void)
              * the final node gets the rest.
              */
             for (i = 0; i < nb_numa_nodes - 1; i++) {
-                node_mem[i] = (ram_size / nb_numa_nodes) & ~((1 << 23UL) - 1);
-                usedmem += node_mem[i];
+                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
+                                        ~((1 << 23UL) - 1);
+                usedmem += numa_info[i].node_mem;
             }
-            node_mem[i] = ram_size - usedmem;
+            numa_info[i].node_mem = ram_size - usedmem;
         }
 
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) {
+            if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
                 break;
             }
         }
@@ -189,7 +190,7 @@ void set_numa_nodes(void)
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, node_cpumask[i % nb_numa_nodes]);
+                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
             }
         }
     }
@@ -202,7 +203,7 @@ void set_numa_modes(void)
 
     for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
                 cpu->numa_node = i;
             }
         }
diff --git a/vl.c b/vl.c
index 5f0a5e4..89bbccb 100644
--- a/vl.c
+++ b/vl.c
@@ -251,8 +251,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
 
 int nb_numa_nodes;
 int nb_numa_mem_nodes;
-uint64_t node_mem[MAX_NODES];
-unsigned long *node_cpumask[MAX_NODES];
+NodeInfo numa_info[MAX_NODES];
 
 uint8_t qemu_uuid[16];
 
@@ -2887,8 +2886,8 @@ int main(int argc, char **argv, char **envp)
     translation = BIOS_ATA_TRANSLATION_AUTO;
 
     for (i = 0; i < MAX_NODES; i++) {
-        node_mem[i] = 0;
-        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
+        numa_info[i].node_mem = 0;
+        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
     }
 
     nb_numa_nodes = 0;
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 05/11] NUMA: Add Linux libnuma detection
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (3 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 04/11] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 06/11] NUMA: parse guest numa nodes memory policy Wanlong Gao
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Add detection of libnuma (mostly contained in the numactl package)
to the configure script. Can be enabled or disabled on the command line,
default is use if available.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 configure | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/configure b/configure
index 18fa608..b82e89a 100755
--- a/configure
+++ b/configure
@@ -243,6 +243,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+numa=""
 
 # parse CC options first
 for opt do
@@ -945,6 +946,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1159,6 +1164,8 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
 echo "  --disable-libssh2        disable ssh block device support"
 echo "  --enable-libssh2         enable ssh block device support"
+echo "  --disable-numa           disable libnuma support"
+echo "  --enable-numa            enable libnuma support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2412,6 +2419,27 @@ EOF
 fi
 
 ##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  numa=no
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa" = "yes" ; then
+      feature_not_found "linux NUMA (install numactl?)"
+    fi
+    numa=no
+  fi
+fi
+
+##########################################
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -3613,6 +3641,7 @@ echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3646,6 +3675,9 @@ echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 
 echo "ARCH=$ARCH" >> $config_host_mak
 
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 06/11] NUMA: parse guest numa nodes memory policy
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (4 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 05/11] NUMA: Add Linux libnuma detection Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 07/11] NUMA: set " Wanlong Gao
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

The memory policy setting format is like:
    policy={default|membind|interleave|preferred}[,relative=true],host-nodes=N-N
And we are adding this setting as a suboption of "-numa mem,",
the memory policy then can be set like following:
    -numa node,nodeid=0,cpus=0 \
    -numa node,nodeid=1,cpus=1 \
    -numa mem,nodeid=0,size=1G,policy=membind,host-nodes=0-1 \
    -numa mem,nodeid=1,size=1G,policy=interleave,relative=true,host-nodes=1

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 include/sysemu/sysemu.h |  3 +++
 numa.c                  | 13 +++++++++++++
 qapi-schema.json        | 31 +++++++++++++++++++++++++++++--
 vl.c                    |  3 +++
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b683d08..81d16a5 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -134,6 +134,9 @@ extern int nb_numa_mem_nodes;
 typedef struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+    NumaNodePolicy policy;
+    bool relative;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
 extern QemuOptsList qemu_numa_opts;
diff --git a/numa.c b/numa.c
index fa2e4ea..436b8e0 100644
--- a/numa.c
+++ b/numa.c
@@ -74,6 +74,7 @@ static int numa_mem_parse(NumaMemOptions *opts)
 {
     uint16_t nodenr;
     uint64_t mem_size;
+    uint16List *nodes;
 
     if (opts->has_nodeid) {
         nodenr = opts->nodeid;
@@ -91,6 +92,18 @@ static int numa_mem_parse(NumaMemOptions *opts)
         numa_info[nodenr].node_mem = mem_size;
     }
 
+    if (opts->has_policy) {
+        numa_info[nodenr].policy = opts->policy;
+    }
+
+    if (opts->has_relative) {
+        numa_info[nodenr].relative = opts->relative;
+    }
+
+    for (nodes = opts->host_nodes; nodes; nodes = nodes->next) {
+        bitmap_set(numa_info[nodenr].host_mem, nodes->value, 1);
+    }
+
     return 0;
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index b9b18e4..a34350c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3806,6 +3806,24 @@
    '*mem':    'str' }}
 
 ##
+# @NumaNodePolicy
+#
+# NUMA node policy types
+#
+# @default: restore default policy, remove any nondefault policy
+#
+# @membind: a strict policy that restricts memory allocation to the
+#           nodes specified
+#
+# @interleave: the page allocations is interleaved across the set
+#              of nodes specified
+#
+# @preferred: set the preferred node for allocation
+##
+{ 'enum': 'NumaNodePolicy',
+  'data': [ 'default', 'membind', 'interleave', 'preferred' ] }
+
+##
 # @NumaMemOptions
 #
 # Set memory information of guest NUMA node.
@@ -3814,9 +3832,18 @@
 #
 # @size: #optional memory size of this node
 #
+# @policy: #optional memory policy of this node
+#
+# @relative: #optional if the nodes specified are relative
+#
+# @host-nodes: #optional host nodes for its memory policy
+#
 # Since 1.7
 ##
 { 'type': 'NumaMemOptions',
   'data': {
-   '*nodeid': 'uint16',
-   '*size':   'size' }}
+   '*nodeid':     'uint16',
+   '*size':       'size',
+   '*policy':     'NumaNodePolicy',
+   '*relative':   'bool',
+   '*host-nodes': ['uint16'] }}
diff --git a/vl.c b/vl.c
index 89bbccb..49c807a 100644
--- a/vl.c
+++ b/vl.c
@@ -2888,6 +2888,9 @@ int main(int argc, char **argv, char **envp)
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_mem = 0;
         bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
+        bitmap_zero(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        numa_info[i].policy = NUMA_NODE_POLICY_DEFAULT;
+        numa_info[i].relative = false;
     }
 
     nb_numa_nodes = 0;
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (5 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 06/11] NUMA: parse guest numa nodes memory policy Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20 13:41   ` Andrew Jones
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 08/11] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node Wanlong Gao
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Set the guest numa nodes memory policies using the mbind(2)
system call node by node.
After this patch, we are able to set guest nodes memory policies
through the QEMU options, this arms to solve the guest cross
nodes memory access performance issue.
And as you all know, if PCI-passthrough is used,
direct-attached-device uses DMA transfer between device and qemu process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    =>kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
           => kvm_pin_pages()

So, with direct-attached-device, all guest page's page count will be +1 and
any page migration will not work. AutoNUMA won't too.

So, we should set the guest nodes memory allocation policies before
the pages are really mapped.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/numa.c b/numa.c
index 436b8e0..b2c0048 100644
--- a/numa.c
+++ b/numa.c
@@ -28,6 +28,16 @@
 #include "qapi-visit.h"
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
+#include "exec/memory.h"
+
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+#endif
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -209,6 +219,78 @@ void set_numa_nodes(void)
     }
 }
 
+#ifdef CONFIG_NUMA
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+    int bind_mode;
+
+    switch (numa_info[nodeid].policy) {
+    case NUMA_NODE_POLICY_MEMBIND:
+        bind_mode = MPOL_BIND;
+        break;
+    case NUMA_NODE_POLICY_INTERLEAVE:
+        bind_mode = MPOL_INTERLEAVE;
+        break;
+    case NUMA_NODE_POLICY_PREFERRED:
+        bind_mode = MPOL_PREFERRED;
+        break;
+    case NUMA_NODE_POLICY_DEFAULT:
+    default:
+        bind_mode = MPOL_DEFAULT;
+        return bind_mode;
+    }
+
+    bind_mode |= numa_info[nodeid].relative ?
+        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+    return bind_mode;
+}
+#endif
+
+static int set_node_mem_policy(int nodeid)
+{
+#ifdef CONFIG_NUMA
+    void *ram_ptr;
+    RAMBlock *block;
+    ram_addr_t len, ram_offset = 0;
+    int bind_mode;
+    int i;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(block->mr->name, "pc.ram")) {
+            break;
+        }
+    }
+
+    if (block->host == NULL) {
+        return -1;
+    }
+
+    ram_ptr = block->host;
+    for (i = 0; i < nodeid; i++) {
+        len = numa_info[i].node_mem;
+        ram_offset += len;
+    }
+
+    len = numa_info[i].node_mem;
+    bind_mode = node_parse_bind_mode(i);
+
+    /* This is a workaround for a long standing bug in Linux'
+     * mbind implementation, which cuts off the last specified
+     * node. To stay compatible should this bug be fixed, we
+     * specify one more node and zero this one out.
+     */
+    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
+    if (mbind(ram_ptr + ram_offset, len, bind_mode,
+        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
+            perror("mbind");
+            return -1;
+    }
+#endif
+
+    return 0;
+}
+
 void set_numa_modes(void)
 {
     CPUState *cpu;
@@ -221,4 +303,11 @@ void set_numa_modes(void)
             }
         }
     }
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (set_node_mem_policy(i) == -1) {
+            fprintf(stderr,
+                    "qemu: can not set host memory policy for node%d\n", i);
+        }
+    }
 }
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 08/11] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (6 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 07/11] NUMA: set " Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 09/11] NUMA: add hmp command set-mem-policy Wanlong Gao
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

This QMP command allows user set guest node's memory policy
through the QMP protocol. The qmp-shell command is like:
    set-mem-policy nodeid=0 policy=membind relative=true host-nodes=0-1

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c           | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 21 +++++++++++++++++++
 qmp-commands.hx  | 41 +++++++++++++++++++++++++++++++++++++
 3 files changed, 124 insertions(+)

diff --git a/numa.c b/numa.c
index b2c0048..c260d73 100644
--- a/numa.c
+++ b/numa.c
@@ -29,6 +29,7 @@
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
 #include "exec/memory.h"
+#include "qmp-commands.h"
 
 #ifdef CONFIG_NUMA
 #include <numa.h>
@@ -311,3 +312,64 @@ void set_numa_modes(void)
         }
     }
 }
+
+void qmp_set_mem_policy(uint16_t nodeid, bool has_policy, NumaNodePolicy policy,
+                        bool has_relative, bool relative,
+                        bool has_host_nodes, uint16List *host_nodes,
+                        Error **errp)
+{
+    NumaNodePolicy old_policy;
+    bool old_relative;
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+    uint16List *nodes;
+
+    if (nodeid >= nb_numa_nodes) {
+        error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
+        return;
+    }
+
+    bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    old_policy = numa_info[nodeid].policy;
+    old_relative = numa_info[nodeid].relative;
+
+    numa_info[nodeid].policy = NUMA_NODE_POLICY_DEFAULT;
+    numa_info[nodeid].relative = false;
+    bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+
+    if (!has_policy) {
+        if (set_node_mem_policy(nodeid) == -1) {
+            error_setg(errp, "Failed to set memory policy for node%" PRIu16,
+                       nodeid);
+            goto error;
+        }
+        return;
+    }
+
+    numa_info[nodeid].policy = policy;
+
+    if (has_relative) {
+        numa_info[nodeid].relative = relative;
+    }
+
+    if (!has_host_nodes) {
+        bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    }
+
+    for (nodes = host_nodes; nodes; nodes = nodes->next) {
+        bitmap_set(numa_info[nodeid].host_mem, nodes->value, 1);
+    }
+
+    if (set_node_mem_policy(nodeid) == -1) {
+        error_setg(errp, "Failed to set memory policy for node%" PRIu16,
+                   nodeid);
+        goto error;
+    }
+
+    return;
+
+error:
+    bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
+    numa_info[nodeid].policy = old_policy;
+    numa_info[nodeid].relative = old_relative;
+    return;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index a34350c..a1e0d36 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3847,3 +3847,24 @@
    '*policy':     'NumaNodePolicy',
    '*relative':   'bool',
    '*host-nodes': ['uint16'] }}
+
+##
+# @set-mem-policy:
+#
+# Set the host memory binding policy for guest NUMA node.
+#
+# @nodeid: The node ID of guest NUMA node to set memory policy to.
+#
+# @policy: #optional The memory policy to be set (default 'default').
+#
+# @relative: #optional If the specified nodes are relative (default 'false')
+#
+# @host-nodes: #optional The host nodes range for memory policy.
+#
+# Returns: Nothing on success
+#
+# Since: 1.7
+##
+{ 'command': 'set-mem-policy',
+  'data': {'nodeid': 'uint16', '*policy': 'NumaNodePolicy',
+           '*relative': 'bool', '*host-nodes': ['uint16'] } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index cf47e3f..52e6ff3 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3061,6 +3061,7 @@ Example:
 <- { "return": {} }
 
 EQMP
+
     {
         .name       = "query-rx-filter",
         .args_type  = "name:s?",
@@ -3124,3 +3125,43 @@ Example:
    }
 
 EQMP
+
+    {
+        .name      = "set-mem-policy",
+        .args_type = "nodeid:i,policy:s?,relative:b?,host-nodes:q?",
+        .help      = "Set the host memory binding policy for guest NUMA node",
+        .mhandler.cmd_new = qmp_marshal_input_set_mem_policy,
+    },
+
+SQMP
+set-mem-policy
+------
+
+Set the host memory binding policy for guest NUMA node
+
+Arguments:
+
+- "nodeid": The nodeid of guest NUMA node to set memory policy to.
+            (json-int)
+- "policy": The memory policy to set.
+            (json-string, optional)
+- "relative": If the specified nodes are relative.
+              (json-bool, optional)
+- "host-nodes": The host nodes contained to this memory policy.
+                (a json-array of int, optional)
+
+Example:
+
+-> { "execute": "set-mem-policy", "arguments": { "nodeid": 0,
+                                                 "policy": "membind",
+                                                 "relative": true,
+                                                 "host-nodes": [0, 1] } }
+<- { "return": {} }
+
+Notes:
+    1. If "policy" is not set, the memory policy of this "nodeid" will be set
+       to "default".
+    2. If "host-nodes" is not set, the node mask of this "policy" will be set
+       to all available host nodes.
+
+EQMP
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 09/11] NUMA: add hmp command set-mem-policy
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (7 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 08/11] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 10/11] NUMA: add qmp command query-numa Wanlong Gao
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Add hmp command set-mem-policy to set host memory policy for a guest
NUMA node. Then we can also set node's memory policy using
the monitor command like:
    (qemu) set-mem-policy 0 policy=membind,relative=false,host-nodes=0-1

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hmp-commands.hx | 16 ++++++++++++++
 hmp.c           | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp.h           |  1 +
 3 files changed, 82 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8c6b91a..fe3a26f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1587,6 +1587,22 @@ Executes a qemu-io command on the given block device.
 ETEXI
 
     {
+        .name       = "set-mem-policy",
+        .args_type  = "nodeid:i,args:s?",
+        .params     = "nodeid [args]",
+        .help       = "set host memory policy for a guest NUMA node",
+        .mhandler.cmd = hmp_set_mem_policy,
+    },
+
+STEXI
+@item set-mem-policy @var{nodeid} @var{args}
+@findex set-mem-policy
+
+Set host memory policy for a guest NUMA node
+
+ETEXI
+
+    {
         .name       = "info",
         .args_type  = "item:s?",
         .params     = "[subcommand]",
diff --git a/hmp.c b/hmp.c
index c45514b..98d2a76 100644
--- a/hmp.c
+++ b/hmp.c
@@ -24,6 +24,9 @@
 #include "ui/console.h"
 #include "block/qapi.h"
 #include "qemu-io.h"
+#include "qapi-visit.h"
+#include "qapi/opts-visitor.h"
+#include "qapi/dealloc-visitor.h"
 
 static void hmp_handle_error(Monitor *mon, Error **errp)
 {
@@ -1514,3 +1517,65 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 
     hmp_handle_error(mon, &err);
 }
+
+void hmp_set_mem_policy(Monitor *mon, const QDict *qdict)
+{
+    Error *local_err = NULL;
+    bool has_policy = true;
+    bool has_relative = true;
+    bool has_host_nodes = true;
+    QemuOpts *opts;
+    NumaMemOptions *object = NULL;
+    NumaNodePolicy policy = NUMA_NODE_POLICY_DEFAULT;
+    bool relative = false;
+    uint16List *host_nodes = NULL;
+
+    uint64_t nodeid = qdict_get_int(qdict, "nodeid");
+    const char *args = qdict_get_try_str(qdict, "args");
+
+    if (args == NULL) {
+        has_policy = false;
+        has_relative = false;
+        has_host_nodes = false;
+    } else {
+        opts = qemu_opts_parse(qemu_find_opts("numa"), args, 1);
+        if (opts == NULL) {
+            monitor_printf(mon, "Parsing memory policy args failed\n");
+            return;
+        } else {
+            OptsVisitor *ov = opts_visitor_new(opts);
+            visit_type_NumaMemOptions(opts_get_visitor(ov), &object, NULL,
+                                      &local_err);
+            opts_visitor_cleanup(ov);
+
+            if (error_is_set(&local_err)) {
+                goto error;
+            }
+
+            has_policy = object->has_policy;
+            if (has_policy) {
+                policy = object->policy;
+            }
+            has_relative = object->has_relative;
+            if (has_relative) {
+                relative = object->relative;
+            }
+            has_host_nodes = object->has_host_nodes;
+            if (has_host_nodes) {
+                host_nodes = object->host_nodes;
+            }
+        }
+    }
+
+    qmp_set_mem_policy(nodeid, has_policy, policy, has_relative, relative,
+                       has_host_nodes, host_nodes, &local_err);
+error:
+    if (object) {
+        QapiDeallocVisitor *dv = qapi_dealloc_visitor_new();
+        visit_type_NumaMemOptions(qapi_dealloc_get_visitor(dv),
+                                  &object, NULL, NULL);
+        qapi_dealloc_visitor_cleanup(dv);
+    }
+
+    hmp_handle_error(mon, &local_err);
+}
diff --git a/hmp.h b/hmp.h
index 6c3bdcd..ae09525 100644
--- a/hmp.h
+++ b/hmp.h
@@ -87,5 +87,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict);
 void hmp_chardev_add(Monitor *mon, const QDict *qdict);
 void hmp_chardev_remove(Monitor *mon, const QDict *qdict);
 void hmp_qemu_io(Monitor *mon, const QDict *qdict);
+void hmp_set_mem_policy(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 10/11] NUMA: add qmp command query-numa
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (8 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 09/11] NUMA: add hmp command set-mem-policy Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Add qmp command query-numa to show guest NUMA information.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 numa.c           | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 36 +++++++++++++++++++++++++++++
 qmp-commands.hx  | 49 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+)

diff --git a/numa.c b/numa.c
index c260d73..f9ba01d 100644
--- a/numa.c
+++ b/numa.c
@@ -373,3 +373,72 @@ error:
     numa_info[nodeid].relative = old_relative;
     return;
 }
+
+NUMAInfoList *qmp_query_numa(Error **errp)
+{
+    NUMAInfoList *head = NULL, *cur_item = NULL;
+    CPUState *cpu;
+    int i;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        NUMAInfoList *info;
+        uint16List *cur_cpu_item = NULL;
+        info = g_malloc0(sizeof(*info));
+        info->value = g_malloc0(sizeof(*info->value));
+        info->value->nodeid = i;
+        for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
+            if (cpu->numa_node == i) {
+                uint16List *node_cpu = g_malloc0(sizeof(*node_cpu));
+                node_cpu->value = cpu->cpu_index;
+
+                if (!cur_cpu_item) {
+                    info->value->cpus = cur_cpu_item = node_cpu;
+                } else {
+                    cur_cpu_item->next = node_cpu;
+                    cur_cpu_item = node_cpu;
+                }
+            }
+        }
+        info->value->memory = numa_info[i].node_mem;
+
+#ifdef CONFIG_NUMA
+        info->value->policy = numa_info[i].policy;
+        info->value->relative = numa_info[i].relative;
+
+        unsigned long first, next;
+        next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS);
+        if (first == MAX_CPUMASK_BITS) {
+            goto end;
+        }
+        uint16List *cur_node_item = g_malloc0(sizeof(*cur_node_item));
+        cur_node_item->value = first;
+        info->value->host_nodes = cur_node_item;
+        do {
+            if (next == numa_max_node()) {
+                break;
+            }
+
+            next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS,
+                                 next + 1);
+            if (next > numa_max_node() || next == MAX_CPUMASK_BITS) {
+                break;
+            }
+
+            uint16List *host_node = g_malloc0(sizeof(*host_node));
+            host_node->value = next;
+            cur_node_item->next = host_node;
+            cur_node_item = host_node;
+        } while (true);
+end:
+#endif
+
+        if (!cur_item) {
+            head = cur_item = info;
+        } else {
+            cur_item->next = info;
+            cur_item = info;
+        }
+    }
+
+    return head;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index a1e0d36..b80669c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3868,3 +3868,39 @@
 { 'command': 'set-mem-policy',
   'data': {'nodeid': 'uint16', '*policy': 'NumaNodePolicy',
            '*relative': 'bool', '*host-nodes': ['uint16'] } }
+
+##
+# @NUMAInfo:
+#
+# Information about guest NUMA nodes
+#
+# @nodeid: NUMA node ID
+#
+# @cpus: VCPUs contained in this node
+#
+# @memory: memory size of this node
+#
+# @policy: memory policy of this node
+#
+# @relative: if host nodes are relative for memory policy
+#
+# @host-nodes: host nodes for its memory policy
+#
+# Since: 1.7
+#
+##
+{ 'type': 'NUMAInfo',
+  'data': {'nodeid': 'uint16', 'cpus': ['uint16'], 'memory': 'uint64',
+           'policy': 'NumaNodePolicy', 'relative': 'bool',
+           'host-nodes': ['uint16'] }}
+
+##
+# @query-numa:
+#
+# Returns a list of information about each guest node.
+#
+# Returns: a list of @NUMAInfo for each guest node
+#
+# Since: 1.7
+##
+{ 'command': 'query-numa', 'returns': ['NUMAInfo'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 52e6ff3..20f1e74 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3165,3 +3165,52 @@ Notes:
        to all available host nodes.
 
 EQMP
+
+    {
+        .name = "query-numa",
+        .args_type = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_numa,
+    },
+
+SQMP
+query-numa
+---------
+
+Show NUMA information.
+
+Return a json-array. Each NUMA node is represented by a json-object,
+which contains:
+
+- "nodeid": NUMA node ID (json-int)
+- "cpus": a json-arry of contained VCPUs
+- "memory": amount of memory in each node in Byte (json-int)
+- "policy": memory policy of this node (json-string)
+- "relative": if host nodes is relative for its memory policy (json-bool)
+- "host-nodes": a json-array of host nodes for its memory policy
+
+Arguments:
+
+Example:
+
+-> { "excute": "query-numa" }
+<- { "return":[
+        {
+            "nodeid": 0,
+            "cpus": [0, 1],
+            "memory": 536870912,
+            "policy": "membind",
+            "relative": false,
+            "host-nodes": [0, 1]
+        },
+        {
+            "nodeid": 1,
+            "cpus": [2, 3],
+            "memory": 536870912,
+            "policy": "interleave",
+            "relative": false,
+            "host-nodes": [1]
+        }
+     ]
+   }
+
+EQMP
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Qemu-devel] [PATCH V8 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (9 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 10/11] NUMA: add qmp command query-numa Wanlong Gao
@ 2013-08-20  1:07 ` Wanlong Gao
  2013-08-20 13:43 ` [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
  2013-08-21  9:33 ` Laszlo Ersek
  12 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-20  1:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: aliguori, ehabkost, lersek, peter.huangpeng, lcapitulino, bsd,
	hutao, y-goto, pbonzini, afaerber, gaowanlong

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 hmp.c     | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp.h     |  1 +
 monitor.c | 21 +--------------------
 3 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/hmp.c b/hmp.c
index 98d2a76..3b2f04d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -27,6 +27,7 @@
 #include "qapi-visit.h"
 #include "qapi/opts-visitor.h"
 #include "qapi/dealloc-visitor.h"
+#include "sysemu/sysemu.h"
 
 static void hmp_handle_error(Monitor *mon, Error **errp)
 {
@@ -1579,3 +1580,56 @@ error:
 
     hmp_handle_error(mon, &local_err);
 }
+
+void hmp_info_numa(Monitor *mon, const QDict *qdict)
+{
+    NUMAInfoList *node_list, *node;
+    uint16List *head;
+    int nodeid;
+    char *policy_str = NULL;
+
+    node_list = qmp_query_numa(NULL);
+
+    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
+    for (node = node_list; node; node = node->next) {
+        nodeid = node->value->nodeid;
+        monitor_printf(mon, "node %d cpus:", nodeid);
+        head = node->value->cpus;
+        for (head = node->value->cpus; head != NULL; head = head->next) {
+            monitor_printf(mon, " %d", (int)head->value);
+        }
+        monitor_printf(mon, "\n");
+        monitor_printf(mon, "node %d size: %" PRId64 " MB\n",
+                       nodeid, node->value->memory >> 20);
+        switch (node->value->policy) {
+        case NUMA_NODE_POLICY_DEFAULT:
+            policy_str = g_strdup("default");
+            break;
+        case NUMA_NODE_POLICY_MEMBIND:
+            policy_str = g_strdup("membind");
+            break;
+        case NUMA_NODE_POLICY_INTERLEAVE:
+            policy_str = g_strdup("interleave");
+            break;
+        case NUMA_NODE_POLICY_PREFERRED:
+            policy_str = g_strdup("preferred");
+            break;
+        default:
+            break;
+        }
+        monitor_printf(mon, "node %d policy: %s\n",
+                       nodeid, policy_str ? : " ");
+        if (policy_str) {
+            free(policy_str);
+        }
+        monitor_printf(mon, "node %d relative: %s\n", nodeid,
+                       node->value->relative ? "true" : "false");
+        monitor_printf(mon, "node %d host-nodes:", nodeid);
+        for (head = node->value->host_nodes; head != NULL; head = head->next) {
+            monitor_printf(mon, " %d", (int)head->value);
+        }
+        monitor_printf(mon, "\n");
+    }
+
+    qapi_free_NUMAInfoList(node_list);
+}
diff --git a/hmp.h b/hmp.h
index ae09525..56a5efd 100644
--- a/hmp.h
+++ b/hmp.h
@@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict);
 void hmp_info_pci(Monitor *mon, const QDict *qdict);
 void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
 void hmp_info_tpm(Monitor *mon, const QDict *qdict);
+void hmp_info_numa(Monitor *mon, const QDict *qdict);
 void hmp_quit(Monitor *mon, const QDict *qdict);
 void hmp_stop(Monitor *mon, const QDict *qdict);
 void hmp_system_reset(Monitor *mon, const QDict *qdict);
diff --git a/monitor.c b/monitor.c
index 513af25..13fbe2b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1811,25 +1811,6 @@ static void do_info_mtree(Monitor *mon, const QDict *qdict)
     mtree_info((fprintf_function)monitor_printf, mon);
 }
 
-static void do_info_numa(Monitor *mon, const QDict *qdict)
-{
-    int i;
-    CPUState *cpu;
-
-    monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
-    for (i = 0; i < nb_numa_nodes; i++) {
-        monitor_printf(mon, "node %d cpus:", i);
-        for (cpu = first_cpu; cpu != NULL; cpu = cpu->next_cpu) {
-            if (cpu->numa_node == i) {
-                monitor_printf(mon, " %d", cpu->cpu_index);
-            }
-        }
-        monitor_printf(mon, "\n");
-        monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            numa_info[i].node_mem >> 20);
-    }
-}
-
 #ifdef CONFIG_PROFILER
 
 int64_t qemu_time;
@@ -2597,7 +2578,7 @@ static mon_cmd_t info_cmds[] = {
         .args_type  = "",
         .params     = "",
         .help       = "show NUMA information",
-        .mhandler.cmd = do_info_numa,
+        .mhandler.cmd = hmp_info_numa,
     },
     {
         .name       = "usb",
-- 
1.8.4.rc1.21.gfb56570

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 07/11] NUMA: set " Wanlong Gao
@ 2013-08-20 13:41   ` Andrew Jones
  2013-08-21  2:43     ` Wanlong Gao
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Jones @ 2013-08-20 13:41 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, qemu-devel, hutao, peter huangpeng,
	lcapitulino, bsd, pbonzini, y-goto, lersek, afaerber



----- Original Message -----
> Set the guest numa nodes memory policies using the mbind(2)
> system call node by node.
> After this patch, we are able to set guest nodes memory policies
> through the QEMU options, this arms to solve the guest cross
> nodes memory access performance issue.
> And as you all know, if PCI-passthrough is used,
> direct-attached-device uses DMA transfer between device and qemu process.
> All pages of the guest will be pinned by get_user_pages().
> 
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     =>kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>            => kvm_pin_pages()
> 
> So, with direct-attached-device, all guest page's page count will be +1 and
> any page migration will not work. AutoNUMA won't too.
> 
> So, we should set the guest nodes memory allocation policies before
> the pages are really mapped.
> 
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  numa.c | 89
>  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
> 
> diff --git a/numa.c b/numa.c
> index 436b8e0..b2c0048 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -28,6 +28,16 @@
>  #include "qapi-visit.h"
>  #include "qapi/opts-visitor.h"
>  #include "qapi/dealloc-visitor.h"
> +#include "exec/memory.h"
> +
> +#ifdef CONFIG_NUMA
> +#include <numa.h>
> +#include <numaif.h>
> +#ifndef MPOL_F_RELATIVE_NODES
> +#define MPOL_F_RELATIVE_NODES (1 << 14)
> +#define MPOL_F_STATIC_NODES   (1 << 15)
> +#endif
> +#endif
>  
>  QemuOptsList qemu_numa_opts = {
>      .name = "numa",
> @@ -209,6 +219,78 @@ void set_numa_nodes(void)
>      }
>  }
>  
> +#ifdef CONFIG_NUMA
> +static int node_parse_bind_mode(unsigned int nodeid)
> +{
> +    int bind_mode;
> +
> +    switch (numa_info[nodeid].policy) {
> +    case NUMA_NODE_POLICY_MEMBIND:
> +        bind_mode = MPOL_BIND;
> +        break;
> +    case NUMA_NODE_POLICY_INTERLEAVE:
> +        bind_mode = MPOL_INTERLEAVE;
> +        break;
> +    case NUMA_NODE_POLICY_PREFERRED:
> +        bind_mode = MPOL_PREFERRED;
> +        break;
> +    case NUMA_NODE_POLICY_DEFAULT:
> +    default:
> +        bind_mode = MPOL_DEFAULT;
> +        return bind_mode;
> +    }
> +
> +    bind_mode |= numa_info[nodeid].relative ?
> +        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
> +
> +    return bind_mode;
> +}
> +#endif
> +
> +static int set_node_mem_policy(int nodeid)
> +{
> +#ifdef CONFIG_NUMA
> +    void *ram_ptr;
> +    RAMBlock *block;
> +    ram_addr_t len, ram_offset = 0;
> +    int bind_mode;
> +    int i;
> +
> +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> +        if (!strcmp(block->mr->name, "pc.ram")) {
> +            break;
> +        }
> +    }
> +
> +    if (block->host == NULL) {
> +        return -1;
> +    }
> +
> +    ram_ptr = block->host;
> +    for (i = 0; i < nodeid; i++) {
> +        len = numa_info[i].node_mem;
> +        ram_offset += len;
> +    }
> +
> +    len = numa_info[i].node_mem;
> +    bind_mode = node_parse_bind_mode(i);
> +
> +    /* This is a workaround for a long standing bug in Linux'
> +     * mbind implementation, which cuts off the last specified
> +     * node. To stay compatible should this bug be fixed, we
> +     * specify one more node and zero this one out.
> +     */
> +    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
> +    if (mbind(ram_ptr + ram_offset, len, bind_mode,
> +        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
> +            perror("mbind");
> +            return -1;
> +    }

>From my quick read of this patch series, I think these two calls of
numa_num_configured_nodes() are the only places that libnuma is used.
Is it really worth the new dependency? Actually libnuma will only calculate
what it returns from numa_num_configured_nodes() once, because it simply
counts bits in a bitmask that it only initializes at library load time. So
it would be more robust wrt to node onlining/offlining to avoid libnuma and
to just fetch information from sysfs as needed anyway. In this particular
code though, I think replacing numa_num_configured_nodes() with a maxnode,
where

unsigned long maxnode = find_last_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS)

would work the best.

Another comment I have on this function is that I'd prefer to see something
like

unsigned long *nodes = numa_info[nodeid].host_mem;

at the top, and then use that for a shorter name, rather than abusing
the fact that i == nodeid after the loop, presumably just to keep the name
short.

drew

> +#endif
> +
> +    return 0;
> +}
> +
>  void set_numa_modes(void)
>  {
>      CPUState *cpu;
> @@ -221,4 +303,11 @@ void set_numa_modes(void)
>              }
>          }
>      }
> +
> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        if (set_node_mem_policy(i) == -1) {
> +            fprintf(stderr,
> +                    "qemu: can not set host memory policy for node%d\n", i);
> +        }
> +    }
>  }
> --
> 1.8.4.rc1.21.gfb56570
> 
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (10 preceding siblings ...)
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
@ 2013-08-20 13:43 ` Paolo Bonzini
  2013-08-21  1:22   ` Wanlong Gao
  2013-08-21  9:33 ` Laszlo Ersek
  12 siblings, 1 reply; 33+ messages in thread
From: Paolo Bonzini @ 2013-08-20 13:43 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, hutao,
	y-goto, peter.huangpeng, afaerber

Il 20/08/2013 03:07, Wanlong Gao ha scritto:
>  -numa node,nodeid=0,cpus=0, \
>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>  -numa node,nodeid=1,cpus=1 \
>  -numa mem,size=1024M,policy=interleave,host-nodes=1

What nodes would the memory be in, for this command line?  Does it just
compute the total and split it evenly across the nodes (so that the
"-numa node" options could omit nodeid and cpus too)?

Also, do you still need a "-m" option if you use "-numa mem"?

Paolo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-20 13:43 ` [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
@ 2013-08-21  1:22   ` Wanlong Gao
  2013-08-21  9:00     ` Paolo Bonzini
  0 siblings, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-21  1:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, hutao,
	y-goto, peter.huangpeng, afaerber, Wanlong Gao

On 08/20/2013 09:43 PM, Paolo Bonzini wrote:
> Il 20/08/2013 03:07, Wanlong Gao ha scritto:
>>  -numa node,nodeid=0,cpus=0, \
>>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>>  -numa node,nodeid=1,cpus=1 \
>>  -numa mem,size=1024M,policy=interleave,host-nodes=1
> 
> What nodes would the memory be in, for this command line?  Does it just

The original concept here is that if the nodeid is omitted, it will be
set node by node from node0. Here I also keep the original concept, so the
memory will be in node0 and node1.

> compute the total and split it evenly across the nodes (so that the
> "-numa node" options could omit nodeid and cpus too)?

If no memory size is given for any nodes, the memory will split across
all nodes like (ram_size / nb_numa_nodes). And yes nodeid and cpus options
can also be omitted from the original concept.

> 
> Also, do you still need a "-m" option if you use "-numa mem"?

The "-m" options will be used to compute the memory size of each node
if the memory size of each node is not set by "-numa mem" option. This
is also be consistent with the original concept.


Thanks,
Wanlong Gao

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
  2013-08-20 13:41   ` Andrew Jones
@ 2013-08-21  2:43     ` Wanlong Gao
  2013-08-21  7:15       ` Andrew Jones
  0 siblings, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-21  2:43 UTC (permalink / raw)
  To: Andrew Jones
  Cc: aliguori, ehabkost, qemu-devel, hutao, peter huangpeng,
	lcapitulino, bsd, pbonzini, y-goto, lersek, afaerber, Wanlong Gao

On 08/20/2013 09:41 PM, Andrew Jones wrote:
> 
> 
> ----- Original Message -----
>> Set the guest numa nodes memory policies using the mbind(2)
>> system call node by node.
>> After this patch, we are able to set guest nodes memory policies
>> through the QEMU options, this arms to solve the guest cross
>> nodes memory access performance issue.
>> And as you all know, if PCI-passthrough is used,
>> direct-attached-device uses DMA transfer between device and qemu process.
>> All pages of the guest will be pinned by get_user_pages().
>>
>> KVM_ASSIGN_PCI_DEVICE ioctl
>>   kvm_vm_ioctl_assign_device()
>>     =>kvm_assign_device()
>>       => kvm_iommu_map_memslots()
>>         => kvm_iommu_map_pages()
>>            => kvm_pin_pages()
>>
>> So, with direct-attached-device, all guest page's page count will be +1 and
>> any page migration will not work. AutoNUMA won't too.
>>
>> So, we should set the guest nodes memory allocation policies before
>> the pages are really mapped.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
>>  numa.c | 89
>>  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 89 insertions(+)
>>
>> diff --git a/numa.c b/numa.c
>> index 436b8e0..b2c0048 100644
>> --- a/numa.c
>> +++ b/numa.c
>> @@ -28,6 +28,16 @@
>>  #include "qapi-visit.h"
>>  #include "qapi/opts-visitor.h"
>>  #include "qapi/dealloc-visitor.h"
>> +#include "exec/memory.h"
>> +
>> +#ifdef CONFIG_NUMA
>> +#include <numa.h>
>> +#include <numaif.h>
>> +#ifndef MPOL_F_RELATIVE_NODES
>> +#define MPOL_F_RELATIVE_NODES (1 << 14)
>> +#define MPOL_F_STATIC_NODES   (1 << 15)
>> +#endif
>> +#endif
>>  
>>  QemuOptsList qemu_numa_opts = {
>>      .name = "numa",
>> @@ -209,6 +219,78 @@ void set_numa_nodes(void)
>>      }
>>  }
>>  
>> +#ifdef CONFIG_NUMA
>> +static int node_parse_bind_mode(unsigned int nodeid)
>> +{
>> +    int bind_mode;
>> +
>> +    switch (numa_info[nodeid].policy) {
>> +    case NUMA_NODE_POLICY_MEMBIND:
>> +        bind_mode = MPOL_BIND;
>> +        break;
>> +    case NUMA_NODE_POLICY_INTERLEAVE:
>> +        bind_mode = MPOL_INTERLEAVE;
>> +        break;
>> +    case NUMA_NODE_POLICY_PREFERRED:
>> +        bind_mode = MPOL_PREFERRED;
>> +        break;
>> +    case NUMA_NODE_POLICY_DEFAULT:
>> +    default:
>> +        bind_mode = MPOL_DEFAULT;
>> +        return bind_mode;
>> +    }
>> +
>> +    bind_mode |= numa_info[nodeid].relative ?
>> +        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
>> +
>> +    return bind_mode;
>> +}
>> +#endif
>> +
>> +static int set_node_mem_policy(int nodeid)
>> +{
>> +#ifdef CONFIG_NUMA
>> +    void *ram_ptr;
>> +    RAMBlock *block;
>> +    ram_addr_t len, ram_offset = 0;
>> +    int bind_mode;
>> +    int i;
>> +
>> +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>> +        if (!strcmp(block->mr->name, "pc.ram")) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (block->host == NULL) {
>> +        return -1;
>> +    }
>> +
>> +    ram_ptr = block->host;
>> +    for (i = 0; i < nodeid; i++) {
>> +        len = numa_info[i].node_mem;
>> +        ram_offset += len;
>> +    }
>> +
>> +    len = numa_info[i].node_mem;
>> +    bind_mode = node_parse_bind_mode(i);
>> +
>> +    /* This is a workaround for a long standing bug in Linux'
>> +     * mbind implementation, which cuts off the last specified
>> +     * node. To stay compatible should this bug be fixed, we
>> +     * specify one more node and zero this one out.
>> +     */
>> +    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
>> +    if (mbind(ram_ptr + ram_offset, len, bind_mode,
>> +        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
>> +            perror("mbind");
>> +            return -1;
>> +    }
> 
>>From my quick read of this patch series, I think these two calls of
> numa_num_configured_nodes() are the only places that libnuma is used.
> Is it really worth the new dependency? Actually libnuma will only calculate
> what it returns from numa_num_configured_nodes() once, because it simply
> counts bits in a bitmask that it only initializes at library load time. So
> it would be more robust wrt to node onlining/offlining to avoid libnuma and
> to just fetch information from sysfs as needed anyway. In this particular
> code though, I think replacing numa_num_configured_nodes() with a maxnode,
> where
> 
> unsigned long maxnode = find_last_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS)

Sorry I can't understand this since numa_numa_configured_nodes() is for host,
but why could we find the last bit of guest setting to replace it?

Thanks,
Wanlong Gao

> 
> would work the best.
> 
> Another comment I have on this function is that I'd prefer to see something
> like
> 
> unsigned long *nodes = numa_info[nodeid].host_mem;
> 
> at the top, and then use that for a shorter name, rather than abusing
> the fact that i == nodeid after the loop, presumably just to keep the name
> short.
> 
> drew
> 
>> +#endif
>> +
>> +    return 0;
>> +}
>> +
>>  void set_numa_modes(void)
>>  {
>>      CPUState *cpu;
>> @@ -221,4 +303,11 @@ void set_numa_modes(void)
>>              }
>>          }
>>      }
>> +
>> +    for (i = 0; i < nb_numa_nodes; i++) {
>> +        if (set_node_mem_policy(i) == -1) {
>> +            fprintf(stderr,
>> +                    "qemu: can not set host memory policy for node%d\n", i);
>> +        }
>> +    }
>>  }
>> --
>> 1.8.4.rc1.21.gfb56570
>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
  2013-08-21  2:43     ` Wanlong Gao
@ 2013-08-21  7:15       ` Andrew Jones
  2013-08-21  7:23         ` Wanlong Gao
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Jones @ 2013-08-21  7:15 UTC (permalink / raw)
  To: gaowanlong
  Cc: aliguori, ehabkost, qemu-devel, hutao, peter huangpeng,
	lcapitulino, bsd, pbonzini, y-goto, lersek, afaerber



----- Original Message -----
> On 08/20/2013 09:41 PM, Andrew Jones wrote:
> >> +
> >> +    /* This is a workaround for a long standing bug in Linux'
> >> +     * mbind implementation, which cuts off the last specified
> >> +     * node. To stay compatible should this bug be fixed, we
> >> +     * specify one more node and zero this one out.
> >> +     */
> >> +    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
> >> +    if (mbind(ram_ptr + ram_offset, len, bind_mode,
> >> +        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
> >> +            perror("mbind");
> >> +            return -1;
> >> +    }
> > 
> >>From my quick read of this patch series, I think these two calls of
> > numa_num_configured_nodes() are the only places that libnuma is used.
> > Is it really worth the new dependency? Actually libnuma will only calculate
> > what it returns from numa_num_configured_nodes() once, because it simply
> > counts bits in a bitmask that it only initializes at library load time. So
> > it would be more robust wrt to node onlining/offlining to avoid libnuma and
> > to just fetch information from sysfs as needed anyway. In this particular
> > code though, I think replacing numa_num_configured_nodes() with a maxnode,
> > where
> > 
> > unsigned long maxnode = find_last_bit(numa_info[i].host_mem,
> > MAX_CPUMASK_BITS)
> 
> Sorry I can't understand this since numa_numa_configured_nodes() is for host,
> but why could we find the last bit of guest setting to replace it?
> 

You're not using numa_numa_configured_nodes() to index _the_ host's nodemask,
you're using it to find the highest possible bit set in _a_ nodemask,
numa_info[i].host_mem. mbind doesn't need its 'maxnode' param to be
the highest possible host node bit, but rather just the highest possible bit
set in the nodemask passed to it. find_last_bit will find that bit. You still
need to add 1 to it as you do with numa_numa_configured_nodes() though, due
to the kernel decrementing it by one erroneously as you've pointed out in your
comment.

drew

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 07/11] NUMA: set guest numa nodes memory policy
  2013-08-21  7:15       ` Andrew Jones
@ 2013-08-21  7:23         ` Wanlong Gao
  0 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-21  7:23 UTC (permalink / raw)
  To: Andrew Jones
  Cc: aliguori, ehabkost, qemu-devel, hutao, peter huangpeng,
	lcapitulino, bsd, pbonzini, y-goto, lersek, afaerber, Wanlong Gao

On 08/21/2013 03:15 PM, Andrew Jones wrote:
> 
> 
> ----- Original Message -----
>> On 08/20/2013 09:41 PM, Andrew Jones wrote:
>>>> +
>>>> +    /* This is a workaround for a long standing bug in Linux'
>>>> +     * mbind implementation, which cuts off the last specified
>>>> +     * node. To stay compatible should this bug be fixed, we
>>>> +     * specify one more node and zero this one out.
>>>> +     */
>>>> +    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
>>>> +    if (mbind(ram_ptr + ram_offset, len, bind_mode,
>>>> +        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
>>>> +            perror("mbind");
>>>> +            return -1;
>>>> +    }
>>>
>>> >From my quick read of this patch series, I think these two calls of
>>> numa_num_configured_nodes() are the only places that libnuma is used.
>>> Is it really worth the new dependency? Actually libnuma will only calculate
>>> what it returns from numa_num_configured_nodes() once, because it simply
>>> counts bits in a bitmask that it only initializes at library load time. So
>>> it would be more robust wrt to node onlining/offlining to avoid libnuma and
>>> to just fetch information from sysfs as needed anyway. In this particular
>>> code though, I think replacing numa_num_configured_nodes() with a maxnode,
>>> where
>>>
>>> unsigned long maxnode = find_last_bit(numa_info[i].host_mem,
>>> MAX_CPUMASK_BITS)
>>
>> Sorry I can't understand this since numa_numa_configured_nodes() is for host,
>> but why could we find the last bit of guest setting to replace it?
>>
> 
> You're not using numa_numa_configured_nodes() to index _the_ host's nodemask,
> you're using it to find the highest possible bit set in _a_ nodemask,
> numa_info[i].host_mem. mbind doesn't need its 'maxnode' param to be
> the highest possible host node bit, but rather just the highest possible bit
> set in the nodemask passed to it. find_last_bit will find that bit. You still
> need to add 1 to it as you do with numa_numa_configured_nodes() though, due
> to the kernel decrementing it by one erroneously as you've pointed out in your
> comment.

Thank you very much for your explanation, I'll change as you said. ;)

Regards,
Wanlong Gao

> 
> drew
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-21  1:22   ` Wanlong Gao
@ 2013-08-21  9:00     ` Paolo Bonzini
  2013-08-21  9:08       ` Wanlong Gao
  0 siblings, 1 reply; 33+ messages in thread
From: Paolo Bonzini @ 2013-08-21  9:00 UTC (permalink / raw)
  To: gaowanlong
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, hutao,
	y-goto, peter.huangpeng, afaerber

Il 21/08/2013 03:22, Wanlong Gao ha scritto:
> On 08/20/2013 09:43 PM, Paolo Bonzini wrote:
>> Il 20/08/2013 03:07, Wanlong Gao ha scritto:
>>>  -numa node,nodeid=0,cpus=0, \
>>>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>>>  -numa node,nodeid=1,cpus=1 \
>>>  -numa mem,size=1024M,policy=interleave,host-nodes=1
>>
>> What nodes would the memory be in, for this command line?  Does it just
> 
> The original concept here is that if the nodeid is omitted, it will be
> set node by node from node0. Here I also keep the original concept, so the
> memory will be in node0 and node1.
> 
>> compute the total and split it evenly across the nodes (so that the
>> "-numa node" options could omit nodeid and cpus too)?
> 
> If no memory size is given for any nodes, the memory will split across
> all nodes like (ram_size / nb_numa_nodes). And yes nodeid and cpus options
> can also be omitted from the original concept.
> 
>>
>> Also, do you still need a "-m" option if you use "-numa mem"?
> 
> The "-m" options will be used to compute the memory size of each node
> if the memory size of each node is not set by "-numa mem" option. This
> is also be consistent with the original concept.

Ok.  You didn't answer my exact question though---could you run the
above command line without "-m 2048"?

(Of course with hotplug like in Igor's case you'll still need "-m
maxmem=4096,slots=4" or something like that)

Paolo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-21  9:00     ` Paolo Bonzini
@ 2013-08-21  9:08       ` Wanlong Gao
  2013-08-21  9:22         ` Paolo Bonzini
  0 siblings, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-21  9:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, hutao,
	y-goto, peter.huangpeng, afaerber, Wanlong Gao

On 08/21/2013 05:00 PM, Paolo Bonzini wrote:
> Il 21/08/2013 03:22, Wanlong Gao ha scritto:
>> On 08/20/2013 09:43 PM, Paolo Bonzini wrote:
>>> Il 20/08/2013 03:07, Wanlong Gao ha scritto:
>>>>  -numa node,nodeid=0,cpus=0, \
>>>>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>>>>  -numa node,nodeid=1,cpus=1 \
>>>>  -numa mem,size=1024M,policy=interleave,host-nodes=1
>>>
>>> What nodes would the memory be in, for this command line?  Does it just
>>
>> The original concept here is that if the nodeid is omitted, it will be
>> set node by node from node0. Here I also keep the original concept, so the
>> memory will be in node0 and node1.
>>
>>> compute the total and split it evenly across the nodes (so that the
>>> "-numa node" options could omit nodeid and cpus too)?
>>
>> If no memory size is given for any nodes, the memory will split across
>> all nodes like (ram_size / nb_numa_nodes). And yes nodeid and cpus options
>> can also be omitted from the original concept.
>>
>>>
>>> Also, do you still need a "-m" option if you use "-numa mem"?
>>
>> The "-m" options will be used to compute the memory size of each node
>> if the memory size of each node is not set by "-numa mem" option. This
>> is also be consistent with the original concept.
> 
> Ok.  You didn't answer my exact question though---could you run the
> above command line without "-m 2048"?

Sorry :(
Sure not. And if the memory size assigned by "-m" option is not equal
the total memory size assigned by "-numa mem,", the ACPI will write the
wrong table, then the guest kernel will detect this and ignore the wrong
ACPI numa table and assume that all of the memory are belong to one node.

Would you like to run without "-m" option in mind?

Thanks,
Wanlong Gao

> 
> (Of course with hotplug like in Igor's case you'll still need "-m
> maxmem=4096,slots=4" or something like that)
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-21  9:08       ` Wanlong Gao
@ 2013-08-21  9:22         ` Paolo Bonzini
  2013-08-21  9:34           ` Wanlong Gao
  0 siblings, 1 reply; 33+ messages in thread
From: Paolo Bonzini @ 2013-08-21  9:22 UTC (permalink / raw)
  To: gaowanlong
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, hutao,
	y-goto, peter.huangpeng, afaerber

Il 21/08/2013 11:08, Wanlong Gao ha scritto:
>>>> Also, do you still need a "-m" option if you use "-numa mem"?
>>>
>>> The "-m" options will be used to compute the memory size of each node
>>> if the memory size of each node is not set by "-numa mem" option. This
>>> is also be consistent with the original concept.
>>
>> Ok.  You didn't answer my exact question though---could you run the
>> above command line without "-m 2048"?
> 
> Sorry :(

No problem!

> Sure not. And if the memory size assigned by "-m" option is not equal
> the total memory size assigned by "-numa mem,", the ACPI will write the
> wrong table, then the guest kernel will detect this and ignore the wrong
> ACPI numa table and assume that all of the memory are belong to one node.

Can you make the code print a warning in this case, or fail to start the
VM (if there is "-numa mem" but the total doesn't match "-m"?

> Would you like to run without "-m" option in mind?

This would definitely be nice to have, if it's not too complicated.
Otherwise, the above idea will do.

Paolo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
                   ` (11 preceding siblings ...)
  2013-08-20 13:43 ` [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
@ 2013-08-21  9:33 ` Laszlo Ersek
  2013-08-21  9:37   ` Wanlong Gao
  12 siblings, 1 reply; 33+ messages in thread
From: Laszlo Ersek @ 2013-08-21  9:33 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: ehabkost, qemu-devel, hutao, peter.huangpeng, lcapitulino, bsd,
	Anthony Liguori, pbonzini, y-goto, afaerber

On 08/20/13 03:07, Wanlong Gao wrote:

> V7-V8:
>     rebase to current master with Laszlo's V2 of OptsVisitor patch set
>     fix an adding white space line error

My R-b's that you've kept from V7, for patches 01 and 02, stand.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-21  9:22         ` Paolo Bonzini
@ 2013-08-21  9:34           ` Wanlong Gao
  2013-08-21 10:00             ` Paolo Bonzini
  0 siblings, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-21  9:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, hutao,
	y-goto, peter.huangpeng, afaerber, Wanlong Gao

On 08/21/2013 05:22 PM, Paolo Bonzini wrote:
> Il 21/08/2013 11:08, Wanlong Gao ha scritto:
>>>>> Also, do you still need a "-m" option if you use "-numa mem"?
>>>>
>>>> The "-m" options will be used to compute the memory size of each node
>>>> if the memory size of each node is not set by "-numa mem" option. This
>>>> is also be consistent with the original concept.
>>>
>>> Ok.  You didn't answer my exact question though---could you run the
>>> above command line without "-m 2048"?
>>
>> Sorry :(
> 
> No problem!
> 
>> Sure not. And if the memory size assigned by "-m" option is not equal
>> the total memory size assigned by "-numa mem,", the ACPI will write the
>> wrong table, then the guest kernel will detect this and ignore the wrong
>> ACPI numa table and assume that all of the memory are belong to one node.
> 
> Can you make the code print a warning in this case, or fail to start the
> VM (if there is "-numa mem" but the total doesn't match "-m"?

You mean this?  http://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg03065.html
I sent a patch previously to do this, if you like I can merge to this patch set.

> 
>> Would you like to run without "-m" option in mind?
> 
> This would definitely be nice to have, if it's not too complicated.
> Otherwise, the above idea will do.

Yeah, but more complicated since we should make "-numa mem,size" and "-m"
options both optional.

Thanks,
Wanlong Gao

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-21  9:33 ` Laszlo Ersek
@ 2013-08-21  9:37   ` Wanlong Gao
  0 siblings, 0 replies; 33+ messages in thread
From: Wanlong Gao @ 2013-08-21  9:37 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: ehabkost, qemu-devel, hutao, peter.huangpeng, lcapitulino, bsd,
	Anthony Liguori, pbonzini, y-goto, afaerber, Wanlong Gao

On 08/21/2013 05:33 PM, Laszlo Ersek wrote:
> On 08/20/13 03:07, Wanlong Gao wrote:
> 
>> V7-V8:
>>     rebase to current master with Laszlo's V2 of OptsVisitor patch set
>>     fix an adding white space line error
> 
> My R-b's that you've kept from V7, for patches 01 and 02, stand.

Yeah, thank you. ;)

Regards,
Wanlong Gao

> 
> Thanks
> Laszlo
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes
  2013-08-21  9:34           ` Wanlong Gao
@ 2013-08-21 10:00             ` Paolo Bonzini
  0 siblings, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2013-08-21 10:00 UTC (permalink / raw)
  To: gaowanlong
  Cc: aliguori, ehabkost, lersek, qemu-devel, lcapitulino, bsd, hutao,
	y-goto, peter.huangpeng, afaerber

Il 21/08/2013 11:34, Wanlong Gao ha scritto:
> On 08/21/2013 05:22 PM, Paolo Bonzini wrote:
>> Il 21/08/2013 11:08, Wanlong Gao ha scritto:
>>>>>> Also, do you still need a "-m" option if you use "-numa mem"?
>>>>>
>>>>> The "-m" options will be used to compute the memory size of each node
>>>>> if the memory size of each node is not set by "-numa mem" option. This
>>>>> is also be consistent with the original concept.
>>>>
>>>> Ok.  You didn't answer my exact question though---could you run the
>>>> above command line without "-m 2048"?
>>>
>>> Sorry :(
>>
>> No problem!
>>
>>> Sure not. And if the memory size assigned by "-m" option is not equal
>>> the total memory size assigned by "-numa mem,", the ACPI will write the
>>> wrong table, then the guest kernel will detect this and ignore the wrong
>>> ACPI numa table and assume that all of the memory are belong to one node.
>>
>> Can you make the code print a warning in this case, or fail to start the
>> VM (if there is "-numa mem" but the total doesn't match "-m"?
> 
> You mean this?  http://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg03065.html
> I sent a patch previously to do this, if you like I can merge to this patch set.

Yes!

>>
>>> Would you like to run without "-m" option in mind?
>>
>> This would definitely be nice to have, if it's not too complicated.
>> Otherwise, the above idea will do.
> 
> Yeah, but more complicated since we should make "-numa mem,size" and "-m"
> options both optional.

Yes.

Paolo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
@ 2013-08-21 20:59   ` Eric Blake
  2013-08-22  1:12     ` Wanlong Gao
  0 siblings, 1 reply; 33+ messages in thread
From: Eric Blake @ 2013-08-21 20:59 UTC (permalink / raw)
  To: Wanlong Gao
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, y-goto,
	pbonzini, peter.huangpeng, lersek, afaerber

[-- Attachment #1: Type: text/plain, Size: 1636 bytes --]

On 08/19/2013 07:07 PM, Wanlong Gao wrote:
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  qapi-schema.json | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 

> +##
> +# @NumaNodeOptions
> +#
> +# Create a guest NUMA node.
> +#
> +# @nodeid: #optional NUMA node ID
> +#
> +# @cpus: #optional VCPUs belong to this node
> +#
> +# @mem: #optional memory size of this node (remain as legacy)

What does (remain as legacy) mean, that I shouldn't use this parameter?
Is it something where the command line parsing code should be
translating the legacy option into the correct usage of the QMP command,
so we don't have to expose cruft?

> +#
> +# Since: 1.7
> +##
> +{ 'type': 'NumaNodeOptions',
> +  'data': {
> +   '*nodeid': 'uint16',
> +   '*cpus':   ['uint16'],
> +   '*mem':    'str' }}

Why is size passed as a 'str' instead of an integral type?  If anything,
at the QMP layer, it should be an integer representing size in bytes
(the command line and HMP are already capable of converting shorthand
like 1G into proper byte counts for use in QAPI).

> +
> +##
> +# @NumaMemOptions
> +#
> +# Set memory information of guest NUMA node.
> +#
> +# @nodeid: #optional NUMA node ID
> +#
> +# @size: #optional memory size of this node

If everything is optional, then what defaults are used if I specify
nothing?  Should nodeid be mandatory (here, and in NumaNodeOptions)?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-21 20:59   ` Eric Blake
@ 2013-08-22  1:12     ` Wanlong Gao
  2013-08-22  2:29       ` Eric Blake
  0 siblings, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-22  1:12 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, pbonzini,
	y-goto, peter.huangpeng, lersek, afaerber, Wanlong Gao

On 08/22/2013 04:59 AM, Eric Blake wrote:
> On 08/19/2013 07:07 PM, Wanlong Gao wrote:
>> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>> ---
>>  qapi-schema.json | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
> 
>> +##
>> +# @NumaNodeOptions
>> +#
>> +# Create a guest NUMA node.
>> +#
>> +# @nodeid: #optional NUMA node ID
>> +#
>> +# @cpus: #optional VCPUs belong to this node
>> +#
>> +# @mem: #optional memory size of this node (remain as legacy)
> 
> What does (remain as legacy) mean, that I shouldn't use this parameter?
> Is it something where the command line parsing code should be
> translating the legacy option into the correct usage of the QMP command,
> so we don't have to expose cruft?

OK, will remove this.

> 
>> +#
>> +# Since: 1.7
>> +##
>> +{ 'type': 'NumaNodeOptions',
>> +  'data': {
>> +   '*nodeid': 'uint16',
>> +   '*cpus':   ['uint16'],
>> +   '*mem':    'str' }}
> 
> Why is size passed as a 'str' instead of an integral type?  If anything,
> at the QMP layer, it should be an integer representing size in bytes
> (the command line and HMP are already capable of converting shorthand
> like 1G into proper byte counts for use in QAPI).

Since the original "mem" options is MB default, but "size" type is byte default,
so we should pass a "str" first to be consistent with original option.

> 
>> +
>> +##
>> +# @NumaMemOptions
>> +#
>> +# Set memory information of guest NUMA node.
>> +#
>> +# @nodeid: #optional NUMA node ID
>> +#
>> +# @size: #optional memory size of this node
> 
> If everything is optional, then what defaults are used if I specify
> nothing?  Should nodeid be mandatory (here, and in NumaNodeOptions)?

The defaults are all consistent with original behaviour. If nodeid is
omitted, the option will be assigned node by node from node0.

Thanks,
Wanlong Gao


> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-22  1:12     ` Wanlong Gao
@ 2013-08-22  2:29       ` Eric Blake
  2013-08-22  3:16         ` Wanlong Gao
  2013-08-22 19:21         ` Paolo Bonzini
  0 siblings, 2 replies; 33+ messages in thread
From: Eric Blake @ 2013-08-22  2:29 UTC (permalink / raw)
  To: gaowanlong
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, pbonzini,
	y-goto, peter.huangpeng, lersek, afaerber

[-- Attachment #1: Type: text/plain, Size: 1521 bytes --]

On 08/21/2013 07:12 PM, Wanlong Gao wrote:

>>> +   '*mem':    'str' }}
>>
>> Why is size passed as a 'str' instead of an integral type?  If anything,
>> at the QMP layer, it should be an integer representing size in bytes
>> (the command line and HMP are already capable of converting shorthand
>> like 1G into proper byte counts for use in QAPI).
> 
> Since the original "mem" options is MB default, but "size" type is byte default,
> so we should pass a "str" first to be consistent with original option.

No. HMP is human-friendly - it can default to M.  QMP is
machine-friendly - it should default to bytes and take an 'int' rather
than a 'str'.  Part of the glue between HMP and QMP is converting from
human-friendly to machine-friendly, so that QMP doesn't have to carry cruft.


>>> +#
>>> +# @nodeid: #optional NUMA node ID
>>> +#
>>> +# @size: #optional memory size of this node
>>
>> If everything is optional, then what defaults are used if I specify
>> nothing?  Should nodeid be mandatory (here, and in NumaNodeOptions)?
> 
> The defaults are all consistent with original behaviour. If nodeid is
> omitted, the option will be assigned node by node from node0.

What will be assigned?  If I omit both nodeid and size, there's nothing
left in the object I'm passing.  Just because HMP can do sane defaults
doesn't mean that QMP needs to mark all fields as optional.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-22  2:29       ` Eric Blake
@ 2013-08-22  3:16         ` Wanlong Gao
  2013-08-22  8:46           ` Laszlo Ersek
  2013-08-22 19:21         ` Paolo Bonzini
  1 sibling, 1 reply; 33+ messages in thread
From: Wanlong Gao @ 2013-08-22  3:16 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, pbonzini,
	y-goto, peter.huangpeng, lersek, afaerber, Wanlong Gao

On 08/22/2013 10:29 AM, Eric Blake wrote:
> On 08/21/2013 07:12 PM, Wanlong Gao wrote:
> 
>>>> +   '*mem':    'str' }}
>>>
>>> Why is size passed as a 'str' instead of an integral type?  If anything,
>>> at the QMP layer, it should be an integer representing size in bytes
>>> (the command line and HMP are already capable of converting shorthand
>>> like 1G into proper byte counts for use in QAPI).
>>
>> Since the original "mem" options is MB default, but "size" type is byte default,
>> so we should pass a "str" first to be consistent with original option.
> 
> No. HMP is human-friendly - it can default to M.  QMP is
> machine-friendly - it should default to bytes and take an 'int' rather
> than a 'str'.  Part of the glue between HMP and QMP is converting from
> human-friendly to machine-friendly, so that QMP doesn't have to carry cruft.

This "mem" options is only for command line options, I can't understand you
are saying QMP command here. Because the original "mem" option treat "1024"
as "1024MB", but if I set this to "size" type, this "mem" options will
treat "1024" as "124B". So I should pass a str first and make it to "MB"
default in the options parse function to be consistent with original one.

> 
> 
>>>> +#
>>>> +# @nodeid: #optional NUMA node ID
>>>> +#
>>>> +# @size: #optional memory size of this node
>>>
>>> If everything is optional, then what defaults are used if I specify
>>> nothing?  Should nodeid be mandatory (here, and in NumaNodeOptions)?
>>
>> The defaults are all consistent with original behaviour. If nodeid is
>> omitted, the option will be assigned node by node from node0.
> 
> What will be assigned?  If I omit both nodeid and size, there's nothing

If the "-m" option assigned the total memory size is 2G, then if you omit
both nodeid and memory size in the options, for example two "-numa mem,"
options here, it will split total memory across these two node to :
node0 1G
node1 1G

This is the original behaviour and I didn't change any.

Thanks,
Wanlong Gao

> left in the object I'm passing.  Just because HMP can do sane defaults
> doesn't mean that QMP needs to mark all fields as optional.
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-22  3:16         ` Wanlong Gao
@ 2013-08-22  8:46           ` Laszlo Ersek
  2013-08-22 16:14             ` Eric Blake
  0 siblings, 1 reply; 33+ messages in thread
From: Laszlo Ersek @ 2013-08-22  8:46 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, pbonzini,
	y-goto, peter.huangpeng, afaerber, gaowanlong

On 08/22/13 05:16, Wanlong Gao wrote:
> On 08/22/2013 10:29 AM, Eric Blake wrote:
>> On 08/21/2013 07:12 PM, Wanlong Gao wrote:
>>
>>>>> +   '*mem':    'str' }}
>>>>
>>>> Why is size passed as a 'str' instead of an integral type?  If anything,
>>>> at the QMP layer, it should be an integer representing size in bytes
>>>> (the command line and HMP are already capable of converting shorthand
>>>> like 1G into proper byte counts for use in QAPI).
>>>
>>> Since the original "mem" options is MB default, but "size" type is byte default,
>>> so we should pass a "str" first to be consistent with original option.
>>
>> No. HMP is human-friendly - it can default to M.  QMP is
>> machine-friendly - it should default to bytes and take an 'int' rather
>> than a 'str'.  Part of the glue between HMP and QMP is converting from
>> human-friendly to machine-friendly, so that QMP doesn't have to carry cruft.
> 
> This "mem" options is only for command line options, I can't understand you
> are saying QMP command here. Because the original "mem" option treat "1024"
> as "1024MB", but if I set this to "size" type, this "mem" options will
> treat "1024" as "124B". So I should pass a str first and make it to "MB"
> default in the options parse function to be consistent with original one.

Yes. This part of the schema is not for exposure over QMP, it just
generates stuff for OptsVisitor, and it must remain compatible with the
original, manual parsing of the option.

This came up for V6:

http://thread.gmane.org/gmane.comp.emulators.qemu/225678/focus=225714

Laszlo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-22  8:46           ` Laszlo Ersek
@ 2013-08-22 16:14             ` Eric Blake
  2013-08-22 16:36               ` Laszlo Ersek
  0 siblings, 1 reply; 33+ messages in thread
From: Eric Blake @ 2013-08-22 16:14 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, pbonzini,
	y-goto, peter.huangpeng, afaerber, gaowanlong

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

On 08/22/2013 02:46 AM, Laszlo Ersek wrote:
> Yes. This part of the schema is not for exposure over QMP, it just
> generates stuff for OptsVisitor, and it must remain compatible with the
> original, manual parsing of the option.
> 
> This came up for V6:
> 
> http://thread.gmane.org/gmane.comp.emulators.qemu/225678/focus=225714

My fault for coming into the conversation late, but a note to that
effect in the commit log, and/or in the description of why this type is
listed in the qapi document, would be handy.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-22 16:14             ` Eric Blake
@ 2013-08-22 16:36               ` Laszlo Ersek
  0 siblings, 0 replies; 33+ messages in thread
From: Laszlo Ersek @ 2013-08-22 16:36 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, pbonzini,
	y-goto, peter.huangpeng, afaerber, gaowanlong

On 08/22/13 18:14, Eric Blake wrote:
> On 08/22/2013 02:46 AM, Laszlo Ersek wrote:
>> Yes. This part of the schema is not for exposure over QMP, it just
>> generates stuff for OptsVisitor, and it must remain compatible with the
>> original, manual parsing of the option.
>>
>> This came up for V6:
>>
>> http://thread.gmane.org/gmane.comp.emulators.qemu/225678/focus=225714
> 
> My fault for coming into the conversation late, but a note to that
> effect in the commit log, and/or in the description of why this type is
> listed in the qapi document, would be handy.

I agree. We should probably tack a banner to each such structure in the
qapi schema json.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  2013-08-22  2:29       ` Eric Blake
  2013-08-22  3:16         ` Wanlong Gao
@ 2013-08-22 19:21         ` Paolo Bonzini
  1 sibling, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2013-08-22 19:21 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, ehabkost, hutao, qemu-devel, lcapitulino, bsd, y-goto,
	peter.huangpeng, lersek, afaerber, gaowanlong

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 22/08/2013 04:29, Eric Blake ha scritto:
>>>>> If everything is optional, then what defaults are used if
>>>>> I specify nothing?  Should nodeid be mandatory (here, and
>>>>> in NumaNodeOptions)?
>>> 
>>> The defaults are all consistent with original behaviour. If 
>>> nodeid is omitted, the option will be assigned node by node 
>>> from node0.
> What will be assigned?  If I omit both nodeid and size, there's 
> nothing left in the object I'm passing.  Just because HMP can do 
> sane defaults doesn't mean that QMP needs to mark all fields as 
> optional.

Even though this is for the command-line, I agree that both should be
mandatory.  The default (splitting memory equally across nodes) can be
specified by not having "-numa mem" at all, so there is no need to
have short versions of the "-numa mem" command line.

Paolo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSFmTSAAoJEBvWZb6bTYbyzUsP/3BhPGBHfzNsFHdwPL6aiBW4
67TJJ/X9GB4wjav6yjtKtBGuDNOGmEO8BkyJZVNUzw051t74uPpYzDIgd7QvrsU/
nMFRFX6q3vRUxX7k9L5RP3OkJbcAgaq8YQ135XNh59bOEd/VOeTrrEtgqyVPo45h
6tt0JSlGDPF9qc+jGoqgQN8522nvV8mcF/XhJdQ7DqttpAP75ECxViKFKezXwrsh
H3dX6v0SbWBMAlihldxEMte8hlKNB/Asb3CDQz5L2ySaPGHcwOI06wSVKGD9slbv
4SSiGg2XMdPx0VKh0ozlKmXE46+hlDJkxi7JLEuscSmZL9gZwFpenz2+29OBk2MM
g8fwxKkm36jJE7cnn/z3TPiTVZ7sJm8LvtRW2183CC5I9LpTIUrCZxxP/06ZOREq
2fH41A3PyxDOvhOwp8B3XDAnOkmklSr86hiZR+s3I+59uPa9KzOY2WFeG/6w90uq
6EmZNbmkLrZYfns7wk24YRhH9DAgTFP9GmceaJu0lEbk41INeZggswugwPvrFNI3
bPsXacwK5Y4T4ZkRs4vyaw4Dn2B732D2CMvmPJ+EpKeVIc+8tKMXNtuFKybiraQi
zrlHk4zGjeUZwW8+cx8xkHe3qyKUwbYFuRvDqSKXYRrQMzzJNMf/rNGb4JcDHiH6
BmVejHjaYKPJKA+LF3hB
=Oqhm
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2013-08-22 19:22 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-20  1:07 [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions Wanlong Gao
2013-08-21 20:59   ` Eric Blake
2013-08-22  1:12     ` Wanlong Gao
2013-08-22  2:29       ` Eric Blake
2013-08-22  3:16         ` Wanlong Gao
2013-08-22  8:46           ` Laszlo Ersek
2013-08-22 16:14             ` Eric Blake
2013-08-22 16:36               ` Laszlo Ersek
2013-08-22 19:21         ` Paolo Bonzini
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 02/11] NUMA: split -numa option Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 03/11] NUMA: move numa related code to numa.c Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 04/11] NUMA: Add numa_info structure to contain numa nodes info Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 05/11] NUMA: Add Linux libnuma detection Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 06/11] NUMA: parse guest numa nodes memory policy Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 07/11] NUMA: set " Wanlong Gao
2013-08-20 13:41   ` Andrew Jones
2013-08-21  2:43     ` Wanlong Gao
2013-08-21  7:15       ` Andrew Jones
2013-08-21  7:23         ` Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 08/11] NUMA: add qmp command set-mem-policy to set memory policy for NUMA node Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 09/11] NUMA: add hmp command set-mem-policy Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 10/11] NUMA: add qmp command query-numa Wanlong Gao
2013-08-20  1:07 ` [Qemu-devel] [PATCH V8 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa Wanlong Gao
2013-08-20 13:43 ` [Qemu-devel] [PATCH V8 00/11] Add support for binding guest numa nodes to host numa nodes Paolo Bonzini
2013-08-21  1:22   ` Wanlong Gao
2013-08-21  9:00     ` Paolo Bonzini
2013-08-21  9:08       ` Wanlong Gao
2013-08-21  9:22         ` Paolo Bonzini
2013-08-21  9:34           ` Wanlong Gao
2013-08-21 10:00             ` Paolo Bonzini
2013-08-21  9:33 ` Laszlo Ersek
2013-08-21  9:37   ` Wanlong Gao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).