From: Eduardo Habkost <ehabkost@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
"Andreas Färber" <afaerber@suse.de>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: [Qemu-devel] [PULL 4/6] numa: introduce machine callback for VCPU to node mapping
Date: Thu, 19 Mar 2015 16:26:12 -0300 [thread overview]
Message-ID: <1426793174-19012-5-git-send-email-ehabkost@redhat.com> (raw)
In-Reply-To: <1426793174-19012-1-git-send-email-ehabkost@redhat.com>
From: Igor Mammedov <imammedo@redhat.com>
The current default round-robin distribution of VCPUs among NUMA
nodes can be wrong for multi-core/multi-thread CPUs: it confuses
guests about the topology when cores from the same socket end up
on different nodes.

Allow a machine to override the default mapping by providing a
MachineClass::cpu_index_to_socket_id() callback, so it can group
VCPUs from the same socket on the same NUMA node.
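
As an illustration, a minimal sketch of such a callback (hypothetical
board code, not part of this patch; it assumes cpu_index enumerates
threads within cores within sockets, and uses the smp_cores/smp_threads
globals set from -smp):

    static unsigned my_board_cpu_index_to_socket_id(unsigned cpu_index)
    {
        /* With a fixed threads-per-core and cores-per-socket topology,
         * consecutive cpu_index values fill a socket before moving on. */
        return cpu_index / (smp_cores * smp_threads);
    }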
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
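Note (below the fold): a board opts in by setting the callback from its
class_init; a minimal sketch, where my_machine_class_init and
my_board_cpu_index_to_socket_id (see the sketch above) are hypothetical
names:

    static void my_machine_class_init(ObjectClass *oc, void *data)
    {
        MachineClass *mc = MACHINE_CLASS(oc);

        mc->cpu_index_to_socket_id = my_board_cpu_index_to_socket_id;
    }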
include/hw/boards.h | 5 +++++
include/sysemu/numa.h | 3 ++-
numa.c | 18 +++++++++++++-----
vl.c | 2 +-
4 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1feea2b..78838d1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -82,6 +82,10 @@ bool machine_mem_merge(MachineState *machine);
* of HotplugHandler object, which handles hotplug operation
* for a given @dev. It may return NULL if @dev doesn't require
* any actions to be performed by hotplug handler.
+ * @cpu_index_to_socket_id:
+ *    used to provide a mapping from @cpu_index to socket number,
+ *    allowing a machine to group CPU threads belonging to the same
+ *    socket/package. Returns: the socket number @cpu_index belongs to.
*/
struct MachineClass {
/*< private >*/
@@ -118,6 +122,7 @@ struct MachineClass {
HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
DeviceState *dev);
+ unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
};
/**
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5633b85..6523b4d 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,6 +6,7 @@
#include "qemu/option.h"
#include "sysemu/sysemu.h"
#include "sysemu/hostmem.h"
+#include "hw/boards.h"
extern int nb_numa_nodes; /* Number of NUMA nodes */
@@ -16,7 +17,7 @@ typedef struct node_info {
bool present;
} NodeInfo;
extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(void);
+void parse_numa_opts(MachineClass *mc);
void numa_post_machine_init(void);
void query_numa_node_mem(uint64_t node_mem[]);
extern QemuOptsList qemu_numa_opts;
diff --git a/numa.c b/numa.c
index 518aedd..fe74e1e 100644
--- a/numa.c
+++ b/numa.c
@@ -202,7 +202,7 @@ static void validate_numa_cpus(void)
}
}
-void parse_numa_opts(void)
+void parse_numa_opts(MachineClass *mc)
{
int i;
@@ -270,13 +270,21 @@ void parse_numa_opts(void)
break;
}
}
- /* assigning the VCPUs round-robin is easier to implement, guest OSes
- * must cope with this anyway, because there are BIOSes out there in
- * real machines which also use this scheme.
+    /* Historically VCPUs were assigned to NUMA nodes in round-robin
+     * order. However, this confuses guests when cores/threads from
+     * the same multi-core CPU end up on different nodes. So allow
+     * boards to override the default distribution rule, grouping
+     * VCPUs by socket so that VCPUs from the same socket land on
+     * the same node.
*/
if (i == nb_numa_nodes) {
for (i = 0; i < max_cpus; i++) {
- set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
+ unsigned node_id = i % nb_numa_nodes;
+ if (mc->cpu_index_to_socket_id) {
+ node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
+ }
+
+ set_bit(i, numa_info[node_id].node_cpu);
}
}
diff --git a/vl.c b/vl.c
index 69617d6..75ec292 100644
--- a/vl.c
+++ b/vl.c
@@ -4170,7 +4170,7 @@ int main(int argc, char **argv, char **envp)
default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
- parse_numa_opts();
+ parse_numa_opts(machine_class);
if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
exit(1);
--
2.1.0
Thread overview: 8+ messages
2015-03-19 19:26 [Qemu-devel] [PULL 0/6] NUMA queue 2015-03-19 Eduardo Habkost
2015-03-19 19:26 ` [Qemu-devel] [PULL 1/6] numa: Fix off-by-one error at MAX_CPUMASK_BITS check Eduardo Habkost
2015-03-19 19:26 ` [Qemu-devel] [PULL 2/6] numa: Reject CPU indexes > max_cpus Eduardo Habkost
2015-03-19 19:26 ` [Qemu-devel] [PULL 3/6] numa: Reject configuration if CPU appears on multiple nodes Eduardo Habkost
2015-03-19 19:26 ` Eduardo Habkost [this message]
2015-03-19 19:26 ` [Qemu-devel] [PULL 5/6] pc: fix default VCPU to NUMA node mapping Eduardo Habkost
2015-03-19 19:26 ` [Qemu-devel] [PULL 6/6] numa: Print warning if no node is assigned to a CPU Eduardo Habkost
2015-03-20 12:25 ` [Qemu-devel] [PULL 0/6] NUMA queue 2015-03-19 Peter Maydell