From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:45864) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YYdww-0001tD-FV for qemu-devel@nongnu.org; Thu, 19 Mar 2015 13:09:44 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YYdwp-0002VY-3Q for qemu-devel@nongnu.org; Thu, 19 Mar 2015 13:09:38 -0400
Received: from mx1.redhat.com ([209.132.183.28]:47732) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YYdwo-0002VG-Ss for qemu-devel@nongnu.org; Thu, 19 Mar 2015 13:09:31 -0400
From: Igor Mammedov
Date: Thu, 19 Mar 2015 17:09:21 +0000
Message-Id: <1426784962-7541-2-git-send-email-imammedo@redhat.com>
In-Reply-To: <1426784962-7541-1-git-send-email-imammedo@redhat.com>
References: <1426784962-7541-1-git-send-email-imammedo@redhat.com>
Subject: [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU to node mapping
To: qemu-devel@nongnu.org
Cc: ehabkost@redhat.com, afaerber@suse.de

The current default round-robin distribution of VCPUs among NUMA nodes
can be wrong for multi-core/multi-thread CPUs: it may place cores from
the same socket on different nodes, which confuses guests about the
topology.

Allow a machine to override the default mapping by providing a
MachineClass->cpu_index_to_socket_id() callback, which lets it group
the VCPUs of a socket on the same NUMA node.

Signed-off-by: Igor Mammedov
---
v3:
  - split out numa/machine change into a separate patch
---
 include/hw/boards.h   |  5 +++++
 include/sysemu/numa.h |  3 ++-
 numa.c                | 18 +++++++++++++-----
 vl.c                  |  2 +-
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1feea2b..78838d1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -82,6 +82,10 @@ bool machine_mem_merge(MachineState *machine);
  *    of HotplugHandler object, which handles hotplug operation
  *    for a given @dev. It may return NULL if @dev doesn't require
  *    any actions to be performed by hotplug handler.
+ * @cpu_index_to_socket_id:
+ *    used to provide @cpu_index to socket number mapping, allowing
+ *    a machine to group CPU threads belonging to the same socket/package
+ *    Returns: the socket number that the given cpu_index belongs to.
  */
 struct MachineClass {
     /*< private >*/
@@ -118,6 +122,7 @@ struct MachineClass {
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
+    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
 };
 
 /**
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5633b85..6523b4d 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,6 +6,7 @@
 #include "qemu/option.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/hostmem.h"
+#include "hw/boards.h"
 
 extern int nb_numa_nodes;   /* Number of NUMA nodes */
 
@@ -16,7 +17,7 @@ typedef struct node_info {
     bool present;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(void);
+void parse_numa_opts(MachineClass *mc);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
diff --git a/numa.c b/numa.c
index ffbec68..f1f571a 100644
--- a/numa.c
+++ b/numa.c
@@ -165,7 +165,7 @@ error:
     return -1;
 }
 
-void parse_numa_opts(void)
+void parse_numa_opts(MachineClass *mc)
 {
     int i;
 
@@ -233,13 +233,21 @@ void parse_numa_opts(void)
                 break;
             }
         }
-        /* assigning the VCPUs round-robin is easier to implement, guest OSes
-         * must cope with this anyway, because there are BIOSes out there in
-         * real machines which also use this scheme.
+        /* Historically VCPUs were assigned in round-robin order to NUMA
+         * nodes. However, this causes issues with guests that do not
+         * handle it well when cores/threads from a multicore CPU appear
+         * on different nodes. So allow boards to override the default
+         * distribution rule, grouping VCPUs by socket so that VCPUs from
+         * the same socket are on the same node.
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
+                unsigned node_id = i % nb_numa_nodes;
+                if (mc->cpu_index_to_socket_id) {
+                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
+                }
+
+                set_bit(i, numa_info[node_id].node_cpu);
             }
         }
     }
diff --git a/vl.c b/vl.c
index 69617d6..75ec292 100644
--- a/vl.c
+++ b/vl.c
@@ -4170,7 +4170,7 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    parse_numa_opts();
+    parse_numa_opts(machine_class);
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
         exit(1);
-- 
1.8.3.1
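
For illustration only, the node selection done in parse_numa_opts() can be
tried outside QEMU with a standalone sketch like the one below. The names
cpu_to_node(), example_cpu_index_to_socket_id() and the 2 cores x 2 threads
per socket topology are assumptions made up for this example; this patch
does not add a callback implementation for any board.

```c
#include <stddef.h>

/* Hypothetical topology, chosen only for this example. */
enum { EXAMPLE_CORES_PER_SOCKET = 2, EXAMPLE_THREADS_PER_CORE = 2 };

/* Hypothetical board callback: maps a cpu_index to its socket number,
 * assuming cpu_index values are laid out socket by socket. */
static unsigned example_cpu_index_to_socket_id(unsigned cpu_index)
{
    return cpu_index / (EXAMPLE_CORES_PER_SOCKET * EXAMPLE_THREADS_PER_CORE);
}

/* Same decision as the patched loop body: plain round-robin by default,
 * round-robin over socket numbers when a callback is provided. */
static unsigned cpu_to_node(unsigned cpu_index, unsigned nb_nodes,
                            unsigned (*to_socket)(unsigned cpu_index))
{
    unsigned node_id = cpu_index % nb_nodes;

    if (to_socket) {
        node_id = to_socket(cpu_index) % nb_nodes;
    }
    return node_id;
}
```

With 8 VCPUs and 2 nodes, passing NULL as the callback puts VCPUs 0, 2, 4, 6
on node 0 and 1, 3, 5, 7 on node 1, so the threads of one socket are split
across nodes. Passing example_cpu_index_to_socket_id keeps VCPUs 0-3 on
node 0 and VCPUs 4-7 on node 1. When there are more sockets than nodes, the
modulo wraps whole sockets around, so a socket is still never split.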