From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:44692) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Sgjus-0006jr-11 for qemu-devel@nongnu.org; Mon, 18 Jun 2012 17:55:23 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Sgjup-0004yT-CM for qemu-devel@nongnu.org; Mon, 18 Jun 2012 17:55:21 -0400
Received: from g6t0185.atlanta.hp.com ([15.193.32.62]:1278) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Sgjup-0004xS-5f for qemu-devel@nongnu.org; Mon, 18 Jun 2012 17:55:19 -0400
Message-ID: <4FDFA3C2.9030602@hp.com>
Date: Mon, 18 Jun 2012 14:55:14 -0700
From: Chegu Vinod
MIME-Version: 1.0
References: <1339963951-7798-1-git-send-email-chegu_vinod@hp.com> <20120618202956.GA32417@otherpad.lan.raisama.net>
In-Reply-To: <20120618202956.GA32417@otherpad.lan.raisama.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] Fixes related to processing of qemu's -numa option
To: Eduardo Habkost
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org

On 6/18/2012 1:29 PM, Eduardo Habkost wrote:
> On Sun, Jun 17, 2012 at 01:12:31PM -0700, Chegu Vinod wrote:
>> The -numa option to qemu is used to create [fake] numa nodes
>> and expose them to the guest OS instance.
>>
>> There are a couple of issues with the -numa option:
>>
>> a) The maximum number of VCPUs that can be specified for a guest
>>    through qemu's -numa option is 64. Due to a typecasting issue,
>>    when the number of VCPUs is > 32 the VCPUs don't show up
>>    under the specified [fake] numa nodes.
>>
>> b) KVM currently has support for 160 VCPUs per guest, while
>>    qemu's -numa option only supports up to 64 VCPUs per guest.
>>
>> This patch addresses these two issues. [ Note: This
>> patch has been verified by Eduardo Habkost. ]
> I was going to add a Tested-by line, but this patch breaks the automatic
> round-robin assignment when no CPU is added to any node on the
> command-line. Pl. see below.
>
> Additional questions below:
>
> [...]
>> diff --git a/cpus.c b/cpus.c
>> index b182b3d..f9cee60 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1145,7 +1145,7 @@ void set_numa_modes(void)
>>
>>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>          for (i = 0; i < nb_numa_nodes; i++) {
>> -            if (node_cpumask[i] & (1 << env->cpu_index)) {
>> +            if (CPU_ISSET_S(env->cpu_index, cpumask_size, node_cpumask[i])) {
>>                  env->numa_node = i;
>>              }
>>          }
>> diff --git a/hw/pc.c b/hw/pc.c
>> index 8368701..f0c3665 100644
>> --- a/hw/pc.c
>> +++ b/hw/pc.c
>> @@ -639,7 +639,7 @@ static void *bochs_bios_init(void)
>>      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>>      for (i = 0; i < max_cpus; i++) {
>>          for (j = 0; j < nb_numa_nodes; j++) {
>> -            if (node_cpumask[j] & (1 << i)) {
>> +            if (CPU_ISSET_S(i, cpumask_size, node_cpumask[j])) {
>>                  numa_fw_cfg[i + 1] = cpu_to_le64(j);
>>                  break;
>>              }
>> diff --git a/sysemu.h b/sysemu.h
>> index bc2c788..6e4d342 100644
>> --- a/sysemu.h
>> +++ b/sysemu.h
>> @@ -9,6 +9,7 @@
>>  #include "qapi-types.h"
>>  #include "notify.h"
>>  #include "main-loop.h"
>> +#include <sched.h>
>>
>>  /* vl.c */
>>
>> @@ -133,9 +134,11 @@ extern uint8_t qemu_extra_params_fw[2];
>>  extern QEMUClock *rtc_clock;
>>
>>  #define MAX_NODES 64
>> +#define KVM_MAX_VCPUS 254
>
> Do we really need this constant? Why not just use max_cpus when
> allocating the CPU sets, instead?

Hmm.... I had thought about this earlier too. max_cpus was not getting
set at the point where the allocations were being done. I have now moved
that code to a bit later and switched to using max_cpus.
>> extern int nb_numa_nodes;
>> extern uint64_t node_mem[MAX_NODES];
>> -extern uint64_t node_cpumask[MAX_NODES];
>> +extern cpu_set_t *node_cpumask[MAX_NODES];
>> +extern size_t cpumask_size;
>>
>> #define MAX_OPTION_ROMS 16
>> typedef struct QEMUOptionRom {
>> diff --git a/vl.c b/vl.c
>> index 204d85b..1906412 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -28,6 +28,7 @@
>>  #include
>>  #include
>>  #include
>> +#include <sched.h>
>>
>>  /* Needed early for CONFIG_BSD etc. */
>>  #include "config-host.h"
>> @@ -240,7 +241,8 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
>>
>>  int nb_numa_nodes;
>>  uint64_t node_mem[MAX_NODES];
>> -uint64_t node_cpumask[MAX_NODES];
>> +cpu_set_t *node_cpumask[MAX_NODES];
>> +size_t cpumask_size;
>>
>>  uint8_t qemu_uuid[16];
>>
>> @@ -950,6 +952,9 @@ static void numa_add(const char *optarg)
>>      char *endptr;
>>      unsigned long long value, endvalue;
>>      int nodenr;
>> +    int i;
>> +
>> +    value = endvalue = 0;
>>
>>      optarg = get_opt_name(option, 128, optarg, ',') + 1;
>>      if (!strcmp(option, "node")) {
>> @@ -970,27 +975,17 @@ static void numa_add(const char *optarg)
>>          }
>>          node_mem[nodenr] = sval;
>>      }
>> -    if (get_param_value(option, 128, "cpus", optarg) == 0) {
>> -        node_cpumask[nodenr] = 0;
>> -    } else {
>> +    if (get_param_value(option, 128, "cpus", optarg) != 0) {
>>          value = strtoull(option, &endptr, 10);
>> -        if (value >= 64) {
>> -            value = 63;
>> -            fprintf(stderr, "only 64 CPUs in NUMA mode supported.\n");
>> +        if (*endptr == '-') {
>> +            endvalue = strtoull(endptr+1, &endptr, 10);
>>          } else {
>> -            if (*endptr == '-') {
>> -                endvalue = strtoull(endptr+1, &endptr, 10);
>> -                if (endvalue >= 63) {
>> -                    endvalue = 62;
>> -                    fprintf(stderr,
>> -                            "only 63 CPUs in NUMA mode supported.\n");
>> -                }
>> -                value = (2ULL << endvalue) - (1ULL << value);
>> -            } else {
>> -                value = 1ULL << value;
>> -            }
>> +            endvalue = value;
>> +        }
>> +
>> +        for (i = value; i <= endvalue; ++i) {
>> +            CPU_SET_S(i, cpumask_size, node_cpumask[nodenr]);
> What if endvalue is larger than the cpu mask size we allocated?

I can add a check (endvalue >= max_cpus) and print an error. Should we
force-set endvalue to max_cpus-1 in that case?

>>          }
>> -        node_cpumask[nodenr] = value;
>>      }
>>      nb_numa_nodes++;
>> }
>> @@ -2331,7 +2326,9 @@ int main(int argc, char **argv, char **envp)
>>
>>      for (i = 0; i < MAX_NODES; i++) {
>>          node_mem[i] = 0;
>> -        node_cpumask[i] = 0;
>> +        node_cpumask[i] = CPU_ALLOC(KVM_MAX_VCPUS);
>> +        cpumask_size = CPU_ALLOC_SIZE(KVM_MAX_VCPUS);
>> +        CPU_ZERO_S(cpumask_size, node_cpumask[i]);
>>      }
>>
>>      nb_numa_nodes = 0;
>> @@ -3465,8 +3462,9 @@ int main(int argc, char **argv, char **envp)
>>      }
>>
>>      for (i = 0; i < nb_numa_nodes; i++) {
>> -        if (node_cpumask[i] != 0)
>> +        if (node_cpumask[i] != NULL) {
>
> This will be true for every node (as you preallocate all the CPU
> sets in the beginning), so the loop will always end with i==0, in turn
> unconditionally disabling the round-robin CPU assignment code below.
> You can use CPU_COUNT_S, instead.

Argh... I should have tested this. My bad! Thanks for catching this.
I made the change and tested it, and it's doing the round-robin
assignment now.

Will post the next version of the patch soon.

Thanks for your feedback.
Vinod

>>              break;
>> +        }
>>      }
>>      /* assigning the VCPUs round-robin is easier to implement, guest OSes
>>       * must cope with this anyway, because there are BIOSes out there in
>> @@ -3474,7 +3472,7 @@
>>       */
>>      if (i == nb_numa_nodes) {
>>          for (i = 0; i < max_cpus; i++) {
>> -            node_cpumask[i % nb_numa_nodes] |= 1 << i;
>> +            CPU_SET_S(i, cpumask_size, node_cpumask[i % nb_numa_nodes]);
>>          }
>>      }
>> }
>> --
>> 1.7.1
>>