From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 19 Oct 2015 14:34:47 +1100
From: Sam Bobroff
Message-ID: <20151019033447.GA5977@tungsten.ozlabs.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: [Qemu-devel] PPC VCPU ID packing via KVM_CAP_PPC_SMT
To: qemu-ppc@nongnu.org
Cc: qemu-devel@nongnu.org

Hi everyone,

It's currently possible to configure QEMU and KVM such that (on a
Power 7 or 8 host) users are unable to create as many VCPUs as they
might reasonably expect. I'll outline one fairly straightforward
solution (below) and I would welcome feedback: Does this seem a
reasonable approach? Are there alternatives?

The issue:

The behaviour is caused by three things:

* QEMU limits the total number (count) of VCPUs based on the machine
  type (hard coded to 256 for pseries).
  * See hw/ppc/spapr.c spapr_machine_class_init()
* KVM limits the highest VCPU ID to CONFIG_NR_CPUS (2048 for
  pseries_defconfig).
  * See arch/powerpc/configs/pseries_defconfig
  * and arch/powerpc/include/asm/kvm_host.h
* If the host SMT mode is higher than the guest SMT mode when creating
  VCPUs, QEMU must "pad out" the VCPU IDs to align the VCPUs with
  physical cores (KVM doesn't know which SMT mode the guest wants).
  * See target-ppc/translate_init.c ppc_cpu_realizefn().

In the most pathological case the guest is SMT 1 (smp_threads = 1) and
the host is SMT 8 (max_smt = 8), which causes the VCPU IDs to be
spaced 8 apart (e.g. 0, 8, 16, ...). This doesn't produce any strange
behaviour with default limits, but consider the case where
CONFIG_NR_CPUS is set to 1024 (with the same SMT modes as above): when
the 129th VCPU is created, its VCPU ID will be 128 * 8 = 1024, which
will be rejected by KVM. This could be surprising because only 128
VCPUs can be created even though max_cpus = 256 and CONFIG_NR_CPUS =
1024.

Proposal:

One solution is to provide a way for QEMU to inform KVM of the guest's
SMT mode. This would allow KVM to place the VCPUs correctly within
physical cores without any VCPU ID padding. One way to do that would
be for KVM to allow QEMU to set the (currently read-only)
KVM_CAP_PPC_SMT capability to the required guest SMT mode.

The simplest implementation would seem to be to add a new version of
the pseries machine and have it require that the kernel support
setting KVM_CAP_PPC_SMT, but would this be a reasonable restriction?
Should we add a property (where?) to allow the new machine version to
run without the new kernel feature? Could that property default to
"on" or "on if supported by the kernel" without it becoming too
complicated or causing trouble during migration?

Thanks,
Sam.