From: David Gibson <david@gibson.dropbear.id.au>
To: Greg Kurz <groug@kaod.org>
Cc: "Paul Mackerras" <paulus@ozlabs.org>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Cédric Le Goater" <clg@kaod.org>,
kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] KVM: PPC: Book3S HV: Tunable to configure maximum # of vCPUs per VM
Date: Wed, 11 Sep 2019 12:30:48 +1000
Message-ID: <20190911023048.GI30740@umbus.fritz.box>
In-Reply-To: <156813417397.1880979.6162333671088177553.stgit@bahia.tls.ibm.com>
On Tue, Sep 10, 2019 at 06:49:34PM +0200, Greg Kurz wrote:
> Each vCPU of a VM allocates a XIVE VP in OPAL which is associated with
> 8 event queue (EQ) descriptors, one for each priority. A POWER9 socket
> can handle a maximum of 1M event queues.
>
> The powernv platform allocates NR_CPUS (== 2048) VPs for the hypervisor,
> and each XIVE KVM device allocates KVM_MAX_VCPUS (== 2048) VPs. This means
> that on a bi-socket system, we can create at most:
>
> (2 * 1M) / (8 * 2048) - 1 == 127 XIVE or XICS-on-XIVE KVM devices
>
> i.e., start at most 127 VMs that benefit from an in-kernel interrupt
> controller. Subsequent VMs have to rely on the much slower
> userspace-emulated XIVE device in QEMU.
>
> This is problematic as one can legitimately expect to start the same
> number of mono-CPU VMs as the number of HW threads available on the
> system (eg, 144 on Witherspoon).
>
> I'm not aware of any userspace supporting more than 1024 vCPUs. It thus
> seems overkill to consume that many VPs per VM. Ideally we would even
> want userspace to be able to tell KVM the maximum number of vCPUs
> when creating the VM.
>
> For now, provide a module parameter to configure the maximum number of
> vCPUs per VM. While here, reduce the default value to 1024 to match the
> current limit in QEMU. This number is only used by the XIVE KVM devices,
> but some more users of KVM_MAX_VCPUS could possibly be converted.
>
> With this change, I could successfully run 230 mono-CPU VMs on a
> Witherspoon system using the official skiboot-6.3.
>
> I could even run more VMs by using an upstream skiboot containing this
> fix, which spreads interrupts better between sockets:
>
> e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()")
>
> MAX VCPUS | MAX VMS
> ----------+---------
> 1024 | 255
> 512 | 511
> 256 | 1023 (*)
>
> (*) the system was barely usable because of the extreme load and
> memory exhaustion, but the VMs did start.
Hrm. I don't love the idea of using a global tunable for this,
although I guess it could have some use. It's another global system
property that admins have to worry about.
A better approach would be to let userspace hint the maximum number of
vCPUs for a specific VM to the kernel.
>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
> arch/powerpc/include/asm/kvm_host.h | 1 +
> arch/powerpc/kvm/book3s_hv.c | 32 ++++++++++++++++++++++++++++++++
> arch/powerpc/kvm/book3s_xive.c | 2 +-
> arch/powerpc/kvm/book3s_xive_native.c | 2 +-
> 4 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 6fb5fb4779e0..17582ce38788 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -335,6 +335,7 @@ struct kvm_arch {
> struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS];
> /* This array can grow quite large, keep it at the end */
> struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
> + unsigned int max_vcpus;
> #endif
> };
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index f8975c620f41..393d8a1ce9d8 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -125,6 +125,36 @@ static bool nested = true;
> module_param(nested, bool, S_IRUGO | S_IWUSR);
> MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)");
>
> +#define MIN(x, y) (((x) < (y)) ? (x) : (y))
> +
> +static unsigned int max_vcpus = MIN(KVM_MAX_VCPUS, 1024);
> +
> +static int set_max_vcpus(const char *val, const struct kernel_param *kp)
> +{
> + unsigned int new_max_vcpus;
> + int ret;
> +
> + ret = kstrtouint(val, 0, &new_max_vcpus);
> + if (ret)
> + return ret;
> +
> + if (new_max_vcpus > KVM_MAX_VCPUS)
> + return -EINVAL;
> +
> + max_vcpus = new_max_vcpus;
> +
> + return 0;
> +}
> +
> +static struct kernel_param_ops max_vcpus_ops = {
> + .set = set_max_vcpus,
> + .get = param_get_uint,
> +};
> +
> +module_param_cb(max_vcpus, &max_vcpus_ops, &max_vcpus, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(max_vcpus, "Maximum number of vCPUS per VM (max = "
> + __stringify(KVM_MAX_VCPUS) ")");
> +
> static inline bool nesting_enabled(struct kvm *kvm)
> {
> return kvm->arch.nested_enable && kvm_is_radix(kvm);
> @@ -4918,6 +4948,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> if (radix_enabled())
> kvmhv_radix_debugfs_init(kvm);
>
> + kvm->arch.max_vcpus = max_vcpus;
> +
> return 0;
> }
>
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 2ef43d037a4f..0fea31b64564 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -2026,7 +2026,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
> xive->q_page_order = xive->q_order - PAGE_SHIFT;
>
> /* Allocate a bunch of VPs */
> - xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
> + xive->vp_base = xive_native_alloc_vp_block(kvm->arch.max_vcpus);
> pr_devel("VP_Base=%x\n", xive->vp_base);
>
> if (xive->vp_base == XIVE_INVALID_VP)
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 84a354b90f60..20314010da56 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -1095,7 +1095,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
> * a default. Getting the max number of CPUs the VM was
> * configured with would improve our usage of the XIVE VP space.
> */
> - xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
> + xive->vp_base = xive_native_alloc_vp_block(kvm->arch.max_vcpus);
> pr_devel("VP_Base=%x\n", xive->vp_base);
>
> if (xive->vp_base == XIVE_INVALID_VP)
>
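For what it's worth, with this patch applied the knob would presumably be
driven like any other kvm_hv module parameter (the sysfs path below is
assumed from the usual module_param convention, not tested here):

```shell
# Load kvm_hv with a smaller per-VM VP budget
# (the value must be <= KVM_MAX_VCPUS, i.e. 2048)
modprobe kvm_hv max_vcpus=512

# Or adjust it at runtime (S_IWUSR); only VMs created
# afterwards pick up the new value
echo 512 > /sys/module/kvm_hv/parameters/max_vcpus
cat /sys/module/kvm_hv/parameters/max_vcpus
```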
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson