qemu-devel.nongnu.org archive mirror
* [Qemu-devel] PPC VCPU ID packing via KVM_CAP_PPC_SMT
@ 2015-10-19  3:34 Sam Bobroff
From: Sam Bobroff @ 2015-10-19  3:34 UTC
  To: qemu-ppc; +Cc: qemu-devel

Hi everyone,

It's currently possible to configure QEMU and KVM such that (on a POWER7 or
POWER8 host) users are unable to create as many VCPUs as they might reasonably
expect. I'll outline one fairly straightforward solution (below) and I would
welcome feedback: Does this seem a reasonable approach? Are there alternatives?

The issue:

The behaviour is caused by three things:
* QEMU limits the total number (count) of VCPUs based on the machine type (hard
  coded to 256 for pseries).
	* See hw/ppc/spapr.c spapr_machine_class_init()
* KVM limits the highest VCPU ID to CONFIG_NR_CPUS (2048 for
  pseries_defconfig).
	* See arch/powerpc/configs/pseries_defconfig
	* and arch/powerpc/include/asm/kvm_host.h
* If the host SMT mode is higher than the guest SMT mode when creating VCPUs,
  QEMU must "pad out" the VCPU IDs to align the VCPUs with physical cores (KVM
  doesn't know which SMT mode the guest wants).
	* See target-ppc/translate_init.c ppc_cpu_realizefn().

In the most pathological case the guest is SMT 1 (smp_threads = 1) and the host
is SMT 8 (max_smt = 8), which causes the VCPU IDs to be spaced 8 apart (e.g. 0,
8, 16, ...).

This doesn't produce any strange behaviour with default limits, but consider
the case where CONFIG_NR_CPUS is set to 1024 (with the same SMT modes as
above): when the 129th VCPU (index 128) is created, its VCPU ID will be
128 * 8 = 1024, which will be rejected by KVM. This could be surprising because
only 128 VCPUs can be created even though max_cpus = 256 and
CONFIG_NR_CPUS = 1024.
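
The arithmetic above can be checked with a short Python sketch. It assumes the
padding rule is vcpu_id = (index / smp_threads) * max_smt + (index %
smp_threads) and that KVM rejects any ID greater than or equal to
CONFIG_NR_CPUS. The function names are made up for illustration; this is not
QEMU or KVM code.

```python
def padded_vcpu_id(cpu_index, smp_threads, max_smt):
    """VCPU ID when each guest core is aligned to a host core (assumed rule)."""
    core, thread = divmod(cpu_index, smp_threads)
    return core * max_smt + thread


def max_creatable_vcpus(max_cpus, nr_cpus, smp_threads, max_smt):
    """Count VCPUs created before a padded ID reaches nr_cpus (assumed limit)."""
    count = 0
    for cpu_index in range(max_cpus):
        if padded_vcpu_id(cpu_index, smp_threads, max_smt) >= nr_cpus:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Guest SMT 1 on a host with SMT 8: IDs are spaced 8 apart.
    print([padded_vcpu_id(i, 1, 8) for i in range(4)])  # [0, 8, 16, 24]
    # CONFIG_NR_CPUS = 1024: creation stops at index 128 (ID 1024).
    print(max_creatable_vcpus(256, 1024, 1, 8))          # 128
    # CONFIG_NR_CPUS = 2048 (pseries_defconfig): all 256 VCPUs fit.
    print(max_creatable_vcpus(256, 2048, 1, 8))          # 256
```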

Proposal:

One solution is to provide a way for QEMU to inform KVM of the guest's SMT
mode. This would allow KVM to place the VCPUs correctly within physical cores
without any VCPU ID padding.

And one way to do that would be for KVM to allow QEMU to set the (currently
read-only) KVM_CAP_PPC_SMT capability to the required guest SMT mode.
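
As a rough illustration of what packing would buy, the following Python sketch
compares padded and packed VCPU IDs. It assumes that, once KVM knows the guest
SMT mode, the VCPU ID can simply equal the CPU index, and that KVM rejects IDs
greater than or equal to CONFIG_NR_CPUS. The function name and the "packed"
flag are hypothetical, not an existing QEMU or KVM interface.

```python
def creatable_vcpus(max_cpus, nr_cpus, smp_threads, max_smt, packed):
    """Count VCPUs that can be created before an ID reaches nr_cpus."""
    count = 0
    for cpu_index in range(max_cpus):
        if packed:
            # Assumption: KVM knows the guest SMT mode, so no padding needed.
            vcpu_id = cpu_index
        else:
            # Assumption: each guest core is padded out to a host core.
            core, thread = divmod(cpu_index, smp_threads)
            vcpu_id = core * max_smt + thread
        if vcpu_id >= nr_cpus:
            break
        count += 1
    return count


if __name__ == "__main__":
    # max_cpus = 256, CONFIG_NR_CPUS = 1024, host SMT 8
    for guest_smt in (1, 2, 4, 8):
        padded = creatable_vcpus(256, 1024, guest_smt, 8, packed=False)
        packed = creatable_vcpus(256, 1024, guest_smt, 8, packed=True)
        print(f"guest SMT {guest_smt}: padded={padded} packed={packed}")
```

With these inputs only the guest SMT 1 case falls short when padded (128
instead of 256); packed IDs reach max_cpus in every case.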

The simplest implementation would seem to be to add a new version of the
pseries machine and have it require that the kernel support setting
KVM_CAP_PPC_SMT, but would this be a reasonable restriction? Should we add a
property (where?) to allow the new machine version to run without the new
kernel feature? Could that property default to "on" or "on if supported by the
kernel" without it becoming too complicated or causing trouble during
migration?

Thanks,
Sam.


* Re: [Qemu-devel] [Qemu-ppc] PPC VCPU ID packing via KVM_CAP_PPC_SMT
@ 2015-10-23  1:34 David Gibson
From: David Gibson @ 2015-10-23  1:34 UTC
  To: Sam Bobroff; +Cc: qemu-ppc, qemu-devel

On Mon, Oct 19, 2015 at 02:34:47PM +1100, Sam Bobroff wrote:
> Hi everyone,
> 
> It's currently possible to configure QEMU and KVM such that (on a POWER7 or
> POWER8 host) users are unable to create as many VCPUs as they might reasonably
> expect. I'll outline one fairly straightforward solution (below) and I would
> welcome feedback: Does this seem a reasonable approach? Are there alternatives?
> 
> The issue:
> 
> The behaviour is caused by three things:
> * QEMU limits the total number (count) of VCPUs based on the machine type (hard
>   coded to 256 for pseries).
> 	* See hw/ppc/spapr.c spapr_machine_class_init()
> * KVM limits the highest VCPU ID to CONFIG_NR_CPUS (2048 for
>   pseries_defconfig).
> 	* See arch/powerpc/configs/pseries_defconfig
> 	* and arch/powerpc/include/asm/kvm_host.h
> * If the host SMT mode is higher than the guest SMT mode when creating VCPUs,
>   QEMU must "pad out" the VCPU IDs to align the VCPUs with physical cores (KVM
>   doesn't know which SMT mode the guest wants).
> 	* See target-ppc/translate_init.c ppc_cpu_realizefn().
> 
> In the most pathological case the guest is SMT 1 (smp_threads = 1) and the host
> is SMT 8 (max_smt = 8), which causes the VCPU IDs to be spaced 8 apart (e.g. 0,
> 8, 16, ...).
> 
> This doesn't produce any strange behaviour with default limits, but consider
> the case where CONFIG_NR_CPUS is set to 1024 (with the same SMT modes as
> above): when the 129th VCPU (index 128) is created, its VCPU ID will be
> 128 * 8 = 1024, which will be rejected by KVM. This could be surprising because
> only 128 VCPUs can be created even though max_cpus = 256 and
> CONFIG_NR_CPUS = 1024.
> 
> Proposal:
> 
> One solution is to provide a way for QEMU to inform KVM of the guest's SMT
> mode. This would allow KVM to place the VCPUs correctly within physical cores
> without any VCPU ID padding.

I think that's a good idea.  In fact it's what we should have done in
the first place.  Controlling the guest SMT mode implicitly with the
vcpu IDs was a case of too-clever-by-half on my part.

> And one way to do that would be for KVM to allow QEMU to set the (currently
> read-only) KVM_CAP_PPC_SMT capability to the required guest SMT mode.

Sounds ok.

> The simplest implementation would seem to be to add a new version of the
> pseries machine and have it require that the kernel support setting
> KVM_CAP_PPC_SMT, but would this be a reasonable restriction?

It's.. not great.

> Should we add a
> property (where?) to allow the new machine version to run without the new
> kernel feature? Could that property default to "on" or "on if supported by the
> kernel" without it becoming too complicated or causing trouble during
> migration?

So, migration is the issue, yes.

But.. I thought we already disconnected the KVM vcpu IDs from the qemu
internal cpu IDs, which is what we need for migration.  If that's so
(check, please), then it should be sufficient to make sure that the
KVM vcpu ID is included in the migration stream - it might be already.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
