* [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs @ 2021-11-16 14:10 Juergen Gross 2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross 2021-11-16 14:10 ` [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross 0 siblings, 2 replies; 17+ messages in thread From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw) To: kvm, x86, linux-doc, linux-kernel Cc: Juergen Gross, Jonathan Corbet, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin In order to be able to have a single kernel for supporting even huge numbers of vcpus per guest some arrays should be sized dynamically. The easiest way to do that is to add boot parameters for the maximum number of vcpus and to calculate the maximum vcpu-id from that using either the host topology or a topology hint via another boot parameter. This patch series is doing that for x86. The same scheme can be easily adapted to other architectures, but I don't want to do that in the first iteration. I've tested the series not to break normal guest operation and the new parameters to be effective on x86. This series is based on Marc Zyngier's xarray series: https://lore.kernel.org/kvm/20211105192101.3862492-1-maz@kernel.org/ Changes in V2: - removed old patch 1, as already applied - patch 1 (old patch 2) only for reference, as the patch is already in the kvm tree - switch patch 2 (old patch 3) to calculate vcpu-id - added patch 4 Changes in V3: - removed V2 patches 1 and 4, as already applied - removed V2 patch 5, as replaced by Marc Zyngier's xarray series - removed hyperv handling from patch 2 - new patch 3 handling hyperv specifics - comments addressed Juergen Gross (4): x86/kvm: add boot parameter for adding vcpu-id bits x86/kvm: introduce a per cpu vcpu mask x86/kvm: add max number of vcpus for hyperv emulation x86/kvm: add boot parameter for setting max number of vcpus per guest .../admin-guide/kernel-parameters.txt | 25 +++++++++ arch/x86/include/asm/kvm_host.h | 29 +++++----- arch/x86/kvm/hyperv.c | 15 +++--- arch/x86/kvm/ioapic.c | 20 ++++++- arch/x86/kvm/ioapic.h | 4 +- arch/x86/kvm/irq_comm.c | 9 +++- arch/x86/kvm/x86.c | 54 ++++++++++++++++++- 7 files changed, 128 insertions(+), 28 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
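As a quick illustration of how the two new parameters from this series interact (values below are examples only, not recommendations): booting the host with

    kvm.max_vcpus=4096 kvm.vcpu_id_add_bits=2

would allow guests with up to 4096 vcpus, and kvm_max_vcpu_ids() would size the vcpu-id indexed arrays for 1 << (fls(4096 - 1) + 2) = 16384 possible vcpu-ids. The default of kvm.vcpu_id_add_bits is 2, so only kvm.max_vcpus needs to be set explicitly in this case.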
* [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-16 14:10 [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs Juergen Gross @ 2021-11-16 14:10 ` Juergen Gross 2021-11-17 6:59 ` Juergen Gross 2021-11-17 23:44 ` Sean Christopherson 2021-11-16 14:10 ` [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross 1 sibling, 2 replies; 17+ messages in thread From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw) To: kvm, x86, linux-doc, linux-kernel Cc: Juergen Gross, Jonathan Corbet, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set via a #define in a header file. In order to support higher vcpu-ids without generally increasing the memory consumption of guests on the host (some guest structures contain arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some bits to the vcpu-id. Additional bits are needed as the vcpu-id is constructed via bit-wise concatenation of socket-id, core-id, etc. As those ids maximum values are not always a power of 2, the vcpu-ids are sparse. The additional number of bits needed is basically the number of topology levels with a non-power-of-2 maximum value, excluding the top most level. The default value of the new parameter will be 2 in order to support today's possible topologies. The special value of -1 will use the number of bits needed for a guest with the current host's topology. Calculating the maximum vcpu-id dynamically requires to allocate the arrays using KVM_MAX_VCPU_IDS as the size dynamically. Signed-of-by: Juergen Gross <jgross@suse.com> --- V2: - switch to specifying additional bits (based on comment by Vitaly Kuznetsov) V3: - set default of new parameter to 2 (Eduardo Habkost) - deliberately NOT add another bit for topology_max_die_per_package() == 1 AND parameter being -1, as this would make this parameter setting always equivalent to specifying "2" Signed-off-by: Juergen Gross <jgross@suse.com> --- .../admin-guide/kernel-parameters.txt | 18 ++++++++++++ arch/x86/include/asm/kvm_host.h | 16 ++-------- arch/x86/kvm/ioapic.c | 12 +++++++- arch/x86/kvm/ioapic.h | 4 +-- arch/x86/kvm/x86.c | 29 +++++++++++++++++++ 5 files changed, 63 insertions(+), 16 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9725c546a0d4..e269c3f66ba4 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2445,6 +2445,24 @@ feature (tagged TLBs) on capable Intel chips. Default is 1 (enabled) + kvm.vcpu_id_add_bits= + [KVM,X86] The vcpu-ids of guests are sparse, as they + are constructed by bit-wise concatenation of the ids of + the different topology levels (sockets, cores, threads). + + This parameter specifies how many additional bits the + maximum vcpu-id needs compared to the maximum number of + vcpus. + + Normally this value is the number of topology levels + without the threads level and without the highest + level. + + The special value -1 can be used to support guests + with the same topology is the host. + + Default: 2 + l1d_flush= [X86,INTEL] Control mitigation for L1D based snooping vulnerability. 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e5d8700319cc..bcef56f1039a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -38,19 +38,7 @@ #define __KVM_HAVE_ARCH_VCPU_DEBUGFS #define KVM_MAX_VCPUS 1024 - -/* - * In x86, the VCPU ID corresponds to the APIC ID, and APIC IDs - * might be larger than the actual number of VCPUs because the - * APIC ID encodes CPU topology information. - * - * In the worst case, we'll need less than one extra bit for the - * Core ID, and less than one extra bit for the Package (Die) ID, - * so ratio of 4 should be enough. - */ -#define KVM_VCPU_ID_RATIO 4 -#define KVM_MAX_VCPU_IDS (KVM_MAX_VCPUS * KVM_VCPU_ID_RATIO) - +#define KVM_MAX_VCPU_IDS kvm_max_vcpu_ids() /* memory slots that are not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 3 @@ -1621,6 +1609,8 @@ extern u64 kvm_max_tsc_scaling_ratio; extern u64 kvm_default_tsc_scaling_ratio; /* bus lock detection supported? */ extern bool kvm_has_bus_lock_exit; +/* maximum vcpu-id */ +unsigned int kvm_max_vcpu_ids(void); extern u64 kvm_mce_cap_supported; diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c index 816a82515dcd..64ba9b1c8b3d 100644 --- a/arch/x86/kvm/ioapic.c +++ b/arch/x86/kvm/ioapic.c @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = { int kvm_ioapic_init(struct kvm *kvm) { struct kvm_ioapic *ioapic; + size_t sz; int ret; - ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT); + sz = sizeof(struct kvm_ioapic) + + sizeof(*ioapic->rtc_status.dest_map.map) * + BITS_TO_LONGS(KVM_MAX_VCPU_IDS) + + sizeof(*ioapic->rtc_status.dest_map.vectors) * + (KVM_MAX_VCPU_IDS); + ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT); if (!ioapic) return -ENOMEM; + ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1); + ioapic->rtc_status.dest_map.vectors = + (void *)(ioapic->rtc_status.dest_map.map + + BITS_TO_LONGS(KVM_MAX_VCPU_IDS)); spin_lock_init(&ioapic->lock); INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work); kvm->arch.vioapic = ioapic; diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h index e66e620c3bed..623a3c5afad7 100644 --- a/arch/x86/kvm/ioapic.h +++ b/arch/x86/kvm/ioapic.h @@ -39,13 +39,13 @@ struct kvm_vcpu; struct dest_map { /* vcpu bitmap where IRQ has been sent */ - DECLARE_BITMAP(map, KVM_MAX_VCPU_IDS); + unsigned long *map; /* * Vector sent to a given vcpu, only valid when * the vcpu's bit in map is set */ - u8 vectors[KVM_MAX_VCPU_IDS]; + u8 *vectors; }; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 259f719014c9..61bab2bdeefb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -80,6 +80,7 @@ #include <asm/intel_pt.h> #include <asm/emulate_prefix.h> #include <asm/sgx.h> +#include <asm/topology.h> #include <clocksource/hyperv_timer.h> #define CREATE_TRACE_POINTS @@ -186,6 +187,34 @@ module_param(force_emulation_prefix, bool, S_IRUGO); int __read_mostly pi_inject_timer = -1; module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR); +static int __read_mostly vcpu_id_add_bits = 2; +module_param(vcpu_id_add_bits, int, S_IRUGO); + +unsigned int kvm_max_vcpu_ids(void) +{ + int n_bits = fls(KVM_MAX_VCPUS - 1); + + if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) { + pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n", + vcpu_id_add_bits); + vcpu_id_add_bits = 2; + } + + if (vcpu_id_add_bits >= 0) { + n_bits += vcpu_id_add_bits; + } else { + n_bits++; /* One additional bit for core level. 
*/ + if (topology_max_die_per_package() > 1) + n_bits++; /* One additional bit for die level. */ + } + + if (!n_bits) + n_bits = 1; + + return 1U << n_bits; +} +EXPORT_SYMBOL_GPL(kvm_max_vcpu_ids); + /* * Restoring the host value for MSRs that are only consumed when running in * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
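To make the sparseness concrete, here is a small worked example with a hypothetical guest topology (the numbers are purely illustrative and not taken from the patch): 2 threads per core, 6 cores per die, 2 dies per socket and 42 sockets give 1008 vcpus, yet the highest vcpu-id (APIC ID) exceeds 1023.

	/*
	 * Illustration only: APIC IDs are built by concatenating the
	 * topology ids.  6 cores per die need a 3-bit core field, so each
	 * 24-cpu socket occupies a 32-id block and the id space is sparse.
	 */
	static unsigned int example_apic_id(unsigned int socket, unsigned int die,
					    unsigned int core, unsigned int thread)
	{
		return socket << 5 | die << 4 | core << 1 | thread;
	}

	/*
	 * Highest id: example_apic_id(41, 1, 5, 1) = 1339, although only
	 * 1008 vcpus exist.  fls(KVM_MAX_VCPUS - 1) = 10 bits would only
	 * cover ids 0..1023; with the default vcpu_id_add_bits=2,
	 * kvm_max_vcpu_ids() returns 1 << 12 = 4096, which is sufficient.
	 * Only the core level is non-power-of-2 here, so a single
	 * additional bit would already be enough for this topology.
	 */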
* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross @ 2021-11-17 6:59 ` Juergen Gross 2021-11-17 23:46 ` Sean Christopherson 2021-11-17 23:44 ` Sean Christopherson 1 sibling, 1 reply; 17+ messages in thread From: Juergen Gross @ 2021-11-17 6:59 UTC (permalink / raw) To: kvm, x86, linux-doc, linux-kernel Cc: Jonathan Corbet, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin [-- Attachment #1.1.1: Type: text/plain, Size: 2010 bytes --] On 16.11.21 15:10, Juergen Gross wrote: > Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set > via a #define in a header file. > > In order to support higher vcpu-ids without generally increasing the > memory consumption of guests on the host (some guest structures contain > arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some > bits to the vcpu-id. Additional bits are needed as the vcpu-id is > constructed via bit-wise concatenation of socket-id, core-id, etc. > As those ids maximum values are not always a power of 2, the vcpu-ids > are sparse. > > The additional number of bits needed is basically the number of > topology levels with a non-power-of-2 maximum value, excluding the top > most level. > > The default value of the new parameter will be 2 in order to support > today's possible topologies. The special value of -1 will use the > number of bits needed for a guest with the current host's topology. > > Calculating the maximum vcpu-id dynamically requires to allocate the > arrays using KVM_MAX_VCPU_IDS as the size dynamically. > > Signed-of-by: Juergen Gross <jgross@suse.com> Just thought about vcpu-ids a little bit more. It would be possible to replace the topology games completely by an arbitrary rather high vcpu-id limit (65536?) and to allocate the memory depending on the max vcpu-id just as needed. Right now the only vcpu-id dependent memory is for the ioapic consisting of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors). We could start with a minimal size when setting up an ioapic and extend the areas in case a new vcpu created would introduce a vcpu-id outside the currently allocated memory. Both arrays are protected by the ioapic specific lock (at least I couldn't spot any unprotected usage when looking briefly into the code), so reallocating those arrays shouldn't be hard. In case of ENOMEM the related vcpu creation would just fail. Thoughts? Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
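A rough sketch of the on-demand growing proposed above might look like the following (the dest_map_size field and the helper name are assumptions made for illustration, locking is simplified, and the initial allocation would first have to move to separate allocations):

	/*
	 * Sketch only: grow the vcpu-id indexed arrays when a new vcpu-id
	 * exceeds the currently allocated size.  Vcpu creation would call
	 * this and fail with -ENOMEM if the arrays cannot be grown.
	 */
	static int kvm_ioapic_grow_dest_map(struct kvm_ioapic *ioapic, u32 vcpu_id)
	{
		u32 new_size = roundup_pow_of_two(vcpu_id + 1);
		unsigned long *map;
		u8 *vectors;

		if (new_size <= ioapic->dest_map_size)
			return 0;

		map = bitmap_zalloc(new_size, GFP_KERNEL_ACCOUNT);
		vectors = kzalloc(new_size, GFP_KERNEL_ACCOUNT);
		if (!map || !vectors) {
			bitmap_free(map);
			kfree(vectors);
			return -ENOMEM;
		}

		spin_lock(&ioapic->lock);
		/* Re-check under the lock in case of a concurrent grow. */
		if (new_size > ioapic->dest_map_size) {
			bitmap_copy(map, ioapic->rtc_status.dest_map.map,
				    ioapic->dest_map_size);
			memcpy(vectors, ioapic->rtc_status.dest_map.vectors,
			       ioapic->dest_map_size);
			swap(map, ioapic->rtc_status.dest_map.map);
			swap(vectors, ioapic->rtc_status.dest_map.vectors);
			ioapic->dest_map_size = new_size;
		}
		spin_unlock(&ioapic->lock);

		/* Either the old arrays or the unused new ones are freed. */
		bitmap_free(map);
		kfree(vectors);
		return 0;
	}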
* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-17 6:59 ` Juergen Gross @ 2021-11-17 23:46 ` Sean Christopherson 2021-11-18 7:45 ` Juergen Gross 0 siblings, 1 reply; 17+ messages in thread From: Sean Christopherson @ 2021-11-17 23:46 UTC (permalink / raw) To: Juergen Gross Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin On Wed, Nov 17, 2021, Juergen Gross wrote: > On 16.11.21 15:10, Juergen Gross wrote: > > Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set > > via a #define in a header file. > > > > In order to support higher vcpu-ids without generally increasing the > > memory consumption of guests on the host (some guest structures contain > > arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some > > bits to the vcpu-id. Additional bits are needed as the vcpu-id is > > constructed via bit-wise concatenation of socket-id, core-id, etc. > > As those ids maximum values are not always a power of 2, the vcpu-ids > > are sparse. > > > > The additional number of bits needed is basically the number of > > topology levels with a non-power-of-2 maximum value, excluding the top > > most level. > > > > The default value of the new parameter will be 2 in order to support > > today's possible topologies. The special value of -1 will use the > > number of bits needed for a guest with the current host's topology. > > > > Calculating the maximum vcpu-id dynamically requires to allocate the > > arrays using KVM_MAX_VCPU_IDS as the size dynamically. > > > > Signed-of-by: Juergen Gross <jgross@suse.com> > > Just thought about vcpu-ids a little bit more. > > It would be possible to replace the topology games completely by an > arbitrary rather high vcpu-id limit (65536?) and to allocate the memory > depending on the max vcpu-id just as needed. > > Right now the only vcpu-id dependent memory is for the ioapic consisting > of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors). > > We could start with a minimal size when setting up an ioapic and extend > the areas in case a new vcpu created would introduce a vcpu-id outside > the currently allocated memory. Both arrays are protected by the ioapic > specific lock (at least I couldn't spot any unprotected usage when > looking briefly into the code), so reallocating those arrays shouldn't > be hard. In case of ENOMEM the related vcpu creation would just fail. > > Thoughts? Why not have userspace state the max vcpu_id it intends to creates on a per-VM basis? Same end result, but doesn't require the complexity of reallocating the I/O APIC stuff. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-17 23:46 ` Sean Christopherson @ 2021-11-18 7:45 ` Juergen Gross 2021-11-18 15:09 ` Sean Christopherson 0 siblings, 1 reply; 17+ messages in thread From: Juergen Gross @ 2021-11-18 7:45 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin [-- Attachment #1.1.1: Type: text/plain, Size: 2440 bytes --] On 18.11.21 00:46, Sean Christopherson wrote: > On Wed, Nov 17, 2021, Juergen Gross wrote: >> On 16.11.21 15:10, Juergen Gross wrote: >>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set >>> via a #define in a header file. >>> >>> In order to support higher vcpu-ids without generally increasing the >>> memory consumption of guests on the host (some guest structures contain >>> arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some >>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is >>> constructed via bit-wise concatenation of socket-id, core-id, etc. >>> As those ids maximum values are not always a power of 2, the vcpu-ids >>> are sparse. >>> >>> The additional number of bits needed is basically the number of >>> topology levels with a non-power-of-2 maximum value, excluding the top >>> most level. >>> >>> The default value of the new parameter will be 2 in order to support >>> today's possible topologies. The special value of -1 will use the >>> number of bits needed for a guest with the current host's topology. >>> >>> Calculating the maximum vcpu-id dynamically requires to allocate the >>> arrays using KVM_MAX_VCPU_IDS as the size dynamically. >>> >>> Signed-of-by: Juergen Gross <jgross@suse.com> >> >> Just thought about vcpu-ids a little bit more. >> >> It would be possible to replace the topology games completely by an >> arbitrary rather high vcpu-id limit (65536?) and to allocate the memory >> depending on the max vcpu-id just as needed. >> >> Right now the only vcpu-id dependent memory is for the ioapic consisting >> of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors). >> >> We could start with a minimal size when setting up an ioapic and extend >> the areas in case a new vcpu created would introduce a vcpu-id outside >> the currently allocated memory. Both arrays are protected by the ioapic >> specific lock (at least I couldn't spot any unprotected usage when >> looking briefly into the code), so reallocating those arrays shouldn't >> be hard. In case of ENOMEM the related vcpu creation would just fail. >> >> Thoughts? > > Why not have userspace state the max vcpu_id it intends to creates on a per-VM > basis? Same end result, but doesn't require the complexity of reallocating the > I/O APIC stuff. > And if the userspace doesn't do it (like today)? Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-18 7:45 ` Juergen Gross @ 2021-11-18 15:09 ` Sean Christopherson 2021-11-18 15:19 ` Juergen Gross 0 siblings, 1 reply; 17+ messages in thread From: Sean Christopherson @ 2021-11-18 15:09 UTC (permalink / raw) To: Juergen Gross Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin On Thu, Nov 18, 2021, Juergen Gross wrote: > On 18.11.21 00:46, Sean Christopherson wrote: > > On Wed, Nov 17, 2021, Juergen Gross wrote: > > > On 16.11.21 15:10, Juergen Gross wrote: > > > > Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set > > > > via a #define in a header file. > > > > > > > > In order to support higher vcpu-ids without generally increasing the > > > > memory consumption of guests on the host (some guest structures contain > > > > arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some > > > > bits to the vcpu-id. Additional bits are needed as the vcpu-id is > > > > constructed via bit-wise concatenation of socket-id, core-id, etc. > > > > As those ids maximum values are not always a power of 2, the vcpu-ids > > > > are sparse. > > > > > > > > The additional number of bits needed is basically the number of > > > > topology levels with a non-power-of-2 maximum value, excluding the top > > > > most level. > > > > > > > > The default value of the new parameter will be 2 in order to support > > > > today's possible topologies. The special value of -1 will use the > > > > number of bits needed for a guest with the current host's topology. > > > > > > > > Calculating the maximum vcpu-id dynamically requires to allocate the > > > > arrays using KVM_MAX_VCPU_IDS as the size dynamically. > > > > > > > > Signed-of-by: Juergen Gross <jgross@suse.com> > > > > > > Just thought about vcpu-ids a little bit more. > > > > > > It would be possible to replace the topology games completely by an > > > arbitrary rather high vcpu-id limit (65536?) and to allocate the memory > > > depending on the max vcpu-id just as needed. > > > > > > Right now the only vcpu-id dependent memory is for the ioapic consisting > > > of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors). > > > > > > We could start with a minimal size when setting up an ioapic and extend > > > the areas in case a new vcpu created would introduce a vcpu-id outside > > > the currently allocated memory. Both arrays are protected by the ioapic > > > specific lock (at least I couldn't spot any unprotected usage when > > > looking briefly into the code), so reallocating those arrays shouldn't > > > be hard. In case of ENOMEM the related vcpu creation would just fail. > > > > > > Thoughts? > > > > Why not have userspace state the max vcpu_id it intends to creates on a per-VM > > basis? Same end result, but doesn't require the complexity of reallocating the > > I/O APIC stuff. > > > > And if the userspace doesn't do it (like today)? Similar to my comments in patch 4, KVM's current limits could be used as the defaults, and any use case wanting to go beyond that would need an updated userspace. Exceeding those limits today doesn't work, so there's no ABI breakage by requiring a userspace change. Or again, this could be a Kconfig knob, though that feels a bit weird in this case. But it might make sense if it can be tied to something in the kernel's config? 
* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-18 15:09 ` Sean Christopherson @ 2021-11-18 15:19 ` Juergen Gross 0 siblings, 0 replies; 17+ messages in thread From: Juergen Gross @ 2021-11-18 15:19 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin [-- Attachment #1.1.1: Type: text/plain, Size: 3364 bytes --] On 18.11.21 16:09, Sean Christopherson wrote: > On Thu, Nov 18, 2021, Juergen Gross wrote: >> On 18.11.21 00:46, Sean Christopherson wrote: >>> On Wed, Nov 17, 2021, Juergen Gross wrote: >>>> On 16.11.21 15:10, Juergen Gross wrote: >>>>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set >>>>> via a #define in a header file. >>>>> >>>>> In order to support higher vcpu-ids without generally increasing the >>>>> memory consumption of guests on the host (some guest structures contain >>>>> arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some >>>>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is >>>>> constructed via bit-wise concatenation of socket-id, core-id, etc. >>>>> As those ids maximum values are not always a power of 2, the vcpu-ids >>>>> are sparse. >>>>> >>>>> The additional number of bits needed is basically the number of >>>>> topology levels with a non-power-of-2 maximum value, excluding the top >>>>> most level. >>>>> >>>>> The default value of the new parameter will be 2 in order to support >>>>> today's possible topologies. The special value of -1 will use the >>>>> number of bits needed for a guest with the current host's topology. >>>>> >>>>> Calculating the maximum vcpu-id dynamically requires to allocate the >>>>> arrays using KVM_MAX_VCPU_IDS as the size dynamically. >>>>> >>>>> Signed-of-by: Juergen Gross <jgross@suse.com> >>>> >>>> Just thought about vcpu-ids a little bit more. >>>> >>>> It would be possible to replace the topology games completely by an >>>> arbitrary rather high vcpu-id limit (65536?) and to allocate the memory >>>> depending on the max vcpu-id just as needed. >>>> >>>> Right now the only vcpu-id dependent memory is for the ioapic consisting >>>> of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors). >>>> >>>> We could start with a minimal size when setting up an ioapic and extend >>>> the areas in case a new vcpu created would introduce a vcpu-id outside >>>> the currently allocated memory. Both arrays are protected by the ioapic >>>> specific lock (at least I couldn't spot any unprotected usage when >>>> looking briefly into the code), so reallocating those arrays shouldn't >>>> be hard. In case of ENOMEM the related vcpu creation would just fail. >>>> >>>> Thoughts? >>> >>> Why not have userspace state the max vcpu_id it intends to creates on a per-VM >>> basis? Same end result, but doesn't require the complexity of reallocating the >>> I/O APIC stuff. >>> >> >> And if the userspace doesn't do it (like today)? > > Similar to my comments in patch 4, KVM's current limits could be used as the > defaults, and any use case wanting to go beyond that would need an updated > userspace. Exceeding those limits today doesn't work, so there's no ABI breakage > by requiring a userspace change. Hmm, nice idea. Will look into it. > Or again, this could be a Kconfig knob, though that feels a bit weird in this case. 
> But it might make sense if it can be tied to something in the kernel's config? Having a Kconfig knob for an absolute upper bound of vcpus should be fine. If someone doesn't like the capability to explicitly let qemu create very large VMs, he/she can still set that upper bound to the normal KVM_MAX_VCPUS value. Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross 2021-11-17 6:59 ` Juergen Gross @ 2021-11-17 23:44 ` Sean Christopherson 2021-11-18 7:44 ` Juergen Gross 1 sibling, 1 reply; 17+ messages in thread From: Sean Christopherson @ 2021-11-17 23:44 UTC (permalink / raw) To: Juergen Gross Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin On Tue, Nov 16, 2021, Juergen Gross wrote: > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c > index 816a82515dcd..64ba9b1c8b3d 100644 > --- a/arch/x86/kvm/ioapic.c > +++ b/arch/x86/kvm/ioapic.c > @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = { > int kvm_ioapic_init(struct kvm *kvm) > { > struct kvm_ioapic *ioapic; > + size_t sz; > int ret; > > - ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT); > + sz = sizeof(struct kvm_ioapic) + > + sizeof(*ioapic->rtc_status.dest_map.map) * > + BITS_TO_LONGS(KVM_MAX_VCPU_IDS) + > + sizeof(*ioapic->rtc_status.dest_map.vectors) * > + (KVM_MAX_VCPU_IDS); > + ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT); > if (!ioapic) > return -ENOMEM; > + ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1); Oof. Just do separate allocations. I highly doubt the performance of the emulated RTC hinges on the spatial locality of the bitmap and array. The array is going to end up in a second page for most configuration anyways. > + ioapic->rtc_status.dest_map.vectors = > + (void *)(ioapic->rtc_status.dest_map.map + > + BITS_TO_LONGS(KVM_MAX_VCPU_IDS)); > spin_lock_init(&ioapic->lock); > INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work); > kvm->arch.vioapic = ioapic; ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits 2021-11-17 23:44 ` Sean Christopherson @ 2021-11-18 7:44 ` Juergen Gross 0 siblings, 0 replies; 17+ messages in thread From: Juergen Gross @ 2021-11-18 7:44 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin [-- Attachment #1.1.1: Type: text/plain, Size: 1197 bytes --] On 18.11.21 00:44, Sean Christopherson wrote: > On Tue, Nov 16, 2021, Juergen Gross wrote: >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c >> index 816a82515dcd..64ba9b1c8b3d 100644 >> --- a/arch/x86/kvm/ioapic.c >> +++ b/arch/x86/kvm/ioapic.c >> @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = { >> int kvm_ioapic_init(struct kvm *kvm) >> { >> struct kvm_ioapic *ioapic; >> + size_t sz; >> int ret; >> >> - ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT); >> + sz = sizeof(struct kvm_ioapic) + >> + sizeof(*ioapic->rtc_status.dest_map.map) * >> + BITS_TO_LONGS(KVM_MAX_VCPU_IDS) + >> + sizeof(*ioapic->rtc_status.dest_map.vectors) * >> + (KVM_MAX_VCPU_IDS); >> + ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT); >> if (!ioapic) >> return -ENOMEM; >> + ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1); > > Oof. Just do separate allocations. I highly doubt the performance of the > emulated RTC hinges on the spatial locality of the bitmap and array. The array > is going to end up in a second page for most configuration anyways. Okay, fine with me. Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
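With the separate allocations agreed on above, the allocation part of kvm_ioapic_init() might become something like the following (shape assumed for illustration; the real follow-up patch may differ, and kvm_ioapic_destroy() would need matching bitmap_free()/kfree() calls):

	ioapic = kzalloc(sizeof(*ioapic), GFP_KERNEL_ACCOUNT);
	if (!ioapic)
		return -ENOMEM;

	/* vcpu-id indexed arrays allocated separately from the ioapic. */
	ioapic->rtc_status.dest_map.map =
		bitmap_zalloc(KVM_MAX_VCPU_IDS, GFP_KERNEL_ACCOUNT);
	ioapic->rtc_status.dest_map.vectors =
		kcalloc(KVM_MAX_VCPU_IDS, sizeof(u8), GFP_KERNEL_ACCOUNT);
	if (!ioapic->rtc_status.dest_map.map ||
	    !ioapic->rtc_status.dest_map.vectors) {
		bitmap_free(ioapic->rtc_status.dest_map.map);
		kfree(ioapic->rtc_status.dest_map.vectors);
		kfree(ioapic);
		return -ENOMEM;
	}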
* [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-16 14:10 [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs Juergen Gross 2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross @ 2021-11-16 14:10 ` Juergen Gross 2021-11-17 20:57 ` Sean Christopherson 1 sibling, 1 reply; 17+ messages in thread From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw) To: kvm, x86, linux-doc, linux-kernel Cc: Juergen Gross, Jonathan Corbet, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin Today the maximum number of vcpus of a kvm guest is set via a #define in a header file. In order to support higher vcpu numbers for guests without generally increasing the memory consumption of guests on the host especially on very large systems add a boot parameter for specifying the number of allowed vcpus for guests. The default will still be the current setting of 1024. The value 0 has the special meaning to limit the number of possible vcpus to the number of possible cpus of the host. Signed-off-by: Juergen Gross <jgross@suse.com> --- V3: - rebase --- Documentation/admin-guide/kernel-parameters.txt | 7 +++++++ arch/x86/include/asm/kvm_host.h | 5 ++++- arch/x86/kvm/x86.c | 9 ++++++++- 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index e269c3f66ba4..409a72c2d91b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2445,6 +2445,13 @@ feature (tagged TLBs) on capable Intel chips. Default is 1 (enabled) + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per + guest. The special value 0 sets the limit to the number + of physical cpus possible on the host (including not + yet hotplugged cpus). Higher values will result in + slightly higher memory consumption per guest. + Default: 1024 + kvm.vcpu_id_add_bits= [KVM,X86] The vcpu-ids of guests are sparse, as they are constructed by bit-wise concatenation of the ids of diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 8ea03ff01c45..8566e278ca91 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -38,7 +38,8 @@ #define __KVM_HAVE_ARCH_VCPU_DEBUGFS -#define KVM_MAX_VCPUS 1024U +#define KVM_DEFAULT_MAX_VCPUS 1024U +#define KVM_MAX_VCPUS max_vcpus #define KVM_MAX_HYPERV_VCPUS 1024U #define KVM_MAX_VCPU_IDS kvm_max_vcpu_ids() /* memory slots that are not exposed to userspace */ @@ -1611,6 +1612,8 @@ extern u64 kvm_max_tsc_scaling_ratio; extern u64 kvm_default_tsc_scaling_ratio; /* bus lock detection supported? 
*/ extern bool kvm_has_bus_lock_exit; +/* maximum number of vcpus per guest */ +extern unsigned int max_vcpus; /* maximum vcpu-id */ unsigned int kvm_max_vcpu_ids(void); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a388acdc5eb0..3571ea34135b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -190,9 +190,13 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR); static int __read_mostly vcpu_id_add_bits = 2; module_param(vcpu_id_add_bits, int, S_IRUGO); +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS; +module_param(max_vcpus, uint, S_IRUGO); +EXPORT_SYMBOL_GPL(max_vcpus); + unsigned int kvm_max_vcpu_ids(void) { - int n_bits = fls(KVM_MAX_VCPUS - 1); + int n_bits = fls(max_vcpus - 1); if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) { pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n", @@ -11251,6 +11255,9 @@ int kvm_arch_hardware_setup(void *opaque) if (boot_cpu_has(X86_FEATURE_XSAVES)) rdmsrl(MSR_IA32_XSS, host_xss); + if (max_vcpus == 0) + max_vcpus = num_possible_cpus(); + kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ, sizeof(unsigned long)); if (!kvm_pcpu_vcpu_mask) { -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-16 14:10 ` [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross @ 2021-11-17 20:57 ` Sean Christopherson 2021-11-18 7:16 ` Juergen Gross 0 siblings, 1 reply; 17+ messages in thread From: Sean Christopherson @ 2021-11-17 20:57 UTC (permalink / raw) To: Juergen Gross Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin On Tue, Nov 16, 2021, Juergen Gross wrote: > Today the maximum number of vcpus of a kvm guest is set via a #define > in a header file. > > In order to support higher vcpu numbers for guests without generally > increasing the memory consumption of guests on the host especially on > very large systems add a boot parameter for specifying the number of > allowed vcpus for guests. > > The default will still be the current setting of 1024. The value 0 has > the special meaning to limit the number of possible vcpus to the > number of possible cpus of the host. > > Signed-off-by: Juergen Gross <jgross@suse.com> > --- > V3: > - rebase > --- > Documentation/admin-guide/kernel-parameters.txt | 7 +++++++ > arch/x86/include/asm/kvm_host.h | 5 ++++- > arch/x86/kvm/x86.c | 9 ++++++++- > 3 files changed, 19 insertions(+), 2 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index e269c3f66ba4..409a72c2d91b 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -2445,6 +2445,13 @@ > feature (tagged TLBs) on capable Intel chips. > Default is 1 (enabled) > > + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per > + guest. The special value 0 sets the limit to the number > + of physical cpus possible on the host (including not > + yet hotplugged cpus). Higher values will result in > + slightly higher memory consumption per guest. > + Default: 1024 Rather than makes this a module param, I would prefer to start with the below patch (originally from TDX pre-enabling) and then wire up a way for userspace to _lower_ the max on a per-VM basis, e.g. add a capability. VMs largely fall into two categories: (1) the max number of vCPUs is known prior to VM creation, or (2) the max number of vCPUs is unbounded (up to KVM's hard limit), e.g. for container-style use cases where "vCPUs" are created on-demand in response to the "guest" creating a new task. For #1, a per-VM control lets userspace lower the limit to the bare minimum. For #2, neither the module param nor the per-VM control is likely to be useful, but a per-VM control does let mixed environments (both #1 and #2 VMs) lower the limits for compatible VMs, whereas a module param must be set to the max of any potential VM. From 0593cb4f73a6c3f0862f9411f0e14f00671f59ae Mon Sep 17 00:00:00 2001 From: Sean Christopherson <sean.j.christopherson@intel.com> Date: Fri, 2 Jul 2021 15:04:27 -0700 Subject: [PATCH] KVM: Add max_vcpus field in common 'struct kvm' Move arm's per-VM max_vcpus field into the generic "struct kvm", and use it to check vcpus_created in the generic code instead of checking only the hardcoded absolute KVM-wide max. x86 TDX guests will reuse the generic check verbatim, as the max number of vCPUs for a TDX guest is user defined at VM creation and immutable thereafter. 
Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/arm64/include/asm/kvm_host.h | 3 --- arch/arm64/kvm/arm.c | 7 ++----- arch/arm64/kvm/vgic/vgic-init.c | 6 +++--- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 3 ++- 5 files changed, 8 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 4be8486042a7..b51e1aa6ae27 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -108,9 +108,6 @@ struct kvm_arch { /* VTCR_EL2 value for this VM */ u64 vtcr; - /* The maximum number of vCPUs depends on the used GIC model */ - int max_vcpus; - /* Interrupt controller */ struct vgic_dist vgic; diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index f5490afe1ebf..97c3b83235b4 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -153,7 +153,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) kvm_vgic_early_init(kvm); /* The maximum number of VCPUs is limited by the host's GIC model */ - kvm->arch.max_vcpus = kvm_arm_default_max_vcpus(); + kvm->max_vcpus = kvm_arm_default_max_vcpus(); set_default_spectre(kvm); @@ -228,7 +228,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_MAX_VCPUS: case KVM_CAP_MAX_VCPU_ID: if (kvm) - r = kvm->arch.max_vcpus; + r = kvm->max_vcpus; else r = kvm_arm_default_max_vcpus(); break; @@ -304,9 +304,6 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) if (irqchip_in_kernel(kvm) && vgic_initialized(kvm)) return -EBUSY; - if (id >= kvm->arch.max_vcpus) - return -EINVAL; - return 0; } diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c index 0a06d0648970..906aee52f2bc 100644 --- a/arch/arm64/kvm/vgic/vgic-init.c +++ b/arch/arm64/kvm/vgic/vgic-init.c @@ -97,11 +97,11 @@ int kvm_vgic_create(struct kvm *kvm, u32 type) ret = 0; if (type == KVM_DEV_TYPE_ARM_VGIC_V2) - kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS; + kvm->max_vcpus = VGIC_V2_MAX_CPUS; else - kvm->arch.max_vcpus = VGIC_V3_MAX_CPUS; + kvm->max_vcpus = VGIC_V3_MAX_CPUS; - if (atomic_read(&kvm->online_vcpus) > kvm->arch.max_vcpus) { + if (atomic_read(&kvm->online_vcpus) > kvm->max_vcpus) { ret = -E2BIG; goto out_unlock; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 60a35d9fe259..5f56516e2f5a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -566,6 +566,7 @@ struct kvm { * and is accessed atomically. */ atomic_t online_vcpus; + int max_vcpus; int created_vcpus; int last_boosted_vcpu; struct list_head vm_list; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3f6d450355f0..e509b963651c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1052,6 +1052,7 @@ static struct kvm *kvm_create_vm(unsigned long type) rcuwait_init(&kvm->mn_memslots_update_rcuwait); INIT_LIST_HEAD(&kvm->devices); + kvm->max_vcpus = KVM_MAX_VCPUS; BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX); @@ -3599,7 +3600,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) return -EINVAL; mutex_lock(&kvm->lock); - if (kvm->created_vcpus == KVM_MAX_VCPUS) { + if (kvm->created_vcpus >= kvm->max_vcpus) { mutex_unlock(&kvm->lock); return -EINVAL; } -- 2.34.0.rc1.387.gb447b232ab-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
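Building on the patch above, the per-VM capability mentioned could lower kvm->max_vcpus before any vCPU exists. A minimal sketch, with the helper name and the capability plumbing being assumptions rather than existing KVM API:

	/* Would be called from the KVM_ENABLE_CAP vm ioctl path. */
	static int kvm_vm_set_max_vcpus(struct kvm *kvm, u32 max_vcpus)
	{
		int r = 0;

		mutex_lock(&kvm->lock);
		if (kvm->created_vcpus)
			r = -EBUSY;
		else if (!max_vcpus || max_vcpus > KVM_MAX_VCPUS)
			r = -EINVAL;
		else
			kvm->max_vcpus = max_vcpus;
		mutex_unlock(&kvm->lock);

		return r;
	}

Userspace that knows the final vCPU count up front would lower the limit right after KVM_CREATE_VM; existing userspace keeps the KVM_MAX_VCPUS default.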
* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-17 20:57 ` Sean Christopherson @ 2021-11-18 7:16 ` Juergen Gross 2021-11-18 15:05 ` Sean Christopherson 0 siblings, 1 reply; 17+ messages in thread From: Juergen Gross @ 2021-11-18 7:16 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin [-- Attachment #1.1.1: Type: text/plain, Size: 7670 bytes --] On 17.11.21 21:57, Sean Christopherson wrote: > On Tue, Nov 16, 2021, Juergen Gross wrote: >> Today the maximum number of vcpus of a kvm guest is set via a #define >> in a header file. >> >> In order to support higher vcpu numbers for guests without generally >> increasing the memory consumption of guests on the host especially on >> very large systems add a boot parameter for specifying the number of >> allowed vcpus for guests. >> >> The default will still be the current setting of 1024. The value 0 has >> the special meaning to limit the number of possible vcpus to the >> number of possible cpus of the host. >> >> Signed-off-by: Juergen Gross <jgross@suse.com> >> --- >> V3: >> - rebase >> --- >> Documentation/admin-guide/kernel-parameters.txt | 7 +++++++ >> arch/x86/include/asm/kvm_host.h | 5 ++++- >> arch/x86/kvm/x86.c | 9 ++++++++- >> 3 files changed, 19 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >> index e269c3f66ba4..409a72c2d91b 100644 >> --- a/Documentation/admin-guide/kernel-parameters.txt >> +++ b/Documentation/admin-guide/kernel-parameters.txt >> @@ -2445,6 +2445,13 @@ >> feature (tagged TLBs) on capable Intel chips. >> Default is 1 (enabled) >> >> + kvm.max_vcpus= [KVM,X86] Set the maximum allowed numbers of vcpus per >> + guest. The special value 0 sets the limit to the number >> + of physical cpus possible on the host (including not >> + yet hotplugged cpus). Higher values will result in >> + slightly higher memory consumption per guest. >> + Default: 1024 > > Rather than makes this a module param, I would prefer to start with the below > patch (originally from TDX pre-enabling) and then wire up a way for userspace to > _lower_ the max on a per-VM basis, e.g. add a capability. > > VMs largely fall into two categories: (1) the max number of vCPUs is known prior > to VM creation, or (2) the max number of vCPUs is unbounded (up to KVM's hard > limit), e.g. for container-style use cases where "vCPUs" are created on-demand in > response to the "guest" creating a new task. > > For #1, a per-VM control lets userspace lower the limit to the bare minimum. For > #2, neither the module param nor the per-VM control is likely to be useful, but > a per-VM control does let mixed environments (both #1 and #2 VMs) lower the limits > for compatible VMs, whereas a module param must be set to the max of any potential VM. The main reason for this whole series is a request by a partner to enable huge VMs on huge machines (huge meaning thousands of vcpus on thousands of physical cpus). Making this large number a compile time setting would hurt all the users who have more standard requirements by allocating the needed resources even on small systems, so I've switched to a boot parameter in order to enable those huge numbers only when required. 
With Marc's series to use an xarray for the vcpu pointers only the bitmaps for sending IRQs to vcpus are left which need to be sized according to the max vcpu limit. Your patch below seems to be fine, but doesn't help for that case. Juergen > > From 0593cb4f73a6c3f0862f9411f0e14f00671f59ae Mon Sep 17 00:00:00 2001 > From: Sean Christopherson <sean.j.christopherson@intel.com> > Date: Fri, 2 Jul 2021 15:04:27 -0700 > Subject: [PATCH] KVM: Add max_vcpus field in common 'struct kvm' > > Move arm's per-VM max_vcpus field into the generic "struct kvm", and use > it to check vcpus_created in the generic code instead of checking only > the hardcoded absolute KVM-wide max. x86 TDX guests will reuse the > generic check verbatim, as the max number of vCPUs for a TDX guest is > user defined at VM creation and immutable thereafter. > > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/arm64/include/asm/kvm_host.h | 3 --- > arch/arm64/kvm/arm.c | 7 ++----- > arch/arm64/kvm/vgic/vgic-init.c | 6 +++--- > include/linux/kvm_host.h | 1 + > virt/kvm/kvm_main.c | 3 ++- > 5 files changed, 8 insertions(+), 12 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index 4be8486042a7..b51e1aa6ae27 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -108,9 +108,6 @@ struct kvm_arch { > /* VTCR_EL2 value for this VM */ > u64 vtcr; > > - /* The maximum number of vCPUs depends on the used GIC model */ > - int max_vcpus; > - > /* Interrupt controller */ > struct vgic_dist vgic; > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index f5490afe1ebf..97c3b83235b4 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -153,7 +153,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > kvm_vgic_early_init(kvm); > > /* The maximum number of VCPUs is limited by the host's GIC model */ > - kvm->arch.max_vcpus = kvm_arm_default_max_vcpus(); > + kvm->max_vcpus = kvm_arm_default_max_vcpus(); > > set_default_spectre(kvm); > > @@ -228,7 +228,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_MAX_VCPUS: > case KVM_CAP_MAX_VCPU_ID: > if (kvm) > - r = kvm->arch.max_vcpus; > + r = kvm->max_vcpus; > else > r = kvm_arm_default_max_vcpus(); > break; > @@ -304,9 +304,6 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) > if (irqchip_in_kernel(kvm) && vgic_initialized(kvm)) > return -EBUSY; > > - if (id >= kvm->arch.max_vcpus) > - return -EINVAL; > - > return 0; > } > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > index 0a06d0648970..906aee52f2bc 100644 > --- a/arch/arm64/kvm/vgic/vgic-init.c > +++ b/arch/arm64/kvm/vgic/vgic-init.c > @@ -97,11 +97,11 @@ int kvm_vgic_create(struct kvm *kvm, u32 type) > ret = 0; > > if (type == KVM_DEV_TYPE_ARM_VGIC_V2) > - kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS; > + kvm->max_vcpus = VGIC_V2_MAX_CPUS; > else > - kvm->arch.max_vcpus = VGIC_V3_MAX_CPUS; > + kvm->max_vcpus = VGIC_V3_MAX_CPUS; > > - if (atomic_read(&kvm->online_vcpus) > kvm->arch.max_vcpus) { > + if (atomic_read(&kvm->online_vcpus) > kvm->max_vcpus) { > ret = -E2BIG; > goto out_unlock; > } > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 60a35d9fe259..5f56516e2f5a 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -566,6 +566,7 @@ struct kvm { > * and is accessed atomically. 
> */ > atomic_t online_vcpus; > + int max_vcpus; > int created_vcpus; > int last_boosted_vcpu; > struct list_head vm_list; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 3f6d450355f0..e509b963651c 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1052,6 +1052,7 @@ static struct kvm *kvm_create_vm(unsigned long type) > rcuwait_init(&kvm->mn_memslots_update_rcuwait); > > INIT_LIST_HEAD(&kvm->devices); > + kvm->max_vcpus = KVM_MAX_VCPUS; > > BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX); > > @@ -3599,7 +3600,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) > return -EINVAL; > > mutex_lock(&kvm->lock); > - if (kvm->created_vcpus == KVM_MAX_VCPUS) { > + if (kvm->created_vcpus >= kvm->max_vcpus) { > mutex_unlock(&kvm->lock); > return -EINVAL; > } > -- > 2.34.0.rc1.387.gb447b232ab-goog > [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-18 7:16 ` Juergen Gross @ 2021-11-18 15:05 ` Sean Christopherson 2021-11-18 15:15 ` Juergen Gross 0 siblings, 1 reply; 17+ messages in thread From: Sean Christopherson @ 2021-11-18 15:05 UTC (permalink / raw) To: Juergen Gross Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin On Thu, Nov 18, 2021, Juergen Gross wrote: > On 17.11.21 21:57, Sean Christopherson wrote: > > Rather than makes this a module param, I would prefer to start with the below > > patch (originally from TDX pre-enabling) and then wire up a way for userspace to > > _lower_ the max on a per-VM basis, e.g. add a capability. > > The main reason for this whole series is a request by a partner > to enable huge VMs on huge machines (huge meaning thousands of > vcpus on thousands of physical cpus). > > Making this large number a compile time setting would hurt all > the users who have more standard requirements by allocating the > needed resources even on small systems, so I've switched to a boot > parameter in order to enable those huge numbers only when required. > > With Marc's series to use an xarray for the vcpu pointers only the > bitmaps for sending IRQs to vcpus are left which need to be sized > according to the max vcpu limit. Your patch below seems to be fine, but > doesn't help for that case. Ah, you want to let userspace define a MAX_VCPUS that goes well beyond the current limit without negatively impacting existing setups. My idea of a per-VM capability still works, it would simply require separating the default max from the absolute max, which this patch mostly does already, it just neglects to set an absolute max. Which is a good segue into pointing out that if a module param is added, it needs to be sanity checked against a KVM-defined max. The admin may be trusted to some extent, but there is zero reason to let userspace set max_vcspus to 4 billion. At that point, it really is just a param vs. capability question. I like the idea of a capability because there are already two known use cases, arm64's GIC and x86's TDX, and it could also be used to reduce the kernel's footprint for use cases that run large numbers of smaller VMs. The other alternative would be to turn KVM_MAX_VCPUS into a Kconfig knob. I assume the partner isn't running a vanilla distro build and could set it as they see fit. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-18 15:05 ` Sean Christopherson @ 2021-11-18 15:15 ` Juergen Gross 2021-11-18 15:32 ` Sean Christopherson 2021-11-18 15:46 ` Sean Christopherson 0 siblings, 2 replies; 17+ messages in thread From: Juergen Gross @ 2021-11-18 15:15 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin [-- Attachment #1.1.1: Type: text/plain, Size: 2520 bytes --] On 18.11.21 16:05, Sean Christopherson wrote: > On Thu, Nov 18, 2021, Juergen Gross wrote: >> On 17.11.21 21:57, Sean Christopherson wrote: >>> Rather than makes this a module param, I would prefer to start with the below >>> patch (originally from TDX pre-enabling) and then wire up a way for userspace to >>> _lower_ the max on a per-VM basis, e.g. add a capability. >> >> The main reason for this whole series is a request by a partner >> to enable huge VMs on huge machines (huge meaning thousands of >> vcpus on thousands of physical cpus). >> >> Making this large number a compile time setting would hurt all >> the users who have more standard requirements by allocating the >> needed resources even on small systems, so I've switched to a boot >> parameter in order to enable those huge numbers only when required. >> >> With Marc's series to use an xarray for the vcpu pointers only the >> bitmaps for sending IRQs to vcpus are left which need to be sized >> according to the max vcpu limit. Your patch below seems to be fine, but >> doesn't help for that case. > > Ah, you want to let userspace define a MAX_VCPUS that goes well beyond the current > limit without negatively impacting existing setups. My idea of a per-VM capability Correct. > still works, it would simply require separating the default max from the absolute > max, which this patch mostly does already, it just neglects to set an absolute max. > > Which is a good segue into pointing out that if a module param is added, it needs > to be sanity checked against a KVM-defined max. The admin may be trusted to some > extent, but there is zero reason to let userspace set max_vcspus to 4 billion. > At that point, it really is just a param vs. capability question. I agree. Capping it at e.g. 65536 would probably be a good idea. > I like the idea of a capability because there are already two known use cases, > arm64's GIC and x86's TDX, and it could also be used to reduce the kernel's footprint > for use cases that run large numbers of smaller VMs. > > The other alternative would be to turn KVM_MAX_VCPUS into a Kconfig knob. I assume I like combining the capping and a Kconfig knob. So let the distro (or whoever is building the kernel) decide, which is the max allowed value (e.g. above 65536 per default). > the partner isn't running a vanilla distro build and could set it as they see fit. And here you are wrong. They'd like to use standard SUSE Linux (SLE). Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-18 15:15 ` Juergen Gross @ 2021-11-18 15:32 ` Sean Christopherson 2021-11-18 16:19 ` Juergen Gross 2021-11-18 15:46 ` Sean Christopherson 1 sibling, 1 reply; 17+ messages in thread From: Sean Christopherson @ 2021-11-18 15:32 UTC (permalink / raw) To: Juergen Gross Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin On Thu, Nov 18, 2021, Juergen Gross wrote: > On 18.11.21 16:05, Sean Christopherson wrote: > > the partner isn't running a vanilla distro build and could set it as they see fit. > > And here you are wrong. They'd like to use standard SUSE Linux (SLE). Huh. As in, completely off-the-shelf kernel binaries without any tweaks to the config? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-18 15:32 ` Sean Christopherson @ 2021-11-18 16:19 ` Juergen Gross 0 siblings, 0 replies; 17+ messages in thread From: Juergen Gross @ 2021-11-18 16:19 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin [-- Attachment #1.1.1: Type: text/plain, Size: 446 bytes --] On 18.11.21 16:32, Sean Christopherson wrote: > On Thu, Nov 18, 2021, Juergen Gross wrote: >> On 18.11.21 16:05, Sean Christopherson wrote: >>> the partner isn't running a vanilla distro build and could set it as they see fit. >> >> And here you are wrong. They'd like to use standard SUSE Linux (SLE). > > Huh. As in, completely off-the-shelf kernel binaries without any tweaks to the > config? This is the idea, yes. Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest 2021-11-18 15:15 ` Juergen Gross 2021-11-18 15:32 ` Sean Christopherson @ 2021-11-18 15:46 ` Sean Christopherson 1 sibling, 0 replies; 17+ messages in thread From: Sean Christopherson @ 2021-11-18 15:46 UTC (permalink / raw) To: Juergen Gross Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin On Thu, Nov 18, 2021, Juergen Gross wrote: > On 18.11.21 16:05, Sean Christopherson wrote: > > Which is a good segue into pointing out that if a module param is added, it needs > > to be sanity checked against a KVM-defined max. The admin may be trusted to some > > extent, but there is zero reason to let userspace set max_vcspus to 4 billion. > > At that point, it really is just a param vs. capability question. > > I agree. Capping it at e.g. 65536 would probably be a good idea. Any reason to choose 65536 in particular? Why not cap it at the upper limit of NR_CPUS_RANGE_END / MAXSMP, which is currently 8192? ^ permalink raw reply [flat|nested] 17+ messages in thread
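If the module parameter is kept, the capping discussed here might be wired up in kvm_arch_hardware_setup() along these lines (CONFIG_KVM_MAX_VCPUS is a hypothetical Kconfig bound, whether that ends up being 8192 or 65536):

	if (max_vcpus == 0)
		max_vcpus = num_possible_cpus();

	/* Clamp the admin-supplied value to the build-time upper bound. */
	if (max_vcpus > CONFIG_KVM_MAX_VCPUS) {
		pr_warn("kvm: max_vcpus %u clamped to %u\n",
			max_vcpus, (unsigned int)CONFIG_KVM_MAX_VCPUS);
		max_vcpus = CONFIG_KVM_MAX_VCPUS;
	}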