linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/29] KVM: VM planes
@ 2025-04-01 16:10 Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 01/29] Documentation: kvm: introduce "VM plane" concept Paolo Bonzini
                   ` (31 more replies)
  0 siblings, 32 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

I guess April 1st is not the best date to send out such a large series
after months of radio silence, but here we are.

AMD VMPLs, Intel TDX partitions, Microsoft Hyper-V VTLs, and ARM CCA planes
are all examples of virtual privilege level concepts that are exclusive to
guests.  In all these specifications the hypervisor hosts multiple
copies of a vCPU's register state (or at least of most of it) and provides
hypercalls or instructions to switch between them.
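
As a rough illustration of the "multiple copies of register state" idea,
here is a toy model in plain C.  It is not KVM code: every name in it
(model_vcpu, plane_regs, MAX_PLANES, model_switch_plane) is hypothetical,
and the register file is reduced to two fields.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLANES 4	/* hypothetical limit, for illustration only */

/* Hypothetical, heavily reduced register file; real vCPU state is far larger. */
struct plane_regs {
	uint64_t rip;
	uint64_t rsp;
};

/* One vCPU id, with one copy of the register state per privilege level. */
struct model_vcpu {
	struct plane_regs regs[MAX_PLANES];
	unsigned int cur_plane;		/* which copy is currently live */
};

/*
 * Models the hypercall or instruction that switches privilege level.
 * Returns 0 on success, -1 if the requested plane does not exist.
 */
static int model_switch_plane(struct model_vcpu *vcpu, unsigned int plane)
{
	if (plane >= MAX_PLANES)
		return -1;
	vcpu->cur_plane = plane;
	return 0;
}

/* Returns the register copy that belongs to the currently live plane. */
static struct plane_regs *model_cur_regs(struct model_vcpu *vcpu)
{
	assert(vcpu->cur_plane < MAX_PLANES);
	return &vcpu->regs[vcpu->cur_plane];
}
```

Writing rip while plane 1 is live and then switching back to plane 0
leaves plane 0's rip untouched, which is all the model is meant to show.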

This is the first draft of the implementation according to the sketch that
was prepared last year between Linux Plumbers and KVM Forum.  The initial
version of the API was posted last October, and the implementation only
needed small changes.

Attempts made in the past, mostly in the context of Hyper-V VTLs and SEV-SNP
VMPLs, fell into two categories:

- use a single vCPU file descriptor, and store multiple copies of the state
  in a single struct kvm_vcpu.  This approach requires a lot of changes to
  provide multiple copies of affected fields, especially MMUs and APICs;
  and complex uAPI extensions to direct existing ioctls to a specific
  privilege level.  While more or less workable for SEV-SNP VMPLs, that
  was only because the copies of the register state were hidden
  in the VMSA (KVM does not manage it); it showed all its problems when
  applied to Hyper-V VTLs.

  The main advantage was that KVM kept the knowledge of the relationship
  between vCPUs that have the same id but belong to different privilege
  levels.  This is important in order to accelerate switches in-kernel.

- use multiple VM and vCPU file descriptors, and handle the switch entirely
  in userspace.  This got gnarly pretty fast for even more reasons than
  the previous case, for example because VMs could no longer share
  memslots, including dirty bitmaps and private/shared attributes (a
  substantial problem for SEV-SNP since VMPLs share their ASID).

  Contrary to the other case, the total lack of kernel-level sharing of
  register state, and the lack of any guarantee that vCPUs do not run in
  parallel, is what makes this approach problematic for both kernel and
  userspace.  In-kernel implementation of privilege level switches goes
  from complicated to impossible, and userspace needs a lot of complexity
  as well to ensure that a higher-privileged VTL properly interrupts a
  lower-privileged one.

This design sits squarely in the middle: it gives the initial set of
VM and vCPU file descriptors the full set of ioctls + struct kvm_run,
whereas other privilege levels ("planes") instead only support a small
part of the KVM API.  In fact for the vm file descriptor it is only three
ioctls: KVM_CHECK_EXTENSION, KVM_SIGNAL_MSI, KVM_SET_MEMORY_ATTRIBUTES.
For vCPUs it is basically KVM_GET/SET_*.
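
The restricted ioctl surface can be pictured as an allow-list.  The sketch
below is a userspace toy, not the kernel's dispatch code: it compares
ioctl names as strings so that it does not depend on any particular
version of the kernel headers, the function name is hypothetical, and
treating plane 0 as the initial, full-API plane is an assumption of the
model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* ioctls listed above for the VM file descriptor of a non-default plane. */
static const char *const plane_vm_ioctls[] = {
	"KVM_CHECK_EXTENSION",
	"KVM_SIGNAL_MSI",
	"KVM_SET_MEMORY_ATTRIBUTES",
};

/*
 * Hypothetical helper: would a VM ioctl be accepted on this plane's fd?
 * Assumption: plane 0 is the initial plane and accepts everything, so the
 * model does not validate names for it.
 */
static bool model_plane_vm_ioctl_allowed(unsigned int plane, const char *name)
{
	size_t i;

	assert(name != NULL);
	if (plane == 0)
		return true;

	for (i = 0; i < sizeof(plane_vm_ioctls) / sizeof(plane_vm_ioctls[0]); i++)
		if (strcmp(plane_vm_ioctls[i], name) == 0)
			return true;
	return false;
}
```

In this model KVM_SET_USER_MEMORY_REGION is accepted on plane 0 and
rejected on plane 1, matching the statement that memslots are not part
of the per-plane API.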

Most notably, memslots and KVM_RUN are *not* included (the choice of
which plane to run is done via vcpu->run), which solves a lot of
the problems in both of the previous approaches.  Compared to the
multiple-file-descriptors solution, it gets for free the ability to
avoid parallel execution of the same vCPUs in different privilege levels.
Compared to having a single file descriptor, churn is more limited, or
at least can be attacked in small bites.  For example in this series
only per-plane interrupt controllers are switched to use the new struct
kvm_plane in place of struct kvm, and that's more or less enough in
the absence of complex interrupt delivery scenarios.
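
To make the "single KVM_RUN entry point" argument concrete, here is a
small model of why parallel execution of one vCPU id in two planes cannot
happen.  It is only a sketch under assumptions: the struct and function
names are invented, the "plane" field stands in for whatever vcpu->run
uses to select the plane, and a plain flag stands in for the locking that
the real code would use.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PLANES 16	/* the series enables up to 16 planes on x86 */

/* Hypothetical stand-in for the shared run area of one vCPU id. */
struct model_run {
	unsigned int plane;	/* which plane the next entry should run */
};

struct model_vcpu_id {
	struct model_run run;	/* one run area, shared by all planes */
	bool running;		/* stand-in for mutual exclusion on the vCPU */
	unsigned long entries[MODEL_MAX_PLANES];	/* per-plane entry count */
};

/*
 * The only way into the guest.  Returns 0 on success, -1 if this vCPU id
 * is already running (in any plane) or the selected plane is invalid.
 */
static int model_run_enter(struct model_vcpu_id *v)
{
	if (v->running || v->run.plane >= MODEL_MAX_PLANES)
		return -1;
	v->running = true;
	v->entries[v->run.plane]++;
	return 0;
}

/* Leaves the guest and allows the next entry, possibly in another plane. */
static void model_run_exit(struct model_vcpu_id *v)
{
	assert(v->running);
	v->running = false;
}
```

Because there is exactly one entry point per vCPU id, a second entry
attempt fails regardless of which plane it selects; with one run loop per
plane, as in the multiple-file-descriptors approach, nothing would stop
both from entering at once.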

Changes to the userspace API are also relatively small; they boil down
to the introduction of a single new kind of file descriptor and almost
entirely fit in common code.  Reviewing these VM-wide and architecture-
independent changes should be the main purpose of this RFC, since 
there are still some things to fix:

- I named some fields "plane" instead of "plane_id" because I expected no
  fields of type struct kvm_plane*, but in retrospect that wasn't a great
  idea.

- online_vcpus counts across all planes but x86 code is still using it to
  deal with TSC synchronization.  Probably I will try and make kvmclock
  synchronization per-plane instead of per-VM.

- we're going to need a struct kvm_vcpu_plane similar to what Roy had in
  https://lore.kernel.org/kvm/cover.1726506534.git.roy.hopkins@suse.com/
  (probably smaller though).  Requests are per-plane for example, and I'm
  pretty sure any simplistic solution would have some corner cases where
  it's wrong; but it's a high churn change and I wanted to avoid that
  for this first posting.

There's a handful of locking TODOs where things should be checked more
carefully, but clearly identifying vCPU data that is not per-plane will
also simplify locking, thanks to having a single vcpu->mutex for the
whole plane.  So I'm not particularly worried about that; the TDX saga
hopefully has taught everyone to move in baby steps towards the intended
direction.

The handling of interrupt priorities is way more complicated than I
anticipated, unfortunately; everything else seems to fall into place
decently well---even taking into account the above incompleteness,
which anyway should not be a blocker for any VTL or VMPL experiments.
But do shout if anything makes you feel like I was too lazy, and/or you
want to puke.

Patches 1-2 are documentation and uAPI definitions.

Patches 3-9 are the common code for VM planes, while patches 10-14
are the common code for vCPU file descriptors on non-default planes.

Patches 15-26 are the x86-specific code, which is organized as follows:

- 15-20: convert APIC code to place its data in the new struct
  kvm_arch_plane instead of struct kvm_arch.

- 21-24: everything else except the new userspace exit, KVM_EXIT_PLANE_EVENT

- 25: KVM_EXIT_PLANE_EVENT, which is used when one plane interrupts another.

- 26: finally make the capability available to userspace

Finally, patches 27-29 are the testcases.  More are possible and planned,
but these are enough to say that, despite the missing bits, what exists
is not _completely_ broken.  I also didn't want to write dozens of tests
before committing to a selftests API.

Available for now at https://git.kernel.org/pub/scm/virt/kvm/kvm.git
branch planes-20250401.  I plan to place it in kvm-coco-queue, for lack
of a better place, as soon as TDX is merged into kvm/next and I test it
with the usual battery of kvm-unit-tests and real world guests.

Thanks,

Paolo

Paolo Bonzini (29):
  Documentation: kvm: introduce "VM plane" concept
  KVM: API definitions for plane userspace exit
  KVM: add plane info to structs
  KVM: introduce struct kvm_arch_plane
  KVM: add plane support to KVM_SIGNAL_MSI
  KVM: move mem_attr_array to kvm_plane
  KVM: do not use online_vcpus to test vCPU validity
  KVM: move vcpu_array to struct kvm_plane
  KVM: implement plane file descriptors ioctl and creation
  KVM: share statistics for same vCPU id on different planes
  KVM: anticipate allocation of dirty ring
  KVM: share dirty ring for same vCPU id on different planes
  KVM: implement vCPU creation for extra planes
  KVM: pass plane to kvm_arch_vcpu_create
  KVM: x86: pass vcpu to kvm_pv_send_ipi()
  KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit
  KVM: x86: block creating irqchip if planes are active
  KVM: x86: track APICv inhibits per plane
  KVM: x86: move APIC map to kvm_arch_plane
  KVM: x86: add planes support for interrupt delivery
  KVM: x86: add infrastructure to share FPU across planes
  KVM: x86: implement initial plane support
  KVM: x86: extract kvm_post_set_cpuid
  KVM: x86: initialize CPUID for non-default planes
  KVM: x86: handle interrupt priorities for planes
  KVM: x86: enable up to 16 planes
  selftests: kvm: introduce basic test for VM planes
  selftests: kvm: add plane infrastructure
  selftests: kvm: add x86-specific plane test

 Documentation/virt/kvm/api.rst                | 245 +++++++--
 Documentation/virt/kvm/locking.rst            |   3 +
 Documentation/virt/kvm/vcpu-requests.rst      |   7 +
 arch/arm64/include/asm/kvm_host.h             |   5 +
 arch/arm64/kvm/arm.c                          |   4 +-
 arch/arm64/kvm/handle_exit.c                  |   6 +-
 arch/arm64/kvm/hyp/nvhe/gen-hyprel.c          |   4 +-
 arch/arm64/kvm/mmio.c                         |   4 +-
 arch/loongarch/include/asm/kvm_host.h         |   5 +
 arch/loongarch/kvm/exit.c                     |   8 +-
 arch/loongarch/kvm/vcpu.c                     |   4 +-
 arch/mips/include/asm/kvm_host.h              |   5 +
 arch/mips/kvm/emulate.c                       |   2 +-
 arch/mips/kvm/mips.c                          |  32 +-
 arch/mips/kvm/vz.c                            |  18 +-
 arch/powerpc/include/asm/kvm_host.h           |   5 +
 arch/powerpc/kvm/book3s.c                     |   2 +-
 arch/powerpc/kvm/book3s_hv.c                  |  46 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c          |   8 +-
 arch/powerpc/kvm/book3s_pr.c                  |  22 +-
 arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
 arch/powerpc/kvm/powerpc.c                    |   6 +-
 arch/powerpc/kvm/timing.h                     |  28 +-
 arch/riscv/include/asm/kvm_host.h             |   5 +
 arch/riscv/kvm/vcpu.c                         |   4 +-
 arch/riscv/kvm/vcpu_exit.c                    |  10 +-
 arch/riscv/kvm/vcpu_insn.c                    |  16 +-
 arch/riscv/kvm/vcpu_sbi.c                     |   2 +-
 arch/riscv/kvm/vcpu_sbi_hsm.c                 |   2 +-
 arch/s390/include/asm/kvm_host.h              |   5 +
 arch/s390/kvm/diag.c                          |  18 +-
 arch/s390/kvm/intercept.c                     |  20 +-
 arch/s390/kvm/interrupt.c                     |  48 +-
 arch/s390/kvm/kvm-s390.c                      |  10 +-
 arch/s390/kvm/priv.c                          |  60 +--
 arch/s390/kvm/sigp.c                          |  50 +-
 arch/s390/kvm/vsie.c                          |   2 +-
 arch/x86/include/asm/kvm_host.h               |  46 +-
 arch/x86/kvm/cpuid.c                          |  57 +-
 arch/x86/kvm/cpuid.h                          |   2 +
 arch/x86/kvm/debugfs.c                        |   2 +-
 arch/x86/kvm/hyperv.c                         |   7 +-
 arch/x86/kvm/i8254.c                          |   7 +-
 arch/x86/kvm/ioapic.c                         |   4 +-
 arch/x86/kvm/irq_comm.c                       |  14 +-
 arch/x86/kvm/kvm_cache_regs.h                 |   4 +-
 arch/x86/kvm/lapic.c                          | 147 +++--
 arch/x86/kvm/mmu/mmu.c                        |  41 +-
 arch/x86/kvm/mmu/tdp_mmu.c                    |   2 +-
 arch/x86/kvm/svm/sev.c                        |   4 +-
 arch/x86/kvm/svm/svm.c                        |  21 +-
 arch/x86/kvm/vmx/tdx.c                        |   8 +-
 arch/x86/kvm/vmx/vmx.c                        |  20 +-
 arch/x86/kvm/x86.c                            | 319 ++++++++---
 arch/x86/kvm/xen.c                            |   1 +
 include/linux/kvm_host.h                      | 130 +++--
 include/linux/kvm_types.h                     |   1 +
 include/uapi/linux/kvm.h                      |  28 +-
 tools/testing/selftests/kvm/Makefile.kvm      |   2 +
 .../testing/selftests/kvm/include/kvm_util.h  |  48 ++
 .../selftests/kvm/include/x86/processor.h     |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  65 ++-
 .../testing/selftests/kvm/lib/x86/processor.c |  15 +
 tools/testing/selftests/kvm/plane_test.c      | 103 ++++
 tools/testing/selftests/kvm/x86/plane_test.c  | 270 ++++++++++
 virt/kvm/dirty_ring.c                         |   5 +-
 virt/kvm/guest_memfd.c                        |   3 +-
 virt/kvm/irqchip.c                            |   5 +-
 virt/kvm/kvm_main.c                           | 500 ++++++++++++++----
 69 files changed, 1991 insertions(+), 614 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/plane_test.c
 create mode 100644 tools/testing/selftests/kvm/x86/plane_test.c

-- 
2.49.0


end of thread, newest message: 2025-08-07 12:34 UTC

Thread overview: 49+ messages
2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
2025-04-01 16:10 ` [PATCH 01/29] Documentation: kvm: introduce "VM plane" concept Paolo Bonzini
2025-04-21 18:43   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 02/29] KVM: API definitions for plane userspace exit Paolo Bonzini
2025-06-04  0:10   ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 03/29] KVM: add plane info to structs Paolo Bonzini
2025-04-21 18:57   ` Tom Lendacky
2025-04-21 19:04   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 04/29] KVM: introduce struct kvm_arch_plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 05/29] KVM: add plane support to KVM_SIGNAL_MSI Paolo Bonzini
2025-04-01 16:10 ` [PATCH 06/29] KVM: move mem_attr_array to kvm_plane Paolo Bonzini
2025-06-06 22:50   ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 07/29] KVM: do not use online_vcpus to test vCPU validity Paolo Bonzini
2025-06-05 22:45   ` Sean Christopherson
2025-06-06 13:49     ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 08/29] KVM: move vcpu_array to struct kvm_plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 09/29] KVM: implement plane file descriptors ioctl and creation Paolo Bonzini
2025-04-21 20:32   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 10/29] KVM: share statistics for same vCPU id on different planes Paolo Bonzini
2025-06-06 16:23   ` Sean Christopherson
2025-06-06 16:32     ` Paolo Bonzini
2025-04-01 16:10 ` [PATCH 11/29] KVM: anticipate allocation of dirty ring Paolo Bonzini
2025-04-01 16:10 ` [PATCH 12/29] KVM: share dirty ring for same vCPU id on different planes Paolo Bonzini
2025-04-21 21:51   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 13/29] KVM: implement vCPU creation for extra planes Paolo Bonzini
2025-04-21 22:08   ` Tom Lendacky
2025-06-05 22:47     ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 14/29] KVM: pass plane to kvm_arch_vcpu_create Paolo Bonzini
2025-04-01 16:10 ` [PATCH 15/29] KVM: x86: pass vcpu to kvm_pv_send_ipi() Paolo Bonzini
2025-04-01 16:10 ` [PATCH 16/29] KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit Paolo Bonzini
2025-04-01 16:10 ` [PATCH 17/29] KVM: x86: block creating irqchip if planes are active Paolo Bonzini
2025-04-01 16:10 ` [PATCH 18/29] KVM: x86: track APICv inhibits per plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 19/29] KVM: x86: move APIC map to kvm_arch_plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 20/29] KVM: x86: add planes support for interrupt delivery Paolo Bonzini
2025-06-06 16:30   ` Sean Christopherson
2025-06-06 16:38     ` Paolo Bonzini
2025-04-01 16:10 ` [PATCH 21/29] KVM: x86: add infrastructure to share FPU across planes Paolo Bonzini
2025-04-01 16:10 ` [PATCH 22/29] KVM: x86: implement initial plane support Paolo Bonzini
2025-04-01 16:11 ` [PATCH 23/29] KVM: x86: extract kvm_post_set_cpuid Paolo Bonzini
2025-04-01 16:11 ` [PATCH 24/29] KVM: x86: initialize CPUID for non-default planes Paolo Bonzini
2025-04-01 16:11 ` [PATCH 25/29] KVM: x86: handle interrupt priorities for planes Paolo Bonzini
2025-04-01 16:11 ` [PATCH 26/29] KVM: x86: enable up to 16 planes Paolo Bonzini
2025-06-06 22:41   ` Sean Christopherson
2025-04-01 16:11 ` [PATCH 27/29] selftests: kvm: introduce basic test for VM planes Paolo Bonzini
2025-04-01 16:11 ` [PATCH 28/29] selftests: kvm: add plane infrastructure Paolo Bonzini
2025-04-01 16:11 ` [PATCH 29/29] selftests: kvm: add x86-specific plane test Paolo Bonzini
2025-04-01 16:16 ` [RFC PATCH 00/29] KVM: VM planes Sean Christopherson
2025-06-06 16:42 ` Tom Lendacky
2025-08-07 12:34 ` Vaishali Thakkar
