* [RFC PATCH 00/29] KVM: VM planes
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

I guess April 1st is not the best date to send out such a large series
after months of radio silence, but here we are.

AMD VMPLs, Intel TDX partitions, Microsoft Hyper-V VTLs, and ARM CCA planes
are all examples of virtual privilege level concepts that are exclusive to
guests.  In all these specifications the hypervisor hosts multiple
copies of a vCPU's register state (or at least of most of it) and provides
hypercalls or instructions to switch between them.

This is the first draft of the implementation according to the sketch that
was prepared last year between Linux Plumbers and KVM Forum.  The initial
version of the API was posted last October, and the implementation only
needed small changes.

Attempts made in the past, mostly in the context of Hyper-V VTLs and SEV-SNP
VMPLs, fell into two categories:

- use a single vCPU file descriptor, and store multiple copies of the state
  in a single struct kvm_vcpu.  This approach requires a lot of changes to
  provide multiple copies of affected fields, especially MMUs and APICs;
  and complex uAPI extensions to direct existing ioctls to a specific
  privilege level.  This was more or less workable for SEV-SNP VMPLs, but
  only because the copies of the register state were hidden in the VMSA
  (which KVM does not manage); it showed all its problems when applied to
  Hyper-V VTLs.

  The main advantage was that KVM kept the knowledge of the relationship
  between vCPUs that have the same id but belong to different privilege
  levels.  This is important in order to accelerate switches in-kernel.

- use multiple VM and vCPU file descriptors, and handle the switch entirely
  in userspace.  This got gnarly pretty fast for even more reasons than
  the previous case, for example because VMs could no longer share
  memslots, including dirty bitmaps and private/shared attributes (a
  substantial problem for SEV-SNP since VMPLs share their ASID).

  In contrast to the other case, the complete lack of kernel-level sharing
  of register state, and of any guarantee that vCPUs do not run in
  parallel, is what makes this approach problematic for both kernel and
  userspace.  An in-kernel implementation of privilege level switches
  ranges from complicated to impossible, and userspace also needs a lot
  of complexity to ensure that a higher-privileged VTL properly interrupts
  a lower-privileged one.

This design sits squarely in the middle: it gives the initial set of
VM and vCPU file descriptors the full set of ioctls + struct kvm_run,
whereas other privilege levels ("planes") instead only support a small
part of the KVM API.  In fact for the vm file descriptor it is only three
ioctls: KVM_CHECK_EXTENSION, KVM_SIGNAL_MSI, KVM_SET_MEMORY_ATTRIBUTES.
For vCPUs it is basically KVM_GET/SET_*.

Most notably, memslots and KVM_RUN are *not* included (the choice of
which plane to run is done via vcpu->run), which solves a lot of
the problems in both of the previous approaches.  Compared to the
multiple-file-descriptors solution, it gets for free the ability to
avoid parallel execution of the same vCPUs in different privilege levels.
Compared to having a single file descriptor, churn is more limited, or
at least can be attacked in small bites.  For example in this series
only per-plane interrupt controllers are switched to use the new struct
kvm_plane in place of struct kvm, and that's more or less enough in
the absence of complex interrupt delivery scenarios.
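
To make the flow concrete, here is a rough (untested) userspace sketch
using the ioctls and kvm_run fields documented in patch 1; vm_fd, vcpu_fd,
regs and the mmap'ed "run" struct are assumed to come from the usual
plane-0 setup:

    /* KVM_CHECK_EXTENSION(KVM_CAP_PLANES) returns the maximum plane id. */
    if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PLANES) < 1)
            return;                         /* planes not supported */

    int plane_fd = ioctl(vm_fd, KVM_CREATE_PLANE, 1);
    int vcpu1_fd = ioctl(plane_fd, KVM_CREATE_VCPU_PLANE, vcpu_fd);

    /* Per-plane vCPU state is accessed through the new descriptor... */
    ioctl(vcpu1_fd, KVM_SET_REGS, &regs);

    /* ...but KVM_RUN stays on the plane-0 descriptor: the plane to run
     * is selected through struct kvm_run. */
    run->plane = 1;
    run->req_exit_planes = 1 << 0;          /* exit if plane 0 gets an interrupt */
    ioctl(vcpu_fd, KVM_RUN, 0);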

Changes to the userspace API are also relatively small; they boil down
to the introduction of a single new kind of file descriptor and almost
entirely fit in common code.  Reviewing these VM-wide and architecture-
independent changes should be the main purpose of this RFC, since 
there are still some things to fix:

- I named some fields "plane" instead of "plane_id" because I expected no
  fields of type struct kvm_plane*, but in retrospect that wasn't a great
  idea.

- online_vcpus counts across all planes but x86 code is still using it to
  deal with TSC synchronization.  Probably I will try and make kvmclock
  synchronization per-plane instead of per-VM.

- we're going to need a struct kvm_vcpu_plane similar to what Roy had in
  https://lore.kernel.org/kvm/cover.1726506534.git.roy.hopkins@suse.com/
  (probably smaller though).  Requests are per-plane for example, and I'm
  pretty sure any simplistic solution would have some corner cases where
  it's wrong; but it's a high churn change and I wanted to avoid that
  for this first posting.

There's a handful of locking TODOs where things should be checked more
carefully, but clearly identifying vCPU data that is not per-plane will
also simplify locking, thanks to having a single vcpu->mutex for the
whole plane.  So I'm not particularly worried about that; the TDX saga
hopefully has taught everyone to move in baby steps towards the intended
direction.

The handling of interrupt priorities is way more complicated than I
anticipated, unfortunately; everything else seems to fall into place
decently well---even taking into account the above incompleteness,
which anyway should not be a blocker for any VTL or VMPL experiments.
But do shout if anything makes you feel like I was too lazy, and/or you
want to puke.

Patches 1-2 are documentation and uAPI definitions.

Patches 3-9 are the common code for VM planes, while patches 10-14
are the common code for vCPU file descriptors on non-default planes.

Patches 15-26 are the x86-specific code, which is organized as follows:

- 15-20: convert APIC code to place its data in the new struct
kvm_arch_plane instead of struct kvm_arch.

- 21-24: everything else except the new userspace exit, KVM_EXIT_PLANE_EVENT

- 25: KVM_EXIT_PLANE_EVENT, which is used when one plane interrupts another.

- 26: finally make the capability available to userspace

Patches 27-29 finally are the testcases.  More are possible and planned,
but these are enough to say that, despite the missing bits, what exists
is not _completely_ broken.  I also didn't want to write dozens of tests
before committing to a selftests API.

Available for now at https://git.kernel.org/pub/scm/virt/kvm/kvm.git
branch planes-20250401.  I plan to place it in kvm-coco-queue, for lack
of a better place, as soon as TDX is merged into kvm/next and I test it
with the usual battery of kvm-unit-tests and real world guests.

Thanks,

Paolo

Paolo Bonzini (29):
  Documentation: kvm: introduce "VM plane" concept
  KVM: API definitions for plane userspace exit
  KVM: add plane info to structs
  KVM: introduce struct kvm_arch_plane
  KVM: add plane support to KVM_SIGNAL_MSI
  KVM: move mem_attr_array to kvm_plane
  KVM: do not use online_vcpus to test vCPU validity
  KVM: move vcpu_array to struct kvm_plane
  KVM: implement plane file descriptors ioctl and creation
  KVM: share statistics for same vCPU id on different planes
  KVM: anticipate allocation of dirty ring
  KVM: share dirty ring for same vCPU id on different planes
  KVM: implement vCPU creation for extra planes
  KVM: pass plane to kvm_arch_vcpu_create
  KVM: x86: pass vcpu to kvm_pv_send_ipi()
  KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit
  KVM: x86: block creating irqchip if planes are active
  KVM: x86: track APICv inhibits per plane
  KVM: x86: move APIC map to kvm_arch_plane
  KVM: x86: add planes support for interrupt delivery
  KVM: x86: add infrastructure to share FPU across planes
  KVM: x86: implement initial plane support
  KVM: x86: extract kvm_post_set_cpuid
  KVM: x86: initialize CPUID for non-default planes
  KVM: x86: handle interrupt priorities for planes
  KVM: x86: enable up to 16 planes
  selftests: kvm: introduce basic test for VM planes
  selftests: kvm: add plane infrastructure
  selftests: kvm: add x86-specific plane test

 Documentation/virt/kvm/api.rst                | 245 +++++++--
 Documentation/virt/kvm/locking.rst            |   3 +
 Documentation/virt/kvm/vcpu-requests.rst      |   7 +
 arch/arm64/include/asm/kvm_host.h             |   5 +
 arch/arm64/kvm/arm.c                          |   4 +-
 arch/arm64/kvm/handle_exit.c                  |   6 +-
 arch/arm64/kvm/hyp/nvhe/gen-hyprel.c          |   4 +-
 arch/arm64/kvm/mmio.c                         |   4 +-
 arch/loongarch/include/asm/kvm_host.h         |   5 +
 arch/loongarch/kvm/exit.c                     |   8 +-
 arch/loongarch/kvm/vcpu.c                     |   4 +-
 arch/mips/include/asm/kvm_host.h              |   5 +
 arch/mips/kvm/emulate.c                       |   2 +-
 arch/mips/kvm/mips.c                          |  32 +-
 arch/mips/kvm/vz.c                            |  18 +-
 arch/powerpc/include/asm/kvm_host.h           |   5 +
 arch/powerpc/kvm/book3s.c                     |   2 +-
 arch/powerpc/kvm/book3s_hv.c                  |  46 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c          |   8 +-
 arch/powerpc/kvm/book3s_pr.c                  |  22 +-
 arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
 arch/powerpc/kvm/powerpc.c                    |   6 +-
 arch/powerpc/kvm/timing.h                     |  28 +-
 arch/riscv/include/asm/kvm_host.h             |   5 +
 arch/riscv/kvm/vcpu.c                         |   4 +-
 arch/riscv/kvm/vcpu_exit.c                    |  10 +-
 arch/riscv/kvm/vcpu_insn.c                    |  16 +-
 arch/riscv/kvm/vcpu_sbi.c                     |   2 +-
 arch/riscv/kvm/vcpu_sbi_hsm.c                 |   2 +-
 arch/s390/include/asm/kvm_host.h              |   5 +
 arch/s390/kvm/diag.c                          |  18 +-
 arch/s390/kvm/intercept.c                     |  20 +-
 arch/s390/kvm/interrupt.c                     |  48 +-
 arch/s390/kvm/kvm-s390.c                      |  10 +-
 arch/s390/kvm/priv.c                          |  60 +--
 arch/s390/kvm/sigp.c                          |  50 +-
 arch/s390/kvm/vsie.c                          |   2 +-
 arch/x86/include/asm/kvm_host.h               |  46 +-
 arch/x86/kvm/cpuid.c                          |  57 +-
 arch/x86/kvm/cpuid.h                          |   2 +
 arch/x86/kvm/debugfs.c                        |   2 +-
 arch/x86/kvm/hyperv.c                         |   7 +-
 arch/x86/kvm/i8254.c                          |   7 +-
 arch/x86/kvm/ioapic.c                         |   4 +-
 arch/x86/kvm/irq_comm.c                       |  14 +-
 arch/x86/kvm/kvm_cache_regs.h                 |   4 +-
 arch/x86/kvm/lapic.c                          | 147 +++--
 arch/x86/kvm/mmu/mmu.c                        |  41 +-
 arch/x86/kvm/mmu/tdp_mmu.c                    |   2 +-
 arch/x86/kvm/svm/sev.c                        |   4 +-
 arch/x86/kvm/svm/svm.c                        |  21 +-
 arch/x86/kvm/vmx/tdx.c                        |   8 +-
 arch/x86/kvm/vmx/vmx.c                        |  20 +-
 arch/x86/kvm/x86.c                            | 319 ++++++++---
 arch/x86/kvm/xen.c                            |   1 +
 include/linux/kvm_host.h                      | 130 +++--
 include/linux/kvm_types.h                     |   1 +
 include/uapi/linux/kvm.h                      |  28 +-
 tools/testing/selftests/kvm/Makefile.kvm      |   2 +
 .../testing/selftests/kvm/include/kvm_util.h  |  48 ++
 .../selftests/kvm/include/x86/processor.h     |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  65 ++-
 .../testing/selftests/kvm/lib/x86/processor.c |  15 +
 tools/testing/selftests/kvm/plane_test.c      | 103 ++++
 tools/testing/selftests/kvm/x86/plane_test.c  | 270 ++++++++++
 virt/kvm/dirty_ring.c                         |   5 +-
 virt/kvm/guest_memfd.c                        |   3 +-
 virt/kvm/irqchip.c                            |   5 +-
 virt/kvm/kvm_main.c                           | 500 ++++++++++++++----
 69 files changed, 1991 insertions(+), 614 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/plane_test.c
 create mode 100644 tools/testing/selftests/kvm/x86/plane_test.c

-- 
2.49.0



* [PATCH 01/29] Documentation: kvm: introduce "VM plane" concept
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

There have been multiple occurrences of processors introducing a virtual
privilege level concept for guests, where the hypervisor hosts multiple
copies of a vCPU's register state (or at least of most of it) and provides
hypercalls or instructions to switch between them.  These include AMD
VMPLs, Intel TDX partitions, Microsoft Hyper-V VTLs, and ARM CCA planes.
Include documentation on how the feature will be exposed to userspace,
based on a draft made between Plumbers and KVM Forum.

In the past, two main solutions were attempted, mostly in the context
of Hyper-V VTLs and SEV-SNP VMPLs:

- use a single vCPU file descriptor, and store multiple copies of the state
  in a single struct kvm_vcpu.  This requires a lot of changes to
  provide multiple copies of affected fields, especially MMUs and APICs;
  and complex uAPI extensions to direct existing ioctls to a specific
  privilege level.  This solution looked marginally okay for SEV-SNP
  VMPLs, but only because the copies of the register state were hidden
  in the VMSA (KVM does not manage it); it showed all its problems when
  applied to Hyper-V VTLs.

- use multiple VM and vCPU file descriptors, and handle the switch entirely
  in userspace.  This got gnarly pretty fast for even more reasons than
  the previous case, for example because VMs could no longer share
  memslots, including dirty bitmaps and private/shared attributes (a
  substantial problem for SEV-SNP since VMPLs share their ASID).  Another
  problem was the need to share _some_ register state across VTLs and
  to ensure that vCPUs did not run in parallel; a lot of logic needed to
  be added in userspace to ensure that a higher-privileged VTL properly
  interrupted a lower-privileged one.

  This solution also complicates in-kernel implementation of privilege
  level switch, or even makes it impossible, because there is no kernel
  knowledge of the relationship between vCPUs that have the same id but
  belong to different privilege levels.

Especially given the need to accelerate switches in the kernel, it is clear
that KVM needs some level of knowledge of the relationship between vCPUs
that have the same id but belong to different privilege levels.  For this
reason, I proposed a design that only gives the initial set of VM and vCPU file
descriptors the full set of ioctls + struct kvm_run; other privilege
levels instead only support a small part of the KVM API.  In fact for
the vm file descriptor it is only three ioctls: KVM_CHECK_EXTENSION,
KVM_SIGNAL_MSI, KVM_SET_MEMORY_ATTRIBUTES.  For vCPUs it is basically
KVM_GET/SET_*.

This solves a lot of the problems in the multiple-file-descriptors solution,
namely it gets for free the ability to avoid parallel execution of the
same vCPUs in different privilege levels.  Changes to the userspace API
of course exist, but they are relatively small and more easily backwards
compatible, because they boil down to the introduction of new file
descriptor kinds instead of having to change the inputs to all affected
ioctls.
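
As a rough sketch of the resulting userspace loop (vcpu_fd and the
mmap'ed "run" struct are the usual plane-0 objects; sync_shared_state()
is a purely hypothetical helper that copies whatever state the
architecture shares across privilege levels):

    run->plane = 1;
    run->req_exit_planes = 1 << 0;   /* exit if plane 0 has a pending interrupt */
    ioctl(vcpu_fd, KVM_RUN, 0);

    if (run->exit_reason == KVM_EXIT_PLANE_EVENT &&
        run->plane_event.cause == KVM_PLANE_EVENT_INTERRUPT) {
            /* target = req_exit_planes & pending_event_planes */
            int next = ffs(run->plane_event.target) - 1;

            sync_shared_state(run->plane, next);    /* hypothetical */
            run->plane = next;
            /* recompute req_exit_planes for the new plane, then KVM_RUN again */
    }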

It does share some of the code churn issues of the single-file-descriptor
solution; on the other hand a prototype multi-fd VMPL implementation[1]
also needed large-scale changes, which therefore seem unavoidable when
privilege levels are provided by hardware rather than being only a
software concept, as is the case for VTLs.

   [1] https://lore.kernel.org/lkml/cover.1726506534.git.roy.hopkins@suse.com/

Acknowledgements: thanks to everyone who participated in the discussions,
you are too many to mention in a small margin.  Thanks to Roy Hopkins,
Tom Lendacky, Anel Orazgaliyeva, Nicolas Saenz-Julienne for experimenting
with implementations of VTLs and VMPLs.

Ah, and because x86 has three names for it and Arm has one, choose the
Arm name for all architectures to avoid bikeshedding and to displease
everyone---including the KVM/arm64 folks, probably.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst           | 235 ++++++++++++++++++++---
 Documentation/virt/kvm/vcpu-requests.rst |   7 +
 2 files changed, 211 insertions(+), 31 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2a63a244e87a..e1c67bc6df47 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -56,6 +56,18 @@ be checked with :ref:`KVM_CHECK_EXTENSION <KVM_CHECK_EXTENSION>`.  Some
 capabilities also need to be enabled for VMs or VCPUs where their
 functionality is desired (see :ref:`cap_enable` and :ref:`cap_enable_vm`).
 
+On some architectures, a "virtual privilege level" concept may be present
+apart from the usual separation between user and supervisor mode, or
+between hypervisor and guest mode.  When this is the case, a single vCPU
+can have multiple copies of its register state (or at least most of it),
+and will switch between them through a special processor instruction,
+or through some kind of hypercall.
+
+KVM calls these privilege levels "planes".  Planes other than the
+initially-created one (called "plane 0") have a file descriptor each,
+and so do the planes of each vCPU.  Ioctls for vCPU planes should also
+be issued from a single thread, unless specially marked as asynchronous
+in the documentation.
 
 2. Restrictions
 ===============
@@ -119,6 +131,11 @@ description:
   Type:
       system, vm, or vcpu.
 
+      File descriptors for planes other than plane 0 provide a subset
+      of vm and vcpu ioctls.  Those that *are* supported in extra
+      planes are marked specially in the documentation (for example,
+      `vcpu (all planes)`).
+
   Parameters:
       what parameters are accepted by the ioctl.
 
@@ -264,7 +281,7 @@ otherwise.
 
 :Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
 :Architectures: all
-:Type: system ioctl, vm ioctl
+:Type: system ioctl, vm ioctl (all planes)
 :Parameters: extension identifier (KVM_CAP_*)
 :Returns: 0 if unsupported; 1 (or some other positive integer) if supported
 
@@ -421,7 +438,7 @@ kvm_run' (see below).
 
 :Capability: basic
 :Architectures: all except arm64
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_regs (out)
 :Returns: 0 on success, -1 on error
 
@@ -461,7 +478,7 @@ Reads the general purpose registers from the vcpu.
 
 :Capability: basic
 :Architectures: all except arm64
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_regs (in)
 :Returns: 0 on success, -1 on error
 
@@ -475,7 +492,7 @@ See KVM_GET_REGS for the data structure.
 
 :Capability: basic
 :Architectures: x86, ppc
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_sregs (out)
 :Returns: 0 on success, -1 on error
 
@@ -506,7 +523,7 @@ but not yet injected into the cpu core.
 
 :Capability: basic
 :Architectures: x86, ppc
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_sregs (in)
 :Returns: 0 on success, -1 on error
 
@@ -519,7 +536,7 @@ data structures.
 
 :Capability: basic
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_translation (in/out)
 :Returns: 0 on success, -1 on error
 
@@ -645,7 +662,7 @@ This is an asynchronous vcpu ioctl and can be invoked from any thread.
 
 :Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system)
 :Architectures: x86
-:Type: system ioctl, vcpu ioctl
+:Type: system ioctl, vcpu ioctl (all planes)
 :Parameters: struct kvm_msrs (in/out)
 :Returns: number of msrs successfully returned;
           -1 on error
@@ -685,7 +702,7 @@ kvm will fill in the 'data' member.
 
 :Capability: basic
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_msrs (in)
 :Returns: number of msrs successfully set (see below), -1 on error
 
@@ -773,7 +790,7 @@ signal mask.
 
 :Capability: basic
 :Architectures: x86, loongarch
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_fpu (out)
 :Returns: 0 on success, -1 on error
 
@@ -811,7 +828,7 @@ Reads the floating point state from the vcpu.
 
 :Capability: basic
 :Architectures: x86, loongarch
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_fpu (in)
 :Returns: 0 on success, -1 on error
 
@@ -1126,7 +1143,7 @@ Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored.
 :Capability: KVM_CAP_VCPU_EVENTS
 :Extended by: KVM_CAP_INTR_SHADOW
 :Architectures: x86, arm64
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_vcpu_events (out)
 :Returns: 0 on success, -1 on error
 
@@ -1249,7 +1266,7 @@ directly to the virtual CPU).
 :Capability: KVM_CAP_VCPU_EVENTS
 :Extended by: KVM_CAP_INTR_SHADOW
 :Architectures: x86, arm64
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_vcpu_events (in)
 :Returns: 0 on success, -1 on error
 
@@ -1315,7 +1332,7 @@ See KVM_GET_VCPU_EVENTS for the data structure.
 
 :Capability: KVM_CAP_DEBUGREGS
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_debugregs (out)
 :Returns: 0 on success, -1 on error
 
@@ -1337,7 +1354,7 @@ Reads debug registers from the vcpu.
 
 :Capability: KVM_CAP_DEBUGREGS
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_debugregs (in)
 :Returns: 0 on success, -1 on error
 
@@ -1656,7 +1673,7 @@ otherwise it will return EBUSY error.
 
 :Capability: KVM_CAP_XSAVE
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_xsave (out)
 :Returns: 0 on success, -1 on error
 
@@ -1676,7 +1693,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
 
 :Capability: KVM_CAP_XSAVE and KVM_CAP_XSAVE2
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_xsave (in)
 :Returns: 0 on success, -1 on error
 
@@ -1704,7 +1721,7 @@ contents of CPUID leaf 0xD on the host.
 
 :Capability: KVM_CAP_XCRS
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_xcrs (out)
 :Returns: 0 on success, -1 on error
 
@@ -1731,7 +1748,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
 
 :Capability: KVM_CAP_XCRS
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_xcrs (in)
 :Returns: 0 on success, -1 on error
 
@@ -2027,7 +2044,7 @@ error.
 
 :Capability: KVM_CAP_IRQCHIP
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_lapic_state (out)
 :Returns: 0 on success, -1 on error
 
@@ -2058,7 +2075,7 @@ always uses xAPIC format.
 
 :Capability: KVM_CAP_IRQCHIP
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_lapic_state (in)
 :Returns: 0 on success, -1 on error
 
@@ -2292,7 +2309,7 @@ prior to calling the KVM_RUN ioctl.
 
 :Capability: KVM_CAP_ONE_REG
 :Architectures: all
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_one_reg (in)
 :Returns: 0 on success, negative value on failure
 
@@ -2907,7 +2924,7 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
 
 :Capability: KVM_CAP_ONE_REG
 :Architectures: all
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_one_reg (in and out)
 :Returns: 0 on success, negative value on failure
 
@@ -2961,7 +2978,7 @@ after pausing the vcpu, but before it is resumed.
 
 :Capability: KVM_CAP_SIGNAL_MSI
 :Architectures: x86 arm64
-:Type: vm ioctl
+:Type: vm ioctl (all planes)
 :Parameters: struct kvm_msi (in)
 :Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
 
@@ -3564,7 +3581,7 @@ VCPU matching underlying host.
 
 :Capability: basic
 :Architectures: arm64, mips, riscv
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_reg_list (in/out)
 :Returns: 0 on success; -1 on error
 
@@ -4861,7 +4878,7 @@ The acceptable values for the flags field are::
 
 :Capability: KVM_CAP_NESTED_STATE
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_nested_state (in/out)
 :Returns: 0 on success, -1 on error
 
@@ -4935,7 +4952,7 @@ to the KVM_CHECK_EXTENSION ioctl().
 
 :Capability: KVM_CAP_NESTED_STATE
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_nested_state (in)
 :Returns: 0 on success, -1 on error
 
@@ -5816,7 +5833,7 @@ then ``length`` is returned.
 
 :Capability: KVM_CAP_SREGS2
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_sregs2 (out)
 :Returns: 0 on success, -1 on error
 
@@ -5849,7 +5866,7 @@ flags values for ``kvm_sregs2``:
 
 :Capability: KVM_CAP_SREGS2
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_sregs2 (in)
 :Returns: 0 on success, -1 on error
 
@@ -6065,7 +6082,7 @@ as the descriptors in Descriptors block.
 
 :Capability: KVM_CAP_XSAVE2
 :Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl (all planes)
 :Parameters: struct kvm_xsave (out)
 :Returns: 0 on success, -1 on error
 
@@ -6323,7 +6340,7 @@ Returns -EINVAL if called on a protected VM.
 
 :Capability: KVM_CAP_MEMORY_ATTRIBUTES
 :Architectures: x86
-:Type: vm ioctl
+:Type: vm ioctl (all planes)
 :Parameters: struct kvm_memory_attributes (in)
 :Returns: 0 on success, <0 on error
 
@@ -6458,6 +6475,46 @@ the capability to be present.
 `flags` must currently be zero.
 
 
+.. _KVM_CREATE_PLANE:
+
+4.144 KVM_CREATE_PLANE
+----------------------
+
+:Capability: KVM_CAP_PLANES
+:Architectures: none
+:Type: vm ioctl
+:Parameters: plane id
+:Returns: a VM fd that can be used to control the new plane.
+
+Creates a new *plane*, i.e. a separate privilege level for the
+virtual machine.  Each plane has its own memory attributes,
+which can be used to enable more restricted permissions than
+what is allowed with ``KVM_SET_USER_MEMORY_REGION``.
+
+Each plane has a numeric id that is used when communicating
+with KVM through the :ref:`kvm_run <kvm_run>` struct.  While
+KVM is currently agnostic to whether low ids are more or less
+privileged, it is expected that this will not always be the
+case in the future.  For example KVM in the future may use
+the plane id when planes are supported by hardware (as is the
+case for VMPLs in AMD), or if KVM supports accelerated plane
+switch operations (as might be the case for Hyper-V VTLs).
+
+4.145 KVM_CREATE_VCPU_PLANE
+---------------------------
+
+:Capability: KVM_CAP_PLANES
+:Architectures: none
+:Type: vm ioctl (non default plane)
+:Parameters: vcpu file descriptor for the default plane
+:Returns: a vCPU fd that can be used to control the new plane
+          for the vCPU.
+
+Adds a vCPU to a plane; the new vCPU's id comes from the vCPU
+file descriptor that is passed in the argument.  Note that
+because of how the API is defined,
+can only have a subset of the ids that are available in plane 0.
+
 .. _kvm_run:
 
 5. The kvm_run structure
@@ -6493,7 +6550,50 @@ This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
 
 ::
 
-	__u8 padding1[6];
+	/* in/out */
+	__u8 plane;
+
+The plane that will be run (usually 0).
+
+While this is not yet supported, in the future KVM may handle plane
+switch in the kernel.  In this case, the output value of this field
+may differ from the input value.  However, automatic switch will
+have to be :ref:`explicitly enabled <KVM_ENABLE_CAP>`.
+
+For backwards compatibility, this field is ignored unless a plane
+other than plane 0 has been created.
+
+::
+
+        /* in/out */
+        __u16 suspended_planes;
+
+A bitmap of planes whose execution was suspended to run a
+higher-privileged plane, usually via a hypercall or due to
+an interrupt in the higher-privileged plane.
+
+KVM right now does not use this field; it may be used in the future
+once KVM implements in-kernel plane switch mechanisms.  Until that
+is the case, userspace can leave this to zero.
+
+::
+
+	/* in */
+	__u16 req_exit_planes;
+
+A bitmap of planes for which KVM should exit when they have a pending
+interrupt.  In general, userspace should set bits corresponding to
+planes that are more privileged than ``plane``; because KVM is agnostic
+to whether low ids are more or less privileged, these could be the bits
+*above* or *below* ``plane``.  In some cases it may make sense to request
+an exit for all planes---for example, if the higher-priority plane
+wants to be informed about interrupts pending in lower-priority planes,
+userspace may need to learn about those as well.
+
+The bit at position ``plane`` is ignored; interrupts for the current
+plane are never delivered to userspace.
+
+::
 
 	/* out */
 	__u32 exit_reason;
@@ -7162,6 +7262,44 @@ The valid value for 'flags' is:
   - KVM_NOTIFY_CONTEXT_INVALID -- the VM context is corrupted and not valid
     in VMCS. It would run into unknown result if resume the target VM.
 
+::
+
+    /* KVM_EXIT_PLANE_EVENT */
+    struct {
+  #define KVM_PLANE_EVENT_INTERRUPT	1
+      __u16 cause;
+      __u16 pending_event_planes;
+      __u16 target;
+      __u16 padding;
+      __u32 flags;
+      __u64 extra;
+    } plane_event;
+
+Inform userspace of an event that affects a different plane than the
+currently executing one.
+
+On a ``KVM_EXIT_PLANE_EVENT`` exit, ``pending_event_planes`` is always
+set to the set of planes that have a pending interrupt.
+
+``cause`` provides the event that caused the exit, and the meaning of
+``target`` depends on the cause of the exit too.
+
+Right now the only defined cause is ``KVM_PLANE_EVENT_INTERRUPT``, i.e.
+an interrupt was received by a plane whose id is set in the
+``req_exit_planes`` bitmap.  In this case, ``target`` is the AND of
+``req_exit_planes`` and ``pending_event_planes``.
+
+``flags`` and ``extra`` are currently always 0.
+
+If userspace wants to switch to the target plane, it should move any
+shared state from the current plane to ``target``, and then invoke
+``KVM_RUN`` with ``kvm_run->plane`` set to ``target`` (and
+``req_exit_planes`` initialized accordingly).  Note that it's also
+valid to switch planes in response to other userspace exit codes, for
+example ``KVM_EXIT_X86_WRMSR`` or ``KVM_EXIT_HYPERCALL``.  Immediately
+after ``KVM_RUN`` is entered, KVM will check ``req_exit_planes`` and
+trigger a ``KVM_EXIT_PLANE_EVENT`` userspace exit if needed.
+
 ::
 
 		/* Fix the size of the union. */
@@ -8511,6 +8649,26 @@ ENOSYS for the others.
 When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
 type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
 
+7.46 KVM_CAP_PLANES_FPU
+-----------------------
+
+:Architectures: x86
+:Parameters: arg[0] is 0 if each vCPU plane has a separate FPU,
+             1 if the FPU is shared
+:Type: vm
+
+When enabled, ioctls such as KVM_SET_XSAVE or KVM_SET_FPU *are* available
+for vCPUs on all planes, but they will read and write the same data that
+is presented to other planes.  Note that KVM_GET/SET_XSAVE also allows access to some
+registers that are *not* part of FPU state; right now this is just PKRU.
+Those are never shared.
+
+KVM_CAP_PLANES_FPU is experimental; userspace must *not* assume that
+KVM_CAP_PLANES_FPU is present on x86 for *any* VM type and different
+VM types may or may not allow enabling KVM_CAP_PLANES_FPU.  Like for other
+capabilities, KVM_CAP_PLANES_FPU can be queried on the VM file descriptor;
+KVM_CHECK_EXTENSION returns 1 if it is possible to enable shared FPU mode.
+
 8. Other capabilities.
 ======================
 
@@ -9037,6 +9195,21 @@ KVM exits with the register state of either the L1 or L2 guest
 depending on which executed at the time of an exit. Userspace must
 take care to differentiate between these cases.
 
+8.46 KVM_CAP_PLANES
+-------------------
+
+:Capability: KVM_CAP_PLANES
+:Architectures: x86
+:Type: system, vm
+
+The capability returns the maximum plane id that can be passed to
+:ref:`KVM_CREATE_PLANE <KVM_CREATE_PLANE>`.  Because the maximum
+id can vary according to the machine type, it is recommended to
+check for this capability on the VM file descriptor.
+
+When called on the system file descriptor, KVM returns the highest
+value supported on any machine type.
+
 9. Known KVM API problems
 =========================
 
diff --git a/Documentation/virt/kvm/vcpu-requests.rst b/Documentation/virt/kvm/vcpu-requests.rst
index 06718b9bc959..86ac67b98a74 100644
--- a/Documentation/virt/kvm/vcpu-requests.rst
+++ b/Documentation/virt/kvm/vcpu-requests.rst
@@ -286,6 +286,13 @@ architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
 to check if it should awaken.  One reason to do so is to provide
 architectures a function where requests may be checked if necessary.
 
+VM planes
+---------
+
+Each plane has its own set of requests.  Processing requests from
+another plane needs to go through a plane switch, for example via a
+`KVM_EXIT_PLANE_EVENT` userspace exit.
+
 References
 ==========
 
-- 
2.49.0



* [PATCH 02/29] KVM: API definitions for plane userspace exit
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Copy over the uapi definitions from the Documentation/ directory.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/uapi/linux/kvm.h | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1e0a511c43d0..b0cca93ebcb3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,6 +135,16 @@ struct kvm_xen_exit {
 	} u;
 };
 
+struct kvm_plane_event_exit {
+#define KVM_PLANE_EVENT_INTERRUPT    1
+	__u16 cause;
+	__u16 pending_event_planes;
+	__u16 target;
+	__u16 padding;
+	__u32 flags;
+	__u64 extra[8];
+};
+
 struct kvm_tdx_exit {
 #define KVM_EXIT_TDX_VMCALL     1
         __u32 type;
@@ -262,7 +272,8 @@ struct kvm_tdx_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
-#define KVM_EXIT_TDX              40
+#define KVM_EXIT_PLANE_EVENT      40
+#define KVM_EXIT_TDX              41
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -295,7 +306,13 @@ struct kvm_run {
 	/* in */
 	__u8 request_interrupt_window;
 	__u8 HINT_UNSAFE_IN_KVM(immediate_exit);
-	__u8 padding1[6];
+
+	/* in/out */
+	__u8 plane;
+	__u16 suspended_planes;
+
+	/* in */
+	__u16 req_exit_planes;
 
 	/* out */
 	__u32 exit_reason;
@@ -532,6 +549,8 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_PLANE_EVENT */
+		struct kvm_plane_event_exit plane_event;
 		/* KVM_EXIT_TDX */
 		struct kvm_tdx_exit tdx;
 		/* Fix the size of the union. */
@@ -1017,6 +1036,8 @@ struct kvm_enable_cap {
 #define KVM_CAP_PRE_FAULT_MEMORY 236
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_PLANES 239
+#define KVM_CAP_PLANES_FPU 240
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.49.0



* [PATCH 03/29] KVM: add plane info to structs
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Add some of the data needed to move from one plane to another within a VM,
typically from plane N to plane 0.

There is quite some difference here because, while separate planes provide
very little of the vm file descriptor functionality, they are almost fully
functional vCPUs, except that non-zero planes(*) can only be run indirectly
through the initial plane.

Therefore, vCPUs use struct kvm_vcpu for all planes, with just a couple
fields that will be added later and will only be valid for plane 0.  At
the VM level instead plane info is stored in a completely different struct.
For now struct kvm_plane has no architecture-specific counterpart, but this
may change in the future if needed.  It's possible for example that some MMU
info becomes per-plane in order to support per-plane RWX permissions.

(*) I will restrain from calling them astral planes.
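
As a sketch of how later patches are expected to navigate the new fields
(not something this patch does yet):

    /* VM-wide data for the plane this vCPU belongs to... */
    struct kvm_plane *plane = vcpu->kvm->planes[vcpu->plane];

    /* ...and the plane-0 instance of the same vCPU id. */
    struct kvm_vcpu *vcpu0 = vcpu->plane0;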

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h  | 17 ++++++++++++++++-
 include/linux/kvm_types.h |  1 +
 virt/kvm/kvm_main.c       | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c8f1facdb600..0e16c34080ef 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -84,6 +84,10 @@
 #define KVM_MAX_NR_ADDRESS_SPACES	1
 #endif
 
+#ifndef KVM_MAX_VCPU_PLANES
+#define KVM_MAX_VCPU_PLANES		1
+#endif
+
 /*
  * For the normal pfn, the highest 12 bits should be zero,
  * so we can mask bit 62 ~ bit 52  to indicate the error pfn,
@@ -332,7 +336,8 @@ struct kvm_vcpu {
 #ifdef CONFIG_PROVE_RCU
 	int srcu_depth;
 #endif
-	int mode;
+	short plane;
+	short mode;
 	u64 requests;
 	unsigned long guest_debug;
 
@@ -367,6 +372,8 @@ struct kvm_vcpu {
 	} async_pf;
 #endif
 
+	struct kvm_vcpu *plane0;
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 	/*
 	 * Cpu relax intercept or pause loop exit optimization
@@ -753,6 +760,11 @@ struct kvm_memslots {
 	int node_idx;
 };
 
+struct kvm_plane {
+	struct kvm *kvm;
+	int plane;
+};
+
 struct kvm {
 #ifdef KVM_HAVE_MMU_RWLOCK
 	rwlock_t mmu_lock;
@@ -777,6 +789,9 @@ struct kvm {
 	/* The current active memslot set for each address space */
 	struct kvm_memslots __rcu *memslots[KVM_MAX_NR_ADDRESS_SPACES];
 	struct xarray vcpu_array;
+
+	struct kvm_plane *planes[KVM_MAX_VCPU_PLANES];
+
 	/*
 	 * Protected by slots_lock, but can be read outside if an
 	 * incorrect answer is acceptable.
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 827ecc0b7e10..7d0a86108d1a 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -11,6 +11,7 @@ struct kvm_interrupt;
 struct kvm_irq_routing_table;
 struct kvm_memory_slot;
 struct kvm_one_reg;
+struct kvm_plane;
 struct kvm_run;
 struct kvm_userspace_memory_region;
 struct kvm_vcpu;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f6c947961b78..67773b6b9576 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1095,9 +1095,22 @@ void __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
 {
 }
 
+static struct kvm_plane *kvm_create_vm_plane(struct kvm *kvm, unsigned plane_id)
+{
+	struct kvm_plane *plane = kzalloc(sizeof(struct kvm_plane), GFP_KERNEL_ACCOUNT);
+
+	if (!plane)
+		return ERR_PTR(-ENOMEM);
+
+	plane->kvm = kvm;
+	plane->plane = plane_id;
+	return plane;
+}
+
 static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 {
 	struct kvm *kvm = kvm_arch_alloc_vm();
+	struct kvm_plane *plane0;
 	struct kvm_memslots *slots;
 	int r, i, j;
 
@@ -1136,6 +1149,13 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	snprintf(kvm->stats_id, sizeof(kvm->stats_id), "kvm-%d",
 		 task_pid_nr(current));
 
+	plane0 = kvm_create_vm_plane(kvm, 0);
+	if (IS_ERR(plane0)) {
+		r = PTR_ERR(plane0);
+		goto out_err_no_plane0;
+	}
+	kvm->planes[0] = plane0;
+
 	r = -ENOMEM;
 	if (init_srcu_struct(&kvm->srcu))
 		goto out_err_no_srcu;
@@ -1227,6 +1247,8 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 out_err_no_irq_srcu:
 	cleanup_srcu_struct(&kvm->srcu);
 out_err_no_srcu:
+	kfree(kvm->planes[0]);
+out_err_no_plane0:
 	kvm_arch_free_vm(kvm);
 	mmdrop(current->mm);
 	return ERR_PTR(r);
@@ -1253,6 +1275,10 @@ static void kvm_destroy_devices(struct kvm *kvm)
 	}
 }
 
+static void kvm_destroy_plane(struct kvm_plane *plane)
+{
+}
+
 static void kvm_destroy_vm(struct kvm *kvm)
 {
 	int i;
@@ -1309,6 +1335,11 @@ static void kvm_destroy_vm(struct kvm *kvm)
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	xa_destroy(&kvm->mem_attr_array);
 #endif
+	for (i = 0; i < ARRAY_SIZE(kvm->planes); i++) {
+		struct kvm_plane *plane = kvm->planes[i];
+		if (plane)
+			kvm_destroy_plane(plane);
+	}
 	kvm_arch_free_vm(kvm);
 	preempt_notifier_dec();
 	kvm_disable_virtualization();
@@ -4110,6 +4141,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	}
 	vcpu->run = page_address(page);
 
+	vcpu->plane0 = vcpu;
 	kvm_vcpu_init(vcpu, kvm, id);
 
 	r = kvm_arch_vcpu_create(vcpu);
-- 
2.49.0



* [PATCH 04/29] KVM: introduce struct kvm_arch_plane
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/arm64/include/asm/kvm_host.h     | 5 +++++
 arch/loongarch/include/asm/kvm_host.h | 5 +++++
 arch/mips/include/asm/kvm_host.h      | 5 +++++
 arch/powerpc/include/asm/kvm_host.h   | 5 +++++
 arch/riscv/include/asm/kvm_host.h     | 5 +++++
 arch/s390/include/asm/kvm_host.h      | 5 +++++
 arch/x86/include/asm/kvm_host.h       | 6 ++++++
 include/linux/kvm_host.h              | 2 ++
 virt/kvm/kvm_main.c                   | 3 +++
 9 files changed, 41 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d919557af5e5..b742275cda4d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -227,6 +227,9 @@ struct kvm_s2_mmu {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_arch_plane {
+};
+
 /**
  * struct kvm_smccc_features: Descriptor of the hypercall services exposed to the guests
  *
@@ -1334,6 +1337,8 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
 	return cpus_have_final_cap(ARM64_SPECTRE_V3A);
 }
 
+static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
+static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 
 void kvm_init_host_debug_data(void);
diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 2281293a5f59..24c1dafac855 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -73,6 +73,9 @@ struct kvm_arch_memory_slot {
 	unsigned long flags;
 };
 
+struct kvm_arch_plane {
+};
+
 #define HOST_MAX_PMNUM			16
 struct kvm_context {
 	unsigned long vpid_cache;
@@ -325,6 +328,8 @@ static inline bool kvm_is_ifetch_fault(struct kvm_vcpu_arch *arch)
 }
 
 /* Misc */
+static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
+static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index f7222eb594ea..d7be72c529b3 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -147,6 +147,9 @@ struct kvm_vcpu_stat {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_arch_plane {
+};
+
 #ifdef CONFIG_CPU_LOONGSON64
 struct ipi_state {
 	uint32_t status;
@@ -886,6 +889,8 @@ extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
 extern int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
 			     struct kvm_mips_interrupt *irq);
 
+static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
+static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
 					 struct kvm_memory_slot *slot) {}
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 6e1108f8fce6..6023f0fd637b 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -256,6 +256,9 @@ struct kvm_arch_memory_slot {
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 };
 
+struct kvm_arch_plane {
+};
+
 struct kvm_hpt_info {
 	/* Host virtual (linear mapping) address of guest HPT */
 	unsigned long virt;
@@ -902,6 +905,8 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
+static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
+static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index cc33e35cd628..72f862194a0c 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -97,6 +97,9 @@ struct kvm_vcpu_stat {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_arch_plane {
+};
+
 struct kvm_vmid {
 	/*
 	 * Writes to vmid_version and vmid happen with vmid_lock held
@@ -301,6 +304,8 @@ static inline bool kvm_arch_pmi_in_guest(struct kvm_vcpu *vcpu)
 	return IS_ENABLED(CONFIG_GUEST_PERF_EVENTS) && !!vcpu;
 }
 
+static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
+static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 
 #define KVM_RISCV_GSTAGE_TLB_MIN_ORDER		12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 9a367866cab0..63b79ce5c8ac 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -799,6 +799,9 @@ struct kvm_vm_stat {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_arch_plane {
+};
+
 struct s390_map_info {
 	struct list_head list;
 	__u64 guest_addr;
@@ -1056,6 +1059,8 @@ bool kvm_s390_pv_cpu_is_protected(struct kvm_vcpu *vcpu);
 extern int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc);
 extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
 
+static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
+static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_free_memslot(struct kvm *kvm,
 					 struct kvm_memory_slot *slot) {}
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 383b736cc6f1..8240f565a764 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1086,6 +1086,9 @@ struct kvm_arch_memory_slot {
 	unsigned short *gfn_write_track;
 };
 
+struct kvm_arch_plane {
+};
+
 /*
  * Track the mode of the optimized logical map, as the rules for decoding the
  * destination vary per mode.  Enabling the optimized logical map requires all
@@ -2357,6 +2360,9 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
 				       unsigned long *vcpu_bitmap);
 
+static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
+static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
+
 bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work);
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0e16c34080ef..6bd9b0b3cbee 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -763,6 +763,8 @@ struct kvm_memslots {
 struct kvm_plane {
 	struct kvm *kvm;
 	int plane;
+
+	struct kvm_arch_plane arch;
 };
 
 struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 67773b6b9576..e83db27580da 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1104,6 +1104,8 @@ static struct kvm_plane *kvm_create_vm_plane(struct kvm *kvm, unsigned plane_id)
 
 	plane->kvm = kvm;
 	plane->plane = plane_id;
+
+	kvm_arch_init_plane(plane);
 	return plane;
 }
 
@@ -1277,6 +1279,7 @@ static void kvm_destroy_devices(struct kvm *kvm)
 
 static void kvm_destroy_plane(struct kvm_plane *plane)
 {
+	kvm_arch_free_plane(plane);
 }
 
 static void kvm_destroy_vm(struct kvm *kvm)
-- 
2.49.0



* [PATCH 05/29] KVM: add plane support to KVM_SIGNAL_MSI
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

struct kvm_kernel_irq_routing_entry is the main tool for sending
cross-plane IPIs.  Make kvm_send_userspace_msi the first function to
accept a struct kvm_plane pointer, in preparation for making KVM_SIGNAL_MSI
available from plane file descriptors.
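
Once plane file descriptors gain the ioctl (as documented in patch 1),
the userspace side would look the same as today, only issued on the
plane fd.  A rough sketch, where plane_fd is assumed to come from
KVM_CREATE_PLANE and the address/data values are arbitrary:

    struct kvm_msi msi = {
            .address_lo = 0xfee00000,       /* example destination */
            .data       = 0x30,             /* example vector */
    };

    /* Deliver the MSI to the interrupt controller of this plane. */
    ioctl(plane_fd, KVM_SIGNAL_MSI, &msi);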

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h | 3 ++-
 virt/kvm/irqchip.c       | 5 ++++-
 virt/kvm/kvm_main.c      | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6bd9b0b3cbee..98bae5dc3515 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -684,6 +684,7 @@ struct kvm_kernel_irq_routing_entry {
 			u32 data;
 			u32 flags;
 			u32 devid;
+			u32 plane;
 		} msi;
 		struct kvm_s390_adapter_int adapter;
 		struct kvm_hv_sint hv_sint;
@@ -2218,7 +2219,7 @@ static inline int kvm_init_irq_routing(struct kvm *kvm)
 
 #endif
 
-int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
+int kvm_send_userspace_msi(struct kvm_plane *plane, struct kvm_msi *msi);
 
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 162d8ed889f2..84952345e3c2 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -45,8 +45,10 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
 	return irq_rt->chip[irqchip][pin];
 }
 
-int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
+int kvm_send_userspace_msi(struct kvm_plane *plane, struct kvm_msi *msi)
 {
+	struct kvm *kvm = plane->kvm;
+	unsigned plane_id = plane->plane;
 	struct kvm_kernel_irq_routing_entry route;
 
 	if (!kvm_arch_irqchip_in_kernel(kvm) || (msi->flags & ~KVM_MSI_VALID_DEVID))
@@ -57,6 +59,7 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
 	route.msi.data = msi->data;
 	route.msi.flags = msi->flags;
 	route.msi.devid = msi->devid;
+	route.msi.plane = plane_id;
 
 	return kvm_set_msi(&route, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, false);
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e83db27580da..5b44a7f9e52e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5207,7 +5207,7 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&msi, argp, sizeof(msi)))
 			goto out;
-		r = kvm_send_userspace_msi(kvm, &msi);
+		r = kvm_send_userspace_msi(kvm->planes[0], &msi);
 		break;
 	}
 #endif
-- 
2.49.0



* [PATCH 06/29] KVM: move mem_attr_array to kvm_plane
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Another aspect of the VM that now differs across planes is memory
attributes, in order to support RWX permissions in the future.  The
existing vm-level ioctls apply to plane 0, and the underlying
functionality operates on struct kvm_plane, which now hosts the
mem_attr_array xarray.

As a result, the pre/post architecture-specific callbacks also take
a plane.

Private/shared is a global attribute and only applies to plane 0.
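
From the point of view of the existing uAPI nothing changes; for example,
assuming vm_fd is the usual VM file descriptor, the following now sets the
attributes of kvm->planes[0] rather than of the VM as a whole:

    struct kvm_memory_attributes attrs = {
            .address    = 0x100000,
            .size       = 0x200000,
            .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
    };

    ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);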

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c   | 23 ++++++-----
 include/linux/kvm_host.h | 24 +++++++-----
 virt/kvm/guest_memfd.c   |  3 +-
 virt/kvm/kvm_main.c      | 85 +++++++++++++++++++++++++---------------
 4 files changed, 84 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a284dce227a0..04e4b041e248 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7670,9 +7670,11 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 }
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+bool kvm_arch_pre_set_memory_attributes(struct kvm_plane *plane,
 					struct kvm_gfn_range *range)
 {
+	struct kvm *kvm = plane->kvm;
+
 	/*
 	 * Zap SPTEs even if the slot can't be mapped PRIVATE.  KVM x86 only
 	 * supports KVM_MEMORY_ATTRIBUTE_PRIVATE, and so it *seems* like KVM
@@ -7714,26 +7716,27 @@ static void hugepage_set_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
 	lpage_info_slot(gfn, slot, level)->disallow_lpage |= KVM_LPAGE_MIXED_FLAG;
 }
 
-static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot,
+static bool hugepage_has_attrs(struct kvm_plane *plane, struct kvm_memory_slot *slot,
 			       gfn_t gfn, int level, unsigned long attrs)
 {
 	const unsigned long start = gfn;
 	const unsigned long end = start + KVM_PAGES_PER_HPAGE(level);
 
 	if (level == PG_LEVEL_2M)
-		return kvm_range_has_memory_attributes(kvm, start, end, ~0, attrs);
+		return kvm_range_has_memory_attributes(plane, start, end, ~0, attrs);
 
 	for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
 		if (hugepage_test_mixed(slot, gfn, level - 1) ||
-		    attrs != kvm_get_memory_attributes(kvm, gfn))
+		    attrs != kvm_get_plane_memory_attributes(plane, gfn))
 			return false;
 	}
 	return true;
 }
 
-bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+bool kvm_arch_post_set_memory_attributes(struct kvm_plane *plane,
 					 struct kvm_gfn_range *range)
 {
+	struct kvm *kvm = plane->kvm;
 	unsigned long attrs = range->arg.attributes;
 	struct kvm_memory_slot *slot = range->slot;
 	int level;
@@ -7767,7 +7770,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 			 */
 			if (gfn >= slot->base_gfn &&
 			    gfn + nr_pages <= slot->base_gfn + slot->npages) {
-				if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
+				if (hugepage_has_attrs(plane, slot, gfn, level, attrs))
 					hugepage_clear_mixed(slot, gfn, level);
 				else
 					hugepage_set_mixed(slot, gfn, level);
@@ -7789,7 +7792,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 		 */
 		if (gfn < range->end &&
 		    (gfn + nr_pages) <= (slot->base_gfn + slot->npages)) {
-			if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
+			if (hugepage_has_attrs(plane, slot, gfn, level, attrs))
 				hugepage_clear_mixed(slot, gfn, level);
 			else
 				hugepage_set_mixed(slot, gfn, level);
@@ -7801,11 +7804,13 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
 					    struct kvm_memory_slot *slot)
 {
+	struct kvm_plane *plane0;
 	int level;
 
 	if (!kvm_arch_has_private_mem(kvm))
 		return;
 
+	plane0 = kvm->planes[0];
 	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
 		/*
 		 * Don't bother tracking mixed attributes for pages that can't
@@ -7825,9 +7830,9 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
 		 * be manually checked as the attributes may already be mixed.
 		 */
 		for (gfn = start; gfn < end; gfn += nr_pages) {
-			unsigned long attrs = kvm_get_memory_attributes(kvm, gfn);
+			unsigned long attrs = kvm_get_plane_memory_attributes(plane0, gfn);
 
-			if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
+			if (hugepage_has_attrs(plane0, slot, gfn, level, attrs))
 				hugepage_clear_mixed(slot, gfn, level);
 			else
 				hugepage_set_mixed(slot, gfn, level);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 98bae5dc3515..4d408d1d5ccc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -763,6 +763,10 @@ struct kvm_memslots {
 
 struct kvm_plane {
 	struct kvm *kvm;
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	/* Protected by slots_locks (for writes) and RCU (for reads) */
+	struct xarray mem_attr_array;
+#endif
 	int plane;
 
 	struct kvm_arch_plane arch;
@@ -875,10 +879,6 @@ struct kvm {
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
-#endif
-#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-	/* Protected by slots_locks (for writes) and RCU (for reads) */
-	struct xarray mem_attr_array;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
 };
@@ -2511,20 +2511,26 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 }
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+static inline unsigned long kvm_get_plane_memory_attributes(struct kvm_plane *plane, gfn_t gfn)
 {
-	return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
+	return xa_to_value(xa_load(&plane->mem_attr_array, gfn));
 }
 
-bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+	return kvm_get_plane_memory_attributes(kvm->planes[0], gfn);
+}
+
+bool kvm_range_has_memory_attributes(struct kvm_plane *plane, gfn_t start, gfn_t end,
 				     unsigned long mask, unsigned long attrs);
-bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+bool kvm_arch_pre_set_memory_attributes(struct kvm_plane *plane,
 					struct kvm_gfn_range *range);
-bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+bool kvm_arch_post_set_memory_attributes(struct kvm_plane *plane,
 					 struct kvm_gfn_range *range);
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
+	/* Private/shared is always in plane 0 */
 	return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
 	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..f07102bcaf24 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -642,6 +642,7 @@ EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
 {
+	struct kvm_plane *plane0 = kvm->planes[0];
 	struct file *file;
 	struct kvm_memory_slot *slot;
 	void __user *p;
@@ -694,7 +695,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 			(npages - i) < (1 << max_order));
 
 		ret = -EINVAL;
-		while (!kvm_range_has_memory_attributes(kvm, gfn, gfn + (1 << max_order),
+		while (!kvm_range_has_memory_attributes(plane0, gfn, gfn + (1 << max_order),
 							KVM_MEMORY_ATTRIBUTE_PRIVATE,
 							KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
 			if (!max_order)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b44a7f9e52e..e343905e46d8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -500,6 +500,7 @@ static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
 }
 
 typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
+typedef bool (*plane_gfn_handler_t)(struct kvm_plane *plane, struct kvm_gfn_range *range);
 
 typedef void (*on_lock_fn_t)(struct kvm *kvm);
 
@@ -511,7 +512,11 @@ struct kvm_mmu_notifier_range {
 	u64 start;
 	u64 end;
 	union kvm_mmu_notifier_arg arg;
-	gfn_handler_t handler;
+	/* The only difference is the type of the first parameter.  */
+	union {
+		gfn_handler_t handler;
+		plane_gfn_handler_t handler_plane;
+	};
 	on_lock_fn_t on_lock;
 	bool flush_on_ret;
 	bool may_block;
@@ -1105,6 +1110,9 @@ static struct kvm_plane *kvm_create_vm_plane(struct kvm *kvm, unsigned plane_id)
 	plane->kvm = kvm;
 	plane->plane = plane_id;
 
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	xa_init(&plane->mem_attr_array);
+#endif
 	kvm_arch_init_plane(plane);
 	return plane;
 }
@@ -1130,9 +1138,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	spin_lock_init(&kvm->mn_invalidate_lock);
 	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
 	xa_init(&kvm->vcpu_array);
-#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-	xa_init(&kvm->mem_attr_array);
-#endif
 
 	INIT_LIST_HEAD(&kvm->gpc_list);
 	spin_lock_init(&kvm->gpc_lock);
@@ -1280,6 +1285,10 @@ static void kvm_destroy_devices(struct kvm *kvm)
 static void kvm_destroy_plane(struct kvm_plane *plane)
 {
 	kvm_arch_free_plane(plane);
+
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	xa_destroy(&plane->mem_attr_array);
+#endif
 }
 
 static void kvm_destroy_vm(struct kvm *kvm)
@@ -1335,9 +1344,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	}
 	cleanup_srcu_struct(&kvm->irq_srcu);
 	cleanup_srcu_struct(&kvm->srcu);
-#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-	xa_destroy(&kvm->mem_attr_array);
-#endif
 	for (i = 0; i < ARRAY_SIZE(kvm->planes); i++) {
 		struct kvm_plane *plane = kvm->planes[i];
 		if (plane)
@@ -2385,9 +2391,9 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
 #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-static u64 kvm_supported_mem_attributes(struct kvm *kvm)
+static u64 kvm_supported_mem_attributes(struct kvm_plane *plane)
 {
-	if (!kvm || kvm_arch_has_private_mem(kvm))
+	if (!plane || (!plane->plane && kvm_arch_has_private_mem(plane->kvm)))
 		return KVM_MEMORY_ATTRIBUTE_PRIVATE;
 
 	return 0;
@@ -2397,19 +2403,20 @@ static u64 kvm_supported_mem_attributes(struct kvm *kvm)
  * Returns true if _all_ gfns in the range [@start, @end) have attributes
  * such that the bits in @mask match @attrs.
  */
-bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+bool kvm_range_has_memory_attributes(struct kvm_plane *plane,
+				     gfn_t start, gfn_t end,
 				     unsigned long mask, unsigned long attrs)
 {
-	XA_STATE(xas, &kvm->mem_attr_array, start);
+	XA_STATE(xas, &plane->mem_attr_array, start);
 	unsigned long index;
 	void *entry;
 
-	mask &= kvm_supported_mem_attributes(kvm);
+	mask &= kvm_supported_mem_attributes(plane);
 	if (attrs & ~mask)
 		return false;
 
 	if (end == start + 1)
-		return (kvm_get_memory_attributes(kvm, start) & mask) == attrs;
+		return (kvm_get_plane_memory_attributes(plane, start) & mask) == attrs;
 
 	guard(rcu)();
 	if (!attrs)
@@ -2428,8 +2435,8 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	return true;
 }
 
-static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
-						 struct kvm_mmu_notifier_range *range)
+static __always_inline void __kvm_handle_gfn_range(struct kvm *kvm, void *arg1,
+						   struct kvm_mmu_notifier_range *range)
 {
 	struct kvm_gfn_range gfn_range;
 	struct kvm_memory_slot *slot;
@@ -2469,7 +2476,7 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
 					range->on_lock(kvm);
 			}
 
-			ret |= range->handler(kvm, &gfn_range);
+			ret |= range->handler(arg1, &gfn_range);
 		}
 	}
 
@@ -2480,7 +2487,19 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
 		KVM_MMU_UNLOCK(kvm);
 }
 
-static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
+static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
+						 struct kvm_mmu_notifier_range *range)
+{
+	__kvm_handle_gfn_range(kvm, kvm, range);
+}
+
+static __always_inline void kvm_plane_handle_gfn_range(struct kvm_plane *plane,
+						       struct kvm_mmu_notifier_range *range)
+{
+	__kvm_handle_gfn_range(plane->kvm, plane, range);
+}
+
+static bool kvm_pre_set_memory_attributes(struct kvm_plane *plane,
 					  struct kvm_gfn_range *range)
 {
 	/*
@@ -2494,20 +2513,21 @@ static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
 	 * but it's not obvious that allowing new mappings while the attributes
 	 * are in flux is desirable or worth the complexity.
 	 */
-	kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
+	kvm_mmu_invalidate_range_add(plane->kvm, range->start, range->end);
 
-	return kvm_arch_pre_set_memory_attributes(kvm, range);
+	return kvm_arch_pre_set_memory_attributes(plane, range);
 }
 
 /* Set @attributes for the gfn range [@start, @end). */
-static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+static int kvm_vm_set_mem_attributes(struct kvm_plane *plane, gfn_t start, gfn_t end,
 				     unsigned long attributes)
 {
+	struct kvm *kvm = plane->kvm;
 	struct kvm_mmu_notifier_range pre_set_range = {
 		.start = start,
 		.end = end,
 		.arg.attributes = attributes,
-		.handler = kvm_pre_set_memory_attributes,
+		.handler_plane = kvm_pre_set_memory_attributes,
 		.on_lock = kvm_mmu_invalidate_begin,
 		.flush_on_ret = true,
 		.may_block = true,
@@ -2516,7 +2536,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 		.start = start,
 		.end = end,
 		.arg.attributes = attributes,
-		.handler = kvm_arch_post_set_memory_attributes,
+		.handler_plane = kvm_arch_post_set_memory_attributes,
 		.on_lock = kvm_mmu_invalidate_end,
 		.may_block = true,
 	};
@@ -2529,7 +2549,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	mutex_lock(&kvm->slots_lock);
 
 	/* Nothing to do if the entire range as the desired attributes. */
-	if (kvm_range_has_memory_attributes(kvm, start, end, ~0, attributes))
+	if (kvm_range_has_memory_attributes(plane, start, end, ~0, attributes))
 		goto out_unlock;
 
 	/*
@@ -2537,27 +2557,28 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	 * partway through setting the new attributes.
 	 */
 	for (i = start; i < end; i++) {
-		r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT);
+		r = xa_reserve(&plane->mem_attr_array, i, GFP_KERNEL_ACCOUNT);
 		if (r)
 			goto out_unlock;
 	}
 
-	kvm_handle_gfn_range(kvm, &pre_set_range);
+	kvm_plane_handle_gfn_range(plane, &pre_set_range);
 
 	for (i = start; i < end; i++) {
-		r = xa_err(xa_store(&kvm->mem_attr_array, i, entry,
+		r = xa_err(xa_store(&plane->mem_attr_array, i, entry,
 				    GFP_KERNEL_ACCOUNT));
 		KVM_BUG_ON(r, kvm);
 	}
 
-	kvm_handle_gfn_range(kvm, &post_set_range);
+	kvm_plane_handle_gfn_range(plane, &post_set_range);
 
 out_unlock:
 	mutex_unlock(&kvm->slots_lock);
 
 	return r;
 }
-static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
+
+static int kvm_vm_ioctl_set_mem_attributes(struct kvm_plane *plane,
 					   struct kvm_memory_attributes *attrs)
 {
 	gfn_t start, end;
@@ -2565,7 +2586,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	/* flags is currently not used. */
 	if (attrs->flags)
 		return -EINVAL;
-	if (attrs->attributes & ~kvm_supported_mem_attributes(kvm))
+	if (attrs->attributes & ~kvm_supported_mem_attributes(plane))
 		return -EINVAL;
 	if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
 		return -EINVAL;
@@ -2582,7 +2603,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	 */
 	BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long));
 
-	return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
+	return kvm_vm_set_mem_attributes(plane, start, end, attrs->attributes);
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
@@ -4867,7 +4888,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return 1;
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	case KVM_CAP_MEMORY_ATTRIBUTES:
-		return kvm_supported_mem_attributes(kvm);
+		return kvm_supported_mem_attributes(kvm ? kvm->planes[0] : NULL);
 #endif
 #ifdef CONFIG_KVM_PRIVATE_MEM
 	case KVM_CAP_GUEST_MEMFD:
@@ -5274,7 +5295,7 @@ static long kvm_vm_ioctl(struct file *filp,
 		if (copy_from_user(&attrs, argp, sizeof(attrs)))
 			goto out;
 
-		r = kvm_vm_ioctl_set_mem_attributes(kvm, &attrs);
+		r = kvm_vm_ioctl_set_mem_attributes(kvm->planes[0], &attrs);
 		break;
 	}
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 07/29] KVM: do not use online_vcpus to test vCPU validity
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (5 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 06/29] KVM: move mem_attr_array to kvm_plane Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-06-05 22:45   ` Sean Christopherson
  2025-04-01 16:10 ` [PATCH 08/29] KVM: move vcpu_array to struct kvm_plane Paolo Bonzini
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Different planes can initialize their vCPUs separately, so there is no
single online_vcpus value that can be used to test whether a vCPU has
been fully initialized.

Use the shiny new plane field instead, initializing it to an invalid
value (-1) for the window in which the vCPU is already visible in the
xarray but could still disappear if creation fails.
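
Putting the creation and lookup sides next to each other, the resulting
protocol is roughly the following (condensed sketch of the hunks below,
error paths omitted):

    /* Creation, under kvm->lock: visible in the xarray but not yet valid. */
    vcpu->plane = -1;
    xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
    /* ... finish initialization ... */
    atomic_inc(&kvm->online_vcpus);
    smp_store_release(&vcpu->plane, 0);      /* publish as valid */

    /* Lookup: a vCPU still marked -1 is treated as nonexistent. */
    vcpu = xa_load(&kvm->vcpu_array, i);
    if (vcpu && unlikely(vcpu->plane == -1))
            vcpu = NULL;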

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/i8254.c     |  3 ++-
 include/linux/kvm_host.h | 23 ++++++-----------------
 virt/kvm/kvm_main.c      | 20 +++++++++++++-------
 3 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d7ab8780ab9e..e3a3e7b90c26 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -260,9 +260,10 @@ static void pit_do_work(struct kthread_work *work)
 	 * VCPUs and only when LVT0 is in NMI mode.  The interrupt can
 	 * also be simultaneously delivered through PIC and IOAPIC.
 	 */
-	if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
+	if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0) {
 		kvm_for_each_vcpu(i, vcpu, kvm)
 			kvm_apic_nmi_wd_deliver(vcpu);
+	}
 }
 
 static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4d408d1d5ccc..0db27814294f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -992,27 +992,16 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
 
 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 {
-	int num_vcpus = atomic_read(&kvm->online_vcpus);
-
-	/*
-	 * Explicitly verify the target vCPU is online, as the anti-speculation
-	 * logic only limits the CPU's ability to speculate, e.g. given a "bad"
-	 * index, clamping the index to 0 would return vCPU0, not NULL.
-	 */
-	if (i >= num_vcpus)
+	struct kvm_vcpu *vcpu = xa_load(&kvm->vcpu_array, i);
+	if (vcpu && unlikely(vcpu->plane == -1))
 		return NULL;
 
-	i = array_index_nospec(i, num_vcpus);
-
-	/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
-	smp_rmb();
-	return xa_load(&kvm->vcpu_array, i);
+	return vcpu;
 }
 
-#define kvm_for_each_vcpu(idx, vcpup, kvm)				\
-	if (atomic_read(&kvm->online_vcpus))				\
-		xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0,	\
-				  (atomic_read(&kvm->online_vcpus) - 1))
+#define kvm_for_each_vcpu(idx, vcpup, kvm)			\
+	xa_for_each(&kvm->vcpu_array, idx, vcpup)		\
+		if ((vcpup)->plane == -1) ; else		\
 
 static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e343905e46d8..eba02cb7cc57 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4186,6 +4186,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 		goto unlock_vcpu_destroy;
 	}
 
+	/*
+	 * Store an invalid plane number until fully initialized.  xa_insert() has
+	 * release semantics, which ensures the write is visible to kvm_get_vcpu().
+	 */
+	vcpu->plane = -1;
 	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
 	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
 	WARN_ON_ONCE(r == -EBUSY);
@@ -4195,7 +4200,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	/*
 	 * Now it's all set up, let userspace reach it.  Grab the vCPU's mutex
 	 * so that userspace can't invoke vCPU ioctl()s until the vCPU is fully
-	 * visible (per online_vcpus), e.g. so that KVM doesn't get tricked
+	 * visible (valid vcpu->plane), e.g. so that KVM doesn't get tricked
 	 * into a NULL-pointer dereference because KVM thinks the _current_
 	 * vCPU doesn't exist.  As a bonus, taking vcpu->mutex ensures lockdep
 	 * knows it's taken *inside* kvm->lock.
@@ -4206,12 +4211,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	if (r < 0)
 		goto kvm_put_xa_erase;
 
-	/*
-	 * Pairs with smp_rmb() in kvm_get_vcpu.  Store the vcpu
-	 * pointer before kvm->online_vcpu's incremented value.
-	 */
-	smp_wmb();
 	atomic_inc(&kvm->online_vcpus);
+
+	/*
+	 * Pairs with xa_load() in kvm_get_vcpu, ensuring that online_vcpus
+	 * is updated before vcpu->plane.
+	 */
+	smp_store_release(&vcpu->plane, 0);
 	mutex_unlock(&vcpu->mutex);
 
 	mutex_unlock(&kvm->lock);
@@ -4355,7 +4361,7 @@ static int kvm_wait_for_vcpu_online(struct kvm_vcpu *vcpu)
 	 * In practice, this happy path will always be taken, as a well-behaved
 	 * VMM will never invoke a vCPU ioctl() before KVM_CREATE_VCPU returns.
 	 */
-	if (likely(vcpu->vcpu_idx < atomic_read(&kvm->online_vcpus)))
+	if (likely(vcpu->plane != -1))
 		return 0;
 
 	/*
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 08/29] KVM: move vcpu_array to struct kvm_plane
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (6 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 07/29] KVM: do not use online_vcpus to test vCPU validity Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 09/29] KVM: implement plane file descriptors ioctl and creation Paolo Bonzini
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Different planes may have only a subset of the vCPUs available in
the initial plane, so vcpu_array must also move to struct kvm_plane.
New functions allow accessing the vCPUs of a given struct kvm_plane
and, as usual, the older names automatically go through kvm->planes[0].
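
To illustrate how the new iterator is meant to be used, a sketch of a
hypothetical helper (not part of this series) that kicks every vCPU
belonging to a single plane:

    /* Hypothetical example: kick all vCPUs of one plane. */
    static void kvm_plane_kick_all_vcpus(struct kvm_plane *plane)
    {
            struct kvm_vcpu *vcpu;
            unsigned long i;

            kvm_for_each_plane_vcpu(i, vcpu, plane)
                    kvm_vcpu_kick(vcpu);
    }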

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h | 29 +++++++++++++++++++++--------
 virt/kvm/kvm_main.c      | 22 +++++++++++++++-------
 2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0db27814294f..0a91b556767e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -763,6 +763,7 @@ struct kvm_memslots {
 
 struct kvm_plane {
 	struct kvm *kvm;
+	struct xarray vcpu_array;
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	/* Protected by slots_locks (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
@@ -795,7 +796,6 @@ struct kvm {
 	struct kvm_memslots __memslots[KVM_MAX_NR_ADDRESS_SPACES][2];
 	/* The current active memslot set for each address space */
 	struct kvm_memslots __rcu *memslots[KVM_MAX_NR_ADDRESS_SPACES];
-	struct xarray vcpu_array;
 
 	struct kvm_plane *planes[KVM_MAX_VCPU_PLANES];
 
@@ -990,20 +990,20 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
 				      !refcount_read(&kvm->users_count));
 }
 
-static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
+static inline struct kvm_vcpu *kvm_get_plane_vcpu(struct kvm_plane *plane, int i)
 {
-	struct kvm_vcpu *vcpu = xa_load(&kvm->vcpu_array, i);
+	struct kvm_vcpu *vcpu = xa_load(&plane->vcpu_array, i);
 	if (vcpu && unlikely(vcpu->plane == -1))
 		return NULL;
 
 	return vcpu;
 }
 
-#define kvm_for_each_vcpu(idx, vcpup, kvm)			\
-	xa_for_each(&kvm->vcpu_array, idx, vcpup)		\
+#define kvm_for_each_plane_vcpu(idx, vcpup, plane_)				\
+	xa_for_each(&(plane_)->vcpu_array, idx, vcpup)		\
 		if ((vcpup)->plane == -1) ; else		\
 
-static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
+static inline struct kvm_vcpu *kvm_get_plane_vcpu_by_id(struct kvm_plane *plane, int id)
 {
 	struct kvm_vcpu *vcpu = NULL;
 	unsigned long i;
@@ -1011,15 +1011,28 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 	if (id < 0)
 		return NULL;
 	if (id < KVM_MAX_VCPUS)
-		vcpu = kvm_get_vcpu(kvm, id);
+		vcpu = kvm_get_plane_vcpu(plane, id);
 	if (vcpu && vcpu->vcpu_id == id)
 		return vcpu;
-	kvm_for_each_vcpu(i, vcpu, kvm)
+	kvm_for_each_plane_vcpu(i, vcpu, plane)
 		if (vcpu->vcpu_id == id)
 			return vcpu;
 	return NULL;
 }
 
+static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
+{
+	return kvm_get_plane_vcpu(kvm->planes[0], i);
+}
+
+#define kvm_for_each_vcpu(idx, vcpup, kvm)				\
+	kvm_for_each_plane_vcpu(idx, vcpup, kvm->planes[0])
+
+static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
+{
+	return kvm_get_plane_vcpu_by_id(kvm->planes[0], id);
+}
+
 void kvm_destroy_vcpus(struct kvm *kvm);
 
 void vcpu_load(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index eba02cb7cc57..cd4dfc399cad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -481,12 +481,19 @@ static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 void kvm_destroy_vcpus(struct kvm *kvm)
 {
+	int j;
 	unsigned long i;
 	struct kvm_vcpu *vcpu;
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		kvm_vcpu_destroy(vcpu);
-		xa_erase(&kvm->vcpu_array, i);
+	for (j = ARRAY_SIZE(kvm->planes) - 1; j >= 0; j--) {
+		struct kvm_plane *plane = kvm->planes[j];
+		if (!plane)
+			continue;
+
+		kvm_for_each_plane_vcpu(i, vcpu, plane) {
+			kvm_vcpu_destroy(vcpu);
+			xa_erase(&plane->vcpu_array, i);
+		}
 	}
 
 	atomic_set(&kvm->online_vcpus, 0);
@@ -1110,6 +1117,7 @@ static struct kvm_plane *kvm_create_vm_plane(struct kvm *kvm, unsigned plane_id)
 	plane->kvm = kvm;
 	plane->plane = plane_id;
 
+	xa_init(&plane->vcpu_array);
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	xa_init(&plane->mem_attr_array);
 #endif
@@ -1137,7 +1145,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	mutex_init(&kvm->slots_arch_lock);
 	spin_lock_init(&kvm->mn_invalidate_lock);
 	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
-	xa_init(&kvm->vcpu_array);
 
 	INIT_LIST_HEAD(&kvm->gpc_list);
 	spin_lock_init(&kvm->gpc_lock);
@@ -3930,6 +3937,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	int nr_vcpus, start, i, idx, yielded;
 	struct kvm *kvm = me->kvm;
+	struct kvm_plane *plane = kvm->planes[me->plane];
 	struct kvm_vcpu *vcpu;
 	int try = 3;
 
@@ -3967,7 +3975,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		if (idx == me->vcpu_idx)
 			continue;
 
-		vcpu = xa_load(&kvm->vcpu_array, idx);
+		vcpu = xa_load(&plane->vcpu_array, idx);
 		if (!READ_ONCE(vcpu->ready))
 			continue;
 		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
@@ -4192,7 +4200,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	 */
 	vcpu->plane = -1;
 	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
+	r = xa_insert(&kvm->planes[0]->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
 	WARN_ON_ONCE(r == -EBUSY);
 	if (r)
 		goto unlock_vcpu_destroy;
@@ -4228,7 +4236,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 kvm_put_xa_erase:
 	mutex_unlock(&vcpu->mutex);
 	kvm_put_kvm_no_destroy(kvm);
-	xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
+	xa_erase(&kvm->planes[0]->vcpu_array, vcpu->vcpu_idx);
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
 	kvm_dirty_ring_free(&vcpu->dirty_ring);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 09/29] KVM: implement plane file descriptors ioctl and creation
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (7 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 08/29] KVM: move vcpu_array to struct kvm_plane Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-21 20:32   ` Tom Lendacky
  2025-04-01 16:10 ` [PATCH 10/29] KVM: share statistics for same vCPU id on different planes Paolo Bonzini
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Add the file_operations for planes, the means to create new file
descriptors for them, and the KVM_CHECK_EXTENSION implementation for
the two new capabilities.

KVM_SIGNAL_MSI and KVM_SET_MEMORY_ATTRIBUTES are now available
through both vm and plane file descriptors; forward both to the
same function that backs the plane file_operations.
KVM_CHECK_EXTENSION instead remains separate, because it advertises
only a very small subset of capabilities when applied to plane
file descriptors.
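
The intended userspace flow is roughly the following (a sketch only;
the helper name, plane id and MSI values are made up for illustration):

    #include <err.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Create plane 1 on an existing VM and inject an MSI into it. */
    static int create_and_poke_plane(int vm_fd)
    {
            struct kvm_msi msi = {
                    .address_lo = 0xfee00000,       /* illustrative address */
                    .data       = 0x4041,           /* illustrative data */
            };
            int plane_fd;

            plane_fd = ioctl(vm_fd, KVM_CREATE_PLANE, 1);
            if (plane_fd < 0)
                    err(1, "KVM_CREATE_PLANE");

            /* Plane fds answer only for the small subset of capabilities. */
            if (ioctl(plane_fd, KVM_CHECK_EXTENSION, KVM_CAP_SIGNAL_MSI) <= 0)
                    errx(1, "KVM_CAP_SIGNAL_MSI not reported by the plane fd");

            if (ioctl(plane_fd, KVM_SIGNAL_MSI, &msi) < 0)
                    err(1, "KVM_SIGNAL_MSI");

            return plane_fd;
    }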

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h |  19 +++++
 include/uapi/linux/kvm.h |   2 +
 virt/kvm/kvm_main.c      | 154 +++++++++++++++++++++++++++++++++------
 3 files changed, 154 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0a91b556767e..dbca418d64f5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -342,6 +342,8 @@ struct kvm_vcpu {
 	unsigned long guest_debug;
 
 	struct mutex mutex;
+
+	/* Shared for all planes */
 	struct kvm_run *run;
 
 #ifndef __KVM_HAVE_ARCH_WQP
@@ -922,6 +924,23 @@ static inline void kvm_vm_bugged(struct kvm *kvm)
 }
 
 
+#if KVM_MAX_VCPU_PLANES == 1
+static inline int kvm_arch_nr_vcpu_planes(struct kvm *kvm)
+{
+	return KVM_MAX_VCPU_PLANES;
+}
+
+static inline struct kvm_plane *vcpu_to_plane(struct kvm_vcpu *vcpu)
+{
+	return vcpu->kvm->planes[0];
+}
+#else
+static inline struct kvm_plane *vcpu_to_plane(struct kvm_vcpu *vcpu)
+{
+	return vcpu->kvm->planes[vcpu->plane_id];
+}
+#endif
+
 #define KVM_BUG(cond, kvm, fmt...)				\
 ({								\
 	bool __ret = !!(cond);					\
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b0cca93ebcb3..96d25c7fa18f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1690,4 +1690,6 @@ struct kvm_pre_fault_memory {
 	__u64 padding[5];
 };
 
+#define KVM_CREATE_PLANE	_IO(KVMIO, 0xd6)
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cd4dfc399cad..b08fea91dc74 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4388,6 +4388,80 @@ static int kvm_wait_for_vcpu_online(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int kvm_plane_ioctl_check_extension(struct kvm_plane *plane, long arg)
+{
+	switch (arg) {
+#ifdef CONFIG_HAVE_KVM_MSI
+	case KVM_CAP_SIGNAL_MSI:
+#endif
+	case KVM_CAP_CHECK_EXTENSION_VM:
+		return 1;
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	case KVM_CAP_MEMORY_ATTRIBUTES:
+		return kvm_supported_mem_attributes(plane);
+#endif
+	default:
+		return 0;
+	}
+}
+
+static long __kvm_plane_ioctl(struct kvm_plane *plane, unsigned int ioctl,
+			      unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+#ifdef CONFIG_HAVE_KVM_MSI
+	case KVM_SIGNAL_MSI: {
+		struct kvm_msi msi;
+
+		if (copy_from_user(&msi, argp, sizeof(msi)))
+			return -EFAULT;
+		return kvm_send_userspace_msi(plane, &msi);
+	}
+#endif
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	case KVM_SET_MEMORY_ATTRIBUTES: {
+		struct kvm_memory_attributes attrs;
+
+		if (copy_from_user(&attrs, argp, sizeof(attrs)))
+			return -EFAULT;
+		return kvm_vm_ioctl_set_mem_attributes(plane, &attrs);
+	}
+#endif
+	case KVM_CHECK_EXTENSION:
+		return kvm_plane_ioctl_check_extension(plane, arg);
+	default:
+		return -ENOTTY;
+	}
+}
+
+static long kvm_plane_ioctl(struct file *filp, unsigned int ioctl,
+			     unsigned long arg)
+{
+	struct kvm_plane *plane = filp->private_data;
+
+	if (plane->kvm->mm != current->mm || plane->kvm->vm_dead)
+		return -EIO;
+
+	return __kvm_plane_ioctl(plane, ioctl, arg);
+}
+
+static int kvm_plane_release(struct inode *inode, struct file *filp)
+{
+	struct kvm_plane *plane = filp->private_data;
+
+	kvm_put_kvm(plane->kvm);
+	return 0;
+}
+
+static struct file_operations kvm_plane_fops = {
+	.unlocked_ioctl = kvm_plane_ioctl,
+	.release = kvm_plane_release,
+	KVM_COMPAT(kvm_plane_ioctl),
+};
+
+
 static long kvm_vcpu_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -4878,6 +4952,14 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		if (kvm)
 			return kvm_arch_nr_memslot_as_ids(kvm);
 		return KVM_MAX_NR_ADDRESS_SPACES;
+#endif
+#if KVM_MAX_VCPU_PLANES > 1
+	case KVM_CAP_PLANES:
+		if (kvm)
+			return kvm_arch_nr_vcpu_planes(kvm);
+		return KVM_MAX_PLANES;
+	case KVM_CAP_PLANES_FPU:
+		return kvm_arch_planes_share_fpu(kvm);
 #endif
 	case KVM_CAP_NR_MEMSLOTS:
 		return KVM_USER_MEM_SLOTS;
@@ -5112,6 +5194,48 @@ static int kvm_vm_ioctl_get_stats_fd(struct kvm *kvm)
 	return fd;
 }
 
+static int kvm_vm_ioctl_create_plane(struct kvm *kvm, unsigned id)
+{
+	struct kvm_plane *plane;
+	struct file *file;
+	int r, fd;
+
+	if (id >= KVM_MAX_VCPU_PLANES)
+		return -EINVAL;
+
+	guard(mutex)(&kvm->lock);
+	if (kvm->planes[id])
+		return -EEXIST;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	plane = kvm_create_vm_plane(kvm, id);
+	if (IS_ERR(plane)) {
+		r = PTR_ERR(plane);
+		goto put_fd;
+	}
+
+	kvm_get_kvm(kvm);
+	file = anon_inode_getfile("kvm-plane", &kvm_plane_fops, plane, O_RDWR);
+	if (IS_ERR(file)) {
+		r = PTR_ERR(file);
+		goto put_kvm;
+	}
+
+	kvm->planes[id] = plane;
+	fd_install(fd, file);
+	return fd;
+
+put_kvm:
+	kvm_put_kvm(kvm);
+	kfree(plane);
+put_fd:
+	put_unused_fd(fd);
+	return r;
+}
+
 #define SANITY_CHECK_MEM_REGION_FIELD(field)					\
 do {										\
 	BUILD_BUG_ON(offsetof(struct kvm_userspace_memory_region, field) !=		\
@@ -5130,6 +5254,9 @@ static long kvm_vm_ioctl(struct file *filp,
 	if (kvm->mm != current->mm || kvm->vm_dead)
 		return -EIO;
 	switch (ioctl) {
+	case KVM_CREATE_PLANE:
+		r = kvm_vm_ioctl_create_plane(kvm, arg);
+		break;
 	case KVM_CREATE_VCPU:
 		r = kvm_vm_ioctl_create_vcpu(kvm, arg);
 		break;
@@ -5236,16 +5363,12 @@ static long kvm_vm_ioctl(struct file *filp,
 		break;
 	}
 #ifdef CONFIG_HAVE_KVM_MSI
-	case KVM_SIGNAL_MSI: {
-		struct kvm_msi msi;
-
-		r = -EFAULT;
-		if (copy_from_user(&msi, argp, sizeof(msi)))
-			goto out;
-		r = kvm_send_userspace_msi(kvm->planes[0], &msi);
-		break;
-	}
+	case KVM_SIGNAL_MSI:
 #endif
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	case KVM_SET_MEMORY_ATTRIBUTES:
+#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
+		return __kvm_plane_ioctl(kvm->planes[0], ioctl, arg);
 #ifdef __KVM_HAVE_IRQ_LINE
 	case KVM_IRQ_LINE_STATUS:
 	case KVM_IRQ_LINE: {
@@ -5301,18 +5424,6 @@ static long kvm_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */
-#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
-	case KVM_SET_MEMORY_ATTRIBUTES: {
-		struct kvm_memory_attributes attrs;
-
-		r = -EFAULT;
-		if (copy_from_user(&attrs, argp, sizeof(attrs)))
-			goto out;
-
-		r = kvm_vm_ioctl_set_mem_attributes(kvm->planes[0], &attrs);
-		break;
-	}
-#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 	case KVM_CREATE_DEVICE: {
 		struct kvm_create_device cd;
 
@@ -6467,6 +6578,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	kvm_chardev_ops.owner = module;
 	kvm_vm_fops.owner = module;
 	kvm_vcpu_fops.owner = module;
+	kvm_plane_fops.owner = module;
 	kvm_device_fops.owner = module;
 
 	kvm_preempt_ops.sched_in = kvm_sched_in;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 10/29] KVM: share statistics for same vCPU id on different planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (8 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 09/29] KVM: implement plane file descriptors ioctl and creation Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-06-06 16:23   ` Sean Christopherson
  2025-04-01 16:10 ` [PATCH 11/29] KVM: anticipate allocation of dirty ring Paolo Bonzini
                   ` (21 subsequent siblings)
  31 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Statistics are protected by vcpu->mutex; because KVM_RUN takes the
plane-0 vCPU mutex, there is no race when statistics for all planes
are applied to the plane-0 kvm_vcpu struct.

This spares the kernel from implementing the binary stats interface
for vCPU plane file descriptors, and userspace from gathering info
from multiple planes.  The disadvantage is a slight loss of
information, and an extra pointer dereference when updating stats.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/arm64/kvm/arm.c                 |  2 +-
 arch/arm64/kvm/handle_exit.c         |  6 +--
 arch/arm64/kvm/mmio.c                |  4 +-
 arch/loongarch/kvm/exit.c            |  8 ++--
 arch/loongarch/kvm/vcpu.c            |  2 +-
 arch/mips/kvm/emulate.c              |  2 +-
 arch/mips/kvm/mips.c                 | 30 +++++++-------
 arch/mips/kvm/vz.c                   | 18 ++++-----
 arch/powerpc/kvm/book3s.c            |  2 +-
 arch/powerpc/kvm/book3s_hv.c         | 46 ++++++++++-----------
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  8 ++--
 arch/powerpc/kvm/book3s_pr.c         | 22 +++++-----
 arch/powerpc/kvm/book3s_pr_papr.c    |  2 +-
 arch/powerpc/kvm/powerpc.c           |  4 +-
 arch/powerpc/kvm/timing.h            | 28 ++++++-------
 arch/riscv/kvm/vcpu.c                |  2 +-
 arch/riscv/kvm/vcpu_exit.c           | 10 ++---
 arch/riscv/kvm/vcpu_insn.c           | 16 ++++----
 arch/riscv/kvm/vcpu_sbi.c            |  2 +-
 arch/riscv/kvm/vcpu_sbi_hsm.c        |  2 +-
 arch/s390/kvm/diag.c                 | 18 ++++-----
 arch/s390/kvm/intercept.c            | 20 +++++-----
 arch/s390/kvm/interrupt.c            | 48 +++++++++++-----------
 arch/s390/kvm/kvm-s390.c             |  8 ++--
 arch/s390/kvm/priv.c                 | 60 ++++++++++++++--------------
 arch/s390/kvm/sigp.c                 | 50 +++++++++++------------
 arch/s390/kvm/vsie.c                 |  2 +-
 arch/x86/kvm/debugfs.c               |  2 +-
 arch/x86/kvm/hyperv.c                |  4 +-
 arch/x86/kvm/kvm_cache_regs.h        |  4 +-
 arch/x86/kvm/mmu/mmu.c               | 18 ++++-----
 arch/x86/kvm/mmu/tdp_mmu.c           |  2 +-
 arch/x86/kvm/svm/sev.c               |  2 +-
 arch/x86/kvm/svm/svm.c               | 18 ++++-----
 arch/x86/kvm/vmx/tdx.c               |  8 ++--
 arch/x86/kvm/vmx/vmx.c               | 20 +++++-----
 arch/x86/kvm/x86.c                   | 40 +++++++++----------
 include/linux/kvm_host.h             |  5 ++-
 virt/kvm/kvm_main.c                  | 19 ++++-----
 39 files changed, 283 insertions(+), 281 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0160b4924351..94fae442a8b8 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1187,7 +1187,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		ret = kvm_arm_vcpu_enter_exit(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-		vcpu->stat.exits++;
+		vcpu->stat->exits++;
 		/*
 		 * Back from guest
 		 *************************************************************/
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 512d152233ff..b4f69beedd88 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -38,7 +38,7 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
 {
 	trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
 			    kvm_vcpu_hvc_get_imm(vcpu));
-	vcpu->stat.hvc_exit_stat++;
+	vcpu->stat->hvc_exit_stat++;
 
 	/* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
 	if (vcpu_has_nv(vcpu)) {
@@ -132,10 +132,10 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 
 	if (esr & ESR_ELx_WFx_ISS_WFE) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
-		vcpu->stat.wfe_exit_stat++;
+		vcpu->stat->wfe_exit_stat++;
 	} else {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
-		vcpu->stat.wfi_exit_stat++;
+		vcpu->stat->wfi_exit_stat++;
 	}
 
 	if (esr & ESR_ELx_WFx_ISS_WFxT) {
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index ab365e839874..96c5fd5146ba 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -221,14 +221,14 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 		/* We handled the access successfully in the kernel. */
 		if (!is_write)
 			memcpy(run->mmio.data, data_buf, len);
-		vcpu->stat.mmio_exit_kernel++;
+		vcpu->stat->mmio_exit_kernel++;
 		kvm_handle_mmio_return(vcpu);
 		return 1;
 	}
 
 	if (is_write)
 		memcpy(run->mmio.data, data_buf, len);
-	vcpu->stat.mmio_exit_user++;
+	vcpu->stat->mmio_exit_user++;
 	run->exit_reason	= KVM_EXIT_MMIO;
 	return 0;
 }
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index ea321403644a..ee5b3673efc8 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -31,7 +31,7 @@ static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
 
 	rd = inst.reg2_format.rd;
 	rj = inst.reg2_format.rj;
-	++vcpu->stat.cpucfg_exits;
+	++vcpu->stat->cpucfg_exits;
 	index = vcpu->arch.gprs[rj];
 
 	/*
@@ -264,7 +264,7 @@ int kvm_complete_iocsr_read(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 int kvm_emu_idle(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.idle_exits;
+	++vcpu->stat->idle_exits;
 	trace_kvm_exit_idle(vcpu, KVM_TRACE_EXIT_IDLE);
 
 	if (!kvm_arch_vcpu_runnable(vcpu))
@@ -884,7 +884,7 @@ static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
 
 	switch (code) {
 	case KVM_HCALL_SERVICE:
-		vcpu->stat.hypercall_exits++;
+		vcpu->stat->hypercall_exits++;
 		kvm_handle_service(vcpu);
 		break;
 	case KVM_HCALL_USER_SERVICE:
@@ -893,7 +893,7 @@ static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
 			break;
 		}
 
-		vcpu->stat.hypercall_exits++;
+		vcpu->stat->hypercall_exits++;
 		vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
 		vcpu->run->hypercall.nr = KVM_HCALL_USER_SERVICE;
 		vcpu->run->hypercall.args[0] = kvm_read_reg(vcpu, LOONGARCH_GPR_A0);
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 552cde722932..470c79e79281 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -330,7 +330,7 @@ static int kvm_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		ret = kvm_handle_fault(vcpu, ecode);
 	} else {
 		WARN(!intr, "vm exiting with suspicious irq\n");
-		++vcpu->stat.int_exits;
+		++vcpu->stat->int_exits;
 	}
 
 	if (ret == RESUME_GUEST)
diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
index 0feec52222fb..c9f83b500078 100644
--- a/arch/mips/kvm/emulate.c
+++ b/arch/mips/kvm/emulate.c
@@ -947,7 +947,7 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
 	kvm_debug("[%#lx] !!!WAIT!!! (%#lx)\n", vcpu->arch.pc,
 		  vcpu->arch.pending_exceptions);
 
-	++vcpu->stat.wait_exits;
+	++vcpu->stat->wait_exits;
 	trace_kvm_exit(vcpu, KVM_TRACE_EXIT_WAIT);
 	if (!vcpu->arch.pending_exceptions) {
 		kvm_vz_lose_htimer(vcpu);
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 60b43ea85c12..77637d201699 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1199,7 +1199,7 @@ static int __kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
 	case EXCCODE_INT:
 		kvm_debug("[%d]EXCCODE_INT @ %p\n", vcpu->vcpu_id, opc);
 
-		++vcpu->stat.int_exits;
+		++vcpu->stat->int_exits;
 
 		if (need_resched())
 			cond_resched();
@@ -1210,7 +1210,7 @@ static int __kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
 	case EXCCODE_CPU:
 		kvm_debug("EXCCODE_CPU: @ PC: %p\n", opc);
 
-		++vcpu->stat.cop_unusable_exits;
+		++vcpu->stat->cop_unusable_exits;
 		ret = kvm_mips_callbacks->handle_cop_unusable(vcpu);
 		/* XXXKYMA: Might need to return to user space */
 		if (run->exit_reason == KVM_EXIT_IRQ_WINDOW_OPEN)
@@ -1218,7 +1218,7 @@ static int __kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
 		break;
 
 	case EXCCODE_MOD:
-		++vcpu->stat.tlbmod_exits;
+		++vcpu->stat->tlbmod_exits;
 		ret = kvm_mips_callbacks->handle_tlb_mod(vcpu);
 		break;
 
@@ -1227,7 +1227,7 @@ static int __kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
 			  cause, kvm_read_c0_guest_status(&vcpu->arch.cop0), opc,
 			  badvaddr);
 
-		++vcpu->stat.tlbmiss_st_exits;
+		++vcpu->stat->tlbmiss_st_exits;
 		ret = kvm_mips_callbacks->handle_tlb_st_miss(vcpu);
 		break;
 
@@ -1235,52 +1235,52 @@ static int __kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
 		kvm_debug("TLB LD fault: cause %#x, PC: %p, BadVaddr: %#lx\n",
 			  cause, opc, badvaddr);
 
-		++vcpu->stat.tlbmiss_ld_exits;
+		++vcpu->stat->tlbmiss_ld_exits;
 		ret = kvm_mips_callbacks->handle_tlb_ld_miss(vcpu);
 		break;
 
 	case EXCCODE_ADES:
-		++vcpu->stat.addrerr_st_exits;
+		++vcpu->stat->addrerr_st_exits;
 		ret = kvm_mips_callbacks->handle_addr_err_st(vcpu);
 		break;
 
 	case EXCCODE_ADEL:
-		++vcpu->stat.addrerr_ld_exits;
+		++vcpu->stat->addrerr_ld_exits;
 		ret = kvm_mips_callbacks->handle_addr_err_ld(vcpu);
 		break;
 
 	case EXCCODE_SYS:
-		++vcpu->stat.syscall_exits;
+		++vcpu->stat->syscall_exits;
 		ret = kvm_mips_callbacks->handle_syscall(vcpu);
 		break;
 
 	case EXCCODE_RI:
-		++vcpu->stat.resvd_inst_exits;
+		++vcpu->stat->resvd_inst_exits;
 		ret = kvm_mips_callbacks->handle_res_inst(vcpu);
 		break;
 
 	case EXCCODE_BP:
-		++vcpu->stat.break_inst_exits;
+		++vcpu->stat->break_inst_exits;
 		ret = kvm_mips_callbacks->handle_break(vcpu);
 		break;
 
 	case EXCCODE_TR:
-		++vcpu->stat.trap_inst_exits;
+		++vcpu->stat->trap_inst_exits;
 		ret = kvm_mips_callbacks->handle_trap(vcpu);
 		break;
 
 	case EXCCODE_MSAFPE:
-		++vcpu->stat.msa_fpe_exits;
+		++vcpu->stat->msa_fpe_exits;
 		ret = kvm_mips_callbacks->handle_msa_fpe(vcpu);
 		break;
 
 	case EXCCODE_FPE:
-		++vcpu->stat.fpe_exits;
+		++vcpu->stat->fpe_exits;
 		ret = kvm_mips_callbacks->handle_fpe(vcpu);
 		break;
 
 	case EXCCODE_MSADIS:
-		++vcpu->stat.msa_disabled_exits;
+		++vcpu->stat->msa_disabled_exits;
 		ret = kvm_mips_callbacks->handle_msa_disabled(vcpu);
 		break;
 
@@ -1317,7 +1317,7 @@ static int __kvm_mips_handle_exit(struct kvm_vcpu *vcpu)
 		if (signal_pending(current)) {
 			run->exit_reason = KVM_EXIT_INTR;
 			ret = (-EINTR << 2) | RESUME_HOST;
-			++vcpu->stat.signal_exits;
+			++vcpu->stat->signal_exits;
 			trace_kvm_exit(vcpu, KVM_TRACE_EXIT_SIGNAL);
 		}
 	}
diff --git a/arch/mips/kvm/vz.c b/arch/mips/kvm/vz.c
index ccab4d76b126..c37fd7b3e608 100644
--- a/arch/mips/kvm/vz.c
+++ b/arch/mips/kvm/vz.c
@@ -1162,7 +1162,7 @@ static enum emulation_result kvm_vz_gpsi_lwc2(union mips_instruction inst,
 	rd = inst.loongson3_lscsr_format.rd;
 	switch (inst.loongson3_lscsr_format.fr) {
 	case 0x8:  /* Read CPUCFG */
-		++vcpu->stat.vz_cpucfg_exits;
+		++vcpu->stat->vz_cpucfg_exits;
 		hostcfg = read_cpucfg(vcpu->arch.gprs[rs]);
 
 		switch (vcpu->arch.gprs[rs]) {
@@ -1491,38 +1491,38 @@ static int kvm_trap_vz_handle_guest_exit(struct kvm_vcpu *vcpu)
 	trace_kvm_exit(vcpu, KVM_TRACE_EXIT_GEXCCODE_BASE + gexccode);
 	switch (gexccode) {
 	case MIPS_GCTL0_GEXC_GPSI:
-		++vcpu->stat.vz_gpsi_exits;
+		++vcpu->stat->vz_gpsi_exits;
 		er = kvm_trap_vz_handle_gpsi(cause, opc, vcpu);
 		break;
 	case MIPS_GCTL0_GEXC_GSFC:
-		++vcpu->stat.vz_gsfc_exits;
+		++vcpu->stat->vz_gsfc_exits;
 		er = kvm_trap_vz_handle_gsfc(cause, opc, vcpu);
 		break;
 	case MIPS_GCTL0_GEXC_HC:
-		++vcpu->stat.vz_hc_exits;
+		++vcpu->stat->vz_hc_exits;
 		er = kvm_trap_vz_handle_hc(cause, opc, vcpu);
 		break;
 	case MIPS_GCTL0_GEXC_GRR:
-		++vcpu->stat.vz_grr_exits;
+		++vcpu->stat->vz_grr_exits;
 		er = kvm_trap_vz_no_handler_guest_exit(gexccode, cause, opc,
 						       vcpu);
 		break;
 	case MIPS_GCTL0_GEXC_GVA:
-		++vcpu->stat.vz_gva_exits;
+		++vcpu->stat->vz_gva_exits;
 		er = kvm_trap_vz_no_handler_guest_exit(gexccode, cause, opc,
 						       vcpu);
 		break;
 	case MIPS_GCTL0_GEXC_GHFC:
-		++vcpu->stat.vz_ghfc_exits;
+		++vcpu->stat->vz_ghfc_exits;
 		er = kvm_trap_vz_handle_ghfc(cause, opc, vcpu);
 		break;
 	case MIPS_GCTL0_GEXC_GPA:
-		++vcpu->stat.vz_gpa_exits;
+		++vcpu->stat->vz_gpa_exits;
 		er = kvm_trap_vz_no_handler_guest_exit(gexccode, cause, opc,
 						       vcpu);
 		break;
 	default:
-		++vcpu->stat.vz_resvd_exits;
+		++vcpu->stat->vz_resvd_exits;
 		er = kvm_trap_vz_no_handler_guest_exit(gexccode, cause, opc,
 						       vcpu);
 		break;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index d79c5d1098c0..7ea6955cd96c 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -178,7 +178,7 @@ void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
 
 void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 {
-	vcpu->stat.queue_intr++;
+	vcpu->stat->queue_intr++;
 
 	set_bit(kvmppc_book3s_vec2irqprio(vec),
 		&vcpu->arch.pending_exceptions);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 86bff159c51e..6e94ffc0bb6b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -238,7 +238,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 
 	waitp = kvm_arch_vcpu_get_wait(vcpu);
 	if (rcuwait_wake_up(waitp))
-		++vcpu->stat.generic.halt_wakeup;
+		++vcpu->stat->generic.halt_wakeup;
 
 	cpu = READ_ONCE(vcpu->arch.thread_cpu);
 	if (cpu >= 0 && kvmppc_ipi_thread(cpu))
@@ -1633,7 +1633,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	struct kvm_run *run = vcpu->run;
 	int r = RESUME_HOST;
 
-	vcpu->stat.sum_exits++;
+	vcpu->stat->sum_exits++;
 
 	/*
 	 * This can happen if an interrupt occurs in the last stages
@@ -1662,13 +1662,13 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 		vcpu->arch.trap = BOOK3S_INTERRUPT_HV_DECREMENTER;
 		fallthrough;
 	case BOOK3S_INTERRUPT_HV_DECREMENTER:
-		vcpu->stat.dec_exits++;
+		vcpu->stat->dec_exits++;
 		r = RESUME_GUEST;
 		break;
 	case BOOK3S_INTERRUPT_EXTERNAL:
 	case BOOK3S_INTERRUPT_H_DOORBELL:
 	case BOOK3S_INTERRUPT_H_VIRT:
-		vcpu->stat.ext_intr_exits++;
+		vcpu->stat->ext_intr_exits++;
 		r = RESUME_GUEST;
 		break;
 	/* SR/HMI/PMI are HV interrupts that host has handled. Resume guest.*/
@@ -1971,7 +1971,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
 	int r;
 	int srcu_idx;
 
-	vcpu->stat.sum_exits++;
+	vcpu->stat->sum_exits++;
 
 	/*
 	 * This can happen if an interrupt occurs in the last stages
@@ -1992,22 +1992,22 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
 	switch (vcpu->arch.trap) {
 	/* We're good on these - the host merely wanted to get our attention */
 	case BOOK3S_INTERRUPT_HV_DECREMENTER:
-		vcpu->stat.dec_exits++;
+		vcpu->stat->dec_exits++;
 		r = RESUME_GUEST;
 		break;
 	case BOOK3S_INTERRUPT_EXTERNAL:
-		vcpu->stat.ext_intr_exits++;
+		vcpu->stat->ext_intr_exits++;
 		r = RESUME_HOST;
 		break;
 	case BOOK3S_INTERRUPT_H_DOORBELL:
 	case BOOK3S_INTERRUPT_H_VIRT:
-		vcpu->stat.ext_intr_exits++;
+		vcpu->stat->ext_intr_exits++;
 		r = RESUME_GUEST;
 		break;
 	/* These need to go to the nested HV */
 	case BOOK3S_INTERRUPT_NESTED_HV_DECREMENTER:
 		vcpu->arch.trap = BOOK3S_INTERRUPT_HV_DECREMENTER;
-		vcpu->stat.dec_exits++;
+		vcpu->stat->dec_exits++;
 		r = RESUME_HOST;
 		break;
 	/* SR/HMI/PMI are HV interrupts that host has handled. Resume guest.*/
@@ -4614,7 +4614,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	cur = start_poll = ktime_get();
 	if (vc->halt_poll_ns) {
 		ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns);
-		++vc->runner->stat.generic.halt_attempted_poll;
+		++vc->runner->stat->generic.halt_attempted_poll;
 
 		vc->vcore_state = VCORE_POLLING;
 		spin_unlock(&vc->lock);
@@ -4631,7 +4631,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 		vc->vcore_state = VCORE_INACTIVE;
 
 		if (!do_sleep) {
-			++vc->runner->stat.generic.halt_successful_poll;
+			++vc->runner->stat->generic.halt_successful_poll;
 			goto out;
 		}
 	}
@@ -4643,7 +4643,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 		do_sleep = 0;
 		/* If we polled, count this as a successful poll */
 		if (vc->halt_poll_ns)
-			++vc->runner->stat.generic.halt_successful_poll;
+			++vc->runner->stat->generic.halt_successful_poll;
 		goto out;
 	}
 
@@ -4657,7 +4657,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	spin_lock(&vc->lock);
 	vc->vcore_state = VCORE_INACTIVE;
 	trace_kvmppc_vcore_blocked(vc->runner, 1);
-	++vc->runner->stat.halt_successful_wait;
+	++vc->runner->stat->halt_successful_wait;
 
 	cur = ktime_get();
 
@@ -4666,29 +4666,29 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 
 	/* Attribute wait time */
 	if (do_sleep) {
-		vc->runner->stat.generic.halt_wait_ns +=
+		vc->runner->stat->generic.halt_wait_ns +=
 			ktime_to_ns(cur) - ktime_to_ns(start_wait);
 		KVM_STATS_LOG_HIST_UPDATE(
-				vc->runner->stat.generic.halt_wait_hist,
+				vc->runner->stat->generic.halt_wait_hist,
 				ktime_to_ns(cur) - ktime_to_ns(start_wait));
 		/* Attribute failed poll time */
 		if (vc->halt_poll_ns) {
-			vc->runner->stat.generic.halt_poll_fail_ns +=
+			vc->runner->stat->generic.halt_poll_fail_ns +=
 				ktime_to_ns(start_wait) -
 				ktime_to_ns(start_poll);
 			KVM_STATS_LOG_HIST_UPDATE(
-				vc->runner->stat.generic.halt_poll_fail_hist,
+				vc->runner->stat->generic.halt_poll_fail_hist,
 				ktime_to_ns(start_wait) -
 				ktime_to_ns(start_poll));
 		}
 	} else {
 		/* Attribute successful poll time */
 		if (vc->halt_poll_ns) {
-			vc->runner->stat.generic.halt_poll_success_ns +=
+			vc->runner->stat->generic.halt_poll_success_ns +=
 				ktime_to_ns(cur) -
 				ktime_to_ns(start_poll);
 			KVM_STATS_LOG_HIST_UPDATE(
-				vc->runner->stat.generic.halt_poll_success_hist,
+				vc->runner->stat->generic.halt_poll_success_hist,
 				ktime_to_ns(cur) - ktime_to_ns(start_poll));
 		}
 	}
@@ -4807,7 +4807,7 @@ static int kvmppc_run_vcpu(struct kvm_vcpu *vcpu)
 			kvmppc_core_prepare_to_enter(v);
 			if (signal_pending(v->arch.run_task)) {
 				kvmppc_remove_runnable(vc, v, mftb());
-				v->stat.signal_exits++;
+				v->stat->signal_exits++;
 				v->run->exit_reason = KVM_EXIT_INTR;
 				v->arch.ret = -EINTR;
 				wake_up(&v->arch.cpu_run);
@@ -4848,7 +4848,7 @@ static int kvmppc_run_vcpu(struct kvm_vcpu *vcpu)
 
 	if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
 		kvmppc_remove_runnable(vc, vcpu, mftb());
-		vcpu->stat.signal_exits++;
+		vcpu->stat->signal_exits++;
 		run->exit_reason = KVM_EXIT_INTR;
 		vcpu->arch.ret = -EINTR;
 	}
@@ -5047,7 +5047,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 		for (;;) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (signal_pending(current)) {
-				vcpu->stat.signal_exits++;
+				vcpu->stat->signal_exits++;
 				run->exit_reason = KVM_EXIT_INTR;
 				vcpu->arch.ret = -EINTR;
 				break;
@@ -5070,7 +5070,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 	return vcpu->arch.ret;
 
  sigpend:
-	vcpu->stat.signal_exits++;
+	vcpu->stat->signal_exits++;
 	run->exit_reason = KVM_EXIT_INTR;
 	vcpu->arch.ret = -EINTR;
  out:
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index f2636414d82a..59f740a88581 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -132,7 +132,7 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 	int hcore;
 
 	/* Mark the target VCPU as having an interrupt pending */
-	vcpu->stat.queue_intr++;
+	vcpu->stat->queue_intr++;
 	set_bit(BOOK3S_IRQPRIO_EXTERNAL, &vcpu->arch.pending_exceptions);
 
 	/* Kick self ? Just set MER and return */
@@ -713,14 +713,14 @@ static int ics_rm_eoi(struct kvm_vcpu *vcpu, u32 irq)
 
 	/* Handle passthrough interrupts */
 	if (state->host_irq) {
-		++vcpu->stat.pthru_all;
+		++vcpu->stat->pthru_all;
 		if (state->intr_cpu != -1) {
 			int pcpu = raw_smp_processor_id();
 
 			pcpu = cpu_first_thread_sibling(pcpu);
-			++vcpu->stat.pthru_host;
+			++vcpu->stat->pthru_host;
 			if (state->intr_cpu != pcpu) {
-				++vcpu->stat.pthru_bad_aff;
+				++vcpu->stat->pthru_bad_aff;
 				xics_opal_set_server(state->host_irq, pcpu);
 			}
 			state->intr_cpu = -1;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 83bcdc80ce51..8cbf7ecc796d 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -493,7 +493,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
 	if (msr & MSR_POW) {
 		if (!vcpu->arch.pending_exceptions) {
 			kvm_vcpu_halt(vcpu);
-			vcpu->stat.generic.halt_wakeup++;
+			vcpu->stat->generic.halt_wakeup++;
 
 			/* Unset POW bit after we woke up */
 			msr &= ~MSR_POW;
@@ -776,13 +776,13 @@ static int kvmppc_handle_pagefault(struct kvm_vcpu *vcpu,
 			return RESUME_HOST;
 		}
 		if (data)
-			vcpu->stat.sp_storage++;
+			vcpu->stat->sp_storage++;
 		else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
 			 (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
 			kvmppc_patch_dcbz(vcpu, &pte);
 	} else {
 		/* MMIO */
-		vcpu->stat.mmio_exits++;
+		vcpu->stat->mmio_exits++;
 		vcpu->arch.paddr_accessed = pte.raddr;
 		vcpu->arch.vaddr_accessed = pte.eaddr;
 		r = kvmppc_emulate_mmio(vcpu);
@@ -1103,7 +1103,7 @@ static int kvmppc_exit_pr_progint(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 		}
 	}
 
-	vcpu->stat.emulated_inst_exits++;
+	vcpu->stat->emulated_inst_exits++;
 	er = kvmppc_emulate_instruction(vcpu);
 	switch (er) {
 	case EMULATE_DONE:
@@ -1138,7 +1138,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	int r = RESUME_HOST;
 	int s;
 
-	vcpu->stat.sum_exits++;
+	vcpu->stat->sum_exits++;
 
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
@@ -1152,7 +1152,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	case BOOK3S_INTERRUPT_INST_STORAGE:
 	{
 		ulong shadow_srr1 = vcpu->arch.shadow_srr1;
-		vcpu->stat.pf_instruc++;
+		vcpu->stat->pf_instruc++;
 
 		if (kvmppc_is_split_real(vcpu))
 			kvmppc_fixup_split_real(vcpu);
@@ -1180,7 +1180,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 			int idx = srcu_read_lock(&vcpu->kvm->srcu);
 			r = kvmppc_handle_pagefault(vcpu, kvmppc_get_pc(vcpu), exit_nr);
 			srcu_read_unlock(&vcpu->kvm->srcu, idx);
-			vcpu->stat.sp_instruc++;
+			vcpu->stat->sp_instruc++;
 		} else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
 			  (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
 			/*
@@ -1201,7 +1201,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	{
 		ulong dar = kvmppc_get_fault_dar(vcpu);
 		u32 fault_dsisr = vcpu->arch.fault_dsisr;
-		vcpu->stat.pf_storage++;
+		vcpu->stat->pf_storage++;
 
 #ifdef CONFIG_PPC_BOOK3S_32
 		/* We set segments as unused segments when invalidating them. So
@@ -1256,13 +1256,13 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	case BOOK3S_INTERRUPT_HV_DECREMENTER:
 	case BOOK3S_INTERRUPT_DOORBELL:
 	case BOOK3S_INTERRUPT_H_DOORBELL:
-		vcpu->stat.dec_exits++;
+		vcpu->stat->dec_exits++;
 		r = RESUME_GUEST;
 		break;
 	case BOOK3S_INTERRUPT_EXTERNAL:
 	case BOOK3S_INTERRUPT_EXTERNAL_HV:
 	case BOOK3S_INTERRUPT_H_VIRT:
-		vcpu->stat.ext_intr_exits++;
+		vcpu->stat->ext_intr_exits++;
 		r = RESUME_GUEST;
 		break;
 	case BOOK3S_INTERRUPT_HMI:
@@ -1331,7 +1331,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 			r = RESUME_GUEST;
 		} else {
 			/* Guest syscalls */
-			vcpu->stat.syscall_exits++;
+			vcpu->stat->syscall_exits++;
 			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
 			r = RESUME_GUEST;
 		}
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index b2c89e850d7a..8f007a86de40 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -393,7 +393,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 	case H_CEDE:
 		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 		kvm_vcpu_halt(vcpu);
-		vcpu->stat.generic.halt_wakeup++;
+		vcpu->stat->generic.halt_wakeup++;
 		return EMULATE_DONE;
 	case H_LOGICAL_CI_LOAD:
 		return kvmppc_h_pr_logical_ci_load(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ce1d91eed231..a39919dbaffb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -352,7 +352,7 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	struct kvmppc_pte pte;
 	int r = -EINVAL;
 
-	vcpu->stat.st++;
+	vcpu->stat->st++;
 
 	if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->store_to_eaddr)
 		r = vcpu->kvm->arch.kvm_ops->store_to_eaddr(vcpu, eaddr, ptr,
@@ -395,7 +395,7 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	struct kvmppc_pte pte;
 	int rc = -EINVAL;
 
-	vcpu->stat.ld++;
+	vcpu->stat->ld++;
 
 	if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->load_from_eaddr)
 		rc = vcpu->kvm->arch.kvm_ops->load_from_eaddr(vcpu, eaddr, ptr,
diff --git a/arch/powerpc/kvm/timing.h b/arch/powerpc/kvm/timing.h
index 45817ab82bb4..529f32e7aaf1 100644
--- a/arch/powerpc/kvm/timing.h
+++ b/arch/powerpc/kvm/timing.h
@@ -45,46 +45,46 @@ static inline void kvmppc_account_exit_stat(struct kvm_vcpu *vcpu, int type)
 	*/
 	switch (type) {
 	case EXT_INTR_EXITS:
-		vcpu->stat.ext_intr_exits++;
+		vcpu->stat->ext_intr_exits++;
 		break;
 	case DEC_EXITS:
-		vcpu->stat.dec_exits++;
+		vcpu->stat->dec_exits++;
 		break;
 	case EMULATED_INST_EXITS:
-		vcpu->stat.emulated_inst_exits++;
+		vcpu->stat->emulated_inst_exits++;
 		break;
 	case DSI_EXITS:
-		vcpu->stat.dsi_exits++;
+		vcpu->stat->dsi_exits++;
 		break;
 	case ISI_EXITS:
-		vcpu->stat.isi_exits++;
+		vcpu->stat->isi_exits++;
 		break;
 	case SYSCALL_EXITS:
-		vcpu->stat.syscall_exits++;
+		vcpu->stat->syscall_exits++;
 		break;
 	case DTLB_REAL_MISS_EXITS:
-		vcpu->stat.dtlb_real_miss_exits++;
+		vcpu->stat->dtlb_real_miss_exits++;
 		break;
 	case DTLB_VIRT_MISS_EXITS:
-		vcpu->stat.dtlb_virt_miss_exits++;
+		vcpu->stat->dtlb_virt_miss_exits++;
 		break;
 	case MMIO_EXITS:
-		vcpu->stat.mmio_exits++;
+		vcpu->stat->mmio_exits++;
 		break;
 	case ITLB_REAL_MISS_EXITS:
-		vcpu->stat.itlb_real_miss_exits++;
+		vcpu->stat->itlb_real_miss_exits++;
 		break;
 	case ITLB_VIRT_MISS_EXITS:
-		vcpu->stat.itlb_virt_miss_exits++;
+		vcpu->stat->itlb_virt_miss_exits++;
 		break;
 	case SIGNAL_EXITS:
-		vcpu->stat.signal_exits++;
+		vcpu->stat->signal_exits++;
 		break;
 	case DBELL_EXITS:
-		vcpu->stat.dbell_exits++;
+		vcpu->stat->dbell_exits++;
 		break;
 	case GDBELL_EXITS:
-		vcpu->stat.gdbell_exits++;
+		vcpu->stat->gdbell_exits++;
 		break;
 	}
 }
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 60d684c76c58..55fb16307cc6 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -967,7 +967,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		kvm_riscv_vcpu_enter_exit(vcpu, &trap);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-		vcpu->stat.exits++;
+		vcpu->stat->exits++;
 
 		/* Syncup interrupts state with HW */
 		kvm_riscv_vcpu_sync_interrupts(vcpu);
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index 6e0c18412795..73116dd903e5 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -195,27 +195,27 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	switch (trap->scause) {
 	case EXC_INST_ILLEGAL:
 		kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_ILLEGAL_INSN);
-		vcpu->stat.instr_illegal_exits++;
+		vcpu->stat->instr_illegal_exits++;
 		ret = vcpu_redirect(vcpu, trap);
 		break;
 	case EXC_LOAD_MISALIGNED:
 		kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_MISALIGNED_LOAD);
-		vcpu->stat.load_misaligned_exits++;
+		vcpu->stat->load_misaligned_exits++;
 		ret = vcpu_redirect(vcpu, trap);
 		break;
 	case EXC_STORE_MISALIGNED:
 		kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_MISALIGNED_STORE);
-		vcpu->stat.store_misaligned_exits++;
+		vcpu->stat->store_misaligned_exits++;
 		ret = vcpu_redirect(vcpu, trap);
 		break;
 	case EXC_LOAD_ACCESS:
 		kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_ACCESS_LOAD);
-		vcpu->stat.load_access_exits++;
+		vcpu->stat->load_access_exits++;
 		ret = vcpu_redirect(vcpu, trap);
 		break;
 	case EXC_STORE_ACCESS:
 		kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_ACCESS_STORE);
-		vcpu->stat.store_access_exits++;
+		vcpu->stat->store_access_exits++;
 		ret = vcpu_redirect(vcpu, trap);
 		break;
 	case EXC_INST_ACCESS:
diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
index 97dec18e6989..43911b8a3f1b 100644
--- a/arch/riscv/kvm/vcpu_insn.c
+++ b/arch/riscv/kvm/vcpu_insn.c
@@ -201,14 +201,14 @@ void kvm_riscv_vcpu_wfi(struct kvm_vcpu *vcpu)
 
 static int wfi_insn(struct kvm_vcpu *vcpu, struct kvm_run *run, ulong insn)
 {
-	vcpu->stat.wfi_exit_stat++;
+	vcpu->stat->wfi_exit_stat++;
 	kvm_riscv_vcpu_wfi(vcpu);
 	return KVM_INSN_CONTINUE_NEXT_SEPC;
 }
 
 static int wrs_insn(struct kvm_vcpu *vcpu, struct kvm_run *run, ulong insn)
 {
-	vcpu->stat.wrs_exit_stat++;
+	vcpu->stat->wrs_exit_stat++;
 	kvm_vcpu_on_spin(vcpu, vcpu->arch.guest_context.sstatus & SR_SPP);
 	return KVM_INSN_CONTINUE_NEXT_SEPC;
 }
@@ -335,7 +335,7 @@ static int csr_insn(struct kvm_vcpu *vcpu, struct kvm_run *run, ulong insn)
 		if (rc > KVM_INSN_EXIT_TO_USER_SPACE) {
 			if (rc == KVM_INSN_CONTINUE_NEXT_SEPC) {
 				run->riscv_csr.ret_value = val;
-				vcpu->stat.csr_exit_kernel++;
+				vcpu->stat->csr_exit_kernel++;
 				kvm_riscv_vcpu_csr_return(vcpu, run);
 				rc = KVM_INSN_CONTINUE_SAME_SEPC;
 			}
@@ -345,7 +345,7 @@ static int csr_insn(struct kvm_vcpu *vcpu, struct kvm_run *run, ulong insn)
 
 	/* Exit to user-space for CSR emulation */
 	if (rc <= KVM_INSN_EXIT_TO_USER_SPACE) {
-		vcpu->stat.csr_exit_user++;
+		vcpu->stat->csr_exit_user++;
 		run->exit_reason = KVM_EXIT_RISCV_CSR;
 	}
 
@@ -576,13 +576,13 @@ int kvm_riscv_vcpu_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	if (!kvm_io_bus_read(vcpu, KVM_MMIO_BUS, fault_addr, len, data_buf)) {
 		/* Successfully handled MMIO access in the kernel so resume */
 		memcpy(run->mmio.data, data_buf, len);
-		vcpu->stat.mmio_exit_kernel++;
+		vcpu->stat->mmio_exit_kernel++;
 		kvm_riscv_vcpu_mmio_return(vcpu, run);
 		return 1;
 	}
 
 	/* Exit to userspace for MMIO emulation */
-	vcpu->stat.mmio_exit_user++;
+	vcpu->stat->mmio_exit_user++;
 	run->exit_reason = KVM_EXIT_MMIO;
 
 	return 0;
@@ -709,13 +709,13 @@ int kvm_riscv_vcpu_mmio_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	if (!kvm_io_bus_write(vcpu, KVM_MMIO_BUS,
 			      fault_addr, len, run->mmio.data)) {
 		/* Successfully handled MMIO access in the kernel so resume */
-		vcpu->stat.mmio_exit_kernel++;
+		vcpu->stat->mmio_exit_kernel++;
 		kvm_riscv_vcpu_mmio_return(vcpu, run);
 		return 1;
 	}
 
 	/* Exit to userspace for MMIO emulation */
-	vcpu->stat.mmio_exit_user++;
+	vcpu->stat->mmio_exit_user++;
 	run->exit_reason = KVM_EXIT_MMIO;
 
 	return 0;
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index d1c83a77735e..b500bcaf7b11 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -121,7 +121,7 @@ void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
 
 	vcpu->arch.sbi_context.return_handled = 0;
-	vcpu->stat.ecall_exit_stat++;
+	vcpu->stat->ecall_exit_stat++;
 	run->exit_reason = KVM_EXIT_RISCV_SBI;
 	run->riscv_sbi.extension_id = cp->a7;
 	run->riscv_sbi.function_id = cp->a6;
diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
index 3070bb31745d..519671760674 100644
--- a/arch/riscv/kvm/vcpu_sbi_hsm.c
+++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
@@ -82,7 +82,7 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
 		return SBI_ERR_INVALID_PARAM;
 	if (kvm_riscv_vcpu_stopped(target_vcpu))
 		return SBI_HSM_STATE_STOPPED;
-	else if (target_vcpu->stat.generic.blocking)
+	else if (target_vcpu->stat->generic.blocking)
 		return SBI_HSM_STATE_SUSPENDED;
 	else
 		return SBI_HSM_STATE_STARTED;
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 74f73141f9b9..359d562f7b81 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -24,7 +24,7 @@ static int diag_release_pages(struct kvm_vcpu *vcpu)
 
 	start = vcpu->run->s.regs.gprs[(vcpu->arch.sie_block->ipa & 0xf0) >> 4];
 	end = vcpu->run->s.regs.gprs[vcpu->arch.sie_block->ipa & 0xf] + PAGE_SIZE;
-	vcpu->stat.instruction_diagnose_10++;
+	vcpu->stat->instruction_diagnose_10++;
 
 	if (start & ~PAGE_MASK || end & ~PAGE_MASK || start >= end
 	    || start < 2 * PAGE_SIZE)
@@ -74,7 +74,7 @@ static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
 
 	VCPU_EVENT(vcpu, 3, "diag page reference parameter block at 0x%llx",
 		   vcpu->run->s.regs.gprs[rx]);
-	vcpu->stat.instruction_diagnose_258++;
+	vcpu->stat->instruction_diagnose_258++;
 	if (vcpu->run->s.regs.gprs[rx] & 7)
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 	rc = read_guest_real(vcpu, vcpu->run->s.regs.gprs[rx], &parm, sizeof(parm));
@@ -145,7 +145,7 @@ static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
 static int __diag_time_slice_end(struct kvm_vcpu *vcpu)
 {
 	VCPU_EVENT(vcpu, 5, "%s", "diag time slice end");
-	vcpu->stat.instruction_diagnose_44++;
+	vcpu->stat->instruction_diagnose_44++;
 	kvm_vcpu_on_spin(vcpu, true);
 	return 0;
 }
@@ -170,7 +170,7 @@ static int __diag_time_slice_end_directed(struct kvm_vcpu *vcpu)
 	int tid;
 
 	tid = vcpu->run->s.regs.gprs[(vcpu->arch.sie_block->ipa & 0xf0) >> 4];
-	vcpu->stat.instruction_diagnose_9c++;
+	vcpu->stat->instruction_diagnose_9c++;
 
 	/* yield to self */
 	if (tid == vcpu->vcpu_id)
@@ -194,7 +194,7 @@ static int __diag_time_slice_end_directed(struct kvm_vcpu *vcpu)
 		VCPU_EVENT(vcpu, 5,
 			   "diag time slice end directed to %d: yield forwarded",
 			   tid);
-		vcpu->stat.diag_9c_forward++;
+		vcpu->stat->diag_9c_forward++;
 		return 0;
 	}
 
@@ -205,7 +205,7 @@ static int __diag_time_slice_end_directed(struct kvm_vcpu *vcpu)
 	return 0;
 no_yield:
 	VCPU_EVENT(vcpu, 5, "diag time slice end directed to %d: ignored", tid);
-	vcpu->stat.diag_9c_ignored++;
+	vcpu->stat->diag_9c_ignored++;
 	return 0;
 }
 
@@ -215,7 +215,7 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
 	unsigned long subcode = vcpu->run->s.regs.gprs[reg] & 0xffff;
 
 	VCPU_EVENT(vcpu, 3, "diag ipl functions, subcode %lx", subcode);
-	vcpu->stat.instruction_diagnose_308++;
+	vcpu->stat->instruction_diagnose_308++;
 	switch (subcode) {
 	case 3:
 		vcpu->run->s390_reset_flags = KVM_S390_RESET_CLEAR;
@@ -247,7 +247,7 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
 {
 	int ret;
 
-	vcpu->stat.instruction_diagnose_500++;
+	vcpu->stat->instruction_diagnose_500++;
 	/* No virtio-ccw notification? Get out quickly. */
 	if (!vcpu->kvm->arch.css_support ||
 	    (vcpu->run->s.regs.gprs[1] != KVM_S390_VIRTIO_CCW_NOTIFY))
@@ -301,7 +301,7 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
 	case 0x500:
 		return __diag_virtio_hypercall(vcpu);
 	default:
-		vcpu->stat.instruction_diagnose_other++;
+		vcpu->stat->instruction_diagnose_other++;
 		return -EOPNOTSUPP;
 	}
 }
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 610dd44a948b..74d01f67a257 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -57,7 +57,7 @@ static int handle_stop(struct kvm_vcpu *vcpu)
 	int rc = 0;
 	uint8_t flags, stop_pending;
 
-	vcpu->stat.exit_stop_request++;
+	vcpu->stat->exit_stop_request++;
 
 	/* delay the stop if any non-stop irq is pending */
 	if (kvm_s390_vcpu_has_irq(vcpu, 1))
@@ -93,7 +93,7 @@ static int handle_validity(struct kvm_vcpu *vcpu)
 {
 	int viwhy = vcpu->arch.sie_block->ipb >> 16;
 
-	vcpu->stat.exit_validity++;
+	vcpu->stat->exit_validity++;
 	trace_kvm_s390_intercept_validity(vcpu, viwhy);
 	KVM_EVENT(3, "validity intercept 0x%x for pid %u (kvm 0x%pK)", viwhy,
 		  current->pid, vcpu->kvm);
@@ -106,7 +106,7 @@ static int handle_validity(struct kvm_vcpu *vcpu)
 
 static int handle_instruction(struct kvm_vcpu *vcpu)
 {
-	vcpu->stat.exit_instruction++;
+	vcpu->stat->exit_instruction++;
 	trace_kvm_s390_intercept_instruction(vcpu,
 					     vcpu->arch.sie_block->ipa,
 					     vcpu->arch.sie_block->ipb);
@@ -249,7 +249,7 @@ static int handle_prog(struct kvm_vcpu *vcpu)
 	psw_t psw;
 	int rc;
 
-	vcpu->stat.exit_program_interruption++;
+	vcpu->stat->exit_program_interruption++;
 
 	/*
 	 * Intercept 8 indicates a loop of specification exceptions
@@ -307,7 +307,7 @@ static int handle_external_interrupt(struct kvm_vcpu *vcpu)
 	psw_t newpsw;
 	int rc;
 
-	vcpu->stat.exit_external_interrupt++;
+	vcpu->stat->exit_external_interrupt++;
 
 	if (kvm_s390_pv_cpu_is_protected(vcpu)) {
 		newpsw = vcpu->arch.sie_block->gpsw;
@@ -388,7 +388,7 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu)
 
 static int handle_partial_execution(struct kvm_vcpu *vcpu)
 {
-	vcpu->stat.exit_pei++;
+	vcpu->stat->exit_pei++;
 
 	if (vcpu->arch.sie_block->ipa == 0xb254)	/* MVPG */
 		return handle_mvpg_pei(vcpu);
@@ -416,7 +416,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
 	code = vcpu->run->s.regs.gprs[reg1];
 	addr = vcpu->run->s.regs.gprs[reg2];
 
-	vcpu->stat.instruction_sthyi++;
+	vcpu->stat->instruction_sthyi++;
 	VCPU_EVENT(vcpu, 3, "STHYI: fc: %llu addr: 0x%016llx", code, addr);
 	trace_kvm_s390_handle_sthyi(vcpu, code, addr);
 
@@ -465,7 +465,7 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
 	psw_t oldpsw, newpsw;
 	int rc;
 
-	vcpu->stat.exit_operation_exception++;
+	vcpu->stat->exit_operation_exception++;
 	trace_kvm_s390_handle_operexc(vcpu, vcpu->arch.sie_block->ipa,
 				      vcpu->arch.sie_block->ipb);
 
@@ -609,10 +609,10 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 
 	switch (vcpu->arch.sie_block->icptcode) {
 	case ICPT_EXTREQ:
-		vcpu->stat.exit_external_request++;
+		vcpu->stat->exit_external_request++;
 		return 0;
 	case ICPT_IOREQ:
-		vcpu->stat.exit_io_request++;
+		vcpu->stat->exit_io_request++;
 		return 0;
 	case ICPT_INST:
 		rc = handle_instruction(vcpu);
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 07ff0e10cb7f..7576df5305c3 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -479,7 +479,7 @@ static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 	int rc = 0;
 
-	vcpu->stat.deliver_cputm++;
+	vcpu->stat->deliver_cputm++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CPU_TIMER,
 					 0, 0);
 	if (kvm_s390_pv_cpu_is_protected(vcpu)) {
@@ -503,7 +503,7 @@ static int __must_check __deliver_ckc(struct kvm_vcpu *vcpu)
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 	int rc = 0;
 
-	vcpu->stat.deliver_ckc++;
+	vcpu->stat->deliver_ckc++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CLOCK_COMP,
 					 0, 0);
 	if (kvm_s390_pv_cpu_is_protected(vcpu)) {
@@ -707,7 +707,7 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
 		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 						 KVM_S390_MCHK,
 						 mchk.cr14, mchk.mcic);
-		vcpu->stat.deliver_machine_check++;
+		vcpu->stat->deliver_machine_check++;
 		rc = __write_machine_check(vcpu, &mchk);
 	}
 	return rc;
@@ -719,7 +719,7 @@ static int __must_check __deliver_restart(struct kvm_vcpu *vcpu)
 	int rc = 0;
 
 	VCPU_EVENT(vcpu, 3, "%s", "deliver: cpu restart");
-	vcpu->stat.deliver_restart_signal++;
+	vcpu->stat->deliver_restart_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_RESTART, 0, 0);
 
 	if (kvm_s390_pv_cpu_is_protected(vcpu)) {
@@ -746,7 +746,7 @@ static int __must_check __deliver_set_prefix(struct kvm_vcpu *vcpu)
 	clear_bit(IRQ_PEND_SET_PREFIX, &li->pending_irqs);
 	spin_unlock(&li->lock);
 
-	vcpu->stat.deliver_prefix_signal++;
+	vcpu->stat->deliver_prefix_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 					 KVM_S390_SIGP_SET_PREFIX,
 					 prefix.address, 0);
@@ -769,7 +769,7 @@ static int __must_check __deliver_emergency_signal(struct kvm_vcpu *vcpu)
 	spin_unlock(&li->lock);
 
 	VCPU_EVENT(vcpu, 4, "%s", "deliver: sigp emerg");
-	vcpu->stat.deliver_emergency_signal++;
+	vcpu->stat->deliver_emergency_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
 					 cpu_addr, 0);
 	if (kvm_s390_pv_cpu_is_protected(vcpu)) {
@@ -802,7 +802,7 @@ static int __must_check __deliver_external_call(struct kvm_vcpu *vcpu)
 	spin_unlock(&li->lock);
 
 	VCPU_EVENT(vcpu, 4, "%s", "deliver: sigp ext call");
-	vcpu->stat.deliver_external_call++;
+	vcpu->stat->deliver_external_call++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 					 KVM_S390_INT_EXTERNAL_CALL,
 					 extcall.code, 0);
@@ -854,7 +854,7 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
 	ilen = pgm_info.flags & KVM_S390_PGM_FLAGS_ILC_MASK;
 	VCPU_EVENT(vcpu, 3, "deliver: program irq code 0x%x, ilen:%d",
 		   pgm_info.code, ilen);
-	vcpu->stat.deliver_program++;
+	vcpu->stat->deliver_program++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_PROGRAM_INT,
 					 pgm_info.code, 0);
 
@@ -1004,7 +1004,7 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 
 	VCPU_EVENT(vcpu, 4, "deliver: sclp parameter 0x%x",
 		   ext.ext_params);
-	vcpu->stat.deliver_service_signal++;
+	vcpu->stat->deliver_service_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
 					 ext.ext_params, 0);
 
@@ -1028,7 +1028,7 @@ static int __must_check __deliver_service_ev(struct kvm_vcpu *vcpu)
 	spin_unlock(&fi->lock);
 
 	VCPU_EVENT(vcpu, 4, "%s", "deliver: sclp parameter event");
-	vcpu->stat.deliver_service_signal++;
+	vcpu->stat->deliver_service_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
 					 ext.ext_params, 0);
 
@@ -1091,7 +1091,7 @@ static int __must_check __deliver_virtio(struct kvm_vcpu *vcpu)
 		VCPU_EVENT(vcpu, 4,
 			   "deliver: virtio parm: 0x%x,parm64: 0x%llx",
 			   inti->ext.ext_params, inti->ext.ext_params2);
-		vcpu->stat.deliver_virtio++;
+		vcpu->stat->deliver_virtio++;
 		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 				inti->type,
 				inti->ext.ext_params,
@@ -1177,7 +1177,7 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
 			inti->io.subchannel_id >> 1 & 0x3,
 			inti->io.subchannel_nr);
 
-		vcpu->stat.deliver_io++;
+		vcpu->stat->deliver_io++;
 		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 				inti->type,
 				((__u32)inti->io.subchannel_id << 16) |
@@ -1205,7 +1205,7 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
 		VCPU_EVENT(vcpu, 4, "%s isc %u", "deliver: I/O (AI/gisa)", isc);
 		memset(&io, 0, sizeof(io));
 		io.io_int_word = isc_to_int_word(isc);
-		vcpu->stat.deliver_io++;
+		vcpu->stat->deliver_io++;
 		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 			KVM_S390_INT_IO(1, 0, 0, 0),
 			((__u32)io.subchannel_id << 16) |
@@ -1290,7 +1290,7 @@ int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
 	struct kvm_s390_gisa_interrupt *gi = &vcpu->kvm->arch.gisa_int;
 	u64 sltime;
 
-	vcpu->stat.exit_wait_state++;
+	vcpu->stat->exit_wait_state++;
 
 	/* fast path */
 	if (kvm_arch_vcpu_runnable(vcpu))
@@ -1476,7 +1476,7 @@ static int __inject_prog(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 
-	vcpu->stat.inject_program++;
+	vcpu->stat->inject_program++;
 	VCPU_EVENT(vcpu, 3, "inject: program irq code 0x%x", irq->u.pgm.code);
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_PROGRAM_INT,
 				   irq->u.pgm.code, 0);
@@ -1518,7 +1518,7 @@ static int __inject_pfault_init(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 
-	vcpu->stat.inject_pfault_init++;
+	vcpu->stat->inject_pfault_init++;
 	VCPU_EVENT(vcpu, 4, "inject: pfault init parameter block at 0x%llx",
 		   irq->u.ext.ext_params2);
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_PFAULT_INIT,
@@ -1537,7 +1537,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 	struct kvm_s390_extcall_info *extcall = &li->irq.extcall;
 	uint16_t src_id = irq->u.extcall.code;
 
-	vcpu->stat.inject_external_call++;
+	vcpu->stat->inject_external_call++;
 	VCPU_EVENT(vcpu, 4, "inject: external call source-cpu:%u",
 		   src_id);
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_EXTERNAL_CALL,
@@ -1562,7 +1562,7 @@ static int __inject_set_prefix(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 	struct kvm_s390_prefix_info *prefix = &li->irq.prefix;
 
-	vcpu->stat.inject_set_prefix++;
+	vcpu->stat->inject_set_prefix++;
 	VCPU_EVENT(vcpu, 3, "inject: set prefix to %x",
 		   irq->u.prefix.address);
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_SIGP_SET_PREFIX,
@@ -1583,7 +1583,7 @@ static int __inject_sigp_stop(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 	struct kvm_s390_stop_info *stop = &li->irq.stop;
 	int rc = 0;
 
-	vcpu->stat.inject_stop_signal++;
+	vcpu->stat->inject_stop_signal++;
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_SIGP_STOP, 0, 0);
 
 	if (irq->u.stop.flags & ~KVM_S390_STOP_SUPP_FLAGS)
@@ -1607,7 +1607,7 @@ static int __inject_sigp_restart(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 
-	vcpu->stat.inject_restart++;
+	vcpu->stat->inject_restart++;
 	VCPU_EVENT(vcpu, 3, "%s", "inject: restart int");
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_RESTART, 0, 0);
 
@@ -1620,7 +1620,7 @@ static int __inject_sigp_emergency(struct kvm_vcpu *vcpu,
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 
-	vcpu->stat.inject_emergency_signal++;
+	vcpu->stat->inject_emergency_signal++;
 	VCPU_EVENT(vcpu, 4, "inject: emergency from cpu %u",
 		   irq->u.emerg.code);
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
@@ -1641,7 +1641,7 @@ static int __inject_mchk(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 	struct kvm_s390_mchk_info *mchk = &li->irq.mchk;
 
-	vcpu->stat.inject_mchk++;
+	vcpu->stat->inject_mchk++;
 	VCPU_EVENT(vcpu, 3, "inject: machine check mcic 0x%llx",
 		   irq->u.mchk.mcic);
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_MCHK, 0,
@@ -1672,7 +1672,7 @@ static int __inject_ckc(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 
-	vcpu->stat.inject_ckc++;
+	vcpu->stat->inject_ckc++;
 	VCPU_EVENT(vcpu, 3, "%s", "inject: clock comparator external");
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_CLOCK_COMP,
 				   0, 0);
@@ -1686,7 +1686,7 @@ static int __inject_cpu_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
 
-	vcpu->stat.inject_cputm++;
+	vcpu->stat->inject_cputm++;
 	VCPU_EVENT(vcpu, 3, "%s", "inject: cpu timer external");
 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_CPU_TIMER,
 				   0, 0);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 020502af7dc9..46759021e924 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4133,7 +4133,7 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 	/* do not poll with more than halt_poll_max_steal percent of steal time */
 	if (get_lowcore()->avg_steal_timer * 100 / (TICK_USEC << 12) >=
 	    READ_ONCE(halt_poll_max_steal)) {
-		vcpu->stat.halt_no_poll_steal++;
+		vcpu->stat->halt_no_poll_steal++;
 		return true;
 	}
 	return false;
@@ -4898,7 +4898,7 @@ int __kvm_s390_handle_dat_fault(struct kvm_vcpu *vcpu, gfn_t gfn, gpa_t gaddr, u
 		trace_kvm_s390_major_guest_pfault(vcpu);
 		if (kvm_arch_setup_async_pf(vcpu))
 			return 0;
-		vcpu->stat.pfault_sync++;
+		vcpu->stat->pfault_sync++;
 		/* Could not setup async pfault, try again synchronously */
 		flags &= ~FOLL_NOWAIT;
 		goto try_again;
@@ -4960,7 +4960,7 @@ static int vcpu_post_run_handle_fault(struct kvm_vcpu *vcpu)
 
 	switch (current->thread.gmap_int_code & PGM_INT_CODE_MASK) {
 	case 0:
-		vcpu->stat.exit_null++;
+		vcpu->stat->exit_null++;
 		break;
 	case PGM_SECURE_STORAGE_ACCESS:
 	case PGM_SECURE_STORAGE_VIOLATION:
@@ -5351,7 +5351,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 	kvm_sigset_deactivate(vcpu);
 
-	vcpu->stat.exit_userspace++;
+	vcpu->stat->exit_userspace++;
 out:
 	vcpu_put(vcpu);
 	return rc;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 1a49b89706f8..6ff66373f115 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -31,7 +31,7 @@
 
 static int handle_ri(struct kvm_vcpu *vcpu)
 {
-	vcpu->stat.instruction_ri++;
+	vcpu->stat->instruction_ri++;
 
 	if (test_kvm_facility(vcpu->kvm, 64)) {
 		VCPU_EVENT(vcpu, 3, "%s", "ENABLE: RI (lazy)");
@@ -52,7 +52,7 @@ int kvm_s390_handle_aa(struct kvm_vcpu *vcpu)
 
 static int handle_gs(struct kvm_vcpu *vcpu)
 {
-	vcpu->stat.instruction_gs++;
+	vcpu->stat->instruction_gs++;
 
 	if (test_kvm_facility(vcpu->kvm, 133)) {
 		VCPU_EVENT(vcpu, 3, "%s", "ENABLE: GS (lazy)");
@@ -87,7 +87,7 @@ static int handle_set_clock(struct kvm_vcpu *vcpu)
 	u8 ar;
 	u64 op2;
 
-	vcpu->stat.instruction_sck++;
+	vcpu->stat->instruction_sck++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -126,7 +126,7 @@ static int handle_set_prefix(struct kvm_vcpu *vcpu)
 	int rc;
 	u8 ar;
 
-	vcpu->stat.instruction_spx++;
+	vcpu->stat->instruction_spx++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -164,7 +164,7 @@ static int handle_store_prefix(struct kvm_vcpu *vcpu)
 	int rc;
 	u8 ar;
 
-	vcpu->stat.instruction_stpx++;
+	vcpu->stat->instruction_stpx++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -194,7 +194,7 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
 	int rc;
 	u8 ar;
 
-	vcpu->stat.instruction_stap++;
+	vcpu->stat->instruction_stap++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -261,7 +261,7 @@ static int handle_iske(struct kvm_vcpu *vcpu)
 	bool unlocked;
 	int rc;
 
-	vcpu->stat.instruction_iske++;
+	vcpu->stat->instruction_iske++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -308,7 +308,7 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
 	bool unlocked;
 	int rc;
 
-	vcpu->stat.instruction_rrbe++;
+	vcpu->stat->instruction_rrbe++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -359,7 +359,7 @@ static int handle_sske(struct kvm_vcpu *vcpu)
 	bool unlocked;
 	int rc;
 
-	vcpu->stat.instruction_sske++;
+	vcpu->stat->instruction_sske++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -438,7 +438,7 @@ static int handle_sske(struct kvm_vcpu *vcpu)
 
 static int handle_ipte_interlock(struct kvm_vcpu *vcpu)
 {
-	vcpu->stat.instruction_ipte_interlock++;
+	vcpu->stat->instruction_ipte_interlock++;
 	if (psw_bits(vcpu->arch.sie_block->gpsw).pstate)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
 	wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu->kvm));
@@ -452,7 +452,7 @@ static int handle_test_block(struct kvm_vcpu *vcpu)
 	gpa_t addr;
 	int reg2;
 
-	vcpu->stat.instruction_tb++;
+	vcpu->stat->instruction_tb++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -486,7 +486,7 @@ static int handle_tpi(struct kvm_vcpu *vcpu)
 	u64 addr;
 	u8 ar;
 
-	vcpu->stat.instruction_tpi++;
+	vcpu->stat->instruction_tpi++;
 
 	addr = kvm_s390_get_base_disp_s(vcpu, &ar);
 	if (addr & 3)
@@ -548,7 +548,7 @@ static int handle_tsch(struct kvm_vcpu *vcpu)
 	struct kvm_s390_interrupt_info *inti = NULL;
 	const u64 isc_mask = 0xffUL << 24; /* all iscs set */
 
-	vcpu->stat.instruction_tsch++;
+	vcpu->stat->instruction_tsch++;
 
 	/* a valid schid has at least one bit set */
 	if (vcpu->run->s.regs.gprs[1])
@@ -593,7 +593,7 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
 		if (vcpu->arch.sie_block->ipa == 0xb235)
 			return handle_tsch(vcpu);
 		/* Handle in userspace. */
-		vcpu->stat.instruction_io_other++;
+		vcpu->stat->instruction_io_other++;
 		return -EOPNOTSUPP;
 	} else {
 		/*
@@ -702,7 +702,7 @@ static int handle_stfl(struct kvm_vcpu *vcpu)
 	int rc;
 	unsigned int fac;
 
-	vcpu->stat.instruction_stfl++;
+	vcpu->stat->instruction_stfl++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -751,7 +751,7 @@ int kvm_s390_handle_lpsw(struct kvm_vcpu *vcpu)
 	int rc;
 	u8 ar;
 
-	vcpu->stat.instruction_lpsw++;
+	vcpu->stat->instruction_lpsw++;
 
 	if (gpsw->mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -780,7 +780,7 @@ static int handle_lpswe(struct kvm_vcpu *vcpu)
 	int rc;
 	u8 ar;
 
-	vcpu->stat.instruction_lpswe++;
+	vcpu->stat->instruction_lpswe++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -804,7 +804,7 @@ static int handle_lpswey(struct kvm_vcpu *vcpu)
 	int rc;
 	u8 ar;
 
-	vcpu->stat.instruction_lpswey++;
+	vcpu->stat->instruction_lpswey++;
 
 	if (!test_kvm_facility(vcpu->kvm, 193))
 		return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
@@ -834,7 +834,7 @@ static int handle_stidp(struct kvm_vcpu *vcpu)
 	int rc;
 	u8 ar;
 
-	vcpu->stat.instruction_stidp++;
+	vcpu->stat->instruction_stidp++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -900,7 +900,7 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
 	int rc = 0;
 	u8 ar;
 
-	vcpu->stat.instruction_stsi++;
+	vcpu->stat->instruction_stsi++;
 	VCPU_EVENT(vcpu, 3, "STSI: fc: %u sel1: %u sel2: %u", fc, sel1, sel2);
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
@@ -1044,7 +1044,7 @@ static int handle_epsw(struct kvm_vcpu *vcpu)
 {
 	int reg1, reg2;
 
-	vcpu->stat.instruction_epsw++;
+	vcpu->stat->instruction_epsw++;
 
 	kvm_s390_get_regs_rre(vcpu, &reg1, &reg2);
 
@@ -1076,7 +1076,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 	unsigned long start, end;
 	unsigned char key;
 
-	vcpu->stat.instruction_pfmf++;
+	vcpu->stat->instruction_pfmf++;
 
 	kvm_s390_get_regs_rre(vcpu, &reg1, &reg2);
 
@@ -1256,7 +1256,7 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 
 	VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries);
 	gmap = vcpu->arch.gmap;
-	vcpu->stat.instruction_essa++;
+	vcpu->stat->instruction_essa++;
 	if (!vcpu->kvm->arch.use_cmma)
 		return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
 
@@ -1345,7 +1345,7 @@ int kvm_s390_handle_lctl(struct kvm_vcpu *vcpu)
 	u64 ga;
 	u8 ar;
 
-	vcpu->stat.instruction_lctl++;
+	vcpu->stat->instruction_lctl++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -1384,7 +1384,7 @@ int kvm_s390_handle_stctl(struct kvm_vcpu *vcpu)
 	u64 ga;
 	u8 ar;
 
-	vcpu->stat.instruction_stctl++;
+	vcpu->stat->instruction_stctl++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -1418,7 +1418,7 @@ static int handle_lctlg(struct kvm_vcpu *vcpu)
 	u64 ga;
 	u8 ar;
 
-	vcpu->stat.instruction_lctlg++;
+	vcpu->stat->instruction_lctlg++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -1456,7 +1456,7 @@ static int handle_stctg(struct kvm_vcpu *vcpu)
 	u64 ga;
 	u8 ar;
 
-	vcpu->stat.instruction_stctg++;
+	vcpu->stat->instruction_stctg++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -1508,7 +1508,7 @@ static int handle_tprot(struct kvm_vcpu *vcpu)
 	int ret, cc;
 	u8 ar;
 
-	vcpu->stat.instruction_tprot++;
+	vcpu->stat->instruction_tprot++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -1572,7 +1572,7 @@ static int handle_sckpf(struct kvm_vcpu *vcpu)
 {
 	u32 value;
 
-	vcpu->stat.instruction_sckpf++;
+	vcpu->stat->instruction_sckpf++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
@@ -1589,7 +1589,7 @@ static int handle_sckpf(struct kvm_vcpu *vcpu)
 
 static int handle_ptff(struct kvm_vcpu *vcpu)
 {
-	vcpu->stat.instruction_ptff++;
+	vcpu->stat->instruction_ptff++;
 
 	/* we don't emulate any control instructions yet */
 	kvm_s390_set_psw_cc(vcpu, 3);
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index 55c34cb35428..79cf7f77fec6 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -306,61 +306,61 @@ static int handle_sigp_dst(struct kvm_vcpu *vcpu, u8 order_code,
 
 	switch (order_code) {
 	case SIGP_SENSE:
-		vcpu->stat.instruction_sigp_sense++;
+		vcpu->stat->instruction_sigp_sense++;
 		rc = __sigp_sense(vcpu, dst_vcpu, status_reg);
 		break;
 	case SIGP_EXTERNAL_CALL:
-		vcpu->stat.instruction_sigp_external_call++;
+		vcpu->stat->instruction_sigp_external_call++;
 		rc = __sigp_external_call(vcpu, dst_vcpu, status_reg);
 		break;
 	case SIGP_EMERGENCY_SIGNAL:
-		vcpu->stat.instruction_sigp_emergency++;
+		vcpu->stat->instruction_sigp_emergency++;
 		rc = __sigp_emergency(vcpu, dst_vcpu);
 		break;
 	case SIGP_STOP:
-		vcpu->stat.instruction_sigp_stop++;
+		vcpu->stat->instruction_sigp_stop++;
 		rc = __sigp_stop(vcpu, dst_vcpu);
 		break;
 	case SIGP_STOP_AND_STORE_STATUS:
-		vcpu->stat.instruction_sigp_stop_store_status++;
+		vcpu->stat->instruction_sigp_stop_store_status++;
 		rc = __sigp_stop_and_store_status(vcpu, dst_vcpu, status_reg);
 		break;
 	case SIGP_STORE_STATUS_AT_ADDRESS:
-		vcpu->stat.instruction_sigp_store_status++;
+		vcpu->stat->instruction_sigp_store_status++;
 		rc = __sigp_store_status_at_addr(vcpu, dst_vcpu, parameter,
 						 status_reg);
 		break;
 	case SIGP_SET_PREFIX:
-		vcpu->stat.instruction_sigp_prefix++;
+		vcpu->stat->instruction_sigp_prefix++;
 		rc = __sigp_set_prefix(vcpu, dst_vcpu, parameter, status_reg);
 		break;
 	case SIGP_COND_EMERGENCY_SIGNAL:
-		vcpu->stat.instruction_sigp_cond_emergency++;
+		vcpu->stat->instruction_sigp_cond_emergency++;
 		rc = __sigp_conditional_emergency(vcpu, dst_vcpu, parameter,
 						  status_reg);
 		break;
 	case SIGP_SENSE_RUNNING:
-		vcpu->stat.instruction_sigp_sense_running++;
+		vcpu->stat->instruction_sigp_sense_running++;
 		rc = __sigp_sense_running(vcpu, dst_vcpu, status_reg);
 		break;
 	case SIGP_START:
-		vcpu->stat.instruction_sigp_start++;
+		vcpu->stat->instruction_sigp_start++;
 		rc = __prepare_sigp_re_start(vcpu, dst_vcpu, order_code);
 		break;
 	case SIGP_RESTART:
-		vcpu->stat.instruction_sigp_restart++;
+		vcpu->stat->instruction_sigp_restart++;
 		rc = __prepare_sigp_re_start(vcpu, dst_vcpu, order_code);
 		break;
 	case SIGP_INITIAL_CPU_RESET:
-		vcpu->stat.instruction_sigp_init_cpu_reset++;
+		vcpu->stat->instruction_sigp_init_cpu_reset++;
 		rc = __prepare_sigp_cpu_reset(vcpu, dst_vcpu, order_code);
 		break;
 	case SIGP_CPU_RESET:
-		vcpu->stat.instruction_sigp_cpu_reset++;
+		vcpu->stat->instruction_sigp_cpu_reset++;
 		rc = __prepare_sigp_cpu_reset(vcpu, dst_vcpu, order_code);
 		break;
 	default:
-		vcpu->stat.instruction_sigp_unknown++;
+		vcpu->stat->instruction_sigp_unknown++;
 		rc = __prepare_sigp_unknown(vcpu, dst_vcpu);
 	}
 
@@ -387,34 +387,34 @@ static int handle_sigp_order_in_user_space(struct kvm_vcpu *vcpu, u8 order_code,
 		return 0;
 	/* update counters as we're directly dropping to user space */
 	case SIGP_STOP:
-		vcpu->stat.instruction_sigp_stop++;
+		vcpu->stat->instruction_sigp_stop++;
 		break;
 	case SIGP_STOP_AND_STORE_STATUS:
-		vcpu->stat.instruction_sigp_stop_store_status++;
+		vcpu->stat->instruction_sigp_stop_store_status++;
 		break;
 	case SIGP_STORE_STATUS_AT_ADDRESS:
-		vcpu->stat.instruction_sigp_store_status++;
+		vcpu->stat->instruction_sigp_store_status++;
 		break;
 	case SIGP_STORE_ADDITIONAL_STATUS:
-		vcpu->stat.instruction_sigp_store_adtl_status++;
+		vcpu->stat->instruction_sigp_store_adtl_status++;
 		break;
 	case SIGP_SET_PREFIX:
-		vcpu->stat.instruction_sigp_prefix++;
+		vcpu->stat->instruction_sigp_prefix++;
 		break;
 	case SIGP_START:
-		vcpu->stat.instruction_sigp_start++;
+		vcpu->stat->instruction_sigp_start++;
 		break;
 	case SIGP_RESTART:
-		vcpu->stat.instruction_sigp_restart++;
+		vcpu->stat->instruction_sigp_restart++;
 		break;
 	case SIGP_INITIAL_CPU_RESET:
-		vcpu->stat.instruction_sigp_init_cpu_reset++;
+		vcpu->stat->instruction_sigp_init_cpu_reset++;
 		break;
 	case SIGP_CPU_RESET:
-		vcpu->stat.instruction_sigp_cpu_reset++;
+		vcpu->stat->instruction_sigp_cpu_reset++;
 		break;
 	default:
-		vcpu->stat.instruction_sigp_unknown++;
+		vcpu->stat->instruction_sigp_unknown++;
 	}
 	VCPU_EVENT(vcpu, 3, "SIGP: order %u for CPU %d handled in userspace",
 		   order_code, cpu_addr);
@@ -447,7 +447,7 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_handle_sigp(vcpu, order_code, cpu_addr, parameter);
 	switch (order_code) {
 	case SIGP_SET_ARCHITECTURE:
-		vcpu->stat.instruction_sigp_arch++;
+		vcpu->stat->instruction_sigp_arch++;
 		rc = __sigp_set_arch(vcpu, parameter,
 				     &vcpu->run->s.regs.gprs[r1]);
 		break;
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index a78df3a4f353..904a3d84c1b3 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1456,7 +1456,7 @@ int kvm_s390_handle_vsie(struct kvm_vcpu *vcpu)
 	unsigned long scb_addr;
 	int rc;
 
-	vcpu->stat.instruction_sie++;
+	vcpu->stat->instruction_sie++;
 	if (!test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_SIEF2))
 		return -EOPNOTSUPP;
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index 999227fc7c66..ff31d1bb49ec 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -24,7 +24,7 @@ DEFINE_SIMPLE_ATTRIBUTE(vcpu_timer_advance_ns_fops, vcpu_get_timer_advance_ns, N
 static int vcpu_get_guest_mode(void *data, u64 *val)
 {
 	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
-	*val = vcpu->stat.guest_mode;
+	*val = vcpu->stat->guest_mode;
 	return 0;
 }
 
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6ebeb6cea6c0..c6592e7f40a2 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1988,7 +1988,7 @@ int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
 		for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++)
 			kvm_x86_call(flush_tlb_gva)(vcpu, gva + j * PAGE_SIZE);
 
-		++vcpu->stat.tlb_flush;
+		++vcpu->stat->tlb_flush;
 	}
 	return 0;
 
@@ -2390,7 +2390,7 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
 
 	trace_kvm_hv_hypercall_done(result);
 	kvm_hv_hypercall_set_result(vcpu, result);
-	++vcpu->stat.hypercalls;
+	++vcpu->stat->hypercalls;
 
 	ret = kvm_skip_emulated_instruction(vcpu);
 
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 36a8786db291..1b9232aad730 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -225,7 +225,7 @@ static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu)
 static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.hflags |= HF_GUEST_MASK;
-	vcpu->stat.guest_mode = 1;
+	vcpu->stat->guest_mode = 1;
 }
 
 static inline void leave_guest_mode(struct kvm_vcpu *vcpu)
@@ -237,7 +237,7 @@ static inline void leave_guest_mode(struct kvm_vcpu *vcpu)
 		kvm_make_request(KVM_REQ_LOAD_EOI_EXITMAP, vcpu);
 	}
 
-	vcpu->stat.guest_mode = 0;
+	vcpu->stat->guest_mode = 0;
 }
 
 static inline bool is_guest_mode(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 04e4b041e248..2d8953163fa0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3014,7 +3014,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	bool write_fault = fault && fault->write;
 
 	if (unlikely(is_noslot_pfn(pfn))) {
-		vcpu->stat.pf_mmio_spte_created++;
+		vcpu->stat->pf_mmio_spte_created++;
 		mark_mmio_spte(vcpu, sptep, gfn, pte_access);
 		return RET_PF_EMULATE;
 	}
@@ -3689,7 +3689,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	walk_shadow_page_lockless_end(vcpu);
 
 	if (ret != RET_PF_INVALID)
-		vcpu->stat.pf_fast++;
+		vcpu->stat->pf_fast++;
 
 	return ret;
 }
@@ -4446,7 +4446,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	 * truly spurious and never trigger emulation
 	 */
 	if (r == RET_PF_FIXED)
-		vcpu->stat.pf_fixed++;
+		vcpu->stat->pf_fixed++;
 }
 
 static inline u8 kvm_max_level_for_order(int order)
@@ -6262,7 +6262,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		vcpu->stat.pf_taken++;
+		vcpu->stat->pf_taken++;
 
 		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
 					  &emulation_type, NULL);
@@ -6278,11 +6278,11 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 						&emulation_type);
 
 	if (r == RET_PF_FIXED)
-		vcpu->stat.pf_fixed++;
+		vcpu->stat->pf_fixed++;
 	else if (r == RET_PF_EMULATE)
-		vcpu->stat.pf_emulate++;
+		vcpu->stat->pf_emulate++;
 	else if (r == RET_PF_SPURIOUS)
-		vcpu->stat.pf_spurious++;
+		vcpu->stat->pf_spurious++;
 
 	/*
 	 * None of handle_mmio_page_fault(), kvm_mmu_do_page_fault(), or
@@ -6396,7 +6396,7 @@ void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 	 * done here for them.
 	 */
 	kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, KVM_MMU_ROOTS_ALL);
-	++vcpu->stat.invlpg;
+	++vcpu->stat->invlpg;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
 
@@ -6418,7 +6418,7 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 
 	if (roots)
 		kvm_mmu_invalidate_addr(vcpu, mmu, gva, roots);
-	++vcpu->stat.invlpg;
+	++vcpu->stat->invlpg;
 
 	/*
 	 * Mappings not reachable via the current cr3 or the prev_roots will be
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b23b1b2e60a8..72f81c99d665 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1181,7 +1181,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 
 	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
 	if (unlikely(is_mmio_spte(vcpu->kvm, new_spte))) {
-		vcpu->stat.pf_mmio_spte_created++;
+		vcpu->stat->pf_mmio_spte_created++;
 		trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn,
 				     new_spte);
 		ret = RET_PF_EMULATE;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0bc708ee2788..827dbe4d2b3b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4306,7 +4306,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 					    svm->sev_es.ghcb_sa);
 		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
-		++vcpu->stat.nmi_window_exits;
+		++vcpu->stat->nmi_window_exits;
 		svm->nmi_masked = false;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 		ret = 1;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f692794d18a2..f6a435ff7e2d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1577,7 +1577,7 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 
 	svm_prepare_host_switch(vcpu);
 
-	++vcpu->stat.host_state_reload;
+	++vcpu->stat->host_state_reload;
 }
 
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
@@ -2238,7 +2238,7 @@ static int io_interception(struct kvm_vcpu *vcpu)
 	int size, in, string;
 	unsigned port;
 
-	++vcpu->stat.io_exits;
+	++vcpu->stat->io_exits;
 	string = (io_info & SVM_IOIO_STR_MASK) != 0;
 	in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
 	port = io_info >> 16;
@@ -2268,7 +2268,7 @@ static int smi_interception(struct kvm_vcpu *vcpu)
 
 static int intr_interception(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.irq_exits;
+	++vcpu->stat->irq_exits;
 	return 1;
 }
 
@@ -2592,7 +2592,7 @@ static int iret_interception(struct kvm_vcpu *vcpu)
 
 	WARN_ON_ONCE(sev_es_guest(vcpu->kvm));
 
-	++vcpu->stat.nmi_window_exits;
+	++vcpu->stat->nmi_window_exits;
 	svm->awaiting_iret_completion = true;
 
 	svm_clr_iret_intercept(svm);
@@ -3254,7 +3254,7 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	 */
 	kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
 
-	++vcpu->stat.irq_window_exits;
+	++vcpu->stat->irq_window_exits;
 	return 1;
 }
 
@@ -3664,7 +3664,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 		svm->nmi_masked = true;
 		svm_set_iret_intercept(svm);
 	}
-	++vcpu->stat.nmi_injections;
+	++vcpu->stat->nmi_injections;
 }
 
 static bool svm_is_vnmi_pending(struct kvm_vcpu *vcpu)
@@ -3695,7 +3695,7 @@ static bool svm_set_vnmi_pending(struct kvm_vcpu *vcpu)
 	 * the NMI is "injected", but for all intents and purposes, passing the
 	 * NMI off to hardware counts as injection.
 	 */
-	++vcpu->stat.nmi_injections;
+	++vcpu->stat->nmi_injections;
 
 	return true;
 }
@@ -3716,7 +3716,7 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 
 	trace_kvm_inj_virq(vcpu->arch.interrupt.nr,
 			   vcpu->arch.interrupt.soft, reinjected);
-	++vcpu->stat.irq_injections;
+	++vcpu->stat->irq_injections;
 
 	svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr |
 				       SVM_EVTINJ_VALID | type;
@@ -4368,7 +4368,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
 		/* Track VMRUNs that have made past consistency checking */
 		if (svm->nested.nested_run_pending &&
 		    svm->vmcb->control.exit_code != SVM_EXIT_ERR)
-                        ++vcpu->stat.nested_run;
+			++vcpu->stat->nested_run;
 
 		svm->nested.nested_run_pending = 0;
 	}
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 84369f539fb2..cf894f572321 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -813,7 +813,7 @@ static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
 	if (!vt->guest_state_loaded)
 		return;
 
-	++vcpu->stat.host_state_reload;
+	++vcpu->stat->host_state_reload;
 	wrmsrl(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
 
 	if (tdx->guest_entered) {
@@ -1082,7 +1082,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 
 void tdx_inject_nmi(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.nmi_injections;
+	++vcpu->stat->nmi_injections;
 	td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1);
 	/*
 	 * From KVM's perspective, NMI injection is completed right after
@@ -1321,7 +1321,7 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
 	u64 size, write;
 	int ret;
 
-	++vcpu->stat.io_exits;
+	++vcpu->stat->io_exits;
 
 	size = tdx->vp_enter_args.r12;
 	write = tdx->vp_enter_args.r13;
@@ -2072,7 +2072,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	case EXIT_REASON_EXCEPTION_NMI:
 		return tdx_handle_exception_nmi(vcpu);
 	case EXIT_REASON_EXTERNAL_INTERRUPT:
-		++vcpu->stat.irq_exits;
+		++vcpu->stat->irq_exits;
 		return 1;
 	case EXIT_REASON_CPUID:
 		return tdx_emulate_cpuid(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 19dc85e5ac37..02458bb0b486 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1361,7 +1361,7 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
 
 	host_state = &vmx->loaded_vmcs->host_state;
 
-	++vmx->vcpu.stat.host_state_reload;
+	++vmx->vcpu.stat->host_state_reload;
 
 #ifdef CONFIG_X86_64
 	rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
@@ -4922,7 +4922,7 @@ void vmx_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 
 	trace_kvm_inj_virq(irq, vcpu->arch.interrupt.soft, reinjected);
 
-	++vcpu->stat.irq_injections;
+	++vcpu->stat->irq_injections;
 	if (vmx->rmode.vm86_active) {
 		int inc_eip = 0;
 		if (vcpu->arch.interrupt.soft)
@@ -4959,7 +4959,7 @@ void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 		vmx->loaded_vmcs->vnmi_blocked_time = 0;
 	}
 
-	++vcpu->stat.nmi_injections;
+	++vcpu->stat->nmi_injections;
 	vmx->loaded_vmcs->nmi_known_unmasked = false;
 
 	if (vmx->rmode.vm86_active) {
@@ -5353,7 +5353,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 
 static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.irq_exits;
+	++vcpu->stat->irq_exits;
 	return 1;
 }
 
@@ -5373,7 +5373,7 @@ static int handle_io(struct kvm_vcpu *vcpu)
 	exit_qualification = vmx_get_exit_qual(vcpu);
 	string = (exit_qualification & 16) != 0;
 
-	++vcpu->stat.io_exits;
+	++vcpu->stat->io_exits;
 
 	if (string)
 		return kvm_emulate_instruction(vcpu, 0);
@@ -5633,7 +5633,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu)
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
-	++vcpu->stat.irq_window_exits;
+	++vcpu->stat->irq_window_exits;
 	return 1;
 }
 
@@ -5811,7 +5811,7 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
 		return -EIO;
 
 	exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
-	++vcpu->stat.nmi_window_exits;
+	++vcpu->stat->nmi_window_exits;
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	return 1;
@@ -6062,7 +6062,7 @@ static int handle_notify(struct kvm_vcpu *vcpu)
 	unsigned long exit_qual = vmx_get_exit_qual(vcpu);
 	bool context_invalid = exit_qual & NOTIFY_VM_CONTEXT_INVALID;
 
-	++vcpu->stat.notify_window_exits;
+	++vcpu->stat->notify_window_exits;
 
 	/*
 	 * Notify VM exit happened while executing iret from NMI,
@@ -6666,7 +6666,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 			return;
 	}
 
-	vcpu->stat.l1d_flush++;
+	vcpu->stat->l1d_flush++;
 
 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
 		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
@@ -7450,7 +7450,7 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 		 */
 		if (vmx->nested.nested_run_pending &&
 		    !vmx_get_exit_reason(vcpu).failed_vmentry)
-			++vcpu->stat.nested_run;
+			++vcpu->stat->nested_run;
 
 		vmx->nested.nested_run_pending = 0;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 98a36df7cf62..2c8bdb139b75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -949,7 +949,7 @@ static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
 
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
-	++vcpu->stat.pf_guest;
+	++vcpu->stat->pf_guest;
 
 	/*
 	 * Async #PF in L2 is always forwarded to L1 as a VM-Exit regardless of
@@ -3607,7 +3607,7 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 
 static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.tlb_flush;
+	++vcpu->stat->tlb_flush;
 	kvm_x86_call(flush_tlb_all)(vcpu);
 
 	/* Flushing all ASIDs flushes the current ASID... */
@@ -3616,7 +3616,7 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
 
 static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.tlb_flush;
+	++vcpu->stat->tlb_flush;
 
 	if (!tdp_enabled) {
 		/*
@@ -3641,7 +3641,7 @@ static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
 
 static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
 {
-	++vcpu->stat.tlb_flush;
+	++vcpu->stat->tlb_flush;
 	kvm_x86_call(flush_tlb_current)(vcpu);
 }
 
@@ -5067,11 +5067,11 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	 * preempted if and only if the VM-Exit was due to a host interrupt.
 	 */
 	if (!vcpu->arch.at_instruction_boundary) {
-		vcpu->stat.preemption_other++;
+		vcpu->stat->preemption_other++;
 		return;
 	}
 
-	vcpu->stat.preemption_reported++;
+	vcpu->stat->preemption_reported++;
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -8874,7 +8874,7 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
 	struct kvm *kvm = vcpu->kvm;
 
-	++vcpu->stat.insn_emulation_fail;
+	++vcpu->stat->insn_emulation_fail;
 	trace_kvm_emulate_insn_failed(vcpu);
 
 	if (emulation_type & EMULTYPE_VMWARE_GP) {
@@ -9119,7 +9119,7 @@ int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
 	r = x86_decode_insn(ctxt, insn, insn_len, emulation_type);
 
 	trace_kvm_emulate_insn_start(vcpu);
-	++vcpu->stat.insn_emulation;
+	++vcpu->stat->insn_emulation;
 
 	return r;
 }
@@ -9285,7 +9285,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		}
 		r = 0;
 	} else if (vcpu->mmio_needed) {
-		++vcpu->stat.mmio_exits;
+		++vcpu->stat->mmio_exits;
 
 		if (!vcpu->mmio_is_write)
 			writeback = false;
@@ -10011,7 +10011,7 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 	struct kvm_vcpu *target = NULL;
 	struct kvm_apic_map *map;
 
-	vcpu->stat.directed_yield_attempted++;
+	vcpu->stat->directed_yield_attempted++;
 
 	if (single_task_running())
 		goto no_yield;
@@ -10034,7 +10034,7 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 	if (kvm_vcpu_yield_to(target) <= 0)
 		goto no_yield;
 
-	vcpu->stat.directed_yield_successful++;
+	vcpu->stat->directed_yield_successful++;
 
 no_yield:
 	return;
@@ -10061,7 +10061,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 	unsigned long a3 = kvm_rsi_read(vcpu);
 	int op_64_bit = is_64_bit_hypercall(vcpu);
 
-	++vcpu->stat.hypercalls;
+	++vcpu->stat->hypercalls;
 
 	trace_kvm_hypercall(nr, a0, a1, a2, a3);
 
@@ -10916,7 +10916,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
 	    kvm_xen_has_interrupt(vcpu)) {
-		++vcpu->stat.req_event;
+		++vcpu->stat->req_event;
 		r = kvm_apic_accept_events(vcpu);
 		if (r < 0) {
 			r = 0;
@@ -11048,7 +11048,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 
 		/* Note, VM-Exits that go down the "slow" path are accounted below. */
-		++vcpu->stat.exits;
+		++vcpu->stat->exits;
 	}
 
 	/*
@@ -11099,11 +11099,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
 	 * An instruction is required after local_irq_enable() to fully unblock
 	 * interrupts on processors that implement an interrupt shadow, the
-	 * stat.exits increment will do nicely.
+	 * stat->exits increment will do nicely.
 	 */
 	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
 	local_irq_enable();
-	++vcpu->stat.exits;
+	++vcpu->stat->exits;
 	local_irq_disable();
 	kvm_after_interrupt(vcpu);
 
@@ -11321,7 +11321,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 			kvm_vcpu_ready_for_interrupt_injection(vcpu)) {
 			r = 0;
 			vcpu->run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
-			++vcpu->stat.request_irq_exits;
+			++vcpu->stat->request_irq_exits;
 			break;
 		}
 
@@ -11346,7 +11346,7 @@ static int __kvm_emulate_halt(struct kvm_vcpu *vcpu, int state, int reason)
 	 * managed by userspace, in which case userspace is responsible for
 	 * handling wake events.
 	 */
-	++vcpu->stat.halt_exits;
+	++vcpu->stat->halt_exits;
 	if (lapic_in_kernel(vcpu)) {
 		if (kvm_vcpu_has_events(vcpu) || vcpu->arch.pv.pv_unhalted)
 			state = KVM_MP_STATE_RUNNABLE;
@@ -11515,7 +11515,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
-	++vcpu->stat.fpu_reload;
+	++vcpu->stat->fpu_reload;
 	trace_kvm_fpu(0);
 }
 
@@ -11564,7 +11564,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		if (signal_pending(current)) {
 			r = -EINTR;
 			kvm_run->exit_reason = KVM_EXIT_INTR;
-			++vcpu->stat.signal_exits;
+			++vcpu->stat->signal_exits;
 		}
 		goto out;
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dbca418d64f5..d2e0c0e8ff17 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -393,7 +393,8 @@ struct kvm_vcpu {
 	bool ready;
 	bool scheduled_out;
 	struct kvm_vcpu_arch arch;
-	struct kvm_vcpu_stat stat;
+	struct kvm_vcpu_stat *stat;
+	struct kvm_vcpu_stat __stat;
 	char stats_id[KVM_STATS_NAME_SIZE];
 	struct kvm_dirty_ring dirty_ring;
 
@@ -2489,7 +2490,7 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 {
 	vcpu->run->exit_reason = KVM_EXIT_INTR;
-	vcpu->stat.signal_exits++;
+	vcpu->stat->signal_exits++;
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b08fea91dc74..dce89a2f0a31 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3632,7 +3632,7 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
 	bool waited = false;
 
-	vcpu->stat.generic.blocking = 1;
+	vcpu->stat->generic.blocking = 1;
 
 	preempt_disable();
 	kvm_arch_vcpu_blocking(vcpu);
@@ -3654,7 +3654,7 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	kvm_arch_vcpu_unblocking(vcpu);
 	preempt_enable();
 
-	vcpu->stat.generic.blocking = 0;
+	vcpu->stat->generic.blocking = 0;
 
 	return waited;
 }
@@ -3662,16 +3662,16 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 static inline void update_halt_poll_stats(struct kvm_vcpu *vcpu, ktime_t start,
 					  ktime_t end, bool success)
 {
-	struct kvm_vcpu_stat_generic *stats = &vcpu->stat.generic;
+	struct kvm_vcpu_stat_generic *stats = &vcpu->stat->generic;
 	u64 poll_ns = ktime_to_ns(ktime_sub(end, start));
 
-	++vcpu->stat.generic.halt_attempted_poll;
+	++vcpu->stat->generic.halt_attempted_poll;
 
 	if (success) {
-		++vcpu->stat.generic.halt_successful_poll;
+		++vcpu->stat->generic.halt_successful_poll;
 
 		if (!vcpu_valid_wakeup(vcpu))
-			++vcpu->stat.generic.halt_poll_invalid;
+			++vcpu->stat->generic.halt_poll_invalid;
 
 		stats->halt_poll_success_ns += poll_ns;
 		KVM_STATS_LOG_HIST_UPDATE(stats->halt_poll_success_hist, poll_ns);
@@ -3735,9 +3735,9 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 
 	cur = ktime_get();
 	if (waited) {
-		vcpu->stat.generic.halt_wait_ns +=
+		vcpu->stat->generic.halt_wait_ns +=
 			ktime_to_ns(cur) - ktime_to_ns(poll_end);
-		KVM_STATS_LOG_HIST_UPDATE(vcpu->stat.generic.halt_wait_hist,
+		KVM_STATS_LOG_HIST_UPDATE(vcpu->stat->generic.halt_wait_hist,
 				ktime_to_ns(cur) - ktime_to_ns(poll_end));
 	}
 out:
@@ -3782,7 +3782,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 {
 	if (__kvm_vcpu_wake_up(vcpu)) {
 		WRITE_ONCE(vcpu->ready, true);
-		++vcpu->stat.generic.halt_wakeup;
+		++vcpu->stat->generic.halt_wakeup;
 		return true;
 	}
 
@@ -4174,6 +4174,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	vcpu->run = page_address(page);
 
 	vcpu->plane0 = vcpu;
+	vcpu->stat = &vcpu->__stat;
 	kvm_vcpu_init(vcpu, kvm, id);
 
 	r = kvm_arch_vcpu_create(vcpu);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 11/29] KVM: anticipate allocation of dirty ring
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (9 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 10/29] KVM: share statistics for same vCPU id on different planes Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 12/29] KVM: share dirty ring for same vCPU id on different planes Paolo Bonzini
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Group together the code that deals with data shared by all planes:
vcpu->run and the dirty ring.  Moving the dirty ring allocation before
kvm_arch_vcpu_create() lets a later patch skip both allocations in one
place for vCPUs on non-zero planes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virt/kvm/kvm_main.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dce89a2f0a31..4c7e379fbf7d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4173,20 +4173,20 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	}
 	vcpu->run = page_address(page);
 
+	if (kvm->dirty_ring_size) {
+		r = kvm_dirty_ring_alloc(kvm, &vcpu->dirty_ring,
+					 id, kvm->dirty_ring_size);
+		if (r)
+			goto vcpu_free_run_page;
+	}
+
 	vcpu->plane0 = vcpu;
 	vcpu->stat = &vcpu->__stat;
 	kvm_vcpu_init(vcpu, kvm, id);
 
 	r = kvm_arch_vcpu_create(vcpu);
 	if (r)
-		goto vcpu_free_run_page;
-
-	if (kvm->dirty_ring_size) {
-		r = kvm_dirty_ring_alloc(kvm, &vcpu->dirty_ring,
-					 id, kvm->dirty_ring_size);
-		if (r)
-			goto arch_vcpu_destroy;
-	}
+		goto vcpu_free_dirty_ring;
 
 	mutex_lock(&kvm->lock);
 
@@ -4240,9 +4240,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	xa_erase(&kvm->planes[0]->vcpu_array, vcpu->vcpu_idx);
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
-	kvm_dirty_ring_free(&vcpu->dirty_ring);
-arch_vcpu_destroy:
 	kvm_arch_vcpu_destroy(vcpu);
+vcpu_free_dirty_ring:
+	kvm_dirty_ring_free(&vcpu->dirty_ring);
 vcpu_free_run_page:
 	free_page((unsigned long)vcpu->run);
 vcpu_free:
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 12/29] KVM: share dirty ring for same vCPU id on different planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (10 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 11/29] KVM: anticipate allocation of dirty ring Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-21 21:51   ` Tom Lendacky
  2025-04-01 16:10 ` [PATCH 13/29] KVM: implement vCPU creation for extra planes Paolo Bonzini
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

The dirty page ring is read by mmap()-ing the vCPU file descriptor,
which is only possible for plane 0.  This is not a problem because the
ring is only filled by KVM_RUN, which takes the plane-0 vCPU mutex, and
it can therefore be shared by vCPUs that have the same id but are on
different planes.  (TODO: double check).
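
As a reminder of what this means for userspace (a sketch; the variable
names are illustrative and error handling is omitted): the ring is still
mapped through the plane-0 vCPU file descriptor only, and it collects
the dirty GFNs produced while running any plane:

    /* dirty_ring_bytes is the size passed to KVM_CAP_DIRTY_LOG_RING */
    struct kvm_dirty_gfn *gfns =
        mmap(NULL, dirty_ring_bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
             plane0_vcpu_fd, KVM_DIRTY_LOG_PAGE_OFFSET * page_size);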

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h |  6 ++++--
 virt/kvm/dirty_ring.c    |  5 +++--
 virt/kvm/kvm_main.c      | 10 +++++-----
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d2e0c0e8ff17..b511aed2de8e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -394,9 +394,11 @@ struct kvm_vcpu {
 	bool scheduled_out;
 	struct kvm_vcpu_arch arch;
 	struct kvm_vcpu_stat *stat;
-	struct kvm_vcpu_stat __stat;
 	char stats_id[KVM_STATS_NAME_SIZE];
-	struct kvm_dirty_ring dirty_ring;
+	struct kvm_dirty_ring *dirty_ring;
+
+	struct kvm_vcpu_stat __stat;
+	struct kvm_dirty_ring __dirty_ring;
 
 	/*
 	 * The most recently used memslot by this vCPU and the slots generation
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d14ffc7513ee..66e6a6a67d13 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -172,11 +172,12 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
 
 void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset)
 {
-	struct kvm_dirty_ring *ring = &vcpu->dirty_ring;
+	struct kvm_dirty_ring *ring = vcpu->dirty_ring;
 	struct kvm_dirty_gfn *entry;
 
 	/* It should never get full */
 	WARN_ON_ONCE(kvm_dirty_ring_full(ring));
+	lockdep_assert_held(&vcpu->plane0->mutex);
 
 	entry = &ring->dirty_gfns[ring->dirty_index & (ring->size - 1)];
 
@@ -204,7 +205,7 @@ bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
 	 * the dirty ring is reset by userspace.
 	 */
 	if (kvm_check_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu) &&
-	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
+	    kvm_dirty_ring_soft_full(vcpu->dirty_ring)) {
 		kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
 		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
 		trace_kvm_dirty_ring_exit(vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4c7e379fbf7d..863fd80ddfbe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -466,7 +466,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	kvm_arch_vcpu_destroy(vcpu);
-	kvm_dirty_ring_free(&vcpu->dirty_ring);
+	kvm_dirty_ring_free(vcpu->dirty_ring);
 
 	/*
 	 * No need for rcu_read_lock as VCPU_RUN is the only place that changes
@@ -4038,7 +4038,7 @@ static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
 #endif
 	else if (kvm_page_in_dirty_ring(vcpu->kvm, vmf->pgoff))
 		page = kvm_dirty_ring_get_page(
-		    &vcpu->dirty_ring,
+		    vcpu->dirty_ring,
 		    vmf->pgoff - KVM_DIRTY_LOG_PAGE_OFFSET);
 	else
 		return kvm_arch_vcpu_fault(vcpu, vmf);
@@ -4174,7 +4174,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	vcpu->run = page_address(page);
 
 	if (kvm->dirty_ring_size) {
-		r = kvm_dirty_ring_alloc(kvm, &vcpu->dirty_ring,
+		r = kvm_dirty_ring_alloc(kvm, &vcpu->__dirty_ring,
 					 id, kvm->dirty_ring_size);
 		if (r)
 			goto vcpu_free_run_page;
@@ -4242,7 +4242,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	mutex_unlock(&kvm->lock);
 	kvm_arch_vcpu_destroy(vcpu);
 vcpu_free_dirty_ring:
-	kvm_dirty_ring_free(&vcpu->dirty_ring);
+	kvm_dirty_ring_free(&vcpu->__dirty_ring);
 vcpu_free_run_page:
 	free_page((unsigned long)vcpu->run);
 vcpu_free:
@@ -5047,7 +5047,7 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
 	mutex_lock(&kvm->slots_lock);
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
-		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
+		cleared += kvm_dirty_ring_reset(vcpu->kvm, vcpu->dirty_ring);
 
 	mutex_unlock(&kvm->slots_lock);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 13/29] KVM: implement vCPU creation for extra planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (11 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 12/29] KVM: share dirty ring for same vCPU id on different planes Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-21 22:08   ` Tom Lendacky
  2025-04-01 16:10 ` [PATCH 14/29] KVM: pass plane to kvm_arch_vcpu_create Paolo Bonzini
                   ` (18 subsequent siblings)
  31 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

For userspace to have fun with planes it is probably useful to let it
create vCPUs on the non-zero planes as well.  Since such vCPUs are backed
by a full struct kvm_vcpu of their own, these are regular vCPU file
descriptors except that they only allow a small subset of ioctls (mostly
get/set) and they share some of the backing resources, notably vcpu->run.

TODO: prefault might be useful on non-default planes as well?
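
A minimal userspace sketch of the resulting flow (fd names are
illustrative, error handling omitted):

    int plane1_fd = ioctl(vm_fd, KVM_CREATE_PLANE, 1);
    int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);
    /* the argument is the plane-0 vCPU that the new vCPU attaches to */
    int vcpu_plane1_fd = ioctl(plane1_fd, KVM_CREATE_VCPU_PLANE, vcpu_fd);

    struct kvm_regs regs;
    ioctl(vcpu_plane1_fd, KVM_GET_REGS, &regs);   /* allowed */
    ioctl(vcpu_plane1_fd, KVM_RUN, 0);            /* -EINVAL: only plane 0 runs */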

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/locking.rst |   3 +
 include/linux/kvm_host.h           |   4 +-
 include/uapi/linux/kvm.h           |   1 +
 virt/kvm/kvm_main.c                | 167 +++++++++++++++++++++++------
 4 files changed, 142 insertions(+), 33 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index ae8bce7fecbe..ad22344deb28 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -26,6 +26,9 @@ The acquisition orders for mutexes are as follows:
   are taken on the waiting side when modifying memslots, so MMU notifiers
   must not take either kvm->slots_lock or kvm->slots_arch_lock.
 
+- when VMs have multiple planes, vcpu->mutex for plane 0 can be taken
+  outside vcpu->mutex for the same id on another plane
+
 cpus_read_lock() vs kvm_lock:
 
 - Taking cpus_read_lock() outside of kvm_lock is problematic, despite that
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b511aed2de8e..99fd90c5d71b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -343,6 +343,9 @@ struct kvm_vcpu {
 
 	struct mutex mutex;
 
+	/* Only valid on plane 0 */
+	bool wants_to_run;
+
 	/* Shared for all planes */
 	struct kvm_run *run;
 
@@ -388,7 +391,6 @@ struct kvm_vcpu {
 		bool dy_eligible;
 	} spin_loop;
 #endif
-	bool wants_to_run;
 	bool preempted;
 	bool ready;
 	bool scheduled_out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 96d25c7fa18f..24fa002cd7c1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1691,5 +1691,6 @@ struct kvm_pre_fault_memory {
 };
 
 #define KVM_CREATE_PLANE	_IO(KVMIO, 0xd6)
+#define KVM_CREATE_VCPU_PLANE	_IO(KVMIO, 0xd7)
 
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 863fd80ddfbe..06fa2a6ad96f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -438,11 +438,11 @@ void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 }
 #endif
 
-static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
+static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_plane *plane, unsigned id)
 {
 	mutex_init(&vcpu->mutex);
 	vcpu->cpu = -1;
-	vcpu->kvm = kvm;
+	vcpu->kvm = plane->kvm;
 	vcpu->vcpu_id = id;
 	vcpu->pid = NULL;
 	rwlock_init(&vcpu->pid_lock);
@@ -459,8 +459,13 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	vcpu->last_used_slot = NULL;
 
 	/* Fill the stats id string for the vcpu */
-	snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d",
-		 task_pid_nr(current), id);
+	if (plane->plane) {
+		snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d:%d",
+			 task_pid_nr(current), id, plane->plane);
+	} else {
+		snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d",
+			 task_pid_nr(current), id);
+	}
 }
 
 static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
@@ -475,7 +480,9 @@ static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
 	 */
 	put_pid(vcpu->pid);
 
-	free_page((unsigned long)vcpu->run);
+	if (!vcpu->plane)
+		free_page((unsigned long)vcpu->run);
+
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
@@ -4026,6 +4033,9 @@ static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
 	struct kvm_vcpu *vcpu = vmf->vma->vm_file->private_data;
 	struct page *page;
 
+	if (vcpu->plane)
+		return VM_FAULT_SIGBUS;
+
 	if (vmf->pgoff == 0)
 		page = virt_to_page(vcpu->run);
 #ifdef CONFIG_X86
@@ -4113,7 +4123,10 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
 	if (!debugfs_initialized())
 		return;
 
-	snprintf(dir_name, sizeof(dir_name), "vcpu%d", vcpu->vcpu_id);
+	if (vcpu->plane)
+		snprintf(dir_name, sizeof(dir_name), "vcpu%d:%d", vcpu->vcpu_id, vcpu->plane);
+	else
+		snprintf(dir_name, sizeof(dir_name), "vcpu%d", vcpu->vcpu_id);
 	debugfs_dentry = debugfs_create_dir(dir_name,
 					    vcpu->kvm->debugfs_dentry);
 	debugfs_create_file("pid", 0444, debugfs_dentry, vcpu,
@@ -4126,9 +4139,10 @@ static void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
 /*
  * Creates some virtual cpus.  Good luck creating more than one.
  */
-static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
+static int kvm_vm_ioctl_create_vcpu(struct kvm_plane *plane, struct kvm_vcpu *plane0_vcpu, unsigned long id)
 {
 	int r;
+	struct kvm *kvm = plane->kvm;
 	struct kvm_vcpu *vcpu;
 	struct page *page;
 
@@ -4165,24 +4179,33 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 		goto vcpu_decrement;
 	}
 
-	BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE);
-	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-	if (!page) {
-		r = -ENOMEM;
-		goto vcpu_free;
-	}
-	vcpu->run = page_address(page);
+	if (plane->plane) {
+		page = NULL;
+		vcpu->run = plane0_vcpu->run;
+	} else {
+		WARN_ON(plane0_vcpu != NULL);
+		plane0_vcpu = vcpu;
 
-	if (kvm->dirty_ring_size) {
-		r = kvm_dirty_ring_alloc(kvm, &vcpu->__dirty_ring,
-					 id, kvm->dirty_ring_size);
-		if (r)
-			goto vcpu_free_run_page;
+		BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE);
+		page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		if (!page) {
+			r = -ENOMEM;
+			goto vcpu_free;
+		}
+		vcpu->run = page_address(page);
+
+		if (kvm->dirty_ring_size) {
+			r = kvm_dirty_ring_alloc(kvm, &vcpu->__dirty_ring,
+						 id, kvm->dirty_ring_size);
+			if (r)
+				goto vcpu_free_run_page;
+		}
 	}
 
-	vcpu->plane0 = vcpu;
-	vcpu->stat = &vcpu->__stat;
-	kvm_vcpu_init(vcpu, kvm, id);
+	vcpu->plane0 = plane0_vcpu;
+	vcpu->stat = &plane0_vcpu->__stat;
+	vcpu->dirty_ring = &plane0_vcpu->__dirty_ring;
+	kvm_vcpu_init(vcpu, plane, id);
 
 	r = kvm_arch_vcpu_create(vcpu);
 	if (r)
@@ -4190,7 +4213,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 
 	mutex_lock(&kvm->lock);
 
-	if (kvm_get_vcpu_by_id(kvm, id)) {
+	if (kvm_get_plane_vcpu_by_id(plane, id)) {
 		r = -EEXIST;
 		goto unlock_vcpu_destroy;
 	}
@@ -4200,8 +4223,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	 * release semantics, which ensures the write is visible to kvm_get_vcpu().
 	 */
 	vcpu->plane = -1;
-	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-	r = xa_insert(&kvm->planes[0]->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
+	if (plane->plane)
+		vcpu->vcpu_idx = plane0_vcpu->vcpu_idx;
+	else
+		vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
+
+	r = xa_insert(&plane->vcpu_array, vcpu->vcpu_idx,
+		      vcpu, GFP_KERNEL_ACCOUNT);
 	WARN_ON_ONCE(r == -EBUSY);
 	if (r)
 		goto unlock_vcpu_destroy;
@@ -4220,13 +4248,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	if (r < 0)
 		goto kvm_put_xa_erase;
 
-	atomic_inc(&kvm->online_vcpus);
+	if (!plane->plane)
+		atomic_inc(&kvm->online_vcpus);
 
 	/*
 	 * Pairs with xa_load() in kvm_get_vcpu, ensuring that online_vcpus
 	 * is updated before vcpu->plane.
 	 */
-	smp_store_release(&vcpu->plane, 0);
+	smp_store_release(&vcpu->plane, plane->plane);
 	mutex_unlock(&vcpu->mutex);
 
 	mutex_unlock(&kvm->lock);
@@ -4237,14 +4266,15 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 kvm_put_xa_erase:
 	mutex_unlock(&vcpu->mutex);
 	kvm_put_kvm_no_destroy(kvm);
-	xa_erase(&kvm->planes[0]->vcpu_array, vcpu->vcpu_idx);
+	xa_erase(&plane->vcpu_array, vcpu->vcpu_idx);
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
 	kvm_arch_vcpu_destroy(vcpu);
 vcpu_free_dirty_ring:
 	kvm_dirty_ring_free(&vcpu->__dirty_ring);
 vcpu_free_run_page:
-	free_page((unsigned long)vcpu->run);
+	if (page)
+		__free_page(page);
 vcpu_free:
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 vcpu_decrement:
@@ -4406,6 +4436,35 @@ static int kvm_plane_ioctl_check_extension(struct kvm_plane *plane, long arg)
 	}
 }
 
+static int kvm_plane_ioctl_create_vcpu(struct kvm_plane *plane, long arg)
+{
+	int r = -EINVAL;
+	struct file *file;
+	struct kvm_vcpu *vcpu;
+	int fd;
+
+	if (arg != (int)arg)
+		return -EBADF;
+
+	fd = arg;
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+
+	if (file->f_op != &kvm_vcpu_fops)
+		goto err;
+
+	vcpu = file->private_data;
+	if (vcpu->kvm != plane->kvm)
+		goto err;
+
+	r = kvm_vm_ioctl_create_vcpu(plane, vcpu, vcpu->vcpu_id);
+
+err:
+	fput(file);
+	return r;
+}
+
 static long __kvm_plane_ioctl(struct kvm_plane *plane, unsigned int ioctl,
 			      unsigned long arg)
 {
@@ -4432,6 +4491,8 @@ static long __kvm_plane_ioctl(struct kvm_plane *plane, unsigned int ioctl,
 #endif
 	case KVM_CHECK_EXTENSION:
 		return kvm_plane_ioctl_check_extension(plane, arg);
+	case KVM_CREATE_VCPU_PLANE:
+		return kvm_plane_ioctl_create_vcpu(plane, arg);
 	default:
 		return -ENOTTY;
 	}
@@ -4463,6 +4524,44 @@ static struct file_operations kvm_plane_fops = {
 };
 
 
+static inline bool kvm_arch_is_vcpu_plane_ioctl(unsigned ioctl)
+{
+	switch (ioctl) {
+	case KVM_GET_DEBUGREGS:
+	case KVM_SET_DEBUGREGS:
+	case KVM_GET_FPU:
+	case KVM_SET_FPU:
+	case KVM_GET_LAPIC:
+	case KVM_SET_LAPIC:
+	case KVM_GET_MSRS:
+	case KVM_SET_MSRS:
+	case KVM_GET_NESTED_STATE:
+	case KVM_SET_NESTED_STATE:
+	case KVM_GET_ONE_REG:
+	case KVM_SET_ONE_REG:
+	case KVM_GET_REGS:
+	case KVM_SET_REGS:
+	case KVM_GET_SREGS:
+	case KVM_SET_SREGS:
+	case KVM_GET_SREGS2:
+	case KVM_SET_SREGS2:
+	case KVM_GET_VCPU_EVENTS:
+	case KVM_SET_VCPU_EVENTS:
+	case KVM_GET_XCRS:
+	case KVM_SET_XCRS:
+	case KVM_GET_XSAVE:
+	case KVM_GET_XSAVE2:
+	case KVM_SET_XSAVE:
+
+	case KVM_GET_REG_LIST:
+	case KVM_TRANSLATE:
+		return true;
+
+	default:
+		return false;
+	}
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -4475,6 +4574,9 @@ static long kvm_vcpu_ioctl(struct file *filp,
 	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
 		return -EIO;
 
+	if (vcpu->plane && !kvm_arch_is_vcpu_plane_ioctl(ioctl))
+		return -EINVAL;
+
 	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
 		return -EINVAL;
 
@@ -4958,7 +5060,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_PLANES:
 		if (kvm)
 			return kvm_arch_nr_vcpu_planes(kvm);
-		return KVM_MAX_PLANES;
+		return KVM_MAX_VCPU_PLANES;
 	case KVM_CAP_PLANES_FPU:
 		return kvm_arch_planes_share_fpu(kvm);
 #endif
@@ -5201,7 +5303,8 @@ static int kvm_vm_ioctl_create_plane(struct kvm *kvm, unsigned id)
 	struct file *file;
 	int r, fd;
 
-	if (id >= KVM_MAX_VCPU_PLANES)
+	if (id >= kvm_arch_nr_vcpu_planes(kvm)
+	    || WARN_ON_ONCE(id >= KVM_MAX_VCPU_PLANES))
 		return -EINVAL;
 
 	guard(mutex)(&kvm->lock);
@@ -5259,7 +5362,7 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_create_plane(kvm, arg);
 		break;
 	case KVM_CREATE_VCPU:
-		r = kvm_vm_ioctl_create_vcpu(kvm, arg);
+		r = kvm_vm_ioctl_create_vcpu(kvm->planes[0], NULL, arg);
 		break;
 	case KVM_ENABLE_CAP: {
 		struct kvm_enable_cap cap;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 14/29] KVM: pass plane to kvm_arch_vcpu_create
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (12 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 13/29] KVM: implement vCPU creation for extra planes Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 15/29] KVM: x86: pass vcpu to kvm_pv_send_ipi() Paolo Bonzini
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Pass the plane to architecture-specific code, so that it can also share
backing data between plane 0 and the non-zero planes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/arm64/kvm/arm.c       | 2 +-
 arch/loongarch/kvm/vcpu.c  | 2 +-
 arch/mips/kvm/mips.c       | 2 +-
 arch/powerpc/kvm/powerpc.c | 2 +-
 arch/riscv/kvm/vcpu.c      | 2 +-
 arch/s390/kvm/kvm-s390.c   | 2 +-
 arch/x86/kvm/x86.c         | 2 +-
 include/linux/kvm_host.h   | 2 +-
 virt/kvm/kvm_main.c        | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94fae442a8b8..3df9a7c164a3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -427,7 +427,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	return 0;
 }
 
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	int err;
 
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 470c79e79281..71b0fd05917f 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -1479,7 +1479,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	return 0;
 }
 
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	unsigned long timer_hz;
 	struct loongarch_csrs *csr;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 77637d201699..fec95594c041 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -275,7 +275,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	return 0;
 }
 
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	int err, size;
 	void *gebase, *p, *handler, *refill_start, *refill_end;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a39919dbaffb..359ca3924461 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -762,7 +762,7 @@ static enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	int err;
 
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 55fb16307cc6..0f114c01484e 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -107,7 +107,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	return 0;
 }
 
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	int rc;
 	struct kvm_cpu_context *cntx;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 46759021e924..8e3f8bc04a42 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3970,7 +3970,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	return 0;
 }
 
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	struct sie_page *sie_page;
 	int rc;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c8bdb139b75..9f699f056ce6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12293,7 +12293,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	return kvm_x86_call(vcpu_precreate)(kvm);
 }
 
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	struct page *page;
 	int r;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 99fd90c5d71b..16a8b3adb76d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1622,7 +1622,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id);
-int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu);
+int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane);
 void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 06fa2a6ad96f..cb04fe6f8a2c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4207,7 +4207,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm_plane *plane, struct kvm_vcpu *pl
 	vcpu->dirty_ring = &plane0_vcpu->__dirty_ring;
 	kvm_vcpu_init(vcpu, plane, id);
 
-	r = kvm_arch_vcpu_create(vcpu);
+	r = kvm_arch_vcpu_create(vcpu, plane);
 	if (r)
 		goto vcpu_free_dirty_ring;
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 15/29] KVM: x86: pass vcpu to kvm_pv_send_ipi()
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (13 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 14/29] KVM: pass plane to kvm_arch_vcpu_create Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 16/29] KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit Paolo Bonzini
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/lapic.c            | 4 ++--
 arch/x86/kvm/x86.c              | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8240f565a764..e29694a97a19 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2334,7 +2334,7 @@ int kvm_cpu_get_extint(struct kvm_vcpu *v);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 
-int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
+int kvm_pv_send_ipi(struct kvm_vcpu *source, unsigned long ipi_bitmap_low,
 		    unsigned long ipi_bitmap_high, u32 min,
 		    unsigned long icr, int op_64_bit);
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d8d11d9fd30a..c078269f7b1d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -861,7 +861,7 @@ static int __pv_send_ipi(unsigned long *ipi_bitmap, struct kvm_apic_map *map,
 	return count;
 }
 
-int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
+int kvm_pv_send_ipi(struct kvm_vcpu *source, unsigned long ipi_bitmap_low,
 		    unsigned long ipi_bitmap_high, u32 min,
 		    unsigned long icr, int op_64_bit)
 {
@@ -879,7 +879,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
 	irq.trig_mode = icr & APIC_INT_LEVELTRIG;
 
 	rcu_read_lock();
-	map = rcu_dereference(kvm->arch.apic_map);
+	map = rcu_dereference(source->kvm->arch.apic_map);
 
 	count = -EOPNOTSUPP;
 	if (likely(map)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9f699f056ce6..a527a425c55d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10101,7 +10101,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 		if (!guest_pv_has(vcpu, KVM_FEATURE_PV_SEND_IPI))
 			break;
 
-		ret = kvm_pv_send_ipi(vcpu->kvm, a0, a1, a2, a3, op_64_bit);
+		ret = kvm_pv_send_ipi(vcpu, a0, a1, a2, a3, op_64_bit);
 		break;
 	case KVM_HC_SCHED_YIELD:
 		if (!guest_pv_has(vcpu, KVM_FEATURE_PV_SCHED_YIELD))
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 16/29] KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (14 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 15/29] KVM: x86: pass vcpu to kvm_pv_send_ipi() Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 17/29] KVM: x86: block creating irqchip if planes are active Paolo Bonzini
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a527a425c55d..f70d9a572455 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10637,6 +10637,7 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 				      enum kvm_apicv_inhibit reason, bool set)
 {
 	unsigned long old, new;
+	bool changed;
 
 	lockdep_assert_held_write(&kvm->arch.apicv_update_lock);
 
@@ -10644,10 +10645,10 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 		return;
 
 	old = new = kvm->arch.apicv_inhibit_reasons;
-
 	set_or_clear_apicv_inhibit(&new, reason, set);
+	changed = (!!old != !!new);
 
-	if (!!old != !!new) {
+	if (changed) {
 		/*
 		 * Kick all vCPUs before setting apicv_inhibit_reasons to avoid
 		 * false positives in the sanity check WARN in vcpu_enter_guest().
@@ -10661,16 +10662,16 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 		 * servicing the request with a stale apicv_inhibit_reasons.
 		 */
 		kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
-		kvm->arch.apicv_inhibit_reasons = new;
-		if (new) {
-			unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
-			int idx = srcu_read_lock(&kvm->srcu);
+	}
 
-			kvm_zap_gfn_range(kvm, gfn, gfn+1);
-			srcu_read_unlock(&kvm->srcu, idx);
-		}
-	} else {
-		kvm->arch.apicv_inhibit_reasons = new;
+	kvm->arch.apicv_inhibit_reasons = new;
+
+	if (changed && set) {
+		unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
+		int idx = srcu_read_lock(&kvm->srcu);
+
+		kvm_zap_gfn_range(kvm, gfn, gfn+1);
+		srcu_read_unlock(&kvm->srcu, idx);
 	}
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 17/29] KVM: x86: block creating irqchip if planes are active
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (15 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 16/29] KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 18/29] KVM: x86: track APICv inhibits per plane Paolo Bonzini
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Force creating the irqchip before planes, so that APICV_INHIBIT_REASON_ABSENT
only needs to be removed from plane 0.
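
For userspace this simply pins down the ordering (sketch, error handling
omitted):

    ioctl(vm_fd, KVM_CREATE_IRQCHIP, 0);               /* must come first */
    int plane1_fd = ioctl(vm_fd, KVM_CREATE_PLANE, 1);

Creating a plane first now makes KVM_CREATE_IRQCHIP fail with -EINVAL,
just as it already does once the first vCPU has been created.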

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst | 6 ++++--
 arch/x86/kvm/x86.c             | 4 ++--
 include/linux/kvm_host.h       | 1 +
 virt/kvm/kvm_main.c            | 1 +
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e1c67bc6df47..16d836b954dc 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -882,6 +882,8 @@ On s390, a dummy irq routing table is created.
 Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
 before KVM_CREATE_IRQCHIP can be used.
 
+The interrupt controller must be created before any extra VM planes.
+
 
 4.25 KVM_IRQ_LINE
 -----------------
@@ -7792,8 +7794,8 @@ used in the IRQ routing table.  The first args[0] MSI routes are reserved
 for the IOAPIC pins.  Whenever the LAPIC receives an EOI for these routes,
 a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace.
 
-Fails if VCPU has already been created, or if the irqchip is already in the
-kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
+Fails if VCPUs or planes have already been created, or if the irqchip is
+already in the kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
 
 7.6 KVM_CAP_S390_RI
 -------------------
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f70d9a572455..653886e6e1c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6561,7 +6561,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = -EEXIST;
 		if (irqchip_in_kernel(kvm))
 			goto split_irqchip_unlock;
-		if (kvm->created_vcpus)
+		if (kvm->created_vcpus || kvm->has_planes)
 			goto split_irqchip_unlock;
 		/* Pairs with irqchip_in_kernel. */
 		smp_wmb();
@@ -7087,7 +7087,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 			goto create_irqchip_unlock;
 
 		r = -EINVAL;
-		if (kvm->created_vcpus)
+		if (kvm->created_vcpus || kvm->has_planes)
 			goto create_irqchip_unlock;
 
 		r = kvm_pic_init(kvm);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 16a8b3adb76d..152dc5845309 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -883,6 +883,7 @@ struct kvm {
 	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
+	bool has_planes;
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cb04fe6f8a2c..db38894f6fa3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5316,6 +5316,7 @@ static int kvm_vm_ioctl_create_plane(struct kvm *kvm, unsigned id)
 		return fd;
 
 	plane = kvm_create_vm_plane(kvm, id);
+	kvm->has_planes = true;
 	if (IS_ERR(plane)) {
 		r = PTR_ERR(plane);
 		goto put_fd;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 18/29] KVM: x86: track APICv inhibits per plane
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (16 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 17/29] KVM: x86: block creating irqchip if planes are active Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 19/29] KVM: x86: move APIC map to kvm_arch_plane Paolo Bonzini
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

As a first step towards per-plane APIC maps, track APICv inhibits per
plane.  Most of the inhibits are set or cleared when building the map,
and the virtual machine as a whole will have the OR of the inhibits of
the individual planes: if any plane has an inhibit set, APICv is
deactivated for the whole VM until no plane has any inhibit left.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 21 +++++----
 arch/x86/kvm/hyperv.c           |  2 +-
 arch/x86/kvm/i8254.c            |  4 +-
 arch/x86/kvm/lapic.c            | 15 +++---
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/svm/svm.c          |  3 +-
 arch/x86/kvm/x86.c              | 83 +++++++++++++++++++++++++--------
 include/linux/kvm_host.h        |  2 +-
 8 files changed, 90 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e29694a97a19..d07ab048d7cc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1087,6 +1087,7 @@ struct kvm_arch_memory_slot {
 };
 
 struct kvm_arch_plane {
+	unsigned long apicv_inhibit_reasons;
 };
 
 /*
@@ -1299,11 +1300,13 @@ enum kvm_apicv_inhibit {
 	/*
 	 * PIT (i8254) 're-inject' mode, relies on EOI intercept,
 	 * which AVIC doesn't support for edge triggered interrupts.
+	 * Applied only to plane 0.
 	 */
 	APICV_INHIBIT_REASON_PIT_REINJ,
 
 	/*
-	 * AVIC is disabled because SEV doesn't support it.
+	 * AVIC is disabled because SEV doesn't support it.  Sticky and applied
+	 * only to plane 0.
 	 */
 	APICV_INHIBIT_REASON_SEV,
 
@@ -2232,21 +2235,21 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
 bool kvm_apicv_activated(struct kvm *kvm);
 bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
 void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
-void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+void __kvm_set_or_clear_apicv_inhibit(struct kvm_plane *plane,
 				      enum kvm_apicv_inhibit reason, bool set);
-void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+void kvm_set_or_clear_apicv_inhibit(struct kvm_plane *plane,
 				    enum kvm_apicv_inhibit reason, bool set);
 
-static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
+static inline void kvm_set_apicv_inhibit(struct kvm_plane *plane,
 					 enum kvm_apicv_inhibit reason)
 {
-	kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
+	kvm_set_or_clear_apicv_inhibit(plane, reason, true);
 }
 
-static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
+static inline void kvm_clear_apicv_inhibit(struct kvm_plane *plane,
 					   enum kvm_apicv_inhibit reason)
 {
-	kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
+	kvm_set_or_clear_apicv_inhibit(plane, reason, false);
 }
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
@@ -2360,8 +2363,8 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
 				       unsigned long *vcpu_bitmap);
 
-static inline void kvm_arch_init_plane(struct kvm_plane *plane) {}
-static inline void kvm_arch_free_plane(struct kvm_plane *plane) {}
+void kvm_arch_init_plane(struct kvm_plane *plane);
+void kvm_arch_free_plane(struct kvm_plane *plane);
 
 bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work);
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index c6592e7f40a2..a522b467be48 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -145,7 +145,7 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
 	 * Inhibit APICv if any vCPU is using SynIC's AutoEOI, which relies on
 	 * the hypervisor to manually inject IRQs.
 	 */
-	__kvm_set_or_clear_apicv_inhibit(vcpu->kvm,
+	__kvm_set_or_clear_apicv_inhibit(vcpu_to_plane(vcpu),
 					 APICV_INHIBIT_REASON_HYPERV,
 					 !!hv->synic_auto_eoi_used);
 
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index e3a3e7b90c26..ded1a9565c36 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -306,13 +306,13 @@ void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject)
 	 * So, deactivate APICv when PIT is in reinject mode.
 	 */
 	if (reinject) {
-		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PIT_REINJ);
+		kvm_set_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_PIT_REINJ);
 		/* The initial state is preserved while ps->reinject == 0. */
 		kvm_pit_reset_reinject(pit);
 		kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
 		kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
 	} else {
-		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PIT_REINJ);
+		kvm_clear_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_PIT_REINJ);
 		kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
 		kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
 	}
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c078269f7b1d..4077c8d1e37e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -377,6 +377,7 @@ enum {
 
 static void kvm_recalculate_apic_map(struct kvm *kvm)
 {
+	struct kvm_plane *plane = kvm->planes[0];
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
@@ -456,19 +457,19 @@ static void kvm_recalculate_apic_map(struct kvm *kvm)
 	 * map also applies to APICv.
 	 */
 	if (!new)
-		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
+		kvm_set_apicv_inhibit(plane, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
 	else
-		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
+		kvm_clear_apicv_inhibit(plane, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
 
 	if (!new || new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
-		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+		kvm_set_apicv_inhibit(plane, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
 	else
-		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+		kvm_clear_apicv_inhibit(plane, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
 
 	if (xapic_id_mismatch)
-		kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
+		kvm_set_apicv_inhibit(plane, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
 	else
-		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
+		kvm_clear_apicv_inhibit(plane, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
 
 	old = rcu_dereference_protected(kvm->arch.apic_map,
 			lockdep_is_held(&kvm->arch.apic_map_lock));
@@ -2630,7 +2631,7 @@ static void __kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value)
 
 	if ((value & MSR_IA32_APICBASE_ENABLE) &&
 	     apic->base_address != APIC_DEFAULT_PHYS_BASE) {
-		kvm_set_apicv_inhibit(apic->vcpu->kvm,
+		kvm_set_apicv_inhibit(vcpu_to_plane(vcpu),
 				      APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
 	}
 }
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 827dbe4d2b3b..130d895f1d95 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -458,7 +458,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
 	INIT_LIST_HEAD(&sev->mirror_vms);
 	sev->need_init = false;
 
-	kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV);
+	kvm_set_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_SEV);
 
 	return 0;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f6a435ff7e2d..917bfe8db101 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3926,7 +3926,8 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 		 * the VM wide AVIC inhibition.
 		 */
 		if (!is_guest_mode(vcpu))
-			kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
+			kvm_set_apicv_inhibit(vcpu_to_plane(vcpu),
+					      APICV_INHIBIT_REASON_IRQWIN);
 
 		svm_set_vintr(svm);
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 653886e6e1c8..382d8ace131f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6567,7 +6567,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		smp_wmb();
 		kvm->arch.irqchip_mode = KVM_IRQCHIP_SPLIT;
 		kvm->arch.nr_reserved_ioapic_pins = cap->args[0];
-		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_ABSENT);
+		kvm_clear_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_ABSENT);
 		r = 0;
 split_irqchip_unlock:
 		mutex_unlock(&kvm->lock);
@@ -7109,7 +7109,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		/* Write kvm->irq_routing before enabling irqchip_in_kernel. */
 		smp_wmb();
 		kvm->arch.irqchip_mode = KVM_IRQCHIP_KERNEL;
-		kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_ABSENT);
+		kvm_clear_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_ABSENT);
 	create_irqchip_unlock:
 		mutex_unlock(&kvm->lock);
 		break;
@@ -9996,14 +9996,18 @@ static void set_or_clear_apicv_inhibit(unsigned long *inhibits,
 	trace_kvm_apicv_inhibit_changed(reason, set, *inhibits);
 }
 
-static void kvm_apicv_init(struct kvm *kvm)
+static void kvm_apicv_init(struct kvm *kvm, unsigned long *apicv_inhibit_reasons)
 {
-	enum kvm_apicv_inhibit reason = enable_apicv ? APICV_INHIBIT_REASON_ABSENT :
-						       APICV_INHIBIT_REASON_DISABLED;
+	enum kvm_apicv_inhibit reason;
 
-	set_or_clear_apicv_inhibit(&kvm->arch.apicv_inhibit_reasons, reason, true);
+	if (!enable_apicv)
+		reason = APICV_INHIBIT_REASON_DISABLED;
+	else if (!irqchip_kernel(kvm))
+		reason = APICV_INHIBIT_REASON_ABSENT;
+	else
+		return;
 
-	init_rwsem(&kvm->arch.apicv_update_lock);
+	set_or_clear_apicv_inhibit(apicv_inhibit_reasons, reason, true);
 }
 
 static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
@@ -10633,10 +10637,22 @@ static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	__kvm_vcpu_update_apicv(vcpu);
 }
 
-void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+static bool kvm_compute_apicv_inhibit(struct kvm *kvm,
+				      enum kvm_apicv_inhibit reason)
+{
+	int i;
+	for (i = 0; i < KVM_MAX_VCPU_PLANES; i++)
+		if (test_bit(reason, &kvm->planes[i]->arch.apicv_inhibit_reasons))
+			return true;
+
+	return false;
+}
+
+void __kvm_set_or_clear_apicv_inhibit(struct kvm_plane *plane,
 				      enum kvm_apicv_inhibit reason, bool set)
 {
-	unsigned long old, new;
+	struct kvm *kvm = plane->kvm;
+	unsigned long local, global;
 	bool changed;
 
 	lockdep_assert_held_write(&kvm->arch.apicv_update_lock);
@@ -10644,9 +10660,24 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 	if (!(kvm_x86_ops.required_apicv_inhibits & BIT(reason)))
 		return;
 
-	old = new = kvm->arch.apicv_inhibit_reasons;
-	set_or_clear_apicv_inhibit(&new, reason, set);
-	changed = (!!old != !!new);
+	local = plane->arch.apicv_inhibit_reasons;
+	set_or_clear_apicv_inhibit(&local, reason, set);
+
+	/* Could this flip change the global state? */
+	global = kvm->arch.apicv_inhibit_reasons;
+	if ((local & BIT(reason)) == (global & BIT(reason))) {
+		/* Easy case 1, the bit is now the same as for the whole VM.  */
+		changed = false;
+	} else if (set) {
+		/* Easy case 2, maybe the bit flipped globally from clear to set?  */
+		changed = !global;
+		set_or_clear_apicv_inhibit(&global, reason, set);
+	} else {
+		/* Harder case, check if no other plane had this inhibit.  */
+		set = kvm_compute_apicv_inhibit(kvm, reason);
+		set_or_clear_apicv_inhibit(&global, reason, set);
+		changed = !global;
+	}
 
 	if (changed) {
 		/*
@@ -10664,7 +10695,8 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 		kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
 	}
 
-	kvm->arch.apicv_inhibit_reasons = new;
+	plane->arch.apicv_inhibit_reasons = local;
+	kvm->arch.apicv_inhibit_reasons = global;
 
 	if (changed && set) {
 		unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
@@ -10675,14 +10707,17 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 	}
 }
 
-void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+void kvm_set_or_clear_apicv_inhibit(struct kvm_plane *plane,
 				    enum kvm_apicv_inhibit reason, bool set)
 {
+	struct kvm *kvm;
+
 	if (!enable_apicv)
 		return;
 
+	kvm = plane->kvm;
 	down_write(&kvm->arch.apicv_update_lock);
-	__kvm_set_or_clear_apicv_inhibit(kvm, reason, set);
+	__kvm_set_or_clear_apicv_inhibit(plane, reason, set);
 	up_write(&kvm->arch.apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_set_or_clear_apicv_inhibit);
@@ -12083,24 +12118,26 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return ret;
 }
 
-static void kvm_arch_vcpu_guestdbg_update_apicv_inhibit(struct kvm *kvm)
+static void kvm_arch_vcpu_guestdbg_update_apicv_inhibit(struct kvm_plane *plane)
 {
 	bool set = false;
+	struct kvm *kvm;
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
 
 	if (!enable_apicv)
 		return;
 
+	kvm = plane->kvm;
 	down_write(&kvm->arch.apicv_update_lock);
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
+	kvm_for_each_plane_vcpu(i, vcpu, plane) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ) {
 			set = true;
 			break;
 		}
 	}
-	__kvm_set_or_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_BLOCKIRQ, set);
+	__kvm_set_or_clear_apicv_inhibit(plane, APICV_INHIBIT_REASON_BLOCKIRQ, set);
 	up_write(&kvm->arch.apicv_update_lock);
 }
 
@@ -12156,7 +12193,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 
 	kvm_x86_call(update_exception_bitmap)(vcpu);
 
-	kvm_arch_vcpu_guestdbg_update_apicv_inhibit(vcpu->kvm);
+	kvm_arch_vcpu_guestdbg_update_apicv_inhibit(vcpu_to_plane(vcpu));
 
 	r = 0;
 
@@ -12732,6 +12769,11 @@ void kvm_arch_free_vm(struct kvm *kvm)
 }
 
 
+void kvm_arch_init_plane(struct kvm_plane *plane)
+{
+	kvm_apicv_init(plane->kvm, &plane->arch.apicv_inhibit_reasons);
+}
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
@@ -12767,6 +12809,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	set_bit(KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
 		&kvm->arch.irq_sources_bitmap);
 
+	init_rwsem(&kvm->arch.apicv_update_lock);
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
@@ -12789,7 +12832,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
 
-	kvm_apicv_init(kvm);
+	kvm_apicv_init(kvm, &kvm->arch.apicv_inhibit_reasons);
 	kvm_hv_init_vm(kvm);
 	kvm_xen_init_vm(kvm);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 152dc5845309..5cade1c04646 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -943,7 +943,7 @@ static inline struct kvm_plane *vcpu_to_plane(struct kvm_vcpu *vcpu)
 #else
 static inline struct kvm_plane *vcpu_to_plane(struct kvm_vcpu *vcpu)
 {
-	return vcpu->kvm->planes[vcpu->plane_id];
+	return vcpu->kvm->planes[vcpu->plane];
 }
 #endif
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 19/29] KVM: x86: move APIC map to kvm_arch_plane
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (17 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 18/29] KVM: x86: track APICv inhibits per plane Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 20/29] KVM: x86: add planes support for interrupt delivery Paolo Bonzini
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

IRQs need to be directed to the appropriate plane (typically, but not
always, the same as the vCPU that is running).  Because each plane has
a separate struct kvm_vcpu *, the map that holds the pointers to them
must be individual to the plane as well.

This works fine as long as all IRQs (even those directed at multiple CPUs)
only target a single plane.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  7 +--
 arch/x86/kvm/lapic.c            | 94 +++++++++++++++++++--------------
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/x86.c              | 10 ++--
 4 files changed, 67 insertions(+), 46 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d07ab048d7cc..f832352cf4d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1087,6 +1087,10 @@ struct kvm_arch_memory_slot {
 };
 
 struct kvm_arch_plane {
+	struct mutex apic_map_lock;
+	struct kvm_apic_map __rcu *apic_map;
+	atomic_t apic_map_dirty;
+
 	unsigned long apicv_inhibit_reasons;
 };
 
@@ -1381,9 +1385,6 @@ struct kvm_arch {
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
 	atomic_t vapics_in_nmi_mode;
-	struct mutex apic_map_lock;
-	struct kvm_apic_map __rcu *apic_map;
-	atomic_t apic_map_dirty;
 
 	bool apic_access_memslot_enabled;
 	bool apic_access_memslot_inhibited;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4077c8d1e37e..6ed5f5b4f878 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -375,9 +375,9 @@ enum {
 	DIRTY
 };
 
-static void kvm_recalculate_apic_map(struct kvm *kvm)
+static void kvm_recalculate_apic_map(struct kvm_plane *plane)
 {
-	struct kvm_plane *plane = kvm->planes[0];
+	struct kvm *kvm = plane->kvm;
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
@@ -385,27 +385,27 @@ static void kvm_recalculate_apic_map(struct kvm *kvm)
 	bool xapic_id_mismatch;
 	int r;
 
-	/* Read kvm->arch.apic_map_dirty before kvm->arch.apic_map.  */
-	if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == CLEAN)
+	/* Read plane->arch.apic_map_dirty before plane->arch.apic_map.  */
+	if (atomic_read_acquire(&plane->arch.apic_map_dirty) == CLEAN)
 		return;
 
 	WARN_ONCE(!irqchip_in_kernel(kvm),
 		  "Dirty APIC map without an in-kernel local APIC");
 
-	mutex_lock(&kvm->arch.apic_map_lock);
+	mutex_lock(&plane->arch.apic_map_lock);
 
 retry:
 	/*
-	 * Read kvm->arch.apic_map_dirty before kvm->arch.apic_map (if clean)
+	 * Read plane->arch.apic_map_dirty before plane->arch.apic_map (if clean)
 	 * or the APIC registers (if dirty).  Note, on retry the map may have
 	 * not yet been marked dirty by whatever task changed a vCPU's x2APIC
 	 * ID, i.e. the map may still show up as in-progress.  In that case
 	 * this task still needs to retry and complete its calculation.
 	 */
-	if (atomic_cmpxchg_acquire(&kvm->arch.apic_map_dirty,
+	if (atomic_cmpxchg_acquire(&plane->arch.apic_map_dirty,
 				   DIRTY, UPDATE_IN_PROGRESS) == CLEAN) {
 		/* Someone else has updated the map. */
-		mutex_unlock(&kvm->arch.apic_map_lock);
+		mutex_unlock(&plane->arch.apic_map_lock);
 		return;
 	}
 
@@ -418,7 +418,7 @@ static void kvm_recalculate_apic_map(struct kvm *kvm)
 	 */
 	xapic_id_mismatch = false;
 
-	kvm_for_each_vcpu(i, vcpu, kvm)
+	kvm_for_each_plane_vcpu(i, vcpu, plane)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));
 
@@ -432,7 +432,7 @@ static void kvm_recalculate_apic_map(struct kvm *kvm)
 	new->max_apic_id = max_id;
 	new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
+	kvm_for_each_plane_vcpu(i, vcpu, plane) {
 		if (!kvm_apic_present(vcpu))
 			continue;
 
@@ -471,21 +471,29 @@ static void kvm_recalculate_apic_map(struct kvm *kvm)
 	else
 		kvm_clear_apicv_inhibit(plane, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
 
-	old = rcu_dereference_protected(kvm->arch.apic_map,
-			lockdep_is_held(&kvm->arch.apic_map_lock));
-	rcu_assign_pointer(kvm->arch.apic_map, new);
+	old = rcu_dereference_protected(plane->arch.apic_map,
+			lockdep_is_held(&plane->arch.apic_map_lock));
+	rcu_assign_pointer(plane->arch.apic_map, new);
 	/*
-	 * Write kvm->arch.apic_map before clearing apic->apic_map_dirty.
+	 * Write plane->arch.apic_map before clearing apic->apic_map_dirty.
 	 * If another update has come in, leave it DIRTY.
 	 */
-	atomic_cmpxchg_release(&kvm->arch.apic_map_dirty,
+	atomic_cmpxchg_release(&plane->arch.apic_map_dirty,
 			       UPDATE_IN_PROGRESS, CLEAN);
-	mutex_unlock(&kvm->arch.apic_map_lock);
+	mutex_unlock(&plane->arch.apic_map_lock);
 
 	if (old)
 		kvfree_rcu(old, rcu);
 
-	kvm_make_scan_ioapic_request(kvm);
+	if (plane->plane == 0)
+		kvm_make_scan_ioapic_request(kvm);
+}
+
+static inline void kvm_mark_apic_map_dirty(struct kvm_vcpu *vcpu)
+{
+	struct kvm_plane *plane = vcpu_to_plane(vcpu);
+
+	atomic_set_release(&plane->arch.apic_map_dirty, DIRTY);
 }
 
 static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
@@ -501,7 +509,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 		else
 			static_branch_inc(&apic_sw_disabled.key);
 
-		atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
+		kvm_mark_apic_map_dirty(apic->vcpu);
 	}
 
 	/* Check if there are APF page ready requests pending */
@@ -514,19 +522,19 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 static inline void kvm_apic_set_xapic_id(struct kvm_lapic *apic, u8 id)
 {
 	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
-	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
+	kvm_mark_apic_map_dirty(apic->vcpu);
 }
 
 static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
 {
 	kvm_lapic_set_reg(apic, APIC_LDR, id);
-	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
+	kvm_mark_apic_map_dirty(apic->vcpu);
 }
 
 static inline void kvm_apic_set_dfr(struct kvm_lapic *apic, u32 val)
 {
 	kvm_lapic_set_reg(apic, APIC_DFR, val);
-	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
+	kvm_mark_apic_map_dirty(apic->vcpu);
 }
 
 static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
@@ -537,7 +545,7 @@ static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
 
 	kvm_lapic_set_reg(apic, APIC_ID, id);
 	kvm_lapic_set_reg(apic, APIC_LDR, ldr);
-	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
+	kvm_mark_apic_map_dirty(apic->vcpu);
 }
 
 static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type)
@@ -866,6 +874,7 @@ int kvm_pv_send_ipi(struct kvm_vcpu *source, unsigned long ipi_bitmap_low,
 		    unsigned long ipi_bitmap_high, u32 min,
 		    unsigned long icr, int op_64_bit)
 {
+	struct kvm_plane *plane = vcpu_to_plane(source);
 	struct kvm_apic_map *map;
 	struct kvm_lapic_irq irq = {0};
 	int cluster_size = op_64_bit ? 64 : 32;
@@ -880,7 +889,7 @@ int kvm_pv_send_ipi(struct kvm_vcpu *source, unsigned long ipi_bitmap_low,
 	irq.trig_mode = icr & APIC_INT_LEVELTRIG;
 
 	rcu_read_lock();
-	map = rcu_dereference(source->kvm->arch.apic_map);
+	map = rcu_dereference(plane->arch.apic_map);
 
 	count = -EOPNOTSUPP;
 	if (likely(map)) {
@@ -1152,7 +1161,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
  * means that the interrupt should be dropped.  In this case, *bitmap would be
  * zero and *dst undefined.
  */
-static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
+static inline bool kvm_apic_map_get_dest_lapic(struct kvm_plane *plane,
 		struct kvm_lapic **src, struct kvm_lapic_irq *irq,
 		struct kvm_apic_map *map, struct kvm_lapic ***dst,
 		unsigned long *bitmap)
@@ -1166,7 +1175,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 	} else if (irq->shorthand)
 		return false;
 
-	if (!map || kvm_apic_is_broadcast_dest(kvm, src, irq, map))
+	if (!map || kvm_apic_is_broadcast_dest(plane->kvm, src, irq, map))
 		return false;
 
 	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
@@ -1207,7 +1216,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 				bitmap, 16);
 
 		if (!(*dst)[lowest]) {
-			kvm_apic_disabled_lapic_found(kvm);
+			kvm_apic_disabled_lapic_found(plane->kvm);
 			*bitmap = 0;
 			return true;
 		}
@@ -1221,6 +1230,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map)
 {
+	struct kvm_plane *plane = kvm->planes[0];
 	struct kvm_apic_map *map;
 	unsigned long bitmap;
 	struct kvm_lapic **dst = NULL;
@@ -1228,6 +1238,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	bool ret;
 
 	*r = -1;
+	if (KVM_BUG_ON(!plane, kvm)) {
+		*r = 0;
+		return true;
+	}
 
 	if (irq->shorthand == APIC_DEST_SELF) {
 		if (KVM_BUG_ON(!src, kvm)) {
@@ -1239,9 +1253,9 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	}
 
 	rcu_read_lock();
-	map = rcu_dereference(kvm->arch.apic_map);
+	map = rcu_dereference(plane->arch.apic_map);
 
-	ret = kvm_apic_map_get_dest_lapic(kvm, &src, irq, map, &dst, &bitmap);
+	ret = kvm_apic_map_get_dest_lapic(plane, &src, irq, map, &dst, &bitmap);
 	if (ret) {
 		*r = 0;
 		for_each_set_bit(i, &bitmap, 16) {
@@ -1272,6 +1286,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu)
 {
+	struct kvm_plane *plane = kvm->planes[0];
 	struct kvm_apic_map *map;
 	unsigned long bitmap;
 	struct kvm_lapic **dst = NULL;
@@ -1281,9 +1296,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 		return false;
 
 	rcu_read_lock();
-	map = rcu_dereference(kvm->arch.apic_map);
+	map = rcu_dereference(plane->arch.apic_map);
 
-	if (kvm_apic_map_get_dest_lapic(kvm, NULL, irq, map, &dst, &bitmap) &&
+	if (kvm_apic_map_get_dest_lapic(plane, NULL, irq, map, &dst, &bitmap) &&
 			hweight16(bitmap) == 1) {
 		unsigned long i = find_first_bit(&bitmap, 16);
 
@@ -1407,6 +1422,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			      unsigned long *vcpu_bitmap)
 {
+	struct kvm_plane *plane = kvm->planes[0];
 	struct kvm_lapic **dest_vcpu = NULL;
 	struct kvm_lapic *src = NULL;
 	struct kvm_apic_map *map;
@@ -1416,9 +1432,9 @@ void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
 	bool ret;
 
 	rcu_read_lock();
-	map = rcu_dereference(kvm->arch.apic_map);
+	map = rcu_dereference(plane->arch.apic_map);
 
-	ret = kvm_apic_map_get_dest_lapic(kvm, &src, irq, map, &dest_vcpu,
+	ret = kvm_apic_map_get_dest_lapic(plane, &src, irq, map, &dest_vcpu,
 					  &bitmap);
 	if (ret) {
 		for_each_set_bit(i, &bitmap, 16) {
@@ -2420,7 +2436,7 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	 * was toggled, the APIC ID changed, etc...   The maps are marked dirty
 	 * on relevant changes, i.e. this is a nop for most writes.
 	 */
-	kvm_recalculate_apic_map(apic->vcpu->kvm);
+	kvm_recalculate_apic_map(vcpu_to_plane(apic->vcpu));
 
 	return ret;
 }
@@ -2610,7 +2626,7 @@ static void __kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value)
 			kvm_make_request(KVM_REQ_APF_READY, vcpu);
 		} else {
 			static_branch_inc(&apic_hw_disabled.key);
-			atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
+			kvm_mark_apic_map_dirty(apic->vcpu);
 		}
 	}
 
@@ -2657,7 +2673,7 @@ int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated)
 	}
 
 	__kvm_apic_set_base(vcpu, value);
-	kvm_recalculate_apic_map(vcpu->kvm);
+	kvm_recalculate_apic_map(vcpu_to_plane(vcpu));
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_apic_set_base);
@@ -2823,7 +2839,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.apic_arb_prio = 0;
 	vcpu->arch.apic_attention = 0;
 
-	kvm_recalculate_apic_map(vcpu->kvm);
+	kvm_recalculate_apic_map(vcpu_to_plane(apic->vcpu));
 }
 
 /*
@@ -3115,13 +3131,13 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 
 	r = kvm_apic_state_fixup(vcpu, s, true);
 	if (r) {
-		kvm_recalculate_apic_map(vcpu->kvm);
+		kvm_recalculate_apic_map(vcpu_to_plane(apic->vcpu));
 		return r;
 	}
 	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
 
-	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
-	kvm_recalculate_apic_map(vcpu->kvm);
+	kvm_mark_apic_map_dirty(apic->vcpu);
+	kvm_recalculate_apic_map(vcpu_to_plane(apic->vcpu));
 	kvm_apic_set_version(vcpu);
 
 	apic_update_ppr(apic);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 130d895f1d95..9d4492862c11 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -458,7 +458,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
 	INIT_LIST_HEAD(&sev->mirror_vms);
 	sev->need_init = false;
 
-	kvm_set_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_SEV);
+	kvm_set_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_SEV);
 
 	return 0;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 382d8ace131f..19e3bb33bf7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10021,7 +10021,7 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 		goto no_yield;
 
 	rcu_read_lock();
-	map = rcu_dereference(vcpu->kvm->arch.apic_map);
+	map = rcu_dereference(vcpu_to_plane(vcpu)->arch.apic_map);
 
 	if (likely(map) && dest_id <= map->max_apic_id && map->phys_map[dest_id])
 		target = map->phys_map[dest_id]->vcpu;
@@ -12771,6 +12771,7 @@ void kvm_arch_free_vm(struct kvm *kvm)
 
 void kvm_arch_init_plane(struct kvm_plane *plane)
 {
+	mutex_init(&plane->arch.apic_map_lock);
 	kvm_apicv_init(plane->kvm, &plane->arch.apicv_inhibit_reasons);
 }
 
@@ -12811,7 +12812,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	init_rwsem(&kvm->arch.apicv_update_lock);
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
-	mutex_init(&kvm->arch.apic_map_lock);
 	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
 	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
 
@@ -12960,6 +12960,11 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
 	static_call_cond(kvm_x86_vm_pre_destroy)(kvm);
 }
 
+void kvm_arch_free_plane(struct kvm_plane *plane)
+{
+	kvfree(rcu_dereference_check(plane->arch.apic_map, 1));
+}
+
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	if (current->mm == kvm->mm) {
@@ -12981,7 +12986,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
 	kvm_pic_destroy(kvm);
 	kvm_ioapic_destroy(kvm);
-	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 20/29] KVM: x86: add planes support for interrupt delivery
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (18 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 19/29] KVM: x86: move APIC map to kvm_arch_plane Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-06-06 16:30   ` Sean Christopherson
  2025-04-01 16:10 ` [PATCH 21/29] KVM: x86: add infrastructure to share FPU across planes Paolo Bonzini
                   ` (11 subsequent siblings)
  31 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Plumb the destination plane into struct kvm_lapic_irq and propagate it
everywhere.  The in-kernel IOAPIC only targets plane 0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/hyperv.c           |  1 +
 arch/x86/kvm/ioapic.c           |  4 ++--
 arch/x86/kvm/irq_comm.c         | 14 +++++++++++---
 arch/x86/kvm/lapic.c            |  8 ++++----
 arch/x86/kvm/x86.c              |  8 +++++---
 arch/x86/kvm/xen.c              |  1 +
 7 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f832352cf4d3..283d8a4b5b14 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1661,6 +1661,7 @@ struct kvm_lapic_irq {
 	u16 delivery_mode;
 	u16 dest_mode;
 	bool level;
+	u8 plane;
 	u16 trig_mode;
 	u32 shorthand;
 	u32 dest_id;
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index a522b467be48..cd1ff31038d2 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -491,6 +491,7 @@ static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
 	irq.delivery_mode = APIC_DM_FIXED;
 	irq.vector = vector;
 	irq.level = 1;
+	irq.plane = vcpu->plane;
 
 	ret = kvm_irq_delivery_to_apic(vcpu->kvm, vcpu->arch.apic, &irq, NULL);
 	trace_kvm_hv_synic_set_irq(vcpu->vcpu_id, sint, irq.vector, ret);
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 995eb5054360..c538867afceb 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -402,7 +402,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 				ioapic_service(ioapic, index, false);
 		}
 		if (e->fields.delivery_mode == APIC_DM_FIXED) {
-			struct kvm_lapic_irq irq;
+			struct kvm_lapic_irq irq = { 0 };
 
 			irq.vector = e->fields.vector;
 			irq.delivery_mode = e->fields.delivery_mode << 8;
@@ -442,7 +442,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
 {
 	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
-	struct kvm_lapic_irq irqe;
+	struct kvm_lapic_irq irqe = { 0 };
 	int ret;
 
 	if (entry->fields.mask ||
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 8136695f7b96..94f9db50384e 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -48,6 +48,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, struct dest_map *dest_map)
 {
 	int r = -1;
+	struct kvm_plane *plane = kvm->planes[irq->plane];
 	struct kvm_vcpu *vcpu, *lowest = NULL;
 	unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
 	unsigned int dest_vcpus = 0;
@@ -63,7 +64,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 
 	memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
+	kvm_for_each_plane_vcpu(i, vcpu, plane) {
 		if (!kvm_apic_present(vcpu))
 			continue;
 
@@ -92,7 +93,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		int idx = kvm_vector_to_index(irq->vector, dest_vcpus,
 					dest_vcpu_bitmap, KVM_MAX_VCPUS);
 
-		lowest = kvm_get_vcpu(kvm, idx);
+		lowest = kvm_get_plane_vcpu(plane, idx);
 	}
 
 	if (lowest)
@@ -119,13 +120,20 @@ void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 	irq->msi_redir_hint = msg.arch_addr_lo.redirect_hint;
 	irq->level = 1;
 	irq->shorthand = APIC_DEST_NOSHORT;
+	irq->plane = e->msi.plane;
 }
 EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
 
 static inline bool kvm_msi_route_invalid(struct kvm *kvm,
 		struct kvm_kernel_irq_routing_entry *e)
 {
-	return kvm->arch.x2apic_format && (e->msi.address_hi & 0xff);
+	if (kvm->arch.x2apic_format && (e->msi.address_hi & 0xff))
+		return true;
+
+	if (!kvm->planes[e->msi.plane])
+		return true;
+
+	return false;
 }
 
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6ed5f5b4f878..16a0e2387f2c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1223,14 +1223,13 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm_plane *plane,
 	}
 
 	*bitmap = (lowest >= 0) ? 1 << lowest : 0;
-
 	return true;
 }
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map)
 {
-	struct kvm_plane *plane = kvm->planes[0];
+	struct kvm_plane *plane = kvm->planes[irq->plane];
 	struct kvm_apic_map *map;
 	unsigned long bitmap;
 	struct kvm_lapic **dst = NULL;
@@ -1286,7 +1285,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu)
 {
-	struct kvm_plane *plane = kvm->planes[0];
+	struct kvm_plane *plane = kvm->planes[irq->plane];
 	struct kvm_apic_map *map;
 	unsigned long bitmap;
 	struct kvm_lapic **dst = NULL;
@@ -1422,7 +1421,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			      unsigned long *vcpu_bitmap)
 {
-	struct kvm_plane *plane = kvm->planes[0];
+	struct kvm_plane *plane = kvm->planes[irq->plane];
 	struct kvm_lapic **dest_vcpu = NULL;
 	struct kvm_lapic *src = NULL;
 	struct kvm_apic_map *map;
@@ -1544,6 +1543,7 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 	irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
 	irq.shorthand = icr_low & APIC_SHORT_MASK;
 	irq.msi_redir_hint = false;
+	irq.plane = apic->vcpu->plane;
 	if (apic_x2apic_mode(apic))
 		irq.dest_id = icr_high;
 	else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19e3bb33bf7d..ce8e623052a7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9949,7 +9949,7 @@ static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
  *
  * @apicid - apicid of vcpu to be kicked.
  */
-static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned plane_id, int apicid)
 {
 	/*
 	 * All other fields are unused for APIC_DM_REMRD, but may be consumed by
@@ -9960,6 +9960,7 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
 		.dest_mode = APIC_DEST_PHYSICAL,
 		.shorthand = APIC_DEST_NOSHORT,
 		.dest_id = apicid,
+		.plane = plane_id,
 	};
 
 	kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
@@ -10092,7 +10093,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 		if (!guest_pv_has(vcpu, KVM_FEATURE_PV_UNHALT))
 			break;
 
-		kvm_pv_kick_cpu_op(vcpu->kvm, a1);
+		kvm_pv_kick_cpu_op(vcpu->kvm, vcpu->plane, a1);
 		kvm_sched_yield(vcpu, a1);
 		ret = 0;
 		break;
@@ -13559,7 +13560,8 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 {
 	struct kvm_lapic_irq irq = {
 		.delivery_mode = APIC_DM_FIXED,
-		.vector = vcpu->arch.apf.vec
+		.vector = vcpu->arch.apf.vec,
+		.plane = vcpu->plane,
 	};
 
 	if (work->wakeup_all)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 7449be30d701..ac9c69f2190b 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -625,6 +625,7 @@ void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
 	irq.shorthand = APIC_DEST_NOSHORT;
 	irq.delivery_mode = APIC_DM_FIXED;
 	irq.level = 1;
+	irq.plane = v->plane;
 
 	kvm_irq_delivery_to_apic(v->kvm, NULL, &irq, NULL);
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 21/29] KVM: x86: add infrastructure to share FPU across planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (19 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 20/29] KVM: x86: add planes support for interrupt delivery Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:10 ` [PATCH 22/29] KVM: x86: implement initial plane support Paolo Bonzini
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Wrap fpu_alloc_guest_fpstate() and fpu_free_guest_fpstate() so that only
one FPU exists for vCPUs that are in different planes but share the same
vCPU id.

This API could be handy for VTL implementations, but it may be tricky
because sharing would be a bad idea for some registers (even MPX right
now, if it weren't deprecated; APX in the future could be worse).
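
For the record, a minimal sketch of how userspace could opt in (vm_fd is
just a placeholder name for the VM file descriptor):

    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_PLANES_FPU,
        .args[0] = 1,    /* 1 = share one guest FPU across all planes */
    };

    /* Rejected with EINVAL once vCPUs exist, or for protected-state VMs. */
    if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
        err(1, "KVM_ENABLE_CAP(KVM_CAP_PLANES_FPU)");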

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 47 ++++++++++++++++++++++++++++-----
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 283d8a4b5b14..9ac39f128a53 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1347,6 +1347,7 @@ struct kvm_arch {
 	unsigned int indirect_shadow_pages;
 	u8 mmu_valid_gen;
 	u8 vm_type;
+	bool planes_share_fpu;
 	bool has_private_mem;
 	bool has_protected_state;
 	bool pre_fault_allowed;
@@ -2447,4 +2448,6 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
  */
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
 
+bool kvm_arch_planes_share_fpu(struct kvm *kvm);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ce8e623052a7..ebdbd08a840b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6626,6 +6626,17 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.triple_fault_event = cap->args[0];
 		r = 0;
 		break;
+	case KVM_CAP_PLANES_FPU:
+		r = -EINVAL;
+		if (atomic_read(&kvm->online_vcpus))
+			break;
+		if (cap->args[0] > 1)
+			break;
+		if (cap->args[0] && kvm->arch.has_protected_state)
+			break;
+		kvm->arch.planes_share_fpu = cap->args[0];
+		r = 0;
+		break;
 	case KVM_CAP_X86_USER_SPACE_MSR:
 		r = -EINVAL;
 		if (cap->args[0] & ~KVM_MSR_EXIT_REASON_VALID_MASK)
@@ -12332,6 +12343,27 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	return kvm_x86_call(vcpu_precreate)(kvm);
 }
 
+static void kvm_free_guest_fpstate(struct kvm_vcpu *vcpu, unsigned plane)
+{
+	if (plane == 0 || !vcpu->kvm->arch.planes_share_fpu)
+		fpu_free_guest_fpstate(&vcpu->arch.guest_fpu);
+}
+
+static int kvm_init_guest_fpstate(struct kvm_vcpu *vcpu, struct kvm_vcpu *plane0_vcpu)
+{
+	if (plane0_vcpu && vcpu->kvm->arch.planes_share_fpu) {
+		vcpu->arch.guest_fpu = plane0_vcpu->arch.guest_fpu;
+		return 0;
+	}
+
+	if (!fpu_alloc_guest_fpstate(&vcpu->arch.guest_fpu)) {
+		pr_err("failed to allocate vcpu's fpu\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
 	struct page *page;
@@ -12378,10 +12410,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 	if (!alloc_emulate_ctxt(vcpu))
 		goto free_wbinvd_dirty_mask;
 
-	if (!fpu_alloc_guest_fpstate(&vcpu->arch.guest_fpu)) {
-		pr_err("failed to allocate vcpu's fpu\n");
+	if (kvm_init_guest_fpstate(vcpu, plane->plane ? vcpu->plane0 : NULL) < 0)
 		goto free_emulate_ctxt;
-	}
 
 	kvm_async_pf_hash_reset(vcpu);
 
@@ -12413,7 +12443,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 	return 0;
 
 free_guest_fpu:
-	fpu_free_guest_fpstate(&vcpu->arch.guest_fpu);
+	kvm_free_guest_fpstate(vcpu, plane->plane);
 free_emulate_ctxt:
 	kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt);
 free_wbinvd_dirty_mask:
@@ -12459,7 +12489,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 	kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt);
 	free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
-	fpu_free_guest_fpstate(&vcpu->arch.guest_fpu);
+	kvm_free_guest_fpstate(vcpu, vcpu->plane);
 
 	kvm_xen_destroy_vcpu(vcpu);
 	kvm_hv_vcpu_uninit(vcpu);
@@ -12824,7 +12854,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
 	kvm->arch.guest_can_read_msr_platform_info = true;
 	kvm->arch.enable_pmu = enable_pmu;
-
+	kvm->arch.planes_share_fpu = false;
 #if IS_ENABLED(CONFIG_HYPERV)
 	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
 	kvm->arch.hv_root_tdp = INVALID_PAGE;
@@ -13881,6 +13911,11 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 }
 EXPORT_SYMBOL_GPL(kvm_handle_invpcid);
 
+bool kvm_arch_planes_share_fpu(struct kvm *kvm)
+{
+	return !kvm || kvm->arch.planes_share_fpu;
+}
+
 static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
 {
 	struct kvm_run *run = vcpu->run;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 22/29] KVM: x86: implement initial plane support
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (20 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 21/29] KVM: x86: add infrastructure to share FPU across planes Paolo Bonzini
@ 2025-04-01 16:10 ` Paolo Bonzini
  2025-04-01 16:11 ` [PATCH 23/29] KVM: x86: extract kvm_post_set_cpuid Paolo Bonzini
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:10 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Implement more of the shared state, namely the PIO emulation area
and ioctl(KVM_RUN).
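
To make the flow concrete, a rough userspace sketch (vcpu_fd is the
plane-0 vCPU file descriptor and run its mmap'ed kvm_run area; the only
new ingredient is the run->plane field read below):

    /* Select which plane the next KVM_RUN enters; 0 is the default. */
    run->plane = 1;
    if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
        err(1, "KVM_RUN");
    /* run->exit_reason etc. now describe the exit taken by plane 1. */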

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ebdbd08a840b..d2b43d9b6543 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11567,7 +11567,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	trace_kvm_fpu(0);
 }
 
-int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
+static int kvm_vcpu_ioctl_run_plane(struct kvm_vcpu *vcpu)
 {
 	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct kvm_run *kvm_run = vcpu->run;
@@ -11585,7 +11585,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 	kvm_vcpu_srcu_read_lock(vcpu);
 	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
-		if (!vcpu->wants_to_run) {
+		if (!vcpu->plane0->wants_to_run) {
 			r = -EINTR;
 			goto out;
 		}
@@ -11664,7 +11664,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		WARN_ON_ONCE(vcpu->mmio_needed);
 	}
 
-	if (!vcpu->wants_to_run) {
+	if (!vcpu->plane0->wants_to_run) {
 		r = -EINTR;
 		goto out;
 	}
@@ -11687,6 +11687,25 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
+{
+	int plane_id = READ_ONCE(vcpu->run->plane);
+	struct kvm_plane *plane = vcpu->kvm->planes[plane_id];
+	int r;
+
+	if (plane_id) {
+		vcpu = kvm_get_plane_vcpu(plane, vcpu->vcpu_id);
+		mutex_lock_nested(&vcpu->mutex, 1);
+	}
+
+	r = kvm_vcpu_ioctl_run_plane(vcpu);
+
+	if (plane_id)
+		mutex_unlock(&vcpu->mutex);
+
+	return r;
+}
+
 static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
 	if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
@@ -12366,7 +12385,7 @@ static int kvm_init_guest_fpstate(struct kvm_vcpu *vcpu, struct kvm_vcpu *plane0
 
 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 {
-	struct page *page;
+	struct page *page = NULL;
 	int r;
 
 	vcpu->arch.last_vmentry_cpu = -1;
@@ -12390,10 +12409,15 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 
 	r = -ENOMEM;
 
-	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-	if (!page)
-		goto fail_free_lapic;
-	vcpu->arch.pio_data = page_address(page);
+	if (plane->plane) {
+		page = NULL;
+		vcpu->arch.pio_data = vcpu->plane0->arch.pio_data;
+	} else {
+		page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		if (!page)
+			goto fail_free_lapic;
+		vcpu->arch.pio_data = page_address(page);
+	}
 
 	vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
 				       GFP_KERNEL_ACCOUNT);
@@ -12451,7 +12475,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 fail_free_mce_banks:
 	kfree(vcpu->arch.mce_banks);
 	kfree(vcpu->arch.mci_ctl2_banks);
-	free_page((unsigned long)vcpu->arch.pio_data);
+	__free_page(page);
 fail_free_lapic:
 	kvm_free_lapic(vcpu);
 fail_mmu_destroy:
@@ -12500,7 +12524,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	kvm_mmu_destroy(vcpu);
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
-	free_page((unsigned long)vcpu->arch.pio_data);
+	if (!vcpu->plane)
+		free_page((unsigned long)vcpu->arch.pio_data);
 	kvfree(vcpu->arch.cpuid_entries);
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 23/29] KVM: x86: extract kvm_post_set_cpuid
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (21 preceding siblings ...)
  2025-04-01 16:10 ` [PATCH 22/29] KVM: x86: implement initial plane support Paolo Bonzini
@ 2025-04-01 16:11 ` Paolo Bonzini
  2025-04-01 16:11 ` [PATCH 24/29] KVM: x86: initialize CPUID for non-default planes Paolo Bonzini
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:11 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

CPU state depends on CPUID info and is initialized by KVM_SET_CPUID2,
but KVM_SET_CPUID2 does not exist for non-default planes.  Instead, they
just copy over the CPUID info of plane 0.

Extract the tail of KVM_SET_CPUID2 so that it can be executed as part
of KVM_CREATE_VCPU_PLANE.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/cpuid.c | 38 ++++++++++++++++++++++++--------------
 arch/x86/kvm/cpuid.h |  1 +
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f760a8a5d719..142decb3a736 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -488,6 +488,29 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
 	return rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
 }
 
+int kvm_post_set_cpuid(struct kvm_vcpu *vcpu)
+{
+	int r;
+
+#ifdef CONFIG_KVM_HYPERV
+	if (kvm_cpuid_has_hyperv(vcpu)) {
+		r = kvm_hv_vcpu_init(vcpu);
+		if (r)
+			return r;
+	}
+#endif
+
+	r = kvm_check_cpuid(vcpu);
+	if (r)
+		return r;
+
+#ifdef CONFIG_KVM_XEN
+	vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE);
+#endif
+	kvm_vcpu_after_set_cpuid(vcpu);
+	return 0;
+}
+
 static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
                         int nent)
 {
@@ -529,23 +552,10 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 		goto success;
 	}
 
-#ifdef CONFIG_KVM_HYPERV
-	if (kvm_cpuid_has_hyperv(vcpu)) {
-		r = kvm_hv_vcpu_init(vcpu);
-		if (r)
-			goto err;
-	}
-#endif
-
-	r = kvm_check_cpuid(vcpu);
+	r = kvm_post_set_cpuid(vcpu);
 	if (r)
 		goto err;
 
-#ifdef CONFIG_KVM_XEN
-	vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE);
-#endif
-	kvm_vcpu_after_set_cpuid(vcpu);
-
 success:
 	kvfree(e2);
 	return 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index d3f5ae15a7ca..05cc1245f570 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -42,6 +42,7 @@ static inline struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcp
 int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
 			    struct kvm_cpuid_entry2 __user *entries,
 			    unsigned int type);
+int kvm_post_set_cpuid(struct kvm_vcpu *vcpu);
 int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 			     struct kvm_cpuid *cpuid,
 			     struct kvm_cpuid_entry __user *entries);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 24/29] KVM: x86: initialize CPUID for non-default planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (22 preceding siblings ...)
  2025-04-01 16:11 ` [PATCH 23/29] KVM: x86: extract kvm_post_set_cpuid Paolo Bonzini
@ 2025-04-01 16:11 ` Paolo Bonzini
  2025-04-01 16:11 ` [PATCH 25/29] KVM: x86: handle interrupt priorities for planes Paolo Bonzini
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:11 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Copy the initial CPUID from plane 0.  To avoid mismatches, block
KVM_SET_CPUID{,2} after KVM_CREATE_VCPU_PLANE, similarly to how it is
blocked after KVM_RUN; this is handled by a tiny bit of
architecture-independent code.
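
The resulting ordering for userspace looks roughly like this (sketch
only; vcpu_fd/plane_fd are placeholder names, and KVM_CREATE_VCPU_PLANE
takes the plane-0 vCPU fd as its argument, as in the selftests later in
the series):

    /* Set the CPU model on the plane-0 vCPU first... */
    ioctl(vcpu_fd, KVM_SET_CPUID2, &cpuid2);

    /* ...then create the same vCPU in another plane; it copies the
     * plane-0 CPUID and cpu_caps. */
    plane_vcpu_fd = ioctl(plane_fd, KVM_CREATE_VCPU_PLANE, vcpu_fd);

    /* From here on, KVM_SET_CPUID{,2} is only accepted if it matches
     * the data that was already set, exactly as after KVM_RUN. */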

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst |  4 +++-
 arch/x86/kvm/cpuid.c           | 19 ++++++++++++++++++-
 arch/x86/kvm/cpuid.h           |  1 +
 arch/x86/kvm/x86.c             |  7 ++++++-
 include/linux/kvm_host.h       |  1 +
 virt/kvm/kvm_main.c            |  1 +
 6 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 16d836b954dc..3739d16b7164 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -736,7 +736,9 @@ Caveat emptor:
     configuration (if there is) is not corrupted. Userspace can get a copy
     of the resulting CPUID configuration through KVM_GET_CPUID2 in case.
   - Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model
-    after running the guest, may cause guest instability.
+    after running the guest, is forbidden; so is using the ioctls after
+    KVM_CREATE_VCPU_PLANE, because all planes must have the same CPU
+    capabilities.
   - Using heterogeneous CPUID configurations, modulo APIC IDs, topology, etc...
     may cause guest instability.
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 142decb3a736..44e6d4989bdd 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -545,7 +545,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	 * KVM_SET_CPUID{,2} again. To support this legacy behavior, check
 	 * whether the supplied CPUID data is equal to what's already set.
 	 */
-	if (kvm_vcpu_has_run(vcpu)) {
+	if (kvm_vcpu_has_run(vcpu) || vcpu->has_planes) {
 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
 		if (r)
 			goto err;
@@ -567,6 +567,23 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	return r;
 }
 
+int kvm_dup_cpuid(struct kvm_vcpu *vcpu, struct kvm_vcpu *source)
+{
+	if (WARN_ON_ONCE(vcpu->arch.cpuid_entries || vcpu->arch.cpuid_nent))
+		return -EEXIST;
+
+	vcpu->arch.cpuid_entries = kmemdup(source->arch.cpuid_entries,
+		     source->arch.cpuid_nent * sizeof(struct kvm_cpuid_entry2),
+		     GFP_KERNEL_ACCOUNT);
+	if (!vcpu->arch.cpuid_entries)
+		return -ENOMEM;
+
+	memcpy(vcpu->arch.cpu_caps, source->arch.cpu_caps, sizeof(source->arch.cpu_caps));
+	vcpu->arch.cpuid_nent = source->arch.cpuid_nent;
+
+	return 0;
+}
+
 /* when an old userspace process fills a new kernel module */
 int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 			     struct kvm_cpuid *cpuid,
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 05cc1245f570..a5983c635a70 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -42,6 +42,7 @@ static inline struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcp
 int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
 			    struct kvm_cpuid_entry2 __user *entries,
 			    unsigned int type);
+int kvm_dup_cpuid(struct kvm_vcpu *vcpu, struct kvm_vcpu *source);
 int kvm_post_set_cpuid(struct kvm_vcpu *vcpu);
 int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 			     struct kvm_cpuid *cpuid,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2b43d9b6543..be4d7b97367b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12412,6 +12412,11 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 	if (plane->plane) {
 		page = NULL;
 		vcpu->arch.pio_data = vcpu->plane0->arch.pio_data;
+		r = kvm_dup_cpuid(vcpu, vcpu->plane0);
+		if (r < 0)
+			goto fail_free_lapic;
+
+		r = -ENOMEM;
 	} else {
 		page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 		if (!page)
@@ -12459,7 +12464,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
 
 	kvm_xen_init_vcpu(vcpu);
 	vcpu_load(vcpu);
-	kvm_vcpu_after_set_cpuid(vcpu);
+	WARN_ON_ONCE(kvm_post_set_cpuid(vcpu));
 	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
 	kvm_vcpu_reset(vcpu, false);
 	kvm_init_mmu(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5cade1c04646..0b764951f461 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -344,6 +344,7 @@ struct kvm_vcpu {
 	struct mutex mutex;
 
 	/* Only valid on plane 0 */
+	bool has_planes;
 	bool wants_to_run;
 
 	/* Shared for all planes */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index db38894f6fa3..3a04fdf0865d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4182,6 +4182,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm_plane *plane, struct kvm_vcpu *pl
 	if (plane->plane) {
 		page = NULL;
 		vcpu->run = plane0_vcpu->run;
+		plane0_vcpu->has_planes = true;
 	} else {
 		WARN_ON(plane0_vcpu != NULL);
 		plane0_vcpu = vcpu;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 25/29] KVM: x86: handle interrupt priorities for planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (23 preceding siblings ...)
  2025-04-01 16:11 ` [PATCH 24/29] KVM: x86: initialize CPUID for non-default planes Paolo Bonzini
@ 2025-04-01 16:11 ` Paolo Bonzini
  2025-04-01 16:11 ` [PATCH 26/29] KVM: x86: enable up to 16 planes Paolo Bonzini
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:11 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Force a userspace exit if an interrupt is delivered to a higher-priority
plane, where the priorities are described by vcpu->run->req_exit_planes.
The set of planes with a pending IRR is manipulated atomically and stored
in the plane-0 vCPU, since that is the one that is easy to reach from the
target vCPU.

TODO: haven't put much thought into IPI virtualization.
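
As a sketch of the intended usage from userspace (the kvm_run field
names match the hunks below; the plane numbering is only an example and
pick_plane() stands for whatever priority policy the VMM implements):

    run->plane = 0;                  /* enter the low-priority plane...   */
    run->req_exit_planes = 1u << 1;  /* ...but exit if plane 1 gets an IRQ */
    ioctl(vcpu_fd, KVM_RUN, 0);

    if (run->exit_reason == KVM_EXIT_PLANE_EVENT &&
        run->plane_event.cause == KVM_PLANE_EVENT_INTERRUPT) {
        /* plane_event.target is the subset of req_exit_planes that has
         * a pending interrupt; switch to the plane userspace prefers. */
        run->plane = pick_plane(run->plane_event.target);
        ioctl(vcpu_fd, KVM_RUN, 0);
    }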

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  7 +++++
 arch/x86/kvm/lapic.c            | 36 +++++++++++++++++++++++--
 arch/x86/kvm/x86.c              | 48 +++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h        |  2 ++
 4 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9ac39f128a53..0344e8bed319 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -125,6 +125,7 @@
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
+#define KVM_REQ_PLANE_INTERRUPT		KVM_ARCH_REQ(35)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -864,6 +865,12 @@ struct kvm_vcpu_arch {
 	u64 xcr0;
 	u64 guest_supported_xcr0;
 
+	/*
+	 * Only valid in plane0.  The bitmask of planes that received
+	 * an interrupt, to be checked against req_exit_planes.
+	 */
+	atomic_t irr_pending_planes;
+
 	struct kvm_pio_request pio;
 	void *pio_data;
 	void *sev_pio_data;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 16a0e2387f2c..21dbc539cbe7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1311,6 +1311,39 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 	return ret;
 }
 
+static void kvm_lapic_deliver_interrupt(struct kvm_vcpu *vcpu, struct kvm_lapic *apic,
+					int delivery_mode, int trig_mode, int vector)
+{
+	struct kvm_vcpu *plane0_vcpu = vcpu->plane0;
+	struct kvm_plane *running_plane;
+	u16 req_exit_planes;
+
+	kvm_x86_call(deliver_interrupt)(apic, delivery_mode, trig_mode, vector);
+
+	/*
+	 * test_and_set_bit implies a memory barrier, so IRR is written before
+	 * reading irr_pending_planes below...
+	 */
+	if (!test_and_set_bit(vcpu->plane, &plane0_vcpu->arch.irr_pending_planes)) {
+		/*
+		 * ... and also running_plane and req_exit_planes are read after writing
+		 * irr_pending_planes.  Both barriers pair with kvm_arch_vcpu_ioctl_run().
+		 */
+		smp_mb__after_atomic();
+
+		running_plane = READ_ONCE(plane0_vcpu->running_plane);
+		if (!running_plane)
+			return;
+
+		req_exit_planes = READ_ONCE(plane0_vcpu->req_exit_planes);
+		if (!(req_exit_planes & BIT(vcpu->plane)))
+			return;
+
+		kvm_make_request(KVM_REQ_PLANE_INTERRUPT,
+				 kvm_get_plane_vcpu(running_plane, vcpu->vcpu_id));
+	}
+}
+
 /*
  * Add a pending IRQ into lapic.
  * Return 1 if successfully added and 0 if discarded.
@@ -1352,8 +1385,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 						       apic->regs + APIC_TMR);
 		}
 
-		kvm_x86_call(deliver_interrupt)(apic, delivery_mode,
-						trig_mode, vector);
+		kvm_lapic_deliver_interrupt(vcpu, apic, delivery_mode, trig_mode, vector);
 		break;
 
 	case APIC_DM_REMRD:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be4d7b97367b..4546d1049f43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10960,6 +10960,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 				goto out;
 			}
 		}
+		if (kvm_check_request(KVM_REQ_PLANE_INTERRUPT, vcpu)) {
+			u16 irr_pending_planes = atomic_read(&vcpu->plane0->arch.irr_pending_planes);
+			u16 target = irr_pending_planes & vcpu->plane0->req_exit_planes;
+			if (target) {
+				vcpu->run->exit_reason = KVM_EXIT_PLANE_EVENT;
+				vcpu->run->plane_event.cause = KVM_PLANE_EVENT_INTERRUPT;
+				vcpu->run->plane_event.flags = 0;
+				vcpu->run->plane_event.pending_event_planes = irr_pending_planes;
+				vcpu->run->plane_event.target = target;
+				r = 0;
+				goto out;
+			}
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -11689,8 +11702,11 @@ static int kvm_vcpu_ioctl_run_plane(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+	struct kvm_vcpu *plane0_vcpu = vcpu;
 	int plane_id = READ_ONCE(vcpu->run->plane);
 	struct kvm_plane *plane = vcpu->kvm->planes[plane_id];
+	u16 req_exit_planes = READ_ONCE(vcpu->run->req_exit_planes) & ~BIT(plane_id);
+	u16 irr_pending_planes;
 	int r;
 
 	if (plane_id) {
@@ -11698,8 +11714,40 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		mutex_lock_nested(&vcpu->mutex, 1);
 	}
 
+	if (plane0_vcpu->has_planes) {
+		plane0_vcpu->req_exit_planes = req_exit_planes;
+		plane0_vcpu->running_plane = plane;
+
+		/*
+		 * Check for cross-plane interrupts that happened while outside KVM_RUN;
+		 * write running_plane and req_exit_planes before reading irr_pending_planes.
+		 * If an interrupt hasn't set irr_pending_planes yet, it will trigger
+		 * KVM_REQ_PLANE_INTERRUPT itself in kvm_lapic_deliver_interrupt().
+		 */
+		smp_mb__before_atomic();
+
+		irr_pending_planes = atomic_fetch_and(~BIT(plane_id), &plane0_vcpu->arch.irr_pending_planes);
+		if (req_exit_planes & irr_pending_planes)
+			kvm_make_request(KVM_REQ_PLANE_INTERRUPT, vcpu);
+	}
+
 	r = kvm_vcpu_ioctl_run_plane(vcpu);
 
+	if (plane0_vcpu->has_planes) {
+		smp_store_release(&plane0_vcpu->running_plane, NULL);
+
+		/*
+		 * Clear irr_pending_planes before reading IRR; pairs with
+		 * kvm_lapic_deliver_interrupt().  If this side doesn't see IRR set,
+		 * the other side will certainly see the cleared bit irr_pending_planes
+		 * and set it, and vice versa.
+		 */
+		clear_bit(plane_id, &plane0_vcpu->arch.irr_pending_planes);
+		smp_mb__after_atomic();
+		if (kvm_lapic_find_highest_irr(vcpu))
+			atomic_or(BIT(plane_id), &plane0_vcpu->arch.irr_pending_planes);
+	}
+
 	if (plane_id)
 		mutex_unlock(&vcpu->mutex);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b764951f461..442aed2b9cc6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -346,6 +346,8 @@ struct kvm_vcpu {
 	/* Only valid on plane 0 */
 	bool has_planes;
 	bool wants_to_run;
+	u16 req_exit_planes;
+	struct kvm_plane *running_plane;
 
 	/* Shared for all planes */
 	struct kvm_run *run;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 26/29] KVM: x86: enable up to 16 planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (24 preceding siblings ...)
  2025-04-01 16:11 ` [PATCH 25/29] KVM: x86: handle interrupt priorities for planes Paolo Bonzini
@ 2025-04-01 16:11 ` Paolo Bonzini
  2025-06-06 22:41   ` Sean Christopherson
  2025-04-01 16:11 ` [PATCH 27/29] selftests: kvm: introduce basic test for VM planes Paolo Bonzini
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:11 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Allow up to 16 VM planes; it's a nice round number.

FIXME: online_vcpus is used by x86 code that deals with TSC synchronization.
Maybe kvmclock should be moved to the plane.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 3 +++
 arch/x86/kvm/x86.c              | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0344e8bed319..d0cb177b6f52 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2339,6 +2339,8 @@ enum {
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
 #endif
 
+#define KVM_MAX_VCPU_PLANES	16
+
 int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_cpu_has_extint(struct kvm_vcpu *v);
@@ -2455,6 +2457,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
  */
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
 
+int kvm_arch_nr_vcpu_planes(struct kvm *kvm);
 bool kvm_arch_planes_share_fpu(struct kvm *kvm);
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4546d1049f43..86d1a567f62e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13989,6 +13989,12 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 }
 EXPORT_SYMBOL_GPL(kvm_handle_invpcid);
 
+int kvm_arch_nr_vcpu_planes(struct kvm *kvm)
+{
+	/* TODO: use kvm_x86_ops so that SNP can use planes for VMPLs.  */
+	return kvm->arch.has_protected_state ? 1 : KVM_MAX_VCPU_PLANES;
+}
+
 bool kvm_arch_planes_share_fpu(struct kvm *kvm)
 {
 	return !kvm || kvm->arch.planes_share_fpu;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 27/29] selftests: kvm: introduce basic test for VM planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (25 preceding siblings ...)
  2025-04-01 16:11 ` [PATCH 26/29] KVM: x86: enable up to 16 planes Paolo Bonzini
@ 2025-04-01 16:11 ` Paolo Bonzini
  2025-04-01 16:11 ` [PATCH 28/29] selftests: kvm: add plane infrastructure Paolo Bonzini
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:11 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Check a few error cases and ensure that a vCPU can have a second plane
added to it.  For now, all interactions happen through the bare
__vm_ioctl() interface or even directly through the ioctl() system
call.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tools/testing/selftests/kvm/Makefile.kvm |   1 +
 tools/testing/selftests/kvm/plane_test.c | 108 +++++++++++++++++++++++
 2 files changed, 109 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/plane_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..b1d0b410cc03 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -57,6 +57,7 @@ TEST_GEN_PROGS_COMMON += guest_print_test
 TEST_GEN_PROGS_COMMON += kvm_binary_stats_test
 TEST_GEN_PROGS_COMMON += kvm_create_max_vcpus
 TEST_GEN_PROGS_COMMON += kvm_page_table_test
+TEST_GEN_PROGS_COMMON += plane_test
 TEST_GEN_PROGS_COMMON += set_memory_region_test
 
 # Compiled test targets
diff --git a/tools/testing/selftests/kvm/plane_test.c b/tools/testing/selftests/kvm/plane_test.c
new file mode 100644
index 000000000000..43c8de13490a
--- /dev/null
+++ b/tools/testing/selftests/kvm/plane_test.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Red Hat, Inc.
+ *
+ * Test for architecture-neutral VM plane functionality
+ */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "asm/kvm.h"
+#include "linux/kvm.h"
+
+void test_create_plane_errors(int max_planes)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	int planefd, plane_vcpufd;
+
+	vm = vm_create_barebones();
+	vcpu = __vm_vcpu_add(vm, 0);
+
+	planefd = __vm_ioctl(vm, KVM_CREATE_PLANE, (void *)(unsigned long)0);
+	TEST_ASSERT(planefd == -1 && errno == EEXIST,
+		    "Creating existing plane, expecting EEXIST. ret: %d, errno: %d",
+		    planefd, errno);
+
+	planefd = __vm_ioctl(vm, KVM_CREATE_PLANE, (void *)(unsigned long)max_planes);
+	TEST_ASSERT(planefd == -1 && errno == EINVAL,
+		    "Creating plane %d, expecting EINVAL. ret: %d, errno: %d",
+		    max_planes, planefd, errno);
+
+	plane_vcpufd = __vm_ioctl(vm, KVM_CREATE_VCPU_PLANE, (void *)(unsigned long)vcpu->fd);
+	TEST_ASSERT(plane_vcpufd == -1 && errno == ENOTTY,
+		    "Creating vCPU for plane 0, expecting ENOTTY. ret: %d, errno: %d",
+		    plane_vcpufd, errno);
+
+	kvm_vm_free(vm);
+	ksft_test_result_pass("error conditions\n");
+}
+
+void test_create_plane(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	int r, planefd, plane_vcpufd;
+
+	vm = vm_create_barebones();
+	vcpu = __vm_vcpu_add(vm, 0);
+
+	planefd = __vm_ioctl(vm, KVM_CREATE_PLANE, (void *)(unsigned long)1);
+	TEST_ASSERT(planefd >= 0, "Creating new plane, got error: %d",
+		    errno);
+
+	r = ioctl(planefd, KVM_CHECK_EXTENSION, KVM_CAP_PLANES);
+	TEST_ASSERT(r == 0,
+		    "Checking KVM_CHECK_EXTENSION(KVM_CAP_PLANES). ret: %d", r);
+
+	r = ioctl(planefd, KVM_CHECK_EXTENSION, KVM_CAP_CHECK_EXTENSION_VM);
+	TEST_ASSERT(r == 1,
+		    "Checking KVM_CHECK_EXTENSION(KVM_CAP_CHECK_EXTENSION_VM). ret: %d", r);
+
+	r = __vm_ioctl(vm, KVM_CREATE_PLANE, (void *)(unsigned long)1);
+	TEST_ASSERT(r == -1 && errno == EEXIST,
+		    "Creating existing plane, expecting EEXIST. ret: %d, errno: %d",
+		    r, errno);
+
+	plane_vcpufd = ioctl(planefd, KVM_CREATE_VCPU_PLANE, (void *)(unsigned long)vcpu->fd);
+	TEST_ASSERT(plane_vcpufd >= 0, "Creating vCPU for plane 1, got error: %d", errno);
+
+	r = ioctl(planefd, KVM_CREATE_VCPU_PLANE, (void *)(unsigned long)vcpu->fd);
+	TEST_ASSERT(r == -1 && errno == EEXIST,
+		    "Creating vCPU again for plane 1. ret: %d, errno: %d",
+		    r, errno);
+
+	r = ioctl(plane_vcpufd, KVM_RUN, (void *)(unsigned long)0);
+	TEST_ASSERT(r == -1 && errno == ENOTTY,
+		    "Running vCPU of plane 1, expecting ENOTTY. ret: %d, errno: %d",
+		    r, errno);
+
+	close(plane_vcpufd);
+	close(planefd);
+
+	kvm_vm_free(vm);
+	ksft_test_result_pass("basic planefd and plane_vcpufd operation\n");
+}
+
+int main(int argc, char *argv[])
+{
+	int cap_planes = kvm_check_cap(KVM_CAP_PLANES);
+	TEST_REQUIRE(cap_planes);
+
+	ksft_print_header();
+	ksft_set_plan(2);
+
+	pr_info("# KVM_CAP_PLANES: %d\n", cap_planes);
+
+	test_create_plane_errors(cap_planes);
+
+	if (cap_planes > 1)
+		test_create_plane();
+
+	ksft_finished();
+}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 28/29] selftests: kvm: add plane infrastructure
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (26 preceding siblings ...)
  2025-04-01 16:11 ` [PATCH 27/29] selftests: kvm: introduce basic test for VM planes Paolo Bonzini
@ 2025-04-01 16:11 ` Paolo Bonzini
  2025-04-01 16:11 ` [PATCH 29/29] selftests: kvm: add x86-specific plane test Paolo Bonzini
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:11 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Allow creating plane and vCPU-plane file descriptors, and close them
when the VM is freed.  Rewrite the previous test using the new
infrastructure (kept as a separate patch for easier review).
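
For illustration, a test using the new helpers ends up looking roughly
like this (sketch; see the rewritten plane_test.c below for the real
thing):

    struct kvm_vm *vm = vm_create_barebones();
    struct kvm_vcpu *vcpu = __vm_vcpu_add(vm, 0);

    /* Create plane 1 and this vCPU's view of it. */
    struct kvm_plane *plane = vm_plane_add(vm, 1);
    struct kvm_plane_vcpu *plane_vcpu = __vm_plane_vcpu_add(vcpu, plane);

    /*
     * plane_ioctl()/plane_vcpu_ioctl() wrap ioctls on the new file
     * descriptors and assert success; the __-prefixed variants return
     * the raw result.  Both kinds of fds are closed by kvm_vm_free()
     * together with the rest of the VM.
     */
    kvm_vm_free(vm);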

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../testing/selftests/kvm/include/kvm_util.h  | 48 ++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 65 ++++++++++++++++++-
 tools/testing/selftests/kvm/plane_test.c      | 21 +++---
 3 files changed, 119 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 373912464fb4..c1dfe071357e 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -67,6 +67,20 @@ struct kvm_vcpu {
 	uint32_t dirty_gfns_count;
 };
 
+struct kvm_plane {
+	struct list_head list;
+	uint32_t id;
+	int fd;
+	struct kvm_vm *vm;
+};
+
+struct kvm_plane_vcpu {
+	struct list_head list;
+	uint32_t id;
+	int fd;
+	struct kvm_vcpu *plane0;
+};
+
 struct userspace_mem_regions {
 	struct rb_root gpa_tree;
 	struct rb_root hva_tree;
@@ -93,6 +107,8 @@ struct kvm_vm {
 	unsigned int va_bits;
 	uint64_t max_gfn;
 	struct list_head vcpus;
+	struct list_head planes;
+	struct list_head plane_vcpus;
 	struct userspace_mem_regions regions;
 	struct sparsebit *vpages_valid;
 	struct sparsebit *vpages_mapped;
@@ -338,6 +354,21 @@ do {											\
 	__TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, vm);		\
 })
 
+static __always_inline void static_assert_is_plane(struct kvm_plane *plane) { }
+
+#define __plane_ioctl(plane, cmd, arg)				\
+({								\
+	static_assert_is_plane(plane);				\
+	kvm_do_ioctl((plane)->fd, cmd, arg);			\
+})
+
+#define plane_ioctl(plane, cmd, arg)				\
+({								\
+	int ret = __plane_ioctl(plane, cmd, arg);		\
+								\
+	__TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, (plane)->vm); \
+})
+
 static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { }
 
 #define __vcpu_ioctl(vcpu, cmd, arg)				\
@@ -353,6 +384,21 @@ static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { }
 	__TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, (vcpu)->vm);	\
 })
 
+static __always_inline void static_assert_is_plane_vcpu(struct kvm_plane_vcpu *plane_vcpu) { }
+
+#define __plane_vcpu_ioctl(plane_vcpu, cmd, arg)		\
+({								\
+	static_assert_is_plane_vcpu(plane_vcpu);		\
+	kvm_do_ioctl((plane_vcpu)->fd, cmd, arg);		\
+})
+
+#define plane_vcpu_ioctl(plane_vcpu, cmd, arg)			\
+({								\
+	int ret = __plane_vcpu_ioctl(plane_vcpu, cmd, arg);	\
+								\
+	__TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, (plane_vcpu)->plane0->vm); \
+})
+
 /*
  * Looks up and returns the value corresponding to the capability
  * (KVM_CAP_*) given by cap.
@@ -601,6 +647,8 @@ void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags);
 void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
 void vm_mem_region_delete(struct kvm_vm *vm, uint32_t slot);
 struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id);
+struct kvm_plane *vm_plane_add(struct kvm_vm *vm, int plane_id);
+struct kvm_plane_vcpu *__vm_plane_vcpu_add(struct kvm_vcpu *vcpu, struct kvm_plane *plane);
 void vm_populate_vaddr_bitmap(struct kvm_vm *vm);
 vm_vaddr_t vm_vaddr_unused_gap(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
 vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 815bc45dd8dc..a2f233945e1c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -279,6 +279,8 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 	TEST_ASSERT(vm != NULL, "Insufficient Memory");
 
 	INIT_LIST_HEAD(&vm->vcpus);
+	INIT_LIST_HEAD(&vm->planes);
+	INIT_LIST_HEAD(&vm->plane_vcpus);
 	vm->regions.gpa_tree = RB_ROOT;
 	vm->regions.hva_tree = RB_ROOT;
 	hash_init(vm->regions.slot_hash);
@@ -757,10 +759,22 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
 
 void kvm_vm_release(struct kvm_vm *vmp)
 {
-	struct kvm_vcpu *vcpu, *tmp;
+	struct kvm_vcpu *vcpu, *tmp_vcpu;
+	struct kvm_plane_vcpu *plane_vcpu, *tmp_plane_vcpu;
+	struct kvm_plane *plane, *tmp_plane;
 	int ret;
 
-	list_for_each_entry_safe(vcpu, tmp, &vmp->vcpus, list)
+	list_for_each_entry_safe(plane_vcpu, tmp_plane_vcpu, &vmp->plane_vcpus, list) {
+		close(plane_vcpu->fd);
+		free(plane_vcpu);
+	}
+
+	list_for_each_entry_safe(plane, tmp_plane, &vmp->planes, list) {
+		close(plane->fd);
+		free(plane);
+	}
+
+	list_for_each_entry_safe(vcpu, tmp_vcpu, &vmp->vcpus, list)
 		vm_vcpu_rm(vmp, vcpu);
 
 	ret = close(vmp->fd);
@@ -1314,6 +1328,52 @@ static bool vcpu_exists(struct kvm_vm *vm, uint32_t vcpu_id)
 	return false;
 }
 
+/*
+ * Adds a plane to the VM specified by vm with the ID given by plane_id.
+ * No additional plane setup is done.  Returns the plane.
+ */
+struct kvm_plane *vm_plane_add(struct kvm_vm *vm, int plane_id)
+{
+	struct kvm_plane *plane;
+
+	/* Allocate and initialize new plane structure. */
+	plane = calloc(1, sizeof(*plane));
+	TEST_ASSERT(plane != NULL, "Insufficient Memory");
+
+	plane->fd = __vm_ioctl(vm, KVM_CREATE_PLANE, (void *)(unsigned long)plane_id);
+	TEST_ASSERT_VM_VCPU_IOCTL(plane->fd >= 0, KVM_CREATE_PLANE, plane->fd, vm);
+	plane->vm = vm;
+	plane->id = plane_id;
+
+	/* Add to the VM's list of planes. */
+	list_add(&plane->list, &vm->planes);
+
+	return plane;
+}
+
+/*
+ * Adds a vCPU for the given plane, backed by the plane-0 vCPU given by vcpu.
+ * No additional vCPU setup is done.  Returns the plane vCPU.
+ */
+struct kvm_plane_vcpu *__vm_plane_vcpu_add(struct kvm_vcpu *vcpu, struct kvm_plane *plane)
+{
+	struct kvm_plane_vcpu *plane_vcpu;
+
+	/* Allocate and initialize new plane vCPU structure. */
+	plane_vcpu = calloc(1, sizeof(*plane_vcpu));
+	TEST_ASSERT(plane_vcpu != NULL, "Insufficient Memory");
+
+	plane_vcpu->fd = __plane_ioctl(plane, KVM_CREATE_VCPU_PLANE, (void *)(unsigned long)vcpu->fd);
+	TEST_ASSERT_VM_VCPU_IOCTL(plane_vcpu->fd >= 0, KVM_CREATE_VCPU_PLANE, plane_vcpu->fd, plane->vm);
+	plane_vcpu->id = vcpu->id;
+	plane_vcpu->plane0 = vcpu;
+
+	/* Add to linked-list of extra-plane VCPUs. */
+	list_add(&plane_vcpu->list, &plane->vm->plane_vcpus);
+
+	return plane_vcpu;
+}
+
 /*
  * Adds a virtual CPU to the VM specified by vm with the ID given by vcpu_id.
  * No additional vCPU setup is done.  Returns the vCPU.
@@ -2021,6 +2081,7 @@ static struct exit_reason {
 	KVM_EXIT_STRING(NOTIFY),
 	KVM_EXIT_STRING(LOONGARCH_IOCSR),
 	KVM_EXIT_STRING(MEMORY_FAULT),
+	KVM_EXIT_STRING(PLANE_EVENT),
 };
 
 /*
diff --git a/tools/testing/selftests/kvm/plane_test.c b/tools/testing/selftests/kvm/plane_test.c
index 43c8de13490a..9cf3ab76b3cd 100644
--- a/tools/testing/selftests/kvm/plane_test.c
+++ b/tools/testing/selftests/kvm/plane_test.c
@@ -47,20 +47,19 @@ void test_create_plane(void)
 {
 	struct kvm_vm *vm;
 	struct kvm_vcpu *vcpu;
-	int r, planefd, plane_vcpufd;
+	struct kvm_plane *plane;
+	int r;
 
 	vm = vm_create_barebones();
 	vcpu = __vm_vcpu_add(vm, 0);
 
-	planefd = __vm_ioctl(vm, KVM_CREATE_PLANE, (void *)(unsigned long)1);
-	TEST_ASSERT(planefd >= 0, "Creating new plane, got error: %d",
-		    errno);
+	plane = vm_plane_add(vm, 1);
 
-	r = ioctl(planefd, KVM_CHECK_EXTENSION, KVM_CAP_PLANES);
+	r = __plane_ioctl(plane, KVM_CHECK_EXTENSION, (void *)(unsigned long)KVM_CAP_PLANES);
 	TEST_ASSERT(r == 0,
 		    "Checking KVM_CHECK_EXTENSION(KVM_CAP_PLANES). ret: %d", r);
 
-	r = ioctl(planefd, KVM_CHECK_EXTENSION, KVM_CAP_CHECK_EXTENSION_VM);
+	r = __plane_ioctl(plane, KVM_CHECK_EXTENSION, (void *)(unsigned long)KVM_CAP_CHECK_EXTENSION_VM);
 	TEST_ASSERT(r == 1,
 		    "Checking KVM_CHECK_EXTENSION(KVM_CAP_CHECK_EXTENSION_VM). ret: %d", r);
 
@@ -69,22 +68,18 @@ void test_create_plane(void)
 		    "Creating existing plane, expecting EEXIST. ret: %d, errno: %d",
 		    r, errno);
 
-	plane_vcpufd = ioctl(planefd, KVM_CREATE_VCPU_PLANE, (void *)(unsigned long)vcpu->fd);
-	TEST_ASSERT(plane_vcpufd >= 0, "Creating vCPU for plane 1, got error: %d", errno);
+	__vm_plane_vcpu_add(vcpu, plane);
 
-	r = ioctl(planefd, KVM_CREATE_VCPU_PLANE, (void *)(unsigned long)vcpu->fd);
+	r = __plane_ioctl(plane, KVM_CREATE_VCPU_PLANE, (void *)(unsigned long)vcpu->fd);
 	TEST_ASSERT(r == -1 && errno == EEXIST,
 		    "Creating vCPU again for plane 1. ret: %d, errno: %d",
 		    r, errno);
 
-	r = ioctl(planefd, KVM_RUN, (void *)(unsigned long)0);
+	r = __plane_ioctl(plane, KVM_RUN, (void *)(unsigned long)0);
 	TEST_ASSERT(r == -1 && errno == ENOTTY,
 		    "Running plane vCPU again for plane 1. ret: %d, errno: %d",
 		    r, errno);
 
-	close(plane_vcpufd);
-	close(planefd);
-
 	kvm_vm_free(vm);
 	ksft_test_result_pass("basic planefd and plane_vcpufd operation\n");
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 29/29] selftests: kvm: add x86-specific plane test
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (27 preceding siblings ...)
  2025-04-01 16:11 ` [PATCH 28/29] selftests: kvm: add plane infrastructure Paolo Bonzini
@ 2025-04-01 16:11 ` Paolo Bonzini
  2025-04-01 16:16 ` [RFC PATCH 00/29] KVM: VM planes Sean Christopherson
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-04-01 16:11 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	jroedel, nsaenz, anelkz, James.Bottomley

Add a new test for x86-specific behavior such as vCPU state sharing
and interrupts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/include/x86/processor.h     |   1 +
 .../testing/selftests/kvm/lib/x86/processor.c |  15 +
 tools/testing/selftests/kvm/x86/plane_test.c  | 270 ++++++++++++++++++
 4 files changed, 287 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/plane_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index b1d0b410cc03..9d94db9d750f 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -82,6 +82,7 @@ TEST_GEN_PROGS_x86 += x86/kvm_pv_test
 TEST_GEN_PROGS_x86 += x86/monitor_mwait_test
 TEST_GEN_PROGS_x86 += x86/nested_emulation_test
 TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
+TEST_GEN_PROGS_x86 += x86/plane_test
 TEST_GEN_PROGS_x86 += x86/platform_info_test
 TEST_GEN_PROGS_x86 += x86/pmu_counters_test
 TEST_GEN_PROGS_x86 += x86/pmu_event_filter_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 32ab6ca7ec32..cf2095f3a7d5 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1106,6 +1106,7 @@ static inline void vcpu_clear_cpuid_feature(struct kvm_vcpu *vcpu,
 
 uint64_t vcpu_get_msr(struct kvm_vcpu *vcpu, uint64_t msr_index);
 int _vcpu_set_msr(struct kvm_vcpu *vcpu, uint64_t msr_index, uint64_t msr_value);
+int _plane_vcpu_set_msr(struct kvm_plane_vcpu *plane_vcpu, uint64_t msr_index, uint64_t msr_value);
 
 /*
  * Assert on an MSR access(es) and pretty print the MSR name when possible.
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index bd5a802fa7a5..b4431ca7fbca 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -917,6 +917,21 @@ uint64_t vcpu_get_msr(struct kvm_vcpu *vcpu, uint64_t msr_index)
 	return buffer.entry.data;
 }
 
+int _plane_vcpu_set_msr(struct kvm_plane_vcpu *plane_vcpu, uint64_t msr_index, uint64_t msr_value)
+{
+	struct {
+		struct kvm_msrs header;
+		struct kvm_msr_entry entry;
+	} buffer = {};
+
+	memset(&buffer, 0, sizeof(buffer));
+	buffer.header.nmsrs = 1;
+	buffer.entry.index = msr_index;
+	buffer.entry.data = msr_value;
+
+	return __plane_vcpu_ioctl(plane_vcpu, KVM_SET_MSRS, &buffer.header);
+}
+
 int _vcpu_set_msr(struct kvm_vcpu *vcpu, uint64_t msr_index, uint64_t msr_value)
 {
 	struct {
diff --git a/tools/testing/selftests/kvm/x86/plane_test.c b/tools/testing/selftests/kvm/x86/plane_test.c
new file mode 100644
index 000000000000..0fdd8a066723
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/plane_test.c
@@ -0,0 +1,270 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Red Hat, Inc.
+ *
+ * Test for x86-specific VM plane functionality
+ */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "apic.h"
+#include "asm/kvm.h"
+#include "linux/kvm.h"
+
+static void test_plane_regs(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct kvm_plane *plane;
+	struct kvm_plane_vcpu *plane_vcpu;
+
+	struct kvm_regs regs0, regs1;
+
+	vm = vm_create_barebones();
+	vcpu = __vm_vcpu_add(vm, 0);
+	plane = vm_plane_add(vm, 1);
+	plane_vcpu = __vm_plane_vcpu_add(vcpu, plane);
+
+	vcpu_ioctl(vcpu, KVM_GET_REGS, &regs0);
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_REGS, &regs1);
+	regs0.rax = 0x12345678;
+	regs1.rax = 0x87654321;
+
+	vcpu_ioctl(vcpu, KVM_SET_REGS, &regs0);
+	plane_vcpu_ioctl(plane_vcpu, KVM_SET_REGS, &regs1);
+
+	vcpu_ioctl(vcpu, KVM_GET_REGS, &regs0);
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_REGS, &regs1);
+	TEST_ASSERT_EQ(regs0.rax, 0x12345678);
+	TEST_ASSERT_EQ(regs1.rax, 0x87654321);
+
+	kvm_vm_free(vm);
+	ksft_test_result_pass("get/set regs for planes\n");
+}
+
+/* Offsets into struct kvm_xsave's region[] array, in units of __u32.  */
+#define XSTATE_BV_OFFSET	(0x200/4)
+#define XMM_OFFSET		(0xa0/4)
+#define PKRU_OFFSET		(0xa80/4)
+
+static void test_plane_fpu_nonshared(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct kvm_plane *plane;
+	struct kvm_plane_vcpu *plane_vcpu;
+
+	struct kvm_xsave xsave0, xsave1;
+
+	vm = vm_create_barebones();
+	TEST_ASSERT_EQ(vm_check_cap(vm, KVM_CAP_PLANES_FPU), false);
+
+	vcpu = __vm_vcpu_add(vm, 0);
+	vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
+	vcpu_set_cpuid(vcpu);
+
+	plane = vm_plane_add(vm, 1);
+	plane_vcpu = __vm_plane_vcpu_add(vcpu, plane);
+
+	vcpu_ioctl(vcpu, KVM_GET_XSAVE, &xsave0);
+	xsave0.region[XSTATE_BV_OFFSET] |= XFEATURE_MASK_FP | XFEATURE_MASK_SSE;
+	xsave0.region[XMM_OFFSET] = 0x12345678;
+	vcpu_ioctl(vcpu, KVM_SET_XSAVE, &xsave0);
+
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_XSAVE, &xsave1);
+	xsave1.region[XSTATE_BV_OFFSET] |= XFEATURE_MASK_FP | XFEATURE_MASK_SSE;
+	xsave1.region[XMM_OFFSET] = 0x87654321;
+	plane_vcpu_ioctl(plane_vcpu, KVM_SET_XSAVE, &xsave1);
+
+	memset(&xsave0, 0, sizeof(xsave0));
+	vcpu_ioctl(vcpu, KVM_GET_XSAVE, &xsave0);
+	TEST_ASSERT_EQ(xsave0.region[XMM_OFFSET], 0x12345678);
+
+	memset(&xsave1, 0, sizeof(xsave1));
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_XSAVE, &xsave1);
+	TEST_ASSERT_EQ(xsave1.region[XMM_OFFSET], 0x87654321);
+
+	kvm_vm_free(vm);
+	ksft_test_result_pass("get/set FPU not shared across planes\n");
+}
+
+static void test_plane_fpu_shared(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct kvm_plane *plane;
+	struct kvm_plane_vcpu *plane_vcpu;
+
+	struct kvm_xsave xsave0, xsave1;
+
+	vm = vm_create_barebones();
+	vm_enable_cap(vm, KVM_CAP_PLANES_FPU, 1ul);
+	TEST_ASSERT_EQ(vm_check_cap(vm, KVM_CAP_PLANES_FPU), true);
+
+	vcpu = __vm_vcpu_add(vm, 0);
+	vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
+	vcpu_set_cpuid(vcpu);
+
+	plane = vm_plane_add(vm, 1);
+	plane_vcpu = __vm_plane_vcpu_add(vcpu, plane);
+
+	vcpu_ioctl(vcpu, KVM_GET_XSAVE, &xsave0);
+
+	xsave0.region[XSTATE_BV_OFFSET] |= XFEATURE_MASK_FP | XFEATURE_MASK_SSE;
+	xsave0.region[XMM_OFFSET] = 0x12345678;
+	vcpu_ioctl(vcpu, KVM_SET_XSAVE, &xsave0);
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_XSAVE, &xsave1);
+	TEST_ASSERT_EQ(xsave1.region[XMM_OFFSET], 0x12345678);
+
+	xsave1.region[XSTATE_BV_OFFSET] |= XFEATURE_MASK_FP | XFEATURE_MASK_SSE;
+	xsave1.region[XMM_OFFSET] = 0x87654321;
+	plane_vcpu_ioctl(plane_vcpu, KVM_SET_XSAVE, &xsave1);
+	vcpu_ioctl(vcpu, KVM_GET_XSAVE, &xsave0);
+	TEST_ASSERT_EQ(xsave0.region[XMM_OFFSET], 0x87654321);
+
+	ksft_test_result_pass("get/set FPU shared across planes\n");
+
+	if (!this_cpu_has(X86_FEATURE_PKU)) {
+		ksft_test_result_skip("get/set PKRU with shared FPU\n");
+		goto exit;
+	}
+
+	xsave0.region[XSTATE_BV_OFFSET] = XFEATURE_MASK_PKRU;
+	xsave0.region[PKRU_OFFSET] = 0xffffffff;
+	vcpu_ioctl(vcpu, KVM_SET_XSAVE, &xsave0);
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_XSAVE, &xsave0);
+
+	xsave0.region[XSTATE_BV_OFFSET] = XFEATURE_MASK_PKRU;
+	xsave0.region[PKRU_OFFSET] = 0xaaaaaaaa;
+	vcpu_ioctl(vcpu, KVM_SET_XSAVE, &xsave0);
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_XSAVE, &xsave1);
+	TEST_ASSERT_EQ(xsave1.region[PKRU_OFFSET], 0xffffffff);
+
+	xsave1.region[XSTATE_BV_OFFSET] = XFEATURE_MASK_PKRU;
+	xsave1.region[PKRU_OFFSET] = 0x55555555;
+	plane_vcpu_ioctl(plane_vcpu, KVM_SET_XSAVE, &xsave1);
+	vcpu_ioctl(vcpu, KVM_GET_XSAVE, &xsave0);
+	TEST_ASSERT_EQ(xsave0.region[PKRU_OFFSET], 0xaaaaaaaa);
+
+	ksft_test_result_pass("get/set PKRU with shared FPU\n");
+
+exit:
+	kvm_vm_free(vm);
+}
+
+#define APIC_SPIV		0xF0
+#define APIC_IRR		0x200
+
+#define MYVEC			192
+
+#define MAKE_MSI(cpu, vector) ((struct kvm_msi){		\
+	.address_lo = APIC_DEFAULT_GPA + (((cpu) & 0xff) << 8),	\
+	.address_hi = (cpu) & ~0xff,				\
+	.data = (vector),					\
+})
+
+static bool has_irr(struct kvm_lapic_state *apic, int vector)
+{
+	int word = vector >> 5;
+	int bit_in_word = vector & 31;
+	int bit = (APIC_IRR + word * 16) * CHAR_BIT + (bit_in_word & 31);
+
+	return apic->regs[bit >> 3] & (1 << (bit & 7));
+}
+
+static void do_enable_lapic(struct kvm_lapic_state *apic)
+{
+	/* set bit 8 */
+	/* Set the APIC software-enable bit (bit 8 of SPIV); regs[] is byte-indexed. */
+}
+
+static void test_plane_msi(void)
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct kvm_plane *plane;
+	struct kvm_plane_vcpu *plane_vcpu;
+	int r;
+
+	struct kvm_msi msi = MAKE_MSI(0, MYVEC);
+	struct kvm_lapic_state lapic0, lapic1;
+
+	vm = __vm_create(VM_SHAPE_DEFAULT, 1, 0);
+
+	vcpu = __vm_vcpu_add(vm, 0);
+	vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
+	vcpu_set_cpuid(vcpu);
+
+	plane = vm_plane_add(vm, 1);
+	plane_vcpu = __vm_plane_vcpu_add(vcpu, plane);
+
+	vcpu_set_msr(vcpu, MSR_IA32_APICBASE,
+		     APIC_DEFAULT_GPA | MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
+	vcpu_ioctl(vcpu, KVM_GET_LAPIC, &lapic0);
+	do_enable_lapic(&lapic0);
+	vcpu_ioctl(vcpu, KVM_SET_LAPIC, &lapic0);
+
+	_plane_vcpu_set_msr(plane_vcpu, MSR_IA32_APICBASE,
+			    APIC_DEFAULT_GPA | MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE);
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_LAPIC, &lapic1);
+	do_enable_lapic(&lapic1);
+	plane_vcpu_ioctl(plane_vcpu, KVM_SET_LAPIC, &lapic1);
+
+	r = __plane_ioctl(plane, KVM_SIGNAL_MSI, &msi);
+	TEST_ASSERT(r == 1,
+		   "Delivering interrupt to plane 1. ret: %d, errno: %d", r, errno);
+
+	vcpu_ioctl(vcpu, KVM_GET_LAPIC, &lapic0);
+	TEST_ASSERT(!has_irr(&lapic0, MYVEC), "Vector clear in plane 0");
+	plane_vcpu_ioctl(plane_vcpu, KVM_GET_LAPIC, &lapic1);
+	TEST_ASSERT(has_irr(&lapic1, MYVEC), "Vector set in plane 1");
+
+	/* req_exit_planes always has priority */
+	vcpu->run->req_exit_planes = (1 << 1);
+	vcpu_run(vcpu);
+	TEST_ASSERT_EQ(vcpu->run->exit_reason, KVM_EXIT_PLANE_EVENT);
+	TEST_ASSERT_EQ(vcpu->run->plane_event.cause, KVM_PLANE_EVENT_INTERRUPT);
+	TEST_ASSERT_EQ(vcpu->run->plane_event.pending_event_planes, (1 << 1));
+	TEST_ASSERT_EQ(vcpu->run->plane_event.target, (1 << 1));
+
+	r = __vm_ioctl(vm, KVM_SIGNAL_MSI, &msi);
+	TEST_ASSERT(r == 1,
+		   "Delivering interrupt to plane 0. ret: %d, errno: %d", r, errno);
+	vcpu_ioctl(vcpu, KVM_GET_LAPIC, &lapic0);
+	TEST_ASSERT(has_irr(&lapic0, MYVEC), "Vector set in plane 0");
+
+	/* req_exit_planes ignores current plane; current plane is cleared */
+	vcpu->run->plane = 1;
+	vcpu->run->req_exit_planes = (1 << 0) | (1 << 1);
+	vcpu_run(vcpu);
+	TEST_ASSERT_EQ(vcpu->run->exit_reason, KVM_EXIT_PLANE_EVENT);
+	TEST_ASSERT_EQ(vcpu->run->plane_event.cause, KVM_PLANE_EVENT_INTERRUPT);
+	TEST_ASSERT_EQ(vcpu->run->plane_event.pending_event_planes, (1 << 0));
+	TEST_ASSERT_EQ(vcpu->run->plane_event.target, (1 << 0));
+
+	kvm_vm_free(vm);
+	ksft_test_result_pass("signal MSI for planes\n");
+}
+
+int main(int argc, char *argv[])
+{
+	int cap_planes = kvm_check_cap(KVM_CAP_PLANES);
+	TEST_REQUIRE(cap_planes > 1);
+
+	ksft_print_header();
+	ksft_set_plan(5);
+
+	pr_info("# KVM_CAP_PLANES: %d\n", cap_planes);
+
+	test_plane_regs();
+	test_plane_fpu_nonshared();
+	test_plane_fpu_shared();
+	test_plane_msi();
+
+	ksft_finished();
+}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/29] KVM: VM planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (28 preceding siblings ...)
  2025-04-01 16:11 ` [PATCH 29/29] selftests: kvm: add x86-specific plane test Paolo Bonzini
@ 2025-04-01 16:16 ` Sean Christopherson
  2025-06-06 16:42 ` Tom Lendacky
  2025-08-07 12:34 ` Vaishali Thakkar
  31 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-04-01 16:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> I guess April 1st is not the best date to send out such a large series
> after months of radio silence, but here we are.

Heh, you missed an opportunity to spell it "plains" and then spend the entire
cover letter justifying the name :-)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/29] Documentation: kvm: introduce "VM plane" concept
  2025-04-01 16:10 ` [PATCH 01/29] Documentation: kvm: introduce "VM plane" concept Paolo Bonzini
@ 2025-04-21 18:43   ` Tom Lendacky
  0 siblings, 0 replies; 49+ messages in thread
From: Tom Lendacky @ 2025-04-21 18:43 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, ashish.kalra, michael.roth, jroedel, nsaenz,
	anelkz, James.Bottomley

On 4/1/25 11:10, Paolo Bonzini wrote:
> There have been multiple occurrences of processors introducing a virtual
> privilege level concept for guests, where the hypervisor hosts multiple
> copies of a vCPU's register state (or at least of most of it) and provides
> hypercalls or instructions to switch between them.  These include AMD
> VMPLs, Intel TDX partitions, Microsoft Hyper-V VTLs, and ARM CCA planes.
> Include documentation on how the feature will be exposed to userspace,
> based on a draft made between Plumbers and KVM Forum.
> 

> @@ -7162,6 +7262,44 @@ The valid value for 'flags' is:
>    - KVM_NOTIFY_CONTEXT_INVALID -- the VM context is corrupted and not valid
>      in VMCS. It would run into unknown result if resume the target VM.
>  
> +::
> +
> +    /* KVM_EXIT_PLANE_EVENT */
> +    struct {
> +  #define KVM_PLANE_EVENT_INTERRUPT	1
> +      __u16 cause;
> +      __u16 pending_event_planes;
> +      __u16 target;
> +      __u16 padding;
> +      __u32 flags;
> +      __u64 extra;
> +    } plane_event;
> +
> +Inform userspace of an event that affects a different plane than the
> +currently executing one.
> +
> +On a ``KVM_EXIT_PLANE_EVENT`` exit, ``pending_event_planes`` is always
> +set to the set of planes that have a pending interrupt.
> +
> +``cause`` provides the event that caused the exit, and the meaning of
> +``target`` depends on the cause of the exit too.
> +
> +Right now the only defined cause is ``KVM_PLANE_EVENT_INTERRUPT``, i.e.

With the SVSM and VMPL levels, the guest OS will request to run VMPL0 to
run the SVSM and process an SVSM call. When complete, the SVSM will
return to the guest OS by requesting to run the guest VMPL. Do we need
an event for this plane switch that doesn't involve interrupts?

> +an interrupt was received by a plane whose id is set in the
> +``req_exit_planes`` bitmap.  In this case, ``target`` is the AND of
> +``req_exit_planes`` and ``pending_event_planes``.
> +
> +``flags`` and ``extra`` are currently always 0.
> +
> +If userspace wants to switch to the target plane, it should move any
> +shared state from the current plane to ``target``, and then invoke
> +``KVM_RUN`` with ``kvm_run->plane`` set to ``target`` (and
> +``req_exit_planes`` initialized accordingly).  Note that it's also
> +valid to switch planes in response to other userspace exit codes, for
> +example ``KVM_EXIT_X86_WRMSR`` or ``KVM_EXIT_HYPERCALL``.  Immediately
> +after ``KVM_RUN`` is entered, KVM will check ``req_exit_planes`` and
> +trigger a ``KVM_EXIT_PLANE_EVENT`` userspace exit if needed.

I'm not sure I follow this. Won't req_exit_planes have a value no matter
what when running a less privileged plane? How would KVM know to
immediately exit?
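
For reference, the flow described in the quoted documentation would look
roughly like this from the VMM side (a rough sketch only: the helper names
follow the selftests, error handling is omitted, and the shared-state
migration is left as a comment):

	struct kvm_run *run = vcpu->run;

	/* Run plane 0, but exit to userspace if plane 1 receives an interrupt. */
	run->plane = 0;
	run->req_exit_planes = 1 << 1;
	vcpu_run(vcpu);

	if (run->exit_reason == KVM_EXIT_PLANE_EVENT &&
	    run->plane_event.cause == KVM_PLANE_EVENT_INTERRUPT) {
		/* ...move any shared state from plane 0 to plane 1... */
		run->plane = 1;
		run->req_exit_planes = 1 << 0;
		vcpu_run(vcpu);
	}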

Thanks,
Tom

> +

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/29] KVM: add plane info to structs
  2025-04-01 16:10 ` [PATCH 03/29] KVM: add plane info to structs Paolo Bonzini
@ 2025-04-21 18:57   ` Tom Lendacky
  2025-04-21 19:04   ` Tom Lendacky
  1 sibling, 0 replies; 49+ messages in thread
From: Tom Lendacky @ 2025-04-21 18:57 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, ashish.kalra, michael.roth, jroedel, nsaenz,
	anelkz, James.Bottomley

On 4/1/25 11:10, Paolo Bonzini wrote:
> Add some of the data to move from one plane to the other within a VM,
> typically from plane N to plane 0.
> 
> There is quite some difference here because while separate planes provide
> very little of the vm file descriptor functionality, they are almost fully
> functional vCPUs except that non-zero planes(*) can only be run indirectly
> through the initial plane.
> 
> Therefore, vCPUs use struct kvm_vcpu for all planes, with just a couple
> fields that will be added later and will only be valid for plane 0.  At
> the VM level instead plane info is stored in a completely different struct.
> For now struct kvm_plane has no architecture-specific counterpart, but this
> may change in the future if needed.  It's possible for example that some MMU
> info becomes per-plane in order to support per-plane RWX permissions.
> 
> (*) I will refrain from calling them astral planes.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/linux/kvm_host.h  | 17 ++++++++++++++++-
>  include/linux/kvm_types.h |  1 +
>  virt/kvm/kvm_main.c       | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+), 1 deletion(-)
> 

>  
> +static void kvm_destroy_plane(struct kvm_plane *plane)
> +{
> +}

Should this be doing a kfree() of the plane?
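
e.g., a minimal sketch (assuming nothing else on the destruction path frees
the structure):

static void kvm_destroy_plane(struct kvm_plane *plane)
{
	kfree(plane);
}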

Thanks,
Tom

> +
>  static void kvm_destroy_vm(struct kvm *kvm)
>  {
>  	int i;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/29] KVM: add plane info to structs
  2025-04-01 16:10 ` [PATCH 03/29] KVM: add plane info to structs Paolo Bonzini
  2025-04-21 18:57   ` Tom Lendacky
@ 2025-04-21 19:04   ` Tom Lendacky
  1 sibling, 0 replies; 49+ messages in thread
From: Tom Lendacky @ 2025-04-21 19:04 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, ashish.kalra, michael.roth, jroedel, nsaenz,
	anelkz, James.Bottomley

On 4/1/25 11:10, Paolo Bonzini wrote:
> Add some of the data to move from one plane to the other within a VM,
> typically from plane N to plane 0.
> 
> There is quite some difference here because while separate planes provide
> very little of the vm file descriptor functionality, they are almost fully
> functional vCPUs except that non-zero planes(*) can only be run indirectly
> through the initial plane.
> 
> Therefore, vCPUs use struct kvm_vcpu for all planes, with just a couple
> fields that will be added later and will only be valid for plane 0.  At
> the VM level instead plane info is stored in a completely different struct.
> For now struct kvm_plane has no architecture-specific counterpart, but this
> may change in the future if needed.  It's possible for example that some MMU
> info becomes per-plane in order to support per-plane RWX permissions.
> 
> (*) I will refrain from calling them astral planes.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/linux/kvm_host.h  | 17 ++++++++++++++++-
>  include/linux/kvm_types.h |  1 +
>  virt/kvm/kvm_main.c       | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+), 1 deletion(-)
> 

> @@ -332,7 +336,8 @@ struct kvm_vcpu {
>  #ifdef CONFIG_PROVE_RCU
>  	int srcu_depth;
>  #endif
> -	int mode;
> +	short plane;
> +	short mode;
>  	u64 requests;
>  	unsigned long guest_debug;
>  

> @@ -753,6 +760,11 @@ struct kvm_memslots {
>  	int node_idx;
>  };
>  
> +struct kvm_plane {
> +	struct kvm *kvm;
> +	int plane;

Should there be consistency between the use of short in kvm_run vs int
in kvm_plane? And elsewhere in the series, unsigned int is used, also.

Thanks,
Tom

> +};
> +


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/29] KVM: implement plane file descriptors ioctl and creation
  2025-04-01 16:10 ` [PATCH 09/29] KVM: implement plane file descriptors ioctl and creation Paolo Bonzini
@ 2025-04-21 20:32   ` Tom Lendacky
  0 siblings, 0 replies; 49+ messages in thread
From: Tom Lendacky @ 2025-04-21 20:32 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, ashish.kalra, michael.roth, jroedel, nsaenz,
	anelkz, James.Bottomley

On 4/1/25 11:10, Paolo Bonzini wrote:
> Add the file_operations for planes, the means to create new file
> descriptors for them, and the KVM_CHECK_EXTENSION implementation for
> the two new capabilities.
> 
> KVM_SIGNAL_MSI and KVM_SET_MEMORY_ATTRIBUTES are now available
> through both vm and plane file descriptors, forward them to the
> same function that is used by the file_operations for planes.
> KVM_CHECK_EXTENSION instead remains separate, because it only
> advertises a very small subset of capabilities when applied to
> plane file descriptors.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/linux/kvm_host.h |  19 +++++
>  include/uapi/linux/kvm.h |   2 +
>  virt/kvm/kvm_main.c      | 154 +++++++++++++++++++++++++++++++++------
>  3 files changed, 154 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0a91b556767e..dbca418d64f5 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -342,6 +342,8 @@ struct kvm_vcpu {
>  	unsigned long guest_debug;
>  
>  	struct mutex mutex;
> +
> +	/* Shared for all planes */
>  	struct kvm_run *run;
>  
>  #ifndef __KVM_HAVE_ARCH_WQP
> @@ -922,6 +924,23 @@ static inline void kvm_vm_bugged(struct kvm *kvm)
>  }
>  
>  
> +#if KVM_MAX_VCPU_PLANES == 1
> +static inline int kvm_arch_nr_vcpu_planes(struct kvm *kvm)
> +{
> +	return KVM_MAX_VCPU_PLANES;
> +}

Should this be outside of the #if above?

> +
> +static inline struct kvm_plane *vcpu_to_plane(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->kvm->planes[0];
> +}
> +#else
> +static inline struct kvm_plane *vcpu_to_plane(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->kvm->planes[vcpu->plane_id];
> +}
> +#endif

Are two different functions needed? The vcpu->plane_id will be zero in
the KVM_MAX_VCPU_PLANES == 1 case, so that should get the same result as
the hard-coded 0, right?

> +


> @@ -5236,16 +5363,12 @@ static long kvm_vm_ioctl(struct file *filp,
>  		break;
>  	}
>  #ifdef CONFIG_HAVE_KVM_MSI
> -	case KVM_SIGNAL_MSI: {
> -		struct kvm_msi msi;
> -
> -		r = -EFAULT;
> -		if (copy_from_user(&msi, argp, sizeof(msi)))
> -			goto out;
> -		r = kvm_send_userspace_msi(kvm->planes[0], &msi);
> -		break;
> -	}
> +	case KVM_SIGNAL_MSI:
>  #endif
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +	case KVM_SET_MEMORY_ATTRIBUTES:
> +#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */

If both of these aren't #defined, then you'll end up with just the
return statement from the next line, which will cause a build failure.
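
One way to keep the shared forwarding while avoiding a stray return statement
when neither option is enabled would be an outer guard, e.g. (sketch,
untested):

#if defined(CONFIG_HAVE_KVM_MSI) || defined(CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES)
#ifdef CONFIG_HAVE_KVM_MSI
	case KVM_SIGNAL_MSI:
#endif
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
	case KVM_SET_MEMORY_ATTRIBUTES:
#endif
		return __kvm_plane_ioctl(kvm->planes[0], ioctl, arg);
#endif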

Thanks,
Tom

> +		return __kvm_plane_ioctl(kvm->planes[0], ioctl, arg);
>  #ifdef __KVM_HAVE_IRQ_LINE
>  	case KVM_IRQ_LINE_STATUS:
>  	case KVM_IRQ_LINE: {
> @@ -5301,18 +5424,6 @@ static long kvm_vm_ioctl(struct file *filp,
>  		break;
>  	}
>  #endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */
> -#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> -	case KVM_SET_MEMORY_ATTRIBUTES: {
> -		struct kvm_memory_attributes attrs;
> -
> -		r = -EFAULT;
> -		if (copy_from_user(&attrs, argp, sizeof(attrs)))
> -			goto out;
> -
> -		r = kvm_vm_ioctl_set_mem_attributes(kvm->planes[0], &attrs);
> -		break;
> -	}
> -#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>  	case KVM_CREATE_DEVICE: {
>  		struct kvm_create_device cd;
>  
> @@ -6467,6 +6578,7 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
>  	kvm_chardev_ops.owner = module;
>  	kvm_vm_fops.owner = module;
>  	kvm_vcpu_fops.owner = module;
> +	kvm_plane_fops.owner = module;
>  	kvm_device_fops.owner = module;
>  
>  	kvm_preempt_ops.sched_in = kvm_sched_in;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 12/29] KVM: share dirty ring for same vCPU id on different planes
  2025-04-01 16:10 ` [PATCH 12/29] KVM: share dirty ring for same vCPU id on different planes Paolo Bonzini
@ 2025-04-21 21:51   ` Tom Lendacky
  0 siblings, 0 replies; 49+ messages in thread
From: Tom Lendacky @ 2025-04-21 21:51 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, ashish.kalra, michael.roth, jroedel, nsaenz,
	anelkz, James.Bottomley

On 4/1/25 11:10, Paolo Bonzini wrote:
> The dirty page ring is read by mmap()-ing the vCPU file descriptor,
> which is only possible for plane 0.  This is not a problem because it
> is only filled by KVM_RUN which takes the plane-0 vCPU mutex, and it is
> therefore possible to share it for vCPUs that have the same id but are
> on different planes.  (TODO: double check).
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/linux/kvm_host.h |  6 ++++--
>  virt/kvm/dirty_ring.c    |  5 +++--
>  virt/kvm/kvm_main.c      | 10 +++++-----
>  3 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d2e0c0e8ff17..b511aed2de8e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -394,9 +394,11 @@ struct kvm_vcpu {
>  	bool scheduled_out;
>  	struct kvm_vcpu_arch arch;
>  	struct kvm_vcpu_stat *stat;
> -	struct kvm_vcpu_stat __stat;
>  	char stats_id[KVM_STATS_NAME_SIZE];
> -	struct kvm_dirty_ring dirty_ring;
> +	struct kvm_dirty_ring *dirty_ring;

Looks like the setting of dirty_ring is missing in the patch.

Thanks,
Tom

> +
> +	struct kvm_vcpu_stat __stat;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 13/29] KVM: implement vCPU creation for extra planes
  2025-04-01 16:10 ` [PATCH 13/29] KVM: implement vCPU creation for extra planes Paolo Bonzini
@ 2025-04-21 22:08   ` Tom Lendacky
  2025-06-05 22:47     ` Sean Christopherson
  0 siblings, 1 reply; 49+ messages in thread
From: Tom Lendacky @ 2025-04-21 22:08 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, ashish.kalra, michael.roth, jroedel, nsaenz,
	anelkz, James.Bottomley

On 4/1/25 11:10, Paolo Bonzini wrote:
> For userspace to have fun with planes it is probably useful to let them
> create vCPUs on the non-zero planes as well.  Since such vCPUs are backed
> by the same struct kvm_vcpu, these are regular vCPU file descriptors except
> that they only allow a small subset of ioctls (mostly get/set) and they
> share some of the backing resources, notably vcpu->run.
> 
> TODO: prefault might be useful on non-default planes as well?
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/locking.rst |   3 +
>  include/linux/kvm_host.h           |   4 +-
>  include/uapi/linux/kvm.h           |   1 +
>  virt/kvm/kvm_main.c                | 167 +++++++++++++++++++++++------
>  4 files changed, 142 insertions(+), 33 deletions(-)
> 

> @@ -4200,8 +4223,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>  	 * release semantics, which ensures the write is visible to kvm_get_vcpu().
>  	 */
>  	vcpu->plane = -1;
> -	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
> -	r = xa_insert(&kvm->planes[0]->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
> +	if (plane->plane)
> +		vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
> +	else
> +		vcpu->vcpu_idx = plane0_vcpu->vcpu_idx;

Don't you want the atomic_read() for the plane0 vCPU and use the plane0
vcpu->idx value for non-zero plane vCPUs?
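
Presumably the intent was the swapped form, i.e. (sketch):

	if (plane->plane)
		vcpu->vcpu_idx = plane0_vcpu->vcpu_idx;
	else
		vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);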

> +
> +	r = xa_insert(&plane->vcpu_array, vcpu->vcpu_idx,
> +		      vcpu, GFP_KERNEL_ACCOUNT);
>  	WARN_ON_ONCE(r == -EBUSY);
>  	if (r)
>  		goto unlock_vcpu_destroy;
> @@ -4220,13 +4248,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>  	if (r < 0)
>  		goto kvm_put_xa_erase;
>  
> -	atomic_inc(&kvm->online_vcpus);
> +	if (!plane0_vcpu)

It looks like plane0_vcpu will always have a value, either from input or
assigned in an else path earlier in the code. Should this be
"!plane->plane"?

Thanks,
Tom

> +		atomic_inc(&kvm->online_vcpus);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/29] KVM: API definitions for plane userspace exit
  2025-04-01 16:10 ` [PATCH 02/29] KVM: API definitions for plane userspace exit Paolo Bonzini
@ 2025-06-04  0:10   ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-06-04  0:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> Copy over the uapi definitions from the Documentation/ directory.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/uapi/linux/kvm.h | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 1e0a511c43d0..b0cca93ebcb3 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -135,6 +135,16 @@ struct kvm_xen_exit {
>  	} u;
>  };
>  
> +struct kvm_plane_event_exit {
> +#define KVM_PLANE_EVENT_INTERRUPT    1
> +	__u16 cause;
> +	__u16 pending_event_planes;
> +	__u16 target;
> +	__u16 padding;
> +	__u32 flags;
> +	__u64 extra[8];
> +};
> +
>  struct kvm_tdx_exit {
>  #define KVM_EXIT_TDX_VMCALL     1
>          __u32 type;
> @@ -262,7 +272,8 @@ struct kvm_tdx_exit {
>  #define KVM_EXIT_NOTIFY           37
>  #define KVM_EXIT_LOONGARCH_IOCSR  38
>  #define KVM_EXIT_MEMORY_FAULT     39
> -#define KVM_EXIT_TDX              40
> +#define KVM_EXIT_PLANE_EVENT      40
> +#define KVM_EXIT_TDX              41
>  
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -295,7 +306,13 @@ struct kvm_run {
>  	/* in */
>  	__u8 request_interrupt_window;
>  	__u8 HINT_UNSAFE_IN_KVM(immediate_exit);
> -	__u8 padding1[6];
> +
> +	/* in/out */
> +	__u8 plane;

We should add padding before or after "plane"; there's a 1-byte hole here that's
hard to spot at first glance (ran pahole just to verify my eyeballs):

  struct kvm_run {
	__u8                       request_interrupt_window; /*     0     1 */
	__u8                       immediate_exit__unsafe; /*     1     1 */
	__u8                       plane;                /*     2     1 */

	/* XXX 1 byte hole, try to pack */

	__u16                      suspended_planes;     /*     4     2 */
	__u16                      req_exit_planes;      /*     6     2 */
  }

Probably pad before, so that "plane" is just before the xxx_planes fields?
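
E.g., a sketch of one possible layout (field names taken from the pahole
output above; the uapi header would keep the HINT_UNSAFE_IN_KVM() wrapper for
immediate_exit):

	struct kvm_run {
		/* in */
		__u8 request_interrupt_window;
		__u8 immediate_exit;
		__u8 padding0;

		/* in/out */
		__u8 plane;
		__u16 suspended_planes;
		__u16 req_exit_planes;
		...
	};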

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/29] KVM: do not use online_vcpus to test vCPU validity
  2025-04-01 16:10 ` [PATCH 07/29] KVM: do not use online_vcpus to test vCPU validity Paolo Bonzini
@ 2025-06-05 22:45   ` Sean Christopherson
  2025-06-06 13:49     ` Sean Christopherson
  0 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2025-06-05 22:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> Different planes can initialize their vCPUs separately, therefore there is
> no single online_vcpus value that can be used to test that a vCPU has
> indeed been fully initialized.
> 
> Use the shiny new plane field instead, initializing it to an invalid value
> (-1) while the vCPU is visible in the xarray but may still disappear if
> the creation fails.

Checking vcpu->plane _in addition_ to online_vcpus seems way safer than checking
vcpu->plane _instead_ of online_vcpus.  Even if we end up checking only vcpu->plane,
I think that should be a separate patch.

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/i8254.c     |  3 ++-
>  include/linux/kvm_host.h | 23 ++++++-----------------
>  virt/kvm/kvm_main.c      | 20 +++++++++++++-------
>  3 files changed, 21 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index d7ab8780ab9e..e3a3e7b90c26 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -260,9 +260,10 @@ static void pit_do_work(struct kthread_work *work)
>  	 * VCPUs and only when LVT0 is in NMI mode.  The interrupt can
>  	 * also be simultaneously delivered through PIC and IOAPIC.
>  	 */
> -	if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
> +	if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0) {

Spurious change (a good change, but noisy for this patch).

>  		kvm_for_each_vcpu(i, vcpu, kvm)
>  			kvm_apic_nmi_wd_deliver(vcpu);
> +	}
>  }
>  
>  static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 4d408d1d5ccc..0db27814294f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -992,27 +992,16 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
>  
>  static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
>  {
> -	int num_vcpus = atomic_read(&kvm->online_vcpus);
> -
> -	/*
> -	 * Explicitly verify the target vCPU is online, as the anti-speculation
> -	 * logic only limits the CPU's ability to speculate, e.g. given a "bad"
> -	 * index, clamping the index to 0 would return vCPU0, not NULL.
> -	 */
> -	if (i >= num_vcpus)
> +	struct kvm_vcpu *vcpu = xa_load(&kvm->vcpu_array, i);

newline

> +	if (vcpu && unlikely(vcpu->plane == -1))
>  		return NULL;
>  
> -	i = array_index_nospec(i, num_vcpus);

Don't we still need to prevent speculating into the xarray?

> -
> -	/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu.  */
> -	smp_rmb();
> -	return xa_load(&kvm->vcpu_array, i);
> +	return vcpu;
>  }
>  
> -#define kvm_for_each_vcpu(idx, vcpup, kvm)				\
> -	if (atomic_read(&kvm->online_vcpus))				\
> -		xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0,	\
> -				  (atomic_read(&kvm->online_vcpus) - 1))
> +#define kvm_for_each_vcpu(idx, vcpup, kvm)			\
> +	xa_for_each(&kvm->vcpu_array, idx, vcpup)		\
> +		if ((vcpup)->plane == -1) ; else		\
>  
>  static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e343905e46d8..eba02cb7cc57 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4186,6 +4186,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>  		goto unlock_vcpu_destroy;
>  	}
>  
> +	/*
> +	 * Store an invalid plane number until fully initialized.  xa_insert() has
> +	 * release semantics, which ensures the write is visible to kvm_get_vcpu().
> +	 */
> +	vcpu->plane = -1;
>  	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
>  	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
>  	WARN_ON_ONCE(r == -EBUSY);
> @@ -4195,7 +4200,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>  	/*
>  	 * Now it's all set up, let userspace reach it.  Grab the vCPU's mutex
>  	 * so that userspace can't invoke vCPU ioctl()s until the vCPU is fully
> -	 * visible (per online_vcpus), e.g. so that KVM doesn't get tricked
> +	 * visible (valid vcpu->plane), e.g. so that KVM doesn't get tricked
>  	 * into a NULL-pointer dereference because KVM thinks the _current_
>  	 * vCPU doesn't exist.  As a bonus, taking vcpu->mutex ensures lockdep
>  	 * knows it's taken *inside* kvm->lock.
> @@ -4206,12 +4211,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>  	if (r < 0)
>  		goto kvm_put_xa_erase;
>  
> -	/*
> -	 * Pairs with smp_rmb() in kvm_get_vcpu.  Store the vcpu
> -	 * pointer before kvm->online_vcpu's incremented value.

Bad me for not updating this comment, but kvm_vcpu_on_spin() also pairs with this
barrier, and needs to be updated to be planes-aware, e.g. this looks like a NULL
pointer deref waiting to happen:

		vcpu = xa_load(&plane->vcpu_array, idx);
		if (!READ_ONCE(vcpu->ready))
			continue;
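
A minimal guard for that case might look like (sketch, untested; assumes a
NULL entry simply means this plane has no vCPU at that index):

		vcpu = xa_load(&plane->vcpu_array, idx);
		if (!vcpu || !READ_ONCE(vcpu->ready))
			continue;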

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 13/29] KVM: implement vCPU creation for extra planes
  2025-04-21 22:08   ` Tom Lendacky
@ 2025-06-05 22:47     ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-06-05 22:47 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, linux-kernel, kvm, roy.hopkins, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Mon, Apr 21, 2025, Tom Lendacky wrote:
> On 4/1/25 11:10, Paolo Bonzini wrote:
> > For userspace to have fun with planes it is probably useful to let them
> > create vCPUs on the non-zero planes as well.  Since such vCPUs are backed
> > by the same struct kvm_vcpu, these are regular vCPU file descriptors except
> > that they only allow a small subset of ioctls (mostly get/set) and they
> > share some of the backing resources, notably vcpu->run.
> > 
> > TODO: prefault might be useful on non-default planes as well?
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  Documentation/virt/kvm/locking.rst |   3 +
> >  include/linux/kvm_host.h           |   4 +-
> >  include/uapi/linux/kvm.h           |   1 +
> >  virt/kvm/kvm_main.c                | 167 +++++++++++++++++++++++------
> >  4 files changed, 142 insertions(+), 33 deletions(-)
> > 
> 
> > @@ -4200,8 +4223,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> >  	 * release semantics, which ensures the write is visible to kvm_get_vcpu().
> >  	 */
> >  	vcpu->plane = -1;
> > -	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
> > -	r = xa_insert(&kvm->planes[0]->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
> > +	if (plane->plane)
> > +		vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
> > +	else
> > +		vcpu->vcpu_idx = plane0_vcpu->vcpu_idx;
> 
> Don't you want the atomic_read() for the plane0 vCPU and use the plane0
> vcpu->idx value for non-zero plane vCPUs?

+1, this looks backwards to me as well.

> 
> > +
> > +	r = xa_insert(&plane->vcpu_array, vcpu->vcpu_idx,
> > +		      vcpu, GFP_KERNEL_ACCOUNT);
> >  	WARN_ON_ONCE(r == -EBUSY);
> >  	if (r)
> >  		goto unlock_vcpu_destroy;
> > @@ -4220,13 +4248,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> >  	if (r < 0)
> >  		goto kvm_put_xa_erase;
> >  
> > -	atomic_inc(&kvm->online_vcpus);
> > +	if (!plane0_vcpu)
> 
> It looks like plane0_vcpu will always have value, either from input or
> assigned in an else path earlier in the code. Should this be
> "!plane->plane" ?

Even if plane0_vcpu were viable, my vote is for !plane->plane, because that makes
it much more obvious that only plane0 bumps the count.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/29] KVM: do not use online_vcpus to test vCPU validity
  2025-06-05 22:45   ` Sean Christopherson
@ 2025-06-06 13:49     ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-06-06 13:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Thu, Jun 05, 2025, Sean Christopherson wrote:
> On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> > Different planes can initialize their vCPUs separately, therefore there is
> > no single online_vcpus value that can be used to test that a vCPU has
> > indeed been fully initialized.
> > 
> > Use the shiny new plane field instead, initializing it to an invalid value
> > (-1) while the vCPU is visible in the xarray but may still disappear if
> > the creation fails.
> 
> Checking vcpu->plane _in addition_ to online_vcpus seems way safer than checking
> vcpu->plane _instead_ of online_vcpus.  Even if we end up checking only vcpu->plane,
> I think that should be a separate patch.

Alternatively, why not do the somewhat more obvious thing of making online_vcpus
per-plane?

Oh!  Is it because vCPUs can be sparsely populated?  E.g. given a 4-vCPU VM, plane1
could have vCPU0 and vCPU3, but not vCPU1 or vCPU2?

That's implicitly captured in the docs, but we should very explicitly call that
out in the relevant changelogs (this one especially), so that the motivation for
using vcpu->plane to detect validity is captured.  E.g. even if that detail were
explicitly stated in the docs, it would be easy to overlook when doing `git blame`
a few years from now.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/29] KVM: share statistics for same vCPU id on different planes
  2025-04-01 16:10 ` [PATCH 10/29] KVM: share statistics for same vCPU id on different planes Paolo Bonzini
@ 2025-06-06 16:23   ` Sean Christopherson
  2025-06-06 16:32     ` Paolo Bonzini
  0 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2025-06-06 16:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> Statistics are protected by vcpu->mutex; because KVM_RUN takes the
> plane-0 vCPU mutex, there is no race on applying statistics for all
> planes to the plane-0 kvm_vcpu struct.
> 
> This saves the burden on the kernel of implementing the binary stats
> interface for vCPU plane file descriptors, and on userspace of gathering
> info from multiple planes.  The disadvantage is a slight loss of
> information, and an extra pointer dereference when updating stats.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/arm64/kvm/arm.c                 |  2 +-
>  arch/arm64/kvm/handle_exit.c         |  6 +--
>  arch/arm64/kvm/hyp/nvhe/gen-hyprel.c |  4 +-
>  arch/arm64/kvm/mmio.c                |  4 +-
>  arch/loongarch/kvm/exit.c            |  8 ++--
>  arch/loongarch/kvm/vcpu.c            |  2 +-
>  arch/mips/kvm/emulate.c              |  2 +-
>  arch/mips/kvm/mips.c                 | 30 +++++++-------
>  arch/mips/kvm/vz.c                   | 18 ++++-----
>  arch/powerpc/kvm/book3s.c            |  2 +-
>  arch/powerpc/kvm/book3s_hv.c         | 46 ++++++++++-----------
>  arch/powerpc/kvm/book3s_hv_rm_xics.c |  8 ++--
>  arch/powerpc/kvm/book3s_pr.c         | 22 +++++-----
>  arch/powerpc/kvm/book3s_pr_papr.c    |  2 +-
>  arch/powerpc/kvm/powerpc.c           |  4 +-
>  arch/powerpc/kvm/timing.h            | 28 ++++++-------
>  arch/riscv/kvm/vcpu.c                |  2 +-
>  arch/riscv/kvm/vcpu_exit.c           | 10 ++---
>  arch/riscv/kvm/vcpu_insn.c           | 16 ++++----
>  arch/riscv/kvm/vcpu_sbi.c            |  2 +-
>  arch/riscv/kvm/vcpu_sbi_hsm.c        |  2 +-
>  arch/s390/kvm/diag.c                 | 18 ++++-----
>  arch/s390/kvm/intercept.c            | 20 +++++-----
>  arch/s390/kvm/interrupt.c            | 48 +++++++++++-----------
>  arch/s390/kvm/kvm-s390.c             |  8 ++--
>  arch/s390/kvm/priv.c                 | 60 ++++++++++++++--------------
>  arch/s390/kvm/sigp.c                 | 50 +++++++++++------------
>  arch/s390/kvm/vsie.c                 |  2 +-
>  arch/x86/kvm/debugfs.c               |  2 +-
>  arch/x86/kvm/hyperv.c                |  4 +-
>  arch/x86/kvm/kvm_cache_regs.h        |  4 +-
>  arch/x86/kvm/mmu/mmu.c               | 18 ++++-----
>  arch/x86/kvm/mmu/tdp_mmu.c           |  2 +-
>  arch/x86/kvm/svm/sev.c               |  2 +-
>  arch/x86/kvm/svm/svm.c               | 18 ++++-----
>  arch/x86/kvm/vmx/tdx.c               |  8 ++--
>  arch/x86/kvm/vmx/vmx.c               | 20 +++++-----
>  arch/x86/kvm/x86.c                   | 40 +++++++++----------
>  include/linux/kvm_host.h             |  5 ++-
>  virt/kvm/kvm_main.c                  | 19 ++++-----
>  40 files changed, 285 insertions(+), 283 deletions(-)

...

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index dbca418d64f5..d2e0c0e8ff17 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -393,7 +393,8 @@ struct kvm_vcpu {
>  	bool ready;
>  	bool scheduled_out;
>  	struct kvm_vcpu_arch arch;
> -	struct kvm_vcpu_stat stat;
> +	struct kvm_vcpu_stat *stat;
> +	struct kvm_vcpu_stat __stat;

Rather than special case individual fields, I think we should give kvm_vcpu the
same treatment as "struct kvm", and have kvm_vcpu represent the overall vCPU,
with an array of planes to hold the sub-vCPUs.

Having "kvm_vcpu" represent a plane, while "kvm" represents the overall VM, is
conceptually messy.  And more importantly, I think the approach taken here will
be nigh impossible to maintain, and will have quite a bit of baggage.  E.g. planes1+
will be filled with dead memory, and we also risk goofs where KVM could access
__stat in a plane1+ vCPU.

Documenting which fields are plane0-only, i.e. per-vCPU, via comments isn't
sustainable, whereas a hard split via structures will naturally show which fields
are scoped to the overall vCPU versus which are per-plane, and will force us to more
explicitly audit the code.  E.g. ____srcu_idx (and thus srcu_depth) is something
that I think should be shared by all planes.  Ditto for preempt_notifier, vcpu_id,
vcpu_idx, pid, etc.

Aha!  And to prove my point, this series breaks legacy signal handling, because
sigset_active and sigset are accessed using the plane1+ vCPU in kvm_vcpu_ioctl_run_plane(),
but KVM_SET_SIGNAL_MASK is only allowed to operate on plane0.  And I definitely
don't think the answer is to let KVM_SET_SIGNAL_MASK operate on plane1+, because
forcing userspace to duplicate the signal masks to all planes is pointless.

Yeeeaaap.  pid and pid_lock are also broken.  As is vmx_hwapic_isr_update()
and kvm_sched_out()'s usage of wants_to_run.  And guest_debug.

Long term, I just don't see this approach as being maintainable.  We're pretty
much guaranteed to end up with bugs where KVM operates on the wrong kvm_vcpu
structure due to lack of explicit isolation in code.  And those bugs are going
to be absolutely brutal to debug (or even notice).  E.g. failure to set "preempted"
on planes 1+ will mostly manifest as subtle performance issues.

Oof.  And that would force us to document that duplicating cpuid and cpu_caps to
planes1+ is actually necessary, due to dynamic CPUID features (ugh).  Though FWIW,
we could dodge that by special casing dynamic features, which isn't a bad idea
irrespective of planes.

Somewhat of a side topic: unless we need/want to explicitly support concurrent
GET/SET on planes of a vCPU, I think we should make vcpu->mutex per-vCPU, not
per-plane, so that there's zero chance of having bugs due to thinking that holding
vcpu->mutex provides protection against a race.

Extracting fields to a separate kvm_vcpu_plane will obviously require a *lot* more
churn, but I think in the long run it will be less work in total, because we won't
spend as much time chasing down bugs.

Very little per-plane state is in "struct kvm_vcpu", so I think we can do the big
conversion on a per-arch basis via a small number of #ifdefs, i.e. not be forced to
immediately convert all architectures to a kvm_vcpu vs. kvm_vcpu_plane world.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 20/29] KVM: x86: add planes support for interrupt delivery
  2025-04-01 16:10 ` [PATCH 20/29] KVM: x86: add planes support for interrupt delivery Paolo Bonzini
@ 2025-06-06 16:30   ` Sean Christopherson
  2025-06-06 16:38     ` Paolo Bonzini
  0 siblings, 1 reply; 49+ messages in thread
From: Sean Christopherson @ 2025-06-06 16:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> Plumb the destination plane into struct kvm_lapic_irq and propagate it
> everywhere.  The in-kernel IOAPIC only targets plane 0.

Can we get more aggressive and make KVM_CREATE_IRQCHIP mutually exclusive with
planes?  AIUI, literally every use case for planes is for folks that run split
IRQ chips.

And we should require an in-kernel local APIC to create a plane.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/29] KVM: share statistics for same vCPU id on different planes
  2025-06-06 16:23   ` Sean Christopherson
@ 2025-06-06 16:32     ` Paolo Bonzini
  0 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-06-06 16:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Fri, Jun 6, 2025 at 6:23 PM Sean Christopherson <seanjc@google.com> wrote:
> Rather than special case invidiual fields, I think we should give kvm_vcpu the
> same treatment as "struct kvm", and have kvm_vcpu represent the overall vCPU,
> with an array of planes to hold the sub-vCPUs.

Yes, I agree. This is also the direction that Roy took in
https://patchew.org/linux/cover.1726506534.git.roy.hopkins@suse.com/.
I thought it wasn't necessary, but it's bad for all the reasons you
mention before.

While he kept the struct on all planes for simplicity (something I stole
for __stat here), the idea is the same as what you mention below.

> Having "kvm_vcpu" represent a plane, while "kvm" represents the overall VM, is
> conceptually messy.  And more importantly, I think the approach taken here will
> be nigh impossible to maintain, and will have quite a bit of baggage.  E.g. planes1+
> will be filled with dead memory, and we also risk goofs where KVM could access
> __stat in a plane1+ vCPU.

Well, that's the reason for the __ prefix, so I don't think it's too risky -
but of course it's not possible to add __ to all fields.

Besides, if you have a zillion pointers to individual fields you might as
well have a single pointer to the common fields.

> Extracting fields to a separate kvm_vcpu_plane will obviously require a *lot* more
> churn, but I think in the long run it will be less work in total, because we won't
> spend as much time chasing down bugs.
>
> Very little per-plane state is in "struct kvm_vcpu", so I think we can do the big
> conversion on a per-arch basis via a small amount of #ifdefs, i.e. not be forced to
> immediately convert all architectures to a kvm_vcpu vs. kvm_vcpu_plane world.

Roy didn't even have a struct that is per-arch and common to all
planes. He did have a flag-day conversion to add "->common"
everywhere, but I agree that it's better to add something like

struct kvm_vcpu_plane {
...
#ifndef KVM_HAS_PLANES
#include "kvm_common_fields.h"
#endif
};

#ifdef KVM_HAS_PLANES
struct kvm_vcpu {
#include "kvm_common_fields.h"
};
#else
#define kvm_vcpu kvm_vcpu_plane
#endif
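
where kvm_common_fields.h holds whatever ends up shared across planes.  Just
as a guess at the kind of fields that would go there (the exact split is
precisely the part that needs auditing):

/* kvm_common_fields.h -- fields shared by all planes of a vCPU (example only) */
struct kvm *kvm;
int vcpu_id;
int vcpu_idx;
struct mutex mutex;			/* one lock for the whole vCPU */
struct kvm_vcpu_stat stat;		/* shared, cf. patch 10 */
struct kvm_dirty_ring dirty_ring;	/* shared, cf. patch 12 */

Exactly which fields qualify (requests, for example, are per-plane) is the
part that needs care.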

Paolo


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 20/29] KVM: x86: add planes support for interrupt delivery
  2025-06-06 16:30   ` Sean Christopherson
@ 2025-06-06 16:38     ` Paolo Bonzini
  0 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2025-06-06 16:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On 6/6/25 18:30, Sean Christopherson wrote:
> On Tue, Apr 01, 2025, Paolo Bonzini wrote:
>> Plumb the destination plane into struct kvm_lapic_irq and propagate it
>> everywhere.  The in-kernel IOAPIC only targets plane 0.
> 
> Can we get more aggressive and make KVM_CREATE_IRQCHIP mutually exclusive with
> planes?  AIUI, literally every use case for planes is for folks that run split
> IRQ chips.

Maybe, but is there any added complexity other than the "= {0}" to 
initialize the new field?  I'm ready to be proven wrong, but I do think 
that's one thing that just works.
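
For example, on the delivery side the new field can simply stay zero;
illustrative values only, not the actual ioapic_service() code:

static void example_deliver_fixed_irq(struct kvm *kvm)
{
	struct kvm_lapic_irq irq = {
		.vector = 0x30,			/* arbitrary example vector */
		.delivery_mode = APIC_DM_FIXED,
		.dest_mode = APIC_DEST_PHYSICAL,
		.dest_id = 0,
		.plane = 0,	/* new field; in-kernel IOAPIC only targets plane 0 */
	};

	kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
}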

> And we should require an in-kernel local APIC to create a plane.

I don't think that's needed either; if anything, the complexity of patch 25 
isn't needed with a userspace local APIC.

Paolo


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/29] KVM: VM planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (29 preceding siblings ...)
  2025-04-01 16:16 ` [RFC PATCH 00/29] KVM: VM planes Sean Christopherson
@ 2025-06-06 16:42 ` Tom Lendacky
  2025-08-07 12:34 ` Vaishali Thakkar
  31 siblings, 0 replies; 49+ messages in thread
From: Tom Lendacky @ 2025-06-06 16:42 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, ashish.kalra, michael.roth, jroedel, nsaenz,
	anelkz, James.Bottomley

On 4/1/25 11:10, Paolo Bonzini wrote:
> I guess April 1st is not the best date to send out such a large series
> after months of radio silence, but here we are.

There were some miscellaneous fixes I had to apply to get the series to
compile and start working properly. I didn't break them out by patch #,
but here they are:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 21dbc539cbe7..9d078eb001b1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1316,32 +1316,35 @@ static void kvm_lapic_deliver_interrupt(struct kvm_vcpu *vcpu, struct kvm_lapic
 {
 	struct kvm_vcpu *plane0_vcpu = vcpu->plane0;
 	struct kvm_plane *running_plane;
+	int irr_pending_planes;
 	u16 req_exit_planes;
 
 	kvm_x86_call(deliver_interrupt)(apic, delivery_mode, trig_mode, vector);
 
 	/*
-	 * test_and_set_bit implies a memory barrier, so IRR is written before
+	 * atomic_fetch_or implies a memory barrier, so IRR is written before
 	 * reading irr_pending_planes below...
 	 */
-	if (!test_and_set_bit(vcpu->plane, &plane0_vcpu->arch.irr_pending_planes)) {
-		/*
-		 * ... and also running_plane and req_exit_planes are read after writing
-		 * irr_pending_planes.  Both barriers pair with kvm_arch_vcpu_ioctl_run().
-		 */
-		smp_mb__after_atomic();
+	irr_pending_planes = atomic_fetch_or(BIT(vcpu->plane), &plane0_vcpu->arch.irr_pending_planes);
+	if (irr_pending_planes & BIT(vcpu->plane))
+		return;
 
-		running_plane = READ_ONCE(plane0_vcpu->running_plane);
-		if (!running_plane)
-			return;
+	/*
+	 * ... and also running_plane and req_exit_planes are read after writing
+	 * irr_pending_planes.  Both barriers pair with kvm_arch_vcpu_ioctl_run().
+	 */
+	smp_mb__after_atomic();
 
-		req_exit_planes = READ_ONCE(plane0_vcpu->req_exit_planes);
-		if (!(req_exit_planes & BIT(vcpu->plane)))
-			return;
+	running_plane = READ_ONCE(plane0_vcpu->running_plane);
+	if (!running_plane)
+		return;
 
-		kvm_make_request(KVM_REQ_PLANE_INTERRUPT,
-				 kvm_get_plane_vcpu(running_plane, vcpu->vcpu_id));
-	}
+	req_exit_planes = READ_ONCE(plane0_vcpu->req_exit_planes);
+	if (!(req_exit_planes & BIT(vcpu->plane)))
+		return;
+
+	kvm_make_request(KVM_REQ_PLANE_INTERRUPT,
+			 kvm_get_plane_vcpu(running_plane, vcpu->vcpu_id));
 }
 
 /*
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9d4492862c11..130d895f1d95 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -458,7 +458,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
 	INIT_LIST_HEAD(&sev->mirror_vms);
 	sev->need_init = false;
 
-	kvm_set_apicv_inhibit(kvm->planes[[0], APICV_INHIBIT_REASON_SEV);
+	kvm_set_apicv_inhibit(kvm->planes[0], APICV_INHIBIT_REASON_SEV);
 
 	return 0;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 917bfe8db101..656b69eabc59 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3252,7 +3252,7 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	 * All vCPUs which run still run nested, will remain to have their
 	 * AVIC still inhibited due to per-cpu AVIC inhibition.
 	 */
-	kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
+	kvm_clear_apicv_inhibit(vcpu->kvm->planes[vcpu->plane], APICV_INHIBIT_REASON_IRQWIN);
 
 	++vcpu->stat->irq_window_exits;
 	return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65bc28e82140..704e8f80898f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11742,7 +11742,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 * the other side will certainly see the cleared bit irr_pending_planes
 		 * and set it, and vice versa.
 		 */
-		clear_bit(plane_id, &plane0_vcpu->arch.irr_pending_planes);
+		atomic_and(~BIT(plane_id), &plane0_vcpu->arch.irr_pending_planes);
 		smp_mb__after_atomic();
 		if (kvm_lapic_find_highest_irr(vcpu))
 			atomic_or(BIT(plane_id), &plane0_vcpu->arch.irr_pending_planes);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3a04fdf0865d..efd45e05fddf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4224,7 +4224,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm_plane *plane, struct kvm_vcpu *pl
 	 * release semantics, which ensures the write is visible to kvm_get_vcpu().
 	 */
 	vcpu->plane = -1;
-	if (plane->plane)
+	if (!plane->plane)
 		vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
 	else
 		vcpu->vcpu_idx = plane0_vcpu->vcpu_idx;
@@ -4249,7 +4249,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm_plane *plane, struct kvm_vcpu *pl
 	if (r < 0)
 		goto kvm_put_xa_erase;
 
-	if (!plane0_vcpu)
+	if (!plane->plane)
 		atomic_inc(&kvm->online_vcpus);
 
 	/*

Thanks,
Tom

> 

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 26/29] KVM: x86: enable up to 16 planes
  2025-04-01 16:11 ` [PATCH 26/29] KVM: x86: enable up to 16 planes Paolo Bonzini
@ 2025-06-06 22:41   ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-06-06 22:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> Allow up to 16 VM planes, it's a nice round number.
> 
> FIXME: online_vcpus is used by x86 code that deals with TSC synchronization.
> Maybe kvmclock should be moved to planex.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 3 +++
>  arch/x86/kvm/x86.c              | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0344e8bed319..d0cb177b6f52 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2339,6 +2339,8 @@ enum {
>  # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
>  #endif
>  
> +#define KVM_MAX_VCPU_PLANES	16

I'm pretty sure x86 can't support 16 planes.  "union kvm_mmu_page_role" needs
to incorporate the plane, otherwise per-plane memory attributes won't work.
And adding four bits for the plane would theoretically put us in danger of
overflowing gfn_write_track (in practice, I highly, highly doubt that can happen).

Why not start with 4 planes?  Or even 2?  Expanding the number of planes should
be much easier than contracting.  Based on the VTL and VMPL roadmaps, 4 planes
will probably be enough for many years to come.
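
Sketch of what I have in mind, assuming a cap of 4 planes so that two role
bits are enough (the exact placement would need to be checked against the
current layout of the union):

union kvm_mmu_page_role {
	u32 word;
	struct {
		/* ... all of the existing fields ... */
		unsigned plane:2;	/* new: key shadow pages per plane, 4 planes max */
		/* ... the smm byte stays where it is ... */
	};
};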

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/29] KVM: move mem_attr_array to kvm_plane
  2025-04-01 16:10 ` [PATCH 06/29] KVM: move mem_attr_array to kvm_plane Paolo Bonzini
@ 2025-06-06 22:50   ` Sean Christopherson
  0 siblings, 0 replies; 49+ messages in thread
From: Sean Christopherson @ 2025-06-06 22:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, roy.hopkins, thomas.lendacky, ashish.kalra,
	michael.roth, jroedel, nsaenz, anelkz, James.Bottomley

On Tue, Apr 01, 2025, Paolo Bonzini wrote:
> Another aspect of the VM that is now different for separate planes is
> memory attributes, in order to support RWX permissions in the future.
> The existing vm-level ioctls apply to plane 0 and the underlying
> functionality operates on struct kvm_plane, which now hosts the
> mem_attr_array xarray.

...

> -bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> +bool kvm_arch_post_set_memory_attributes(struct kvm_plane *plane,
>  					 struct kvm_gfn_range *range)
>  {
> +	struct kvm *kvm = plane->kvm;
>  	unsigned long attrs = range->arg.attributes;
>  	struct kvm_memory_slot *slot = range->slot;
>  	int level;
> @@ -7767,7 +7770,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>  			 */
>  			if (gfn >= slot->base_gfn &&
>  			    gfn + nr_pages <= slot->base_gfn + slot->npages) {
> -				if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
> +				if (hugepage_has_attrs(plane, slot, gfn, level, attrs))
>  					hugepage_clear_mixed(slot, gfn, level);
>  				else
>  					hugepage_set_mixed(slot, gfn, level);

I don't see how this can possibly work.  Memslots are still per-VM, and so
setting/clearing KVM_LPAGE_MIXED_FLAG based on a given plane's attributes will
clobber the state of the previous plane.

I think we could make this work by having a per-plane KVM_LPAGE_MIXED_FLAG?  I'm
99% certain we can use disallow_lpage[31:28], and _probably_ bits 31:16?  But I'd
rather

Note, to handle shared/private, we could make planes mutually exclusive with
tracking that state per-VM (see the many guest_memfd discussions), but unless I'm
missing something, we'll need the same logic for mixed RWX attributes, so...

Also, as mentioned in a later response, planes need to be keyed in kvm_mmu_page_role
for this to work.
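
FWIW, the per-plane flag itself would be trivial, e.g. (macro name made up,
assuming a cap of 4 planes so that disallow_lpage[31:28] is enough):

#define KVM_LPAGE_PLANE_MIXED_FLAG(plane)	BIT(28 + (plane))

static void hugepage_set_mixed_plane(struct kvm_memory_slot *slot, gfn_t gfn,
				     int level, int plane)
{
	/* Replaces the single KVM_LPAGE_MIXED_FLAG bit with one bit per plane. */
	lpage_info_slot(gfn, slot, level)->disallow_lpage |=
		KVM_LPAGE_PLANE_MIXED_FLAG(plane);
}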

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/29] KVM: VM planes
  2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
                   ` (30 preceding siblings ...)
  2025-06-06 16:42 ` Tom Lendacky
@ 2025-08-07 12:34 ` Vaishali Thakkar
  31 siblings, 0 replies; 49+ messages in thread
From: Vaishali Thakkar @ 2025-08-07 12:34 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: roy.hopkins, seanjc, thomas.lendacky, ashish.kalra, michael.roth,
	nsaenz, anelkz, James.Bottomley, Jörg Rödel

Adding Joerg's working email address.

On 4/1/25 6:10 PM, Paolo Bonzini wrote:
> I guess April 1st is not the best date to send out such a large series
> after months of radio silence, but here we are.
> 
> AMD VMPLs, Intel TDX partitions, Microsoft Hyper-V VTLs, and ARM CCA planes.
> are all examples of virtual privilege level concepts that are exclusive to
> guests.  In all these specifications the hypervisor hosts multiple
> copies of a vCPU's register state (or at least of most of it) and provides
> hypercalls or instructions to switch between them.
> 
> This is the first draft of the implementation according to the sketch that
> was prepared last year between Linux Plumbers and KVM Forum.  The initial
> version of the API was posted last October, and the implementation only
> needed small changes.
> 
> Attempts made in the past, mostly in the context of Hyper-V VTLs and SEV-SNP
> VMPLs, fell into two categories:
> 
> - use a single vCPU file descriptor, and store multiple copies of the state
>   in a single struct kvm_vcpu.  This approach requires a lot of changes to
>   provide multiple copies of affected fields, especially MMUs and APICs;
>   and complex uAPI extensions to direct existing ioctls to a specific
>   privilege level.  While more or less workable for SEV-SNP VMPLs, that
>   was only because the copies of the register state were hidden
>   in the VMSA (KVM does not manage it); it showed all its problems when
>   applied to Hyper-V VTLs.
> 
>   The main advantage was that KVM kept the knowledge of the relationship
>   between vCPUs that have the same id but belong to different privilege
>   levels.  This is important in order to accelerate switches in-kernel.
> 
> - use multiple VM and vCPU file descriptors, and handle the switch entirely
>   in userspace.  This got gnarly pretty fast for even more reasons than
>   the previous case, for example because VMs could not share anymore
>   memslots, including dirty bitmaps and private/shared attributes (a
>   substantial problem for SEV-SNP since VMPLs share their ASID).
> 
>   Opposite to the other case, the total lack of kernel-level sharing of
>   register state, and lack of control that vCPUs do not run in parallel,
>   is what makes this approach problematic for both kernel and userspace.
>   In-kernel implementation of privilege level switch becomes from
>   complicated to impossible, and userspace needs a lot of complexity
>   as well to ensure that higher-privileged VTLs properly interrupted a
>   lower-privileged one.
> 
> This design sits squarely in the middle: it gives the initial set of
> VM and vCPU file descriptors the full set of ioctls + struct kvm_run,
> whereas other privilege levels ("planes") instead only support a small
> part of the KVM API.  In fact for the vm file descriptor it is only three
> ioctls: KVM_CHECK_EXTENSION, KVM_SIGNAL_MSI, KVM_SET_MEMORY_ATTRIBUTES.
> For vCPUs it is basically KVM_GET/SET_*.
> 
> Most notably, memslots and KVM_RUN are *not* included (the choice of
> which plane to run is done via vcpu->run), which solves a lot of
> the problems in both of the previous approaches.  Compared to the
> multiple-file-descriptors solution, it gets for free the ability to
> avoid parallel execution of the same vCPUs in different privilege levels.
> Compared to having a single file descriptor churn is more limited, or
> at least can be attacked in small bites.  For example in this series
> only per-plane interrupt controllers are switched to use the new struct
> kvm_plane in place of struct kvm, and that's more or less enough in
> the absence of complex interrupt delivery scenarios.
> 
> Changes to the userspace API are also relatively small; they boil down
> to the introduction of a single new kind of file descriptor and almost
> entirely fit in common code.  Reviewing these VM-wide and architecture-
> independent changes should be the main purpose of this RFC, since 
> there are still some things to fix:
> 
> - I named some fields "plane" instead of "plane_id" because I expected no
>   fields of type struct kvm_plane*, but in retrospect that wasn't a great
>   idea.
> 
> - online_vcpus counts across all planes but x86 code is still using it to
>   deal with TSC synchronization.  Probably I will try and make kvmclock
>   synchronization per-plane instead of per-VM.
> 

Hi Paolo,

Is there still a plan to make kvmclock synchronization per-plane instead
of per-VM? Do you plan to handle it as part of this patchset, or do you
think it should be handled separately on top of it?

I'm asking because coconut-svsm needs a monotonic clock source that adheres
to wall-clock time, and we have been exploring several approaches to
achieve this. One of the ideas is to use kvmclock, provided it can
support a per-plane instance that remains synchronized across planes.

Thanks.


> - we're going to need a struct kvm_vcpu_plane similar to what Roy had in
>   https://lore.kernel.org/kvm/cover.1726506534.git.roy.hopkins@suse.com/
>   (probably smaller though).  Requests are per-plane for example, and I'm
>   pretty sure any simplistic solution would have some corner cases where
>   it's wrong; but it's a high churn change and I wanted to avoid that
>   for this first posting.
> 
> There's a handful of locking TODOs where things should be checked more
> carefully, but clearly identifying vCPU data that is not per-plane will
> also simplify locking, thanks to having a single vcpu->mutex for the
> whole plane.  So I'm not particularly worried about that; the TDX saga
> hopefully has taught everyone to move in baby steps towards the intended
> direction.
> 
> The handling of interrupt priorities is way more complicated than I
> anticipated, unfortunately; everything else seems to fall into place
> decently well---even taking into account the above incompleteness,
> which anyway should not be a blocker for any VTL or VMPL experiments.
> But do shout if anything makes you feel like I was too lazy, and/or you
> want to puke.
> 
> Patches 1-2 are documentation and uAPI definitions.
> 
> Patches 3-9 are the common code for VM planes, while patches 10-14
> are the common code for vCPU file descriptors on non-default planes.
> 
> Patches 15-26 are the x86-specific code, which is organized as follows:
> 
> - 15-20: convert APIC code to place its data in the new struct
> kvm_arch_plane instead of struct kvm_arch.
> 
> - 21-24: everything else except the new userspace exit, KVM_EXIT_PLANE_EVENT
> 
> - 25: KVM_EXIT_PLANE_EVENT, which is used when one plane interrupts another.
> 
> - 26: finally make the capability available to userspace
> 
> Patches 27-29 finally are the testcases.  More are possible and planned,
> but these are enough to say that, despite the missing bits, what exits
> is not _completely_ broken.  I also didn't want to write dozens of tests
> before committing to a selftests API.
> 
> Available for now at https://git.kernel.org/pub/scm/virt/kvm/kvm.git
> branch planes-20250401.  I plan to place it in kvm-coco-queue, for lack
> of a better place, as soon as TDX is merged into kvm/next and I test it
> with the usual battery of kvm-unit-tests and real world guests.
> 
> Thanks,
> 
> Paolo
> 
> Paolo Bonzini (29):
>   Documentation: kvm: introduce "VM plane" concept
>   KVM: API definitions for plane userspace exit
>   KVM: add plane info to structs
>   KVM: introduce struct kvm_arch_plane
>   KVM: add plane support to KVM_SIGNAL_MSI
>   KVM: move mem_attr_array to kvm_plane
>   KVM: do not use online_vcpus to test vCPU validity
>   KVM: move vcpu_array to struct kvm_plane
>   KVM: implement plane file descriptors ioctl and creation
>   KVM: share statistics for same vCPU id on different planes
>   KVM: anticipate allocation of dirty ring
>   KVM: share dirty ring for same vCPU id on different planes
>   KVM: implement vCPU creation for extra planes
>   KVM: pass plane to kvm_arch_vcpu_create
>   KVM: x86: pass vcpu to kvm_pv_send_ipi()
>   KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit
>   KVM: x86: block creating irqchip if planes are active
>   KVM: x86: track APICv inhibits per plane
>   KVM: x86: move APIC map to kvm_arch_plane
>   KVM: x86: add planes support for interrupt delivery
>   KVM: x86: add infrastructure to share FPU across planes
>   KVM: x86: implement initial plane support
>   KVM: x86: extract kvm_post_set_cpuid
>   KVM: x86: initialize CPUID for non-default planes
>   KVM: x86: handle interrupt priorities for planes
>   KVM: x86: enable up to 16 planes
>   selftests: kvm: introduce basic test for VM planes
>   selftests: kvm: add plane infrastructure
>   selftests: kvm: add x86-specific plane test
> 
>  Documentation/virt/kvm/api.rst                | 245 +++++++--
>  Documentation/virt/kvm/locking.rst            |   3 +
>  Documentation/virt/kvm/vcpu-requests.rst      |   7 +
>  arch/arm64/include/asm/kvm_host.h             |   5 +
>  arch/arm64/kvm/arm.c                          |   4 +-
>  arch/arm64/kvm/handle_exit.c                  |   6 +-
>  arch/arm64/kvm/hyp/nvhe/gen-hyprel.c          |   4 +-
>  arch/arm64/kvm/mmio.c                         |   4 +-
>  arch/loongarch/include/asm/kvm_host.h         |   5 +
>  arch/loongarch/kvm/exit.c                     |   8 +-
>  arch/loongarch/kvm/vcpu.c                     |   4 +-
>  arch/mips/include/asm/kvm_host.h              |   5 +
>  arch/mips/kvm/emulate.c                       |   2 +-
>  arch/mips/kvm/mips.c                          |  32 +-
>  arch/mips/kvm/vz.c                            |  18 +-
>  arch/powerpc/include/asm/kvm_host.h           |   5 +
>  arch/powerpc/kvm/book3s.c                     |   2 +-
>  arch/powerpc/kvm/book3s_hv.c                  |  46 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c          |   8 +-
>  arch/powerpc/kvm/book3s_pr.c                  |  22 +-
>  arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
>  arch/powerpc/kvm/powerpc.c                    |   6 +-
>  arch/powerpc/kvm/timing.h                     |  28 +-
>  arch/riscv/include/asm/kvm_host.h             |   5 +
>  arch/riscv/kvm/vcpu.c                         |   4 +-
>  arch/riscv/kvm/vcpu_exit.c                    |  10 +-
>  arch/riscv/kvm/vcpu_insn.c                    |  16 +-
>  arch/riscv/kvm/vcpu_sbi.c                     |   2 +-
>  arch/riscv/kvm/vcpu_sbi_hsm.c                 |   2 +-
>  arch/s390/include/asm/kvm_host.h              |   5 +
>  arch/s390/kvm/diag.c                          |  18 +-
>  arch/s390/kvm/intercept.c                     |  20 +-
>  arch/s390/kvm/interrupt.c                     |  48 +-
>  arch/s390/kvm/kvm-s390.c                      |  10 +-
>  arch/s390/kvm/priv.c                          |  60 +--
>  arch/s390/kvm/sigp.c                          |  50 +-
>  arch/s390/kvm/vsie.c                          |   2 +-
>  arch/x86/include/asm/kvm_host.h               |  46 +-
>  arch/x86/kvm/cpuid.c                          |  57 +-
>  arch/x86/kvm/cpuid.h                          |   2 +
>  arch/x86/kvm/debugfs.c                        |   2 +-
>  arch/x86/kvm/hyperv.c                         |   7 +-
>  arch/x86/kvm/i8254.c                          |   7 +-
>  arch/x86/kvm/ioapic.c                         |   4 +-
>  arch/x86/kvm/irq_comm.c                       |  14 +-
>  arch/x86/kvm/kvm_cache_regs.h                 |   4 +-
>  arch/x86/kvm/lapic.c                          | 147 +++--
>  arch/x86/kvm/mmu/mmu.c                        |  41 +-
>  arch/x86/kvm/mmu/tdp_mmu.c                    |   2 +-
>  arch/x86/kvm/svm/sev.c                        |   4 +-
>  arch/x86/kvm/svm/svm.c                        |  21 +-
>  arch/x86/kvm/vmx/tdx.c                        |   8 +-
>  arch/x86/kvm/vmx/vmx.c                        |  20 +-
>  arch/x86/kvm/x86.c                            | 319 ++++++++---
>  arch/x86/kvm/xen.c                            |   1 +
>  include/linux/kvm_host.h                      | 130 +++--
>  include/linux/kvm_types.h                     |   1 +
>  include/uapi/linux/kvm.h                      |  28 +-
>  tools/testing/selftests/kvm/Makefile.kvm      |   2 +
>  .../testing/selftests/kvm/include/kvm_util.h  |  48 ++
>  .../selftests/kvm/include/x86/processor.h     |   1 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  65 ++-
>  .../testing/selftests/kvm/lib/x86/processor.c |  15 +
>  tools/testing/selftests/kvm/plane_test.c      | 103 ++++
>  tools/testing/selftests/kvm/x86/plane_test.c  | 270 ++++++++++
>  virt/kvm/dirty_ring.c                         |   5 +-
>  virt/kvm/guest_memfd.c                        |   3 +-
>  virt/kvm/irqchip.c                            |   5 +-
>  virt/kvm/kvm_main.c                           | 500 ++++++++++++++----
>  69 files changed, 1991 insertions(+), 614 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/plane_test.c
>  create mode 100644 tools/testing/selftests/kvm/x86/plane_test.c
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2025-08-07 12:34 UTC | newest]

Thread overview: 49+ messages
2025-04-01 16:10 [RFC PATCH 00/29] KVM: VM planes Paolo Bonzini
2025-04-01 16:10 ` [PATCH 01/29] Documentation: kvm: introduce "VM plane" concept Paolo Bonzini
2025-04-21 18:43   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 02/29] KVM: API definitions for plane userspace exit Paolo Bonzini
2025-06-04  0:10   ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 03/29] KVM: add plane info to structs Paolo Bonzini
2025-04-21 18:57   ` Tom Lendacky
2025-04-21 19:04   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 04/29] KVM: introduce struct kvm_arch_plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 05/29] KVM: add plane support to KVM_SIGNAL_MSI Paolo Bonzini
2025-04-01 16:10 ` [PATCH 06/29] KVM: move mem_attr_array to kvm_plane Paolo Bonzini
2025-06-06 22:50   ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 07/29] KVM: do not use online_vcpus to test vCPU validity Paolo Bonzini
2025-06-05 22:45   ` Sean Christopherson
2025-06-06 13:49     ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 08/29] KVM: move vcpu_array to struct kvm_plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 09/29] KVM: implement plane file descriptors ioctl and creation Paolo Bonzini
2025-04-21 20:32   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 10/29] KVM: share statistics for same vCPU id on different planes Paolo Bonzini
2025-06-06 16:23   ` Sean Christopherson
2025-06-06 16:32     ` Paolo Bonzini
2025-04-01 16:10 ` [PATCH 11/29] KVM: anticipate allocation of dirty ring Paolo Bonzini
2025-04-01 16:10 ` [PATCH 12/29] KVM: share dirty ring for same vCPU id on different planes Paolo Bonzini
2025-04-21 21:51   ` Tom Lendacky
2025-04-01 16:10 ` [PATCH 13/29] KVM: implement vCPU creation for extra planes Paolo Bonzini
2025-04-21 22:08   ` Tom Lendacky
2025-06-05 22:47     ` Sean Christopherson
2025-04-01 16:10 ` [PATCH 14/29] KVM: pass plane to kvm_arch_vcpu_create Paolo Bonzini
2025-04-01 16:10 ` [PATCH 15/29] KVM: x86: pass vcpu to kvm_pv_send_ipi() Paolo Bonzini
2025-04-01 16:10 ` [PATCH 16/29] KVM: x86: split "if" in __kvm_set_or_clear_apicv_inhibit Paolo Bonzini
2025-04-01 16:10 ` [PATCH 17/29] KVM: x86: block creating irqchip if planes are active Paolo Bonzini
2025-04-01 16:10 ` [PATCH 18/29] KVM: x86: track APICv inhibits per plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 19/29] KVM: x86: move APIC map to kvm_arch_plane Paolo Bonzini
2025-04-01 16:10 ` [PATCH 20/29] KVM: x86: add planes support for interrupt delivery Paolo Bonzini
2025-06-06 16:30   ` Sean Christopherson
2025-06-06 16:38     ` Paolo Bonzini
2025-04-01 16:10 ` [PATCH 21/29] KVM: x86: add infrastructure to share FPU across planes Paolo Bonzini
2025-04-01 16:10 ` [PATCH 22/29] KVM: x86: implement initial plane support Paolo Bonzini
2025-04-01 16:11 ` [PATCH 23/29] KVM: x86: extract kvm_post_set_cpuid Paolo Bonzini
2025-04-01 16:11 ` [PATCH 24/29] KVM: x86: initialize CPUID for non-default planes Paolo Bonzini
2025-04-01 16:11 ` [PATCH 25/29] KVM: x86: handle interrupt priorities for planes Paolo Bonzini
2025-04-01 16:11 ` [PATCH 26/29] KVM: x86: enable up to 16 planes Paolo Bonzini
2025-06-06 22:41   ` Sean Christopherson
2025-04-01 16:11 ` [PATCH 27/29] selftests: kvm: introduce basic test for VM planes Paolo Bonzini
2025-04-01 16:11 ` [PATCH 28/29] selftests: kvm: add plane infrastructure Paolo Bonzini
2025-04-01 16:11 ` [PATCH 29/29] selftests: kvm: add x86-specific plane test Paolo Bonzini
2025-04-01 16:16 ` [RFC PATCH 00/29] KVM: VM planes Sean Christopherson
2025-06-06 16:42 ` Tom Lendacky
2025-08-07 12:34 ` Vaishali Thakkar
