public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 00/10] KVM: nVMX: Improve performance for unmanaged guest memory
From: griffoul @ 2025-11-18 17:11 UTC
  To: kvm
  Cc: seanjc, pbonzini, vkuznets, shuah, dwmw, linux-kselftest,
	linux-kernel, Fred Griffoul

From: Fred Griffoul <fgriffo@amazon.co.uk>

This patch series addresses both performance and correctness issues in
nested VMX when handling guest memory.

During nested VMX operations, L0 (KVM) accesses specific L1 guest pages
to manage L2 execution. These pages fall into two categories: pages
accessed only by L0 (such as the L1 MSR bitmap page or the eVMCS page),
and pages passed to the L2 guest via vmcs02 (such as APIC access,
virtual APIC, and posted interrupt descriptor pages).

The current implementation uses kvm_vcpu_map()/kvm_vcpu_unmap(), which
causes two issues.

First, the current approach lacks invalidation handling in two
scenarios. Enlightened VMCS (eVMCS) page mappings can become stale when
memslots are modified, as there is no mechanism to invalidate the cached
mappings. Similarly, APIC access and virtual APIC pages can be migrated
by the host, but because the mappings receive no notification through
mmu_notifier callbacks, they become invalid and can lead to incorrect
behavior.

Second, for unmanaged guest memory (memory not directly mapped by the
kernel, such as memory passed with the mem= parameter or guest_memfd for
non-CoCo VMs), this workflow invokes expensive memremap/memunmap
operations on every L2 VM entry/exit cycle, adding significant overhead
to nested virtualization.

This series replaces kvm_host_map with gfn_to_pfn_cache in nested VMX.
The pfncache infrastructure maintains persistent mappings as long as the
page GPA does not change, eliminating the memremap/memunmap overhead on
every VM entry/exit cycle. Additionally, pfncache provides proper
invalidation handling via mmu_notifier callbacks and a memslot
generation check, ensuring that mappings are correctly updated during
both memslot updates and page migration events.
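
To illustrate the caching behavior described above, here is a minimal
userspace model. It is only a sketch under stated assumptions: every
name in it (model_cache, model_cache_get, and so on) is hypothetical
and none of it is the kernel's gfn_to_pfn_cache API. For brevity the
model keys invalidation on the guest address, whereas real mmu_notifier
events describe host virtual address ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; these are NOT KVM structures or APIs. */
struct model_cache {
	uint64_t gpa;          /* guest physical address currently cached */
	uint64_t generation;   /* memslot generation seen at mapping time */
	bool valid;            /* cleared by an invalidation event */
	unsigned long remaps;  /* count of (expensive) mapping operations */
};

/* Global memslot generation, bumped on every modelled memslot update. */
static uint64_t model_memslot_generation = 1;

/* Stand-in for the slow path that would call memremap(). */
static void model_remap(struct model_cache *c, uint64_t gpa)
{
	c->gpa = gpa;
	c->generation = model_memslot_generation;
	c->valid = true;
	c->remaps++;
}

/*
 * Return true when the existing mapping is reused (fast path).
 * Otherwise remap and return false.
 */
static bool model_cache_get(struct model_cache *c, uint64_t gpa)
{
	if (c->valid && c->gpa == gpa &&
	    c->generation == model_memslot_generation)
		return true;

	model_remap(c, gpa);
	return false;
}

/* Models an invalidation event covering the range [start, end). */
static void model_invalidate_range(struct model_cache *c,
				   uint64_t start, uint64_t end)
{
	assert(start <= end);
	if (c->valid && c->gpa >= start && c->gpa < end)
		c->valid = false;
}

/* Models a memslot update: every cached generation becomes stale. */
static void model_memslots_changed(void)
{
	model_memslot_generation++;
}
```

In this model, repeated lookups of the same address cost one mapping
operation in total, while an invalidation, a memslot update, or a
change of address each force exactly one remap on the next lookup.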

As an example, a microbenchmark using memslot_perf_test with 8192
memslots shows speedups between roughly 17x and 2350x in nested VMX
operations with unmanaged guest memory:

                        Before          After           Improvement
  map:                  26.12s          1.54s           ~17x faster
  unmap:                40.00s          0.017s          ~2353x faster
  unmap chunked:        10.07s          0.005s          ~2014x faster
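
The improvement column is the ratio of the "before" time to the "after"
time. A small sketch of that arithmetic follows; the helper names are
hypothetical and are not part of memslot_perf_test.

```c
/* Hypothetical helpers; not part of the series or of memslot_perf_test. */

/* Speedup factor: how many times shorter the "after" time is. */
static double model_speedup(double before_s, double after_s)
{
	return before_s / after_s;
}

/* Round a positive value to the nearest integer without needing libm. */
static long model_round(double x)
{
	return (long)(x + 0.5);
}
```

For example, model_round(model_speedup(26.12, 1.54)) gives 17, which
matches the "map" row.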

The series is organized as follows:

Patches 1-5 handle the L1 MSR bitmap page and system pages (APIC access,
virtual APIC, and posted interrupt descriptor). Patch 1 converts the MSR
bitmap to use gfn_to_pfn_cache. Patches 2-3 restore and complete
"guest-uses-pfn" support in pfncache. Patch 4 converts the system pages
to use gfn_to_pfn_cache. Patch 5 adds a selftest for cache invalidation
and memslot updates.

Patches 6-7 add enlightened VMCS support. Patch 6 avoids accessing eVMCS
fields after they are copied into the cached vmcs12 structure. Patch 7
converts eVMCS page mapping to use gfn_to_pfn_cache.

Patches 8-10 implement persistent nested context to handle L2 vCPU
multiplexing and migration between L1 vCPUs. Patch 8 introduces the
nested context management infrastructure. Patch 9 integrates pfncache
with persistent nested context. Patch 10 adds a selftest for this L2
vCPU context switching.
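
As a rough illustration of why a persistent context helps when one L1
vCPU multiplexes several L2 vCPUs, here is a userspace sketch. It is
not taken from the patches: the pool size, the choice of the vmcs12
address as lookup key, the least-recently-used eviction, and all names
are assumptions made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CONTEXTS 4	/* arbitrary pool size for the example */

/* Hypothetical per-L2 context; stands in for a set of cached mappings. */
struct model_nested_ctx {
	uint64_t vmcs12_gpa;	/* assumed key identifying the L2 vCPU */
	uint64_t last_used;	/* logical timestamp for LRU eviction */
	int in_use;
};

struct model_ctx_pool {
	struct model_nested_ctx ctx[MODEL_NR_CONTEXTS];
	uint64_t clock;		/* incremented on every lookup */
	unsigned long setups;	/* times a context had to be (re)built */
};

/*
 * Find the context for vmcs12_gpa. On a hit the context, and with it
 * any cached mappings, is reused. On a miss a free slot is taken, or
 * the least recently used context is recycled.
 */
static struct model_nested_ctx *
model_ctx_get(struct model_ctx_pool *p, uint64_t vmcs12_gpa)
{
	struct model_nested_ctx *victim = &p->ctx[0];
	size_t i;

	p->clock++;
	for (i = 0; i < MODEL_NR_CONTEXTS; i++) {
		struct model_nested_ctx *c = &p->ctx[i];

		if (c->in_use && c->vmcs12_gpa == vmcs12_gpa) {
			c->last_used = p->clock;
			return c;	/* hit: nothing to rebuild */
		}
		/* Prefer a free slot, otherwise the oldest context. */
		if (!c->in_use) {
			if (victim->in_use)
				victim = c;
		} else if (victim->in_use &&
			   c->last_used < victim->last_used) {
			victim = c;
		}
	}

	/* Miss: (re)initialize the chosen slot for this L2 vCPU. */
	victim->vmcs12_gpa = vmcs12_gpa;
	victim->last_used = p->clock;
	victim->in_use = 1;
	p->setups++;
	return victim;
}
```

In this model, switching back and forth between a small set of L2
vCPUs costs one setup per L2 vCPU rather than one per switch, as long
as the set fits in the pool.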

v2:
  - Extended series to support enlightened VMCS (eVMCS).
  - Added persistent nested context for improved L2 vCPU handling.
  - Added additional selftests.

Suggested-by: dwmw@amazon.co.uk


Fred Griffoul (10):
  KVM: nVMX: Implement cache for L1 MSR bitmap
  KVM: pfncache: Restore guest-uses-pfn support
  KVM: x86: Add nested state validation for pfncache support
  KVM: nVMX: Implement cache for L1 APIC pages
  KVM: selftests: Add nested VMX APIC cache invalidation test
  KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry
  KVM: nVMX: Replace evmcs kvm_host_map with pfncache
  KVM: x86: Add nested context management
  KVM: nVMX: Use nested context for pfncache persistence
  KVM: selftests: Add L2 vcpu context switch test

 arch/x86/include/asm/kvm_host.h               |  32 ++
 arch/x86/include/uapi/asm/kvm.h               |   2 +
 arch/x86/kvm/Makefile                         |   2 +-
 arch/x86/kvm/nested.c                         | 199 ++++++++
 arch/x86/kvm/vmx/hyperv.c                     |   5 +-
 arch/x86/kvm/vmx/hyperv.h                     |  33 +-
 arch/x86/kvm/vmx/nested.c                     | 463 ++++++++++++++----
 arch/x86/kvm/vmx/vmx.c                        |   8 +
 arch/x86/kvm/vmx/vmx.h                        |  16 +-
 arch/x86/kvm/x86.c                            |  19 +-
 include/linux/kvm_host.h                      |  34 +-
 include/linux/kvm_types.h                     |   1 +
 tools/testing/selftests/kvm/Makefile.kvm      |   2 +
 .../selftests/kvm/x86/vmx_apic_update_test.c  | 302 ++++++++++++
 .../selftests/kvm/x86/vmx_l2_switch_test.c    | 416 ++++++++++++++++
 virt/kvm/kvm_main.c                           |   3 +-
 virt/kvm/kvm_mm.h                             |   6 +-
 virt/kvm/pfncache.c                           |  43 +-
 18 files changed, 1467 insertions(+), 119 deletions(-)
 create mode 100644 arch/x86/kvm/nested.c
 create mode 100644 tools/testing/selftests/kvm/x86/vmx_apic_update_test.c
 create mode 100644 tools/testing/selftests/kvm/x86/vmx_l2_switch_test.c

--
2.43.0

