kvm.vger.kernel.org archive mirror
* [GIT PULL] KVM: x86: Changes for 6.17
@ 2025-07-25 22:07 Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Local APIC refactoring " Sean Christopherson
                   ` (12 more replies)
  0 siblings, 13 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

As promised, the storm has arrived :-)

There are two anomalies this time around, but thankfully only one conflict,
and a trivial one at that (details on that in the MMIO Stale Data pull request).

1. The "no assignment" pull request depends on the IRQs and MMIO Stale Data
   pull requests.  I created the topic branch based on the IRQs branch (minus
   one commit that came in later), and then merged in the MMIO branch to create
   a common base.  All the commits came out as I wanted, but the diff stats
   generated by `git request-pull` are funky, so I doctored them up, a lot.

2. The "SEV cache maintenance" pull request is based on a tag/branch from the
   tip tree.  I don't think you need to do anything special here?  Except
   possibly mention it to Linus if the KVM pull request happens to get sent
   before the associated tip pull request (which seems unlikely given how they
   send a bunch of small pulls).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [GIT PULL] KVM: x86: Local APIC refactoring for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: Dirty Ring changes " Sean Christopherson
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Move most of KVM's local APIC helpers to common x86 (with new names that are
better suited for global symbols) so that they can be shared by Secure AVIC
(guest side) support[*].  The actual Secure AVIC support is likely destined
for 6.18 or later, and will go through the tip tree (shouldn't have to touch
anything KVM related).

[*] https://lore.kernel.org/all/20250709033242.267892-1-Neeraj.Upadhyay@amd.com

The following changes since commit d7b8f8e20813f0179d8ef519541a3527e7661d3a:

  Linux 6.16-rc5 (2025-07-06 14:10:26 -0700)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-apic-6.17

for you to fetch changes up to b95a9d313642c9f3abebb77a04b41bb7bdd0feef:

  x86/apic: Rename 'reg_off' to 'reg' (2025-07-10 09:44:44 -0700)

----------------------------------------------------------------
KVM local APIC changes for 6.17

Extract many of KVM's helpers for accessing architectural local APIC state
to common x86 so that they can be shared by guest-side code for Secure AVIC.

----------------------------------------------------------------
Neeraj Upadhyay (13):
      KVM: x86: Open code setting/clearing of bits in the ISR
      KVM: x86: Remove redundant parentheses around 'bitmap'
      KVM: x86: Rename VEC_POS/REG_POS macro usages
      KVM: x86: Change lapic regs base address to void pointer
      KVM: x86: Rename find_highest_vector()
      KVM: x86: Rename lapic get/set_reg() helpers
      KVM: x86: Rename lapic get/set_reg64() helpers
      KVM: x86: Rename lapic set/clear vector helpers
      x86/apic: KVM: Move apic_find_highest_vector() to a common header
      x86/apic: KVM: Move lapic get/set helpers to common code
      x86/apic: KVM: Move lapic set/clear_vector() helpers to common code
      x86/apic: KVM: Move apic_test_vector() to common code
      x86/apic: Rename 'reg_off' to 'reg'

Sean Christopherson (1):
      x86/apic: KVM: Deduplicate APIC vector => register+bit math

 arch/x86/include/asm/apic.h | 66 +++++++++++++++++++++++++++++-
 arch/x86/kvm/lapic.c        | 97 ++++++++++++---------------------------------
 arch/x86/kvm/lapic.h        | 24 ++---------
 3 files changed, 94 insertions(+), 93 deletions(-)


* [GIT PULL] KVM: Dirty Ring changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Local APIC refactoring " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: Generic " Sean Christopherson
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

A set of Dirty Ring changes to fix a flaw where a misbehaving userspace can
induce a soft lockup, along with general hardening and cleanups.

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-dirty_ring-6.17

for you to fetch changes up to 614fb9d1479b1d90721ca70da8b7c55f69fe9ad2:

  KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings (2025-06-20 13:41:04 -0700)

----------------------------------------------------------------
KVM Dirty Ring changes for 6.17

Fix issues with dirty ring harvesting where KVM doesn't bound the processing
of entries in any way, which allows userspace to keep KVM in a tight loop
indefinitely.  Clean up code and comments along the way.

----------------------------------------------------------------
Sean Christopherson (6):
      KVM: Bound the number of dirty ring entries in a single reset at INT_MAX
      KVM: Bail from the dirty ring reset flow if a signal is pending
      KVM: Conditionally reschedule when resetting the dirty ring
      KVM: Check for empty mask of harvested dirty ring entries in caller
      KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
      KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings

 include/linux/kvm_dirty_ring.h |  18 ++-----
 virt/kvm/dirty_ring.c          | 111 +++++++++++++++++++++++++++++------------
 virt/kvm/kvm_main.c            |   9 ++--
 3 files changed, 89 insertions(+), 49 deletions(-)


* [GIT PULL] KVM: Generic changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Local APIC refactoring " Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: Dirty Ring changes " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: IRQ " Sean Christopherson
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

A few one-off changes that didn't have a better home :-)

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-generic-6.17

for you to fetch changes up to 87d4fbf4a387f207d6906806ef6bf5c8eb289bd7:

  KVM: guest_memfd: Remove redundant kvm_gmem_getattr implementation (2025-06-25 13:42:33 -0700)

----------------------------------------------------------------
KVM generic changes for 6.17

 - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related
   to private <=> shared memory conversions.

 - Drop guest_memfd's .getattr() implementation as the VFS layer will call
   generic_fillattr() if inode_operations.getattr is NULL.

----------------------------------------------------------------
Liam Merwick (2):
      KVM: Add trace_kvm_vm_set_mem_attributes()
      KVM: fix typo in kvm_vm_set_mem_attributes() comment

Shivank Garg (1):
      KVM: guest_memfd: Remove redundant kvm_gmem_getattr implementation

 include/trace/events/kvm.h | 27 +++++++++++++++++++++++++++
 virt/kvm/guest_memfd.c     | 11 -----------
 virt/kvm/kvm_main.c        |  4 +++-
 3 files changed, 30 insertions(+), 12 deletions(-)


* [GIT PULL] KVM: IRQ changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: Generic " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Misc " Sean Christopherson
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Pretty please, with a cherry on top and a pile of cash (or crypto) buried
underneath if need be, pull a pile of IRQ related changes.  The general theme
is to clean up, harden, and improve documentation for all things related to
device posted IRQs.

All non-KVM changes (irqbypass, AMD IOMMU, and sched/wait) have been acked
and/or reviewed by their respective owners (some of the IOMMU changes lack
explicit tags, but unless I grossly misread things, they're good to go).

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-irqs-6.17

for you to fetch changes up to 81bf24f1ac77029bf858c0da081088eb62b1b230:

  KVM: selftests: Add CONFIG_EVENTFD for irqfd selftest (2025-07-10 06:20:20 -0700)

----------------------------------------------------------------
KVM IRQ changes for 6.17

 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large numbers
   of VMs.  Such use cases typically don't actually use irqbypass, but
   eliminating the pointless registration is a future problem to solve as it
   likely requires new uAPI.

 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.

 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
   and PIT emulation at compile time.

 - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.

 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.

 - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.

 - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
   clear in the vCPU's physical ID table entry.

 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.

 - Dedup x86's device posted IRQ code, as the vast majority of functionality
   can be shared verbatim between SVM and VMX.

 - Harden the device posted IRQ code against bugs and runtime errors.

 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).

 - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
   only if KVM needs a notification in order to wake the vCPU.

 - Decouple device posted IRQs from VFIO device assignment, as binding a VM to
   a VFIO group is not a requirement for enabling device posted IRQs.

 - Clean up and document/comment the irqfd assignment code.

 - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
   ensure an eventfd is bound to at most one irqfd through the entire host,
   and add a selftest to verify eventfd:irqfd bindings are globally unique.

----------------------------------------------------------------
Mark Brown (1):
      KVM: selftests: Add CONFIG_EVENTFD for irqfd selftest

Maxim Levitsky (2):
      KVM: SVM: Add enable_ipiv param, never set IsRunning if disabled
      KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235

Sean Christopherson (98):
      KVM: arm64: WARN if unmapping a vLPI fails in any path
      irqbypass: Drop pointless and misleading THIS_MODULE get/put
      irqbypass: Drop superfluous might_sleep() annotations
      irqbypass: Take ownership of producer/consumer token tracking
      irqbypass: Explicitly track producer and consumer bindings
      irqbypass: Use paired consumer/producer to disconnect during unregister
      irqbypass: Use guard(mutex) in lieu of manual lock+unlock
      irqbypass: Use xarray to track producers and consumers
      irqbypass: Require producers to pass in Linux IRQ number during registration
      KVM: x86: Trigger I/O APIC route rescan in kvm_arch_irq_routing_update()
      KVM: x86: Drop superfluous kvm_set_pic_irq() => kvm_pic_set_irq() wrapper
      KVM: x86: Drop superfluous kvm_set_ioapic_irq() => kvm_ioapic_set_irq() wrapper
      KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper
      KVM: x86: Move PIT ioctl helpers to i8254.c
      KVM: x86: Move KVM_{GET,SET}_IRQCHIP ioctl helpers to irq.c
      KVM: x86: Rename irqchip_kernel() to irqchip_full()
      KVM: x86: Move kvm_setup_default_irq_routing() into irq.c
      KVM: x86: Move kvm_{request,free}_irq_source_id() to i8254.c (PIT)
      KVM: x86: Hardcode the PIT IRQ source ID to '2'
      KVM: x86: Don't clear PIT's IRQ line status when destroying PIT
      KVM: x86: Explicitly check for in-kernel PIC when getting ExtINT
      KVM: Move x86-only tracepoints to x86's trace.h
      KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
      KVM: Squash two CONFIG_HAVE_KVM_IRQCHIP #ifdefs into one
      KVM: selftests: Fall back to split IRQ chip if full in-kernel chip is unsupported
      KVM: x86: Move IRQ mask notifier infrastructure to I/O APIC emulation
      KVM: x86: Fold irq_comm.c into irq.c
      KVM: Pass new routing entries and irqfd when updating IRTEs
      KVM: SVM: Track per-vCPU IRTEs using kvm_kernel_irqfd structure
      KVM: SVM: Delete IRTE link from previous vCPU before setting new IRTE
      iommu/amd: KVM: SVM: Delete now-unused cached/previous GA tag fields
      KVM: SVM: Delete IRTE link from previous vCPU irrespective of new routing
      KVM: SVM: Drop pointless masking of default APIC base when setting V_APIC_BAR
      KVM: SVM: Drop pointless masking of kernel page pa's with AVIC HPA masks
      KVM: SVM: Add helper to deduplicate code for getting AVIC backing page
      KVM: SVM: Drop vcpu_svm's pointless avic_backing_page field
      KVM: SVM: Inhibit AVIC if ID is too big instead of rejecting vCPU creation
      KVM: SVM: Drop redundant check in AVIC code on ID during vCPU creation
      KVM: SVM: Track AVIC tables as natively sized pointers, not "struct pages"
      KVM: SVM: Drop superfluous "cache" of AVIC Physical ID entry pointer
      KVM: VMX: Move enable_ipiv knob to common x86
      KVM: VMX: Suppress PI notifications whenever the vCPU is put
      KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blocking
      iommu/amd: KVM: SVM: Use pi_desc_addr to derive ga_root_ptr
      iommu/amd: KVM: SVM: Pass NULL @vcpu_info to indicate "not guest mode"
      KVM: SVM: Stop walking list of routing table entries when updating IRTE
      KVM: VMX: Stop walking list of routing table entries when updating IRTE
      KVM: SVM: Extract SVM specific code out of get_pi_vcpu_info()
      KVM: x86: Move IRQ routing/delivery APIs from x86.c => irq.c
      KVM: x86: Nullify irqfd->producer after updating IRTEs
      KVM: x86: Dedup AVIC vs. PI code for identifying target vCPU
      KVM: x86: Move posted interrupt tracepoint to common code
      KVM: SVM: Clean up return handling in avic_pi_update_irte()
      iommu: KVM: Split "struct vcpu_data" into separate AMD vs. Intel structs
      KVM: Don't WARN if updating IRQ bypass route fails
      KVM: Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing()
      KVM: x86: Track irq_bypass_vcpu in common x86 code
      KVM: x86: Skip IOMMU IRTE updates if there's no old or new vCPU being targeted
      KVM: x86: Don't update IRTE entries when old and new routes were !MSI
      KVM: SVM: Revert IRTE to legacy mode if IOMMU doesn't provide IR metadata
      KVM: SVM: Take and hold ir_list_lock across IRTE updates in IOMMU
      iommu/amd: Document which IRTE fields amd_iommu_update_ga() can modify
      iommu/amd: KVM: SVM: Infer IsRun from validity of pCPU destination
      iommu/amd: Factor out helper for manipulating IRTE GA/CPU info
      iommu/amd: KVM: SVM: Set pCPU info in IRTE when setting vCPU affinity
      iommu/amd: KVM: SVM: Add IRTE metadata to affined vCPU's list if AVIC is inhibited
      KVM: SVM: Don't check for assigned device(s) when updating affinity
      KVM: SVM: Don't check for assigned device(s) when activating AVIC
      KVM: SVM: WARN if (de)activating guest mode in IOMMU fails
      KVM: SVM: Process all IRTEs on affinity change even if one update fails
      KVM: SVM: WARN if updating IRTE GA fields in IOMMU fails
      KVM: x86: Drop superfluous "has assigned device" check in kvm_pi_update_irte()
      KVM: x86: WARN if IRQ bypass isn't supported in kvm_pi_update_irte()
      KVM: x86: WARN if IRQ bypass routing is updated without in-kernel local APIC
      KVM: SVM: WARN if ir_list is non-empty at vCPU free
      KVM: x86: Decouple device assignment from IRQ bypass
      KVM: VMX: WARN if VT-d Posted IRQs aren't possible when starting IRQ bypass
      KVM: SVM: Use vcpu_idx, not vcpu_id, for GA log tag/metadata
      iommu/amd: WARN if KVM calls GA IRTE helpers without virtual APIC support
      KVM: SVM: Fold avic_set_pi_irte_mode() into its sole caller
      KVM: SVM: Don't check vCPU's blocking status when toggling AVIC on/off
      KVM: SVM: Consolidate IRTE update when toggling AVIC on/off
      iommu/amd: KVM: SVM: Allow KVM to control need for GA log interrupts
      KVM: SVM: Generate GA log IRQs only if the associated vCPU is blocking
      KVM: x86: Rename kvm_set_msi_irq() => kvm_msi_to_lapic_irq()
      KVM: Use a local struct to do the initial vfs_poll() on an irqfd
      KVM: Acquire SRCU lock outside of irqfds.lock during assignment
      KVM: Initialize irqfd waitqueue callback when adding to the queue
      KVM: Add irqfd to KVM's list via the vfs_poll() callback
      KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock
      sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority()
      xen: privcmd: Don't mark eventfd waiter as EXCLUSIVE
      sched/wait: Add a waitqueue helper for fully exclusive priority waiters
      KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter
      KVM: Drop sanity check that per-VM list of irqfds is unique
      KVM: selftests: Assert that eventfd() succeeds in Xen shinfo test
      KVM: selftests: Add utilities to create eventfds and do KVM_IRQFD
      KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements

 arch/arm64/kvm/arm.c                              |  20 +-
 arch/arm64/kvm/vgic/vgic-its.c                    |   2 +-
 arch/arm64/kvm/vgic/vgic-v4.c                     |  10 +-
 arch/x86/include/asm/irq_remapping.h              |  17 +-
 arch/x86/include/asm/kvm-x86-ops.h                |   2 +-
 arch/x86/include/asm/kvm_host.h                   |  45 +-
 arch/x86/include/asm/svm.h                        |  13 +-
 arch/x86/kvm/Kconfig                              |  10 +
 arch/x86/kvm/Makefile                             |   7 +-
 arch/x86/kvm/hyperv.c                             |  10 +-
 arch/x86/kvm/hyperv.h                             |   3 +-
 arch/x86/kvm/i8254.c                              |  90 ++-
 arch/x86/kvm/i8254.h                              |  17 +-
 arch/x86/kvm/i8259.c                              |  17 +-
 arch/x86/kvm/ioapic.c                             |  55 +-
 arch/x86/kvm/ioapic.h                             |  24 +-
 arch/x86/kvm/irq.c                                | 567 ++++++++++++++++-
 arch/x86/kvm/irq.h                                |  35 +-
 arch/x86/kvm/irq_comm.c                           | 469 ---------------
 arch/x86/kvm/lapic.c                              |   7 +-
 arch/x86/kvm/svm/avic.c                           | 702 ++++++++++------------
 arch/x86/kvm/svm/svm.c                            |   4 +
 arch/x86/kvm/svm/svm.h                            |  32 +-
 arch/x86/kvm/trace.h                              |  99 ++-
 arch/x86/kvm/vmx/capabilities.h                   |   1 -
 arch/x86/kvm/vmx/main.c                           |   2 +-
 arch/x86/kvm/vmx/posted_intr.c                    | 140 ++---
 arch/x86/kvm/vmx/posted_intr.h                    |  10 +-
 arch/x86/kvm/vmx/vmx.c                            |   2 -
 arch/x86/kvm/x86.c                                | 254 +-------
 drivers/hv/mshv_eventfd.c                         |   8 +
 drivers/iommu/amd/amd_iommu_types.h               |   1 -
 drivers/iommu/amd/iommu.c                         | 125 ++--
 drivers/iommu/intel/irq_remapping.c               |  10 +-
 drivers/irqchip/irq-gic-v4.c                      |   4 +-
 drivers/vfio/pci/vfio_pci_intrs.c                 |  10 +-
 drivers/vhost/vdpa.c                              |  10 +-
 include/kvm/arm_vgic.h                            |   2 +-
 include/linux/amd-iommu.h                         |  25 +-
 include/linux/irqbypass.h                         |  46 +-
 include/linux/irqchip/arm-gic-v4.h                |   2 +-
 include/linux/kvm_host.h                          |  18 +-
 include/linux/kvm_irqfd.h                         |   5 +-
 include/linux/wait.h                              |   2 +
 include/trace/events/kvm.h                        |  84 +--
 kernel/sched/wait.c                               |  22 +-
 tools/testing/selftests/kvm/Makefile.kvm          |   1 +
 tools/testing/selftests/kvm/arm64/vgic_irq.c      |  12 +-
 tools/testing/selftests/kvm/config                |   1 +
 tools/testing/selftests/kvm/include/kvm_util.h    |  40 ++
 tools/testing/selftests/kvm/irqfd_test.c          | 135 +++++
 tools/testing/selftests/kvm/lib/kvm_util.c        |  13 +-
 tools/testing/selftests/kvm/x86/xen_shinfo_test.c |  21 +-
 virt/kvm/eventfd.c                                | 159 +++--
 virt/kvm/irqchip.c                                |   2 -
 virt/lib/irqbypass.c                              | 190 +++---
 56 files changed, 1848 insertions(+), 1766 deletions(-)
 delete mode 100644 arch/x86/kvm/irq_comm.c
 create mode 100644 tools/testing/selftests/kvm/irqfd_test.c


* [GIT PULL] KVM: x86: Misc changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: IRQ " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-28 15:11   ` Paolo Bonzini
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: MMIO Stale Data mitigation " Sean Christopherson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

The highlights are the DEBUGCTL.FREEZE_IN_SMM fix from Maxim, Jim's APERF/MPERF
support that has probably made him question the meaning of life, and a big
cleanup of the MSR interception code to ease the pain of adding support for
CET, FRED, and the mediated PMU (and any other features that deal with MSRs).

But the one change that I really want your eyeballs on is that last commit,
"Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created"; it's an ABI
change that could break userspace.  AFAICT, it won't affect any (known)
userspace, and restricting the ioctl for all VM types is much simpler than
special casing "secure" TSC guests.  Holler if you want a new tag/pull request
without that change; I deliberately kept it dead last specifically so it could
be omitted without any fuss.

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-misc-6.17

for you to fetch changes up to dcbe5a466c123a475bb66492749549f09b5cab00:

  KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created (2025-07-14 15:29:33 -0700)

----------------------------------------------------------------
KVM x86 misc changes for 6.17

 - Preserve the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
   guest.  Failure to honor FREEZE_IN_SMM can bleed host state into the guest.

 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
   prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.

 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.

 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.

 - Recalculate all MSR intercepts from the "source" on MSR filter changes, and
   drop the dedicated "shadow" bitmaps (and their awful "max" size defines).

 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR.

 - Advertise support for LKGS (Load Kernel GS base), a new instruction that's
   loosely related to FRED, but is supported and enumerated independently.

 - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
   a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON).  Use
   the same approach KVM uses for dealing with "impossible" emulation when
   running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
   has architecturally impossible state.

 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
   APERF/MPERF (with many caveats).

 - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
   frequency is unsupported for VMs with a "secure" TSC, and there's no known
   use case for changing the default frequency for other VM types.

----------------------------------------------------------------
Chao Gao (2):
      KVM: x86: Deduplicate MSR interception enabling and disabling
      KVM: SVM: Simplify MSR interception logic for IA32_XSS MSR

Jim Mattson (3):
      KVM: x86: Replace growing set of *_in_guest bools with a u64
      KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
      KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF

Kai Huang (1):
      KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created

Maxim Levitsky (3):
      KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter
      KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs
      KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest

Sean Christopherson (44):
      KVM: TDX: Use kvm_arch_vcpu.host_debugctl to restore the host's DEBUGCTL
      KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap
      KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
      KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
      KVM: VMX: Extract checking of guest's DEBUGCTL into helper
      KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest
      KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup()
      KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
      KVM: SVM: Tag MSR bitmap initialization helpers with __init
      KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
      KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
      KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
      KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
      KVM: SVM: Clean up macros related to architectural MSRPM definitions
      KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
      KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge
      KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough"
      KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
      KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
      KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
      KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
      KVM: x86: Move definition of X2APIC_MSR() to lapic.h
      KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
      KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
      KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
      KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific
      KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
      KVM: SVM: Merge "after set CPUID" intercept recalc helpers
      KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses
      KVM: SVM: Move svm_msrpm_offset() to nested.c
      KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
      KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
      KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR
      KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
      KVM: SVM: Add a helper to allocate and initialize permissions bitmaps
      KVM: x86: Simplify userspace filter logic when disabling MSR interception
      KVM: selftests: Verify KVM disable interception (for userspace) on filter change
      KVM: x86: Drop pending_smi vs. INIT_RECEIVED check when setting MP_STATE
      KVM: x86: WARN and reject KVM_RUN if vCPU's MP_STATE is SIPI_RECEIVED
      KVM: x86: Move INIT_RECEIVED vs. INIT/SIPI blocked check to KVM_RUN
      KVM: x86: Refactor handling of SIPI_RECEIVED when setting MP_STATE
      KVM: VMX: Add a macro to track which DEBUGCTL bits are host-owned
      KVM: selftests: Expand set of APIs for pinning tasks to a single CPU
      KVM: selftests: Convert arch_timer tests to common helpers to pin task

Xin Li (1):
      KVM: x86: Advertise support for LKGS

 Documentation/virt/kvm/api.rst                     |  25 +-
 arch/x86/include/asm/kvm-x86-ops.h                 |   3 +-
 arch/x86/include/asm/kvm_host.h                    |  22 +-
 arch/x86/include/asm/msr-index.h                   |   1 +
 arch/x86/kvm/cpuid.c                               |   1 +
 arch/x86/kvm/lapic.h                               |   2 +
 arch/x86/kvm/svm/nested.c                          | 128 ++++--
 arch/x86/kvm/svm/sev.c                             |  33 +-
 arch/x86/kvm/svm/svm.c                             | 500 +++++++--------------
 arch/x86/kvm/svm/svm.h                             | 104 ++++-
 arch/x86/kvm/vmx/common.h                          |   2 -
 arch/x86/kvm/vmx/main.c                            |  23 +-
 arch/x86/kvm/vmx/nested.c                          |  27 +-
 arch/x86/kvm/vmx/pmu_intel.c                       |   8 +-
 arch/x86/kvm/vmx/tdx.c                             |  24 +-
 arch/x86/kvm/vmx/vmx.c                             | 284 ++++--------
 arch/x86/kvm/vmx/vmx.h                             |  61 ++-
 arch/x86/kvm/vmx/x86_ops.h                         |   6 +-
 arch/x86/kvm/x86.c                                 | 106 +++--
 arch/x86/kvm/x86.h                                 |  18 +-
 include/uapi/linux/kvm.h                           |   1 +
 tools/include/uapi/linux/kvm.h                     |   1 +
 tools/testing/selftests/kvm/Makefile.kvm           |   1 +
 tools/testing/selftests/kvm/arch_timer.c           |   7 +-
 .../selftests/kvm/arm64/arch_timer_edge_cases.c    |  23 +-
 tools/testing/selftests/kvm/include/kvm_util.h     |  31 +-
 tools/testing/selftests/kvm/lib/kvm_util.c         |  15 +-
 tools/testing/selftests/kvm/lib/memstress.c        |   2 +-
 tools/testing/selftests/kvm/x86/aperfmperf_test.c  | 213 +++++++++
 .../selftests/kvm/x86/userspace_msr_exit_test.c    |   8 +
 30 files changed, 930 insertions(+), 750 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86/aperfmperf_test.c
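
The nSVM commits listed above rework how KVM merges L0's and L1's MSR permission bitmaps (MSRPMs) before running L2. In an SVM MSRPM a set bit means the access is intercepted, so the merged bitmap has to intercept whenever either level wants to, which is a bitwise OR. A minimal sketch in C, assuming plain arrays and a hypothetical helper name (merge_msrpm() and MSRPM_CHUNKS are not KVM's); it ignores the dedicated list of offsets KVM uses to merge only the ranges it cares about, and just shows why 64-bit chunks need fewer iterations than 4-byte ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of 64-bit chunks in one bitmap for this sketch. */
#define MSRPM_CHUNKS 4

/*
 * Merge two MSR permission bitmaps in which a set bit means "intercept".
 * L2 must take an exit whenever either L0 (KVM) or L1 (the guest
 * hypervisor) wants to intercept, so each chunk is the OR of both inputs.
 */
static void merge_msrpm(uint64_t *merged, const uint64_t *l0,
			const uint64_t *l1, size_t nr_chunks)
{
	assert(merged && l0 && l1);
	for (size_t i = 0; i < nr_chunks; i++)
		merged[i] = l0[i] | l1[i];
}
```

Per the commit title, the 64-bit variant applies only to 64-bit kernels; the OR itself is the same at any chunk width.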

* [GIT PULL] KVM: x86: MMIO State Data mitigation changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Misc " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: MMU " Sean Christopherson
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Rework the MMIO Stale Data mitigation to apply to all VMs that can access host
MMIO, not just VMs that are associated with a VFIO group.

My motivation for this series is all about killing off assigned_device_count
(spoiler alert); I honestly have no idea whether any real-world setups are
affected by this change.

You should see a trivial conflict with Linus' tree (commit f9af88a3d384
("x86/bugs: Rename MDS machinery to something more generic")).  As usual,
Stephen's resolution[*] is correct:

diff --cc arch/x86/kvm/vmx/vmx.c
index 191a9ed0da22,65949882afa9..47019c9af671
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@@ -7290,8 -7210,8 +7210,8 @@@ static noinstr void vmx_vcpu_enter_exit
  	if (static_branch_unlikely(&vmx_l1d_should_flush))
  		vmx_l1d_flush(vcpu);
  	else if (static_branch_unlikely(&cpu_buf_vm_clear) &&
- 		 kvm_arch_has_assigned_device(vcpu->kvm))
+ 		 (flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO))
 -		mds_clear_cpu_buffers();
 +		x86_clear_cpu_buffers();
  
  	vmx_disable_fb_clear(vmx);

[*] https://lore.kernel.org/all/20250709171115.7556c98c@canb.auug.org.au

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-mmio-6.17

for you to fetch changes up to 83ebe715748314331f9639de2220d02debfe926d:

  KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest (2025-06-25 08:42:51 -0700)

----------------------------------------------------------------
KVM MMIO Stale Data mitigation cleanup for 6.17

Rework KVM's mitigation for the MMIO Stale Data vulnerability to track
whether or not a vCPU has access to (host) MMIO based on the MMU that will be
used when running in the guest.  The current approach doesn't actually detect
whether or not a guest has access to MMIO, and is prone to false negatives (and
to a lesser extent, false positives), as KVM_DEV_VFIO_FILE_ADD is optional, and
obviously only covers VFIO devices.

----------------------------------------------------------------
Sean Christopherson (3):
      KVM: x86: Avoid calling kvm_is_mmio_pfn() when kvm_x86_ops.get_mt_mask is NULL
      KVM: x86/mmu: Locally cache whether a PFN is host MMIO when making a SPTE
      KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu_internal.h |  3 +++
 arch/x86/kvm/mmu/spte.c         | 43 ++++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/mmu/spte.h         | 10 ++++++++++
 arch/x86/kvm/vmx/run_flags.h    | 10 ++++++----
 arch/x86/kvm/vmx/vmx.c          |  8 +++++++-
 6 files changed, 67 insertions(+), 8 deletions(-)
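
The tag message above describes deciding whether to clear CPU buffers on VM-Enter based on whether guest mappings actually point at host MMIO, instead of on whether a VFIO device was registered. A minimal C sketch of that idea follows. Every name in it (struct toy_vm, toy_make_spte(), toy_vcpu_run_flags(), TOY_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO) is hypothetical and only loosely modeled on the VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO flag in the resolution diff; the series tracks this per MMU, whereas this sketch simplifies to one sticky per-VM flag that errs toward clearing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical run flag; the bit value is an assumption for this sketch. */
#define TOY_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO (1u << 0)

struct toy_vm {
	/* Sticky: set once any guest mapping points at host MMIO. */
	bool has_mapped_host_mmio;
};

/*
 * Called when installing a guest mapping.  The caller is assumed to have
 * already determined whether the PFN is host MMIO (the series caches that
 * result locally while building the SPTE).
 */
static void toy_make_spte(struct toy_vm *vm, bool pfn_is_host_mmio)
{
	if (pfn_is_host_mmio)
		vm->has_mapped_host_mmio = true;
}

/* Compute VM-Enter flags: clear CPU buffers only if MMIO is reachable. */
static uint32_t toy_vcpu_run_flags(const struct toy_vm *vm,
				   bool cpu_has_mmio_stale_data_bug)
{
	uint32_t flags = 0;

	assert(vm);
	if (cpu_has_mmio_stale_data_bug && vm->has_mapped_host_mmio)
		flags |= TOY_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO;
	return flags;
}
```

Unlike a device-count heuristic, this catches MMIO mapped without KVM_DEV_VFIO_FILE_ADD and skips the flush for VMs with assigned devices whose MMIO is never mapped.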

* [GIT PULL] KVM: x86: MMU changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: MMIO State Data mitigation " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: Device assignment accounting " Sean Christopherson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Dynamically allocate the shadow MMU's hashed page list, as it's a whopping
32KiB and isn't needed when using the TDP MMU without nested VMs.  The TDX
change is quite out of place, but it's in here as "KVM: x86: Use kvzalloc()
to allocate VM struct" depends on both the TDX change and on dynamically
allocating the hashed list (KVM uses kvzalloc() purely because the 32KiB
for the list blows up the size of struct kvm).

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-mmu-6.17

for you to fetch changes up to 9c4fe6d1509b386ab78f27dfaa2d128be77dc2d2:

  KVM: x86/mmu: Defer allocation of shadow MMU's hashed page list (2025-06-24 12:51:07 -0700)

----------------------------------------------------------------
KVM x86 MMU changes for 6.17

 - Exempt nested EPT from the !USER + CR0.WP logic, as EPT doesn't interact
   with CR0.WP.

 - Move the TDX hardware setup code to tdx.c to better co-locate TDX code
   and eliminate a few global symbols.

 - Dynamically allocate the shadow MMU's hashed page list, and defer
   allocating the hashed list until it's actually needed (the TDP MMU doesn't
   use the list).

----------------------------------------------------------------
Sean Christopherson (5):
      KVM: x86/mmu: Exempt nested EPT page tables from !USER, CR0.WP=0 logic
      KVM: TDX: Move TDX hardware setup from main.c to tdx.c
      KVM: x86/mmu: Dynamically allocate shadow MMU's hashed page list
      KVM: x86: Use kvzalloc() to allocate VM struct
      KVM: x86/mmu: Defer allocation of shadow MMU's hashed page list

 arch/x86/include/asm/kvm_host.h |  6 ++--
 arch/x86/kvm/mmu/mmu.c          | 75 +++++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/mmu/paging_tmpl.h  |  8 +++--
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/main.c         | 36 ++------------------
 arch/x86/kvm/vmx/tdx.c          | 47 +++++++++++++++++++-------
 arch/x86/kvm/vmx/tdx.h          |  1 +
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 arch/x86/kvm/vmx/x86_ops.h      | 10 ------
 arch/x86/kvm/x86.c              |  5 ++-
 arch/x86/kvm/x86.h              | 22 ++++++++++++
 11 files changed, 145 insertions(+), 69 deletions(-)
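
Deferring the hashed page list boils down to a lazily allocated table. A minimal sketch in C, assuming 4096 buckets of 8-byte list heads (4096 x 8 bytes = 32KiB on a 64-bit build, matching the size quoted above); the struct and function names are hypothetical, and a real kernel implementation would also need to serialize concurrent first users, which this single-threaded sketch does not:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed sizing: 4096 buckets of 8-byte list heads = 32KiB on 64-bit. */
#define TOY_HASH_BUCKETS 4096

struct toy_list_head {
	void *first;
};

struct toy_vm {
	/* NULL until the shadow MMU is first needed. */
	struct toy_list_head *page_hash;
};

/*
 * Allocate the hash table on first use and reuse it afterwards.
 * Returns 0 on success, -1 on allocation failure.
 */
static int toy_get_page_hash(struct toy_vm *vm)
{
	assert(vm);
	if (vm->page_hash)
		return 0;
	vm->page_hash = calloc(TOY_HASH_BUCKETS, sizeof(*vm->page_hash));
	return vm->page_hash ? 0 : -1;
}
```

A VM that only ever uses the TDP MMU never calls the getter, so it never pays for the table, which in turn keeps struct kvm itself smaller.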

* [GIT PULL] KVM: Device assignment accounting changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: MMU " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-28 15:08   ` Paolo Bonzini
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Selftests " Sean Christopherson
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Two changes that depend on the IRQ and MMIO Stale Data pull requests, to kill
off kvm_arch_{start,end}_assignment().

Note!  To generate the pull request, I used the result of a local merge of
kvm/next with kvm-x86-irqs-6.17 and kvm-x86-mmio-6.17.  The resulting shortlog
matches my expectations (and intentions), but the diff stats showed all of the
changes from kvm-x86-irqs-6.17, and I couldn't for the life of me figure out
how to coerce git into behaving as I want.

AFAICT, it's just a cosmetic display error; there aren't any duplicate commits
or anything.  So, rather than copy+paste those weird diff stats, I locally
processed _this_ merge too, and then manually generated the stats with
`git diff --stat base..HEAD`.

The following changes since <the result of the aforementioned merges>:

  KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest (2025-06-25 08:42:51 -0700)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-no_assignment-6.17

for you to fetch changes up to bbc13ae593e0ea47357ff6e4740c533c16c2ae1e:

  VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment() (2025-06-25 09:51:33 -0700)

----------------------------------------------------------------
KVM VFIO device assignment cleanups for 6.17

Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking now
that KVM no longer uses assigned_device_count as a bad heuristic for "VM has
an irqbypass producer" or for "VM has access to host MMIO".

----------------------------------------------------------------
Sean Christopherson (3):
      Merge branch 'kvm-x86 mmio'
      Revert "kvm: detect assigned device via irqbypass manager"
      VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment()

 arch/x86/include/asm/kvm_host.h |  2 --
 arch/x86/kvm/irq.c              |  9 +--------
 arch/x86/kvm/x86.c              | 18 ------------------
 include/linux/kvm_host.h        | 18 ------------------
 virt/kvm/vfio.c                 |  3 ---
 5 files changed, 1 insertion(+), 49 deletions(-)

* [GIT PULL] KVM: x86: Selftests changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: Device assignment accounting " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: SEV " Sean Christopherson
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Minor "quality of user life" selftests cleanups.

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-selftests-6.17

for you to fetch changes up to 71443210e26de3b35aea8dced894ad3c420d55d5:

  KVM: selftests: Print a more helpful message for EACCESS in access tracking test (2025-06-20 13:39:11 -0700)

----------------------------------------------------------------
KVM selftests changes for 6.17

 - Fix a comment typo.

 - Verify KVM is loaded when getting any KVM module param so that attempting to
   run a selftest without kvm.ko loaded results in a SKIP message about KVM not
   being loaded/enabled, versus some random parameter not existing.

 - SKIP tests that hit EACCES when attempting to access a file, with a "Root
   required?" help message.  In most cases, the test just needs to be run with
   elevated permissions.

----------------------------------------------------------------
Rahul Kumar (1):
      KVM: selftests: Fix spelling of 'occurrences' in sparsebit.c comments

Sean Christopherson (4):
      KVM: selftests: Verify KVM is loaded when getting a KVM module param
      KVM: selftests: Add __open_path_or_exit() variant to provide extra help info
      KVM: selftests: Play nice with EACCES errors in open_path_or_exit()
      KVM: selftests: Print a more helpful message for EACCESS in access tracking test

 .../selftests/kvm/access_tracking_perf_test.c      |  7 ++-----
 tools/testing/selftests/kvm/include/kvm_util.h     |  1 +
 .../testing/selftests/kvm/include/x86/processor.h  |  6 +++++-
 tools/testing/selftests/kvm/lib/kvm_util.c         | 23 ++++++++++++++++++----
 tools/testing/selftests/kvm/lib/sparsebit.c        |  4 ++--
 tools/testing/selftests/kvm/lib/x86/processor.c    | 10 ----------
 .../x86/vmx_exception_with_invalid_guest_state.c   |  2 +-
 7 files changed, 30 insertions(+), 23 deletions(-)
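
The EACCES handling described in the tag message amounts to distinguishing "permission denied" from other open() failures. A minimal sketch in C; toy_open_path() and its result codes are hypothetical, and unlike the selftests' real helper (which exits the test) it returns a result so the behavior can be checked directly:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum toy_open_result { TOY_OPEN_OK, TOY_OPEN_SKIP, TOY_OPEN_FAIL };

/*
 * Open @path.  On EACCES, report a SKIP with a "Root required?" hint
 * instead of a failure, since the test most likely just needs elevated
 * permissions.  On success the file descriptor is stored in *fd.
 */
static enum toy_open_result toy_open_path(const char *path, int flags, int *fd)
{
	assert(path && fd);
	*fd = open(path, flags);
	if (*fd >= 0)
		return TOY_OPEN_OK;
	if (errno == EACCES) {
		fprintf(stderr, "SKIP: cannot open '%s' (Root required?)\n", path);
		return TOY_OPEN_SKIP;
	}
	fprintf(stderr, "FAIL: cannot open '%s'\n", path);
	return TOY_OPEN_FAIL;
}
```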

* [GIT PULL] KVM: x86: SEV changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Selftests " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: SVM " Sean Christopherson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Note!  This pull request is based on tags/x86_core_for_kvm from the tip tree.
Unless you merge that first (I don't think you'll do that?), merging this will
also suck in:

  4fdc3431e03b x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUs
  07f99c3fbe6e x86/lib: Add WBNOINVD helper functions
  e638081751a2 x86/lib: Drop the unused return value from wbinvd_on_all_cpus()
  1d738dbb252f drm/gpu: Remove dead checks on wbinvd_on_all_cpus()'s return value

Holler if you want the full diff stat, or if you want me to handle dependencies
like this differently in the future.

The following changes since commit 4fdc3431e03b9c11803f399f91837fca487029a1:

  x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUs (2025-07-10 13:30:17 +0200)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-sev-6.17

for you to fetch changes up to 6f38f8c574642a822f2e85f079fa29a49176c49c:

  KVM: SVM: Flush cache only on CPUs running SEV guest (2025-07-14 15:14:02 -0700)

----------------------------------------------------------------
KVM SEV cache maintenance changes for 6.17

 - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM.

 - Use WBNOINVD instead of WBINVD when possible, for SEV cache maintenance,
   e.g. to minimize collateral damage when reclaiming memory from an SEV guest.

 - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that
   have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs
   that can't possibly have cache lines with dirty, encrypted data.

----------------------------------------------------------------
Kevin Loughlin (1):
      KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiency

Sean Christopherson (1):
      KVM: x86: Use wbinvd_on_cpu() instead of an open-coded equivalent

Zheyun Shen (2):
      KVM: SVM: Remove wbinvd in sev_vm_destroy()
      KVM: SVM: Flush cache only on CPUs running SEV guest

 arch/x86/kvm/svm/sev.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
 arch/x86/kvm/svm/svm.h |   1 +
 arch/x86/kvm/x86.c     |   8 +-------
 3 files changed, 84 insertions(+), 35 deletions(-)
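
The last bullet in the tag message (flush only on CPUs that have ever run a vCPU for the guest) can be sketched as a per-VM CPU mask that is set when a vCPU is loaded and consulted at reclaim time. A minimal C sketch; the names are hypothetical, the mask is limited to 64 CPUs so it fits in one word, and @flush stands in for whatever per-CPU WBNOINVD/WBINVD helper the kernel actually uses (NULL is accepted so the selection logic can be exercised on its own):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_sev_vm {
	uint64_t have_run_cpus;	/* bit N set => a vCPU has run on CPU N */
};

/* Record that a vCPU of this VM is about to run on @cpu. */
static void toy_vcpu_load(struct toy_sev_vm *vm, unsigned int cpu)
{
	assert(cpu < 64);
	vm->have_run_cpus |= UINT64_C(1) << cpu;
}

/*
 * Flush only CPUs that may hold this guest's dirty, encrypted cache lines.
 * @flush (may be NULL) is invoked once per selected CPU.  The mask is never
 * cleared here, since a CPU that ran the guest may still hold its lines.
 * Returns the number of CPUs selected.
 */
static unsigned int toy_flush_guest_caches(const struct toy_sev_vm *vm,
					   void (*flush)(unsigned int cpu))
{
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		if (vm->have_run_cpus & (UINT64_C(1) << cpu)) {
			if (flush)
				flush(cpu);
			n++;
		}
	}
	return n;
}
```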

* [GIT PULL] KVM: x86: SVM changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (9 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: SEV " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: VMX " Sean Christopherson
  2025-07-28 15:50 ` [GIT PULL] KVM: x86: Changes " Paolo Bonzini
  12 siblings, 0 replies; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Two small SNP changes.  I initially balked at completely dropping KVM's checks,
but I can't think of any way this will cause ABI problems, and I also don't see
how having KVM perform checks would add value in any way.  So here they are :-)

The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:

  KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-svm-6.17

for you to fetch changes up to 24be2b7956a545945fcb560d42e3ea86406dba09:

  KVM: SVM: Allow SNP guest policy to specify SINGLE_SOCKET (2025-06-20 13:33:45 -0700)

----------------------------------------------------------------
KVM SVM changes for 6.17

Drop KVM's rejection of SNP's SMT and single-socket policy restrictions, and
instead rely on firmware to verify that the policy can actually be supported.
Don't bother checking that requested policy(s) can actually be satisfied, as
an incompatible policy doesn't put the kernel at risk in any way, and providing
guarantees with respect to the physical topology is outside of KVM's purview.

----------------------------------------------------------------
Tom Lendacky (2):
      KVM: SVM: Allow SNP guest policy disallow running with SMT enabled
      KVM: SVM: Allow SNP guest policy to specify SINGLE_SOCKET

 arch/x86/kvm/svm/sev.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

* [GIT PULL] KVM: x86: VMX changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (10 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: SVM " Sean Christopherson
@ 2025-07-25 22:07 ` Sean Christopherson
  2025-07-28 15:47   ` Paolo Bonzini
  2025-07-28 15:50 ` [GIT PULL] KVM: x86: Changes " Paolo Bonzini
  12 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2025-07-25 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel, Sean Christopherson

Add a sub-ioctl to allow getting TDX VMs into TEARDOWN before the last reference
to the VM is put, so that reclaiming the VM's memory doesn't have to jump
through all the hoops needed to reclaim memory from a live TD, which are quite
costly, especially for large VMs.

The following changes since commit 347e9f5043c89695b01e66b3ed111755afcf1911:

  Linux 6.16-rc6 (2025-07-13 14:25:58 -0700)

are available in the Git repository at:

  https://github.com/kvm-x86/linux.git tags/kvm-x86-vmx-6.17

for you to fetch changes up to dcab95e533642d8f733e2562b8bfa5715541e0cf:

  KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM (2025-07-21 16:23:02 -0700)

----------------------------------------------------------------
KVM VMX changes for 6.17

Add a TDX sub-ioctl, KVM_TDX_TERMINATE_VM, to let userspace mark a VM as dead,
and most importantly release its HKID, prior to dropping the last reference to
the VM.  Releasing the HKID moves the VM to TDX's TEARDOWN state, which allows
pages to be reclaimed directly and ultimately reduces total reclaim time by a
factor of 10 or more.

----------------------------------------------------------------
Sean Christopherson (1):
      KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM

 Documentation/virt/kvm/x86/intel-tdx.rst | 22 ++++++++++++++++++-
 arch/x86/include/uapi/asm/kvm.h          |  7 ++++++-
 arch/x86/kvm/vmx/tdx.c                   | 36 +++++++++++++++++++++++++-------
 3 files changed, 55 insertions(+), 10 deletions(-)

* Re: [GIT PULL] KVM: Device assignment accounting changes for 6.17
  2025-07-25 22:07 ` [GIT PULL] KVM: Device assignment accounting " Sean Christopherson
@ 2025-07-28 15:08   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2025-07-28 15:08 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kernel

On Sat, Jul 26, 2025 at 12:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Two changes that depend on the IRQ and MMIO Stale Data pull requests, to kill
> off kvm_arch_{start,end}_assignment().
>
> Note!  To generate the pull request, I used the result of a local merge of
> kvm/next with kvm-x86-irqs-6.17 and kvm-x86-mmio-6.17.  The resulting shortlog
> matches my expectations (and intentions), but the diff stats showed all of the
> changes from kvm-x86-irqs-6.17, and I couldn't for the life of me figure out
> how to coerce git into behaving as I want.
>
> AFAICT, it's just a cosmetic display error; there aren't any duplicate commits
> or anything.  So, rather than copy+paste those weird diff stats, I locally
> processed _this_ merge too, and then manually generated the stats with
> `git diff --stat base..HEAD`.

Yep, that happens. Not enough to consider writing a replacement for
git-request-pull, but it happens...

For me the problems are usually due to back-merges from Linus's tree
(mostly unintentional, like if arch1 uses rc2 as the base and arch2
uses rc4), so I just do a final merge from Linus's tree and use the
*reverse* of that commit's diffstat (git diff --stat HEAD HEAD^) for
the pull request.

Paolo

> The following changes since <the result of the aforementioned merges>:
>
>   KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest (2025-06-25 08:42:51 -0700)
>
> are available in the Git repository at:
>
>   https://github.com/kvm-x86/linux.git tags/kvm-x86-no_assignment-6.17
>
> for you to fetch changes up to bbc13ae593e0ea47357ff6e4740c533c16c2ae1e:
>
>   VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment() (2025-06-25 09:51:33 -0700)
>
> ----------------------------------------------------------------
> KVM VFIO device assignment cleanups for 6.17
>
> Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking now
> that KVM no longer uses assigned_device_count as a bad heuristic for "VM has
> an irqbypass producer" or for "VM has access to host MMIO".
>
> ----------------------------------------------------------------
> Sean Christopherson (3):
>       Merge branch 'kvm-x86 mmio'
>       Revert "kvm: detect assigned device via irqbypass manager"
>       VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment()
>
>  arch/x86/include/asm/kvm_host.h |  2 --
>  arch/x86/kvm/irq.c              |  9 +--------
>  arch/x86/kvm/x86.c              | 18 ------------------
>  include/linux/kvm_host.h        | 18 ------------------
>  virt/kvm/vfio.c                 |  3 ---
>  5 files changed, 1 insertion(+), 49 deletions(-)
>


* Re: [GIT PULL] KVM: x86: Misc changes for 6.17
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: Misc " Sean Christopherson
@ 2025-07-28 15:11   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2025-07-28 15:11 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kernel

On Sat, Jul 26, 2025 at 12:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> The highlights are the DEBUGCTL.FREEZE_IN_SMM fix from Maxim, Jim's APERF/MPERF
> support that has probably made him question the meaning of life, and a big
> cleanup of the MSR interception code to ease the pain of adding support for
> CET, FRED, and the mediated PMU (and any other features that deal with MSRs).
>
> But the one change that I really want your eyeballs on is that last commit,
> "Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created"; it's an ABI
> change that could break userspace.  AFAICT, it won't affect any (known)
> userspace, and restricting the ioctl for all VM types is much simpler than
> special casing "secure" TSC guests.  Holler if you want a new tag/pull request
> without that change; I deliberately kept it dead last specifically so it could
> be omitted without any fuss.

No problem there. It makes no sense to use the VM ioctl if you can't
issue it before vCPU creation; the whole point is to have a homogeneous
frequency.

Paolo

> The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:
>
>   KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)
>
> are available in the Git repository at:
>
>   https://github.com/kvm-x86/linux.git tags/kvm-x86-misc-6.17
>
> for you to fetch changes up to dcbe5a466c123a475bb66492749549f09b5cab00:
>
>   KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created (2025-07-14 15:29:33 -0700)
>
> ----------------------------------------------------------------
> KVM x86 misc changes for 6.17
>
>  - Preserve the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
>    guest.  Failure to honor FREEZE_IN_SMM can bleed host state into the guest.
>
>  - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
>    prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.
>
>  - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
>    vCPU's CPUID model.
>
>  - Rework the MSR interception code so that the SVM and VMX APIs are more or
>    less identical.
>
>  - Recalculate all MSR intercepts from the "source" on MSR filter changes, and
>    drop the dedicated "shadow" bitmaps (and their awful "max" size defines).
>
>  - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
>    nested SVM MSRPM offsets tracker can't handle an MSR.
>
>  - Advertise support for LKGS (Load Kernel GS base), a new instruction that's
>    loosely related to FRED, but is supported and enumerated independently.
>
>  - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
>    a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON).  Use
>    the same approach KVM uses for dealing with "impossible" emulation when
>    running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
>    has architecturally impossible state.
>
>  - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
>    APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
>    APERF/MPERF (with many caveats).
>
>  - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
>    frequency is unsupported for VMs with a "secure" TSC, and there's no known
>    use case for changing the default frequency for other VM types.
>
> ----------------------------------------------------------------
> Chao Gao (2):
>       KVM: x86: Deduplicate MSR interception enabling and disabling
>       KVM: SVM: Simplify MSR interception logic for IA32_XSS MSR
>
> Jim Mattson (3):
>       KVM: x86: Replace growing set of *_in_guest bools with a u64
>       KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
>       KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
>
> Kai Huang (1):
>       KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created
>
> Maxim Levitsky (3):
>       KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter
>       KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs
>       KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest
>
> Sean Christopherson (44):
>       KVM: TDX: Use kvm_arch_vcpu.host_debugctl to restore the host's DEBUGCTL
>       KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap
>       KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
>       KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
>       KVM: VMX: Extract checking of guest's DEBUGCTL into helper
>       KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest
>       KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup()
>       KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
>       KVM: SVM: Tag MSR bitmap initialization helpers with __init
>       KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
>       KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
>       KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
>       KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
>       KVM: SVM: Clean up macros related to architectural MSRPM definitions
>       KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
>       KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge
>       KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough"
>       KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
>       KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
>       KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
>       KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
>       KVM: x86: Move definition of X2APIC_MSR() to lapic.h
>       KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
>       KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
>       KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
>       KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific
>       KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
>       KVM: SVM: Merge "after set CPUID" intercept recalc helpers
>       KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses
>       KVM: SVM: Move svm_msrpm_offset() to nested.c
>       KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
>       KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
>       KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR
>       KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
>       KVM: SVM: Add a helper to allocate and initialize permissions bitmaps
>       KVM: x86: Simplify userspace filter logic when disabling MSR interception
>       KVM: selftests: Verify KVM disable interception (for userspace) on filter change
>       KVM: x86: Drop pending_smi vs. INIT_RECEIVED check when setting MP_STATE
>       KVM: x86: WARN and reject KVM_RUN if vCPU's MP_STATE is SIPI_RECEIVED
>       KVM: x86: Move INIT_RECEIVED vs. INIT/SIPI blocked check to KVM_RUN
>       KVM: x86: Refactor handling of SIPI_RECEIVED when setting MP_STATE
>       KVM: VMX: Add a macro to track which DEBUGCTL bits are host-owned
>       KVM: selftests: Expand set of APIs for pinning tasks to a single CPU
>       KVM: selftests: Convert arch_timer tests to common helpers to pin task
>
> Xin Li (1):
>       KVM: x86: Advertise support for LKGS
>
>  Documentation/virt/kvm/api.rst                     |  25 +-
>  arch/x86/include/asm/kvm-x86-ops.h                 |   3 +-
>  arch/x86/include/asm/kvm_host.h                    |  22 +-
>  arch/x86/include/asm/msr-index.h                   |   1 +
>  arch/x86/kvm/cpuid.c                               |   1 +
>  arch/x86/kvm/lapic.h                               |   2 +
>  arch/x86/kvm/svm/nested.c                          | 128 ++++--
>  arch/x86/kvm/svm/sev.c                             |  33 +-
>  arch/x86/kvm/svm/svm.c                             | 500 +++++++--------------
>  arch/x86/kvm/svm/svm.h                             | 104 ++++-
>  arch/x86/kvm/vmx/common.h                          |   2 -
>  arch/x86/kvm/vmx/main.c                            |  23 +-
>  arch/x86/kvm/vmx/nested.c                          |  27 +-
>  arch/x86/kvm/vmx/pmu_intel.c                       |   8 +-
>  arch/x86/kvm/vmx/tdx.c                             |  24 +-
>  arch/x86/kvm/vmx/vmx.c                             | 284 ++++--------
>  arch/x86/kvm/vmx/vmx.h                             |  61 ++-
>  arch/x86/kvm/vmx/x86_ops.h                         |   6 +-
>  arch/x86/kvm/x86.c                                 | 106 +++--
>  arch/x86/kvm/x86.h                                 |  18 +-
>  include/uapi/linux/kvm.h                           |   1 +
>  tools/include/uapi/linux/kvm.h                     |   1 +
>  tools/testing/selftests/kvm/Makefile.kvm           |   1 +
>  tools/testing/selftests/kvm/arch_timer.c           |   7 +-
>  .../selftests/kvm/arm64/arch_timer_edge_cases.c    |  23 +-
>  tools/testing/selftests/kvm/include/kvm_util.h     |  31 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c         |  15 +-
>  tools/testing/selftests/kvm/lib/memstress.c        |   2 +-
>  tools/testing/selftests/kvm/x86/aperfmperf_test.c  | 213 +++++++++
>  .../selftests/kvm/x86/userspace_msr_exit_test.c    |   8 +
>  30 files changed, 930 insertions(+), 750 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/x86/aperfmperf_test.c
>

* Re: [GIT PULL] KVM: x86: VMX changes for 6.17
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: VMX " Sean Christopherson
@ 2025-07-28 15:47   ` Paolo Bonzini
  2025-07-29 19:44     ` Sean Christopherson
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2025-07-28 15:47 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kernel

On Sat, Jul 26, 2025 at 12:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Add a sub-ioctl to allow getting TDX VMs into TEARDOWN before the last reference
> to the VM is put, so that reclaiming the VM's memory doesn't have to jump
> through all the hoops needed to reclaim memory from a live TD, which are quite
> costly, especially for large VMs.
>
> The following changes since commit 347e9f5043c89695b01e66b3ed111755afcf1911:
>
>   Linux 6.16-rc6 (2025-07-13 14:25:58 -0700)
>
> are available in the Git repository at:
>
>   https://github.com/kvm-x86/linux.git tags/kvm-x86-vmx-6.17
>
> for you to fetch changes up to dcab95e533642d8f733e2562b8bfa5715541e0cf:
>
>   KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM (2025-07-21 16:23:02 -0700)

I haven't pulled this for now because I wonder if it's better to make
this a general-purpose ioctl and cap (plus a kvm_x86_ops hook).  The
faster teardown is a TDX module quirk, but for example would it be
useful if you could trigger kvm_vm_dead() in the selftests?

As a side effect it would remove the supported_caps field and separate
namespace for KVM_TDX_CAP_* capabilities, at least for now.
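For illustration only, a general-purpose variant along these lines might pair a common "mark the VM dead" step with an optional per-vendor hook. The sketch below is a hypothetical userspace C model, not KVM code: `struct vendor_ops`, its `terminate_vm` member, and `vm_ioctl_terminate()` are invented names standing in for a kvm_x86_ops member and a generic ioctl handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vm;

/* Hypothetical per-vendor hook table (think: one new kvm_x86_ops member). */
struct vendor_ops {
	int (*terminate_vm)(struct vm *vm);	/* optional; NULL if unneeded */
};

struct vm {
	const struct vendor_ops *ops;
	bool dead;		/* generic state, akin to what kvm_vm_dead() sets */
	bool hkid_released;	/* vendor-specific state, TDX-like */
};

/* TDX-like hook: release the key early so teardown can take the fast path. */
static int tdx_like_terminate(struct vm *vm)
{
	vm->hkid_released = true;
	return 0;
}

static const struct vendor_ops tdx_like_ops = { .terminate_vm = tdx_like_terminate };
static const struct vendor_ops plain_ops = { .terminate_vm = NULL };

/* Generic handler: common work first, then the vendor hook if one exists. */
static int vm_ioctl_terminate(struct vm *vm)
{
	assert(vm != NULL && vm->ops != NULL);
	vm->dead = true;
	if (vm->ops->terminate_vm)
		return vm->ops->terminate_vm(vm);
	return 0;
}
```

A vendor without the hook (here `plain_ops`) still gets the generic behavior, which is what would let selftests exercise the dead-VM path on non-TDX VMs.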

Paolo

> ----------------------------------------------------------------
> KVM VMX changes for 6.17
>
> Add a TDX sub-ioctl, KVM_TDX_TERMINATE_VM, to let userspace mark a VM as dead,
> and most importantly release its HKID, prior to dropping the last reference to
> the VM.  Releasing the HKID moves the VM to TDX's TEARDOWN state, which allows
> pages to be reclaimed directly and ultimately reduces total reclaim time by a
> factor of 10x or more.
>
> ----------------------------------------------------------------
> Sean Christopherson (1):
>       KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
>
>  Documentation/virt/kvm/x86/intel-tdx.rst | 22 ++++++++++++++++++-
>  arch/x86/include/uapi/asm/kvm.h          |  7 ++++++-
>  arch/x86/kvm/vmx/tdx.c                   | 36 +++++++++++++++++++++++++-------
>  3 files changed, 55 insertions(+), 10 deletions(-)
>


* Re: [GIT PULL] KVM: x86: Changes for 6.17
  2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
                   ` (11 preceding siblings ...)
  2025-07-25 22:07 ` [GIT PULL] KVM: x86: VMX " Sean Christopherson
@ 2025-07-28 15:50 ` Paolo Bonzini
  12 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2025-07-28 15:50 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kernel

On Sat, Jul 26, 2025 at 12:07 AM Sean Christopherson <seanjc@google.com> wrote:
>
> As promised, the storm has arrived :-)
>
> There are two anomalies this time around, but thankfully only one conflict,
> and a trivial one at that (details on that in the MMIO Stale Data pull request).
>
> 1. The "no assignment" pull request depends on the IRQs and MMIO Stale Data
>    pull requests.  I created the topic branch based on the IRQs branch (minus
>    one commit that came in later), and then merged in the MMIO branch to create
>    a common base.  All the commits came out as I wanted, but the diff stats
>    generated by `git request-pull` are funky, so I doctored them up, a lot.
>
> 2. The "SEV cache maintenance" pull request is based on a tag/branch from the
>    tip tree.  I don't think you need to do anything special here?  Except
>    possibly mention it to Linus if the KVM pull request happens to get sent
>    before the associated tip pull request (which seems unlikely given how they
>    send a bunch of small pulls).

Pulled everything except the lone TDX commit, thanks. I'm going to
start testing without it.

Paolo


* Re: [GIT PULL] KVM: x86: VMX changes for 6.17
  2025-07-28 15:47   ` Paolo Bonzini
@ 2025-07-29 19:44     ` Sean Christopherson
  2025-07-30 17:55       ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2025-07-29 19:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, linux-kernel

On Mon, Jul 28, 2025, Paolo Bonzini wrote:
> On Sat, Jul 26, 2025 at 12:07 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Add a sub-ioctl to allow getting TDX VMs into TEARDOWN before the last reference
> > to the VM is put, so that reclaiming the VM's memory doesn't have to jump
> > through all the hoops needed to reclaim memory from a live TD, which are quite
> > costly, especially for large VMs.
> >
> > The following changes since commit 347e9f5043c89695b01e66b3ed111755afcf1911:
> >
> >   Linux 6.16-rc6 (2025-07-13 14:25:58 -0700)
> >
> > are available in the Git repository at:
> >
> >   https://github.com/kvm-x86/linux.git tags/kvm-x86-vmx-6.17
> >
> > for you to fetch changes up to dcab95e533642d8f733e2562b8bfa5715541e0cf:
> >
> >   KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM (2025-07-21 16:23:02 -0700)
> 
> I haven't pulled this for now because I wonder if it's better to make
> this a general-purpose ioctl and cap (plus a kvm_x86_ops hook).  The
> faster teardown is a TDX module quirk, but for example would it be
> useful if you could trigger kvm_vm_dead() in the selftests?

I'm leaning "no" (leaning is probably an understatement).

Mainly because I think the current behavior of vm_dead is a mistake.  Rejecting
all ioctls if kvm->vm_dead is true sounds nice on paper, but in practice it gives
us a false sense of security due to the check happening before acquiring kvm->lock,
e.g. see the SEV-ES migration bug found by syzbot.
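To make the ordering problem concrete, here is a minimal single-threaded C model. The names (`struct vm`, `vm_ioctl_racy()`, `vm_ioctl_checked()`, the `interleave` callback) are hypothetical and do not mirror KVM's actual code; the callback stands in for another task marking the VM dead in the window between an unlocked check and lock acquisition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified VM object; not KVM's struct kvm. */
struct vm {
	bool dead;		/* analogous to kvm->vm_dead */
	bool lock_held;		/* stand-in for kvm->lock */
	int operations;		/* operations actually performed on the VM */
};

/* Runs just before the lock is taken, modeling a concurrent task that
 * marks the VM dead in that window. */
typedef void (*interleave_fn)(struct vm *vm);

static void mark_dead(struct vm *vm)
{
	vm->dead = true;
}

/* Check happens before the lock is taken, so the result can be stale. */
static int vm_ioctl_racy(struct vm *vm, interleave_fn interleave)
{
	if (vm->dead)
		return -EIO;
	if (interleave)
		interleave(vm);
	assert(!vm->lock_held);
	vm->lock_held = true;
	vm->operations++;	/* proceeds even though the VM is now dead */
	vm->lock_held = false;
	return 0;
}

/* Checking under the lock closes the window, but only for paths that
 * take the lock (assuming whoever sets "dead" also holds it). */
static int vm_ioctl_checked(struct vm *vm, interleave_fn interleave)
{
	if (interleave)
		interleave(vm);
	assert(!vm->lock_held);
	vm->lock_held = true;
	if (vm->dead) {
		vm->lock_held = false;
		return -EIO;
	}
	vm->operations++;
	vm->lock_held = false;
	return 0;
}
```

With `mark_dead` as the interleaving, `vm_ioctl_racy()` returns 0 and operates on a dead VM, while `vm_ioctl_checked()` returns -EIO. The fix does nothing for ioctls that never take kvm->lock.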

Enforcing vm_dead with 100% accuracy would be painful given that there are ioctls
that deliberately avoid kvm->lock (vCPU ioctls could simply check KVM_REQ_VM_DEAD),
and I'm not at all convinced that truly making the VM off-limits is actually
desirable.  E.g. it prevents quickly freeing resources by nuking memslots.
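The vCPU-side alternative mentioned above, checking a request bit instead of taking kvm->lock, can be modeled roughly as follows. This is a hedged sketch with hypothetical names (`REQ_VM_DEAD`, `make_request()`, `test_request()`, `vcpu_run()`); it only mimics the idea behind KVM_REQ_VM_DEAD and omits the vCPU kick a real implementation needs.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit; stands in for KVM_REQ_VM_DEAD. */
enum { REQ_VM_DEAD = 0 };

struct vcpu {
	atomic_ulong requests;	/* pending requests, one bit each */
	int entries;		/* times the (mock) guest was entered */
};

static void vcpu_init(struct vcpu *vcpu)
{
	atomic_init(&vcpu->requests, 0);
	vcpu->entries = 0;
}

/* Set a request bit; a real implementation would also kick the vCPU. */
static void make_request(struct vcpu *vcpu, int req)
{
	assert(req >= 0 && req < (int)(8 * sizeof(unsigned long)));
	atomic_fetch_or(&vcpu->requests, 1UL << req);
}

/* Test without clearing, so the rejection is sticky in this sketch. */
static bool test_request(struct vcpu *vcpu, int req)
{
	return atomic_load(&vcpu->requests) & (1UL << req);
}

/* Mock run path: consults the vCPU's own request mask, no VM-wide lock. */
static int vcpu_run(struct vcpu *vcpu)
{
	if (test_request(vcpu, REQ_VM_DEAD))
		return -EIO;
	vcpu->entries++;
	return 0;
}
```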

I do think it makes sense to reject ioctls if vm_bugged is set, because vm_bugged
is all about limiting the damage when something has already gone wrong, i.e.
providing any kind of ABI is very much a non-goal.

And if the vm_dead behavior is gone, I don't think a generic KVM_TERMINATE_VM
adds much, if any, value.  Blocking KVM_RUN isn't terribly interesting, because
VMMs can already accomplish that with signals+immediate_exit, and AFAIK, real-world
use cases don't have problems with KVM_RUN being called at unexpected times.
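A rough model of the signals+immediate_exit approach follows. `struct mock_run`, `mock_kvm_run()`, and `vmm_block_run()` are hypothetical stand-ins; the only behavior borrowed from KVM's documented ABI is that a non-zero immediate_exit makes KVM_RUN return -EINTR without entering the guest. The signal that kicks an in-flight KVM_RUN is left as a comment.

```c
#include <assert.h>
#include <errno.h>

/* Mock of the one kvm_run field that matters here; not the real struct. */
struct mock_run {
	unsigned char immediate_exit;
};

struct mock_vcpu {
	struct mock_run run;
	int guest_entries;
};

/* Mock KVM_RUN: a non-zero immediate_exit returns -EINTR up front. */
static int mock_kvm_run(struct mock_vcpu *vcpu)
{
	if (vcpu->run.immediate_exit)
		return -EINTR;
	vcpu->guest_entries++;
	return 0;
}

/* VMM-side "block KVM_RUN": set the flag; a real VMM would then signal
 * the vCPU thread (e.g. via pthread_kill()) so an in-progress run exits. */
static void vmm_block_run(struct mock_vcpu *vcpu)
{
	assert(vcpu != 0);
	vcpu->run.immediate_exit = 1;
}
```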

One thing that we've discussed internally (though not in much depth) is a way to
block accesses to guest memory, e.g. to guard against accesses to guest memory
while saving vCPU state during live migration, when the VMM might expect that
guest memory is frozen, i.e. can't be dirtied.  But we wouldn't want to terminate
the VM in that case, e.g. so that the VM could be resumed if the migration is
aborted at the last minute.

So I think we want something more along the lines of KVM_PAUSE_VM, with specific
semantics and guarantees.

As for this pull request, I vote to drop it for 6.17 and give ourselves time to
figure out what we want to do with vm_dead.  I want to land "terminate VM" in
some form by 6.18 (as the next LTS), but AFAIK there's no rush to get it into
6.17.

I posted a series with a slightly modified version of the KVM_TDX_TERMINATE_VM
patch[1] to show where I think we should go.  We discussed the topic in v4 of the
KVM_TDX_TERMINATE_VM patch[2], but I opted to post it separately (at the time)
because there wasn't a strict dependency.

[1] https://lore.kernel.org/all/20250729193341.621487-1-seanjc@google.com
[2] https://lore.kernel.org/all/aFNa7L74tjztduT-@google.com

> As a side effect it would remove the supported_caps field and separate
> namespace for KVM_TDX_CAP_* capabilities, at least for now.

* Re: [GIT PULL] KVM: x86: VMX changes for 6.17
  2025-07-29 19:44     ` Sean Christopherson
@ 2025-07-30 17:55       ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2025-07-30 17:55 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kernel

On Tue, Jul 29, 2025 at 9:44 PM Sean Christopherson <seanjc@google.com> wrote:
> As for this pull request, I vote to drop it for 6.17 and give ourselves time to
> figure out what we want to do with vm_dead.

Ah ok, so my spidey sense was right for the wrong reasons. :)

> I want to land "terminate VM" in
> some form by 6.18 (as the next LTS), but AFAIK there's no rush to get it into
> 6.17.

As you prefer! I had already slightly rewritten the commit log, so here it
is for your reference and future consumption:

Add a TDX sub-ioctl, KVM_TDX_TERMINATE_VM, to solve a performance
issue in TDX VM cleanup. A guest_memfd keeps a reference to the
virtual machine, which means the VM cannot be fully destroyed until
the guest_memfd is released. However, to release the guest_memfd the
TDX module must first destroy the Secure EPT, which is a slow
operation if performed while the VM is still valid.  KVM_TDX_TERMINATE_VM
allows userspace to initiate the transition to the TEARDOWN state before
file descriptors are closed (either by hand or on process exit). The TDX
module then releases the HKID, and S-EPT destruction can run up to 10x
faster.
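The ordering problem in the commit log can be sketched as a toy cost model. Everything here is hypothetical (`struct td_vm`, `TD_RUNNABLE`, `terminate_vm()`, `release_guest_memfd()`), and the per-page costs of 10 and 1 are arbitrary units chosen only to mirror the "up to 10x" figure above, not measurements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not KVM/TDX code. */
enum td_state { TD_RUNNABLE, TD_TEARDOWN };

struct td_vm {
	int refcount;		/* here: only the guest_memfd's reference */
	enum td_state state;
	bool hkid_released;
	bool destroyed;
};

/* Arbitrary per-page reclaim cost; the 10:1 ratio is an assumption. */
static int reclaim_cost(const struct td_vm *vm)
{
	return vm->state == TD_TEARDOWN ? 1 : 10;
}

/* Terminate-like step: release the HKID early and enter TEARDOWN. */
static void terminate_vm(struct td_vm *vm)
{
	vm->hkid_released = true;
	vm->state = TD_TEARDOWN;
}

/* Releasing the guest_memfd reclaims its pages, then drops its VM
 * reference; the last reference destroys the VM. Returns total cost. */
static int release_guest_memfd(struct td_vm *vm, int pages)
{
	assert(vm->refcount > 0 && pages >= 0);
	int cost = pages * reclaim_cost(vm);
	if (--vm->refcount == 0)
		vm->destroyed = true;
	return cost;
}
```

Because the guest_memfd pins the VM, its pages are reclaimed while the VM is still valid unless userspace terminates it first; calling `terminate_vm()` before the release is the only difference between the slow and fast paths in this model.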

Thanks,

Paolo


Thread overview: 19+ messages
2025-07-25 22:07 [GIT PULL] KVM: x86: Changes for 6.17 Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: x86: Local APIC refactoring " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: Dirty Ring changes " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: Generic " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: IRQ " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: x86: Misc " Sean Christopherson
2025-07-28 15:11   ` Paolo Bonzini
2025-07-25 22:07 ` [GIT PULL] KVM: x86: MMIO State Data mitigation " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: x86: MMU " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: Device assignment accounting " Sean Christopherson
2025-07-28 15:08   ` Paolo Bonzini
2025-07-25 22:07 ` [GIT PULL] KVM: x86: Selftests " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: x86: SEV " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: x86: SVM " Sean Christopherson
2025-07-25 22:07 ` [GIT PULL] KVM: x86: VMX " Sean Christopherson
2025-07-28 15:47   ` Paolo Bonzini
2025-07-29 19:44     ` Sean Christopherson
2025-07-30 17:55       ` Paolo Bonzini
2025-07-28 15:50 ` [GIT PULL] KVM: x86: Changes " Paolo Bonzini