From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Like Xu <like.xu.linux@gmail.com>,
	Chao Gao <chao.gao@intel.com>,
	Sean Christopherson <seanjc@google.com>
Subject: [PATCH 6.6 36/82] KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled
Date: Wed, 20 Nov 2024 13:56:46 +0100
Message-ID: <20241120125630.426335042@linuxfoundation.org>
In-Reply-To: <20241120125629.623666563@linuxfoundation.org>

6.6-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sean Christopherson <seanjc@google.com>

commit 2657b82a78f18528bef56dc1b017158490970873 upstream.

When getting the current VPID, e.g. to emulate a guest TLB flush, return
vpid01 if L2 is running but with VPID disabled, i.e. if VPID is disabled
in vmcs12.  Architecturally, if VPID is disabled, then the guest and host
effectively share VPID=0.  KVM emulates this behavior by using vpid01 when
running an L2 with VPID disabled (see prepare_vmcs02_early_rare()), and so
KVM must also treat vpid01 as the current VPID while L2 is active.

Unconditionally treating vpid02 as the current VPID when L2 is active
causes KVM to flush TLB entries for vpid02 instead of vpid01, which
results in TLB entries from L1 being incorrectly preserved across nested
VM-Enter to L2 (L2=>L1 isn't problematic, because the TLB flush after
nested VM-Exit flushes vpid01).

The bug manifests as failures in the vmx_apicv_test KVM-Unit-Test, as KVM
incorrectly retains TLB entries for the APIC-access page across a nested
VM-Enter.

Opportunistically add comments at various touchpoints to explain the
architectural requirements, and also why KVM uses vpid01 instead of vpid02.

All credit goes to Chao, who root caused the issue and identified the fix.

Link: https://lore.kernel.org/all/ZwzczkIlYGX+QXJz@intel.com
Fixes: 2b4a5a5d5688 ("KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST")
Cc: stable@vger.kernel.org
Cc: Like Xu <like.xu.linux@gmail.com>
Debugged-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20241031202011.1580522-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx/nested.c |   30 +++++++++++++++++++++++++-----
 arch/x86/kvm/vmx/vmx.c    |    2 +-
 2 files changed, 26 insertions(+), 6 deletions(-)

--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1150,11 +1150,14 @@ static void nested_vmx_transition_tlb_fl
 		kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
 
 	/*
-	 * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
-	 * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
-	 * full TLB flush from the guest's perspective.  This is required even
-	 * if VPID is disabled in the host as KVM may need to synchronize the
-	 * MMU in response to the guest TLB flush.
+	 * If VPID is disabled, then guest TLB accesses use VPID=0, i.e. the
+	 * same VPID as the host, and so architecturally, linear and combined
+	 * mappings for VPID=0 must be flushed at VM-Enter and VM-Exit.  KVM
+	 * emulates L2 sharing L1's VPID=0 by using vpid01 while running L2,
+	 * and so KVM must also emulate TLB flush of VPID=0, i.e. vpid01.  This
+	 * is required if VPID is disabled in KVM, as a TLB flush (there are no
+	 * VPIDs) still occurs from L1's perspective, and KVM may need to
+	 * synchronize the MMU in response to the guest TLB flush.
 	 *
 	 * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use.
 	 * EPT is a special snowflake, as guest-physical mappings aren't
@@ -2229,6 +2232,17 @@ static void prepare_vmcs02_early_rare(st
 
 	vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA);
 
+	/*
+	 * If VPID is disabled, then guest TLB accesses use VPID=0, i.e. the
+	 * same VPID as the host.  Emulate this behavior by using vpid01 for L2
+	 * if VPID is disabled in vmcs12.  Note, if VPID is disabled, VM-Enter
+	 * and VM-Exit are architecturally required to flush VPID=0, but *only*
+	 * VPID=0.  I.e. using vpid02 would be ok (so long as KVM emulates the
+	 * required flushes), but doing so would cause KVM to over-flush.  E.g.
+	 * if L1 runs L2 X with VPID12=1, then runs L2 Y with VPID12 disabled,
+	 * and then runs L2 X again, then KVM can and should retain TLB entries
+	 * for VPID12=1.
+	 */
 	if (enable_vpid) {
 		if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02)
 			vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->nested.vpid02);
@@ -5827,6 +5841,12 @@ static int handle_invvpid(struct kvm_vcp
 		return nested_vmx_fail(vcpu,
 			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 
+	/*
+	 * Always flush the effective vpid02, i.e. never flush the current VPID
+	 * and never explicitly flush vpid01.  INVVPID targets a VPID, not a
+	 * VMCS, and so whether or not the current vmcs12 has VPID enabled is
+	 * irrelevant (and there may not be a loaded vmcs12).
+	 */
 	vpid02 = nested_get_vpid02(vcpu);
 	switch (type) {
 	case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3193,7 +3193,7 @@ static void vmx_flush_tlb_all(struct kvm
 
 static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu))
+	if (is_guest_mode(vcpu) && nested_cpu_has_vpid(get_vmcs12(vcpu)))
 		return nested_get_vpid02(vcpu);
 	return to_vmx(vcpu)->vpid;
 }
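
For readers who want to see the mismatch in isolation, the following is a
minimal userspace sketch of the VPID selection logic described in the
changelog. It is not KVM code: struct vcpu_model, its fields, and the
functions current_vpid_old(), current_vpid_new(), and hardware_vpid() are
hypothetical stand-ins for is_guest_mode(), nested_cpu_has_vpid(),
vmx->vpid, and nested_get_vpid02(). The sketch also assumes VPID is enabled
in KVM and that vpid02 was allocated successfully.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the state the patch consults. */
struct vcpu_model {
	bool guest_mode;       /* L2 is active */
	bool vmcs12_has_vpid;  /* L1 enabled VPID for L2 in vmcs12 */
	uint16_t vpid01;       /* VPID KVM uses while running L1 */
	uint16_t vpid02;       /* VPID KVM uses for L2 when vmcs12 enables VPID */
};

/* Pre-patch selection: any active L2 is treated as using vpid02. */
static inline uint16_t current_vpid_old(const struct vcpu_model *v)
{
	assert(v != NULL);
	return v->guest_mode ? v->vpid02 : v->vpid01;
}

/* Post-patch selection: vpid02 only if L2 is active and vmcs12 uses VPID. */
static inline uint16_t current_vpid_new(const struct vcpu_model *v)
{
	assert(v != NULL);
	if (v->guest_mode && v->vmcs12_has_vpid)
		return v->vpid02;
	return v->vpid01;
}

/*
 * VPID that TLB entries are actually tagged with, modeled on the
 * enable_vpid branch shown in the prepare_vmcs02_early_rare() hunk.
 */
static inline uint16_t hardware_vpid(const struct vcpu_model *v)
{
	assert(v != NULL);
	if (v->guest_mode && v->vmcs12_has_vpid)
		return v->vpid02;
	return v->vpid01;
}
```

With guest_mode set and vmcs12_has_vpid clear, current_vpid_old() returns
vpid02 while hardware_vpid() returns vpid01, so a flush of the "current"
VPID misses the entries that are actually in use. current_vpid_new() agrees
with hardware_vpid() in every combination of the two flags.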


