Date: Thu, 16 Jan 2025 14:35:18 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20250116035008.43404-1-yosryahmed@google.com>
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit
From: Sean Christopherson
To: Yosry Ahmed
Cc: Jim Mattson, Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> On Thu, Jan 16, 2025 at 9:11 AM Sean Christopherson wrote:
> >
> > On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > > On Wed, Jan 15, 2025 at 9:27 PM Jim Mattson wrote:
> > > > On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed wrote:
> > > > > Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> > > > > nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> > > > > more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> > > > > guest-physical translations, so only flushing linear and combined
> > > > > translations (i.e. guest-generated translations) is needed.
> >
> > No, using KVM_REQ_TLB_FLUSH_CURRENT is correct. From *L1's* perspective, VPID
> > is enabled, and so VM-Entry/VM-Exit are NOT architecturally guaranteed to flush
> > TLBs, and thus KVM is not required to FLUSH_GUEST.
> >
> > E.g. if KVM is using shadow paging (no EPT whatsoever), and L1 has modified the
> > PTEs used to map L2 but has not yet flushed TLBs for L2's VPID, then KVM is allowed
> > to retain its old, "stale" SPTEs that map L2 because architecturally they aren't
> > guaranteed to be visible to L2.
> >
> > But because L1 and L2 share TLB entries *in hardware*, KVM needs to ensure the
> > hardware TLBs are flushed. Without EPT, KVM will use different CR3s for L1 and
> > L2, but Intel's ASID tag doesn't include the CR3 address, only the PCID, which
> > KVM always pulls from guest CR3, i.e. could be the same for L1 and L2.
> >
> > Specifically, the synchronization of shadow roots in kvm_vcpu_flush_tlb_guest()
> > is not required in this scenario.
>
> Aha, I was examining vmx_flush_tlb_guest() not
> kvm_vcpu_flush_tlb_guest(), so I missed the synchronization. Yeah I
> think it's possible that we end up unnecessarily synchronizing the
> shadow page tables (or dropping them) in this case.
>
> Do you think it's worth expanding the comment in
> nested_vmx_transition_tlb_flush()?
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 2ed454186e59c..43d34e413d016 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1239,6 +1239,11 @@ static void
> nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
>          * does not have a unique TLB tag (ASID), i.e. EPT is disabled and
>          * KVM was unable to allocate a VPID for L2, flush the current context
>          * as the effective ASID is common to both L1 and L2.
> +        *
> +        * Note that even though TLB_FLUSH_GUEST would be correct because we
> +        * only need to flush linear mappings, it would unnecessarily
> +        * synchronize the MMU even though a TLB flush is not architecturally
> +        * required from L1's perspective.

I'm open to calling out that there's no flush from L1's perspective, but this
is inaccurate. Using TLB_FLUSH_GUEST is simply not correct. Will it cause
functional problems? No. But neither would blasting kvm_flush_remote_tlbs(),
and I think most people would consider flushing all TLBs on all vCPUs to be a
bug.

How about:

         * Note, only the hardware TLB entries need to be flushed, as VPID is
         * fully enabled from L1's perspective, i.e. there's no architectural
         * TLB flush from L1's perspective.

>          */
>         if (!nested_has_guest_tlb_tag(vcpu))
>                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
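
For anyone following along, the distinction between the two requests can be
sketched with a toy model. This is not KVM code: every toy_* name, the struct
and its counters are hypothetical, and the model only encodes the behavior
described in this thread, i.e. that servicing a FLUSH_GUEST request also
synchronizes shadow roots when shadow paging is in use, whereas FLUSH_CURRENT
only flushes the hardware TLB entries.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_flush_req {
	TOY_FLUSH_CURRENT,	/* stands in for KVM_REQ_TLB_FLUSH_CURRENT */
	TOY_FLUSH_GUEST,	/* stands in for KVM_REQ_TLB_FLUSH_GUEST */
};

struct toy_vcpu {
	bool shadow_paging;	/* true when EPT is not in use */
	bool l2_has_unique_tag;	/* stands in for nested_has_guest_tlb_tag() */
	int hw_tlb_flushes;	/* hardware TLB flushes performed */
	int shadow_root_syncs;	/* shadow root synchronizations performed */
};

/* Service one flush request and count the work it causes. */
static void toy_service_flush(struct toy_vcpu *vcpu, enum toy_flush_req req)
{
	/* Assumption taken from the thread: only the GUEST flavor syncs roots. */
	if (req == TOY_FLUSH_GUEST && vcpu->shadow_paging)
		vcpu->shadow_root_syncs++;

	/* Both flavors flush the hardware TLB entries. */
	vcpu->hw_tlb_flushes++;
}

/* Model a nested VM-Enter/VM-Exit that issues the given request type. */
static void toy_nested_transition(struct toy_vcpu *vcpu, enum toy_flush_req req)
{
	/* A flush is only needed when L1 and L2 share the effective ASID. */
	if (!vcpu->l2_has_unique_tag)
		toy_service_flush(vcpu, req);
}
```

Calling toy_nested_transition() with TOY_FLUSH_CURRENT on a shadow-paging vCPU
whose L2 lacks a unique tag bumps only hw_tlb_flushes. With TOY_FLUSH_GUEST it
also bumps shadow_root_syncs, which is the extra, architecturally unneeded
work discussed above.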