Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] KVM: nVMX: Fix ept=n bugs where KVM runs L2 with guest CR3
Date: Thu, 4 Jun 2026 06:15:34 -0700	[thread overview]
Message-ID: <aiF6dg4_9mA-ID-5@google.com> (raw)
In-Reply-To: <CALMp9eT2mBWgX9VntLJcRYDCeMdLzZwzmPbHrbcM6CRL2tYT0g@mail.gmail.com>

On Wed, Jun 03, 2026, Jim Mattson wrote:
> On Wed, Jun 3, 2026 at 3:34 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Fix two bugs where KVM can run L2 with a guest-controlled CR3.  The underlying
> > flaw dates back to commit f087a02941fe ("KVM: nVMX: Stash L1's CR3 in
> > vmcs01.GUEST_CR3 on nested entry w/o EPT").  Past me claimed:
> >
> >   Smashing vmcs01.GUEST_CR3 is safe because nested VM-Exits, and the unwind,
> >   reset KVM's MMU, i.e. vmcs01.GUEST_CR3 is guaranteed to be overwritten with
> >   a shadow CR3 prior to re-entering L1.
> >
> > which was and is true, _if_ a nested VM-Exit or the unwind is reached.  If KVM
> > fails directly, vmcs01.GUEST_CR3 will be left pointing at L1's actual CR3, i.e.
> > KVM will run with legacy shadow paging a guest-controlled CR3, which is... not
> > good.
> >
> > Note, the vTPR fix will cause KVM-Unit-Test's TPR Threshold test to fail when
> > run with warn_on_missed_cc=1:
> >
> >   FAIL: Use TPR shadow enabled: virtual-APIC address = 8000000: vmlaunch succeeds
> >   FAIL: Use TPR shadow enabled: virtual-APIC address = 10000000: vmlaunch succeeds
> >   ...
> >   FAIL: Use TPR shadow enabled: virtual-APIC address = ffffffffff000: vmlaunch succeeds
> >
> > due to KVM actually accessing the virtual APIC page.  Given that I'm pretty
> > sure I'm the only person that runs with warn_on_missed_cc=1, and that IMO this
> > is firmly a test bug (e.g. on bare metal I'm pretty sure there's guarantee
> > VMLAUNCH will succeed), I don't see any reason to try and make it "work" in
> > KVM.  I might try to figure out a way to make the KUT testcase play nice, but
> > for now I'll just ignore the failures in my test runs.
> 
> IIRC, the tests in question are confirming PCI bus error semantics.
> Why would the VMLAUNCH not succeed on bare metal?

Well, I was assuming it would fail because it couldn't actually guarantee PCI Bus
Errors, as it could very well stumble into actual device memory.  But the test
leaves TPR_THRESHOLD as '0', and so regardless of what value the CPU gets back,
consistency check will still pass.

But!  I'm pretty sure the test would generate #MCs, not PCI bus errors.  The
SDM very, very strongly implies that the reads will use WB:

  Bits 53:50 report the memory type that should be used for the VMCS, for
  data structures referenced by pointers in the VMCS (I/O bitmaps,
  virtual-APIC page, MSR areas for VMX transitions), and for the MSEG header.
  ^^^^^^^^^^^^^^^^^

  If software needs to access these data structures (e.g., to modify the
  contents of the MSR bitmaps), it can configure the paging structures to map
  them into the linear-address space. If it does so, it should establish
  mappings that use the memory type reported bits 53:50 in this MSR.

  As of this writing, all processors that support VMX operation indicate the 
  write-back type. The values used are given in Table A-1.

And _that_ will definitely cause problems, especially if the read hits device
memory.

That said, KVM's de facto ABI is that VMX instructions get PCI Bus Error semantics
on accesses KVM can't handle, and it's just as easy to skip the consistency check.
Since a read of 0xff guarantees the vTPR >= TPR_THRESHOLD, the check will pass
regardless of TPR_THRESHOLD.

So, other than my stubbornness :-D, there's no reason to deliberately fail the
check if KVM can't read memory.  I'll go with this for v2:

	gpa_t vtpr_gpa = vmcs12->virtual_apic_page_addr + APIC_TASKPRI;
	u32 vtpr;

	if (!nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
		return 0;

	if (CC(!page_address_valid(vcpu, vmcs12->virtual_apic_page_addr)))
		return -EINVAL;

	if (CC(!nested_cpu_has_vid(vmcs12) && vmcs12->tpr_threshold >> 4))
		return -EINVAL;

	/*
	 * Do the illegal vTPR vs. TPR Threshold consistency check if and only
	 * if KVM is configured to WARN on missed consistency checks, otherwise
	 * it's a waste of time.  KVM needs to rely on hardware to fully detect
	 * an illegal combination due to the vTPR being writable by L1 at all
	 * times (it's an in-memory value, not a VMCS field).  I.e. even if the
	 * check passes now, it might fail at the actual VM-Enter.
	 *
	 * If reading guest memory fails, skip the check as KVM's de facto ABI
	 * for VMX instruction accesses to non-existent memory is to provide
	 * PCI Bus Error semantics (reads return 0xFFs), in which case the vTPR
	 * is guaranteed to greater than or equal to the threshold.
	 *
	 * Note!  Deliberately use the VM-scoped API when reading guest memory,
	 * to ensure the read doesn't hit SMRAM when restoring L2 state on RSM,
	 * and only perform the check when in KVM_RUN, to avoid a false failure
	 * if userspace hasn't yet configured memslots during state restore.
	 */
	if (warn_on_missed_cc && vcpu->wants_to_run &&
	    nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) &&
	    !nested_cpu_has_vid(vmcs12) &&
	    !nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) &&
	    !kvm_read_guest(vcpu->kvm, vtpr_gpa, &vtpr, sizeof(vtpr)) &&
	    CC((vmcs12->tpr_threshold & GENMASK(3, 0)) > ((vtpr >> 4) & GENMASK(3, 0))))
		return -EINVAL;

	return 0;

  reply	other threads:[~2026-06-04 13:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-03 22:34 [PATCH 0/2] KVM: nVMX: Fix ept=n bugs where KVM runs L2 with guest CR3 Sean Christopherson
2026-06-03 22:34 ` [PATCH 1/2] KVM: nVMX: Move vTPR vs. TPR Threshold consistency check into "normal" checks Sean Christopherson
2026-06-03 22:34 ` [PATCH 2/2] KVM: nVMX: Don't use vmcs01.GUEST_CR3 to snapshot L1's CR3 when EPT is disabled Sean Christopherson
2026-06-03 22:50   ` sashiko-bot
2026-06-03 22:57     ` Sean Christopherson
2026-06-04  6:23 ` [PATCH 0/2] KVM: nVMX: Fix ept=n bugs where KVM runs L2 with guest CR3 Jim Mattson
2026-06-04 13:15   ` Sean Christopherson [this message]
2026-06-04 14:13     ` Jim Mattson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiF6dg4_9mA-ID-5@google.com \
    --to=seanjc@google.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox