public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Jim Mattson <jmattson@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit
Date: Fri, 17 Jan 2025 16:03:30 -0800	[thread overview]
Message-ID: <Z4rv0jzFILtUxK4q@google.com> (raw)
In-Reply-To: <CAJD7tkaa1cqUeUUKNdQADBqXH-G9h=5Liv+wj=5gitgbdO9Tsw@mail.gmail.com>

On Fri, Jan 17, 2025, Yosry Ahmed wrote:
> On Fri, Jan 17, 2025 at 10:01 AM Sean Christopherson <seanjc@google.com> wrote:
> > Yep.  I suspect the issue is lack of documentation for TLB_FLUSH_GUEST and
> > TLB_FLUSH_CURRENT.  I'm not entirely sure where it would be best to document
> > them.  I guess maybe where they are #defined?
> 
> I guess at the #define we can just mention that they result in calling
> kvm_vcpu_flush_tlb_{guest/current}() before entering the guest, if
> anything.

Yeah, a "See xx for details" redirect is probably the best option.

> The specific documentation about what they do could be above the
> functions themselves, and describing the potential MMU sync is
> naturally part of documenting kvm_vcpu_flush_tlb_guest() (kinda
> already there).
> 
> The flush_tlb_guest() callback is documented in kvm_host.h, but not
> flush_tlb_current(). I was going to suggest just documenting that. But
> kvm_vcpu_flush_tlb_guest() does not only call flush_tlb_guest(), but
> it also potentially synchronizes the MMU. So only documenting the
> callbacks does not paint a full picture.
> 
> FTR, I initially confused myself because all kvm_vcpu_flush_tlb_*()
> functions are more-or-less thin wrappers around the per-vendor
> callbacks -- except kvm_vcpu_flush_tlb_guest().
> 
> >
> > TLB_FLUSH_GUEST is used when a flush of the guest's TLB, from the guest's
> > perspective, is architecturally required.  The one oddity with TLB_FLUSH_GUEST
> > is that it does NOT include guest-physical mappings, i.e. TLB entries that are
> > associated with an EPT root.
> 
> The way I think about this is how it's documented above the per-vendor
> callback. It flushes translations created by the guest. The guest does
> not (directly) create guest-physical translations, only linear and
> combined translations.

That's not accurate either.  When L1 is using nested TDP, it does create guest-
physical translations.  The lack of any form of handling in TLB_FLUSH_GUEST is
a reflection of two things: EPT is weird, and nested SVM doesn't yet support
precise flushing on transitions, i.e. nested NPT handling is missing because KVM
unconditionally flushes and synchronizes.

EPT is "weird" because the _only_ time guest-physical translations are flushed
is when the "wrong" KVM MMU is loaded.  The only way to flush guest-physical
translations (short of RESET :-D) is via INVEPT, and INVEPT is a root-only (VMX
terminology) instruction, i.e. can only be executed by L1.  And because L1 can't
itself be using EPT[*], INVEPT can never target/flush the current context.

Furthermore, INVEPT isn't strictly tied to a VMCS, e.g. deferring the emulated
flush until the next time KVM runs a vmcs12 isn't viable.  Rather than add
dedicated tracking, KVM simply unloads the roots and lets the normal root
"allocation" handle the flush+sync the next time the vCPU uses the associated MMU.

Nested NPT is different, as there is no INVNPT.  Instead, there's the ASID itself
and a flushing control, both of which are properties of the VMCB.  As a result,
NPT TLB flushes that are initiated by a hypervisor always take effect at VMRUN,
e.g. by bumping the ASID, or via the dedicated flushing control.

So when proper handling of TLB flushing on nested SVM transitions comes along, I
do expect that either kvm_vcpu_flush_tlb_guest() will grow, or we'll add yet
another TLB_FLUSH_XXX flavor :-)

One thing that could be helpful would be to document that KVM doesn't use
TLB_FLUSH_GUEST to handle INVEPT, and so there's no need to sync nested TDP MMUs.

[*] Even in a deprivileged scenario like pKVM, the guest kernel would become L2
    from KVM's perspective.


Thread overview: 13+ messages
2025-01-16  3:50 [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit Yosry Ahmed
2025-01-16  5:27 ` Jim Mattson
2025-01-16 15:25   ` Yosry Ahmed
2025-01-16 17:11     ` Sean Christopherson
2025-01-16 18:24       ` Yosry Ahmed
2025-01-16 22:35         ` Sean Christopherson
2025-01-16 22:43           ` Yosry Ahmed
2025-01-17  0:34             ` Sean Christopherson
2025-01-17  0:53               ` Yosry Ahmed
2025-01-17 18:01                 ` Sean Christopherson
2025-01-17 18:20                   ` Yosry Ahmed
2025-01-18  0:03                     ` Sean Christopherson [this message]
2025-01-22 19:15                       ` Yosry Ahmed
