From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Jim Mattson <jmattson@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit
Date: Fri, 17 Jan 2025 16:03:30 -0800 [thread overview]
Message-ID: <Z4rv0jzFILtUxK4q@google.com> (raw)
In-Reply-To: <CAJD7tkaa1cqUeUUKNdQADBqXH-G9h=5Liv+wj=5gitgbdO9Tsw@mail.gmail.com>
On Fri, Jan 17, 2025, Yosry Ahmed wrote:
> On Fri, Jan 17, 2025 at 10:01 AM Sean Christopherson <seanjc@google.com> wrote:
> > Yep. I suspect the issue is lack of documentation for TLB_FLUSH_GUEST and
> > TLB_FLUSH_CURRENT. I'm not entirely sure where it would be best to document
> > them. I guess maybe where they are #defined?
>
> I guess at the #define we can just mention that they result in calling
> kvm_vcpu_flush_tlb_{guest/current}() before entering the guest, if
> anything.
Yeah, a "See xx for details" redirect is probably the best option.
> The specific documentation about what they do could be above the
> functions themselves, and describing the potential MMU sync is
> naturally part of documenting kvm_vcpu_flush_tlb_guest() (kinda
> already there).
>
> The flush_tlb_guest() callback is documented in kvm_host.h, but not
> flush_tlb_current(). I was going to suggest just documenting that. But
> kvm_vcpu_flush_tlb_guest() does not only call flush_tlb_guest(), but
> it also potentially synchronizes the MMU. So only documenting the
> callbacks does not paint a full picture.
>
> FTR, I initially confused myself because all kvm_vcpu_flush_tlb_*()
> functions are more-or-less thin wrappers around the per-vendor
> callbacks -- except kvm_vcpu_flush_tlb_guest().
>
> >
> > TLB_FLUSH_GUEST is used when a flush of the guest's TLB, from the guest's
> > perspective, is architecturally required. The one oddity with TLB_FLUSH_GUEST
> > is that it does NOT include guest-physical mappings, i.e. TLB entries that are
> > associated with an EPT root.
>
> The way I think about this is how it's documented above the per-vendor
> callback. It flushes translations created by the guest. The guest does
> not (directly) create guest-physical translations, only linear and
> combined translations.
That's not accurate either. When L1 is using nested TDP, it does create guest-
physical translations. The lack of any form of handling in TLB_FLUSH_GUEST is
a reflection of two things: EPT is weird, and nested SVM doesn't yet support
precise flushing on transitions, i.e. nested NPT handling is missing because KVM
unconditionally flushes and synchronizes.
EPT is "weird" because the _only_ time guest-physical translations are flushed
is when the "wrong" KVM MMU is loaded. The only way to flush guest-physical
translations (short of RESET :-D) is via INVEPT, and INVEPT is a root-only (VMX
terminology) instruction, i.e. can only be executed by L1. And because L1 can't
itself be using EPT[*], INVEPT can never target/flush the current context.
Furthermore, INVEPT isn't strictly tied to a VMCS, e.g. deferring the emulated
flush until the next time KVM runs a vmcs12 isn't viable. Rather than add
dedicated tracking, KVM simply unloads the roots and lets the normal root
"allocation" handle the flush+sync the next time the vCPU uses the associated MMU.
Nested NPT is different, as there is no INVNPT. Instead, there's the ASID itself
and a flushing control, both of which are properties of the VMCB. As a result,
NPT TLB flushes that are initiated by a hypervisor always take effect at VMRUN,
e.g. by bumping the ASID, or via the dedicated flushing control.
So when proper handling of TLB flushing on nested SVM transition comes along, I
do expect that either kvm_vcpu_flush_tlb_guest() will grow. Or maybe we'll add
yet another TLB_FLUSH_XXX flavor :-)
One thing that could be helpful would be to document that KVM doesn't use
TLB_FLUSH_GUEST to handle INVEPT, and so there's no need to sync nested TDP MMUs.
[*] Even in a deprivileged scenario like pKVM, the guest kernel would become L2
from KVM's perspective.
next prev parent reply other threads:[~2025-01-18 0:03 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-16 3:50 [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit Yosry Ahmed
2025-01-16 5:27 ` Jim Mattson
2025-01-16 15:25 ` Yosry Ahmed
2025-01-16 17:11 ` Sean Christopherson
2025-01-16 18:24 ` Yosry Ahmed
2025-01-16 22:35 ` Sean Christopherson
2025-01-16 22:43 ` Yosry Ahmed
2025-01-17 0:34 ` Sean Christopherson
2025-01-17 0:53 ` Yosry Ahmed
2025-01-17 18:01 ` Sean Christopherson
2025-01-17 18:20 ` Yosry Ahmed
2025-01-18 0:03 ` Sean Christopherson [this message]
2025-01-22 19:15 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z4rv0jzFILtUxK4q@google.com \
--to=seanjc@google.com \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.