public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Robert Hoo <robert.hoo.linux@gmail.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com
Subject: Re: [PATCH v2 5/6] KVM: x86: Keep a per-VM MTRR state
Date: Fri, 26 May 2023 09:09:08 -0700	[thread overview]
Message-ID: <ZHDZpHYQhPtkNnQe@google.com> (raw)
In-Reply-To: <ZHBloxxmY/EUyswL@yzhao56-desk.sh.intel.com>

On Fri, May 26, 2023, Yan Zhao wrote:
> On Thu, May 25, 2023 at 07:53:41AM -0700, Sean Christopherson wrote:
> > > > > > So, if KVM zaps SPTEs when CR0.CD is cleared (even when the quirk is enabled),
> > > > > > then KVM can skip the MTRR zaps when CR0.CD=1 because KVM is ignoring the MTRRs
> > > > > > and will zap when CR0.CD is cleared.  And to avoid regressing the CR0.CD case,
> > > > > > if KVM honors guest PAT for the bizarro CR0.CD quirk, then KVM only needs to
> > > > > > zap non-WB MTRR ranges when CR0.CD is cleared.  Since WB is weak, looking for
> > > Not only non-WB ranges are required to be zapped.
> > > Think about a scenario:
> > > (1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too.
> > > (2) after CR0.CD=1, without zap, its EPT entry memtype is still UC.
> > > (3) then guest performs MTRR disable, no zap too.
> > > (4) guest updates this fixed MTRR range to WB, and performs MTRR enable.
> > > (5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB.
> > 
> > KVM installs WB memtype when the quirk is enabled, thus no need to zap.  KVM
> > currently zaps everything when the quirk is disabled, and I'm not proposing that
> > be changed.
> I didn't explain it clearly.
> 
> (1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too. ==> EPT entry has been created here
> (2) after CR0.CD=1, because of the quirk, no zap, the created EPT entry memtype is still UC.
> (3) then guest performs MTRR disable, no zap too, according to our change.
> (4) guest updates this fixed MTRR range to WB, and performs MTRR enable.
> (5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB.==>we also need to zap WB range.

Ugh, right.  But that case can be handled by zapping non-WB ranges on CR0.CD being
set.  Hmm, and KVM would need to zap everything if CR0.CD is toggled with MTRRs
disabled, but I don't think OVMF ever does that.  Zapping on CR0.CD toggling would
would likely introduce a small performance hit for SeaBIOS due to SeaBIOS clearing
CR0.CD immediately after SIPI, i.e. with MTRRs disabled, but that's arguably a
correctness fix since the quirk means KVM incorrectly runs the vCPU with WB SPTEs
until MTRRs are programmed.

With precise+batched zapping, zapping non-WB ranges even when CR0.CD is set should
still be a healthy performance boost for OVMF.

> > It doesn't actually require non-coherent DMA within the CR0.CD=1 window though.
> If we don't need to mind non-coherent DMA within the window CR0.CD=1 to CR0.CD=0,
> do you think it's a good idea to do in this way...
> 
> (1) when CR0.CD=1, return guest mtrr type. 
> 
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7532,12 +7532,13 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>                 return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> 
>         if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
> -               if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
> -                       cache = MTRR_TYPE_WRBACK;
> -               else
> +               if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) {
> +                       cache = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
> +                       return cache << VMX_EPT_MT_EPTE_SHIFT;
> +               } else {
>                         cache = MTRR_TYPE_UNCACHABLE;
> -
> -               return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> +                       return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> +               }
>         }
> 
> 
> (2) return MTRR_TYPE_WRBACK if guest mtrr has not been enabled for once
> u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
> @@ -628,13 +635,23 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
>         struct mtrr_iter iter;
>         u64 start, end;
>         int type = -1;
>         const int wt_wb_mask = (1 << MTRR_TYPE_WRBACK)
>                                | (1 << MTRR_TYPE_WRTHROUGH);
>  
> +       if (!mtrr_state->enabled_once)
> +               return MTRR_TYPE_WRBACK;

No, because this assumes too many things about the guest, and completely falls
apart if the guest reboots.

> (3) when CR0.CD = 1 and the quirk is on, return MTRR type as if MTRR is enabled
> +       ignore_disable = kvm_read_cr0_bits(vcpu, X86_CR0_CD) &&
> +                        kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED);
> 
> -static void mtrr_lookup_start(struct mtrr_iter *iter)
> +static void mtrr_lookup_start(struct mtrr_iter *iter, bool ignore_disable)
>  {
> -       if (!mtrr_is_enabled(iter->mtrr_state)) {
> +       if (!ignore_disable && !mtrr_is_enabled(iter->mtrr_state)) {
>                 iter->mtrr_disabled = true;
>                 return;
>         }
> 
> (4) Then we only need to do EPT zap when MTRR is enabled for the first time
> and when MTRR fixed/var entries are changed at enabling MTRR or at CR0.CD=0
> (I prefer at enabling MTRR, as seabios may do the MTRR disable/update/enable when
> CR0.CD=0)
> 
> 
> Besides, accoding to the commit message introducing KVM_QUIRK_CD_NW_CLEARED,

I am not willing to lean on the historical commit messages for the quirk to
justify any change.  I'm not at all convinced that there was any meaningful thought
put into ensuring correctness.

> we can return MTRR_TYPE_WRBACK for CR0.CD=1 only when MTRR is not enbled for
> once. (And I tried, it works!)

On your system, for your setup.  The quirk terrifies me because it likely affects
every KVM-based VM out there (I can't find a single VMM that disables the quirk).
These changes are limited to VMs with noncoherent DMA, but still.

In short, I am steadfastly against any change that piles more arbitrary behavior
functional behavior on top of the quirk, especially when that behavior relies on
heuristics to try and guess what the guest is doing.

  reply	other threads:[~2023-05-26 16:10 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-09 13:48 [PATCH v2 0/6] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
2023-05-09 13:50 ` [PATCH v2 1/6] KVM: x86/mmu: add a new mmu zap helper to indicate memtype changes Yan Zhao
2023-05-10  5:30   ` Chao Gao
2023-05-10  8:06     ` Yan Zhao
2023-05-23 22:51       ` Sean Christopherson
2023-05-24  2:22         ` Yan Zhao
2023-05-24 14:50           ` Sean Christopherson
2023-05-25 10:14             ` Yan Zhao
2023-05-25 15:54               ` Sean Christopherson
2023-05-30  1:32                 ` Yan Zhao
2023-05-30  9:48                 ` Yan Zhao
2023-05-30 23:51                   ` Sean Christopherson
2023-05-31  0:18                     ` Yan Zhao
2023-05-09 13:51 ` [PATCH v2 2/6] KVM: x86/mmu: only zap EPT when guest CR0_CD changes Yan Zhao
2023-05-09 13:51 ` [PATCH v2 3/6] KVM: x86/mmu: only zap EPT when guest MTRR changes Yan Zhao
2023-05-10  5:39   ` Chao Gao
2023-05-10  8:00     ` Yan Zhao
2023-05-10 10:54       ` Huang, Kai
2023-05-11  0:15         ` Yan Zhao
2023-05-11  2:42           ` Huang, Kai
2023-05-11  2:31             ` Yan Zhao
2023-05-11  3:05               ` Huang, Kai
2023-05-09 13:52 ` [PATCH v2 4/6] KVM: x86/mmu: Zap all EPT leaf entries according noncoherent DMA count Yan Zhao
2023-05-09 13:53 ` [PATCH v2 5/6] KVM: x86: Keep a per-VM MTRR state Yan Zhao
2023-05-10 17:23   ` David Matlack
2023-05-21  3:44   ` Robert Hoo
2023-05-23  6:21     ` Yan Zhao
2023-05-24  0:13       ` Sean Christopherson
2023-05-24 11:03         ` Yan Zhao
2023-05-24 18:21           ` Sean Christopherson
2023-05-25 10:09             ` Yan Zhao
2023-05-25 14:53               ` Sean Christopherson
2023-05-26  7:54                 ` Yan Zhao
2023-05-26 16:09                   ` Sean Christopherson [this message]
2023-05-30  1:19                     ` Yan Zhao
2023-05-25  7:21       ` Robert Hoo
2023-05-25 15:00         ` Sean Christopherson
2023-05-26  1:49           ` Robert Hoo
2023-05-09 13:53 ` [PATCH v2 6/6] KVM: x86/mmu: use per-VM based MTRR for EPT Yan Zhao
2023-05-24  0:15 ` [PATCH v2 0/6] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
2023-05-24 11:04   ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZHDZpHYQhPtkNnQe@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=robert.hoo.linux@gmail.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox