Re: A question about how the KVM emulates the effect of guest MTRRs on AMD platforms

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Yibo Huang <ybhuang@cs.utexas.edu>, kvm@vger.kernel.org
Subject: Re: A question about how the KVM emulates the effect of guest MTRRs on AMD platforms
Date: Tue, 7 Nov 2023 10:06:02 -0800	[thread overview]
Message-ID: <ZUp8iqBDm_Ylqiau@google.com> (raw)
In-Reply-To: <ZUoCxyNsc/dB4/eN@yzhao56-desk.sh.intel.com>

On Tue, Nov 07, 2023, Yan Zhao wrote:
> On Mon, Nov 06, 2023 at 02:34:08PM -0800, Sean Christopherson wrote:
> > On Wed, Nov 01, 2023, Yan Zhao wrote:
> > > On Tue, Oct 31, 2023 at 08:14:41AM -0700, Sean Christopherson wrote:
> 
> > > If no #MC, could EPT type of guest RAM also be set to WB (without IPAT) even
> > > without non-coherent DMA?
> > 
> > No, there are snooping/ordering issues on Intel, and to a lesser extent AMD.  AMD's
> > WC+ solves the most straightfoward cases, e.g. WC+ snoops caches, and VMRUN and
> > #VMEXIT flush the WC buffers to ensure that guest writes are visible and #VMEXIT
> > (and vice versa).  That may or may not be sufficient for multi-threaded use cases,
> > but I've no idea if there is actually anything to worry about on that front.  I
> > think there's also a flaw with guest using UC, which IIUC doesn't snoop caches,
> > i.e. the guest could get stale data.
> > 
> > AFAIK, Intel CPUs don't provide anything like WC+, so KVM would have to provide
> > something similar to safely let the guest control memtypes.  Arguably, KVM should
> > have such mechansisms anyways, e.g. to make non-coherent DMA VMs more robust.
> > 
> > But even then, there's still the question of why, i.e. what would be the benefit
> > of letting the guest control memtypes when it's not required for functional
> > correctness, and would that benefit outweight the cost.
> 
> Ok, so for a coherent device , if it's assigned together with a non-coherent
> device, and if there's a page with host PAT = WB and guest PAT=UC, we need to
> ensure the host write is flushed before guest read/write and guest DMA though no
> need to worry about #MC, right?

It's not even about devices, it applies to all non-MMIO memory, i.e. unless the
host forces UC for a given page, there's potential for WB vs. WC/UC issues.

> > > > > For CR0_CD=1,
> > > > > - w/o KVM_X86_QUIRK_CD_NW_CLEARED, it meets (b), but breaks (a).
> > > > > - w/  KVM_X86_QUIRK_CD_NW_CLEARED, with IPAT=1, it meets (a), but breaks (b);
> > > > >                                    with IPAT=0, it may breaks (a), but meets (b)
> > > > 
> > > > CR0.CD=1 is a mess above and beyond memtypes.  Huh.  It's even worse than I thought,
> > > > because according to the SDM, Atom CPUs don't support no-fill mode:
> > > > 
> > > >   3. Not supported In Intel Atom processors. If CD = 1 in an Intel Atom processor,
> > > >      caching is disabled.
> > > > 
> > > > Before I read that blurb about Atom CPUs, what I was going to say is that, AFAIK,
> > > > it's *impossible* to accurately virtualize CR0.CD=1 on VMX because there's no way
> > > > to emulate no-fill mode.
> > > > 
> > > > > > Discussion from the EPT+MTRR enabling thread[*] more or less confirms that Sheng
> > > > > > Yang was trying to resolve issues with passthrough MMIO.
> > > > > > 
> > > > > >  * Sheng Yang 
> > > > > >   : Do you mean host(qemu) would access this memory and if we set it to guest 
> > > > > >   : MTRR, host access would be broken? We would cover this in our shadow MTRR 
> > > > > >   : patch, for we encountered this in video ram when doing some experiment with 
> > > > > >   : VGA assignment. 
> > > > > > 
> > > > > > And in the same thread, there's also what appears to be confirmation of Intel
> > > > > > running into issues with Windows XP related to a guest device driver mapping
> > > > > > DMA with WC in the PAT.  Hilariously, Avi effectively said "KVM can't modify the
> > > > > > SPTE memtype to match the guest for EPT/NPT", which while true, completely overlooks
> > > > > > the fact that EPT and NPT both honor guest PAT by default.  /facepalm
> > > > > 
> > > > > My interpretation is that the since guest PATs are in guest page tables,
> > > > > while with EPT/NPT, guest page tables are not shadowed, it's not easy to
> > > > > check guest PATs  to disallow host QEMU access to non-WB guest RAM.
> > > > 
> > > > Ah, yeah, your interpretation makes sense.
> > > > 
> > > > The best idea I can think of to support things like this is to have KVM grab the
> > > > effective PAT memtype from the host userspace page tables, shove that into the
> > > > EPT/NPT memtype, and then ignore guest PAT.  I don't if that would actually work
> > > > though.
> > > Hmm, it might not work. E.g. in GPU, some MMIOs are mapped as UC-, while some
> > > others as WC, even they belong to the same BAR.
> > > I don't think host can know which one to choose in advance.
> > > I think it should be also true to RAM range, guest can do memremap to a memory
> > > type that host doesn't know beforehand.
> > 
> > The goal wouldn't be to honor guest memtype, it would be to ensure correctness.
> > E.g. guest can do memremap all it wants, and KVM will always ignore the guest's
> > memtype.
> AFAIK, some GPUs with TTM driver may call set_pages_array_uc() to convert pages
> to PAT=UC-(e.g. for doorbell). Intel i915 also could vmap a page with PAT=WC
> (e.g. for some command buffer, see i915_gem_object_map_page()).
> It's not easy for host to know which guest pages are allocated by guest driver
> for such UC/WC conversion, and it should have problem to map such pages as "WB +
> ignore guest PAT" if the device is non-coherent.

Ah, right, I was thinking specifically of virtio-gpu, where there is more explicit
coordination between guest and host regarding the buffers.  Drat.

next prev parent reply	other threads:[~2023-11-07 18:06 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-12 23:04 A question about how the KVM emulates the effect of guest MTRRs on AMD platforms Yibo Huang
2023-10-27 23:13 ` Sean Christopherson
2023-10-30 12:16   ` Yan Zhao
2023-10-30 19:24     ` Sean Christopherson
     [not found]       ` <3E43ADC6-E817-411A-9EBF-B16142B9B478@cs.utexas.edu>
2023-10-30 21:52         ` Sean Christopherson
2023-11-01  3:07           ` Yibo Huang
2023-10-31 10:01       ` Yan Zhao
2023-10-31 15:14         ` Sean Christopherson
2023-11-01  3:53           ` Huang, Kai
2023-11-01  9:08           ` Yan Zhao
2023-11-06 22:34             ` Sean Christopherson
2023-11-07  9:26               ` Yan Zhao
2023-11-07 18:06                 ` Sean Christopherson [this message]
2023-11-08  4:32                   ` Yan Zhao
2023-11-10 17:09                     ` Sean Christopherson
2023-11-13  8:07                       ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZUp8iqBDm_Ylqiau@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=yan.y.zhao@intel.com \
    --cc=ybhuang@cs.utexas.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox