From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Yibo Huang <ybhuang@cs.utexas.edu>, kvm@vger.kernel.org
Subject: Re: A question about how the KVM emulates the effect of guest MTRRs on AMD platforms
Date: Tue, 7 Nov 2023 10:06:02 -0800
Message-ID: <ZUp8iqBDm_Ylqiau@google.com>
In-Reply-To: <ZUoCxyNsc/dB4/eN@yzhao56-desk.sh.intel.com>

On Tue, Nov 07, 2023, Yan Zhao wrote:
> On Mon, Nov 06, 2023 at 02:34:08PM -0800, Sean Christopherson wrote:
> > On Wed, Nov 01, 2023, Yan Zhao wrote:
> > > On Tue, Oct 31, 2023 at 08:14:41AM -0700, Sean Christopherson wrote:
>
> > > If no #MC, could EPT type of guest RAM also be set to WB (without IPAT) even
> > > without non-coherent DMA?
> >
> > No, there are snooping/ordering issues on Intel, and to a lesser extent AMD. AMD's
> > WC+ solves the most straightforward cases, e.g. WC+ snoops caches, and VMRUN and
> > #VMEXIT flush the WC buffers to ensure that guest writes are visible on #VMEXIT
> > (and vice versa). That may or may not be sufficient for multi-threaded use cases,
> > but I've no idea if there is actually anything to worry about on that front. I
> > think there's also a flaw with guest using UC, which IIUC doesn't snoop caches,
> > i.e. the guest could get stale data.
> >
> > AFAIK, Intel CPUs don't provide anything like WC+, so KVM would have to provide
> > something similar to safely let the guest control memtypes. Arguably, KVM should
> > have such mechanisms anyway, e.g. to make non-coherent DMA VMs more robust.
> >
> > But even then, there's still the question of why, i.e. what would be the benefit
> > of letting the guest control memtypes when it's not required for functional
> > correctness, and would that benefit outweigh the cost.
>
> Ok, so for a coherent device, if it's assigned together with a non-coherent
> device, and if there's a page with host PAT = WB and guest PAT=UC, we need to
> ensure the host write is flushed before guest read/write and guest DMA though no
> need to worry about #MC, right?

It's not even about devices; it applies to all non-MMIO memory, i.e. unless the
host forces UC for a given page, there's potential for WB vs. WC/UC issues.

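To make the WB vs. WC/UC point concrete, here is a minimal Python model, not KVM code. It assumes Intel EPT with the EPT memtype set to WB and guest CR0.CD=0, and it only covers that one case: with IPAT=1 the guest PAT is ignored, and with IPAT=0 the guest PAT type takes effect, with UC- behaving as UC. The names `MemType`, `guest_effective_memtype` and `aliases_with_host_wb` are made up for illustration. AMD NPT combines memtypes differently (e.g. WC+), so it is not modeled here.

```python
from enum import Enum


class MemType(Enum):
    UC = "UC"
    UC_MINUS = "UC-"
    WC = "WC"
    WT = "WT"
    WP = "WP"
    WB = "WB"


def guest_effective_memtype(guest_pat: MemType, ipat: bool) -> MemType:
    """Effective guest memtype, modeling only the EPT memtype == WB case.

    Assumption: Intel EPT, guest CR0.CD=0. Other EPT memtypes and AMD NPT
    follow different combining rules and are deliberately left out.
    """
    if ipat:
        # IPAT=1: the EPT memtype (WB here) is used, guest PAT is ignored.
        return MemType.WB
    if guest_pat is MemType.UC_MINUS:
        # UC- combined with WB behaves as UC.
        return MemType.UC
    # Otherwise the guest PAT type is the effective type.
    return guest_pat


def aliases_with_host_wb(guest_pat: MemType, ipat: bool) -> bool:
    """True if the guest's effective type differs from a host WB mapping."""
    return guest_effective_memtype(guest_pat, ipat) is not MemType.WB


if __name__ == "__main__":
    for pat in MemType:
        eff = guest_effective_memtype(pat, ipat=False)
        print(pat.value, "->", eff.value,
              "mismatch" if aliases_with_host_wb(pat, ipat=False) else "ok")
```

In this model every non-WB guest PAT value with IPAT=0 ends up as a mismatch against the host's WB mapping, regardless of whether a device is involved, while IPAT=1 removes the mismatch only by ignoring what the guest asked for.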
> > > > > For CR0_CD=1,
> > > > > - w/o KVM_X86_QUIRK_CD_NW_CLEARED, it meets (b), but breaks (a).
> > > > > - w/ KVM_X86_QUIRK_CD_NW_CLEARED, with IPAT=1, it meets (a), but breaks (b);
> > > > > with IPAT=0, it may break (a), but meets (b)
> > > >
> > > > CR0.CD=1 is a mess above and beyond memtypes. Huh. It's even worse than I thought,
> > > > because according to the SDM, Atom CPUs don't support no-fill mode:
> > > >
> > > > 3. Not supported In Intel Atom processors. If CD = 1 in an Intel Atom processor,
> > > > caching is disabled.
> > > >
> > > > Before I read that blurb about Atom CPUs, what I was going to say is that, AFAIK,
> > > > it's *impossible* to accurately virtualize CR0.CD=1 on VMX because there's no way
> > > > to emulate no-fill mode.
> > > >
> > > > > > Discussion from the EPT+MTRR enabling thread[*] more or less confirms that Sheng
> > > > > > Yang was trying to resolve issues with passthrough MMIO.
> > > > > >
> > > > > > * Sheng Yang
> > > > > > : Do you mean host(qemu) would access this memory and if we set it to guest
> > > > > > : MTRR, host access would be broken? We would cover this in our shadow MTRR
> > > > > > : patch, for we encountered this in video ram when doing some experiment with
> > > > > > : VGA assignment.
> > > > > >
> > > > > > And in the same thread, there's also what appears to be confirmation of Intel
> > > > > > running into issues with Windows XP related to a guest device driver mapping
> > > > > > DMA with WC in the PAT. Hilariously, Avi effectively said "KVM can't modify the
> > > > > > SPTE memtype to match the guest for EPT/NPT", which while true, completely overlooks
> > > > > > the fact that EPT and NPT both honor guest PAT by default. /facepalm
> > > > >
> > > > > My interpretation is that since guest PATs are in guest page tables,
> > > > > while with EPT/NPT, guest page tables are not shadowed, it's not easy to
> > > > > check guest PATs to disallow host QEMU access to non-WB guest RAM.
> > > >
> > > > Ah, yeah, your interpretation makes sense.
> > > >
> > > > The best idea I can think of to support things like this is to have KVM grab the
> > > > effective PAT memtype from the host userspace page tables, shove that into the
> > > > EPT/NPT memtype, and then ignore guest PAT. I don't know if that would actually work
> > > > though.
> > > Hmm, it might not work. E.g. in a GPU, some MMIOs are mapped as UC-, while
> > > others are mapped as WC, even though they belong to the same BAR.
> > > I don't think the host can know which one to choose in advance.
> > > I think the same is true for RAM ranges: the guest can memremap to a memory
> > > type that the host doesn't know beforehand.
> >
> > The goal wouldn't be to honor guest memtype, it would be to ensure correctness.
> > E.g. guest can do memremap all it wants, and KVM will always ignore the guest's
> > memtype.
> AFAIK, some GPUs with the TTM driver may call set_pages_array_uc() to convert pages
> to PAT=UC- (e.g. for doorbells). Intel i915 could also vmap a page with PAT=WC
> (e.g. for some command buffers, see i915_gem_object_map_page()).
> It's not easy for the host to know which guest pages are allocated by the guest
> driver for such UC/WC conversion, and mapping such pages as "WB + ignore guest
> PAT" would be a problem if the device is non-coherent.

Ah, right, I was thinking specifically of virtio-gpu, where there is more explicit
coordination between guest and host regarding the buffers. Drat.