From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Gerd Hoffmann <kraxel@redhat.com>,
kvm@vger.kernel.org, rcu@vger.kernel.org,
linux-kernel@vger.kernel.org, Kevin Tian <kevin.tian@intel.com>,
Yiwei Zhang <zzyiwei@google.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Josh Triplett <josh@joshtriplett.org>
Subject: Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop
Date: Mon, 9 Sep 2024 09:04:17 -0700
Message-ID: <Zt8cgUASZCN6gP8H@google.com>
In-Reply-To: <c1d420ba-13de-48dd-abee-473988172d07@redhat.com>

On Mon, Sep 09, 2024, Paolo Bonzini wrote:
> On 9/9/24 07:30, Yan Zhao wrote:
> > On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:
> > > On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
> > > > On Wed, Sep 04, 2024, Yan Zhao wrote:
> > > > > On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > > > > > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > > > > > Sean Christopherson <seanjc@google.com> writes:
> > > > > > >
> > > > > > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > > > > > > > FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
> > > > > > > > > but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
> > > > > > > > > Silver 4410Y".
> > > > > > > >
> > > > > > > > Has this been reproduced on any other hardware besides SPR? I.e. did we stumble
> > > > > > > > on another hardware issue?
> > > > > > >
> > > > > > > Very possible, as according to Yan Zhao this doesn't reproduce on at
> > > > > > > least "Coffee Lake-S". Let me try to grab some random hardware around
> > > > > > > and I'll be back with my observations.
> > > > > >
> > > > > > > Some new findings from my side:
> > > > > >
> > > > > > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
> > > > > > from 0xfd000000 to 0xfe000000.
> > > > > >
> > > > > > On "Sapphire Rapids XCC":
> > > > > >
> > > > > > > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
> > > > > > >    correctly.
> > > > > > >    i.e.
> > > > > > >        if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > > > > > >                return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> > > > > > >        }
> > > > > > >        return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > > > >
> > > > > > > 2. If KVM forces this fb_map range to be UC+IPAT, installer fails to show / gdm
> > > > > > >    restarts endlessly (though on Coffee Lake-S, installer/gdm can launch
> > > > > > >    correctly in this case).
> > > > > >
> > > > > > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to set
> > > > > > this fb_map range as WC, with
> > > > > > iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, mem->size));
> > > > > >
> > > > > > > However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), the pfns for
> > > > > > > this fb_map have already been reserved as UC- by ioremap().
> > > > > > > As a result, the ioremap_wc() called while starting GDM can only map guest PAT as UC-.
> > > > > >
> > > > > > So, with KVM setting WB (no IPAT) to this fb_map range, the effective
> > > > > > memory type is UC- and installer/gdm restarts endlessly.
> > > > > >
> > > > > > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs driver
> > > > > > to call ioremap_wc() instead in bochs_hw_init(), gdm can launch correctly.
> > > > > > (didn't verify the installer's case as I can't update the driver in that case).
> > > > > >
> > > > > > > The reason is that the ioremap_wc() called while starting GDM no longer
> > > > > > > hits a conflict and can map guest PAT as WC.
> > > >
> > > > Huh. The upside of this is that it sounds like there's nothing broken with WC
> > > > or self-snoop.
> > > Considering a different perspective, the fb_map range is used as frame buffer
> > > (vram), with the guest writing to this range and the host reading from it.
> > > If the issue were related to self-snooping, we would expect the VNC window to
> > > display distorted data. However, the observed behavior is that the GDM window
> > > shows up correctly for a sec and restarts over and over.
> > >
> > > So, do you think we can simply fix this issue by calling ioremap_wc() for the
> > > frame buffer/vram range in bochs driver, as is commonly done in other gpu
> > > drivers?
> > >
> > > > --- a/drivers/gpu/drm/tiny/bochs.c
> > > > +++ b/drivers/gpu/drm/tiny/bochs.c
> > > > @@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
> > > >  	if (pci_request_region(pdev, 0, "bochs-drm") != 0)
> > > >  		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
> > > >
> > > > -	bochs->fb_map = ioremap(addr, size);
> > > > +	bochs->fb_map = ioremap_wc(addr, size);
> > > >  	if (bochs->fb_map == NULL) {
> > > >  		DRM_ERROR("Cannot map framebuffer\n");
> > > >  		return -ENOMEM;
>
> While this is a fix for future kernels, it doesn't change the result for VMs
> already in existence.

I would prefer to bottom out on exactly whether or not the SPR/CLX behavior is
working as intended. Maybe the ~8x slowdown is just a side effect of any Intel
multi-socket/node system, but I think we should get confirmation (inasmuch as
possible) that that is indeed the case. E.g. if this is actually a bug in CLX+,
then the actions we need to take are different.

> I don't think there's an alternative to putting this behind a quirk.

This gets a bit weird, which is why I want to bottom out on whether or not CLX
and SPR are working as intended. If non-coherent DMA is attached to the VM, then
even before this patch KVM would honor guest PAT. I agree that we don't want to
break existing setups, but if CLX+SPR are working as intended, then this is
inarguably a bochs driver bug, and I would prefer to have the quirk explicitly
reference bochs-compatible devices, e.g. in the name and documentation, so that
userspace can disable the quirk by default and only leave it enabled if a bochs
device is being exposed to the guest.
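
To make the memtype cases discussed in the quoted findings concrete, here is a
minimal sketch, not KVM code. The constant values are assumed to match the
kernel's definitions, and fb_experiment_ept_memtype(), effective_memtype() and
enum memtype are hypothetical names used only for illustration. The no-IPAT
case follows the thread's description (EPT WB lets the guest PAT type through)
and does not model the full combination rules for other EPT memtypes.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's definitions; verify against your tree. */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRCOMB	1
#define MTRR_TYPE_WRBACK	6
#define VMX_EPT_MT_EPTE_SHIFT	3
#define VMX_EPT_IPAT_BIT	(1ull << 6)

/* Hypothetical: memory types as seen by the guest access. */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WB, MT_UNMODELED };

/*
 * Hypothetical helper reproducing experiment 1: force the bochs
 * framebuffer gfns (phys 0xfd000000-0xfe000000) to WC+IPAT, and use
 * WB without IPAT for everything else.
 */
static uint64_t fb_experiment_ept_memtype(uint64_t gfn)
{
	if (gfn >= 0xfd000 && gfn < 0xfe000)
		return ((uint64_t)MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) |
		       VMX_EPT_IPAT_BIT;

	return (uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
}

/*
 * Hypothetical, simplified model of the effective memory type:
 *  - IPAT set: the EPT memtype wins and guest PAT is ignored.
 *  - IPAT clear and EPT memtype WB: the guest PAT type is honored.
 *  - Anything else is deliberately left unmodeled.
 */
static enum memtype effective_memtype(uint64_t ept_bits, enum memtype guest_pat)
{
	unsigned int ept_mt = (ept_bits >> VMX_EPT_MT_EPTE_SHIFT) & 0x7;

	if (ept_bits & VMX_EPT_IPAT_BIT) {
		switch (ept_mt) {
		case MTRR_TYPE_UNCACHABLE:
			return MT_UC;
		case MTRR_TYPE_WRCOMB:
			return MT_WC;
		case MTRR_TYPE_WRBACK:
			return MT_WB;
		default:
			return MT_UNMODELED;
		}
	}

	if (ept_mt == MTRR_TYPE_WRBACK)
		return guest_pat;

	return MT_UNMODELED;
}
```

With this model, the WC+IPAT experiment yields WC for the framebuffer no matter
what the guest programs, whereas WB without IPAT passes through the UC- that
the guest ends up with after the ioremap() reservation, which matches findings
1 and 3 above.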