From: Alex Williamson <alex.williamson@redhat.com>
To: Peter Xu <peterx@redhat.com>,
Athul Krishna <athul.krishna.kr@protonmail.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Linux PCI <linux-pci@vger.kernel.org>,
"regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: [bugzilla-daemon@kernel.org: [Bug 219619] New: vfio-pci: screen graphics artifacts after 6.12 kernel upgrade]
Date: Mon, 23 Dec 2024 11:15:25 -0700
Message-ID: <20241223111525.0249057e.alex.williamson@redhat.com>
In-Reply-To: <Z2mW2k8GfP7S0c5M@x1n>
On Mon, 23 Dec 2024 11:59:06 -0500
Peter Xu <peterx@redhat.com> wrote:
> On Mon, Dec 23, 2024 at 07:37:46AM +0000, Athul Krishna wrote:
> > Can confirm. Reverting f9e54c3a2f5b from v6.13-rc1 fixed the problem.
>
> I suppose Alex will have more thoughts, probably after the holidays.
> Before that, one quick question:
Yeah, apologies in advance for latency over the next couple weeks.
> > -------- Original Message --------
> > On 23/12/24 04:06, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > > Forwarding since not everybody follows bugzilla. Apparently bisected
> > > to f9e54c3a2f5b ("vfio/pci: implement huge_fault support").
> > >
> > > Athul, f9e54c3a2f5b appears to revert cleanly from v6.13-rc1. Can you
> > > verify that reverting it is enough to avoid these artifacts?
> > >
> > > #regzbot introduced: f9e54c3a2f5b ("vfio/pci: implement huge_fault support")
> > >
> > > ----- Forwarded message from bugzilla-daemon@kernel.org -----
> > >
> > > Date: Sat, 21 Dec 2024 10:10:02 +0000
> > > From: bugzilla-daemon@kernel.org
> > > To: bjorn@helgaas.com
> > > Subject: [Bug 219619] New: vfio-pci: screen graphics artifacts after 6.12 kernel upgrade
> > > Message-ID: <bug-219619-41252@https.bugzilla.kernel.org/>
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219619
> > >
> > > Bug ID: 219619
> > > Summary: vfio-pci: screen graphics artifacts after 6.12 kernel
> > > upgrade
> > > Product: Drivers
> > > Version: 2.5
> > > Hardware: AMD
> > > OS: Linux
> > > Status: NEW
> > > Severity: normal
> > > Priority: P3
> > > Component: PCI
> > > Assignee: drivers_pci@kernel-bugs.osdl.org
> > > Reporter: athul.krishna.kr@protonmail.com
> > > Regression: No
> > >
> > > Created attachment 307382
> > > --> https://bugzilla.kernel.org/attachment.cgi?id=307382&action=edit
> > > dmesg
>
> vfio-pci 0000:03:00.0: vfio_bar_restore: reset recovery - restoring BARs
Is the reset recovery message seen even with the suspect commit
reverted? Timestamps here would be useful for correlation.
> pcieport 0000:00:01.1: AER: Multiple Uncorrectable (Non-Fatal) error message received from 0000:03:00.1
> vfio-pci 0000:03:00.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
> vfio-pci 0000:03:00.0: device [1002:73ef] error status/mask=00100000/00000000
> vfio-pci 0000:03:00.0: [20] UnsupReq (First)
> vfio-pci 0000:03:00.0: AER: TLP Header: 60001004 000000ff 0000007d fe7eb000
> vfio-pci 0000:03:00.1: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
> vfio-pci 0000:03:00.1: device [1002:ab28] error status/mask=00100000/00000000
> vfio-pci 0000:03:00.1: [20] UnsupReq (First)
> vfio-pci 0000:03:00.1: AER: TLP Header: 60001004 000000ff 0000007d fe7eb000
> vfio-pci 0000:03:00.1: AER: Error of this Agent is reported first
> pcieport 0000:02:00.0: AER: broadcast error_detected message
> pcieport 0000:02:00.0: AER: broadcast mmio_enabled message
> pcieport 0000:02:00.0: AER: broadcast resume message
> pcieport 0000:02:00.0: AER: device recovery successful
> pcieport 0000:02:00.0: AER: broadcast error_detected message
> pcieport 0000:02:00.0: AER: broadcast mmio_enabled message
> pcieport 0000:02:00.0: AER: broadcast resume message
> pcieport 0000:02:00.0: AER: device recovery successful
>
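The "TLP Header" lines in the AER messages above can be decoded by hand. A minimal sketch, assuming the memory-request header layout from the PCIe Base Specification (Fmt/Type/Length, Requester ID, Tag, byte enables, address) and that AER logs byte 0 of the header in bits 31:24 of the first DW; TLP prefixes and extended-tag bits are ignored, and `struct tlp_mem_req`, `decode_mem_req()` and `print_mem_req()` are hypothetical names, not a kernel or pciutils API.

```c
#include <stdint.h>
#include <stdio.h>

/* Memory-request fields of a TLP header as printed by AER (DW0 first). */
struct tlp_mem_req {
	unsigned int fmt;      /* DW0[31:29]: bit 0 = 4 DW header, bit 1 = has data */
	unsigned int type;     /* DW0[28:24]: 0 = memory request */
	unsigned int len_dw;   /* DW0[9:0]: payload length in DWs, 0 encodes 1024 */
	uint16_t requester;    /* DW1[31:16]: bus[15:8] dev[7:3] fn[2:0] */
	uint8_t tag;           /* DW1[15:8] (extended tag bits ignored) */
	unsigned int last_be;  /* DW1[7:4]: last DW byte enables */
	unsigned int first_be; /* DW1[3:0]: first DW byte enables */
	uint64_t addr;         /* DW2 (32-bit) or DW2:DW3 (64-bit) */
};

static struct tlp_mem_req decode_mem_req(const uint32_t dw[4])
{
	struct tlp_mem_req t;

	t.fmt = (dw[0] >> 29) & 0x7;
	t.type = (dw[0] >> 24) & 0x1f;
	t.len_dw = dw[0] & 0x3ff;
	if (t.len_dw == 0)
		t.len_dw = 1024;
	t.requester = (uint16_t)(dw[1] >> 16);
	t.tag = (uint8_t)((dw[1] >> 8) & 0xff);
	t.last_be = (dw[1] >> 4) & 0xf;
	t.first_be = dw[1] & 0xf;
	if (t.fmt & 0x1)	/* 4 DW header: 64-bit address in DW2:DW3 */
		t.addr = ((uint64_t)dw[2] << 32) | (dw[3] & ~0x3u);
	else			/* 3 DW header: 32-bit address in DW2 */
		t.addr = dw[2] & ~0x3u;
	return t;
}

static void print_mem_req(const struct tlp_mem_req *t)
{
	const char *kind = t->type != 0 ? "non-memory" :
			   (t->fmt & 0x2) ? "MWr" : "MRd";

	printf("%s, %u DW, requester %02x:%02x.%u, tag 0x%02x, BE %x/%x, addr 0x%llx\n",
	       kind, t->len_dw,
	       (unsigned)(t->requester >> 8), (unsigned)((t->requester >> 3) & 0x1f),
	       (unsigned)(t->requester & 0x7), (unsigned)t->tag,
	       t->last_be, t->first_be, (unsigned long long)t->addr);
}
```

Fed the four DWs from the log (60001004 000000ff 0000007d fe7eb000), this decodes as a memory write (Fmt 011, Type 00000) with a 4 DW header and a 4 DW payload, Requester ID 0000, both byte-enable fields 0xf, and address 0x7dfe7eb000.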
> > >
> > > Device: Asus Zephyrus GA402RJ
> > > CPU: Ryzen 7 6800HS
> > > GPU: RX 6700S
> > > Kernel: 6.13.0-rc3-g8faabc041a00
> > >
> > > Problem:
> > > Launching games or GPU benchmarking tools in a QEMU Windows 11 VM causes
> > > screen artifacts; eventually QEMU pauses with an unrecoverable error.
>
> Is there more information on what setup can reproduce it?
>
> For example, does it only happen with Windows guests? Does the GPU
> vendor/model matter?
And the CPU vendor; I predominantly tested this on Intel +
NVIDIA. I'm also not seeing any similar reports on r/VFIO, which is a
bit strange as there are a lot of bleeding-edge users there. The bz was
reported against 6.13.0-rc3-g8faabc041a00, and a revert on top of
v6.13-rc1 was reported as stable. Has this actually been confirmed on
v6.12, or might something in v6.13-rc have introduced a new issue?
> > > Commit:
> > > f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101 is the first bad commit
> > > commit f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101
> > > Author: Alex Williamson <alex.williamson@redhat.com>
> > > Date: Mon Aug 26 16:43:53 2024 -0400
> > >
> > > vfio/pci: implement huge_fault support
>
> Personally I have no clue yet how this could cause it. I was initially
> worried about implicit cache mode changes on the mappings, but I don't
> think anything like that was involved in this specific change.
>
> This commit mainly does two things: (1) it allows 2M/1G mappings for BARs
> instead of always using small 4K pages, and (2) it always faults lazily
> rather than "installing everything on the 1st fault". Maybe one of the
> two has some impact somehow.
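Point (1) comes down to alignment: a 2M or 1G entry can only back a fault if the size-aligned virtual range fits inside the mapping and the BAR's physical address has the same alignment; otherwise a smaller size has to be used. A minimal userspace model of that selection, for one lazy fault at a time as in point (2), is sketched below. It is not the vfio-pci code; `struct bar_mmap` and `pick_fault_size()` are hypothetical names, and it assumes the architecture supports both PMD- and PUD-sized pfn mappings.

```c
#include <stddef.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Hypothetical description of a BAR mmap (not struct vm_area_struct). */
struct bar_mmap {
	uint64_t vm_start;   /* first user virtual address, 4K aligned */
	uint64_t vm_end;     /* exclusive end, 4K aligned */
	uint64_t phys_start; /* BAR physical address backing vm_start */
};

/*
 * Return the largest mapping size usable for a single fault at 'addr':
 * the size-aligned virtual range must lie entirely inside the mapping and
 * the physical address behind it must be aligned to the same size.
 * Returns 0 if 'addr' is outside the mapping.
 */
static uint64_t pick_fault_size(const struct bar_mmap *m, uint64_t addr)
{
	static const uint64_t sizes[] = { SZ_1G, SZ_2M, SZ_4K };

	if (addr < m->vm_start || addr >= m->vm_end)
		return 0;
	for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		uint64_t sz = sizes[i];
		uint64_t base = addr & ~(sz - 1);

		if (base < m->vm_start || base + sz > m->vm_end)
			continue;	/* aligned range spills outside the mapping */
		if ((m->phys_start + (base - m->vm_start)) & (sz - 1))
			continue;	/* virtual and physical offsets disagree */
		return sz;
	}
	return 0;
}
```

For example, in this model a 256 MiB BAR mapped at a 2M-aligned user address takes 2M faults, while the same BAR mapped 4K off that alignment only ever takes 4K faults.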
Athul, can you test reverting both f9e54c3a2f5b and d71a989cf5d9? That
would provide the faulting behavior without yet making use of huge
pfnmaps. Thanks,
Alex
Thread overview: 14+ messages
2024-12-22 22:36 [bugzilla-daemon@kernel.org: [Bug 219619] New: vfio-pci: screen graphics artifacts after 6.12 kernel upgrade] Bjorn Helgaas
2024-12-23 7:37 ` Athul Krishna
2024-12-23 16:59 ` Peter Xu
2024-12-23 18:15 ` Alex Williamson [this message]
2024-12-24 18:06 ` Athul Krishna
2024-12-30 21:03 ` Precific
2024-12-31 1:27 ` Alex Williamson
2024-12-31 15:44 ` Precific
2024-12-31 16:07 ` Alex Williamson
2025-01-01 3:10 ` Precific
2025-01-02 16:39 ` Peter Xu
2025-01-02 17:04 ` Alex Williamson
2025-01-02 18:38 ` Alex Williamson
2025-02-25 17:59 ` Bjorn Helgaas