From: Bjorn Helgaas <helgaas@kernel.org>
To: Mario Limonciello <superm1@kernel.org>
Cc: Alex Williamson <alex.williamson@redhat.com>,
mario.limonciello@amd.com, bhelgaas@google.com,
Thomas Zimmermann <tzimmermann@suse.de>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI/VGA: Look at all PCI display devices in VGA arbiter
Date: Fri, 13 Jun 2025 15:31:30 -0500 [thread overview]
Message-ID: <20250613203130.GA974345@bhelgaas> (raw)
In-Reply-To: <3a0a7aeb-436d-442a-bede-9e760a69fa47@kernel.org>
On Fri, Jun 13, 2025 at 02:31:10PM -0500, Mario Limonciello wrote:
> On 6/13/2025 2:07 PM, Bjorn Helgaas wrote:
> > On Thu, Jun 12, 2025 at 10:12:14PM -0500, Mario Limonciello wrote:
> > > From: Mario Limonciello <mario.limonciello@amd.com>
> > >
> > > On an A+N mobile system the APU is not being selected by some desktop
> > > environments for any rendering tasks. This is because the neither GPU
> > > is being treated as "boot_vga" but that is what some environments use
> > > to select a GPU [1].
> >
> > What is "A+N" and "APU"?
>
> A+N is meant to refer to an AMD APU + NVIDIA dGPU.
> APU is an SoC that contains a lot more IP than just x86 cores. In this
> context it contains both integrated graphics and display IP.
So I guess "APU is not being selected" refers to the AMD APU?
> > I didn't quite follow the second sentence. I guess you're saying some
> > userspace environments use the "boot_vga" sysfs file to select a GPU?
> > And on this A+N system, neither device has the file?
>
> Yeah. Specifically this problem happens in Xorg that the library it uses
> (libpciaccess) looks for a boot_vga file. That file isn't found on either
> GPU and so Xorg picks the first GPU it finds in the PCI heirarchy which
> happens to be the NVIDIA GPU.
>
> > > The VGA arbiter driver only looks at devices that report as PCI display
> > > VGA class devices. Neither GPU on the system is a display VGA class
> > > device:
> > >
> > > c5:00.0 3D controller: NVIDIA Corporation Device 2db9 (rev a1)
> > > c6:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 150e (rev d1)
> > >
> > > So neither device gets the boot_vga sysfs file. The vga_is_boot_device()
> > > function already has some handling for integrated GPUs by looking at the
> > > ACPI HID entries, so if all PCI display class devices are looked at it
> > > actually can detect properly with this device ordering. However if there
> > > is a different ordering it could flag the wrong device.
> > >
> > > Modify the VGA arbiter code and matching sysfs file entries to examine all
> > > PCI display class devices. After every device is added to the arbiter list
> > > make a pass on all devices and explicitly check whether one is integrated.
> > >
> > > This will cause all GPUs to gain a `boot_vga` file, but the correct device
> > > (APU in this case) will now show `1` and the incorrect device shows `0`.
> > > Userspace then picks the right device as well.
> >
> > What determines whether the APU is the "correct" device?
>
> In this context is the one that is physically connected to the eDP panel.
> When the "wrong" one is selected then it is used for rendering.
How does the code figure out which is connected to the eDP panel? I
didn't see anything in the patch that I can relate to this. This
needs to be something people who are not AMD and NVIDIA experts can
figure out in five years.
It feels like we're fixing a point problem and next month's system
might have the opposite issue, and we won't know how to make both
systems work.
> Without this patch the net outcome ends up being that the APU display
> hardware drives the eDP but the desktop is rendered using the N dGPU. There
> is a lot of latency in doing it this way, and worse off the N dGPU stays
> powered on at all times. It never enters into runtime PM.
> > > +{
> > > + struct pci_dev *candidate = vga_default_device();
> > > + struct vga_device *vgadev;
> > > +
> > > + list_for_each_entry(vgadev, &vga_list, list) {
> > > + if (vga_is_boot_device(vgadev)) {
> > > + /* check if one is an integrated GPU, use that if so */
> > > + if (candidate) {
> > > + if (vga_arb_integrated_gpu(&candidate->dev))
> > > + break;
> > > + if (vga_arb_integrated_gpu(&vgadev->pdev->dev)) {
> > > + candidate = vgadev->pdev;
> > > + break;
> > > + }
> > > + } else
> > > + candidate = vgadev->pdev;
> > > + }
> > > + }
> > > +
> > > + if (candidate)
> > > + vga_set_default_device(candidate);
> > > +}
> >
> > It looks like this is related to the integrated GPU code in
> > vga_is_boot_device(). Can this be done by updating the logic there,
> > so it's more clear what change this patch makes?
> >
> > It seems like this patch would change the selection in a way that
> > makes some of the vga_is_boot_device() comments obsolete. E.g., "We
> > select the default VGA device in this order"? Or "we use the *last*
> > [integrated GPU] because that was the previous behavior"?
> >
> > The end of vga_is_boot_device() mentions non-legacy (non-VGA) devices,
> > but I don't remember now how we ever got there because
> > vga_arb_device_init() and pci_notify() only call
> > vga_arbiter_add_pci_device() for VGA devices.
>
> Sure I'll review the comments and update. As for moving the logic I didn't
> see an obvious way to do this. This code is "tie-breaker" code in the case
> that two GPUs are otherwise ranked equally.
How do we break the tie? I guess we use the first one we find?
The comment in vga_is_boot_device() says we expect only a single
integrated GPU, but I guess this system breaks that assumption?
And in the absence of other clues, vga_is_boot_device() decides that
every integrated GPU it finds should be the default, so the last one
wins? But now we want the first one to win?
> > > - while ((pdev =
> > > - pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
> > > - PCI_ANY_ID, pdev)) != NULL) {
> > > - if (pci_is_vga(pdev))
> > > - vga_arbiter_add_pci_device(pdev);
> > > - }
> > > + while ((pdev = pci_get_base_class(PCI_BASE_CLASS_DISPLAY, pdev)))
> > > + vga_arbiter_add_pci_device(pdev);
> >
> > I guess pci_get_base_class(PCI_BASE_CLASS_DISPLAY) is sort of a source
> > code optimization and this one-line change would be equivalent?
> >
> > - if (pci_is_vga(pdev))
> > + if (pci_is_display(pdev))
> > vga_arbiter_add_pci_device(pdev);
> >
> > If so, I think I prefer the one-liner because then everything in this
> > file would use pci_is_display() and we wouldn't have to figure out the
> > equivalent-ness of pci_get_base_class(PCI_BASE_CLASS_DISPLAY).
>
> pci_get_base_class() is a search function. It only really makes sense for
> iterating.
Right I'm saying that if you do this:
while ((pdev =
pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
PCI_ANY_ID, pdev)) != NULL) {
if (pci_is_display(pdev))
vga_arbiter_add_pci_device(pdev);
}
the patch is a bit smaller and we don't have to look up
pci_get_base_class() and confirm that it returns things for which
pci_is_display() is true. That's just a little more analysis.
Bjorn
next prev parent reply other threads:[~2025-06-13 20:31 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-13 3:12 [PATCH] PCI/VGA: Look at all PCI display devices in VGA arbiter Mario Limonciello
2025-06-13 19:07 ` Bjorn Helgaas
2025-06-13 19:31 ` Mario Limonciello
2025-06-13 20:31 ` Bjorn Helgaas [this message]
2025-06-13 20:56 ` Mario Limonciello
2025-06-13 21:47 ` Daniel Dadap
2025-06-14 7:04 ` Lukas Wunner
2025-06-16 15:05 ` Daniel Dadap
2025-06-16 19:14 ` Lukas Wunner
2025-06-17 15:53 ` Daniel Dadap
2025-06-17 16:06 ` Mario Limonciello
2025-06-17 16:36 ` Daniel Dadap
2025-06-17 17:01 ` Mario Limonciello
2025-06-13 21:19 ` Alex Williamson
2025-06-13 21:29 ` Mario Limonciello
2025-06-16 6:42 ` Thomas Zimmermann
2025-06-16 6:50 ` David Airlie
2025-06-16 7:21 ` Thomas Zimmermann
2025-06-16 7:24 ` David Airlie
2025-06-16 9:11 ` Gerd Hoffmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250613203130.GA974345@bhelgaas \
--to=helgaas@kernel.org \
--cc=alex.williamson@redhat.com \
--cc=bhelgaas@google.com \
--cc=linux-pci@vger.kernel.org \
--cc=mario.limonciello@amd.com \
--cc=superm1@kernel.org \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).