linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
@ 2025-06-21  9:40 Lukas Wunner
  2025-06-21 12:07 ` Ben Hutchings
  0 siblings, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2025-06-21  9:40 UTC (permalink / raw)
  To: David Airlie, Bjorn Helgaas
  Cc: Ben Hutchings, Joerg Roedel, Suravee Suthikulpanit, Andi Kleen,
	Ahmed Salem, Borislav Petkov, dri-devel, iommu, linux-pci

Since commit 172efbb40333 ("AGP: Try unsupported AGP chipsets on x86-64 by
default"), the AGP driver for AMD Opteron/Athlon64 CPUs attempts to bind
to any PCI device.

On modern CPUs exposing an AMD IOMMU, this results in a message with
KERN_CRIT severity:

  pci 0000:00:00.2: Resources present before probing

The driver used to bind only to devices exposing the AGP Capability, but
that restriction was removed by commit 6fd024893911 ("amd64-agp: Probe
unknown AGP devices the right way").

Reinstate checking for AGP presence to avoid the message.

Fixes: 3be5fa236649 (Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices")
Reported-by: Fedor Pchelkin <pchelkin@ispras.ru>
Closes: https://lore.kernel.org/r/wpoivftgshz5b5aovxbkxl6ivvquinukqfvb5z6yi4mv7d25ew@edtzr2p74ckg/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
Compile tested only, I do not have a machine with AMD IOMMU at my disposal.

Reporter is not cc'ed because ispras.ru is an OFAC sanctioned entity,
which prohibits me from two-way engagement with the reporter:
https://sanctionssearch.ofac.treas.gov/Details.aspx?id=50890
https://www.linuxfoundation.org/blog/navigating-global-regulations-and-open-source-us-ofac-sanctions

 drivers/char/agp/amd64-agp.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
index bf49096..4bf508b 100644
--- a/drivers/char/agp/amd64-agp.c
+++ b/drivers/char/agp/amd64-agp.c
@@ -735,6 +735,18 @@ static int agp_amd64_resume(struct device *dev)
 	.driver.pm  = &agp_amd64_pm_ops,
 };
 
+static bool __init agp_dev_present(void)
+{
+	struct pci_dev *pdev = NULL;
+
+	for_each_pci_dev(pdev)
+		if (pci_find_capability(pdev, PCI_CAP_ID_AGP)) {
+			pci_dev_put(pdev);
+			return true;
+		}
+
+	return false;
+}
 
 /* Not static due to IOMMU code calling it early. */
 int __init agp_amd64_init(void)
@@ -761,7 +773,7 @@ int __init agp_amd64_init(void)
 		}
 
 		/* First check that we have at least one AMD64 NB */
-		if (!amd_nb_num()) {
+		if (!amd_nb_num() || !agp_dev_present()) {
 			pci_unregister_driver(&agp_amd64_pci_driver);
 			return -ENODEV;
 		}
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-21  9:40 [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present Lukas Wunner
@ 2025-06-21 12:07 ` Ben Hutchings
  2025-06-21 12:29   ` Lukas Wunner
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Hutchings @ 2025-06-21 12:07 UTC (permalink / raw)
  To: Lukas Wunner, David Airlie, Bjorn Helgaas
  Cc: Joerg Roedel, Suravee Suthikulpanit, Andi Kleen, Ahmed Salem,
	Borislav Petkov, dri-devel, iommu, linux-pci

[-- Attachment #1: Type: text/plain, Size: 950 bytes --]

On Sat, 2025-06-21 at 11:40 +0200, Lukas Wunner wrote:
> Since commit 172efbb40333 ("AGP: Try unsupported AGP chipsets on x86-64 by
> default"), the AGP driver for AMD Opteron/Athlon64 CPUs attempts to bind
> to any PCI device.
> 
> On modern CPUs exposing an AMD IOMMU, this results in a message with
> KERN_CRIT severity:
> 
>   pci 0000:00:00.2: Resources present before probing
> 
> The driver used to bind only to devices exposing the AGP Capability, but
> that restriction was removed by commit 6fd024893911 ("amd64-agp: Probe
> unknown AGP devices the right way").
[...]

That didn't remove any restriction as the probe function still started
by checking for an AGP capability.  The change I made was that the
driver would actually bind to devices with the AGP capability instead of
starting to use them without binding.

Ben.

-- 
Ben Hutchings
All the simple programs have been written, and all the good names taken

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-21 12:07 ` Ben Hutchings
@ 2025-06-21 12:29   ` Lukas Wunner
  2025-06-21 13:51     ` Ben Hutchings
  0 siblings, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2025-06-21 12:29 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Airlie, Bjorn Helgaas, Joerg Roedel, Suravee Suthikulpanit,
	Andi Kleen, Ahmed Salem, Borislav Petkov, dri-devel, iommu,
	linux-pci

On Sat, Jun 21, 2025 at 02:07:40PM +0200, Ben Hutchings wrote:
> On Sat, 2025-06-21 at 11:40 +0200, Lukas Wunner wrote:
> > Since commit 172efbb40333 ("AGP: Try unsupported AGP chipsets on x86-64 by
> > default"), the AGP driver for AMD Opteron/Athlon64 CPUs attempts to bind
> > to any PCI device.
> > 
> > On modern CPUs exposing an AMD IOMMU, this results in a message with
> > KERN_CRIT severity:
> > 
> >   pci 0000:00:00.2: Resources present before probing
> > 
> > The driver used to bind only to devices exposing the AGP Capability, but
> > that restriction was removed by commit 6fd024893911 ("amd64-agp: Probe
> > unknown AGP devices the right way").
> 
> That didn't remove any restriction as the probe function still started
> by checking for an AGP capability.  The change I made was that the
> driver would actually bind to devices with the AGP capability instead of
> starting to use them without binding.

The message above would not be emitted without your change.

The check for the AGP capability in agp_amd64_probe() is too late
to prevent the message.  That's because the message is emitted
before ->probe() is even called.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-21 12:29   ` Lukas Wunner
@ 2025-06-21 13:51     ` Ben Hutchings
  2025-06-21 14:05       ` Lukas Wunner
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Hutchings @ 2025-06-21 13:51 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David Airlie, Bjorn Helgaas, Joerg Roedel, Suravee Suthikulpanit,
	Andi Kleen, Ahmed Salem, Borislav Petkov, dri-devel, iommu,
	linux-pci

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On Sat, 2025-06-21 at 14:29 +0200, Lukas Wunner wrote:
> On Sat, Jun 21, 2025 at 02:07:40PM +0200, Ben Hutchings wrote:
> > On Sat, 2025-06-21 at 11:40 +0200, Lukas Wunner wrote:
> > > Since commit 172efbb40333 ("AGP: Try unsupported AGP chipsets on x86-64 by
> > > default"), the AGP driver for AMD Opteron/Athlon64 CPUs attempts to bind
> > > to any PCI device.
> > > 
> > > On modern CPUs exposing an AMD IOMMU, this results in a message with
> > > KERN_CRIT severity:
> > > 
> > >   pci 0000:00:00.2: Resources present before probing
> > > 
> > > The driver used to bind only to devices exposing the AGP Capability, but
> > > that restriction was removed by commit 6fd024893911 ("amd64-agp: Probe
> > > unknown AGP devices the right way").
> > 
> > That didn't remove any restriction as the probe function still started
> > by checking for an AGP capability.  The change I made was that the
> > driver would actually bind to devices with the AGP capability instead of
> > starting to use them without binding.
> 
> The message above would not be emitted without your change.
> 
> The check for the AGP capability in agp_amd64_probe() is too late
> to prevent the message.  That's because the message is emitted
> before ->probe() is even called.

I understand that.  But I don't feel that the explanation above
accurately described the history here.

Ben.

-- 
Ben Hutchings
All the simple programs have been written, and all the good names taken

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-21 13:51     ` Ben Hutchings
@ 2025-06-21 14:05       ` Lukas Wunner
  2025-06-24 21:54         ` Ben Hutchings
  0 siblings, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2025-06-21 14:05 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Airlie, Bjorn Helgaas, Joerg Roedel, Suravee Suthikulpanit,
	Andi Kleen, Ahmed Salem, Borislav Petkov, dri-devel, iommu,
	linux-pci

On Sat, Jun 21, 2025 at 03:51:44PM +0200, Ben Hutchings wrote:
> On Sat, 2025-06-21 at 14:29 +0200, Lukas Wunner wrote:
> > On Sat, Jun 21, 2025 at 02:07:40PM +0200, Ben Hutchings wrote:
> > > On Sat, 2025-06-21 at 11:40 +0200, Lukas Wunner wrote:
> > > > Since commit 172efbb40333 ("AGP: Try unsupported AGP chipsets on x86-64 by
> > > > default"), the AGP driver for AMD Opteron/Athlon64 CPUs attempts to bind
> > > > to any PCI device.
> > > > 
> > > > On modern CPUs exposing an AMD IOMMU, this results in a message with
> > > > KERN_CRIT severity:
> > > > 
> > > >   pci 0000:00:00.2: Resources present before probing
> > > > 
> > > > The driver used to bind only to devices exposing the AGP Capability, but
> > > > that restriction was removed by commit 6fd024893911 ("amd64-agp: Probe
> > > > unknown AGP devices the right way").
> > > 
> > > That didn't remove any restriction as the probe function still started
> > > by checking for an AGP capability.  The change I made was that the
> > > driver would actually bind to devices with the AGP capability instead of
> > > starting to use them without binding.
> > 
> > The message above would not be emitted without your change.
> > 
> > The check for the AGP capability in agp_amd64_probe() is too late
> > to prevent the message.  That's because the message is emitted
> > before ->probe() is even called.
> 
> I understand that.  But I don't feel that the explanation above
> accurately described the history here.

So please propose a more accurate explanation.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-21 14:05       ` Lukas Wunner
@ 2025-06-24 21:54         ` Ben Hutchings
  2025-06-25 14:08           ` Hans de Goede
  2025-07-01 18:18           ` Lukas Wunner
  0 siblings, 2 replies; 15+ messages in thread
From: Ben Hutchings @ 2025-06-24 21:54 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: David Airlie, Bjorn Helgaas, Joerg Roedel, Suravee Suthikulpanit,
	Andi Kleen, Ahmed Salem, Borislav Petkov, dri-devel, iommu,
	linux-pci

[-- Attachment #1: Type: text/plain, Size: 2586 bytes --]

On Sat, 2025-06-21 at 16:05 +0200, Lukas Wunner wrote:
> On Sat, Jun 21, 2025 at 03:51:44PM +0200, Ben Hutchings wrote:
> > On Sat, 2025-06-21 at 14:29 +0200, Lukas Wunner wrote:
> > > On Sat, Jun 21, 2025 at 02:07:40PM +0200, Ben Hutchings wrote:
> > > > On Sat, 2025-06-21 at 11:40 +0200, Lukas Wunner wrote:
> > > > > Since commit 172efbb40333 ("AGP: Try unsupported AGP chipsets on x86-64 by
> > > > > default"), the AGP driver for AMD Opteron/Athlon64 CPUs attempts to bind
> > > > > to any PCI device.
> > > > > 
> > > > > On modern CPUs exposing an AMD IOMMU, this results in a message with
> > > > > KERN_CRIT severity:
> > > > > 
> > > > >   pci 0000:00:00.2: Resources present before probing
> > > > > 
> > > > > The driver used to bind only to devices exposing the AGP Capability, but
> > > > > that restriction was removed by commit 6fd024893911 ("amd64-agp: Probe
> > > > > unknown AGP devices the right way").
> > > > 
> > > > That didn't remove any restriction as the probe function still started
> > > > by checking for an AGP capability.  The change I made was that the
> > > > driver would actually bind to devices with the AGP capability instead of
> > > > starting to use them without binding.
> > > 
> > > The message above would not be emitted without your change.
> > > 
> > > The check for the AGP capability in agp_amd64_probe() is too late
> > > to prevent the message.  That's because the message is emitted
> > > before ->probe() is even called.
> > 
> > I understand that.  But I don't feel that the explanation above
> > accurately described the history here.
> 
> So please propose a more accurate explanation.

Something like "The driver iterates over all PCI devices, checking for
an AGP capability.  Since commit 6fd024893911 ("amd64-agp: Probe unknown
AGP devices the right way") this is done with driver_attach() and a
wildcard PCI ID table, and the preparation for probing the IOMMU device
produces this error message."

Thinking about this further:

- Why *does* the IOMMU device have resources assigned but no driver
  bound?  Is that the real bug?

- If not, and there's a general problem with this promiscuous probing,
  would it make more sense to:
  1. Restore the search for an AGP capability in agp_amd64_init().
  2. If and only if an AGP device is found, poke the appropriate device
     ID into agp_amd64_pci_promisc_table and then call driver_attach().
  ?

Ben.

-- 
Ben Hutchings
Horngren's Observation:
              Among economists, the real world is often a special case.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-24 21:54         ` Ben Hutchings
@ 2025-06-25 14:08           ` Hans de Goede
  2025-06-25 14:33             ` Lukas Wunner
  2025-07-01 18:18           ` Lukas Wunner
  1 sibling, 1 reply; 15+ messages in thread
From: Hans de Goede @ 2025-06-25 14:08 UTC (permalink / raw)
  To: Ben Hutchings, Lukas Wunner
  Cc: David Airlie, Bjorn Helgaas, Joerg Roedel, Suravee Suthikulpanit,
	Andi Kleen, Ahmed Salem, Borislav Petkov, dri-devel, iommu,
	linux-pci

Hi,

On 24-Jun-25 11:54 PM, Ben Hutchings wrote:
> On Sat, 2025-06-21 at 16:05 +0200, Lukas Wunner wrote:
>> On Sat, Jun 21, 2025 at 03:51:44PM +0200, Ben Hutchings wrote:
>>> On Sat, 2025-06-21 at 14:29 +0200, Lukas Wunner wrote:
>>>> On Sat, Jun 21, 2025 at 02:07:40PM +0200, Ben Hutchings wrote:
>>>>> On Sat, 2025-06-21 at 11:40 +0200, Lukas Wunner wrote:
>>>>>> Since commit 172efbb40333 ("AGP: Try unsupported AGP chipsets on x86-64 by
>>>>>> default"), the AGP driver for AMD Opteron/Athlon64 CPUs attempts to bind
>>>>>> to any PCI device.
>>>>>>
>>>>>> On modern CPUs exposing an AMD IOMMU, this results in a message with
>>>>>> KERN_CRIT severity:
>>>>>>
>>>>>>   pci 0000:00:00.2: Resources present before probing
>>>>>>
>>>>>> The driver used to bind only to devices exposing the AGP Capability, but
>>>>>> that restriction was removed by commit 6fd024893911 ("amd64-agp: Probe
>>>>>> unknown AGP devices the right way").
>>>>>
>>>>> That didn't remove any restriction as the probe function still started
>>>>> by checking for an AGP capability.  The change I made was that the
>>>>> driver would actually bind to devices with the AGP capability instead of
>>>>> starting to use them without binding.
>>>>
>>>> The message above would not be emitted without your change.
>>>>
>>>> The check for the AGP capability in agp_amd64_probe() is too late
>>>> to prevent the message.  That's because the message is emitted
>>>> before ->probe() is even called.
>>>
>>> I understand that.  But I don't feel that the explanation above
>>> accurately described the history here.
>>
>> So please propose a more accurate explanation.
> 
> Something like "The driver iterates over all PCI devices, checking for
> an AGP capability.  Since commit 6fd024893911 ("amd64-agp: Probe unknown
> AGP devices the right way") this is done with driver_attach() and a
> wildcard PCI ID table, and the preparation for probing the IOMMU device
> produces this error message."
> 
> Thinking about this further:
> 
> - Why *does* the IOMMU device have resources assigned but no driver
>   bound?  Is that the real bug?

Arguably yes, but I assume that this is done because the IOMMU needs
to be setup early, before any drivers probe() methods run.

Note that cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers
to IOMMU PCI devices") which has been reverted did effectively ban
other drivers from binding. So arguably that needs to be unreverted
and then this problem will go away.

> - If not, and there's a general problem with this promiscuous probing,
>   would it make more sense to:
>   1. Restore the search for an AGP capability in agp_amd64_init().
>   2. If and only if an AGP device is found, poke the appropriate device
>      ID into agp_amd64_pci_promisc_table and then call driver_attach().
>   ?

Lukas made me aware of this attempt to fix the KERN_CRIT msg, because
I wrote a slightly different patch to fix this:

https://lore.kernel.org/dri-devel/20250625112411.4123-1-hansg@kernel.org/

This seems like a cleaner fix to me and something which would be good
to have regardless since currently the driver_attach() call is doing
too much work because the promisc table catches an unnecessary wide
net / match matching many PCI devices which cannot be AGP capable
at all.

Regards,

Hans




> 
> Ben.
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-25 14:08           ` Hans de Goede
@ 2025-06-25 14:33             ` Lukas Wunner
  2025-06-25 18:43               ` Hans de Goede
  0 siblings, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2025-06-25 14:33 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Ben Hutchings, David Airlie, Bjorn Helgaas, Joerg Roedel,
	Suravee Suthikulpanit, Andi Kleen, Ahmed Salem, Borislav Petkov,
	dri-devel, iommu, linux-pci

On Wed, Jun 25, 2025 at 04:08:38PM +0200, Hans de Goede wrote:
> Lukas made me aware of this attempt to fix the KERN_CRIT msg, because
> I wrote a slightly different patch to fix this:
> 
> https://lore.kernel.org/dri-devel/20250625112411.4123-1-hansg@kernel.org/
> 
> This seems like a cleaner fix to me and something which would be good
> to have regardless since currently the driver_attach() call is doing
> too much work because the promisc table catches an unnecessary wide
> net / match matching many PCI devices which cannot be AGP capable
> at all.

So how do you know that all of these unsupported devices have
PCI_CLASS_BRIDGE_HOST?  The only thing we know is that an AGP
Capability must be present.

In particular, AGP 3.0 sec 2.5 explicitly allows PCI-to-PCI bridges
in addition to Host-to-PCI bridges.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-25 14:33             ` Lukas Wunner
@ 2025-06-25 18:43               ` Hans de Goede
  2025-06-30 11:10                 ` Hans de Goede
                                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Hans de Goede @ 2025-06-25 18:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Ben Hutchings, David Airlie, Bjorn Helgaas, Joerg Roedel,
	Suravee Suthikulpanit, Andi Kleen, Ahmed Salem, Borislav Petkov,
	dri-devel, iommu, linux-pci

Hi,

On 25-Jun-25 4:33 PM, Lukas Wunner wrote:
> On Wed, Jun 25, 2025 at 04:08:38PM +0200, Hans de Goede wrote:
>> Lukas made me aware of this attempt to fix the KERN_CRIT msg, because
>> I wrote a slightly different patch to fix this:
>>
>> https://lore.kernel.org/dri-devel/20250625112411.4123-1-hansg@kernel.org/
>>
>> This seems like a cleaner fix to me and something which would be good
>> to have regardless since currently the driver_attach() call is doing
>> too much work because the promisc table catches an unnecessary wide
>> net / match matching many PCI devices which cannot be AGP capable
>> at all.
> 
> So how do you know that all of these unsupported devices have
> PCI_CLASS_BRIDGE_HOST?

The top of the driver says

 * This is a GART driver for the AMD Opteron/Athlon64 on-CPU northbridge.
 * It also includes support for the AMD 8151 AGP bridge

Note this only talks about north bridges.

Also given the age of AGP, I would expect the agp_amd64_pci_table[]
to be pretty much complete and the need for probing for unknown AGP
capable bridges is likely a relic which can be disabled by default.

Actually the amd64-agp code is weird in that has support for
unknown AGP bridges enabled by default in the first place.

The global probe unknown AGP bridges bool which is called
agp_try_unsupported_boot is false by default.

As discussed in the thread with my patch, we should probably
just change the AMD specific agp_try_unsupported to default
to false too.

> The only thing we know is that an AGP
> Capability must be present.
> 
> In particular, AGP 3.0 sec 2.5 explicitly allows PCI-to-PCI bridges
> in addition to Host-to-PCI bridges.

Ok, so we can add a second entry to the agp_amd64_pci_promisc_table[]
to match PCI to PCI bridges just to be sure, that still feels
cleaner to me.

Regards,

Hans



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-25 18:43               ` Hans de Goede
@ 2025-06-30 11:10                 ` Hans de Goede
  2025-07-02 10:47                   ` Lukas Wunner
  2025-07-01 18:28                 ` Lukas Wunner
  2025-07-02 15:24                 ` Lukas Wunner
  2 siblings, 1 reply; 15+ messages in thread
From: Hans de Goede @ 2025-06-30 11:10 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Ben Hutchings, David Airlie, Bjorn Helgaas, Joerg Roedel,
	Suravee Suthikulpanit, Andi Kleen, Ahmed Salem, Borislav Petkov,
	dri-devel, iommu, linux-pci

Hi,

On 25-Jun-25 8:43 PM, Hans de Goede wrote:
> Hi,
> 
> On 25-Jun-25 4:33 PM, Lukas Wunner wrote:
>> On Wed, Jun 25, 2025 at 04:08:38PM +0200, Hans de Goede wrote:
>>> Lukas made me aware of this attempt to fix the KERN_CRIT msg, because
>>> I wrote a slightly different patch to fix this:
>>>
>>> https://lore.kernel.org/dri-devel/20250625112411.4123-1-hansg@kernel.org/
>>>
>>> This seems like a cleaner fix to me and something which would be good
>>> to have regardless since currently the driver_attach() call is doing
>>> too much work because the promisc table catches an unnecessary wide
>>> net / match matching many PCI devices which cannot be AGP capable
>>> at all.
>>
>> So how do you know that all of these unsupported devices have
>> PCI_CLASS_BRIDGE_HOST?
> 
> The top of the driver says
> 
>  * This is a GART driver for the AMD Opteron/Athlon64 on-CPU northbridge.
>  * It also includes support for the AMD 8151 AGP bridge
> 
> Note this only talks about north bridges.
> 
> Also given the age of AGP, I would expect the agp_amd64_pci_table[]
> to be pretty much complete and the need for probing for unknown AGP
> capable bridges is likely a relic which can be disabled by default.
> 
> Actually the amd64-agp code is weird in that has support for
> unknown AGP bridges enabled by default in the first place.
> 
> The global probe unknown AGP bridges bool which is called
> agp_try_unsupported_boot is false by default.
> 
> As discussed in the thread with my patch, we should probably
> just change the AMD specific agp_try_unsupported to default
> to false too.
> 
>> The only thing we know is that an AGP
>> Capability must be present.
>>
>> In particular, AGP 3.0 sec 2.5 explicitly allows PCI-to-PCI bridges
>> in addition to Host-to-PCI bridges.
> 
> Ok, so we can add a second entry to the agp_amd64_pci_promisc_table[]
> to match PCI to PCI bridges just to be sure, that still feels
> cleaner to me.

ping? It would be good to get some consensus on how to
fix this and move forward with a fix. Either the patch from
this thread; or my patch:

https://lore.kernel.org/dri-devel/20250625112411.4123-1-hansg@kernel.org/

Works for me, the most important thing here is to get this
regression fixed.

Regards,

Hans


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-24 21:54         ` Ben Hutchings
  2025-06-25 14:08           ` Hans de Goede
@ 2025-07-01 18:18           ` Lukas Wunner
  1 sibling, 0 replies; 15+ messages in thread
From: Lukas Wunner @ 2025-07-01 18:18 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Airlie, Bjorn Helgaas, Joerg Roedel, Suravee Suthikulpanit,
	Andi Kleen, Ahmed Salem, Borislav Petkov, dri-devel, iommu,
	linux-pci, Hans de Goede

On Tue, Jun 24, 2025 at 11:54:46PM +0200, Ben Hutchings wrote:
> On Sat, 2025-06-21 at 16:05 +0200, Lukas Wunner wrote:
> > So please propose a more accurate explanation.
> 
> Something like "The driver iterates over all PCI devices, checking for
> an AGP capability.  Since commit 6fd024893911 ("amd64-agp: Probe unknown
> AGP devices the right way") this is done with driver_attach() and a
> wildcard PCI ID table, and the preparation for probing the IOMMU device
> produces this error message."

Thanks, I will respin and amend the commit message.


> Thinking about this further:
> 
> - Why *does* the IOMMU device have resources assigned but no driver
>   bound?  Is that the real bug?

I don't really know, I was hoping the AMD IOMMU maintainers could
comment on that.


> - If not, and there's a general problem with this promiscuous probing,
>   would it make more sense to:
>   1. Restore the search for an AGP capability in agp_amd64_init().
>   2. If and only if an AGP device is found, poke the appropriate device
>      ID into agp_amd64_pci_promisc_table and then call driver_attach().
>   ?

I like the idea.  I've realized that we've got pci_add_dynid() for
just this sort of thing.  It avoids the need to poke device IDs
into an array at runtime.  The (as yet completely untested) patch
below should do the trick.

-- >8 --

diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
index bf49096..9b9c473 100644
--- a/drivers/char/agp/amd64-agp.c
+++ b/drivers/char/agp/amd64-agp.c
@@ -720,11 +720,6 @@ static int agp_amd64_resume(struct device *dev)
 
 MODULE_DEVICE_TABLE(pci, agp_amd64_pci_table);
 
-static const struct pci_device_id agp_amd64_pci_promisc_table[] = {
-	{ PCI_DEVICE_CLASS(0, 0) },
-	{ }
-};
-
 static DEFINE_SIMPLE_DEV_PM_OPS(agp_amd64_pm_ops, NULL, agp_amd64_resume);
 
 static struct pci_driver agp_amd64_pci_driver = {
@@ -739,7 +734,8 @@ static int agp_amd64_resume(struct device *dev)
 /* Not static due to IOMMU code calling it early. */
 int __init agp_amd64_init(void)
 {
-	int err = 0;
+	struct pci_dev *pdev = NULL;
+	int err, ret;
 
 	if (agp_off)
 		return -EINVAL;
@@ -767,8 +763,15 @@ int __init agp_amd64_init(void)
 		}
 
 		/* Look for any AGP bridge */
-		agp_amd64_pci_driver.id_table = agp_amd64_pci_promisc_table;
-		err = driver_attach(&agp_amd64_pci_driver.driver);
+		for_each_pci_dev(pdev)
+			if (pci_find_capability(pdev, PCI_CAP_ID_AGP)) {
+				ret = pci_add_dynid(&agp_amd64_pci_driver,
+						    pdev->vendor, pdev->device,
+						    pdev->subvendor,
+						    pdev->subdevice, 0, 0, 0);
+				if (ret)
+					err = ret;
+			}
 		if (err == 0 && agp_bridges_found == 0) {
 			pci_unregister_driver(&agp_amd64_pci_driver);
 			err = -ENODEV;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-25 18:43               ` Hans de Goede
  2025-06-30 11:10                 ` Hans de Goede
@ 2025-07-01 18:28                 ` Lukas Wunner
  2025-07-02 15:24                 ` Lukas Wunner
  2 siblings, 0 replies; 15+ messages in thread
From: Lukas Wunner @ 2025-07-01 18:28 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Ben Hutchings, David Airlie, Bjorn Helgaas, Joerg Roedel,
	Suravee Suthikulpanit, Andi Kleen, Ahmed Salem, Borislav Petkov,
	dri-devel, iommu, linux-pci

On Wed, Jun 25, 2025 at 08:43:45PM +0200, Hans de Goede wrote:
> On 25-Jun-25 4:33 PM, Lukas Wunner wrote:
> > So how do you know that all of these unsupported devices have
> > PCI_CLASS_BRIDGE_HOST?
> 
> The top of the driver says
> 
>  * This is a GART driver for the AMD Opteron/Athlon64 on-CPU northbridge.
>  * It also includes support for the AMD 8151 AGP bridge
> 
> Note this only talks about north bridges.
> 
> Also given the age of AGP, I would expect the agp_amd64_pci_table[]
> to be pretty much complete and the need for probing for unknown AGP
> capable bridges is likely a relic which can be disabled by default.
> 
> Actually the amd64-agp code is weird in that has support for
> unknown AGP bridges enabled by default in the first place.

I agree that probing *any* PCI device should never have been introduced,
much less made the default.  But changing that risks regressing
users that depend on it.

The conservative approach is to retain the existing behavior,
but make it more benign by constraining probing to devices with
AGP Capability, as we did prior to 6fd024893911.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-30 11:10                 ` Hans de Goede
@ 2025-07-02 10:47                   ` Lukas Wunner
  2025-07-02 13:29                     ` Lukas Wunner
  0 siblings, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2025-07-02 10:47 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Ben Hutchings, David Airlie, Bjorn Helgaas, Joerg Roedel,
	Suravee Suthikulpanit, Andi Kleen, Ahmed Salem, Borislav Petkov,
	dri-devel, iommu, linux-pci

On Mon, Jun 30, 2025 at 01:10:24PM +0200, Hans de Goede wrote:
> ping? It would be good to get some consensus on how to
> fix this and move forward with a fix. Either the patch from
> this thread; or my patch:
> 
> https://lore.kernel.org/dri-devel/20250625112411.4123-1-hansg@kernel.org/
> 
> Works for me, the most important thing here is to get this
> regression fixed.

You seem to have a machine where you can trigger the
"Resources present before probing" message.

Would you mind enabling CONFIG_DEBUG_DEVRES=y and adding
"log_devres=1" to the kernel command line so that we
can understand what kind of resource is attached to
the AMD IOMMU, and where that happens.

I don't see invocations of devm_*() in arch/x86/ or
drivers/iommu/amd/ that would explain the error message.

Just so that we get a full understanding of the issue,
independently of the AGP driver probing everything.

Thanks!

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-07-02 10:47                   ` Lukas Wunner
@ 2025-07-02 13:29                     ` Lukas Wunner
  0 siblings, 0 replies; 15+ messages in thread
From: Lukas Wunner @ 2025-07-02 13:29 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Ben Hutchings, David Airlie, Bjorn Helgaas, Joerg Roedel,
	Suravee Suthikulpanit, Andi Kleen, Ahmed Salem, Borislav Petkov,
	dri-devel, iommu, linux-pci, Thomas Gleixner

[cc += tglx, start of thread:
https://lore.kernel.org/r/f8ff40f35a9a5836d1371f60e85c09c5735e3c5e.1750497201.git.lukas@wunner.de/
]

On Wed, Jul 02, 2025 at 12:47:49PM +0200, Lukas Wunner wrote:
> On Mon, Jun 30, 2025 at 01:10:24PM +0200, Hans de Goede wrote:
> > ping? It would be good to get some consensus on how to
> > fix this and move forward with a fix. Either the patch from
> > this thread; or my patch:
> > 
> > https://lore.kernel.org/dri-devel/20250625112411.4123-1-hansg@kernel.org/
> > 
> > Works for me, the most important thing here is to get this
> > regression fixed.
> 
> You seem to have a machine where you can trigger the
> "Resources present before probing" message.
> 
> Would you mind enabling CONFIG_DEBUG_DEVRES=y and adding
> "log_devres=1" to the kernel command line so that we
> can understand what kind of resource is attached to
> the AMD IOMMU, and where that happens.
> 
> I don't see invocations of devm_*() in arch/x86/ or
> drivers/iommu/amd/ that would explain the error message.

I just remembered that an MSI is set up for the AMD IOMMU.
And sure enough:

amd_iommu_enable_interrupts()
  iommu_init_irq()
    iommu_setup_msi()
      pci_enable_msi()
        __pci_enable_msi_range()
	  pci_setup_msi_device_domain()
            pci_create_device_domain()
              msi_create_device_irq_domain()
                msi_setup_device_data()
                  msi_sysfs_create_group()
		    devm_device_add_group()

... introduced by bf5e758f02fc ("genirq/msi: Simplify sysfs handling").

Not sure if this is the only one or if there are other resources
added anywhere else for the driver-less AMD IOMMU.

We'd have to rework MSI handling to not use devm_*(), so that MSIs
can be requested for a device without it being bound to a driver.
But I suspect tglx may have deliberately designed it to not
support that, in which case what the AMD IOMMU driver does
is somewhat dodgy...

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present
  2025-06-25 18:43               ` Hans de Goede
  2025-06-30 11:10                 ` Hans de Goede
  2025-07-01 18:28                 ` Lukas Wunner
@ 2025-07-02 15:24                 ` Lukas Wunner
  2 siblings, 0 replies; 15+ messages in thread
From: Lukas Wunner @ 2025-07-02 15:24 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Ben Hutchings, David Airlie, Bjorn Helgaas, Joerg Roedel,
	Suravee Suthikulpanit, Andi Kleen, Ahmed Salem, Borislav Petkov,
	dri-devel, iommu, linux-pci

On Wed, Jun 25, 2025 at 08:43:45PM +0200, Hans de Goede wrote:
> On 25-Jun-25 4:33 PM, Lukas Wunner wrote:
> > So how do you know that all of these unsupported devices have
> > PCI_CLASS_BRIDGE_HOST?
> 
> The top of the driver says
> 
>  * This is a GART driver for the AMD Opteron/Athlon64 on-CPU northbridge.
>  * It also includes support for the AMD 8151 AGP bridge
> 
> Note this only talks about north bridges.
> 
> Also given the age of AGP, I would expect the agp_amd64_pci_table[]
> to be pretty much complete and the need for probing for unknown AGP
> capable bridges is likely a relic which can be disabled by default.
> 
> Actually the amd64-agp code is weird in that has support for
> unknown AGP bridges enabled by default in the first place.
> 
> The global probe unknown AGP bridges bool which is called
> agp_try_unsupported_boot is false by default.
> 
> As discussed in the thread with my patch, we should probably
> just change the AMD specific agp_try_unsupported to default
> to false too.

Since the breaking change (causing the annoying message) was introduced
in this cycle, I think we should err on the side of caution and avoid
the risk of creating new regressions, if at all possible.

However if you are willing to deal with potential fallout, I would like
to encourage you to set "agp_try_unsupported = 0" (which the Kconfig
help text suggests is already the case) as a patch for the next cycle.

If you could give v2 of my (just submitted) patch a spin and respond
with a Tested-by on success, I'd be grateful.  I think this could go
in either through drm-misc-fixes or pci/for-linus (since the offending
commit went in through the pci tree).

Thanks!

Lukas

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-07-02 15:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-21  9:40 [PATCH] agp/amd64: Bind to unsupported devices only if AGP is present Lukas Wunner
2025-06-21 12:07 ` Ben Hutchings
2025-06-21 12:29   ` Lukas Wunner
2025-06-21 13:51     ` Ben Hutchings
2025-06-21 14:05       ` Lukas Wunner
2025-06-24 21:54         ` Ben Hutchings
2025-06-25 14:08           ` Hans de Goede
2025-06-25 14:33             ` Lukas Wunner
2025-06-25 18:43               ` Hans de Goede
2025-06-30 11:10                 ` Hans de Goede
2025-07-02 10:47                   ` Lukas Wunner
2025-07-02 13:29                     ` Lukas Wunner
2025-07-01 18:28                 ` Lukas Wunner
2025-07-02 15:24                 ` Lukas Wunner
2025-07-01 18:18           ` Lukas Wunner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).