* [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
@ 2025-08-14 9:39 Rui He
2025-08-14 20:36 ` Bjorn Helgaas
2025-08-17 2:46 ` Ethan Zhao
0 siblings, 2 replies; 7+ messages in thread
From: Rui He @ 2025-08-14 9:39 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-kernel, Prashant.Chikhalkar, Jiguang.Xiao,
Rui.He
For a preconfigured PCI bridge, the child bus is created on the first scan.
However, for some reason (e.g. register mutation), the secondary and subordinate
bus number registers read back as 0 on the second scan, which causes a second
PCI bus to be created for the same PCI device.
Following is the related log:
[Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0d]
[Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
[Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
Here PCI device 0000:0b:01.0 is assigned to both bus 0d and bus 0e.
This patch checks whether the child PCI bus has already been created on the
second scan of the bridge. If so, return directly instead of creating a new one.
Signed-off-by: Rui He <rui.he@windriver.com>
---
drivers/pci/probe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f41128f91ca76..ec67adbf31738 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1444,6 +1444,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
goto out;
}
+ if(pci_has_subordinate(dev))
+ goto out;
+
/* Clear errors */
pci_write_config_word(dev, PCI_STATUS, 0xffff);
--
2.43.0
* Re: [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
2025-08-14 9:39 [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus() Rui He
@ 2025-08-14 20:36 ` Bjorn Helgaas
2025-08-15 2:31 ` He, Rui
2025-08-17 2:46 ` Ethan Zhao
1 sibling, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2025-08-14 20:36 UTC (permalink / raw)
To: Rui He
Cc: Bjorn Helgaas, linux-pci, linux-kernel, Prashant.Chikhalkar,
Jiguang.Xiao
On Thu, Aug 14, 2025 at 05:39:37PM +0800, Rui He wrote:
> For preconfigured PCI bridge, child bus created on the first scan.
> While for some reasons(e.g register mutation), the secondary, and subordiante
> register reset to 0 on the second scan, which caused to create
> PCI bus twice for the same PCI device.
I don't quite follow this. Do you mean something is changing the
bridge configuration between the first and second scans?
> Following is the related log:
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0d]
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
Drop the timestamps (since they don't contribute to understanding the
problem) and indent the logs a couple spaces.
> Here PCI device 000:0b:01.0 assigend to bus 0d and 0e.
It looks like the [bus 0f-10] range is assigned to both bridges
(0b:01.0 and 0b:05.0), which would definitely be a problem.
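To illustrate the overlap, here is a minimal userspace sketch (not kernel
code; struct bus_range and bus_ranges_overlap() are hypothetical names used
only for this example) of how two [secondary-subordinate] ranges can be
checked for overlap:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the bus range forwarded by one bridge. */
struct bus_range {
	uint8_t secondary;   /* first bus number behind the bridge */
	uint8_t subordinate; /* last bus number behind the bridge */
};

/* Two inclusive ranges overlap if each one starts before the other ends. */
static bool bus_ranges_overlap(struct bus_range a, struct bus_range b)
{
	return a.secondary <= b.subordinate && b.secondary <= a.subordinate;
}
```

With the ranges from the log, [bus 0e-10] for 0b:01.0 and [bus 0f-10] for
0b:05.0 overlap, while [bus 0d] and [bus 0f-10] do not.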
I'm surprised that we haven't tripped over this before, and I'm
curious about how we got here. Can you set CONFIG_DYNAMIC_DEBUG=y,
boot with the dyndbg="file drivers/pci/* +p" kernel parameter, and
collect the complete dmesg log?
> This patch checks if child PCI bus has been created on the second scan
> of bridge. If yes, return directly instead of create a new one.
>
> Signed-off-by: Rui He <rui.he@windriver.com>
> ---
> drivers/pci/probe.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index f41128f91ca76..ec67adbf31738 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1444,6 +1444,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
> goto out;
> }
>
> + if(pci_has_subordinate(dev))
> + goto out;
Follow the coding style, i.e., add a space in "if (pci_..."
> /* Clear errors */
> pci_write_config_word(dev, PCI_STATUS, 0xffff);
>
> --
> 2.43.0
>
* RE: [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
2025-08-14 20:36 ` Bjorn Helgaas
@ 2025-08-15 2:31 ` He, Rui
2025-08-15 14:22 ` Bjorn Helgaas
0 siblings, 1 reply; 7+ messages in thread
From: He, Rui @ 2025-08-15 2:31 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Chikhalkar, Prashant, Xiao, Jiguang
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: August 15, 2025 4:36
> To: He, Rui <Rui.He@windriver.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org; Chikhalkar, Prashant
> <Prashant.Chikhalkar@windriver.com>; Xiao, Jiguang
> <Jiguang.Xiao@windriver.com>
> Subject: Re: [PATCH 1/1] pci: Add subordinate check before
> pci_add_new_bus()
>
> On Thu, Aug 14, 2025 at 05:39:37PM +0800, Rui He wrote:
> > For preconfigured PCI bridge, child bus created on the first scan.
> > While for some reasons(e.g register mutation), the secondary, and
> > subordiante register reset to 0 on the second scan, which caused to
> > create PCI bus twice for the same PCI device.
>
> I don't quite follow this. Do you mean something is changing the bridge
> configuration between the first and second scans?
I'm not sure what changed the bridge configuration, but the secondary and
subordinate bus numbers must indeed be 0 on the second scan, because [bus 0e-10] was created for 0000:0b:01.0.
In my opinion, it might be caused by an invalid communication or a register mutation in the PCI bridge.
>
> > Following is the related log:
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0d]
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
>
> Drop the timestamps (since they don't contribute to understanding the
> problem) and indent the logs a couple spaces.
>
OK
> > Here PCI device 000:0b:01.0 assigend to bus 0d and 0e.
>
> It looks like the [bus 0f-10] range is assigned to both bridges
> (0b:01.0 and 0b:05.0), which would definitely be a problem.
>
> I'm surprised that we haven't tripped over this before, and I'm curious about
> how we got here. Can you set CONFIG_DYNAMIC_DEBUG=y, boot with the
> dyndbg="file drivers/pci/* +p" kernel parameter, and collect the complete
> dmesg log?
>
Sorry, this is an isolated issue that cannot be reproduced, so I cannot offer more detailed logs.
> > This patch checks if child PCI bus has been created on the second scan
> > of bridge. If yes, return directly instead of create a new one.
> >
> > Signed-off-by: Rui He <rui.he@windriver.com>
> > ---
> > drivers/pci/probe.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index f41128f91ca76..ec67adbf31738 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -1444,6 +1444,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
> > goto out;
> > }
> >
> > + if(pci_has_subordinate(dev))
> > + goto out;
>
> Follow the coding style, i.e., add a space in "if (pci_..."
Will be changed in v2.
>
> > /* Clear errors */
> > pci_write_config_word(dev, PCI_STATUS, 0xffff);
> >
> > --
> > 2.43.0
> >
* Re: [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
2025-08-15 2:31 ` He, Rui
@ 2025-08-15 14:22 ` Bjorn Helgaas
2025-08-26 7:01 ` He, Rui
0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2025-08-15 14:22 UTC (permalink / raw)
To: He, Rui
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Chikhalkar, Prashant, Xiao, Jiguang
On Fri, Aug 15, 2025 at 02:31:31AM +0000, He, Rui wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: August 15, 2025 4:36
> > To: He, Rui <Rui.He@windriver.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
> > linux-kernel@vger.kernel.org; Chikhalkar, Prashant
> > <Prashant.Chikhalkar@windriver.com>; Xiao, Jiguang
> > <Jiguang.Xiao@windriver.com>
> > Subject: Re: [PATCH 1/1] pci: Add subordinate check before
> > pci_add_new_bus()
> >
> > On Thu, Aug 14, 2025 at 05:39:37PM +0800, Rui He wrote:
> > > For preconfigured PCI bridge, child bus created on the first scan.
> > > While for some reasons(e.g register mutation), the secondary, and
> > > subordiante register reset to 0 on the second scan, which caused to
> > > create PCI bus twice for the same PCI device.
> >
> > I don't quite follow this. Do you mean something is changing the
> > bridge configuration between the first and second scans?
>
> I'm not sure what changed the bridge configuration, but the
> secondary and subordinate is indeed 0 on the second scan as [bus
> 0e-10] created for 0000:0b:01.0.
>
> In my opinion, it might be an invalid communication or register
> mutation in PCI bridge.
> > > Following is the related log:
> > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0d]
> > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
> > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
> > > Here PCI device 000:0b:01.0 assigend to bus 0d and 0e.
> >
> > It looks like the [bus 0f-10] range is assigned to both bridges
> > (0b:01.0 and 0b:05.0), which would definitely be a problem.
> >
> > I'm surprised that we haven't tripped over this before, and I'm
> > curious about how we got here. Can you set
> > CONFIG_DYNAMIC_DEBUG=y, boot with the dyndbg="file drivers/pci/*
> > +p" kernel parameter, and collect the complete dmesg log?
>
> Sorry, as this is a individual issue, and cannot be reproduced, I
> cannot offer more detailed logs.
Do you have the complete dmesg log from this one time you saw the
problem?
As-is, I don't think there's quite enough here to move forward with
this. I think we need some more detailed analysis to figure out how
this happens.
Bjorn
* Re: [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
2025-08-14 9:39 [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus() Rui He
2025-08-14 20:36 ` Bjorn Helgaas
@ 2025-08-17 2:46 ` Ethan Zhao
2025-08-25 8:47 ` He, Rui
1 sibling, 1 reply; 7+ messages in thread
From: Ethan Zhao @ 2025-08-17 2:46 UTC (permalink / raw)
To: Rui He, Bjorn Helgaas
Cc: linux-pci, linux-kernel, Prashant.Chikhalkar, Jiguang.Xiao
On 8/14/2025 5:39 PM, Rui He wrote:
> For preconfigured PCI bridge, child bus created on the first scan.
> While for some reasons(e.g register mutation), the secondary, and subordiante
> register reset to 0 on the second scan, which caused to create
> PCI bus twice for the same PCI device.
>
> Following is the related log:
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0d]
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
> [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
Could you attach the 'lspci -t' output showing the topology?
Bridges 0000:0b:01.0 and 0000:0b:05.0 have the same subordinate
bus number, which is weird: it seems they aren't connected as upstream
and downstream, but as siblings.
Does the device behind the bridge 0000:0b:05.0 work after the
second scan (are TLPs forwarded)?
>
> Here PCI device 000:0b:01.0 assigend to bus 0d and 0e.
>
> This patch checks if child PCI bus has been created on the second scan
> of bridge. If yes, return directly instead of create a new one.
>
> Signed-off-by: Rui He <rui.he@windriver.com>
> ---
> drivers/pci/probe.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index f41128f91ca76..ec67adbf31738 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1444,6 +1444,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
> goto out;
> }
>
The bridge should already have been marked as broken=1 and bailed out earlier;
it wouldn't get here with bridge forwarding disabled, so there should be no
further configuration. What is your kernel version?
Thanks,
Ethan
> + if(pci_has_subordinate(dev))
> + goto out;
> +
> /* Clear errors */
> pci_write_config_word(dev, PCI_STATUS, 0xffff);
>
* RE: [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
2025-08-17 2:46 ` Ethan Zhao
@ 2025-08-25 8:47 ` He, Rui
0 siblings, 0 replies; 7+ messages in thread
From: He, Rui @ 2025-08-25 8:47 UTC (permalink / raw)
To: Ethan Zhao, Bjorn Helgaas
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
Chikhalkar, Prashant, Xiao, Jiguang
> -----Original Message-----
> From: Ethan Zhao <etzhao1900@gmail.com>
> Sent: August 17, 2025 10:46
> To: He, Rui <Rui.He@windriver.com>; Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Chikhalkar,
> Prashant <Prashant.Chikhalkar@windriver.com>; Xiao, Jiguang
> <Jiguang.Xiao@windriver.com>
> Subject: Re: [PATCH 1/1] pci: Add subordinate check before
> pci_add_new_bus()
>
> On 8/14/2025 5:39 PM, Rui He wrote:
> > For preconfigured PCI bridge, child bus created on the first scan.
> > While for some reasons(e.g register mutation), the secondary, and
> > subordiante register reset to 0 on the second scan, which caused to
> > create PCI bus twice for the same PCI device.
> >
> > Following is the related log:
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0d]
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
> > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
> Could you help to attach a 'lspci -t' about the topology ?
> bridge 0000:0b:01.0 and 0000:0b:05.0 have the same subordinate bus
> number, that is weird seems they aren't connected as upstream and
> downstream, but siblings.
Following is the related lspci output.
# lspci -tv
......
\-[0000:00]-+-00.0 Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2
+-03.1-[08-73]----00.0-[09-73]--+-00.0 Microsemi / PMC / IDT PES24NT24G2 PCI Express Switch
| +-02.0-[0a-10]----00.0-[0b-10]--+-01.0-[0d]----00.0 Device xxxx
| | +-04.0 PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch
| | +-05.0-[00]--
| | +-07.0-[00]--
| | \-09.0-[00]--
| +-03.0-[11-17]----00.0-[12-17]--+-01.0-[14]----00.0 Device xxxx
| | +-04.0 PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch
| | +-05.0-[00]--
| | +-07.0-[00]--
| | \-09.0-[00]--
......
Yes, you are right. 0000:0b:01.0 and 0000:0b:05.0 are siblings.
I included the 0000:0b:05.0 log lines to show that [bus 0d] is created during the first scan, while [bus 0e-10] is created during the second scan.
Here, 0000:0b:01.0 is pre-assigned to bus 0d. 0000:0b:05.0, 0000:0b:07.0 and 0000:0b:09.0 are not configured.
>
> Does the device behind the bridge 0000:0b:05.0 work after the second scan
> (TLP are forwarded) ?>
> > Here PCI device 000:0b:01.0 assigend to bus 0d and 0e.
> >
> > This patch checks if child PCI bus has been created on the second scan
> > of bridge. If yes, return directly instead of create a new one.
> >
> > Signed-off-by: Rui He <rui.he@windriver.com>
> > ---
> > drivers/pci/probe.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index f41128f91ca76..ec67adbf31738 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -1444,6 +1444,9 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
> > goto out;
> > }
> >
> The bridge should was marked as broken=1 already, bailed out earlier,
> wouldn't get here with bridge forwarding was disabled. no further
> configuration anymore. what is your kernel number ?
>
>
> Thanks,
> Ethan
My kernel version is v5.2.60 (https://git.yoctoproject.org/linux-yocto/tree/Makefile?h=v5.2/standard/base).
The bridge can only be marked broken=1 on the first scan. This error happens on the second scan, where pass=1, so (!pass) is always false and
broken cannot be set to 1.
[bus 0e-10] was created for 0000:0b:01.0 on the second scan, which means that the following if condition was false:
-> if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
-> !is_cardbus && !broken) {
pcibios_assign_all_busses() is always false because "pci=assign-busses" was not added to the cmdline.
is_cardbus is always false because 0000:0b:01.0 is a PCI-to-PCI bridge.
broken is always false on the second scan because pass=1.
The only remaining possibility is that (secondary || subordinate) is false.
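The reasoning above can be sketched as a small userspace model. This is only
an illustration under stated assumptions, not the code in drivers/pci/probe.c:
struct bridge_regs, takes_reassign_path() and the config_looks_invalid flag
are hypothetical names, and the sanity check that sets broken is reduced to a
single boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the bridge bus number registers. */
struct bridge_regs {
	uint8_t primary;
	uint8_t secondary;
	uint8_t subordinate;
};

/*
 * Returns true when the scan would fall through to the "assign a new bus"
 * branch, i.e. when the quoted if condition evaluates to false.
 * Assumption: broken can only be set when pass == 0.
 */
static bool takes_reassign_path(const struct bridge_regs *r, int pass,
				bool config_looks_invalid,
				bool assign_all_busses, bool is_cardbus)
{
	bool broken = !pass && config_looks_invalid;
	bool keep_existing = (r->secondary || r->subordinate) &&
			     !assign_all_busses && !is_cardbus && !broken;

	return !keep_existing;
}
```

In this model, with pass=1 and neither assign_all_busses nor is_cardbus set,
the reassign path is taken only if both secondary and subordinate are 0.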
Thanks,
Rui
> > + if(pci_has_subordinate(dev))
> > + goto out;
> > +
> > /* Clear errors */
> > pci_write_config_word(dev, PCI_STATUS, 0xffff);
> >
* RE: [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
2025-08-15 14:22 ` Bjorn Helgaas
@ 2025-08-26 7:01 ` He, Rui
0 siblings, 0 replies; 7+ messages in thread
From: He, Rui @ 2025-08-26 7:01 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Chikhalkar, Prashant, Xiao, Jiguang
[-- Attachment #1: Type: text/plain, Size: 5290 bytes --]
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: August 15, 2025 22:23
> To: He, Rui <Rui.He@windriver.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org; Chikhalkar, Prashant
> <Prashant.Chikhalkar@windriver.com>; Xiao, Jiguang
> <Jiguang.Xiao@windriver.com>
> Subject: Re: [PATCH 1/1] pci: Add subordinate check before pci_add_new_bus()
>
> On Fri, Aug 15, 2025 at 02:31:31AM +0000, He, Rui wrote:
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > Sent: August 15, 2025 4:36
> > > To: He, Rui <Rui.He@windriver.com>
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; Chikhalkar, Prashant
> > > <Prashant.Chikhalkar@windriver.com>; Xiao, Jiguang
> > > <Jiguang.Xiao@windriver.com>
> > > Subject: Re: [PATCH 1/1] pci: Add subordinate check before
> > > pci_add_new_bus()
> > >
> > > On Thu, Aug 14, 2025 at 05:39:37PM +0800, Rui He wrote:
> > > > For preconfigured PCI bridge, child bus created on the first scan.
> > > > While for some reasons(e.g register mutation), the secondary, and
> > > > subordiante register reset to 0 on the second scan, which caused
> > > > to create PCI bus twice for the same PCI device.
> > >
> > > I don't quite follow this. Do you mean something is changing the
> > > bridge configuration between the first and second scans?
> >
> > I'm not sure what changed the bridge configuration, but the secondary
> > and subordinate is indeed 0 on the second scan as [bus 0e-10] created
> > for 0000:0b:01.0.
> >
> > In my opinion, it might be an invalid communication or register
> > mutation in PCI bridge.
>
> > > > Following is the related log:
> > > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0d]
> > > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
> > > > [Wed May 28 20:38:36 CST 2025] pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
>
> > > > Here PCI device 000:0b:01.0 assigend to bus 0d and 0e.
> > >
> > > It looks like the [bus 0f-10] range is assigned to both bridges
> > > (0b:01.0 and 0b:05.0), which would definitely be a problem.
> > >
> > > I'm surprised that we haven't tripped over this before, and I'm
> > > curious about how we got here. Can you set CONFIG_DYNAMIC_DEBUG=y,
> > > boot with the dyndbg="file drivers/pci/*
> > > +p" kernel parameter, and collect the complete dmesg log?
> >
> > Sorry, as this is a individual issue, and cannot be reproduced, I
> > cannot offer more detailed logs.
>
> Do you have the complete dmesg log from this one time you saw the problem?
>
> As-is, I don't think there's quite enough here to move forward with this. I think
> we need some more detailed analysis to figure out how this happens.
>
> Bjorn
Attached are the dmesg logs from scanning the PCI bus.
As the dmesg is customer sensitive, I have removed irrelevant logs and kept only the PCI enumeration logs that could be obtained.
Among them, "Quirks: Set bus range to 0xAABBCC" is printed by a custom hook declared through DECLARE_PCI_FIXUP_EARLY() in drivers/pci/quirks.c.
0xAABBCC refers to the predefined PCI bridge Subordinate Bus Number, Secondary Bus Number, and Primary Bus Number.
According to the logs, buses 0d, 0e and 0c are created for PCI device 0000:0b:01.0, but when PCI device 0000:0b:01.0 is removed, only bus 0c is removed; buses 0d and 0e are still there.
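As a reading aid for the "Set bus range to 0xAABBCC" lines, here is a minimal
userspace sketch that unpacks such a value into its three bus numbers. The
names struct bus_numbers and decode_bus_range() are hypothetical and exist
only in this example; the byte order follows the description above
(AA = subordinate, BB = secondary, CC = primary).

```c
#include <stdint.h>

/* Hypothetical container for the three decoded bus numbers. */
struct bus_numbers {
	uint8_t primary;
	uint8_t secondary;
	uint8_t subordinate;
};

/* Unpack 0xAABBCC: AA = subordinate, BB = secondary, CC = primary. */
static struct bus_numbers decode_bus_range(uint32_t value)
{
	struct bus_numbers n;

	n.primary = value & 0xff;
	n.secondary = (value >> 8) & 0xff;
	n.subordinate = (value >> 16) & 0xff;
	return n;
}
```

For example, 0x0d0d0b (logged for 0000:0b:01.0) decodes to primary 0b,
secondary 0d, subordinate 0d, and 0x100b0a (logged for 0000:0a:00.0) decodes
to primary 0a, secondary 0b, subordinate 10.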
Following is the related "lspci" output:
# lspci -tv
......
\-[0000:00]-+-00.0 Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2
+-03.1-[08-73]----00.0-[09-73]--+-00.0 Microsemi / PMC / IDT PES24NT24G2 PCI Express Switch
| +-02.0-[0a-10]----00.0-[0b-10]--+-01.0-[0d]----00.0 Device xxxx
| | +-04.0 PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch
| | +-05.0-[00]--
| | +-07.0-[00]--
| | \-09.0-[00]--
| +-03.0-[11-17]----00.0-[12-17]--+-01.0-[14]----00.0 Device xxxx
| | +-04.0 PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch
| | +-05.0-[00]--
| | +-07.0-[00]--
| | \-09.0-[00]--
......
Thanks,
Rui
[-- Attachment #2: dmesg-pci-scan.txt --]
[-- Type: text/plain, Size: 10955 bytes --]
pci 0000:11:00.0: [10b5:8606] type 01 class 0x060400
pci 0000:11:00.0: Quirks: Set bus range to 0x171211
pci 0000:11:00.0: reg 0x10: [mem 0x00000000-0x0001ffff]
pci 0000:11:00.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:11:00.0: PME# supported from D0 D3hot D3cold
pci 0000:11:00.0: Adding to iommu group 28
pci 0000:12:01.0: [10b5:8606] type 01 class 0x060400
pci 0000:12:01.0: Quirks: Set bus range to 0x141412
pci 0000:12:01.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:12:01.0: PME# supported from D0 D3hot D3cold
pci 0000:12:01.0: Adding to iommu group 28
pci 0000:12:04.0: [10b5:8606] type 00 class 0x068000
pci 0000:12:04.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:12:04.0: Adding to iommu group 28
pci 0000:12:05.0: [10b5:8606] type 01 class 0x060400
pci 0000:12:05.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:12:05.0: PME# supported from D0 D3hot D3cold
pci 0000:12:05.0: Adding to iommu group 28
pci 0000:12:07.0: [10b5:8606] type 01 class 0x060400
pci 0000:12:07.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:12:07.0: PME# supported from D0 D3hot D3cold
pci 0000:12:07.0: Adding to iommu group 28
pci 0000:12:09.0: [10b5:8606] type 01 class 0x060400
pci 0000:12:09.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:12:09.0: PME# supported from D0 D3hot D3cold
pci 0000:12:09.0: Adding to iommu group 28
pci 0000:11:00.0: PCI bridge to [bus 12-17]
pci 0000:11:00.0: bridge window [mem 0xa1000000-0xa1ffffff]
......
(removed the enumeration of 0000:14:00.0 as it's customer sensitive)
......
pci 0000:12:01.0: PCI bridge to [bus 14]
pci 0000:12:01.0: bridge window [mem 0xa1000000-0xa17fffff]
pci 0000:12:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:12:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:12:09.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:12:05.0: PCI bridge to [bus 15-17]
pci_bus 0000:15: busn_res: [bus 15-17] end is updated to 15
pci 0000:12:07.0: PCI bridge to [bus 16-17]
pci_bus 0000:16: busn_res: [bus 16-17] end is updated to 16
pci 0000:12:09.0: PCI bridge to [bus 17]
pci_bus 0000:17: busn_res: [bus 17] end is updated to 17
pcieport 0000:09:02.0: bridge window [io 0x1000-0x0fff] to [bus 0a-10] add_size 1000
pcieport 0000:09:03.0: bridge window [io 0x1000-0x0fff] to [bus 11-17] add_size 1000
pci 0000:11:00.0: BAR 14: assigned [mem 0xa1000000-0xa1ffffff]
pci 0000:12:01.0: BAR 14: assigned [mem 0xa1000000-0xa17fffff]
pci 0000:14:00.0: BAR 0: assigned [mem 0xa1000000-0xa17fffff 64bit]
pci 0000:12:01.0: PCI bridge to [bus 14]
pci 0000:12:01.0: bridge window [mem 0xa1000000-0xa17fffff]
pci 0000:12:05.0: PCI bridge to [bus 15]
pci 0000:12:07.0: PCI bridge to [bus 16]
pci 0000:12:09.0: PCI bridge to [bus 17]
pci 0000:11:00.0: PCI bridge to [bus 12-17]
pci 0000:11:00.0: bridge window [mem 0xa1000000-0xa1ffffff]
pcieport 0000:12:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:09.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci_bus 0000:15: busn_res: [bus 15] end is updated to 15
pci_bus 0000:16: busn_res: [bus 16] end is updated to 16
pci_bus 0000:17: busn_res: [bus 17] end is updated to 17
pcieport 0000:09:02.0: bridge window [io 0x1000-0x0fff] to [bus 0a-10] add_size 1000
pcieport 0000:09:03.0: bridge window [io 0x1000-0x0fff] to [bus 11-17] add_size 1000
pci 0000:0a:00.0: [10b5:8606] type 01 class 0x060400
pci 0000:0a:00.0: Quirks: Set bus range to 0x100b0a
pci 0000:0a:00.0: reg 0x10: [mem 0x00000000-0x0001ffff]
pci 0000:0a:00.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:0a:00.0: PME# supported from D0 D3hot D3cold
pci 0000:0a:00.0: Adding to iommu group 27
pci 0000:0b:01.0: [10b5:8606] type 01 class 0x060400
pci 0000:0b:01.0: Quirks: Set bus range to 0x0d0d0b
pci 0000:0b:01.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:0b:01.0: PME# supported from D0 D3hot D3cold
pci 0000:0b:01.0: Adding to iommu group 27
pci 0000:0b:04.0: [10b5:8606] type 00 class 0x068000
pci 0000:0b:04.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:0b:04.0: Adding to iommu group 27
pci 0000:0b:05.0: [10b5:8606] type 01 class 0x060400
pci 0000:0b:05.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:0b:05.0: PME# supported from D0 D3hot D3cold
pci 0000:0b:05.0: Adding to iommu group 27
pci 0000:0b:07.0: [10b5:8606] type 01 class 0x060400
pci 0000:0b:07.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:0b:07.0: PME# supported from D0 D3hot D3cold
pci 0000:0b:07.0: Adding to iommu group 27
pci 0000:0b:09.0: [10b5:8606] type 01 class 0x060400
pci 0000:0b:09.0: Max Payload Size set to 256 (was 128, max 512)
pci 0000:0b:09.0: PME# supported from D0 D3hot D3cold
pci 0000:0b:09.0: Adding to iommu group 27
pci 0000:0a:00.0: PCI bridge to [bus 0b-10]
pci 0000:0a:00.0: bridge window [mem 0xa0000000-0xa0ffffff]
......
(removed the enumeration of 0000:0d:00.0 as it's customer sensitive)
......
pci 0000:0b:01.0: PCI bridge to [bus 0d]
pci 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:0b:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:0b:09.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:0b:01.0: PCI bridge to [bus 0e-10]
pci_bus 0000:0e: busn_res: [bus 0e-10] end is updated to 0e
pci 0000:0b:05.0: PCI bridge to [bus 0f-10]
pci_bus 0000:0f: busn_res: [bus 0f-10] end is updated to 0f
pci 0000:0b:07.0: PCI bridge to [bus 10]
pci_bus 0000:10: busn_res: [bus 10] end is updated to 10
pcieport 0000:12:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:09.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci_bus 0000:15: busn_res: [bus 15] end is updated to 15
pci_bus 0000:16: busn_res: [bus 16] end is updated to 16
pci_bus 0000:17: busn_res: [bus 17] end is updated to 17
pci_bus 0000:11: busn_res: [bus 11-17] end is updated to 17
pci 0000:0b:09.0: devices behind bridge are unusable because [bus 11-17] cannot be assigned for them
pci 0000:0a:00.0: bridge has subordinate 10 but max busn 17
pcieport 0000:12:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:09.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci_bus 0000:15: busn_res: [bus 15] end is updated to 15
pci_bus 0000:16: busn_res: [bus 16] end is updated to 16
pci_bus 0000:17: busn_res: [bus 17] end is updated to 17
pcieport 0000:09:02.0: bridge window [io 0x1000-0x0fff] to [bus 0a-10] add_size 1000
pcieport 0000:09:03.0: bridge window [io 0x1000-0x0fff] to [bus 11-17] add_size 1000
pci 0000:0a:00.0: BAR 14: assigned [mem 0xa0000000-0xa0ffffff]
pci 0000:0a:00.0: BAR 0: no space for [mem size 0x00020000]
pci 0000:0a:00.0: BAR 0: failed to assign [mem size 0x00020000]
pci 0000:0b:01.0: PCI bridge to [bus 0e]
pci 0000:0b:05.0: PCI bridge to [bus 0f]
pci 0000:0b:07.0: PCI bridge to [bus 10]
pci 0000:0a:00.0: PCI bridge to [bus 0b-10]
pci 0000:0a:00.0: bridge window [mem 0xa0000000-0xa0ffffff]
pcieport 0000:0b:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:0b:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:0b:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:0b:09.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci_bus 0000:0c: busn_res: can not insert [bus 0c-10] under [bus 0b-10] (conflicts with (null) [bus 0d])
......
(removed the enumeration of 0000:0d:00.0 as it's customer sensitive)
......
pcieport 0000:0b:01.0: PCI bridge to [bus 0c-10]
pci_bus 0000:0c: busn_res: [bus 0c-10] end is updated to 0c
pci_bus 0000:0d: busn_res: [bus 0d] end is updated to 0d
pci_bus 0000:0e: busn_res: [bus 0e] end is updated to 0e
pci_bus 0000:0f: busn_res: [bus 0f] end is updated to 0f
pcieport 0000:12:05.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pcieport 0000:12:09.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci_bus 0000:15: busn_res: [bus 15] end is updated to 15
pci_bus 0000:16: busn_res: [bus 16] end is updated to 16
pci_bus 0000:17: busn_res: [bus 17] end is updated to 17
pcieport 0000:09:02.0: bridge window [io 0x1000-0x0fff] to [bus 0a-10] add_size 1000
pcieport 0000:09:03.0: bridge window [io 0x1000-0x0fff] to [bus 11-17] add_size 1000
pcieport 0000:0b:01.0: BAR 14: assigned [mem 0xa0000000-0xa07fffff]
pci 0000:0c:00.0: BAR 0: assigned [mem 0xa0000000-0xa07fffff 64bit]
pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:0a:00.0
pcieport 0000:0a:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
pcieport 0000:0a:00.0: AER: device [10b5:8606] error status/mask=00100000/00000000
pcieport 0000:0a:00.0: AER: [20] UnsupReq (First)
pcieport 0000:0a:00.0: AER: TLP Header: 40000001 0000060f a0000204 00000000
pcieport 0000:0a:00.0: AER: Error of this Agent is reported first
pcieport 0000:09:02.0: AER: Device recovery failed
pcieport 0000:00:03.1: AER: Uncorrected (Non-Fatal) error received: 0000:0a:00.0
pcieport 0000:0a:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
pcieport 0000:0a:00.0: AER: device [10b5:8606] error status/mask=00100000/00000000
pcieport 0000:0a:00.0: AER: [20] UnsupReq (First)
pcieport 0000:0a:00.0: AER: TLP Header: 40000001 0000060f a0000204 00000000
pcieport 0000:09:02.0: AER: Device recovery failed
pci 0000:0c:00.0: Removing from iommu group 27
pci_bus 0000:0c: busn_res: [bus 0c] is released
pci 0000:0b:01.0: Removing from iommu group 27
pci 0000:0b:04.0: Removing from iommu group 27
pci_bus 0000:0f: busn_res: [bus 0f] is released
pci 0000:0b:05.0: Removing from iommu group 27
pci_bus 0000:10: busn_res: [bus 10] is released
pci 0000:0b:07.0: Removing from iommu group 27
pci 0000:0b:09.0: Removing from iommu group 27
pci_bus 0000:0b: busn_res: [bus 0b-10] is released
pci 0000:0a:00.0: Removing from iommu group 27
pcieport 0000:00:03.1: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.1
pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
pcieport 0000:00:03.1: AER: device [8086:2f09] error status/mask=00004000/00000000
pcieport 0000:00:03.1: AER: [14] CmpltTO (First)
pcieport 0000:00:03.1: AER: Device recovery failed