* [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
@ 2023-03-31 11:40 Donald Hunter
  2023-03-31 19:42 ` Bjorn Helgaas
  2023-04-02 10:26 ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 12+ messages in thread
From: Donald Hunter @ 2023-03-31 11:40 UTC (permalink / raw)
  To: linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas, netdev,
      Jesse Brandeburg, Tony Nguyen

The 6.3-rc1 and later release candidates are hanging during boot on our
Dell PowerEdge R620 servers with Intel I350 nics (igb).

After bisecting from v6.2 to v6.3-rc1, I isolated the problem to:

[6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3] PCI: Honor firmware's device
disabled status

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1779582fb500..b1d80c1d7a69 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1841,6 +1841,8 @@ int pci_setup_device(struct pci_dev *dev)

 	pci_set_of_node(dev);
 	pci_set_acpi_fwnode(dev);
+	if (dev->dev.fwnode && !fwnode_device_is_available(dev->dev.fwnode))
+		return -ENODEV;

 	pci_dev_assign_slot(dev);

I have verified that reverting 6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3
resolves the issue on v6.3-rc4.

Here's the kernel log from v6.3.0-rc1:

igb: Intel(R) Gigabit Ethernet Network Driver
igb: Copyright (c) 2007-2014 Intel Corporation.
igb 0000:07:00.0: can't derive routing for PCI INT D
igb 0000:07:00.0: PCI INT D: no GSI
igb 0000:07:00.0 0000:07:00.0 (uninitialized): PCIe link lost
------------[ cut here ]------------
igb: Failed to read reg 0x18!
WARNING: CPU: 23 PID: 814 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x78/0x90 [igb]
Modules linked in: igb(+) fjes(-) mei rapl intel_cstate mdio intel_uncore ipmi_si iTCO_wdt intel_pmc_bxt ipmi_devi>
CPU: 23 PID: 814 Comm: systemd-udevd Not tainted 6.3.0-rc1 #1
Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
RIP: 0010:igb_rd32+0x78/0x90 [igb]
Code: 48 c7 c6 f5 56 d3 c0 e8 96 51 f9 c8 48 8b bb 28 ff ff ff e8 3a 46 b6 c8 84 c0 74 c9 89 ee 48 c7 c7 18 64 d3 >
RSP: 0018:ffffab6a07d37b10 EFLAGS: 00010286
RAX: 000000000000001d RBX: ffff900385208f18 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff8a8ba498 RDI: 00000000ffffffff
RBP: 0000000000000018 R08: 0000000000000000 R09: ffffab6a07d379b8
R10: 0000000000000003 R11: ffffffff8b143de8 R12: ffff8ffc4518b0d0
R13: ffff9003852089c0 R14: ffff900385208f18 R15: ffff900385208000
FS: 00007faa81c07b40(0000) GS:ffff900b5fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007faa811c3594 CR3: 000000010c6c0001 CR4: 00000000001706e0
Call Trace:
 <TASK>
 igb_get_invariants_82575+0x92/0xec0 [igb]
 igb_probe+0x3bd/0x1510 [igb]
 local_pci_probe+0x41/0x90
 pci_device_probe+0xb3/0x220
 really_probe+0x1a2/0x400
 ? __pfx___driver_attach+0x10/0x10
 __driver_probe_device+0x78/0x170
 driver_probe_device+0x1f/0x90
 __driver_attach+0xce/0x1c0
 bus_for_each_dev+0x74/0xb0
 bus_add_driver+0x112/0x210
 driver_register+0x55/0x100
 ? __pfx_init_module+0x10/0x10 [igb]
 do_one_initcall+0x59/0x230
 do_init_module+0x4a/0x210
 __do_sys_finit_module+0x93/0xf0
 do_syscall_64+0x5b/0x80
 ? do_syscall_64+0x67/0x80
 ? syscall_exit_to_user_mode_prepare+0x18e/0x1c0
 ? syscall_exit_to_user_mode+0x17/0x40
 ? do_syscall_64+0x67/0x80
 ? syscall_exit_to_user_mode+0x17/0x40
 ? do_syscall_64+0x67/0x80
 ? __irq_exit_rcu+0x3d/0x140
 ? common_interrupt+0x61/0xd0
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7faa81b0b27d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c >
RSP: 002b:00007fff03879908 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 00005594ac692a60 RCX: 00007faa81b0b27d
RDX: 0000000000000000 RSI: 00007faa8224d43c RDI: 000000000000000e
RBP: 00007faa8224d43c R08: 0000000000000000 R09: 00005594ac758fc0
R10: 000000000000000e R11: 0000000000000246 R12: 0000000000020000
R13: 00005594ac690480 R14: 0000000000000000 R15: 00005594ac693450
 </TASK>

^ permalink raw reply related [flat|nested] 12+ messages in thread
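[Editorial sketch, not part of the original message.] The two added lines in the diff above gate device setup on what firmware reports. The following is a minimal userspace model of that gate, written only to make the control flow easy to read. The names struct fw_node, struct model_dev and model_setup_device() are hypothetical stand-ins invented for this sketch; they are not the kernel's struct fwnode_handle, struct pci_dev or pci_setup_device(), and the real availability test is fwnode_device_is_available(), which is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a firmware node (not struct fwnode_handle). */
struct fw_node {
	bool available;	/* what firmware reports for this device */
};

/* Hypothetical stand-in for a device that may or may not have a firmware node. */
struct model_dev {
	const struct fw_node *fwnode;	/* NULL if firmware does not describe the device */
};

/*
 * Same shape as the added check: a device whose firmware node reports
 * "not available" is rejected with -ENODEV. A device with no firmware
 * node at all is not affected by the check.
 */
int model_setup_device(const struct model_dev *dev)
{
	assert(dev != NULL);

	if (dev->fwnode && !dev->fwnode->available)
		return -ENODEV;

	return 0;
}
```

In this model only the third case (a firmware node exists and reports the device as unavailable) changes behaviour, which is consistent with the report that reverting the commit restores the previous behaviour on this machine.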
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-03-31 11:40 [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb Donald Hunter
@ 2023-03-31 19:42 ` Bjorn Helgaas
  2023-04-01 12:52   ` Donald Hunter
  2023-04-02 10:26 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2023-03-31 19:42 UTC (permalink / raw)
  To: Donald Hunter
  Cc: linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas, netdev,
      Jesse Brandeburg, Tony Nguyen

Thanks a lot for the report and for all the work you did to bisect and
identify the commit.

On Fri, Mar 31, 2023 at 12:40:11PM +0100, Donald Hunter wrote:
> The 6.3-rc1 and later release candidates are hanging during boot on our
> Dell PowerEdge R620 servers with Intel I350 nics (igb).
>
> After bisecting from v6.2 to v6.3-rc1, I isolated the problem to:
>
> [6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3] PCI: Honor firmware's device
> disabled status
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 1779582fb500..b1d80c1d7a69 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1841,6 +1841,8 @@ int pci_setup_device(struct pci_dev *dev)
>
>  	pci_set_of_node(dev);
>  	pci_set_acpi_fwnode(dev);
> +	if (dev->dev.fwnode && !fwnode_device_is_available(dev->dev.fwnode))
> +		return -ENODEV;
>
>  	pci_dev_assign_slot(dev);

I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
because it apparently has an ACPI firmware node, and there's something
we don't expect about its status?

Hopefully Rob will look at this. If I were looking, I would be
interested in acpidump to see what's in the DSDT.

Bjorn

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-03-31 19:42 ` Bjorn Helgaas
@ 2023-04-01 12:52   ` Donald Hunter
  2023-04-02 22:55     ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Donald Hunter @ 2023-04-01 12:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas, netdev,
      Jesse Brandeburg, Tony Nguyen

On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> because it apparently has an ACPI firmware node, and there's something
> we don't expect about its status?

Yes they are built-in, to my knowledge.

> Hopefully Rob will look at this. If I were looking, I would be
> interested in acpidump to see what's in the DSDT.

I can get an acpidump. Is there a preferred way to share the files, or just
an email attachment?

> Bjorn

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-01 12:52   ` Donald Hunter
@ 2023-04-02 22:55     ` Bjorn Helgaas
  2023-04-10 15:10       ` Donald Hunter
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2023-04-02 22:55 UTC (permalink / raw)
  To: Donald Hunter
  Cc: linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas, netdev,
      Jesse Brandeburg, Tony Nguyen

On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> > because it apparently has an ACPI firmware node, and there's something
> > we don't expect about its status?
>
> Yes they are built-in, to my knowledge.
>
> > Hopefully Rob will look at this. If I were looking, I would be
> > interested in acpidump to see what's in the DSDT.
>
> I can get an acpidump. Is there a preferred way to share the files, or just
> an email attachment?

I think by default acpidump produces ASCII that can be directly
included in email. http://vger.kernel.org/majordomo-info.html says
100K is the limit for vger mailing lists. Or you could open a report
at https://bugzilla.kernel.org and attach it there, maybe along with a
complete dmesg log and "sudo lspci -vv" output.

Bjorn

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-02 22:55     ` Bjorn Helgaas
@ 2023-04-10 15:10       ` Donald Hunter
  2023-04-10 21:37         ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Donald Hunter @ 2023-04-10 15:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas, netdev,
      Jesse Brandeburg, Tony Nguyen

On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> > > because it apparently has an ACPI firmware node, and there's something
> > > we don't expect about its status?
> >
> > Yes they are built-in, to my knowledge.
> >
> > > Hopefully Rob will look at this. If I were looking, I would be
> > > interested in acpidump to see what's in the DSDT.
> >
> > I can get an acpidump. Is there a preferred way to share the files, or just
> > an email attachment?
>
> I think by default acpidump produces ASCII that can be directly
> included in email. http://vger.kernel.org/majordomo-info.html says
> 100K is the limit for vger mailing lists. Or you could open a report
> at https://bugzilla.kernel.org and attach it there, maybe along with a
> complete dmesg log and "sudo lspci -vv" output.

Apologies for the delay, I was unable to access the machine while travelling.

https://bugzilla.kernel.org/show_bug.cgi?id=217317

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-10 15:10       ` Donald Hunter
@ 2023-04-10 21:37         ` Bjorn Helgaas
  2023-04-11 12:53           ` Donald Hunter
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2023-04-10 21:37 UTC (permalink / raw)
  To: Donald Hunter
  Cc: linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas, netdev,
      Jesse Brandeburg, Tony Nguyen

On Mon, Apr 10, 2023 at 04:10:54PM +0100, Donald Hunter wrote:
> On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> > > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> > > > because it apparently has an ACPI firmware node, and there's something
> > > > we don't expect about its status?
> > >
> > > Yes they are built-in, to my knowledge.
> > >
> > > > Hopefully Rob will look at this. If I were looking, I would be
> > > > interested in acpidump to see what's in the DSDT.
> > >
> > > I can get an acpidump. Is there a preferred way to share the files, or just
> > > an email attachment?
> >
> > I think by default acpidump produces ASCII that can be directly
> > included in email. http://vger.kernel.org/majordomo-info.html says
> > 100K is the limit for vger mailing lists. Or you could open a report
> > at https://bugzilla.kernel.org and attach it there, maybe along with a
> > complete dmesg log and "sudo lspci -vv" output.
>
> Apologies for the delay, I was unable to access the machine while travelling.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217317

Thanks for that! Can you boot a kernel with 6fffbc7ae137 reverted
with this in the kernel parameters:

  dyndbg="file drivers/acpi/* +p"

and collect the entire dmesg log?

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-10 21:37         ` Bjorn Helgaas
@ 2023-04-11 12:53           ` Donald Hunter
  2023-04-11 19:02             ` Rob Herring
  0 siblings, 1 reply; 12+ messages in thread
From: Donald Hunter @ 2023-04-11 12:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas, netdev,
      Jesse Brandeburg, Tony Nguyen

Bjorn Helgaas <helgaas@kernel.org> writes:

> On Mon, Apr 10, 2023 at 04:10:54PM +0100, Donald Hunter wrote:
>> On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
>> > > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > > >
>> > > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
>> > > > because it apparently has an ACPI firmware node, and there's something
>> > > > we don't expect about its status?
>> > >
>> > > Yes they are built-in, to my knowledge.
>> > >
>> > > > Hopefully Rob will look at this. If I were looking, I would be
>> > > > interested in acpidump to see what's in the DSDT.
>> > >
>> > > I can get an acpidump. Is there a preferred way to share the files, or just
>> > > an email attachment?
>> >
>> > I think by default acpidump produces ASCII that can be directly
>> > included in email. http://vger.kernel.org/majordomo-info.html says
>> > 100K is the limit for vger mailing lists. Or you could open a report
>> > at https://bugzilla.kernel.org and attach it there, maybe along with a
>> > complete dmesg log and "sudo lspci -vv" output.
>>
>> Apologies for the delay, I was unable to access the machine while travelling.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=217317
>
> Thanks for that! Can you boot a kernel with 6fffbc7ae137 reverted
> with this in the kernel parameters:
>
>   dyndbg="file drivers/acpi/* +p"
>
> and collect the entire dmesg log?

Added to the bugzilla report.

Thanks!

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-11 12:53           ` Donald Hunter
@ 2023-04-11 19:02             ` Rob Herring
  2023-04-12 13:20               ` Andy Shevchenko
  0 siblings, 1 reply; 12+ messages in thread
From: Rob Herring @ 2023-04-11 19:02 UTC (permalink / raw)
  To: Donald Hunter, Bjorn Helgaas, Rafael J. Wysocki, Andy Shevchenko
  Cc: linux-kernel, linux-pci, Bjorn Helgaas, netdev, Jesse Brandeburg,
      Tony Nguyen

+Rafael, Andy

On Tue, Apr 11, 2023 at 7:53 AM Donald Hunter <donald.hunter@gmail.com> wrote:
>
> Bjorn Helgaas <helgaas@kernel.org> writes:
>
> > On Mon, Apr 10, 2023 at 04:10:54PM +0100, Donald Hunter wrote:
> >> On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> > On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> >> > > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> > > >
> >> > > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> >> > > > because it apparently has an ACPI firmware node, and there's something
> >> > > > we don't expect about its status?
> >> > >
> >> > > Yes they are built-in, to my knowledge.
> >> > >
> >> > > > Hopefully Rob will look at this. If I were looking, I would be
> >> > > > interested in acpidump to see what's in the DSDT.
> >> > >
> >> > > I can get an acpidump. Is there a preferred way to share the files, or just
> >> > > an email attachment?
> >> >
> >> > I think by default acpidump produces ASCII that can be directly
> >> > included in email. http://vger.kernel.org/majordomo-info.html says
> >> > 100K is the limit for vger mailing lists. Or you could open a report
> >> > at https://bugzilla.kernel.org and attach it there, maybe along with a
> >> > complete dmesg log and "sudo lspci -vv" output.
> >>
> >> Apologies for the delay, I was unable to access the machine while travelling.
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217317
> >
> > Thanks for that! Can you boot a kernel with 6fffbc7ae137 reverted
> > with this in the kernel parameters:
> >
> >   dyndbg="file drivers/acpi/* +p"
> >
> > and collect the entire dmesg log?
>
> Added to the bugzilla report.

Rafael, Andy, Any ideas why fwnode_device_is_available() would return
false for a built-in PCI device with a ACPI device entry? The only
thing I see in the log is it looks like the parent PCI bridge/bus
doesn't have ACPI device entry (based on "[ 0.913389] pci_bus
0000:07: No ACPI support"). For DT, if the parent doesn't have a node,
then the child can't. Not sure on ACPI.

Rob

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-11 19:02             ` Rob Herring
@ 2023-04-12 13:20               ` Andy Shevchenko
  2023-04-19 19:34                 ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Shevchenko @ 2023-04-12 13:20 UTC (permalink / raw)
  To: Rob Herring
  Cc: Donald Hunter, Bjorn Helgaas, Rafael J. Wysocki, linux-kernel,
      linux-pci, Bjorn Helgaas, netdev, Jesse Brandeburg, Tony Nguyen

On Tue, Apr 11, 2023 at 02:02:03PM -0500, Rob Herring wrote:
> On Tue, Apr 11, 2023 at 7:53 AM Donald Hunter <donald.hunter@gmail.com> wrote:
> > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > On Mon, Apr 10, 2023 at 04:10:54PM +0100, Donald Hunter wrote:
> > >> On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >> > On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> > >> > > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >> > > >
> > >> > > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> > >> > > > because it apparently has an ACPI firmware node, and there's something
> > >> > > > we don't expect about its status?
> > >> > >
> > >> > > Yes they are built-in, to my knowledge.
> > >> > >
> > >> > > > Hopefully Rob will look at this. If I were looking, I would be
> > >> > > > interested in acpidump to see what's in the DSDT.
> > >> > >
> > >> > > I can get an acpidump. Is there a preferred way to share the files, or just
> > >> > > an email attachment?
> > >> >
> > >> > I think by default acpidump produces ASCII that can be directly
> > >> > included in email. http://vger.kernel.org/majordomo-info.html says
> > >> > 100K is the limit for vger mailing lists. Or you could open a report
> > >> > at https://bugzilla.kernel.org and attach it there, maybe along with a
> > >> > complete dmesg log and "sudo lspci -vv" output.
> > >>
> > >> Apologies for the delay, I was unable to access the machine while travelling.
> > >>
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=217317
> > >
> > > Thanks for that! Can you boot a kernel with 6fffbc7ae137 reverted
> > > with this in the kernel parameters:
> > >
> > >   dyndbg="file drivers/acpi/* +p"
> > >
> > > and collect the entire dmesg log?
> >
> > Added to the bugzilla report.
>
> Rafael, Andy, Any ideas why fwnode_device_is_available() would return
> false for a built-in PCI device with a ACPI device entry? The only
> thing I see in the log is it looks like the parent PCI bridge/bus
> doesn't have ACPI device entry (based on "[ 0.913389] pci_bus
> 0000:07: No ACPI support"). For DT, if the parent doesn't have a node,
> then the child can't. Not sure on ACPI.

Thanks for the Cc'ing. I haven't checked anything yet, but from the above it
sounds like a BIOS issue. If PCI has no ACPI companion tree, then why the heck
one of the devices has the entry? I'm not even sure this is allowed by ACPI
specification, but as I said, I just solely used the above mail.

--
With Best Regards,
Andy Shevchenko

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-12 13:20               ` Andy Shevchenko
@ 2023-04-19 19:34                 ` Bjorn Helgaas
  2023-04-20 15:32                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2023-04-19 19:34 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Rob Herring, Donald Hunter, Rafael J. Wysocki, linux-kernel,
      linux-pci, Bjorn Helgaas, netdev, Jesse Brandeburg, Tony Nguyen

On Wed, Apr 12, 2023 at 04:20:33PM +0300, Andy Shevchenko wrote:
> On Tue, Apr 11, 2023 at 02:02:03PM -0500, Rob Herring wrote:
> > On Tue, Apr 11, 2023 at 7:53 AM Donald Hunter <donald.hunter@gmail.com> wrote:
> > > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > > On Mon, Apr 10, 2023 at 04:10:54PM +0100, Donald Hunter wrote:
> > > >> On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >> > On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> > > >> > > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >> > > >
> > > >> > > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> > > >> > > > because it apparently has an ACPI firmware node, and there's something
> > > >> > > > we don't expect about its status?
> > > >> > >
> > > >> > > Yes they are built-in, to my knowledge.
> > > >> > >
> > > >> > > > Hopefully Rob will look at this. If I were looking, I would be
> > > >> > > > interested in acpidump to see what's in the DSDT.
> > > >> > >
> > > >> > > I can get an acpidump. Is there a preferred way to share the files, or just
> > > >> > > an email attachment?
> > > >> >
> > > >> > I think by default acpidump produces ASCII that can be directly
> > > >> > included in email. http://vger.kernel.org/majordomo-info.html says
> > > >> > 100K is the limit for vger mailing lists. Or you could open a report
> > > >> > at https://bugzilla.kernel.org and attach it there, maybe along with a
> > > >> > complete dmesg log and "sudo lspci -vv" output.
> > > >>
> > > >> Apologies for the delay, I was unable to access the machine while travelling.
> > > >>
> > > >> https://bugzilla.kernel.org/show_bug.cgi?id=217317
> > > >
> > > > Thanks for that! Can you boot a kernel with 6fffbc7ae137 reverted
> > > > with this in the kernel parameters:
> > > >
> > > >   dyndbg="file drivers/acpi/* +p"
> > > >
> > > > and collect the entire dmesg log?
> > >
> > > Added to the bugzilla report.
> >
> > Rafael, Andy, Any ideas why fwnode_device_is_available() would return
> > false for a built-in PCI device with a ACPI device entry? The only
> > thing I see in the log is it looks like the parent PCI bridge/bus
> > doesn't have ACPI device entry (based on "[ 0.913389] pci_bus
> > 0000:07: No ACPI support"). For DT, if the parent doesn't have a node,
> > then the child can't. Not sure on ACPI.
>
> Thanks for the Cc'ing. I haven't checked anything yet, but from the above it
> sounds like a BIOS issue. If PCI has no ACPI companion tree, then why the heck
> one of the devices has the entry? I'm not even sure this is allowed by ACPI
> specification, but as I said, I just solely used the above mail.

ACPI r6.5, sec 6.3.7, about _STA says:

  - Bit [0] - Set if the device is present.
  - Bit [1] - Set if the device is enabled and decoding its resources.
  - Bit [3] - Set if the device is functioning properly (cleared if
    device failed its diagnostics).

  ...

  If a device is present on an enumerable bus, then _STA must not
  return 0. In that case, bit[0] must be set and if the status of the
  device can be determined through a bus-specific enumeration and
  discovery mechanism, it must be reflected by the values of bit[1]
  and bit[3], even though the OSPM is not required to take them into
  account.

Since PCI *is* an enumerable bus, I don't think we can use _STA to
decide whether a PCI device is present.

We can use _STA to decide whether a host bridge is present, of course,
but that doesn't help here because the host bridge in question is
PNP0A08:00 that leads to [bus 00-3d], and it is present.

I don't know exactly what path led to the igb issue, but I don't think
we need to figure that out. I think we just need to avoid the use of
_STA in fwnode_device_is_available().

6fffbc7ae137 ("PCI: Honor firmware's device disabled status") appeared
in v6.3-rc1, so I think we need to revert or fix it before v6.3, which
will probably be tagged Sunday (and I'll be on vacation
Friday-Monday).

Bjorn

^ permalink raw reply [flat|nested] 12+ messages in thread
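[Editorial sketch, not part of the original message.] The _STA bit layout quoted above from ACPI r6.5, sec 6.3.7 can be decoded as follows. The bit positions are taken from that quote. The macro and function names, and the two contrasting "policy" helpers, are hypothetical illustrations of the argument in this message; they are not the kernel's ACPI code and say nothing about how fwnode_device_is_available() is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA bit positions as quoted from ACPI r6.5, sec 6.3.7. */
#define STA_PRESENT	(1u << 0)	/* device is present */
#define STA_ENABLED	(1u << 1)	/* enabled and decoding its resources */
#define STA_FUNCTIONING	(1u << 3)	/* functioning properly */

bool sta_present(uint32_t sta)     { return (sta & STA_PRESENT) != 0; }
bool sta_enabled(uint32_t sta)     { return (sta & STA_ENABLED) != 0; }
bool sta_functioning(uint32_t sta) { return (sta & STA_FUNCTIONING) != 0; }

/* Hypothetical policy A: let the firmware-reported _STA alone decide presence. */
bool present_by_sta_only(uint32_t sta)
{
	return sta_present(sta);
}

/*
 * Hypothetical policy B: on an enumerable bus such as PCI, presence is
 * decided by bus enumeration; the _STA value is deliberately ignored.
 */
bool present_on_enumerable_bus(bool found_by_enumeration, uint32_t sta)
{
	(void)sta;
	return found_by_enumeration;
}
```

With a firmware value of 0 for a device that enumeration did find, policy A reports the device as absent while policy B reports it as present. That gap is the kind of disagreement the message argues against relying on for PCI devices; whether it is the exact path that broke igb on this machine is left open in the thread.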
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-04-19 19:34                 ` Bjorn Helgaas
@ 2023-04-20 15:32                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2023-04-20 15:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andy Shevchenko, Rob Herring, Donald Hunter, linux-kernel,
      linux-pci, Bjorn Helgaas, netdev, Jesse Brandeburg, Tony Nguyen,
      ACPI Devel Maling List

On Wed, Apr 19, 2023 at 9:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Apr 12, 2023 at 04:20:33PM +0300, Andy Shevchenko wrote:
> > On Tue, Apr 11, 2023 at 02:02:03PM -0500, Rob Herring wrote:
> > > On Tue, Apr 11, 2023 at 7:53 AM Donald Hunter <donald.hunter@gmail.com> wrote:
> > > > Bjorn Helgaas <helgaas@kernel.org> writes:
> > > > > On Mon, Apr 10, 2023 at 04:10:54PM +0100, Donald Hunter wrote:
> > > > >> On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >> > On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> > > > >> > > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >> > > >
> > > > >> > > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> > > > >> > > > because it apparently has an ACPI firmware node, and there's something
> > > > >> > > > we don't expect about its status?
> > > > >> > >
> > > > >> > > Yes they are built-in, to my knowledge.
> > > > >> > >
> > > > >> > > > Hopefully Rob will look at this. If I were looking, I would be
> > > > >> > > > interested in acpidump to see what's in the DSDT.
> > > > >> > >
> > > > >> > > I can get an acpidump. Is there a preferred way to share the files, or just
> > > > >> > > an email attachment?
> > > > >> >
> > > > >> > I think by default acpidump produces ASCII that can be directly
> > > > >> > included in email. http://vger.kernel.org/majordomo-info.html says
> > > > >> > 100K is the limit for vger mailing lists. Or you could open a report
> > > > >> > at https://bugzilla.kernel.org and attach it there, maybe along with a
> > > > >> > complete dmesg log and "sudo lspci -vv" output.
> > > > >>
> > > > >> Apologies for the delay, I was unable to access the machine while travelling.
> > > > >>
> > > > >> https://bugzilla.kernel.org/show_bug.cgi?id=217317
> > > > >
> > > > > Thanks for that! Can you boot a kernel with 6fffbc7ae137 reverted
> > > > > with this in the kernel parameters:
> > > > >
> > > > >   dyndbg="file drivers/acpi/* +p"
> > > > >
> > > > > and collect the entire dmesg log?
> > > >
> > > > Added to the bugzilla report.
> > >
> > > Rafael, Andy, Any ideas why fwnode_device_is_available() would return
> > > false for a built-in PCI device with a ACPI device entry? The only
> > > thing I see in the log is it looks like the parent PCI bridge/bus
> > > doesn't have ACPI device entry (based on "[ 0.913389] pci_bus
> > > 0000:07: No ACPI support"). For DT, if the parent doesn't have a node,
> > > then the child can't. Not sure on ACPI.
> >
> > Thanks for the Cc'ing. I haven't checked anything yet, but from the above it
> > sounds like a BIOS issue. If PCI has no ACPI companion tree, then why the heck
> > one of the devices has the entry? I'm not even sure this is allowed by ACPI
> > specification, but as I said, I just solely used the above mail.
>
> ACPI r6.5, sec 6.3.7, about _STA says:
>
>   - Bit [0] - Set if the device is present.
>   - Bit [1] - Set if the device is enabled and decoding its resources.
>   - Bit [3] - Set if the device is functioning properly (cleared if
>     device failed its diagnostics).
>
>   ...
>
>   If a device is present on an enumerable bus, then _STA must not
>   return 0. In that case, bit[0] must be set and if the status of the
>   device can be determined through a bus-specific enumeration and
>   discovery mechanism, it must be reflected by the values of bit[1]
>   and bit[3], even though the OSPM is not required to take them into
>   account.
>
> Since PCI *is* an enumerable bus, I don't think we can use _STA to
> decide whether a PCI device is present.

You are right, _STA can't be used for that.

> We can use _STA to decide whether a host bridge is present, of course,
> but that doesn't help here because the host bridge in question is
> PNP0A08:00 that leads to [bus 00-3d], and it is present.
>
> I don't know exactly what path led to the igb issue, but I don't think
> we need to figure that out. I think we just need to avoid the use of
> _STA in fwnode_device_is_available().

I agree. It is incorrect.

> 6fffbc7ae137 ("PCI: Honor firmware's device disabled status") appeared
> in v6.3-rc1, so I think we need to revert or fix it before v6.3, which
> will probably be tagged Sunday (and I'll be on vacation
> Friday-Monday).

Yes, please revert this one ASAP.

Cheers,
Rafael

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb
  2023-03-31 11:40 [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb Donald Hunter
  2023-03-31 19:42 ` Bjorn Helgaas
@ 2023-04-02 10:26 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 0 replies; 12+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-04-02 10:26 UTC (permalink / raw)
  To: Donald Hunter, linux-kernel, linux-pci, Rob Herring, Bjorn Helgaas,
      netdev, Jesse Brandeburg, Tony Nguyen,
      Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 31.03.23 13:40, Donald Hunter wrote:
> The 6.3-rc1 and later release candidates are hanging during boot on our
> Dell PowerEdge R620 servers with Intel I350 nics (igb).
>
> After bisecting from v6.2 to v6.3-rc1, I isolated the problem to:
>
> [6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3] PCI: Honor firmware's device
> disabled status
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 6fffbc7ae1373e10b989afe23a9eeb9c49fe15c3
#regzbot title pci: / net: igb: hangs during boot on PowerEdge R620
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply [flat|nested] 12+ messages in thread
Thread overview: 12+ messages
2023-03-31 11:40 [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb Donald Hunter
2023-03-31 19:42 ` Bjorn Helgaas
2023-04-01 12:52   ` Donald Hunter
2023-04-02 22:55     ` Bjorn Helgaas
2023-04-10 15:10       ` Donald Hunter
2023-04-10 21:37         ` Bjorn Helgaas
2023-04-11 12:53           ` Donald Hunter
2023-04-11 19:02             ` Rob Herring
2023-04-12 13:20               ` Andy Shevchenko
2023-04-19 19:34                 ` Bjorn Helgaas
2023-04-20 15:32                   ` Rafael J. Wysocki
2023-04-02 10:26 ` Linux regression tracking #adding (Thorsten Leemhuis)