* Re: Kernel 6.5-rc2: system crash on suspend bisected
From: Bjorn Helgaas @ 2023-07-20 20:21 UTC
To: Woody Suwalski; +Cc: imammedo, bhelgaas, LKML, linux-pci, regressions

[+cc regressions list]

On Wed, Jul 19, 2023 at 11:36:51PM -0400, Woody Suwalski wrote:
> Laptop shows a kernel crash trace after a first suspend to ram, on a second
> attempt to suspend it becomes frozen solid. This is 100% repeatable with a
> 6.5-rc2 kernel, not happening with a 6.4 kernel - see the attached dmesg
> output.
>
> I have bisected the kernel builds and it points to:
> [40613da52b13fb21c5566f10b287e0ca8c12c4e9] PCI: acpiphp: Reassign resources
> on bridge if necessary
>
> Reversing this patch seems to fix the kernel crash problem on my laptop.

Thank you very much for all your work debugging, bisecting, and
reporting this! This is incredibly helpful.

Original report, including complete dmesg logs for both v6.4 and
v6.5-rc2:
https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com

I queued up a revert of 40613da52b13 ("PCI: acpiphp: Reassign
resources on bridge if necessary") (on my for-linus branch for v6.5).

It looks like a NULL pointer dereference; hopefully the fix is obvious
and I can drop the revert and replace it with the fix.

Bjorn

* Re: Kernel 6.5-rc2: system crash on suspend bisected
From: Igor Mammedov @ 2023-07-24 9:27 UTC
To: Bjorn Helgaas; +Cc: Woody Suwalski, bhelgaas, LKML, linux-pci, regressions

On Thu, 20 Jul 2023 15:21:10 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc regressions list]
>
> On Wed, Jul 19, 2023 at 11:36:51PM -0400, Woody Suwalski wrote:
> > Laptop shows a kernel crash trace after a first suspend to ram, on a second
> > attempt to suspend it becomes frozen solid. This is 100% repeatable with a
> > 6.5-rc2 kernel, not happening with a 6.4 kernel - see the attached dmesg
> > output.
> >
> > I have bisected the kernel builds and it points to:
> > [40613da52b13fb21c5566f10b287e0ca8c12c4e9] PCI: acpiphp: Reassign resources
> > on bridge if necessary
> >
> > Reversing this patch seems to fix the kernel crash problem on my laptop.
>
> Thank you very much for all your work debugging, bisecting, and
> reporting this! This is incredibly helpful.
>
> Original report, including complete dmesg logs for both v6.4 and
> v6.5-rc2:
> https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com
>
> I queued up a revert of 40613da52b13 ("PCI: acpiphp: Reassign
> resources on bridge if necessary") (on my for-linus branch for v6.5).
>
> It looks like a NULL pointer dereference; hopefully the fix is obvious
> and I can drop the revert and replace it with the fix.

It happens here:

2145 void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
2146 {
2147         struct pci_bus *parent = bridge->subordinate;

Let's see if it's reproducible on a Lenovo laptop and what reading the
involved code yields. If I can't figure it out anyway, I'll come up with
a patch to trace the issue.

>
> Bjorn
>

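To make the crash site concrete, here is a minimal userspace sketch (not kernel code). It assumes, as the debug patch later in this thread does, that the culprit is a NULL bus->self: in Linux a root bus has no upstream bridge device, so its self pointer is NULL, and pci_assign_unassigned_bridge_resources() reads bridge->subordinate on its very first line. The two structs are hypothetical, stripped-down stand-ins for struct pci_bus and struct pci_dev, and bridge_subordinate()/subordinate_of() are illustrative helpers, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structs:
 * only the two fields involved in the crash are modelled. */
struct pci_dev;

struct pci_bus {
	struct pci_dev *self;         /* bridge leading to this bus; NULL on a root bus */
};

struct pci_dev {
	struct pci_bus *subordinate;  /* bus behind this bridge */
};

/* Same access as the first line of pci_assign_unassigned_bridge_resources():
 * calling this with bridge == NULL is the NULL pointer dereference. */
static struct pci_bus *bridge_subordinate(struct pci_dev *bridge)
{
	return bridge->subordinate;
}

/* Guarded lookup for a slot's bus: returns NULL instead of
 * dereferencing when the bus has no upstream bridge (root bus). */
static struct pci_bus *subordinate_of(const struct pci_bus *bus)
{
	assert(bus != NULL);          /* the slot's own bus always exists */
	if (bus->self == NULL)
		return NULL;
	return bridge_subordinate(bus->self);
}
```

Calling bridge_subordinate() directly with a root bus's self pointer would dereference NULL, which in the kernel is the kind of Oops reported above; the guarded helper simply refuses to follow a bridge that is not there.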
* Re: Kernel 6.5-rc2: system crash on suspend bisected
From: Michael S. Tsirkin @ 2023-07-27 6:09 UTC
To: Bjorn Helgaas
Cc: Woody Suwalski, imammedo, bhelgaas, LKML, linux-pci, regressions, Linux regression tracking #adding (Thorsten Leemhuis)

On Thu, Jul 20, 2023 at 03:21:10PM -0500, Bjorn Helgaas wrote:
> [+cc regressions list]
>
> On Wed, Jul 19, 2023 at 11:36:51PM -0400, Woody Suwalski wrote:
> > Laptop shows a kernel crash trace after a first suspend to ram, on a second
> > attempt to suspend it becomes frozen solid. This is 100% repeatable with a
> > 6.5-rc2 kernel, not happening with a 6.4 kernel - see the attached dmesg
> > output.
> >
> > I have bisected the kernel builds and it points to:
> > [40613da52b13fb21c5566f10b287e0ca8c12c4e9] PCI: acpiphp: Reassign resources
> > on bridge if necessary
> >
> > Reversing this patch seems to fix the kernel crash problem on my laptop.
>
> Thank you very much for all your work debugging, bisecting, and
> reporting this! This is incredibly helpful.
>
> Original report, including complete dmesg logs for both v6.4 and
> v6.5-rc2:
> https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com
>
> I queued up a revert of 40613da52b13 ("PCI: acpiphp: Reassign
> resources on bridge if necessary") (on my for-linus branch for v6.5).
>
> It looks like a NULL pointer dereference; hopefully the fix is obvious
> and I can drop the revert and replace it with the fix.
>
> Bjorn

Patch on list now:
https://lore.kernel.org/all/20230726123518.2361181-1-imammedo%40redhat.com

-- 
MST

* Re: Kernel 6.5-rc2: system crash on suspend bisected
From: Woody Suwalski @ 2023-07-27 12:07 UTC
To: Michael S. Tsirkin, Bjorn Helgaas
Cc: imammedo, bhelgaas, LKML, linux-pci, regressions, Linux regression tracking #adding (Thorsten Leemhuis), Woody Suwalski

Michael S. Tsirkin wrote:
> On Thu, Jul 20, 2023 at 03:21:10PM -0500, Bjorn Helgaas wrote:
>> [+cc regressions list]
>>
>> On Wed, Jul 19, 2023 at 11:36:51PM -0400, Woody Suwalski wrote:
>>> Laptop shows a kernel crash trace after a first suspend to ram, on a second
>>> attempt to suspend it becomes frozen solid. This is 100% repeatable with a
>>> 6.5-rc2 kernel, not happening with a 6.4 kernel - see the attached dmesg
>>> output.
>>>
>>> I have bisected the kernel builds and it points to:
>>> [40613da52b13fb21c5566f10b287e0ca8c12c4e9] PCI: acpiphp: Reassign resources
>>> on bridge if necessary
>>>
>>> Reversing this patch seems to fix the kernel crash problem on my laptop.
>> Thank you very much for all your work debugging, bisecting, and
>> reporting this! This is incredibly helpful.
>>
>> Original report, including complete dmesg logs for both v6.4 and
>> v6.5-rc2:
>> https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com
>>
>> I queued up a revert of 40613da52b13 ("PCI: acpiphp: Reassign
>> resources on bridge if necessary") (on my for-linus branch for v6.5).
>>
>> It looks like a NULL pointer dereference; hopefully the fix is obvious
>> and I can drop the revert and replace it with the fix.
>>
>> Bjorn
> Patch on list now:
> https://lore.kernel.org/all/20230726123518.2361181-1-imammedo%40redhat.com
Confirmed, works OK.

Tested-by: Woody Suwalski <terraluna977@gmail.com>

-- 
Thanks, Woody

* Re: Kernel 6.5-rc2: system crash on suspend bisected
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-07-23 9:24 UTC
To: Woody Suwalski, imammedo, bhelgaas, LKML, linux-pci
Cc: Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 20.07.23 05:36, Woody Suwalski wrote:
>
> Laptop shows a kernel crash trace after a first suspend to ram, on a
> second attempt to suspend it becomes frozen solid. This is 100%
> repeatable with a 6.5-rc2 kernel, not happening with a 6.4 kernel - see
> the attached dmesg output.
>
> I have bisected the kernel builds and it points to:
> [40613da52b13fb21c5566f10b287e0ca8c12c4e9] PCI: acpiphp: Reassign
> resources on bridge if necessary
>
> Reversing this patch seems to fix the kernel crash problem on my laptop.
>
> Happy to test some proper fix patches...
>

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 40613da52b13fb21c5566f10b287e0ca8c12c
#regzbot title PCI: acpiphp: Oops on first attempt to suspend, freeze on second
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

* [PATCH] hack to debug acpiphp crash
From: Igor Mammedov @ 2023-07-24 13:59 UTC
To: linux-kernel; +Cc: terraluna977, bhelgaas, linux-pci, imammedo, mst

Woody, thanks for testing.

Can you try the following patch, which will try to work around a NULL
bus->self if it's really the culprit and print extra debug information?
Add the following to the kernel command line (make sure that
CONFIG_DYNAMIC_DEBUG is enabled):

dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel

What I find odd in your logs is that enable_slot() is called while native
PCIe hotplug should be used. Additional info might help to understand
what's going on:
1: 'lspci' output
2: DSDT and all SSDT ACPI tables (you can use 'acpidump -b' to get them).

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 drivers/pci/hotplug/acpiphp_glue.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 328d1e416014..9ce3fd9d72a9 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -485,7 +485,10 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 	struct pci_bus *bus = slot->bus;
 	struct acpiphp_func *func;
 
+WARN(1, "enable_slot");
+pci_info(bus, "enable_slot bus\n");
 	if (bridge && bus->self && hotplug_is_native(bus->self)) {
+pr_err("enable_slot: bridge branch\n");
 		/*
 		 * If native hotplug is used, it will take care of hotplug
 		 * slot management and resource allocation for hotplug
@@ -498,8 +501,10 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 				acpiphp_native_scan_bridge(dev);
 		}
 	} else {
+		LIST_HEAD(add_list);
 		int max, pass;
 
+pr_err("enable_slot: acpiphp_rescan_slot branch\n");
 		acpiphp_rescan_slot(slot);
 		max = acpiphp_max_busnr(bus);
 		for (pass = 0; pass < 2; pass++) {
@@ -508,13 +513,23 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 					continue;
 
 				max = pci_scan_bridge(bus, dev, max, pass);
+pci_info(dev, "enable_slot: pci_scan_bridge: max: %d\n", max);
 				if (pass && dev->subordinate) {
 					check_hotplug_bridge(slot, dev);
 					pcibios_resource_survey_bus(dev->subordinate);
+					if (bus->self)
+						__pci_bus_size_bridges(dev->subordinate,
+								       &add_list);
 				}
 			}
 		}
-		pci_assign_unassigned_bridge_resources(bus->self);
+		if (bus->self) {
+pci_info(bus->self, "enable_slot: pci_assign_unassigned_bridge_resources:\n");
+			pci_assign_unassigned_bridge_resources(bus->self);
+		} else {
+pci_info(bus, "enable_slot: __pci_bus_assign_resources:\n");
+			__pci_bus_assign_resources(bus, &add_list, NULL);
+		}
 	}
 
 	acpiphp_sanitize_bus(bus);
@@ -541,6 +556,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 		}
 		pci_dev_put(dev);
 	}
+pr_err("enable_slot: end\n");
 }
 
 /**
-- 
2.39.3

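Reading the diff above, the decision enable_slot() makes can be reduced to three outcomes. The sketch below models only that branch selection, with booleans standing in for the `bridge` argument, `bus->self != NULL`, and the result of hotplug_is_native(); the enum and choose_path() are hypothetical names, not kernel code. It shows why a slot on a root bus (no bus->self) can never take the native-hotplug branch and, with this patch applied, ends up in __pci_bus_assign_resources() rather than in pci_assign_unassigned_bridge_resources(NULL).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three paths through the patched enable_slot(). */
enum slot_path {
	PATH_NATIVE_SCAN,     /* native hotplug owns the bridge: only scan */
	PATH_BRIDGE_ASSIGN,   /* pci_assign_unassigned_bridge_resources(bus->self) */
	PATH_BUS_ASSIGN,      /* __pci_bus_assign_resources(bus, &add_list, NULL) */
};

/*
 * bridge:   the enable_slot() argument
 * has_self: bus->self != NULL (false for a root bus)
 * native:   what hotplug_is_native(bus->self) would return; in the real
 *           code it is only evaluated when has_self is true (&& short-circuits)
 */
static enum slot_path choose_path(bool bridge, bool has_self, bool native)
{
	if (bridge && has_self && native)
		return PATH_NATIVE_SCAN;

	/* Rescan branch: before the patch this always passed bus->self. */
	return has_self ? PATH_BRIDGE_ASSIGN : PATH_BUS_ASSIGN;
}
```

Note that `native` is irrelevant whenever `has_self` is false: a root bus always falls through to the rescan branch, which is why the unpatched code could reach pci_assign_unassigned_bridge_resources() with a NULL bridge.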
* Re: [PATCH] hack to debug acpiphp crash
From: Woody Suwalski @ 2023-07-25 1:52 UTC
To: Igor Mammedov, linux-kernel; +Cc: bhelgaas, linux-pci, mst

Igor Mammedov wrote:
> Woody, thanks for testing.
>
> Can you try the following patch, which will try to work around a NULL
> bus->self if it's really the culprit and print extra debug information?
> Add the following to the kernel command line (make sure that
> CONFIG_DYNAMIC_DEBUG is enabled):
>
> dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel
>
> What I find odd in your logs is that enable_slot() is called while native
> PCIe hotplug should be used. Additional info might help to understand
> what's going on:
> 1: 'lspci' output
> 2: DSDT and all SSDT ACPI tables (you can use 'acpidump -b' to get them).
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  drivers/pci/hotplug/acpiphp_glue.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 328d1e416014..9ce3fd9d72a9 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -485,7 +485,10 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
>  	struct pci_bus *bus = slot->bus;
>  	struct acpiphp_func *func;
>  
> +WARN(1, "enable_slot");
> +pci_info(bus, "enable_slot bus\n");
>  	if (bridge && bus->self && hotplug_is_native(bus->self)) {
> +pr_err("enable_slot: bridge branch\n");
>  		/*
>  		 * If native hotplug is used, it will take care of hotplug
>  		 * slot management and resource allocation for hotplug
> @@ -498,8 +501,10 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
>  				acpiphp_native_scan_bridge(dev);
>  		}
>  	} else {
> +		LIST_HEAD(add_list);
>  		int max, pass;
>  
> +pr_err("enable_slot: acpiphp_rescan_slot branch\n");
>  		acpiphp_rescan_slot(slot);
>  		max = acpiphp_max_busnr(bus);
>  		for (pass = 0; pass < 2; pass++) {
> @@ -508,13 +513,23 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
>  					continue;
>  
>  				max = pci_scan_bridge(bus, dev, max, pass);
> +pci_info(dev, "enable_slot: pci_scan_bridge: max: %d\n", max);
>  				if (pass && dev->subordinate) {
>  					check_hotplug_bridge(slot, dev);
>  					pcibios_resource_survey_bus(dev->subordinate);
> +					if (bus->self)
> +						__pci_bus_size_bridges(dev->subordinate,
> +								       &add_list);
>  				}
>  			}
>  		}
> -		pci_assign_unassigned_bridge_resources(bus->self);
> +		if (bus->self) {
> +pci_info(bus->self, "enable_slot: pci_assign_unassigned_bridge_resources:\n");
> +			pci_assign_unassigned_bridge_resources(bus->self);
> +		} else {
> +pci_info(bus, "enable_slot: __pci_bus_assign_resources:\n");
> +			__pci_bus_assign_resources(bus, &add_list, NULL);
> +		}
>  	}
>  
>  	acpiphp_sanitize_bus(bus);
> @@ -541,6 +556,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
>  		}
>  		pci_dev_put(dev);
>  	}
> +pr_err("enable_slot: end\n");
>  }
>  
>  /**
Unfortunately, the patch above does not seem to prevent the kernel crash.
Here is the requested diagnostic info: dmesg output before and after,
a selection of lspci outputs, and the ACPI tables. Hope that helps :-)

Thanks, Woody

[-- Attachment #2: pcidebug.tar.xz --]
[-- Type: application/x-xz, Size: 61636 bytes --]

* Re: [PATCH] hack to debug acpiphp crash
From: Igor Mammedov @ 2023-07-25 8:06 UTC
To: Woody Suwalski; +Cc: linux-kernel, bhelgaas, linux-pci, mst

On Mon, 24 Jul 2023 21:52:34 -0400
Woody Suwalski <terraluna977@gmail.com> wrote:

> Igor Mammedov wrote:
> > Woody, thanks for testing.
> >
> > Can you try the following patch, which will try to work around a NULL
> > bus->self if it's really the culprit and print extra debug information?
> > Add the following to the kernel command line (make sure that
> > CONFIG_DYNAMIC_DEBUG is enabled):
> >
> > dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel
> >
> > What I find odd in your logs is that enable_slot() is called while native
> > PCIe hotplug should be used. Additional info might help to understand
> > what's going on:
> > 1: 'lspci' output
> > 2: DSDT and all SSDT ACPI tables (you can use 'acpidump -b' to get them).
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[...]
> >
> >  /**
> Unfortunately, the patch above does not seem to prevent the kernel crash.
> Here is the requested diagnostic info: dmesg output before and after,
> a selection of lspci outputs, and the ACPI tables. Hope that helps :-)

Looking at dmesg-6.5-debug_after.txt, there is no "BUG: kernel NULL pointer
dereference" line anymore. The call traces you see are induced by WARN(),
whose purpose is to show the call path that leads to enable_slot().

Let me split the potential fix from the debug code and repost them as
separate patches for you to try. I'd like to see the debug output without
the 'fix' to track down which root port/device causes the NULL pointer
dereference, and hopefully in a few round trips figure out why the old
code doesn't crash.

PS:
What happens is that on resume the firmware (likely the EC)
issues an ACPI bus check on root ports, which (bus check) is
wired to the acpiphp module (though the pciehp module was initialized
at boot to manage root ports); it's likely a firmware bug.

I'd guess the intent behind this was to check if PCIe devices
were hotplugged while the laptop was asleep, and for
some reason they didn't use native PCIe hotplug to handle that.
However, looking at the laptop specs, you can't hotplug PCIe
devices via external ports. Given how old the laptop is,
it isn't going to be fixed, so we would need a workaround
or a DSDT fixup to skip the bus check.

The options I see are to keep the old kernel for such a case,
or to bail out early from bus check/enable_slot since the root port
is managed by the pciehp module (and let it handle hotplug).

> Thanks, Woody
>
>

* Re: [PATCH] hack to debug acpiphp crash
From: Igor Mammedov @ 2023-07-25 8:42 UTC
To: Woody Suwalski; +Cc: linux-kernel, bhelgaas, linux-pci, mst

On Tue, 25 Jul 2023 10:06:44 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> PS:
> What happens is that on resume the firmware (likely the EC)
> issues an ACPI bus check on root ports, which (bus check) is
> wired to the acpiphp module (though the pciehp module was initialized
> at boot to manage root ports); it's likely a firmware bug.
>
> I'd guess the intent behind this was to check if PCIe devices
> were hotplugged while the laptop was asleep, and for
> some reason they didn't use native PCIe hotplug to handle that.
> However, looking at the laptop specs, you can't hotplug PCIe
> devices via external ports. Given how old the laptop is,
> it isn't going to be fixed, so we would need a workaround
> or a DSDT fixup to skip the bus check.
>
> The options I see are to keep the old kernel for such a case,
> or to bail out early from bus check/enable_slot since the root port
> is managed by the pciehp module (and let it handle hotplug).

Scratch all of the above (it's wrong). Looking at the DSDT, the
firmware sends a Notify(rpxx, 2 /* Wake */) event, which, according
to the spec, needs to be handed down to the native device driver.

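For reference, the `2` in Notify(rpxx, 2) is one of the standard ACPI device notification values: 0x00 Bus Check, 0x01 Device Check, 0x02 Device Wake, 0x03 Eject Request. The sketch below classifies those values along the lines of Igor's corrected reading: check and eject events are hotplug business (acpiphp), while a Wake event belongs to the device's native driver and power-management code. The enum names and route_notify() are hypothetical; this only illustrates the distinction and is not Linux's actual notify dispatch.

```c
#include <assert.h>
#include <stdint.h>

/* Standard ACPI device notification values; the names are local to this sketch. */
enum acpi_notify_value {
	NOTIFY_BUS_CHECK     = 0x00,
	NOTIFY_DEVICE_CHECK  = 0x01,
	NOTIFY_DEVICE_WAKE   = 0x02,
	NOTIFY_EJECT_REQUEST = 0x03,
};

/* Hypothetical routing classes for a Notify() on a PCIe root port. */
enum notify_route {
	ROUTE_HOTPLUG,   /* re-enumerate or eject: hotplug driver (e.g. acpiphp) */
	ROUTE_WAKE,      /* wake event: native device driver / PM code */
	ROUTE_OTHER,     /* anything else: not handled in this sketch */
};

static enum notify_route route_notify(uint32_t value)
{
	switch (value) {
	case NOTIFY_BUS_CHECK:
	case NOTIFY_DEVICE_CHECK:
	case NOTIFY_EJECT_REQUEST:
		return ROUTE_HOTPLUG;
	case NOTIFY_DEVICE_WAKE:
		return ROUTE_WAKE;
	default:
		return ROUTE_OTHER;
	}
}
```

Under this reading, the firmware's Notify(rpxx, 2) lands in ROUTE_WAKE, so it should not trigger an acpiphp rescan of the root port at all.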
* Re: [PATCH] hack to debug acpiphp crash
From: Woody Suwalski @ 2023-07-25 11:45 UTC
To: Igor Mammedov; +Cc: linux-kernel, bhelgaas, linux-pci, mst

Igor Mammedov wrote:
> On Tue, 25 Jul 2023 10:06:44 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
>
>> PS:
>> What happens is that on resume the firmware (likely the EC)
>> issues an ACPI bus check on root ports, which (bus check) is
>> wired to the acpiphp module (though the pciehp module was initialized
>> at boot to manage root ports); it's likely a firmware bug.
>>
>> I'd guess the intent behind this was to check if PCIe devices
>> were hotplugged while the laptop was asleep, and for
>> some reason they didn't use native PCIe hotplug to handle that.
>> However, looking at the laptop specs, you can't hotplug PCIe
>> devices via external ports. Given how old the laptop is,
>> it isn't going to be fixed, so we would need a workaround
>> or a DSDT fixup to skip the bus check.
>>
>> The options I see are to keep the old kernel for such a case,
>> or to bail out early from bus check/enable_slot since the root port
>> is managed by the pciehp module (and let it handle hotplug).
> Scratch all of the above (it's wrong). Looking at the DSDT, the
> firmware sends a Notify(rpxx, 2 /* Wake */) event, which, according
> to the spec, needs to be handed down to the native device driver.
>
>
I agree that this laptop is a tricky one. I had to adjust the NOHZ
setting in my kernel config just to make it suspend to RAM; otherwise it
woke back up right after going to sleep (and the same NOHZ kernel worked
on all my other machines)...

* Re: [PATCH] hack to debug acpiphp crash
From: Igor Mammedov @ 2023-07-25 11:58 UTC
To: Woody Suwalski; +Cc: linux-kernel, bhelgaas, linux-pci, mst

On Tue, 25 Jul 2023 07:45:08 -0400
Woody Suwalski <terraluna977@gmail.com> wrote:

> Igor Mammedov wrote:
> > On Tue, 25 Jul 2023 10:06:44 +0200
> > Igor Mammedov <imammedo@redhat.com> wrote:
> >
> >> PS:
> >> What happens is that on resume the firmware (likely the EC)
> >> issues an ACPI bus check on root ports, which (bus check) is
> >> wired to the acpiphp module (though the pciehp module was initialized
> >> at boot to manage root ports); it's likely a firmware bug.
> >>
> >> I'd guess the intent behind this was to check if PCIe devices
> >> were hotplugged while the laptop was asleep, and for
> >> some reason they didn't use native PCIe hotplug to handle that.
> >> However, looking at the laptop specs, you can't hotplug PCIe
> >> devices via external ports. Given how old the laptop is,
> >> it isn't going to be fixed, so we would need a workaround
> >> or a DSDT fixup to skip the bus check.
> >>
> >> The options I see are to keep the old kernel for such a case,
> >> or to bail out early from bus check/enable_slot since the root port
> >> is managed by the pciehp module (and let it handle hotplug).
> > Scratch all of the above (it's wrong). Looking at the DSDT, the
> > firmware sends a Notify(rpxx, 2 /* Wake */) event, which, according
> > to the spec, needs to be handed down to the native device driver.
> >
> >
> I agree that this laptop is a tricky one. I had to adjust the NOHZ
> setting in my kernel config just to make it suspend to RAM; otherwise it
> woke back up right after going to sleep (and the same NOHZ kernel worked
> on all my other machines)...

After some more reading, blaming the laptop is likely a red herring in
this case. Anyway, I've just sent a new round of patches to test.

Thread overview: 11+ messages
[not found] <11fc981c-af49-ce64-6b43-3e282728bd1a@gmail.com>
2023-07-20 20:21 ` Kernel 6.5-rc2: system crash on suspend bisected Bjorn Helgaas
2023-07-24 9:27 ` Igor Mammedov
2023-07-27 6:09 ` Michael S. Tsirkin
2023-07-27 12:07 ` Woody Suwalski
2023-07-23 9:24 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-07-24 13:59 ` [PATCH] hack to debug acpiphp crash Igor Mammedov
2023-07-25 1:52 ` Woody Suwalski
2023-07-25 8:06 ` Igor Mammedov
2023-07-25 8:42 ` Igor Mammedov
2023-07-25 11:45 ` Woody Suwalski
2023-07-25 11:58 ` Igor Mammedov