linux-pci.vger.kernel.org archive mirror
* [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume
@ 2023-07-25 11:39 Igor Mammedov
  2023-07-25 11:39 ` [RFC 1/3] acpiphp: extra debug hack Igor Mammedov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Igor Mammedov @ 2023-07-25 11:39 UTC
  To: linux-kernel; +Cc: terraluna977, bhelgaas, linux-pci, imammedo, mst


Changelog:
  * split out the debug patch into a separate one, with extra printk()s added
  * fixed inverted bus->self check (probably the reason it didn't work before)


1/3 debug patch
2/3 offending patch
3/3 potential fix
  
I added more files to trace; add the following to the kernel command line:
   dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p; file drivers/acpi/bus.c +p" ignore_loglevel

The series should be applied on top of:
   e8afd0d9fccc PCI: pciehp: Cancel bringup sequence if card is not present

Apply the patches one by one, running the test case and capturing dmesg after
each patch. One should end up with 3 dmesg logs to analyse:
 1st - old behaviour - no crash
 2nd - crash
 3rd - no crash, hopefully

Igor Mammedov (3):
  acpiphp: extra debug hack
  PCI: acpiphp: Reassign resources on bridge if necessary
  acpipcihp: use __pci_bus_assign_resources() if bus doesn't have bridge

 drivers/pci/hotplug/acpiphp_glue.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

-- 
2.39.3



* [RFC 1/3] acpiphp: extra debug hack
  2023-07-25 11:39 [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Igor Mammedov
@ 2023-07-25 11:39 ` Igor Mammedov
  2023-07-25 15:12   ` [RFC v2 " Igor Mammedov
  2023-07-25 11:39 ` [RFC 2/3] PCI: acpiphp: Reassign resources on bridge if necessary Igor Mammedov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Igor Mammedov @ 2023-07-25 11:39 UTC
  To: linux-kernel; +Cc: terraluna977, bhelgaas, linux-pci, imammedo, mst

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 drivers/pci/hotplug/acpiphp_glue.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 5b1f271c6034..af1c73f2bee6 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -485,6 +485,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 	struct pci_bus *bus = slot->bus;
 	struct acpiphp_func *func;
 
+pci_info(bus, "enable_slot bus: bridge: %d, bus->self: %p\n", bridge, bus->self);
 	if (bridge && bus->self && hotplug_is_native(bus->self)) {
 		/*
 		 * If native hotplug is used, it will take care of hotplug
@@ -544,6 +545,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 		}
 		pci_dev_put(dev);
 	}
+pr_err("enable_slot: end\n");
 }
 
 /**
@@ -702,16 +704,20 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
 	if (bridge->is_going_away)
 		return;
 
-	if (bridge->pci_dev)
+	if (bridge->pci_dev) {
 		pm_runtime_get_sync(&bridge->pci_dev->dev);
+pci_info(bridge->pci_dev, "acpiphp_check_bridge\n");
+        }
 
 	list_for_each_entry(slot, &bridge->slots, node) {
 		struct pci_bus *bus = slot->bus;
 		struct pci_dev *dev, *tmp;
 
 		if (slot_no_hotplug(slot)) {
+pr_err("acpiphp_check_bridge: slot_no_hotplug\n");
 			; /* do nothing */
 		} else if (device_status_valid(get_slot_status(slot))) {
+pr_err("acpiphp_check_bridge: device_status_valid\n");
 			/* remove stale devices if any */
 			list_for_each_entry_safe_reverse(dev, tmp,
 							 &bus->devices, bus_list)
@@ -792,6 +798,7 @@ static void hotplug_event(u32 type, struct acpiphp_context *context)
 	if (bridge)
 		get_bridge(bridge);
 
+        acpi_handle_debug(handle, "hotplug_event: Slot: %s\n", slot_name(slot->slot)); 
 	acpi_unlock_hp_context();
 
 	pci_lock_rescan_remove();
@@ -799,7 +806,7 @@ static void hotplug_event(u32 type, struct acpiphp_context *context)
 	switch (type) {
 	case ACPI_NOTIFY_BUS_CHECK:
 		/* bus re-enumerate */
-		acpi_handle_debug(handle, "Bus check in %s()\n", __func__);
+		acpi_handle_debug(handle, "Bus check in %s(): bridge: %p\n", __func__, bridge);
 		if (bridge)
 			acpiphp_check_bridge(bridge);
 		else if (!(slot->flags & SLOT_IS_GOING_AWAY))
-- 
2.39.3



* [RFC 2/3] PCI: acpiphp: Reassign resources on bridge if necessary
  2023-07-25 11:39 [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Igor Mammedov
  2023-07-25 11:39 ` [RFC 1/3] acpiphp: extra debug hack Igor Mammedov
@ 2023-07-25 11:39 ` Igor Mammedov
  2023-07-25 11:39 ` [RFC 3/3] acpipcihp: use __pci_bus_assign_resources() if bus doesn't have bridge Igor Mammedov
  2023-07-25 13:51 ` [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Woody Suwalski
  3 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2023-07-25 11:39 UTC
  To: linux-kernel
  Cc: terraluna977, bhelgaas, linux-pci, imammedo, mst,
	Rafael J . Wysocki, stable

When using ACPI PCI hotplug, hotplugging a device with large BARs may fail
if bridge windows programmed by firmware are not large enough.

Reproducer:
  $ qemu-kvm -monitor stdio -M q35  -m 4G \
      -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=on \
      -device id=rp1,pcie-root-port,bus=pcie.0,chassis=4 \
      disk_image

 Wait until the Linux guest boots, then hotplug the device:
   (qemu) device_add qxl,bus=rp1

 Hotplug on the guest side fails with:
   pci 0000:01:00.0: [1b36:0100] type 00 class 0x038000
   pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x03ffffff]
   pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x03ffffff]
   pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00001fff]
   pci 0000:01:00.0: reg 0x1c: [io  0x0000-0x001f]
   pci 0000:01:00.0: BAR 0: no space for [mem size 0x04000000]
   pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x04000000]
   pci 0000:01:00.0: BAR 1: no space for [mem size 0x04000000]
   pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x04000000]
   pci 0000:01:00.0: BAR 2: assigned [mem 0xfe800000-0xfe801fff]
   pci 0000:01:00.0: BAR 3: assigned [io  0x1000-0x101f]
   qxl 0000:01:00.0: enabling device (0000 -> 0003)
   Unable to create vram_mapping
   qxl: probe of 0000:01:00.0 failed with error -12

However, when using native PCIe hotplug
  '-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off'
it works fine, since the kernel attempts to reassign unused resources.

Use the same machinery as native PCIe hotplug to (re)assign resources.

Link: https://lore.kernel.org/r/20230424191557.2464760-1-imammedo@redhat.com
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: stable@vger.kernel.org
---
 drivers/pci/hotplug/acpiphp_glue.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index af1c73f2bee6..c0ffb1389fda 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -499,7 +499,6 @@ pci_info(bus, "enable_slot bus: bridge: %d, bus->self: %p\n", bridge, bus->self)
 				acpiphp_native_scan_bridge(dev);
 		}
 	} else {
-		LIST_HEAD(add_list);
 		int max, pass;
 
 		acpiphp_rescan_slot(slot);
@@ -513,12 +512,10 @@ pci_info(bus, "enable_slot bus: bridge: %d, bus->self: %p\n", bridge, bus->self)
 				if (pass && dev->subordinate) {
 					check_hotplug_bridge(slot, dev);
 					pcibios_resource_survey_bus(dev->subordinate);
-					__pci_bus_size_bridges(dev->subordinate,
-							       &add_list);
 				}
 			}
 		}
-		__pci_bus_assign_resources(bus, &add_list, NULL);
+		pci_assign_unassigned_bridge_resources(bus->self);
 	}
 
 	acpiphp_sanitize_bus(bus);
-- 
2.39.3



* [RFC 3/3] acpipcihp: use __pci_bus_assign_resources() if bus doesn't have bridge
  2023-07-25 11:39 [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Igor Mammedov
  2023-07-25 11:39 ` [RFC 1/3] acpiphp: extra debug hack Igor Mammedov
  2023-07-25 11:39 ` [RFC 2/3] PCI: acpiphp: Reassign resources on bridge if necessary Igor Mammedov
@ 2023-07-25 11:39 ` Igor Mammedov
  2023-07-25 13:51 ` [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Woody Suwalski
  3 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2023-07-25 11:39 UTC
  To: linux-kernel; +Cc: terraluna977, bhelgaas, linux-pci, imammedo, mst

Commit [1] switched hotplug to pci_assign_unassigned_bridge_resources(),
which requires a bridge to be available. However, during an S3
suspend/resume cycle acpiphp might receive a device check event from
firmware, and if bus->self == NULL, the kernel crashes with a NULL
pointer dereference.
The issue was triggered on a Dell Inspiron 7352/0W6WV0 laptop with the
following sequence:
   1. suspend to RAM
   2. wake up
   3. suspend to RAM, which immediately wakes up, and the following
      backtrace is observed:

[  612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018
[...]
[  612.277735] Call Trace:
[  612.277739]  <TASK>
[  612.277741]  ? __die+0x1a/0x60
[  612.277749]  ? page_fault_oops+0x158/0x430
[  612.277755]  ? prb_read_valid+0x12/0x20
[  612.277759]  ? console_unlock+0x4d/0x100
[  612.277765]  ? __irq_work_queue_local+0x27/0x60
[  612.277771]  ? irq_work_queue+0x2b/0x50
[  612.277776]  ? exc_page_fault+0x357/0x600
[  612.277781]  ? dev_printk_emit+0x7e/0xa0
[  612.277786]  ? asm_exc_page_fault+0x22/0x30
[  612.277792]  ? __pfx_pci_conf1_read+0x10/0x10
[  612.277798]  ? pci_assign_unassigned_bridge_resources+0x1f/0x260
[  612.277804]  ? pcibios_allocate_dev_resources+0x3c/0x2a0
[  612.277809]  enable_slot+0x21f/0x3e0
[  612.277816]  acpiphp_hotplug_notify+0x13d/0x260
[  612.277822]  ? __pfx_acpiphp_hotplug_notify+0x10/0x10
[  612.277827]  acpi_device_hotplug+0xbc/0x540
[  612.277834]  acpi_hotplug_work_fn+0x15/0x20
[  612.277839]  process_one_work+0x1f7/0x370
[  612.277845]  worker_thread+0x45/0x3b0
[  612.277850]  ? __pfx_worker_thread+0x10/0x10
[  612.277854]  kthread+0xdc/0x110
[  612.277860]  ? __pfx_kthread+0x10/0x10
[  612.277866]  ret_from_fork+0x28/0x40
[  612.277871]  ? __pfx_kthread+0x10/0x10
[  612.277876]  ret_from_fork_asm+0x1b/0x30

Fix it by falling back to __pci_bus_assign_resources() instead of
pci_assign_unassigned_bridge_resources() when the bus doesn't have a bridge
assigned to it.

[1] 40613da52b13fb21 ("PCI: acpiphp: Reassign resources on bridge if necessary")

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v2: fix inverted bus->self condition
---
 drivers/pci/hotplug/acpiphp_glue.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index c0ffb1389fda..816555ab9171 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -499,6 +499,7 @@ pci_info(bus, "enable_slot bus: bridge: %d, bus->self: %p\n", bridge, bus->self)
 				acpiphp_native_scan_bridge(dev);
 		}
 	} else {
+		LIST_HEAD(add_list);
 		int max, pass;
 
 		acpiphp_rescan_slot(slot);
@@ -512,10 +513,18 @@ pci_info(bus, "enable_slot bus: bridge: %d, bus->self: %p\n", bridge, bus->self)
 				if (pass && dev->subordinate) {
 					check_hotplug_bridge(slot, dev);
 					pcibios_resource_survey_bus(dev->subordinate);
+					if (!bus->self)
+						__pci_bus_size_bridges(dev->subordinate, &add_list);
 				}
 			}
 		}
-		pci_assign_unassigned_bridge_resources(bus->self);
+		if (bus->self) {
+pci_info(bus->self, "enable_slot: pci_assign_unassigned_bridge_resources:\n");
+			pci_assign_unassigned_bridge_resources(bus->self);
+		} else {
+pci_info(bus, "enable_slot: __pci_bus_assign_resources:\n");
+			__pci_bus_assign_resources(bus, &add_list, NULL);
+                }
 	}
 
 	acpiphp_sanitize_bus(bus);
-- 
2.39.3



* Re: [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume
  2023-07-25 11:39 [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Igor Mammedov
                   ` (2 preceding siblings ...)
  2023-07-25 11:39 ` [RFC 3/3] acpipcihp: use __pci_bus_assign_resources() if bus doesn't have bridge Igor Mammedov
@ 2023-07-25 13:51 ` Woody Suwalski
  2023-07-25 15:19   ` Igor Mammedov
  2023-07-25 15:41   ` Woody Suwalski
  3 siblings, 2 replies; 10+ messages in thread
From: Woody Suwalski @ 2023-07-25 13:51 UTC
  To: Igor Mammedov, linux-kernel; +Cc: bhelgaas, linux-pci, mst

[-- Attachment #1: Type: text/plain, Size: 1703 bytes --]

Igor Mammedov wrote:
> [...]
> 1/3 debug patch
> 2/3 offending patch
> 3/3 potential fix
> [...]
>
Actually, applying patch 1 alone already causes the crash (why???), so I
have also added dmesg-6.5-0.txt, which shows a working condition based on
git e8afd0d9fccc (acpiphp_glue as in kernel 6.4).

Patch 3 did not fix the issue; it seems that the culprit is somewhere
else, triggered by the "benign" patch 1 :-(

Also, a note about the trigger description in patch 3: the dmesg trace on
the Inspiron laptop is collected after the first wake from suspend to RAM.
The subsequent attempt to sleep results in a frozen system.

Thanks, Woody


[-- Attachment #2: rfc.tar.xz --]
[-- Type: application/x-xz, Size: 35540 bytes --]


* [RFC v2 1/3] acpiphp: extra debug hack
  2023-07-25 11:39 ` [RFC 1/3] acpiphp: extra debug hack Igor Mammedov
@ 2023-07-25 15:12   ` Igor Mammedov
  0 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2023-07-25 15:12 UTC
  To: linux-kernel; +Cc: terraluna977, bhelgaas, linux-pci, imammedo, mst

v3:
drop the recently added debug line that is probably causing the crash

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 drivers/pci/hotplug/acpiphp_glue.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 5b1f271c6034..ea8ed608f2a7 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -485,6 +485,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 	struct pci_bus *bus = slot->bus;
 	struct acpiphp_func *func;
 
+pci_info(bus, "enable_slot bus: bridge: %d, bus->self: %p\n", bridge, bus->self);
 	if (bridge && bus->self && hotplug_is_native(bus->self)) {
 		/*
 		 * If native hotplug is used, it will take care of hotplug
@@ -544,6 +545,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
 		}
 		pci_dev_put(dev);
 	}
+pr_err("enable_slot: end\n");
 }
 
 /**
@@ -702,16 +704,20 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
 	if (bridge->is_going_away)
 		return;
 
-	if (bridge->pci_dev)
+	if (bridge->pci_dev) {
 		pm_runtime_get_sync(&bridge->pci_dev->dev);
+pci_info(bridge->pci_dev, "acpiphp_check_bridge\n");
+        }
 
 	list_for_each_entry(slot, &bridge->slots, node) {
 		struct pci_bus *bus = slot->bus;
 		struct pci_dev *dev, *tmp;
 
 		if (slot_no_hotplug(slot)) {
+pr_err("acpiphp_check_bridge: slot_no_hotplug\n");
 			; /* do nothing */
 		} else if (device_status_valid(get_slot_status(slot))) {
+pr_err("acpiphp_check_bridge: device_status_valid\n");
 			/* remove stale devices if any */
 			list_for_each_entry_safe_reverse(dev, tmp,
 							 &bus->devices, bus_list)
@@ -799,7 +805,7 @@ static void hotplug_event(u32 type, struct acpiphp_context *context)
 	switch (type) {
 	case ACPI_NOTIFY_BUS_CHECK:
 		/* bus re-enumerate */
-		acpi_handle_debug(handle, "Bus check in %s()\n", __func__);
+		acpi_handle_debug(handle, "Bus check in %s(): bridge: %p\n", __func__, bridge);
 		if (bridge)
 			acpiphp_check_bridge(bridge);
 		else if (!(slot->flags & SLOT_IS_GOING_AWAY))
-- 
2.39.3



* Re: [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume
  2023-07-25 13:51 ` [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Woody Suwalski
@ 2023-07-25 15:19   ` Igor Mammedov
  2023-07-25 15:59     ` Woody Suwalski
  2023-07-25 15:41   ` Woody Suwalski
  1 sibling, 1 reply; 10+ messages in thread
From: Igor Mammedov @ 2023-07-25 15:19 UTC
  To: Woody Suwalski; +Cc: linux-kernel, bhelgaas, linux-pci, mst

On Tue, 25 Jul 2023 09:51:53 -0400
Woody Suwalski <terraluna977@gmail.com> wrote:

> Igor Mammedov wrote:
> > [...]
> Actually, applying patch 1 alone already causes the crash (why???),
It's probably due to an extra debug line I added.
I dropped the suspicious one; can you try again and see if it works?

> so I
> have also added dmesg-6.5-0.txt, which shows a working condition based on
> git e8afd0d9fccc (acpiphp_glue as in kernel 6.4).
> 
> Patch 3 did not fix the issue; it seems that the culprit is somewhere
> else, triggered by the "benign" patch 1 :-(
> 
> Also, a note about the trigger description in patch 3: the dmesg trace on
> the Inspiron laptop is collected after the first wake from suspend to RAM.
> The subsequent attempt to sleep results in a frozen system.

Thanks for the clarification; I'll correct the commit message once the
culprit is found.

> 
> Thanks, Woody
> 



* Re: [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume
  2023-07-25 13:51 ` [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Woody Suwalski
  2023-07-25 15:19   ` Igor Mammedov
@ 2023-07-25 15:41   ` Woody Suwalski
  1 sibling, 0 replies; 10+ messages in thread
From: Woody Suwalski @ 2023-07-25 15:41 UTC
  To: Igor Mammedov, linux-kernel; +Cc: bhelgaas, linux-pci, mst, Woody Suwalski

Woody Suwalski wrote:
> Igor Mammedov wrote:
>> [...]
>>
> Actually, applying patch 1 alone already causes the crash (why???), so
> I have also added dmesg-6.5-0.txt, which shows a working condition
> based on git e8afd0d9fccc (acpiphp_glue as in kernel 6.4).
>
> Patch 3 did not fix the issue; it seems that the culprit is somewhere
> else, triggered by the "benign" patch 1 :-(
>
> Also, a note about the trigger description in patch 3: the dmesg trace on
> the Inspiron laptop is collected after the first wake from suspend to RAM.
> The subsequent attempt to sleep results in a frozen system.
>
> Thanks, Woody
>
I think there is a problem with your debug statement
acpi_handle_debug(...slot_name...) in patch 1: it is masking the "old" issue.
When I commented out that line in hotplug_event(), it worked OK (as
expected). I will redo the testing in ~2 hours...

Woody



* Re: [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume
  2023-07-25 15:19   ` Igor Mammedov
@ 2023-07-25 15:59     ` Woody Suwalski
  2023-07-26  8:07       ` Igor Mammedov
  0 siblings, 1 reply; 10+ messages in thread
From: Woody Suwalski @ 2023-07-25 15:59 UTC
  To: Igor Mammedov; +Cc: linux-kernel, bhelgaas, linux-pci, mst, Woody Suwalski

[-- Attachment #1: Type: text/plain, Size: 2437 bytes --]

Igor Mammedov wrote:
> On Tue, 25 Jul 2023 09:51:53 -0400
> Woody Suwalski <terraluna977@gmail.com> wrote:
>
>> Igor Mammedov wrote:
>>> [...]
>>>
>> Actually, applying patch 1 alone already causes the crash (why???),
> It's probably due to an extra debug line I added.
> I dropped the suspicious one; can you try again and see if it works?
>
>> so I
>> have also added dmesg-6.5-0.txt, which shows a working condition based on
>> git e8afd0d9fccc (acpiphp_glue as in kernel 6.4).
>>
>> Patch 3 did not fix the issue; it seems that the culprit is somewhere
>> else, triggered by the "benign" patch 1 :-(
>>
>> Also, a note about the trigger description in patch 3: the dmesg trace on
>> the Inspiron laptop is collected after the first wake from suspend to RAM.
>> The subsequent attempt to sleep results in a frozen system.
> Thanks for the clarification; I'll correct the commit message once the
> culprit is found.
>
Good news. After removing the botched debug statement, which was masking
the original issue, testing went as you predicted, and with patch 3 the
system suspends to RAM OK.

Here are the requested 3 dmesg outputs, #2 is for the bad run.

I can retest with a final version of the patch once you have it ready...

Thanks, Woody


[-- Attachment #2: rfc1.tar.xz --]
[-- Type: application/x-xz, Size: 32280 bytes --]


* Re: [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume
  2023-07-25 15:59     ` Woody Suwalski
@ 2023-07-26  8:07       ` Igor Mammedov
  0 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2023-07-26  8:07 UTC
  To: Woody Suwalski; +Cc: linux-kernel, bhelgaas, linux-pci, mst

On Tue, 25 Jul 2023 11:59:56 -0400
Woody Suwalski <terraluna977@gmail.com> wrote:

> Igor Mammedov wrote:
> > On Tue, 25 Jul 2023 09:51:53 -0400
> > Woody Suwalski <terraluna977@gmail.com> wrote:
> >  
> >> Igor Mammedov wrote:  
> >>> [...]
> >>>
> >> Actually, applying patch 1 alone already causes the crash (why???),
> > It's probably due to an extra debug line I added.
> > I dropped the suspicious one; can you try again and see if it works?
> >
> >> so I
> >> have also added dmesg-6.5-0.txt, which shows a working condition based on
> >> git e8afd0d9fccc (acpiphp_glue as in kernel 6.4).
> >>
> >> Patch 3 did not fix the issue; it seems that the culprit is somewhere
> >> else, triggered by the "benign" patch 1 :-(
> >>
> >> Also, a note about the trigger description in patch 3: the dmesg trace on
> >> the Inspiron laptop is collected after the first wake from suspend to RAM.
> >> The subsequent attempt to sleep results in a frozen system.
> > Thanks for the clarification; I'll correct the commit message once the
> > culprit is found.
> >  
> Good news. After removing the botched debug statement, which was masking
> the original issue, testing went as you predicted, and with patch 3 the
> system suspends to RAM OK.
Thanks for the confirmation,
I'll post a cleaned-up 3/3 patch today.

> 
> Here are the requested 3 dmesg outputs, #2 is for the bad run.
> 
> I can retest with a final version of the patch once you have it ready...
> 
> Thanks, Woody
> 



end of thread

Thread overview: 10+ messages
2023-07-25 11:39 [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Igor Mammedov
2023-07-25 11:39 ` [RFC 1/3] acpiphp: extra debug hack Igor Mammedov
2023-07-25 15:12   ` [RFC v2 " Igor Mammedov
2023-07-25 11:39 ` [RFC 2/3] PCI: acpiphp: Reassign resources on bridge if necessary Igor Mammedov
2023-07-25 11:39 ` [RFC 3/3] acpipcihp: use __pci_bus_assign_resources() if bus doesn't have bridge Igor Mammedov
2023-07-25 13:51 ` [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume Woody Suwalski
2023-07-25 15:19   ` Igor Mammedov
2023-07-25 15:59     ` Woody Suwalski
2023-07-26  8:07       ` Igor Mammedov
2023-07-25 15:41   ` Woody Suwalski
