linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] PCI: Resource fitting algorith fixes
@ 2025-08-22 12:33 Ilpo Järvinen
  2025-08-22 12:33 ` [PATCH v3 1/3] PCI: Relaxed tail alignment should never increase min_align Ilpo Järvinen
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Ilpo Järvinen @ 2025-08-22 12:33 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, D Scott Phillips, Rio Liu,
	Tudor Ambarus, Markus Elfring
  Cc: linux-kernel, Christian König, Ilpo Järvinen

v3 corrects the max() parameters as noticed by Markus.

It would actually be nice if Rio could retest the first patch just
in case. While I expect his problem originated from the size0 block,
there's a small possibility size1 block could have played role in
which case the changed max() could impact the end result.

Once Rio has tested the first patch, these should replace the v2
patches in the pci/resource branch.

v3:
 - Correct max() parameter.
 - Added Rio's full name into tags.
 - Included Closes: tags from the applied v2 changes.

v2:
 - Add fix to resize problem (new patch).

Ilpo Järvinen (3):
  PCI: Relaxed tail alignment should never increase min_align
  PCI: Fix pdev_resources_assignable() disparity
  PCI: Fix failure detection during resource resize

 drivers/pci/setup-bus.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)


base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
-- 
2.39.5


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/3] PCI: Relaxed tail alignment should never increase min_align
  2025-08-22 12:33 [PATCH v3 0/3] PCI: Resource fitting algorith fixes Ilpo Järvinen
@ 2025-08-22 12:33 ` Ilpo Järvinen
  2025-08-22 12:33 ` [PATCH v3 2/3] PCI: Fix pdev_resources_assignable() disparity Ilpo Järvinen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Ilpo Järvinen @ 2025-08-22 12:33 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, D Scott Phillips, Rio Liu,
	Tudor Ambarus, Markus Elfring, Ilpo Järvinen, linux-kernel
  Cc: Christian König, stable

When using relaxed tail alignment for the bridge window,
pbus_size_mem() also tries to minimize min_align, which can under
certain scenarios end up increasing min_align from that found by
calculate_mem_align().

Ensure min_align is not increased by the relaxed tail alignment.

Eventually, it would be better to add calculate_relaxed_head_align()
similar to calculate_mem_align() which finds out what alignment can be
used for the head without introducing any gaps into the bridge window
to give flexibility on head address too. But that looks relatively
complex algorithm so it requires much more testing than fixing the
immediate problem causing a regression.

Fixes: 67f9085596ee ("PCI: Allow relaxed bridge window tail sizing for optional resources")
Reported-by: Rio Liu <rio@r26.me>
Closes: https://lore.kernel.org/all/o2bL8MtD_40-lf8GlslTw-AZpUPzm8nmfCnJKvS8RQ3NOzOW1uq1dVCEfRpUjJ2i7G2WjfQhk2IWZ7oGp-7G-jXN4qOdtnyOcjRR0PZWK5I=@r26.me/
Tested-by: Rio Liu <rio@r26.me>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org>
---
 drivers/pci/setup-bus.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7853ac6999e2..527f0479e983 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1169,6 +1169,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
 	resource_size_t add_align = 0;
+	resource_size_t relaxed_align;
 
 	if (!b_res)
 		return -ENOSPC;
@@ -1246,8 +1247,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	if (bus->self && size0 &&
 	    !pbus_upstream_space_available(bus, mask | IORESOURCE_PREFETCH, type,
 					   size0, min_align)) {
-		min_align = 1ULL << (max_order + __ffs(SZ_1M));
-		min_align = max(min_align, win_align);
+		relaxed_align = 1ULL << (max_order + __ffs(SZ_1M));
+		relaxed_align = max(relaxed_align, win_align);
+		min_align = min(min_align, relaxed_align);
 		size0 = calculate_memsize(size, min_size, 0, 0, resource_size(b_res), win_align);
 		pci_info(bus->self, "bridge window %pR to %pR requires relaxed alignment rules\n",
 			 b_res, &bus->busn_res);
@@ -1261,8 +1263,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 		if (bus->self && size1 &&
 		    !pbus_upstream_space_available(bus, mask | IORESOURCE_PREFETCH, type,
 						   size1, add_align)) {
-			min_align = 1ULL << (max_order + __ffs(SZ_1M));
-			min_align = max(min_align, win_align);
+			relaxed_align = 1ULL << (max_order + __ffs(SZ_1M));
+			relaxed_align = max(relaxed_align, win_align);
+			min_align = min(min_align, relaxed_align);
 			size1 = calculate_memsize(size, min_size, add_size, children_add_size,
 						  resource_size(b_res), win_align);
 			pci_info(bus->self,
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 2/3] PCI: Fix pdev_resources_assignable() disparity
  2025-08-22 12:33 [PATCH v3 0/3] PCI: Resource fitting algorith fixes Ilpo Järvinen
  2025-08-22 12:33 ` [PATCH v3 1/3] PCI: Relaxed tail alignment should never increase min_align Ilpo Järvinen
@ 2025-08-22 12:33 ` Ilpo Järvinen
  2025-08-22 12:33 ` [PATCH v3 3/3] PCI: Fix failure detection during resource resize Ilpo Järvinen
  2025-08-26 15:23 ` [PATCH v3 0/3] PCI: Resource fitting algorith fixes Rio Liu
  3 siblings, 0 replies; 8+ messages in thread
From: Ilpo Järvinen @ 2025-08-22 12:33 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, D Scott Phillips, Rio Liu,
	Tudor Ambarus, Markus Elfring, Ilpo Järvinen, linux-kernel
  Cc: Christian König, stable

pdev_sort_resources() uses pdev_resources_assignable() helper to decide
if device's resources cannot be assigned. pbus_size_mem(), on the other
hand, does not do the same check. This could lead into a situation
where a resource ends up on realloc_head list but is not on the head
list, which is turn prevents emptying the resource from the
realloc_head list in __assign_resources_sorted().

A non-empty realloc_head is unacceptable because it triggers an
internal sanity check as show in this log with a device that has class
0 (PCI_CLASS_NOT_DEFINED):

pci 0001:01:00.0: [144d:a5a5] type 00 class 0x000000 PCIe Endpoint
pci 0001:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
pci 0001:01:00.0: ROM [mem 0x00000000-0x0000ffff pref]
pci 0001:01:00.0: enabling Extended Tags
pci 0001:01:00.0: PME# supported from D0 D3hot D3cold
pci 0001:01:00.0: 15.752 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x2 link at 0001:00:00.0 (capable of 31.506 Gb/s with 16.0 GT/s PCIe x2 link)
pcieport 0001:00:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
pcieport 0001:00:00.0: bridge window [mem 0x40000000-0x401fffff]: assigned
------------[ cut here ]------------
kernel BUG at drivers/pci/setup-bus.c:2532!
Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
...
Call trace:
 pci_assign_unassigned_bus_resources+0x110/0x114 (P)
 pci_rescan_bus+0x28/0x48

Use pdev_resources_assignable() also within pbus_size_mem() to skip
processing of non-assignable resources which removes the disparity in
between what resources pdev_sort_resources() and pbus_size_mem()
consider. As non-assignable resources are no longer processed, they are
not added to the realloc_head list, thus the sanity check no longer
triggers.

This disparity problem is very old but only now became apparent after
the commit 2499f5348431 ("PCI: Rework optional resource handling") that
made the ROM resources optional when calculating bridge window sizes
which required adding the resource to the realloc_head list.
Previously, bridge windows were just sized larger than necessary.

Fixes: 2499f5348431 ("PCI: Rework optional resource handling")
Reported-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Closes: https://lore.kernel.org/all/5f103643-5e1c-43c6-b8fe-9617d3b5447c@linaro.org/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org>
---
 drivers/pci/setup-bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 527f0479e983..df5aec46c29d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1191,6 +1191,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			resource_size_t r_size;
 
 			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+			    !pdev_resources_assignable(dev) ||
 			    ((r->flags & mask) != type &&
 			     (r->flags & mask) != type2 &&
 			     (r->flags & mask) != type3))
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 3/3] PCI: Fix failure detection during resource resize
  2025-08-22 12:33 [PATCH v3 0/3] PCI: Resource fitting algorith fixes Ilpo Järvinen
  2025-08-22 12:33 ` [PATCH v3 1/3] PCI: Relaxed tail alignment should never increase min_align Ilpo Järvinen
  2025-08-22 12:33 ` [PATCH v3 2/3] PCI: Fix pdev_resources_assignable() disparity Ilpo Järvinen
@ 2025-08-22 12:33 ` Ilpo Järvinen
  2025-08-25 22:21   ` Bjorn Helgaas
  2025-08-26 15:23 ` [PATCH v3 0/3] PCI: Resource fitting algorith fixes Rio Liu
  3 siblings, 1 reply; 8+ messages in thread
From: Ilpo Järvinen @ 2025-08-22 12:33 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, D Scott Phillips, Rio Liu,
	Tudor Ambarus, Markus Elfring, Ilpo Järvinen, linux-kernel
  Cc: Christian König, stable

Since the commit 96336ec70264 ("PCI: Perform reset_resource() and build
fail list in sync") the failed list is always built and returned to let
the caller decide what to do with the failures. The caller may want to
retry resource fitting and assignment and before that can happen, the
resources should be restored to their original state (a reset
effectively clears the struct resource), which requires returning them
on the failed list so that the original state remains stored in the
associated struct pci_dev_resource.

Resource resizing is different from the ordinary resource fitting and
assignment in that it only considers part of the resources. This means
failures for other resource types are not relevant at all and should be
ignored. As resize doesn't unassign such unrelated resources, those
resource ending up into the failed list implies assignment of that
resource must have failed before resize too. The check in
pci_reassign_bridge_resources() to decide if the whole assignment is
successful, however, is based on list emptiness which will cause false
negatives when the failed list has resources with an unrelated type.

If the failed list is not empty, call pci_required_resource_failed()
and extend it to be able to filter on specific resource types too (if
provided).

Calling pci_required_resource_failed() at this point is slightly
problematic because the resource itself is reset when the failed list
is constructed in __assign_resources_sorted(). As a result,
pci_resource_is_optional() does not have access to the original
resource flags. This could be worked around by restoring and
re-reseting the resource around the call to pci_resource_is_optional(),
however, it shouldn't cause issue as resource resizing is meant for
64-bit prefetchable resources according to Christian König (see the
Link which unfortunately doesn't point directly to Christian's reply
because lore didn't store that email at all).

Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/
Tested-by: D Scott Phillips <scott@os.amperecomputing.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: <stable@vger.kernel.org>
---
 drivers/pci/setup-bus.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index df5aec46c29d..def29506700e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -28,6 +28,10 @@
 #include <linux/acpi.h>
 #include "pci.h"
 
+#define PCI_RES_TYPE_MASK \
+	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
+	 IORESOURCE_MEM_64)
+
 unsigned int pci_flags;
 EXPORT_SYMBOL_GPL(pci_flags);
 
@@ -384,13 +388,19 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
 }
 
 /* Return: @true if assignment of a required resource failed. */
-static bool pci_required_resource_failed(struct list_head *fail_head)
+static bool pci_required_resource_failed(struct list_head *fail_head,
+					 unsigned long type)
 {
 	struct pci_dev_resource *fail_res;
 
+	type &= PCI_RES_TYPE_MASK;
+
 	list_for_each_entry(fail_res, fail_head, list) {
 		int idx = pci_resource_num(fail_res->dev, fail_res->res);
 
+		if (type && (fail_res->flags & PCI_RES_TYPE_MASK) != type)
+			continue;
+
 		if (!pci_resource_is_optional(fail_res->dev, idx))
 			return true;
 	}
@@ -504,7 +514,7 @@ static void __assign_resources_sorted(struct list_head *head,
 	}
 
 	/* Without realloc_head and only optional fails, nothing more to do. */
-	if (!pci_required_resource_failed(&local_fail_head) &&
+	if (!pci_required_resource_failed(&local_fail_head, 0) &&
 	    list_empty(realloc_head)) {
 		list_for_each_entry(save_res, &save_head, list) {
 			struct resource *res = save_res->res;
@@ -1708,10 +1718,6 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 	}
 }
 
-#define PCI_RES_TYPE_MASK \
-	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
-	 IORESOURCE_MEM_64)
-
 static void pci_bridge_release_resources(struct pci_bus *bus,
 					 unsigned long type)
 {
@@ -2450,8 +2456,12 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
 		free_list(&added);
 
 	if (!list_empty(&failed)) {
-		ret = -ENOSPC;
-		goto cleanup;
+		if (pci_required_resource_failed(&failed, type)) {
+			ret = -ENOSPC;
+			goto cleanup;
+		}
+		/* Only resources with unrelated types failed (again) */
+		free_list(&failed);
 	}
 
 	list_for_each_entry(dev_res, &saved, list) {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 3/3] PCI: Fix failure detection during resource resize
  2025-08-22 12:33 ` [PATCH v3 3/3] PCI: Fix failure detection during resource resize Ilpo Järvinen
@ 2025-08-25 22:21   ` Bjorn Helgaas
  2025-08-26 12:51     ` Ilpo Järvinen
  0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2025-08-25 22:21 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, linux-pci, D Scott Phillips, Rio Liu,
	Tudor Ambarus, Markus Elfring, linux-kernel, Christian König,
	stable

On Fri, Aug 22, 2025 at 03:33:59PM +0300, Ilpo Järvinen wrote:
> Since the commit 96336ec70264 ("PCI: Perform reset_resource() and build
> fail list in sync") the failed list is always built and returned to let
> the caller decide what to do with the failures. The caller may want to
> retry resource fitting and assignment and before that can happen, the
> resources should be restored to their original state (a reset
> effectively clears the struct resource), which requires returning them
> on the failed list so that the original state remains stored in the
> associated struct pci_dev_resource.
> 
> Resource resizing is different from the ordinary resource fitting and
> assignment in that it only considers part of the resources. This means
> failures for other resource types are not relevant at all and should be
> ignored. As resize doesn't unassign such unrelated resources, those
> resource ending up into the failed list implies assignment of that
> resource must have failed before resize too. The check in
> pci_reassign_bridge_resources() to decide if the whole assignment is
> successful, however, is based on list emptiness which will cause false
> negatives when the failed list has resources with an unrelated type.
> 
> If the failed list is not empty, call pci_required_resource_failed()
> and extend it to be able to filter on specific resource types too (if
> provided).
> 
> Calling pci_required_resource_failed() at this point is slightly
> problematic because the resource itself is reset when the failed list
> is constructed in __assign_resources_sorted(). As a result,
> pci_resource_is_optional() does not have access to the original
> resource flags. This could be worked around by restoring and
> re-reseting the resource around the call to pci_resource_is_optional(),
> however, it shouldn't cause issue as resource resizing is meant for
> 64-bit prefetchable resources according to Christian König (see the
> Link which unfortunately doesn't point directly to Christian's reply
> because lore didn't store that email at all).
> 
> Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
> Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
> Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
> Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/

I'm trying to connect this fix with the Asynchronous SError Interrupt
crash that Scott reported here.  From the call trace:

  ...
  arm64_serror_panic+0x6c/0x90
  do_serror+0x58/0x60
  el1h_64_error_handler+0x38/0x60
  el1h_64_error+0x84/0x88
  _raw_spin_lock_irqsave+0x34/0xb0 (P)
  amdgpu_ih_process+0xf0/0x150 [amdgpu]
  amdgpu_irq_handler+0x34/0xa0 [amdgpu]
  __handle_irq_event_percpu+0x60/0x248
  handle_irq_event+0x4c/0xc0
  handle_fasteoi_irq+0xa0/0x1c8
  handle_irq_desc+0x3c/0x68
  generic_handle_domain_irq+0x24/0x40
  __gic_handle_irq_from_irqson.isra.0+0x15c/0x260
  gic_handle_irq+0x28/0x80
  call_on_irq_stack+0x24/0x30
  do_interrupt_handler+0x88/0xa0
  el1_interrupt+0x44/0xd0
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x84/0x88
  amdgpu_device_rreg.part.0+0x4c/0x190 [amdgpu] (P)
  amdgpu_device_rreg+0x24/0x40 [amdgpu]

I guess something happened in amdgpu_device_rreg() that caused an
interrupt, maybe a bogus virtual address for a register?

And then amdgpu_ih_process() did something that caused the SError?  Or
since it seems to be asynchronous, maybe the amdgpu_ih_process()
connection is coincidental and the SError was a consequence of
something else?

I'd like to have a bread crumb here in the commit log that connects
this fix with something a user might see, but I don't know what that
would look like.

> Tested-by: D Scott Phillips <scott@os.amperecomputing.com>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Reviewed-by: D Scott Phillips <scott@os.amperecomputing.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/pci/setup-bus.c | 26 ++++++++++++++++++--------
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index df5aec46c29d..def29506700e 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -28,6 +28,10 @@
>  #include <linux/acpi.h>
>  #include "pci.h"
>  
> +#define PCI_RES_TYPE_MASK \
> +	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
> +	 IORESOURCE_MEM_64)
> +
>  unsigned int pci_flags;
>  EXPORT_SYMBOL_GPL(pci_flags);
>  
> @@ -384,13 +388,19 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
>  }
>  
>  /* Return: @true if assignment of a required resource failed. */
> -static bool pci_required_resource_failed(struct list_head *fail_head)
> +static bool pci_required_resource_failed(struct list_head *fail_head,
> +					 unsigned long type)
>  {
>  	struct pci_dev_resource *fail_res;
>  
> +	type &= PCI_RES_TYPE_MASK;
> +
>  	list_for_each_entry(fail_res, fail_head, list) {
>  		int idx = pci_resource_num(fail_res->dev, fail_res->res);
>  
> +		if (type && (fail_res->flags & PCI_RES_TYPE_MASK) != type)
> +			continue;
> +
>  		if (!pci_resource_is_optional(fail_res->dev, idx))
>  			return true;
>  	}
> @@ -504,7 +514,7 @@ static void __assign_resources_sorted(struct list_head *head,
>  	}
>  
>  	/* Without realloc_head and only optional fails, nothing more to do. */
> -	if (!pci_required_resource_failed(&local_fail_head) &&
> +	if (!pci_required_resource_failed(&local_fail_head, 0) &&
>  	    list_empty(realloc_head)) {
>  		list_for_each_entry(save_res, &save_head, list) {
>  			struct resource *res = save_res->res;
> @@ -1708,10 +1718,6 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
>  	}
>  }
>  
> -#define PCI_RES_TYPE_MASK \
> -	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
> -	 IORESOURCE_MEM_64)
> -
>  static void pci_bridge_release_resources(struct pci_bus *bus,
>  					 unsigned long type)
>  {
> @@ -2450,8 +2456,12 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
>  		free_list(&added);
>  
>  	if (!list_empty(&failed)) {
> -		ret = -ENOSPC;
> -		goto cleanup;
> +		if (pci_required_resource_failed(&failed, type)) {
> +			ret = -ENOSPC;
> +			goto cleanup;
> +		}
> +		/* Only resources with unrelated types failed (again) */
> +		free_list(&failed);
>  	}
>  
>  	list_for_each_entry(dev_res, &saved, list) {
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 3/3] PCI: Fix failure detection during resource resize
  2025-08-25 22:21   ` Bjorn Helgaas
@ 2025-08-26 12:51     ` Ilpo Järvinen
  2025-08-27 19:51       ` Bjorn Helgaas
  0 siblings, 1 reply; 8+ messages in thread
From: Ilpo Järvinen @ 2025-08-26 12:51 UTC (permalink / raw)
  To: Bjorn Helgaas, Alex Deucher, Christian König
  Cc: Bjorn Helgaas, linux-pci, D Scott Phillips, Rio Liu,
	Tudor Ambarus, Markus Elfring, LKML, Christian König, stable

[-- Attachment #1: Type: text/plain, Size: 9535 bytes --]

Adding Alex & Christian as they might be able to shed light on the amdgpu 
side, but I think the problem still starts from 
pci_reassign_bridge_resources().

On Mon, 25 Aug 2025, Bjorn Helgaas wrote:
> On Fri, Aug 22, 2025 at 03:33:59PM +0300, Ilpo Järvinen wrote:
> > Since the commit 96336ec70264 ("PCI: Perform reset_resource() and build
> > fail list in sync") the failed list is always built and returned to let
> > the caller decide what to do with the failures. The caller may want to
> > retry resource fitting and assignment and before that can happen, the
> > resources should be restored to their original state (a reset
> > effectively clears the struct resource), which requires returning them
> > on the failed list so that the original state remains stored in the
> > associated struct pci_dev_resource.
> > 
> > Resource resizing is different from the ordinary resource fitting and
> > assignment in that it only considers part of the resources. This means
> > failures for other resource types are not relevant at all and should be
> > ignored. As resize doesn't unassign such unrelated resources, those
> > resource ending up into the failed list implies assignment of that
> > resource must have failed before resize too. The check in
> > pci_reassign_bridge_resources() to decide if the whole assignment is
> > successful, however, is based on list emptiness which will cause false
> > negatives when the failed list has resources with an unrelated type.
> > 
> > If the failed list is not empty, call pci_required_resource_failed()
> > and extend it to be able to filter on specific resource types too (if
> > provided).
> > 
> > Calling pci_required_resource_failed() at this point is slightly
> > problematic because the resource itself is reset when the failed list
> > is constructed in __assign_resources_sorted(). As a result,
> > pci_resource_is_optional() does not have access to the original
> > resource flags. This could be worked around by restoring and
> > re-reseting the resource around the call to pci_resource_is_optional(),
> > however, it shouldn't cause issue as resource resizing is meant for
> > 64-bit prefetchable resources according to Christian König (see the
> > Link which unfortunately doesn't point directly to Christian's reply
> > because lore didn't store that email at all).
> > 
> > Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
> > Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
> > Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
> > Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/
> 
> I'm trying to connect this fix with the Asynchronous SError Interrupt
> crash that Scott reported here.  From the call trace:
> 
>   ...
>   arm64_serror_panic+0x6c/0x90
>   do_serror+0x58/0x60
>   el1h_64_error_handler+0x38/0x60
>   el1h_64_error+0x84/0x88
>   _raw_spin_lock_irqsave+0x34/0xb0 (P)
>   amdgpu_ih_process+0xf0/0x150 [amdgpu]
>   amdgpu_irq_handler+0x34/0xa0 [amdgpu]
>   __handle_irq_event_percpu+0x60/0x248
>   handle_irq_event+0x4c/0xc0
>   handle_fasteoi_irq+0xa0/0x1c8
>   handle_irq_desc+0x3c/0x68
>   generic_handle_domain_irq+0x24/0x40
>   __gic_handle_irq_from_irqson.isra.0+0x15c/0x260
>   gic_handle_irq+0x28/0x80
>   call_on_irq_stack+0x24/0x30
>   do_interrupt_handler+0x88/0xa0
>   el1_interrupt+0x44/0xd0
>   el1h_64_irq_handler+0x18/0x28
>   el1h_64_irq+0x84/0x88
>   amdgpu_device_rreg.part.0+0x4c/0x190 [amdgpu] (P)
>   amdgpu_device_rreg+0x24/0x40 [amdgpu]
> 
> I guess something happened in amdgpu_device_rreg() that caused an
> interrupt, maybe a bogus virtual address for a register?

I think that the bogosity starts within pci_reassign_bridge_resources().
I've very recently come to realize the entire BAR resize operation is 
quite fragile as is and can fail to restore the original BARs as they were 
when the resize fails (even if it tries to restore things as they were). To 
fix that, I'll likely need to rework the entire structure of the resize 
related functions so that the saved list can hold resources beyond just 
the bridge windows that were released. I plan to eventually look at it but 
the rebar max size thing seems way more urgent than this atm.

It also looks pci_reassign_bridge_resources() can leave resources in 
non-resetted state for unassigned resources such as in this case 
(the non-resize side of the fitting algorithm resets resources that it 
failed to assign). For such resources, also IORESOURCE_UNSET gets 
overwritten by restore_dev_resource() which is even worse. My guess is 
that something in amdgpu assumes that, e.g., non-zero resource len implies 
the resource is assigned, or it could be that this IORESOURCE_UNSET 
problem make the amdgpu checks for it to not work as intended.

While I cannot pinpoint what ultimately causes the crash within amdgpu, it 
seems that some code there takes pci_resource_start/len() without checking
first if the resource is assigned (admittedly, that check could be 
somewhere else in the call chain, I only grepped for -A20 -B20 'resource' 
which had lots of noise to comb through, using 'pci_resource' too should 
find the interesting bits I think).

I'd actually want to add pci_resource_assigned() which checks only 
res->parent as that seems the most robust check to tell if the resource 
has been truly assigned. Endpoint drivers should then check a resource 
with pci_resource_assigned() before using other resource getters on it.

I could say much more about how I think IORESOURCE_UNSET is entirely 
redundant information and should be just dropped for simplicity's sake 
(and current flags handling likely has many many corner cases which the 
->parent check is entirely immune to) but it'd add to the length of an 
already long reply. :-)

> And then amdgpu_ih_process() did something that caused the SError?  Or
> since it seems to be asynchronous, maybe the amdgpu_ih_process()
> connection is coincidental and the SError was a consequence of
> something else?
> 
> I'd like to have a bread crumb here in the commit log that connects
> this fix with something a user might see, but I don't know what that
> would look like.

I'm sorry I don't know the answer, the amdgpu code is too unfamiliar 
territory, maybe Alex or Christian has some idea and can pinpoint us to 
what to look at.

-- 
 i.

> > Tested-by: D Scott Phillips <scott@os.amperecomputing.com>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Reviewed-by: D Scott Phillips <scott@os.amperecomputing.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: <stable@vger.kernel.org>
> > ---
> >  drivers/pci/setup-bus.c | 26 ++++++++++++++++++--------
> >  1 file changed, 18 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index df5aec46c29d..def29506700e 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -28,6 +28,10 @@
> >  #include <linux/acpi.h>
> >  #include "pci.h"
> >  
> > +#define PCI_RES_TYPE_MASK \
> > +	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
> > +	 IORESOURCE_MEM_64)
> > +
> >  unsigned int pci_flags;
> >  EXPORT_SYMBOL_GPL(pci_flags);
> >  
> > @@ -384,13 +388,19 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
> >  }
> >  
> >  /* Return: @true if assignment of a required resource failed. */
> > -static bool pci_required_resource_failed(struct list_head *fail_head)
> > +static bool pci_required_resource_failed(struct list_head *fail_head,
> > +					 unsigned long type)
> >  {
> >  	struct pci_dev_resource *fail_res;
> >  
> > +	type &= PCI_RES_TYPE_MASK;
> > +
> >  	list_for_each_entry(fail_res, fail_head, list) {
> >  		int idx = pci_resource_num(fail_res->dev, fail_res->res);
> >  
> > +		if (type && (fail_res->flags & PCI_RES_TYPE_MASK) != type)
> > +			continue;
> > +
> >  		if (!pci_resource_is_optional(fail_res->dev, idx))
> >  			return true;
> >  	}
> > @@ -504,7 +514,7 @@ static void __assign_resources_sorted(struct list_head *head,
> >  	}
> >  
> >  	/* Without realloc_head and only optional fails, nothing more to do. */
> > -	if (!pci_required_resource_failed(&local_fail_head) &&
> > +	if (!pci_required_resource_failed(&local_fail_head, 0) &&
> >  	    list_empty(realloc_head)) {
> >  		list_for_each_entry(save_res, &save_head, list) {
> >  			struct resource *res = save_res->res;
> > @@ -1708,10 +1718,6 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
> >  	}
> >  }
> >  
> > -#define PCI_RES_TYPE_MASK \
> > -	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
> > -	 IORESOURCE_MEM_64)
> > -
> >  static void pci_bridge_release_resources(struct pci_bus *bus,
> >  					 unsigned long type)
> >  {
> > @@ -2450,8 +2456,12 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
> >  		free_list(&added);
> >  
> >  	if (!list_empty(&failed)) {
> > -		ret = -ENOSPC;
> > -		goto cleanup;
> > +		if (pci_required_resource_failed(&failed, type)) {
> > +			ret = -ENOSPC;
> > +			goto cleanup;
> > +		}
> > +		/* Only resources with unrelated types failed (again) */
> > +		free_list(&failed);
> >  	}
> >  
> >  	list_for_each_entry(dev_res, &saved, list) {
> > -- 
> > 2.39.5
> > 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/3] PCI: Resource fitting algorith fixes
  2025-08-22 12:33 [PATCH v3 0/3] PCI: Resource fitting algorith fixes Ilpo Järvinen
                   ` (2 preceding siblings ...)
  2025-08-22 12:33 ` [PATCH v3 3/3] PCI: Fix failure detection during resource resize Ilpo Järvinen
@ 2025-08-26 15:23 ` Rio Liu
  3 siblings, 0 replies; 8+ messages in thread
From: Rio Liu @ 2025-08-26 15:23 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, linux-pci, D Scott Phillips, Tudor Ambarus,
	Markus Elfring, linux-kernel, Christian König






On Friday, August 22nd, 2025 at AM 8:34, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:

> 
> Once Rio has tested the first patch, these should replace the v2
> patches in the pci/resource branch.
> 

Hello Ilpo,

I've just applied PATCH v3 1/3 on the v6.17-rc1 mainline (commit 8f5ae30d69d7543eee0d70083daf4de8fe15d585) and tested it, looks like it still works! Thanks again for your work.

Rio

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 3/3] PCI: Fix failure detection during resource resize
  2025-08-26 12:51     ` Ilpo Järvinen
@ 2025-08-27 19:51       ` Bjorn Helgaas
  0 siblings, 0 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2025-08-27 19:51 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Alex Deucher, Christian König, Bjorn Helgaas, linux-pci,
	D Scott Phillips, Rio Liu, Tudor Ambarus, Markus Elfring, LKML,
	stable

On Tue, Aug 26, 2025 at 03:51:25PM +0300, Ilpo Järvinen wrote:
> Adding Alex & Christian as they might be able to shed light on the amdgpu 
> side, but I think the problem still starts from 
> pci_reassign_bridge_resources().
> 
> On Mon, 25 Aug 2025, Bjorn Helgaas wrote:
> > On Fri, Aug 22, 2025 at 03:33:59PM +0300, Ilpo Järvinen wrote:
> > > Since the commit 96336ec70264 ("PCI: Perform reset_resource() and build
> > > fail list in sync") the failed list is always built and returned to let
> > > the caller decide what to do with the failures. The caller may want to
> > > retry resource fitting and assignment and before that can happen, the
> > > resources should be restored to their original state (a reset
> > > effectively clears the struct resource), which requires returning them
> > > on the failed list so that the original state remains stored in the
> > > associated struct pci_dev_resource.
> > > 
> > > Resource resizing is different from the ordinary resource fitting and
> > > assignment in that it only considers part of the resources. This means
> > > failures for other resource types are not relevant at all and should be
> > > ignored. As resize doesn't unassign such unrelated resources, those
> > > resource ending up into the failed list implies assignment of that
> > > resource must have failed before resize too. The check in
> > > pci_reassign_bridge_resources() to decide if the whole assignment is
> > > successful, however, is based on list emptiness which will cause false
> > > negatives when the failed list has resources with an unrelated type.
> > > 
> > > If the failed list is not empty, call pci_required_resource_failed()
> > > and extend it to be able to filter on specific resource types too (if
> > > provided).
> > > 
> > > Calling pci_required_resource_failed() at this point is slightly
> > > problematic because the resource itself is reset when the failed list
> > > is constructed in __assign_resources_sorted(). As a result,
> > > pci_resource_is_optional() does not have access to the original
> > > resource flags. This could be worked around by restoring and
> > > re-reseting the resource around the call to pci_resource_is_optional(),
> > > however, it shouldn't cause issue as resource resizing is meant for
> > > 64-bit prefetchable resources according to Christian König (see the
> > > Link which unfortunately doesn't point directly to Christian's reply
> > > because lore didn't store that email at all).
> > > 
> > > Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
> > > Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@linux.intel.com/
> > > Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
> > > Closes: https://lore.kernel.org/all/86plf0lgit.fsf@scott-ph-mail.amperecomputing.com/
> > 
> > I'm trying to connect this fix with the Asynchronous SError Interrupt
> > crash that Scott reported here.  From the call trace:
> > 
> >   ...
> >   arm64_serror_panic+0x6c/0x90
> >   do_serror+0x58/0x60
> >   el1h_64_error_handler+0x38/0x60
> >   el1h_64_error+0x84/0x88
> >   _raw_spin_lock_irqsave+0x34/0xb0 (P)
> >   amdgpu_ih_process+0xf0/0x150 [amdgpu]
> >   amdgpu_irq_handler+0x34/0xa0 [amdgpu]
> >   __handle_irq_event_percpu+0x60/0x248
> >   handle_irq_event+0x4c/0xc0
> >   handle_fasteoi_irq+0xa0/0x1c8
> >   handle_irq_desc+0x3c/0x68
> >   generic_handle_domain_irq+0x24/0x40
> >   __gic_handle_irq_from_irqson.isra.0+0x15c/0x260
> >   gic_handle_irq+0x28/0x80
> >   call_on_irq_stack+0x24/0x30
> >   do_interrupt_handler+0x88/0xa0
> >   el1_interrupt+0x44/0xd0
> >   el1h_64_irq_handler+0x18/0x28
> >   el1h_64_irq+0x84/0x88
> >   amdgpu_device_rreg.part.0+0x4c/0x190 [amdgpu] (P)
> >   amdgpu_device_rreg+0x24/0x40 [amdgpu]
> > 
> > I guess something happened in amdgpu_device_rreg() that caused an
> > interrupt, maybe a bogus virtual address for a register?
> ...

> > And then amdgpu_ih_process() did something that caused the SError?  Or
> > since it seems to be asynchronous, maybe the amdgpu_ih_process()
> > connection is coincidental and the SError was a consequence of
> > something else?
> > 
> > I'd like to have a bread crumb here in the commit log that connects
> > this fix with something a user might see, but I don't know what that
> > would look like.
> 
> I'm sorry I don't know the answer, the amdgpu code is too unfamiliar 
> territory, maybe Alex or Christian has some idea and can pinpoint us to 
> what to look at.

Do we know what the PCIe controller is here?  Is there a public
datasheet for it?

I've seen other issues that make me wonder if some controllers work
like this:

  - PCIe error occurs on read or write transaction

  - PCIe write dropped or read completed by the controller
    synthesizing ~0 data to CPU

  - PCIe controller signals Asynchronous SError as a result of the
    error

But I guess even if the above happens, I can't explain why the PCIe
error would occur in the first place.  Scott didn't mention anything
like an FLR.  But maybe if we actually got as far as programming
something bogus in a BAR, a read might get no response (or two
responses).

I would assume there should be something logged in the AER Capability,
but I don't think we've looked at that yet.  The AER interrupt is also
asynchronous, so not surprising that this panic could happen before
handling it.

Bjorn

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-08-27 19:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22 12:33 [PATCH v3 0/3] PCI: Resource fitting algorith fixes Ilpo Järvinen
2025-08-22 12:33 ` [PATCH v3 1/3] PCI: Relaxed tail alignment should never increase min_align Ilpo Järvinen
2025-08-22 12:33 ` [PATCH v3 2/3] PCI: Fix pdev_resources_assignable() disparity Ilpo Järvinen
2025-08-22 12:33 ` [PATCH v3 3/3] PCI: Fix failure detection during resource resize Ilpo Järvinen
2025-08-25 22:21   ` Bjorn Helgaas
2025-08-26 12:51     ` Ilpo Järvinen
2025-08-27 19:51       ` Bjorn Helgaas
2025-08-26 15:23 ` [PATCH v3 0/3] PCI: Resource fitting algorith fixes Rio Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).