* [RFC PATCH] PCI: Sort resources by size as secondary key
@ 2026-06-18 7:25 Ding Hui
2026-06-18 9:42 ` sashiko-bot
0 siblings, 1 reply; 2+ messages in thread
From: Ding Hui @ 2026-06-18 7:25 UTC
To: bhelgaas, linux-pci, linux-kernel; +Cc: ilpo.jarvinen, Ding Hui
We encountered an issue on a BCM57414 NIC where function 1 failed to
enable SR-IOV after remove & rescan. Investigation revealed that this is
caused by a BAR allocation failure during rescan.
Simplified topology:
+-[0000:30]-+- ...
| +-02.0-[31]--+-00.0 Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller [14e4:16d7]
| | \-00.1 Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller [14e4:16d7]
iomem layout after initial boot:
22fffec00000-22ffffefffff : PCI Bus 0000:31 [Window size=19M]
22fffec00000-22ffff3fffff : 0000:31:00.1 [align=1M size=8M BAR 9 (VF BAR 2)]
22ffff400000-22ffffbfffff : 0000:31:00.0 [align=1M size=8M BAR 9 (VF BAR 2)]
22ffffc00000-22ffffcfffff : 0000:31:00.1 [align=1M size=1M BAR 2]
22ffffd00000-22ffffdfffff : 0000:31:00.0 [align=1M size=1M BAR 2]
22ffffe00000-22ffffe0ffff : 0000:31:00.1 [align=64K size=64K BAR 0]
22ffffe10000-22ffffe1ffff : 0000:31:00.0 [align=64K size=64K BAR 0]
22ffffe20000-22ffffe3ffff : 0000:31:00.1 [align=16K size=128K BAR 11(VF BAR 4)]
22ffffe40000-22ffffe5ffff : 0000:31:00.1 [align=16K size=128K BAR 7 (VF BAR 0)]
22ffffe60000-22ffffe7ffff : 0000:31:00.0 [align=16K size=128K BAR 11(VF BAR 4)]
22ffffe80000-22ffffe9ffff : 0000:31:00.0 [align=16K size=128K BAR 7 (VF BAR 0)]
22ffffea0000-22ffffea1fff : 0000:31:00.1 [align=8K size=8K BAR 4]
22ffffea2000-22ffffea3fff : 0000:31:00.0 [align=8K size=8K BAR 4]
iomem layout after removing function 1 with
echo "1" > /sys/bus/pci/devices/0000:31:00.1/remove
22fffec00000-22ffffefffff : PCI Bus 0000:31 [Window size=19M]
22ffff400000-22ffffbfffff : 0000:31:00.0 [align=1M size=8M BAR 9 (VF BAR 2)]
22ffffd00000-22ffffdfffff : 0000:31:00.0 [align=1M size=1M BAR 2]
22ffffe10000-22ffffe1ffff : 0000:31:00.0 [align=64K size=64K BAR 0]
22ffffe60000-22ffffe7ffff : 0000:31:00.0 [align=16K size=128K BAR 11(VF BAR 4)]
22ffffe80000-22ffffe9ffff : 0000:31:00.0 [align=16K size=128K BAR 7 (VF BAR 0)]
22ffffea2000-22ffffea3fff : 0000:31:00.0 [align=8K size=8K BAR 4]
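The free space left in the bridge window can be worked out from the listing above. The following is a minimal user-space sketch, not kernel code: `struct range`, `find_holes()` and `holes_after_removal()` are hypothetical names introduced for illustration, and the addresses are copied from the layout above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, as printed in the iomem listing. */
struct range { uint64_t start, end; };

/*
 * Collect the gaps in `win` not covered by `busy`, which must be sorted
 * by start address and non-overlapping. Returns the number of gaps stored.
 */
static size_t find_holes(struct range win, const struct range *busy, size_t n,
                         struct range *holes, size_t max)
{
    size_t count = 0;
    uint64_t cursor = win.start;

    for (size_t i = 0; i < n; i++) {
        /* Sanity check on the input ordering and bounds. */
        assert(busy[i].start >= cursor && busy[i].end <= win.end);
        if (busy[i].start > cursor && count < max)
            holes[count++] = (struct range){ cursor, busy[i].start - 1 };
        cursor = busy[i].end + 1;
    }
    if (cursor <= win.end && count < max)
        holes[count++] = (struct range){ cursor, win.end };
    return count;
}

/* Function 0's ranges left in the window after function 1 was removed. */
static size_t holes_after_removal(struct range *holes, size_t max)
{
    const struct range win = { 0x22fffec00000ULL, 0x22ffffefffffULL };
    const struct range busy[] = {
        { 0x22ffff400000ULL, 0x22ffffbfffffULL }, /* BAR 9 (VF BAR 2), 8M */
        { 0x22ffffd00000ULL, 0x22ffffdfffffULL }, /* BAR 2, 1M */
        { 0x22ffffe10000ULL, 0x22ffffe1ffffULL }, /* BAR 0, 64K */
        { 0x22ffffe60000ULL, 0x22ffffe7ffffULL }, /* BAR 11 (VF BAR 4), 128K */
        { 0x22ffffe80000ULL, 0x22ffffe9ffffULL }, /* BAR 7 (VF BAR 0), 128K */
        { 0x22ffffea2000ULL, 0x22ffffea3fffULL }, /* BAR 4, 8K */
    };
    return find_holes(win, busy, sizeof(busy) / sizeof(busy[0]), holes, max);
}
```

On these numbers the window has six gaps of 8M, 1M, 64K, 256K, 8K and 368K. The first one, 0x22fffec00000-0x22ffff3fffff, is the only gap large enough for an 8M resource.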
Rescan logs triggered by
echo "1" > /sys/bus/pci/devices/0000:30:02.0/rescan
[ 90.585067] pci 0000:31:00.1: [14e4:16d7] type 00 class 0x020000 PCIe Endpoint
[ 90.585107] pci 0000:31:00.1: BAR 0 [mem 0x22ffffe00000-0x22ffffe0ffff 64bit pref]
[ 90.585113] pci 0000:31:00.1: BAR 2 [mem 0x22ffffc00000-0x22ffffcfffff 64bit pref]
[ 90.585116] pci 0000:31:00.1: BAR 4 [mem 0x22ffffea0000-0x22ffffea1fff 64bit pref]
[ 90.585119] pci 0000:31:00.1: ROM [mem 0xb0e00000-0xb0e7ffff pref]
[ 90.585216] pci 0000:31:00.1: PME# supported from D0 D3hot D3cold
[ 90.585253] pci 0000:31:00.1: VF BAR 0 [mem 0x22ffffe40000-0x22ffffe43fff 64bit pref]
[ 90.585255] pci 0000:31:00.1: VF BAR 0 [mem 0x22ffffe40000-0x22ffffe5ffff 64bit pref]: contains BAR 0 for 8 VFs
[ 90.585258] pci 0000:31:00.1: VF BAR 2 [mem 0x22fffec00000-0x22fffecfffff 64bit pref]
[ 90.585260] pci 0000:31:00.1: VF BAR 2 [mem 0x22fffec00000-0x22ffff3fffff 64bit pref]: contains BAR 2 for 8 VFs
[ 90.585263] pci 0000:31:00.1: VF BAR 4 [mem 0x22ffffe20000-0x22ffffe23fff 64bit pref]
[ 90.585265] pci 0000:31:00.1: VF BAR 4 [mem 0x22ffffe20000-0x22ffffe3ffff 64bit pref]: contains BAR 4 for 8 VFs
[ 90.585534] pci 0000:31:00.1: Adding to iommu group 11
[ 90.585575] pci 0000:31:00.1: BAR 2 [mem 0x22fffec00000-0x22fffecfffff 64bit pref]: assigned
[ 90.585585] pci 0000:31:00.1: VF BAR 2 [mem size 0x00800000 64bit pref]: can't assign; no space
[ 90.585587] pci 0000:31:00.1: VF BAR 2 [mem size 0x00800000 64bit pref]: failed to assign
[ 90.585589] pci 0000:31:00.1: ROM [mem 0xb0e00000-0xb0e7ffff pref]: assigned
[ 90.585591] pci 0000:31:00.1: BAR 0 [mem 0x22fffed00000-0x22fffed0ffff 64bit pref]: assigned
[ 90.585599] pci 0000:31:00.1: VF BAR 0 [mem 0x22fffed10000-0x22fffed2ffff 64bit pref]: assigned
[ 90.585603] pci 0000:31:00.1: VF BAR 4 [mem 0x22fffed30000-0x22fffed4ffff 64bit pref]: assigned
[ 90.585606] pci 0000:31:00.1: BAR 4 [mem 0x22fffed50000-0x22fffed51fff 64bit pref]: assigned
SR-IOV enable failure logs, triggered by
echo 2 > /sys/bus/pci/devices/0000:31:00.1/sriov_numvfs
[ 1666.918432] bnxt_en 0000:31:00.1: not enough MMIO resources for SR-IOV
[ 1666.918442] bnxt_en 0000:31:00.1 eth5: pci_enable_sriov failed : -12
The resource allocation process during rescan is as follows:
dev_rescan_store
  pci_rescan_bus
    pci_assign_unassigned_bus_resources
      __pci_bus_assign_resources
        pbus_assign_resources_sorted
          pdev_sort_resources
          __assign_resources_sorted
            assign_requested_resources_sorted
              pci_assign_resource
We noticed that the current sort algorithm orders resources by alignment
only. BAR 2 (align=1M size=1M) is therefore sorted ahead of BAR 9 (VF BAR 2,
align=1M size=8M) and is assigned first, at the start of the only free 8M
hole (0x22fffec00000), so the 8M request can no longer be satisfied.
If we keep alignment as the primary sort key but use size as the secondary
key, all resources can be assigned after remove & rescan.
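The effect of the secondary key can be replayed in user space. The sketch below is a simplified model, not the kernel implementation: it ASSUMES a lowest-address first-fit allocator over the free holes of the window, and every name in it (`struct req`, `struct hole`, `goes_before()`, `sort_reqs()`, `assign_all()`, `simulate()`) is hypothetical. Holes and request sizes are taken from the iomem listings and logs above; the ROM is left out because it is assigned outside this window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KIB(x) ((uint64_t)(x) << 10)
#define MIB(x) ((uint64_t)(x) << 20)

struct req  { const char *name; uint64_t align, size, start; bool ok; };
struct hole { uint64_t start, end; };   /* end is exclusive */

/* Sort predicate: larger alignment first; optionally larger size on ties. */
static bool goes_before(const struct req *a, const struct req *b, bool size_key)
{
    if (a->align != b->align)
        return a->align > b->align;
    return size_key && a->size > b->size;
}

/* Stable insertion sort; equal keys keep their discovery order. */
static void sort_reqs(struct req *r, size_t n, bool size_key)
{
    for (size_t i = 1; i < n; i++) {
        struct req cur = r[i];
        size_t j = i;

        while (j > 0 && goes_before(&cur, &r[j - 1], size_key)) {
            r[j] = r[j - 1];
            j--;
        }
        r[j] = cur;
    }
}

/*
 * ASSUMPTION: lowest-address first fit over holes listed in ascending
 * order; alignments are powers of two. Alignment padding in front of a
 * placement is simply dropped. Returns the number of unplaced requests.
 */
static size_t assign_all(struct req *r, size_t n, struct hole *h, size_t nh)
{
    size_t failed = 0;

    for (size_t i = 0; i < n; i++) {
        r[i].ok = false;
        for (size_t k = 0; k < nh && !r[i].ok; k++) {
            uint64_t s = (h[k].start + r[i].align - 1) & ~(r[i].align - 1);

            if (s < h[k].end && h[k].end - s >= r[i].size) {
                r[i].start = s;
                r[i].ok = true;
                h[k].start = s + r[i].size;
            }
        }
        failed += !r[i].ok;
    }
    return failed;
}

/* Replay function 1's rescan with the holes and requests from this report. */
static size_t simulate(bool size_key)
{
    struct hole holes[] = {
        { 0x22fffec00000ULL, 0x22ffff400000ULL }, /* 8M */
        { 0x22ffffc00000ULL, 0x22ffffd00000ULL }, /* 1M */
        { 0x22ffffe00000ULL, 0x22ffffe10000ULL }, /* 64K */
        { 0x22ffffe20000ULL, 0x22ffffe60000ULL }, /* 256K */
        { 0x22ffffea0000ULL, 0x22ffffea2000ULL }, /* 8K */
        { 0x22ffffea4000ULL, 0x22fffff00000ULL }, /* 368K tail */
    };
    struct req reqs[] = {   /* BAR index order */
        { "BAR 0",    KIB(64), KIB(64),  0, false },
        { "BAR 2",    MIB(1),  MIB(1),   0, false },
        { "BAR 4",    KIB(8),  KIB(8),   0, false },
        { "VF BAR 0", KIB(16), KIB(128), 0, false },
        { "VF BAR 2", MIB(1),  MIB(8),   0, false },
        { "VF BAR 4", KIB(16), KIB(128), 0, false },
    };
    size_t n = sizeof(reqs) / sizeof(reqs[0]);

    sort_reqs(reqs, n, size_key);
    return assign_all(reqs, n, holes, sizeof(holes) / sizeof(holes[0]));
}
```

Under these assumptions, `simulate(false)` leaves one request (VF BAR 2) unplaced, in line with the "can't assign; no space" message in the rescan log, while `simulate(true)` places all six requests.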
Does this approach only work around this specific case, or does it also
benefit PCI resource allocation in general?
In theory, I think it may help reduce allocation failures caused by
fragmentation, but I'm not sure.
Any comments and suggestions are appreciated, thanks.
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
---
drivers/pci/setup-bus.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4cf120ebe5ad..63f224f0c6be 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -367,7 +367,8 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 			align = pci_resource_alignment(dev_res->dev,
 							 dev_res->res);
 
-			if (r_align > align) {
+			if (r_align > align ||
+			    (r_align == align && resource_size(r) > resource_size(dev_res->res))) {
 				n = &dev_res->list;
 				break;
 			}
--
2.17.1
* Re: [RFC PATCH] PCI: Sort resources by size as secondary key
2026-06-18 7:25 [RFC PATCH] PCI: Sort resources by size as secondary key Ding Hui
@ 2026-06-18 9:42 ` sashiko-bot
0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-06-18 9:42 UTC
To: Ding Hui; +Cc: linux-pci
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Incomplete fix: `__assign_resources_sorted()` fails to maintain the new size-based secondary sort key when bridge windows are dynamically resized, breaking the sorting invariant.
--
commit b576a58abc2576977d567e69cb6a8b9e9d867d9b
Author: Ding Hui <dinghui@sangfor.com.cn>
Subject: PCI: Sort resources by size as secondary key
The commit updates pdev_sort_resources() to use resource size as a secondary
sorting key to reduce fragmentation. This resolves an issue on the BCM57414
NIC where function 1 failed to enable SR-IOV after remove and rescan due
to BAR allocation failures.
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 4cf120ebe5adf..63f224f0c6bee 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -367,7 +367,8 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
>  			align = pci_resource_alignment(dev_res->dev,
>  							 dev_res->res);
>
> -			if (r_align > align) {
> +			if (r_align > align ||
> +			    (r_align == align && resource_size(r) > resource_size(dev_res->res))) {
[Severity: High]
Does this patch also need to update the re-sorting logic in
__assign_resources_sorted() to maintain the size-based secondary sort key?
When a bridge window requests additional size via realloc_head in
__assign_resources_sorted(), its size is increased:
drivers/pci/setup-bus.c:__assign_resources_sorted() {
...
res->end += addsize_res->add_size;
...
if (addsize_res->min_align <= res->start)
continue;
If the alignment does not increase, it skips re-sorting entirely, leaving the
enlarged resource in its original position based on its older, smaller size.
If the alignment does increase, it is re-inserted into the sorted list, but
the loop only compares alignments:
	list_for_each_entry(dev_res2, head, list) {
		align = pci_resource_alignment(dev_res2->dev, dev_res2->res);
		if (addsize_res->min_align > align) {
			list_move_tail(&dev_res->list, &dev_res2->list);
			break;
		}
	}
This skips over elements with equal alignment, placing the newly enlarged
resource at the end of its alignment group and disregarding its size. Because
bridge windows are frequently dynamically resized, failing to sort them by
size when enlarged might place them behind smaller resources of the same
alignment, potentially re-introducing the fragmentation and allocation
failures this patch intends to solve.
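For illustration only, the following user-space sketch shows what re-positioning an enlarged entry under a combined (alignment, size) key could look like. It is an array-based model rather than the kernel's linked list, it does not model a changing `min_align`, and `struct ent`, `sorts_before()` and `grow_and_resort()` are hypothetical names that do not exist in drivers/pci/setup-bus.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ent { uint64_t align, size; };

/* Combined key: larger alignment first, then larger size. */
static bool sorts_before(const struct ent *a, const struct ent *b)
{
    if (a->align != b->align)
        return a->align > b->align;
    return a->size > b->size;
}

/*
 * Grow v[idx] by add_size and move it so that the array stays ordered
 * by the combined key. Returns the entry's new index.
 */
static size_t grow_and_resort(struct ent *v, size_t n, size_t idx,
                              uint64_t add_size)
{
    assert(idx < n);

    struct ent cur = v[idx];
    cur.size += add_size;

    /* Remove the entry; n - 1 entries remain at indices 0..n-2. */
    memmove(&v[idx], &v[idx + 1], (n - idx - 1) * sizeof(*v));

    /* Find the first remaining entry that the enlarged one must precede. */
    size_t pos = 0;
    while (pos < n - 1 && !sorts_before(&cur, &v[pos]))
        pos++;

    /* Open a slot at pos and re-insert. */
    memmove(&v[pos + 1], &v[pos], (n - 1 - pos) * sizeof(*v));
    v[pos] = cur;
    return pos;
}
```

In this model, the re-sort runs whenever the size changes, not only when the alignment increases, and ties on alignment are broken by size rather than by skipping to the end of the alignment group.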
>  				n = &dev_res->list;
>  				break;
>  			}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260618072536.28199-1-dinghui@sangfor.com.cn?part=1