* [PATCH 1/4] PCI: Fix bridge window alignment with optional resources
2025-11-28 11:50 [PATCH 0/4] PCI: Bridge window head alignment fix/rework Ilpo Järvinen
@ 2025-11-28 11:50 ` Ilpo Järvinen
2025-11-28 11:50 ` [PATCH 2/4] PCI: Rewrite bridge window head alignment function Ilpo Järvinen
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Ilpo Järvinen @ 2025-11-28 11:50 UTC (permalink / raw)
To: linux-pci, Bjorn Helgaas, Benjamin Herrenschmidt, Wei Yang,
Malte Schröder, linux-kernel
Cc: Ilpo Järvinen, stable
pbus_size_mem() has two alignments, one for required resources in
min_align and another in add_align that takes into account optional
resources.
The add_align is applied to the bridge window through the realloc_head
list. It can happen, however, that add_align is larger than min_align
but the calculated size1 and size0 are equal due to extra tailroom (e.g.,
hotplug reservation, tail alignment), and therefore no entry is created
in the realloc_head list. Without the bridge appearing in the
realloc_head list, add_align is lost when pbus_size_mem() returns.
The problem is visible in this log for 0000:05:00.0, which lacks the
"add_size ... add_align ..." line that would indicate it was added to
the realloc_head list:
pci 0000:05:00.0: PCI bridge to [bus 06-16]
...
pci 0000:06:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 07] requires relaxed alignment rules
pci 0000:06:06.0: bridge window [mem 0x00100000-0x001fffff] to [bus 0a] requires relaxed alignment rules
pci 0000:06:07.0: bridge window [mem 0x00100000-0x003fffff] to [bus 0b] requires relaxed alignment rules
pci 0000:06:08.0: bridge window [mem 0x00800000-0x00ffffff 64bit pref] to [bus 0c-14] requires relaxed alignment rules
pci 0000:06:08.0: bridge window [mem 0x01000000-0x057fffff] to [bus 0c-14] requires relaxed alignment rules
pci 0000:06:08.0: bridge window [mem 0x01000000-0x057fffff] to [bus 0c-14] requires relaxed alignment rules
pci 0000:06:08.0: bridge window [mem 0x01000000-0x057fffff] to [bus 0c-14] add_size 100000 add_align 1000000
pci 0000:06:0c.0: bridge window [mem 0x00100000-0x001fffff] to [bus 15] requires relaxed alignment rules
pci 0000:06:0d.0: bridge window [mem 0x00100000-0x001fffff] to [bus 16] requires relaxed alignment rules
pci 0000:06:0d.0: bridge window [mem 0x00100000-0x001fffff] to [bus 16] requires relaxed alignment rules
pci 0000:05:00.0: bridge window [mem 0xd4800000-0xd97fffff]: assigned
pci 0000:05:00.0: bridge window [mem 0x1060000000-0x10607fffff 64bit pref]: assigned
pci 0000:06:08.0: bridge window [mem size 0x04900000]: can't assign; no space
pci 0000:06:08.0: bridge window [mem size 0x04900000]: failed to assign
While this bug itself seems old, it has likely become more visible
after the relaxed tail alignment that does not grossly overestimate the
size needed for the bridge window.
Make sure that add_align > min_align also results in an entry being
added to the realloc_head list. In addition, handle the case where
add_size is zero and only the alignment differs.
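The two changed conditions can be sketched in plain C as below. This is
a userspace illustration only, not kernel code: the helper names
(queued_before, queued_after, extra_size, is_aligned, needs_reassign)
are made up for the example, and the bus->self and realloc_head checks
are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

/* Pre-patch test: only a size difference queues the bridge window. */
static bool queued_before(resource_size_t size0, resource_size_t size1)
{
	return size1 > size0;
}

/* Post-patch test: a stricter optional alignment is enough on its own. */
static bool queued_after(resource_size_t size0, resource_size_t size1,
			 resource_size_t min_align, resource_size_t add_align)
{
	return size1 > size0 || add_align > min_align;
}

/* add_size as computed by the patch; cannot underflow when size1 <= size0. */
static resource_size_t extra_size(resource_size_t size0, resource_size_t size1)
{
	return size1 > size0 ? size1 - size0 : 0;
}

/* Userspace stand-in for IS_ALIGNED(); align must be a power of two. */
static bool is_aligned(resource_size_t x, resource_size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x & (align - 1)) == 0;
}

/* Post-patch test on the reassign side: a misaligned start also counts. */
static bool needs_reassign(resource_size_t add_size, resource_size_t start,
			   resource_size_t align)
{
	return add_size > 0 || !is_aligned(start, align);
}
```

With illustrative values size0 == size1 == 0x5000000, min_align 0x800000
and add_align 0x1000000, queued_before() is false while queued_after()
is true, and extra_size() is 0. On the reassign side, a window starting
at 0xd4800000 with align 0x1000000 is picked up even though add_size is
zero.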
Fixes: d74b9027a4da ("PCI: Consider additional PF's IOV BAR alignment in sizing and assigning")
Reported-by: Malte Schröder <malte+lkml@tnxip.de>
Tested-by: Malte Schröder <malte+lkml@tnxip.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
---
drivers/pci/setup-bus.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4a8735b275e4..70d021ffb486 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -14,6 +14,7 @@
* tighter packing. Prefetchable range support.
*/
+#include <linux/align.h>
#include <linux/bitops.h>
#include <linux/init.h>
#include <linux/kernel.h>
@@ -452,7 +453,7 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
"%s %pR: ignoring failure in optional allocation\n",
res_name, res);
}
- } else if (add_size > 0) {
+ } else if (add_size > 0 || !IS_ALIGNED(res->start, align)) {
res->flags |= add_res->flags &
(IORESOURCE_STARTALIGN|IORESOURCE_SIZEALIGN);
if (pci_reassign_resource(dev, idx, add_size, align))
@@ -1438,12 +1439,13 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
resource_set_range(b_res, min_align, size0);
b_res->flags |= IORESOURCE_STARTALIGN;
- if (bus->self && size1 > size0 && realloc_head) {
+ if (bus->self && realloc_head && (size1 > size0 || add_align > min_align)) {
b_res->flags &= ~IORESOURCE_DISABLED;
- add_to_list(realloc_head, bus->self, b_res, size1-size0, add_align);
+ add_size = size1 > size0 ? size1 - size0 : 0;
+ add_to_list(realloc_head, bus->self, b_res, add_size, add_align);
pci_info(bus->self, "bridge window %pR to %pR add_size %llx add_align %llx\n",
b_res, &bus->busn_res,
- (unsigned long long) (size1 - size0),
+ (unsigned long long) add_size,
(unsigned long long) add_align);
}
}
--
2.39.5
* [PATCH 2/4] PCI: Rewrite bridge window head alignment function
2025-11-28 11:50 [PATCH 0/4] PCI: Bridge window head alignment fix/rework Ilpo Järvinen
2025-11-28 11:50 ` [PATCH 1/4] PCI: Fix bridge window alignment with optional resources Ilpo Järvinen
@ 2025-11-28 11:50 ` Ilpo Järvinen
2025-11-28 11:50 ` [PATCH 3/4] PCI: Stop over-estimating bridge window size Ilpo Järvinen
2025-11-28 11:50 ` [PATCH 4/4] resource: Increase MAX_IORES_LEVEL to 8 Ilpo Järvinen
3 siblings, 0 replies; 5+ messages in thread
From: Ilpo Järvinen @ 2025-11-28 11:50 UTC (permalink / raw)
To: linux-pci, Bjorn Helgaas, Benjamin Herrenschmidt, Wei Yang,
Malte Schröder, linux-kernel
Cc: Ilpo Järvinen, stable
The calculation of bridge window head alignment is done by
calculate_mem_align() [*]. With the default bridge window alignment, it
is used for both head and tail alignment.
The selected head alignment does not always result in tight-fitting
resources (gap at d4f00000-d4ffffff):
d4800000-dbffffff : PCI Bus 0000:06
d4800000-d48fffff : PCI Bus 0000:07
d4800000-d4803fff : 0000:07:00.0
d4800000-d4803fff : nvme
d4900000-d49fffff : PCI Bus 0000:0a
d4900000-d490ffff : 0000:0a:00.0
d4900000-d490ffff : r8169
d4910000-d4913fff : 0000:0a:00.0
d4a00000-d4cfffff : PCI Bus 0000:0b
d4a00000-d4bfffff : 0000:0b:00.0
d4a00000-d4bfffff : 0000:0b:00.0
d4c00000-d4c07fff : 0000:0b:00.0
d4d00000-d4dfffff : PCI Bus 0000:15
d4d00000-d4d07fff : 0000:15:00.0
d4d00000-d4d07fff : xhci-hcd
d4e00000-d4efffff : PCI Bus 0000:16
d4e00000-d4e7ffff : 0000:16:00.0
d4e80000-d4e803ff : 0000:16:00.0
d4e80000-d4e803ff : ahci
d5000000-dbffffff : PCI Bus 0000:0c
This has not caused problems (for years) with the default bridge
window tail alignment, which grossly over-estimates the required tail
alignment and leaves more tail room than necessary. With the introduction
of relaxed tail alignment that leaves no extra tail room whatsoever,
any gaps will immediately turn into assignment failures.
Introduce head alignment calculation that ensures no gaps are left and
apply the new approach when using relaxed alignment.
([*] I don't understand the algorithm in calculate_mem_align().)
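The head alignment calculation introduced here can be tried out in
userspace with the sketch below. It mirrors the loop of
calculate_head_align() from the diff, but the function name
head_align_sketch, the MB_SHIFT constant (standing in for
__ffs(SZ_1M)) and the input values in the usage note are assumptions
made for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

#define MB_SHIFT 20			/* __ffs(SZ_1M) in the kernel */

/*
 * aligns[order] holds the summed alignments of the resources whose
 * alignment is 1MB << order; max_order is the largest order in use.
 */
static resource_size_t head_align_sketch(const resource_size_t *aligns,
					 int max_order)
{
	resource_size_t head_align = 1;
	resource_size_t remainder = 0;
	int order;

	assert(max_order >= 0 && max_order < 28);

	/* Start from the largest alignment. */
	head_align <<= max_order + MB_SHIFT;

	for (order = max_order - 1; order >= 0; order--) {
		resource_size_t align1 = 1;

		align1 <<= order + MB_SHIFT;

		/* Smaller resources that could fill the head room. */
		remainder += aligns[order];

		/* Halve the alignment while the fill covers the new head room. */
		while (head_align > align1 && remainder >= head_align / 2) {
			head_align /= 2;
			remainder -= head_align;
		}
	}

	return head_align;
}
```

As a hypothetical input, take one 16MB-aligned resource (aligns[4] =
16MB) and eight 1MB-aligned ones (aligns[0] = 8MB). The result is 8MB:
the eight small resources fill the 8MB of head room in front of the
16MB-aligned resource, so the window itself only needs 8MB alignment.
With only five 1MB-aligned resources the head room cannot be filled and
the result stays at 16MB.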
Fixes: 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI allocation code (alpha, arm, parisc) [2/2]")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220775
Reported-by: Malte Schröder <malte+lkml@tnxip.de>
Tested-by: Malte Schröder <malte+lkml@tnxip.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
---
Somewhat annoyingly, there is a difference in what the aligns array
contains between the legacy alignment approach (which I dare not touch
as I really don't understand what the algorithm tries to do) and this
new head alignment algorithm, both consuming stack space. After making
the new approach the only available approach in the follow-up patch,
only one array remains (however, that follow-up change is also somewhat
riskier when it comes to regressions).
That being said, the new head alignment could work with the same aligns
array as the legacy approach; it just won't necessarily produce an
optimal (the smallest possible) head alignment when the if (r_size <=
align) condition is used. Just let me know if that approach is
preferred (to save some stack space).
---
drivers/pci/setup-bus.c | 53 ++++++++++++++++++++++++++++++++++-------
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 70d021ffb486..93f6b0750174 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1224,6 +1224,45 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
return min_align;
}
+/*
+ * Calculate bridge window head alignment that leaves no gaps in between
+ * resources.
+ */
+static resource_size_t calculate_head_align(resource_size_t *aligns,
+ int max_order)
+{
+ resource_size_t head_align = 1;
+ resource_size_t remainder = 0;
+ int order;
+
+ /* Take the largest alignment as the starting point. */
+ head_align <<= max_order + __ffs(SZ_1M);
+
+ for (order = max_order - 1; order >= 0; order--) {
+ resource_size_t align1 = 1;
+
+ align1 <<= order + __ffs(SZ_1M);
+
+ /*
+ * Account smaller resources with alignment < max_order that
+ * could be used to fill head room if alignment less than
+ * max_order is used.
+ */
+ remainder += aligns[order];
+
+ /*
+ * Test if head fill is enough to satisfy the alignment of
+ * the larger resources after reducing the alignment.
+ */
+ while ((head_align > align1) && (remainder >= head_align / 2)) {
+ head_align /= 2;
+ remainder -= head_align;
+ }
+ }
+
+ return head_align;
+}
+
/**
* pbus_upstream_space_available - Check no upstream resource limits allocation
* @bus: The bus
@@ -1311,13 +1350,13 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
{
struct pci_dev *dev;
resource_size_t min_align, win_align, align, size, size0, size1 = 0;
- resource_size_t aligns[28]; /* Alignments from 1MB to 128TB */
+ resource_size_t aligns[28] = {}; /* Alignments from 1MB to 128TB */
+ resource_size_t aligns2[28] = {};/* Alignments from 1MB to 128TB */
int order, max_order;
struct resource *b_res = pbus_select_window_for_type(bus, type);
resource_size_t children_add_size = 0;
resource_size_t children_add_align = 0;
resource_size_t add_align = 0;
- resource_size_t relaxed_align;
resource_size_t old_size;
if (!b_res)
@@ -1327,7 +1366,6 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
if (b_res->parent)
return;
- memset(aligns, 0, sizeof(aligns));
max_order = 0;
size = 0;
@@ -1378,6 +1416,7 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
*/
if (r_size <= align)
aligns[order] += align;
+ aligns2[order] += align;
if (order > max_order)
max_order = order;
@@ -1402,9 +1441,7 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
if (bus->self && size0 &&
!pbus_upstream_space_available(bus, b_res, size0, min_align)) {
- relaxed_align = 1ULL << (max_order + __ffs(SZ_1M));
- relaxed_align = max(relaxed_align, win_align);
- min_align = min(min_align, relaxed_align);
+ min_align = calculate_head_align(aligns2, max_order);
size0 = calculate_memsize(size, min_size, 0, 0, old_size, win_align);
resource_set_range(b_res, min_align, size0);
pci_info(bus->self, "bridge window %pR to %pR requires relaxed alignment rules\n",
@@ -1418,9 +1455,7 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
if (bus->self && size1 &&
!pbus_upstream_space_available(bus, b_res, size1, add_align)) {
- relaxed_align = 1ULL << (max_order + __ffs(SZ_1M));
- relaxed_align = max(relaxed_align, win_align);
- min_align = min(min_align, relaxed_align);
+ min_align = calculate_head_align(aligns2, max_order);
size1 = calculate_memsize(size, min_size, add_size, children_add_size,
old_size, win_align);
pci_info(bus->self,
--
2.39.5
* [PATCH 3/4] PCI: Stop over-estimating bridge window size
2025-11-28 11:50 [PATCH 0/4] PCI: Bridge window head alignment fix/rework Ilpo Järvinen
2025-11-28 11:50 ` [PATCH 1/4] PCI: Fix bridge window alignment with optional resources Ilpo Järvinen
2025-11-28 11:50 ` [PATCH 2/4] PCI: Rewrite bridge window head alignment function Ilpo Järvinen
@ 2025-11-28 11:50 ` Ilpo Järvinen
2025-11-28 11:50 ` [PATCH 4/4] resource: Increase MAX_IORES_LEVEL to 8 Ilpo Järvinen
3 siblings, 0 replies; 5+ messages in thread
From: Ilpo Järvinen @ 2025-11-28 11:50 UTC (permalink / raw)
To: linux-pci, Bjorn Helgaas, Benjamin Herrenschmidt, Wei Yang,
Malte Schröder, linux-kernel
Cc: Ilpo Järvinen
The new way to calculate the bridge window head alignment produces a
tight fit, that is, it does not leave any gaps between the resources.
Similarly, relaxed tail alignment does not leave extra tail room.
Start using a bridge window calculation that does not over-estimate
the size of the required window.
pbus_upstream_space_available() can be removed.
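The effect of the tail alignment on the window size can be sketched as
follows, assuming the last step of calculate_memsize() is a round-up of
the summed size to the alignment it is given. The helper name align_up
and the numbers in the usage note are illustrative, not taken from the
code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;	/* stand-in for the kernel type */

/* Round size up to a power-of-two alignment, like the kernel's ALIGN(). */
static resource_size_t align_up(resource_size_t size, resource_size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

For a summed size of 0x4900000, rounding up to a head alignment of
0x1000000 gives 0x5000000, whereas rounding up to a 1MB window
alignment (0x100000) keeps it at 0x4900000. The 0x700000 difference is
the kind of extra tail room that is no longer reserved.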
Tested-by: Malte Schröder <malte+lkml@tnxip.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
This is a relatively risky change when it comes to regressions. In static
setups things are likely okay (in my own testing, many systems had zero
differences or just one bridge window among many that was shrunk somewhat,
not resulting in any issue). In cases where resources are discovered later
(hotplug, pwrctrl, delayed enumeration, etc.) the difference might matter
more, if a reduced size results in resources not fitting. Those might be
addressable by providing the pci=hp*size=xx parameter, which is the
canonical way to prepare for the unknown, instead of relying on artifacts
of the bridge window alignment algorithm.
drivers/pci/setup-bus.c | 97 +++--------------------------------------
1 file changed, 5 insertions(+), 92 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 93f6b0750174..6f4bb2d19cc1 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1263,68 +1263,6 @@ static resource_size_t calculate_head_align(resource_size_t *aligns,
return head_align;
}
-/**
- * pbus_upstream_space_available - Check no upstream resource limits allocation
- * @bus: The bus
- * @res: The resource to help select the correct bridge window
- * @size: The size required from the bridge window
- * @align: Required alignment for the resource
- *
- * Check that @size can fit inside the upstream bridge resources that are
- * already assigned. Select the upstream bridge window based on the type of
- * @res.
- *
- * Return: %true if enough space is available on all assigned upstream
- * resources.
- */
-static bool pbus_upstream_space_available(struct pci_bus *bus,
- struct resource *res,
- resource_size_t size,
- resource_size_t align)
-{
- struct resource_constraint constraint = {
- .max = RESOURCE_SIZE_MAX,
- .align = align,
- };
- struct pci_bus *downstream = bus;
-
- while ((bus = bus->parent)) {
- if (pci_is_root_bus(bus))
- break;
-
- res = pbus_select_window(bus, res);
- if (!res)
- return false;
- if (!res->parent)
- continue;
-
- if (resource_size(res) >= size) {
- struct resource gap = {};
-
- if (find_resource_space(res, &gap, size, &constraint) == 0) {
- gap.flags = res->flags;
- pci_dbg(bus->self,
- "Assigned bridge window %pR to %pR free space at %pR\n",
- res, &bus->busn_res, &gap);
- return true;
- }
- }
-
- if (bus->self) {
- pci_info(bus->self,
- "Assigned bridge window %pR to %pR cannot fit 0x%llx required for %s bridging to %pR\n",
- res, &bus->busn_res,
- (unsigned long long)size,
- pci_name(downstream->self),
- &downstream->busn_res);
- }
-
- return false;
- }
-
- return true;
-}
-
/**
* pbus_size_mem() - Size the memory window of a given bus
*
@@ -1351,7 +1289,6 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
struct pci_dev *dev;
resource_size_t min_align, win_align, align, size, size0, size1 = 0;
resource_size_t aligns[28] = {}; /* Alignments from 1MB to 128TB */
- resource_size_t aligns2[28] = {};/* Alignments from 1MB to 128TB */
int order, max_order;
struct resource *b_res = pbus_select_window_for_type(bus, type);
resource_size_t children_add_size = 0;
@@ -1410,13 +1347,8 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
continue;
}
size += max(r_size, align);
- /*
- * Exclude ranges with size > align from calculation of
- * the alignment.
- */
- if (r_size <= align)
- aligns[order] += align;
- aligns2[order] += align;
+
+ aligns[order] += align;
if (order > max_order)
max_order = order;
@@ -1430,38 +1362,19 @@ static void pbus_size_mem(struct pci_bus *bus, unsigned long type,
old_size = resource_size(b_res);
win_align = window_alignment(bus, b_res->flags);
- min_align = calculate_mem_align(aligns, max_order);
+ min_align = calculate_head_align(aligns, max_order);
min_align = max(min_align, win_align);
- size0 = calculate_memsize(size, min_size, 0, 0, old_size, min_align);
+ size0 = calculate_memsize(size, min_size, 0, 0, old_size, win_align);
if (size0) {
resource_set_range(b_res, min_align, size0);
b_res->flags &= ~IORESOURCE_DISABLED;
}
- if (bus->self && size0 &&
- !pbus_upstream_space_available(bus, b_res, size0, min_align)) {
- min_align = calculate_head_align(aligns2, max_order);
- size0 = calculate_memsize(size, min_size, 0, 0, old_size, win_align);
- resource_set_range(b_res, min_align, size0);
- pci_info(bus->self, "bridge window %pR to %pR requires relaxed alignment rules\n",
- b_res, &bus->busn_res);
- }
-
if (realloc_head && (add_size > 0 || children_add_size > 0)) {
add_align = max(min_align, add_align);
size1 = calculate_memsize(size, min_size, add_size, children_add_size,
- old_size, add_align);
-
- if (bus->self && size1 &&
- !pbus_upstream_space_available(bus, b_res, size1, add_align)) {
- min_align = calculate_head_align(aligns2, max_order);
- size1 = calculate_memsize(size, min_size, add_size, children_add_size,
- old_size, win_align);
- pci_info(bus->self,
- "bridge window %pR to %pR requires relaxed alignment rules\n",
- b_res, &bus->busn_res);
- }
+ old_size, win_align);
}
if (!size0 && !size1) {
--
2.39.5
* [PATCH 4/4] resource: Increase MAX_IORES_LEVEL to 8
2025-11-28 11:50 [PATCH 0/4] PCI: Bridge window head alignment fix/rework Ilpo Järvinen
` (2 preceding siblings ...)
2025-11-28 11:50 ` [PATCH 3/4] PCI: Stop over-estimating bridge window size Ilpo Järvinen
@ 2025-11-28 11:50 ` Ilpo Järvinen
3 siblings, 0 replies; 5+ messages in thread
From: Ilpo Järvinen @ 2025-11-28 11:50 UTC (permalink / raw)
To: linux-pci, Bjorn Helgaas, Benjamin Herrenschmidt, Wei Yang,
Malte Schröder, linux-kernel
Cc: Ilpo Järvinen
While debugging a PCI resource allocation issue, the resources for many
nested bridges and endpoints got flattened in /proc/iomem by
MAX_IORES_LEVEL that is set to 5. This made the iomem output hard to
read as the visual hierarchy cues were lost.
Increase MAX_IORES_LEVEL to 8 to avoid flattening PCI topologies with
nested bridges so aggressively (the case in the Link has the deepest
resource at level 7, so 8 looks like a reasonable limit).
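The flattening can be illustrated with a small userspace sketch of how
an indentation depth gets clamped. It is a simplification, assuming the
depth is found by walking parent pointers up to the root with
MAX_IORES_LEVEL as the cap; struct res_node and indent_depth() are
hypothetical names for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a resource tree node; only the parent link matters. */
struct res_node {
	const struct res_node *parent;
};

/*
 * Count how many levels r sits below root's children, giving up once
 * max_level is reached. Anything deeper gets the same capped depth.
 */
static int indent_depth(const struct res_node *r, const struct res_node *root,
			int max_level)
{
	const struct res_node *p = r;
	int depth;

	assert(max_level > 0);

	for (depth = 0; depth < max_level; depth++, p = p->parent)
		if (p->parent == root)
			break;

	return depth;
}
```

For a chain of eight nested nodes below the root, the deepest node sits
at level 7. With a cap of 5 it is reported at depth 5, the same as the
nodes at levels 5 and 6, so the three can no longer be told apart by
indentation. With a cap of 8 it keeps its own depth of 7.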
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220775
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
kernel/resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index b9fa2a4ce089..c5c907b3236d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -82,7 +82,7 @@ static struct resource *next_resource(struct resource *p, bool skip_children,
#ifdef CONFIG_PROC_FS
-enum { MAX_IORES_LEVEL = 5 };
+enum { MAX_IORES_LEVEL = 8 };
static void *r_start(struct seq_file *m, loff_t *pos)
__acquires(resource_lock)
--
2.39.5