public inbox for linux-pci@vger.kernel.org
* [PATCH v3 0/2] PCI: Distribute resources for root buses
@ 2022-11-30 11:22 Mika Westerberg
  2022-11-30 11:22 ` [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Mika Westerberg @ 2022-11-30 11:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J . Wysocki, Andy Shevchenko, Jonathan Cameron,
	Lukas Wunner, Chris Chiu, linux-pci, Mika Westerberg

Hi all,

This is the third iteration of the patch series trying to solve the
problem reported by Chris Chiu [1]. In summary, the current resource
distribution code does not cover the initial device enumeration, so if
we find unconfigured bridges they get the bare minimum.

This one tries to be slightly more generic and deal with PCI devices in
addition to PCIe. I've tried this on a system with a Maple Ridge
Thunderbolt controller (the same as in the original bug report), and on
QEMU with a similar PCI topology using the following parameters:

	-device pcie-pci-bridge,id=br1					\
	-device e1000,bus=br1,addr=2					\
	-device pci-bridge,chassis_nr=1,bus=br1,shpc=off,id=br2,addr=3	\
	-device e1000,bus=br1,addr=4					\
	-device e1000,bus=br2

Then on a QEMU setup similar to what Jonathan used when he found the
regression with multifunction devices:

	-device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
	-device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
	-device e1000,bus=root_port13,addr=0.1				\
	-device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
	-device e1000,bus=fun1

The previous versions of the series can be found:

v2: https://lore.kernel.org/linux-pci/20221114115953.40236-1-mika.westerberg@linux.intel.com/
v1: https://lore.kernel.org/linux-pci/20221103103254.30497-1-mika.westerberg@linux.intel.com/

Changes from v2:
  * Make both patches work with PCI devices too (do not expect that
    the bridge is always the first device on the bus).
  * Allow distribution with bridges that do not have all resource
    windows programmed (thereofore the pathch 2/2 is not revert anymore)
  * I did not add the tags from Rafael and Jonathan because the code is
    not exactly the same anymore, so I was not sure if they still apply.

Changes from v1:
  * Re-worded the commit message to hopefully explain the problem better
  * Added Link: to the bug report
  * Update the comment according to Bjorn's suggestion
  * Dropped the ->multifunction check
  * Use %#llx in log format.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=216000

Mika Westerberg (2):
  PCI: Take other bus devices into account when distributing resources
  PCI: Distribute available resources for root buses too

 drivers/pci/setup-bus.c | 122 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 117 insertions(+), 5 deletions(-)

-- 
2.35.1



* [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
  2022-11-30 11:22 [PATCH v3 0/2] PCI: Distribute resources for root buses Mika Westerberg
@ 2022-11-30 11:22 ` Mika Westerberg
  2022-12-02 17:45   ` Jonathan Cameron
  2022-12-02 23:34   ` Bjorn Helgaas
  2022-11-30 11:22 ` [PATCH v3 2/2] PCI: Distribute available resources for root buses too Mika Westerberg
  2022-12-02 17:07 ` [PATCH v3 0/2] PCI: Distribute resources for root buses Jonathan Cameron
  2 siblings, 2 replies; 10+ messages in thread
From: Mika Westerberg @ 2022-11-30 11:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J . Wysocki, Andy Shevchenko, Jonathan Cameron,
	Lukas Wunner, Chris Chiu, linux-pci, Mika Westerberg

A PCI bridge may reside on a bus with other devices as well. The
resource distribution code does not take this into account properly and
therefore it expands the bridge resource windows too much, not leaving
space for the other devices (or functions a multifunction device) and
this leads to an issue that Jonathan reported. He runs QEMU with the
following topoology (QEMU parameters):

 -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
 -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
 -device e1000,bus=root_port13,addr=0.1 			\
 -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
 -device e1000,bus=fun1

The first e1000 NIC here is another function in the switch upstream
port. This leads to the following errors:

  pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
  pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
  pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
  e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]

Fix this by taking into account the possible multifunction devices when
uptream port resources are distributed.

Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/setup-bus.c | 66 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b4096598dbcb..d456175ddc4f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1830,10 +1830,68 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 	 * bridges below.
 	 */
 	if (hotplug_bridges + normal_bridges == 1) {
-		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
-		if (dev->subordinate)
-			pci_bus_distribute_available_resources(dev->subordinate,
-				add_list, io, mmio, mmio_pref);
+		bridge = NULL;
+
+		/* Find the single bridge on this bus first */
+		for_each_pci_bridge(dev, bus) {
+			bridge = dev;
+			break;
+		}
+
+		if (WARN_ON_ONCE(!bridge))
+			return;
+		if (!bridge->subordinate)
+			return;
+
+		/*
+		 * Reduce the space available for distribution by the
+		 * amount required by the other devices on the same bus
+		 * as this bridge.
+		 */
+		list_for_each_entry(dev, &bus->devices, bus_list) {
+			int i;
+
+			if (dev == bridge)
+				continue;
+
+			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+				const struct resource *dev_res = &dev->resource[i];
+				resource_size_t dev_sz;
+				struct resource *b_res;
+
+				if (dev_res->flags & IORESOURCE_IO) {
+					b_res = &io;
+				} else if (dev_res->flags & IORESOURCE_MEM) {
+					if (dev_res->flags & IORESOURCE_PREFETCH)
+						b_res = &mmio_pref;
+					else
+						b_res = &mmio;
+				} else {
+					continue;
+				}
+
+				/* Size aligned to bridge window */
+				align = pci_resource_alignment(bridge, b_res);
+				dev_sz = ALIGN(resource_size(dev_res), align);
+				if (!dev_sz)
+					continue;
+
+				pci_dbg(dev, "resource %pR aligned to %#llx\n",
+					dev_res, (unsigned long long)dev_sz);
+
+				if (dev_sz > resource_size(b_res))
+					memset(b_res, 0, sizeof(*b_res));
+				else
+					b_res->end -= dev_sz;
+
+				pci_dbg(bridge, "updated available resources to %pR\n",
+					b_res);
+			}
+		}
+
+		pci_bus_distribute_available_resources(bridge->subordinate,
+						       add_list, io, mmio,
+						       mmio_pref);
 		return;
 	}
 
-- 
2.35.1



* [PATCH v3 2/2] PCI: Distribute available resources for root buses too
  2022-11-30 11:22 [PATCH v3 0/2] PCI: Distribute resources for root buses Mika Westerberg
  2022-11-30 11:22 ` [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
@ 2022-11-30 11:22 ` Mika Westerberg
  2022-12-02 18:01   ` Jonathan Cameron
  2022-12-02 17:07 ` [PATCH v3 0/2] PCI: Distribute resources for root buses Jonathan Cameron
  2 siblings, 1 reply; 10+ messages in thread
From: Mika Westerberg @ 2022-11-30 11:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J . Wysocki, Andy Shevchenko, Jonathan Cameron,
	Lukas Wunner, Chris Chiu, linux-pci, Mika Westerberg

Previously we distributed spare resources only upon hot-add, so if the
initial root bus scan found devices that had not been fully configured by
the BIOS, we allocated only enough resources to cover what was then
present. If some of those devices were hotplug bridges, we did not leave
any additional resource space for future expansion.

Distribute the available resources for root buses, too, to make this work
the same way as the normal hotplug case.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216000
Link: https://lore.kernel.org/r/20220905080232.36087-5-mika.westerberg@linux.intel.com
Reported-by: Chris Chiu <chris.chiu@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
This is a new version of the patch after the revert due to the regression
reported by Jonathan Cameron. This one changes pci_bridge_resources_not_assigned()
to work with bridges that do not have all the resource windows
programmed by the boot firmware (previously we expected the I/O, memory
and prefetchable memory windows to all be programmed).
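
As a rough userspace model of that check (a sketch only, not the kernel
code): the SK_* names, the struct and the flag values below are
illustrative stand-ins, not the kernel definitions. A window with no
flags at all is skipped, and the bridge counts as firmware-assigned as
soon as one window has flags set without the start-align marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins; the values are assumptions for this sketch. */
#define SK_RES_IO		0x00000100UL
#define SK_RES_MEM		0x00000200UL
#define SK_RES_PREFETCH		0x00002000UL
#define SK_RES_STARTALIGN	0x00080000UL

enum {
	SK_IO_WINDOW,
	SK_MEM_WINDOW,
	SK_PREF_MEM_WINDOW,
	SK_NUM_WINDOWS,
};

/* Hypothetical model of a bridge: only the window flags matter here. */
struct sk_bridge {
	unsigned long flags[SK_NUM_WINDOWS];
};

/*
 * Return true when no window looks assigned. A window with flags == 0
 * is not present or not programmed and is ignored, so a bridge with
 * only some windows in use can still qualify.
 */
static bool sk_windows_not_assigned(const struct sk_bridge *br)
{
	size_t i;

	assert(br != NULL);

	for (i = 0; i < SK_NUM_WINDOWS; i++) {
		unsigned long f = br->flags[i];

		if (f && !(f & SK_RES_STARTALIGN))
			return false;
	}

	return true;
}
```

In this model a bridge with only a memory window that still carries the
start-align marker qualifies for distribution, while one whose memory
window has the marker cleared does not.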

 drivers/pci/setup-bus.c | 56 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index d456175ddc4f..143ec80cc0b2 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1768,7 +1768,10 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
 	}
 
 	res->end = res->start + new_size - 1;
-	remove_from_list(add_list, res);
+
+	/* If the resource is part of the add_list remove it now */
+	if (add_list)
+		remove_from_list(add_list, res);
 }
 
 static void pci_bus_distribute_available_resources(struct pci_bus *bus,
@@ -1981,6 +1984,8 @@ static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
 	if (!bridge->is_hotplug_bridge)
 		return;
 
+	pci_dbg(bridge, "distributing available resources\n");
+
 	/* Take the initial extra resources from the hotplug port */
 	available_io = bridge->resource[PCI_BRIDGE_IO_WINDOW];
 	available_mmio = bridge->resource[PCI_BRIDGE_MEM_WINDOW];
@@ -1992,6 +1997,53 @@ static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
 					       available_mmio_pref);
 }
 
+static bool pci_bridge_resources_not_assigned(struct pci_dev *dev)
+{
+	const struct resource *r;
+
+	/*
+	 * Check the child device's resources and if they are not yet
+	 * assigned it means we are configuring them (not the boot
+	 * firmware) so we should be able to extend the upstream
+	 * bridge's (that's the hotplug downstream PCIe port) resources
+	 * in the same way we do with the normal hotplug case.
+	 */
+	r = &dev->resource[PCI_BRIDGE_IO_WINDOW];
+	if (r->flags && !(r->flags & IORESOURCE_STARTALIGN))
+		return false;
+	r = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
+	if (r->flags && !(r->flags & IORESOURCE_STARTALIGN))
+		return false;
+	r = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
+	if (r->flags && !(r->flags & IORESOURCE_STARTALIGN))
+		return false;
+
+	return true;
+}
+
+static void pci_root_bus_distribute_available_resources(struct pci_bus *bus,
+							struct list_head *add_list)
+{
+	struct pci_dev *dev, *bridge = bus->self;
+
+	for_each_pci_bridge(dev, bus) {
+		struct pci_bus *b;
+
+		b = dev->subordinate;
+		if (!b)
+			continue;
+
+		/*
+		 * Need to check "bridge" here too because it is NULL
+		 * in case of root bus.
+		 */
+		if (bridge && pci_bridge_resources_not_assigned(dev))
+			pci_bridge_distribute_available_resources(bridge, add_list);
+		else
+			pci_root_bus_distribute_available_resources(b, add_list);
+	}
+}
+
 /*
  * First try will not touch PCI bridge res.
  * Second and later try will clear small leaf bridge res.
@@ -2031,6 +2083,8 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	 */
 	__pci_bus_size_bridges(bus, add_list);
 
+	pci_root_bus_distribute_available_resources(bus, add_list);
+
 	/* Depth last, allocate resources and update the hardware. */
 	__pci_bus_assign_resources(bus, add_list, &fail_head);
 	if (add_list)
-- 
2.35.1



* Re: [PATCH v3 0/2] PCI: Distribute resources for root buses
  2022-11-30 11:22 [PATCH v3 0/2] PCI: Distribute resources for root buses Mika Westerberg
  2022-11-30 11:22 ` [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
  2022-11-30 11:22 ` [PATCH v3 2/2] PCI: Distribute available resources for root buses too Mika Westerberg
@ 2022-12-02 17:07 ` Jonathan Cameron
  2 siblings, 0 replies; 10+ messages in thread
From: Jonathan Cameron @ 2022-12-02 17:07 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko, Lukas Wunner,
	Chris Chiu, linux-pci

On Wed, 30 Nov 2022 13:22:19 +0200
Mika Westerberg <mika.westerberg@linux.intel.com> wrote:

> Hi all,
> 
> This is the third iteration of the patch series trying to solve the
> problem reported by Chris Chiu [1]. In summary, the current resource
> distribution code does not cover the initial device enumeration, so if
> we find unconfigured bridges they get the bare minimum.
> 
> This one tries to be slightly more generic and deal with PCI devices in
> addition to PCIe. I've tried this on a system with a Maple Ridge
> Thunderbolt controller (the same as in the original bug report), and on
> QEMU with a similar PCI topology using the following parameters:
> 
> 	-device pcie-pci-bridge,id=br1					\
> 	-device e1000,bus=br1,addr=2					\
> 	-device pci-bridge,chassis_nr=1,bus=br1,shpc=off,id=br2,addr=3	\
> 	-device e1000,bus=br1,addr=4					\
> 	-device e1000,bus=br2
> 
> Then on a QEMU setup similar to what Jonathan used when he found the
> regression with multifunction devices:
> 
> 	-device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
> 	-device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
> 	-device e1000,bus=root_port13,addr=0.1				\
> 	-device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
> 	-device e1000,bus=fun1
> 
> The previous versions of the series can be found:
> 
> v2: https://lore.kernel.org/linux-pci/20221114115953.40236-1-mika.westerberg@linux.intel.com/
> v1: https://lore.kernel.org/linux-pci/20221103103254.30497-1-mika.westerberg@linux.intel.com/
> 
> Changes from v2:
>   * Make both patches work with PCI devices too (do not expect that
>     the bridge is always the first device on the bus).
>   * Allow distribution with bridges that do not have all resource
>     windows programmed (thereofore the pathch 2/2 is not revert anymore)

patch

>   * I did not add the tags from Rafael and Jonathan because the code is
>     not exactly the same anymore, so I was not sure if they still apply.

Fair enough - guess it's time for another look.

> 
> Changes from v1:
>   * Re-worded the commit message to hopefully explain the problem better
>   * Added Link: to the bug report
>   * Update the comment according to Bjorn's suggestion
>   * Dropped the ->multifunction check
>   * Use %#llx in log format.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=216000
> 
> Mika Westerberg (2):
>   PCI: Take other bus devices into account when distributing resources
>   PCI: Distribute available resources for root buses too
> 
>  drivers/pci/setup-bus.c | 122 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 117 insertions(+), 5 deletions(-)
> 



* Re: [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
  2022-11-30 11:22 ` [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
@ 2022-12-02 17:45   ` Jonathan Cameron
  2022-12-02 23:35     ` Bjorn Helgaas
  2022-12-02 23:34   ` Bjorn Helgaas
  1 sibling, 1 reply; 10+ messages in thread
From: Jonathan Cameron @ 2022-12-02 17:45 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko, Lukas Wunner,
	Chris Chiu, linux-pci

On Wed, 30 Nov 2022 13:22:20 +0200
Mika Westerberg <mika.westerberg@linux.intel.com> wrote:

> A PCI bridge may reside on a bus with other devices as well. The
> resource distribution code does not take this into account properly and
> therefore it expands the bridge resource windows too much, not leaving
> space for the other devices (or functions a multifunction device) and
> this leads to an issue that Jonathan reported. He runs QEMU with the
> following topoology (QEMU parameters):
> 
>  -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
>  -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
>  -device e1000,bus=root_port13,addr=0.1 			\
>  -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
>  -device e1000,bus=fun1
> 
> The first e1000 NIC here is another function in the switch upstream
> port. This leads to the following errors:
> 
>   pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
>   pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
>   pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
>   e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> 
> Fix this by taking into account the possible multifunction devices when
> uptream port resources are distributed.
> 
> Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Trivial comment inline. Either way..

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/pci/setup-bus.c | 66 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index b4096598dbcb..d456175ddc4f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1830,10 +1830,68 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	 * bridges below.
>  	 */
>  	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate)
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, io, mmio, mmio_pref);
> +		bridge = NULL;
> +
> +		/* Find the single bridge on this bus first */

> +		for_each_pci_bridge(dev, bus) {

We could cache this a few lines up where we calculate the
number of bridges. Perhaps not worth bothering though other
than it letting you get rid of the WARN_ON_ONCE. 


> +			bridge = dev;
> +			break;
> +		}
> +
> +		if (WARN_ON_ONCE(!bridge))
> +			return;
> +		if (!bridge->subordinate)
> +			return;
> +
> +		/*
> +		 * Reduce the space available for distribution by the
> +		 * amount required by the other devices on the same bus
> +		 * as this bridge.
> +		 */
> +		list_for_each_entry(dev, &bus->devices, bus_list) {
> +			int i;
> +
> +			if (dev == bridge)
> +				continue;
> +
> +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +				const struct resource *dev_res = &dev->resource[i];
> +				resource_size_t dev_sz;
> +				struct resource *b_res;
> +
> +				if (dev_res->flags & IORESOURCE_IO) {
> +					b_res = &io;
> +				} else if (dev_res->flags & IORESOURCE_MEM) {
> +					if (dev_res->flags & IORESOURCE_PREFETCH)
> +						b_res = &mmio_pref;
> +					else
> +						b_res = &mmio;
> +				} else {
> +					continue;
> +				}
> +
> +				/* Size aligned to bridge window */
> +				align = pci_resource_alignment(bridge, b_res);
> +				dev_sz = ALIGN(resource_size(dev_res), align);
> +				if (!dev_sz)
> +					continue;
> +
> +				pci_dbg(dev, "resource %pR aligned to %#llx\n",
> +					dev_res, (unsigned long long)dev_sz);
> +
> +				if (dev_sz > resource_size(b_res))
> +					memset(b_res, 0, sizeof(*b_res));
> +				else
> +					b_res->end -= dev_sz;
> +
> +				pci_dbg(bridge, "updated available resources to %pR\n",
> +					b_res);
> +			}
> +		}
> +
> +		pci_bus_distribute_available_resources(bridge->subordinate,
> +						       add_list, io, mmio,
> +						       mmio_pref);
>  		return;
>  	}
>  



* Re: [PATCH v3 2/2] PCI: Distribute available resources for root buses too
  2022-11-30 11:22 ` [PATCH v3 2/2] PCI: Distribute available resources for root buses too Mika Westerberg
@ 2022-12-02 18:01   ` Jonathan Cameron
  0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Cameron @ 2022-12-02 18:01 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko, Lukas Wunner,
	Chris Chiu, linux-pci

On Wed, 30 Nov 2022 13:22:21 +0200
Mika Westerberg <mika.westerberg@linux.intel.com> wrote:

> Previously we distributed spare resources only upon hot-add, so if the
> initial root bus scan found devices that had not been fully configured by
> the BIOS, we allocated only enough resources to cover what was then
> present. If some of those devices were hotplug bridges, we did not leave
> any additional resource space for future expansion.
> 
> Distribute the available resources for root buses, too, to make this work
> the same way as the normal hotplug case.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216000
> Link: https://lore.kernel.org/r/20220905080232.36087-5-mika.westerberg@linux.intel.com
> Reported-by: Chris Chiu <chris.chiu@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
> This is a new version of the patch after the revert due to the regression
> reported by Jonathan Cameron. This one changes pci_bridge_resources_not_assigned()
> to work with bridges that do not have all the resource windows
> programmed by the boot firmware (previously we expected the I/O, memory
> and prefetchable memory windows to all be programmed).
> 

Whilst this sounds plausible my understanding of how those flags are used
is very minimal so I'll leave this one for others to review who hopefully
already know how that works!


>  drivers/pci/setup-bus.c | 56 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index d456175ddc4f..143ec80cc0b2 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1768,7 +1768,10 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
>  	}
>  
>  	res->end = res->start + new_size - 1;
> -	remove_from_list(add_list, res);
> +
> +	/* If the resource is part of the add_list remove it now */
> +	if (add_list)
> +		remove_from_list(add_list, res);
>  }
>  
>  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> @@ -1981,6 +1984,8 @@ static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
>  	if (!bridge->is_hotplug_bridge)
>  		return;
>  
> +	pci_dbg(bridge, "distributing available resources\n");
> +
>  	/* Take the initial extra resources from the hotplug port */
>  	available_io = bridge->resource[PCI_BRIDGE_IO_WINDOW];
>  	available_mmio = bridge->resource[PCI_BRIDGE_MEM_WINDOW];
> @@ -1992,6 +1997,53 @@ static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
>  					       available_mmio_pref);
>  }
>  
> +static bool pci_bridge_resources_not_assigned(struct pci_dev *dev)
> +{
> +	const struct resource *r;
> +
> +	/*
> +	 * Check the child device's resources and if they are not yet
> +	 * assigned it means we are configuring them (not the boot
> +	 * firmware) so we should be able to extend the upstream
> +	 * bridge's (that's the hotplug downstream PCIe port) resources
> +	 * in the same way we do with the normal hotplug case.
> +	 */
> +	r = &dev->resource[PCI_BRIDGE_IO_WINDOW];
> +	if (r->flags && !(r->flags & IORESOURCE_STARTALIGN))
> +		return false;
> +	r = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
> +	if (r->flags && !(r->flags & IORESOURCE_STARTALIGN))
> +		return false;
> +	r = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
> +	if (r->flags && !(r->flags & IORESOURCE_STARTALIGN))
> +		return false;
> +
> +	return true;
> +}



* Re: [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
  2022-11-30 11:22 ` [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
  2022-12-02 17:45   ` Jonathan Cameron
@ 2022-12-02 23:34   ` Bjorn Helgaas
  2022-12-05  7:28     ` Mika Westerberg
  1 sibling, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2022-12-02 23:34 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko,
	Jonathan Cameron, Lukas Wunner, Chris Chiu, linux-pci

Hi Mika,

On Wed, Nov 30, 2022 at 01:22:20PM +0200, Mika Westerberg wrote:
> A PCI bridge may reside on a bus with other devices as well. The
> resource distribution code does not take this into account properly and
> therefore it expands the bridge resource windows too much, not leaving
> space for the other devices (or functions a multifunction device) and

functions *of* a 

> this leads to an issue that Jonathan reported. He runs QEMU with the
> following topoology (QEMU parameters):

topology

>  -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
>  -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
>  -device e1000,bus=root_port13,addr=0.1 			\
>  -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
>  -device e1000,bus=fun1

If you use spaces instead of tabs above, the "\" will stay lined up
when git log indents.

> The first e1000 NIC here is another function in the switch upstream
> port. This leads to the following errors:
> 
>   pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
>   pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
>   pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
>   e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> 
> Fix this by taking into account the possible multifunction devices when
> uptream port resources are distributed.

"upstream", although I think I would word this so it's less
PCIe-centric.  IIUC, we just want to account for all the BARs on the
bus, whether they're in bridges, peers in a multi-function device, or
other devices.

> Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/setup-bus.c | 66 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index b4096598dbcb..d456175ddc4f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1830,10 +1830,68 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	 * bridges below.
>  	 */
>  	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate)
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, io, mmio, mmio_pref);
> +		bridge = NULL;
> +
> +		/* Find the single bridge on this bus first */
> +		for_each_pci_bridge(dev, bus) {
> +			bridge = dev;
> +			break;
> +		}

If we just remember "bridge" in the loop before this hunk, could we
get rid of the loop here?  E.g.,

  bridge = NULL;
  for_each_pci_bridge(dev, bus) {
    bridge = dev;
    if (dev->is_hotplug_bridge)
      hotplug_bridges++;
    else
      normal_bridges++;
  }

> +
> +		if (WARN_ON_ONCE(!bridge))
> +			return;

Then I think this would be superfluous.

> +		if (!bridge->subordinate)
> +			return;
> +
> +		/*
> +		 * Reduce the space available for distribution by the
> +		 * amount required by the other devices on the same bus
> +		 * as this bridge.
> +		 */
> +		list_for_each_entry(dev, &bus->devices, bus_list) {
> +			int i;
> +
> +			if (dev == bridge)
> +				continue;

Why do we skip "bridge"?  Bridges are allowed to have two BARs
themselves, and it seems like they should be included here.

> +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +				const struct resource *dev_res = &dev->resource[i];
> +				resource_size_t dev_sz;
> +				struct resource *b_res;
> +
> +				if (dev_res->flags & IORESOURCE_IO) {
> +					b_res = &io;
> +				} else if (dev_res->flags & IORESOURCE_MEM) {
> +					if (dev_res->flags & IORESOURCE_PREFETCH)
> +						b_res = &mmio_pref;
> +					else
> +						b_res = &mmio;
> +				} else {
> +					continue;
> +				}
> +
> +				/* Size aligned to bridge window */
> +				align = pci_resource_alignment(bridge, b_res);
> +				dev_sz = ALIGN(resource_size(dev_res), align);
> +				if (!dev_sz)
> +					continue;
> +
> +				pci_dbg(dev, "resource %pR aligned to %#llx\n",
> +					dev_res, (unsigned long long)dev_sz);
> +
> +				if (dev_sz > resource_size(b_res))
> +					memset(b_res, 0, sizeof(*b_res));
> +				else
> +					b_res->end -= dev_sz;
> +
> +				pci_dbg(bridge, "updated available resources to %pR\n",
> +					b_res);
> +			}
> +		}

This only happens for buses with a single bridge.  Shouldn't it happen
regardless of how many bridges there are?

This block feels like something that could be split out to a separate
function.  It looks like it only needs "bus", "io", "mmio",
"mmio_pref", and maybe "bridge".

I don't understand the "bridge" part; it looks like that's basically
to use 4K alignment for I/O windows and 1M for memory windows?
Using "bridge" seems like a clunky way to figure that out.
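
For illustration only, the arithmetic of the quoted block can be
modelled outside the kernel. This is a userspace sketch under stated
assumptions, not the kernel implementation: struct sk_window,
sk_align_up() and sk_reserve() are hypothetical names, and the window
alignment (assumed 1M for memory) is passed in explicitly instead of
being derived from "bridge". The numbers are taken from the failure log
quoted above (a 0x20000-byte BAR 0 and the window
[mem 0x10200000-0x103fffff]).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a bridge window; "end" is inclusive. */
struct sk_window {
	uint64_t start;
	uint64_t end;
};

static uint64_t sk_window_size(const struct sk_window *w)
{
	return w->end - w->start + 1;
}

/* Round "value" up to a power-of-two alignment. */
static uint64_t sk_align_up(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + align - 1) & ~(align - 1);
}

/*
 * Take the space needed by one device resource out of the window that
 * is still available for distribution. Returns the aligned size that
 * was accounted for (0 if the resource is empty).
 */
static uint64_t sk_reserve(struct sk_window *avail, uint64_t res_size,
			   uint64_t align)
{
	uint64_t sz = sk_align_up(res_size, align);

	if (!sz)
		return 0;

	if (sz > sk_window_size(avail))
		memset(avail, 0, sizeof(*avail));	/* nothing left */
	else
		avail->end -= sz;			/* shrink from the top */

	return sz;
}
```

With the window from the log, sk_reserve(&mmio, 0x20000, 0x100000)
accounts for 0x100000 bytes and leaves [0x10200000-0x102fffff] to be
distributed below the bridge, which is the space the e1000 function
failed to get in the report.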

Bjorn


* Re: [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
  2022-12-02 17:45   ` Jonathan Cameron
@ 2022-12-02 23:35     ` Bjorn Helgaas
  0 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2022-12-02 23:35 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mika Westerberg, Bjorn Helgaas, Rafael J . Wysocki,
	Andy Shevchenko, Lukas Wunner, Chris Chiu, linux-pci

On Fri, Dec 02, 2022 at 05:45:13PM +0000, Jonathan Cameron wrote:
> On Wed, 30 Nov 2022 13:22:20 +0200
> Mika Westerberg <mika.westerberg@linux.intel.com> wrote:

> >  	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate)
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, io, mmio, mmio_pref);
> > +		bridge = NULL;
> > +
> > +		/* Find the single bridge on this bus first */
> 
> > +		for_each_pci_bridge(dev, bus) {
> 
> We could cache this a few lines up where we calculate the
> number of bridges. Perhaps not worth bothering though other
> than it letting you get rid of the WARN_ON_ONCE. 

Sorry for repeating this; I saw your response, but it didn't sink in
before I responded.

Bjorn


* Re: [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
  2022-12-02 23:34   ` Bjorn Helgaas
@ 2022-12-05  7:28     ` Mika Westerberg
  2022-12-05 22:46       ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Mika Westerberg @ 2022-12-05  7:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko,
	Jonathan Cameron, Lukas Wunner, Chris Chiu, linux-pci

Hi,

On Fri, Dec 02, 2022 at 05:34:24PM -0600, Bjorn Helgaas wrote:
> Hi Mika,
> 
> On Wed, Nov 30, 2022 at 01:22:20PM +0200, Mika Westerberg wrote:
> > A PCI bridge may reside on a bus with other devices as well. The
> > resource distribution code does not take this into account properly and
> > therefore it expands the bridge resource windows too much, not leaving
> > space for the other devices (or functions a multifunction device) and
> 
> functions *of* a 
> 
> > this leads to an issue that Jonathan reported. He runs QEMU with the
> > following topoology (QEMU parameters):
> 
> topology
> 
> >  -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
> >  -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
> >  -device e1000,bus=root_port13,addr=0.1 			\
> >  -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
> >  -device e1000,bus=fun1
> 
> If you use spaces instead of tabs above, the "\" will stay lined up
> when git log indents.

Sure.

> > The first e1000 NIC here is another function in the switch upstream
> > port. This leads to following errors:
> > 
> >   pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> >   pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> >   pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> >   e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> > 
> > Fix this by taking into account the possible multifunction devices when
> > uptream port resources are distributed.
> 
> "upstream", although I think I would word this so it's less
> PCIe-centric.  IIUC, we just want to account for all the BARs on the
> bus, whether they're in bridges, peers in a multi-function device, or
> other devices.

Okay.

> > Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> > Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> >  drivers/pci/setup-bus.c | 66 ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 62 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index b4096598dbcb..d456175ddc4f 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1830,10 +1830,68 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  	 * bridges below.
> >  	 */
> >  	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate)
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, io, mmio, mmio_pref);
> > +		bridge = NULL;
> > +
> > +		/* Find the single bridge on this bus first */
> > +		for_each_pci_bridge(dev, bus) {
> > +			bridge = dev;
> > +			break;
> > +		}
> 
> If we just remember "bridge" in the loop before this hunk, could we
> get rid of the loop here?  E.g.,
> 
>   bridge = NULL;
>   for_each_pci_bridge(dev, bus) {
>     bridge = dev;
>     if (dev->is_hotplug_bridge)
>       hotplug_bridges++;
>     else
>       normal_bridges++;
>   }

Yes, I think that would work too.

> > +
> > +		if (WARN_ON_ONCE(!bridge))
> > +			return;
> 
> Then I think this would be superfluous.
> 
> > +		if (!bridge->subordinate)
> > +			return;
> > +
> > +		/*
> > +		 * Reduce the space available for distribution by the
> > +		 * amount required by the other devices on the same bus
> > +		 * as this bridge.
> > +		 */
> > +		list_for_each_entry(dev, &bus->devices, bus_list) {
> > +			int i;
> > +
> > +			if (dev == bridge)
> > +				continue;
> 
> Why do we skip "bridge"?  Bridges are allowed to have two BARs
> themselves, and it seems like they should be included here.

Good point, but then we would need to skip the bridge window
resources below to avoid accounting for them.
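
For illustration only (this is not part of the patch): one way to read
the point above is that if the bridge itself is included in the
accounting, only its own BARs and ROM should be counted, and the slots
that hold its forwarding windows should be skipped. The sketch below
models that in plain user-space C. The model_* names and the slot
layout are assumptions made up for the example; in the kernel the
first bridge window slot is PCI_BRIDGE_RESOURCES.

```c
#include <stdint.h>

/* Hypothetical slot layout, loosely modeled on pci_dev->resource[]. */
enum {
	MODEL_BAR0,			/* the device's own BARs */
	MODEL_BAR1,
	MODEL_ROM,
	MODEL_BRIDGE_WINDOWS,		/* first bridge window slot */
	MODEL_IO_WINDOW = MODEL_BRIDGE_WINDOWS,
	MODEL_MEM_WINDOW,
	MODEL_PREF_WINDOW,
	MODEL_NUM_RESOURCES
};

struct model_dev {
	uint64_t size[MODEL_NUM_RESOURCES];	/* 0 means "slot unused" */
};

/*
 * Sum the space a device needs for itself on its bus. The loop stops
 * before the bridge window slots, so a bridge contributes its BARs
 * but not the windows it forwards to its secondary bus.
 */
static uint64_t model_own_space(const struct model_dev *dev)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < MODEL_BRIDGE_WINDOWS; i++)
		total += dev->size[i];

	return total;
}
```

With this bound the "dev == bridge" special case is not needed in the
model: a bridge with a 4K BAR and a 2M memory window contributes only
the 4K.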

> > +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> > +				const struct resource *dev_res = &dev->resource[i];
> > +				resource_size_t dev_sz;
> > +				struct resource *b_res;
> > +
> > +				if (dev_res->flags & IORESOURCE_IO) {
> > +					b_res = &io;
> > +				} else if (dev_res->flags & IORESOURCE_MEM) {
> > +					if (dev_res->flags & IORESOURCE_PREFETCH)
> > +						b_res = &mmio_pref;
> > +					else
> > +						b_res = &mmio;
> > +				} else {
> > +					continue;
> > +				}
> > +
> > +				/* Size aligned to bridge window */
> > +				align = pci_resource_alignment(bridge, b_res);
> > +				dev_sz = ALIGN(resource_size(dev_res), align);
> > +				if (!dev_sz)
> > +					continue;
> > +
> > +				pci_dbg(dev, "resource %pR aligned to %#llx\n",
> > +					dev_res, (unsigned long long)dev_sz);
> > +
> > +				if (dev_sz > resource_size(b_res))
> > +					memset(b_res, 0, sizeof(*b_res));
> > +				else
> > +					b_res->end -= dev_sz;
> > +
> > +				pci_dbg(bridge, "updated available resources to %pR\n",
> > +					b_res);
> > +			}
> > +		}
> 
> This only happens for buses with a single bridge.  Shouldn't it happen
> regardless of how many bridges there are?

This branch specifically deals with the "upstream port", so it gives all
the spare resources to that upstream port. The whole resource
distribution is actually done to accommodate Thunderbolt/USB4
topologies, which involve only PCIe devices, so we always have a PCIe
upstream port and downstream ports, some of which are able to
perform native PCIe hotplug. For those ports we want to distribute
the available resources so that they can expand to further topologies.

I'm slightly concerned that forcing this to support the "generic" PCI
case makes this rather complicated. This is something that never appears
on regular PCI-based systems because we never distribute resources
for those in the first place (->is_hotplug_bridge needs to be set).

> This block feels like something that could be split out to a separate
> function.  It looks like it only needs "bus", "io", "mmio",
> "mmio_pref", and maybe "bridge".

Makes sense.
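
For illustration only, here is roughly what such a split-out helper
could look like, written as a user-space model rather than kernel
code. All model_* names are hypothetical, windows are kept as
start/size pairs instead of struct resource, and the per-type
alignment is passed in by the caller instead of being derived from
the bridge.

```c
#include <stdint.h>

enum model_type { MODEL_IO, MODEL_MEM, MODEL_MEM_PREF, MODEL_NUM_TYPES };

/* Space still available for distribution, one entry per window type. */
struct model_window {
	uint64_t start;
	uint64_t size;
};

/* One resource needed by a device on the same bus as the bridge. */
struct model_res {
	enum model_type type;
	uint64_t size;
};

/* Round size up to a multiple of align (align is a power of two). */
static uint64_t model_align_up(uint64_t size, uint64_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/*
 * Reduce the available space by what the peer resources need, each
 * rounded up to the window alignment of its type. A window that
 * cannot cover the request is emptied instead of underflowing.
 */
static void model_remove_peer_resources(struct model_window *avail,
					const uint64_t *align,
					const struct model_res *res,
					int nres)
{
	int i;

	for (i = 0; i < nres; i++) {
		struct model_window *w = &avail[res[i].type];
		uint64_t need;

		if (!res[i].size)
			continue;

		need = model_align_up(res[i].size, align[res[i].type]);
		if (need > w->size)
			w->size = 0;
		else
			w->size -= need;	/* taken from the end */
	}
}
```

Assuming a 1M memory window alignment, the e1000 BAR from the log
above (size 0x00020000) rounds up to 0x100000, so the 2M window
[mem 0x10200000-0x103fffff] would keep 1M for distribution below the
bridge.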

> I don't understand the "bridge" part; it looks like that's basically
> to use 4K alignment for I/O windows and 1M for memory windows?
> Using "bridge" seems like a clunky way to figure that out.

Okay, but if not using "bridge", how exactly do you suggest doing the
calculation?


* Re: [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
  2022-12-05  7:28     ` Mika Westerberg
@ 2022-12-05 22:46       ` Bjorn Helgaas
  0 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2022-12-05 22:46 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko,
	Jonathan Cameron, Lukas Wunner, Chris Chiu, linux-pci

On Mon, Dec 05, 2022 at 09:28:30AM +0200, Mika Westerberg wrote:
> On Fri, Dec 02, 2022 at 05:34:24PM -0600, Bjorn Helgaas wrote:
> > On Wed, Nov 30, 2022 at 01:22:20PM +0200, Mika Westerberg wrote:
> > > A PCI bridge may reside on a bus with other devices as well. The
> > > resource distribution code does not take this into account properly and
> > > therefore it expands the bridge resource windows too much, not leaving
> > > space for the other devices (or functions a multifunction device) and

> > > +		 * Reduce the space available for distribution by the
> > > +		 * amount required by the other devices on the same bus
> > > +		 * as this bridge.
> > > +		 */
> > > +		list_for_each_entry(dev, &bus->devices, bus_list) {
> > > +			int i;
> > > +
> > > +			if (dev == bridge)
> > > +				continue;
> > 
> > Why do we skip "bridge"?  Bridges are allowed to have two BARs
> > themselves, and it seems like they should be included here.
> 
> Good point, but then we would need to skip the bridge window
> resources below to avoid accounting for them.

Seems like we should handle bridge BARs.  There are definitely bridges
(PCIe for sure, I dunno about conventional PCI) that implement them,
and some drivers are starting to appear that use them for performance
monitoring, etc.

> > This only happens for buses with a single bridge.  Shouldn't it happen
> > regardless of how many bridges there are?
> 
> This branch specifically deals with the "upstream port", so it gives all
> the spare resources to that upstream port. The whole resource
> distribution is actually done to accommodate Thunderbolt/USB4
> topologies, which involve only PCIe devices, so we always have a PCIe
> upstream port and downstream ports, some of which are able to
> perform native PCIe hotplug. For those ports we want to distribute
> the available resources so that they can expand to further topologies.
> 
> I'm slightly concerned that forcing this to support the "generic" PCI
> case makes this rather complicated. This is something that never appears
> on regular PCI-based systems because we never distribute resources
> for those in the first place (->is_hotplug_bridge needs to be set).

This code is fairly complicated in any case :)

I understand why this is useful for Thunderbolt topologies, but it
should be equally useful for other hotplug topologies because at this
level we're purely talking about the address space needed by devices
and how that space is assigned and routed through bridges.  Nothing
unique to Thunderbolt here.

I don't think we should make this PCIe-specific.  ->is_hotplug_bridge
is set by a PCIe path (set_pcie_hotplug_bridge()), but also by
check_hotplug_bridge() in acpiphp, which could be any flavor of PCI,
and I don't think there's anything intrinsically PCIe-specific about
it.

> > I don't understand the "bridge" part; it looks like that's basically
> > to use 4K alignment for I/O windows and 1M for memory windows?
> > Using "bridge" seems like a clunky way to figure that out.
> 
> Okay, but if not using "bridge", how exactly do you suggest doing the
> calculation?

I was thinking it would always be 4K or 1M, but I guess that's
actually not true.  There are some Intel bridges that support 1K
alignment for I/O windows, and some powerpc hypervisor stuff that can
also influence the alignment.  And it looks like we still need to
figure out which b_res to use, so we couldn't get rid of the IO/MEM
case analysis.  So never mind, I guess ...
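
For illustration only, the alignment point can be made concrete with
a small user-space sketch (not kernel code). It assumes the
granularities mentioned above, 4K for I/O windows and 1M for memory
windows, plus a hypothetical io_1k flag standing in for bridges that
support 1K I/O windows. The powerpc case is not modeled, and all
model_* names are made up for the example.

```c
#include <stdint.h>

enum model_kind { MODEL_KIND_IO, MODEL_KIND_MEM };

struct model_bridge {
	int io_1k;	/* hypothetical: bridge supports 1K I/O windows */
};

/* Window granularity for this bridge and resource kind. */
static uint64_t model_window_align(const struct model_bridge *bridge,
				   enum model_kind kind)
{
	if (kind == MODEL_KIND_MEM)
		return 0x100000;			/* 1M */

	return bridge->io_1k ? 0x400 : 0x1000;		/* 1K or 4K */
}

/* Round size up to a multiple of align (align is a power of two). */
static uint64_t model_round_up(uint64_t size, uint64_t align)
{
	return (size + align - 1) & ~(align - 1);
}
```

In this model a 32-byte I/O BAR costs 4K of window space behind an
ordinary bridge but only 1K behind a bridge with io_1k set, which is
why the granularity cannot simply be hard-coded.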

Bjorn


end of thread, other threads:[~2022-12-05 22:46 UTC | newest]

Thread overview: 10+ messages
2022-11-30 11:22 [PATCH v3 0/2] PCI: Distribute resources for root buses Mika Westerberg
2022-11-30 11:22 ` [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
2022-12-02 17:45   ` Jonathan Cameron
2022-12-02 23:35     ` Bjorn Helgaas
2022-12-02 23:34   ` Bjorn Helgaas
2022-12-05  7:28     ` Mika Westerberg
2022-12-05 22:46       ` Bjorn Helgaas
2022-11-30 11:22 ` [PATCH v3 2/2] PCI: Distribute available resources for root buses too Mika Westerberg
2022-12-02 18:01   ` Jonathan Cameron
2022-12-02 17:07 ` [PATCH v3 0/2] PCI: Distribute resources for root buses Jonathan Cameron
