All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Alexander Motin <mav@ixsystems.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Lukas Wunner <lukas@wunner.de>,
	Chris Chiu <chris.chiu@canonical.com>,
	Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources
Date: Tue, 10 Jan 2023 11:54:28 +0200	[thread overview]
Message-ID: <Y7011Eg89E1cnoSU@black.fi.intel.com> (raw)
In-Reply-To: <7bdef8bb-7a5e-2b5d-35ba-56cefb38d91f@ixsystems.com>

Hi,

On Mon, Jan 09, 2023 at 02:33:35PM -0500, Alexander Motin wrote:
> On 04.01.2023 04:16, Mika Westerberg wrote:
> > A PCI bridge may reside on a bus with other devices as well. The
> > resource distribution code does not take this into account properly and
> > therefore it expands the bridge resource windows too much, not leaving
> > space for the other devices (or functions of a multifunction device) and
> > this leads to an issue that Jonathan reported. He runs QEMU with the
> > following topology (QEMU parameters):
> > 
> >   -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2  \
> >   -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \
> >   -device e1000,bus=root_port13,addr=0.1                         \
> >   -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3    \
> >   -device e1000,bus=fun1
> > 
> > The first e1000 NIC here is another function in the switch upstream
> > port. This leads to following errors:
> > 
> >    pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> >    pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> >    pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> >    e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> > 
> > Fix this by taking into account the device BARs on the bus, including
> > the ones belonging to bridges themselves.
> > 
> > Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> > Link: https://lore.kernel.org/linux-pci/6053736d-1923-41e7-def9-7585ce1772d9@ixsystems.com/
> > Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reported-by: Alexander Motin <mav@ixsystems.com>
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> >   drivers/pci/setup-bus.c | 205 +++++++++++++++++++++++++---------------
> >   1 file changed, 129 insertions(+), 76 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index b4096598dbcb..cf6a7bdf2427 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1750,6 +1750,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >   				 resource_size_t new_size)
> >   {
> >   	resource_size_t add_size, size = resource_size(res);
> > +	resource_size_t min_size;
> >   	if (res->parent)
> >   		return;
> > @@ -1757,30 +1758,87 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >   	if (!new_size)
> >   		return;
> > +	/* Minimum granularity of a bridge bridge window */
> > +	min_size = window_alignment(bridge->bus, res->flags);
> > +
> >   	if (new_size > size) {
> >   		add_size = new_size - size;
> > +		if (add_size < min_size)
> > +			return;
> >   		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> >   			&add_size);
> >   	} else if (new_size < size) {
> >   		add_size = size - new_size;
> > +		if (add_size < min_size)
> > +			return;
> 
> May be I don't understand something, but in what situation it may happen,
> and won't it be a problem if you silently do nothing here, while the calling
> code will use the passed new_size as-is?

Good point.

I saw this happening when I run QEMU but now that I started to look at
it again the reason seems to be that there is a 32-bit prefetchable
option ROM BAR for the e1000 NIC and that gets reduced incorrectly by
the reduce_dev_resources() which causes the "imbalance".

> >   		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> >   			&add_size);
> > +	} else {
> > +		return;
> >   	}
> >   	res->end = res->start + new_size - 1;
> >   	remove_from_list(add_list, res);
> >   }
> > +static void reduce_dev_resources(struct pci_dev *dev, struct resource *io,
> > +				 struct resource *mmio,
> > +				 struct resource *mmio_pref)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> > +		struct resource *res = &dev->resource[i];
> > +		resource_size_t align, tmp, size;
> > +
> > +		size = resource_size(res);
> > +		if (!size)
> > +			continue;
> > +
> > +		align = pci_resource_alignment(dev, res);
> > +
> > +		if (resource_type(res) == IORESOURCE_IO) {
> > +			align = align ? ALIGN(io->start, align) - io->start : 0;
> > +			tmp = align + size;
> > +			io->start = min(io->start + tmp, io->end + 1);
> > +		} else if (resource_type(res) == IORESOURCE_MEM) {
> > +			if (res->flags & IORESOURCE_PREFETCH) {
> > +				align = align ? ALIGN(mmio_pref->start, align) -
> > +						mmio_pref->start : 0;
> > +				tmp = align + size;
> > +				mmio_pref->start = min(mmio_pref->start + tmp,
> > +						       mmio_pref->end + 1);
> > +			} else {
> > +				align = align ? ALIGN(mmio->start, align) -
> > +						mmio->start : 0;
> > +				tmp = align + size;
> > +				mmio->start = min(mmio->start + tmp,
> > +						  mmio->end + 1);
> > +			}
> > +		}
> > +	}
> > +}
> > +
> > +/*
> > + * io, mmio and mmio_pref contain the total amount of bridge window
> > + * space available. This includes the minimal space needed to cover all
> > + * the existing devices on the bus and the possible extra space that can
> > + * be shared with the bridges.
> > + *
> > + * The resource space consumed by bus->self (the bridge) is already
> > + * reduced.
> > + */
> >   static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >   					    struct list_head *add_list,
> >   					    struct resource io,
> >   					    struct resource mmio,
> >   					    struct resource mmio_pref)
> >   {
> > +	resource_size_t io_align, mmio_align, mmio_pref_align;
> > +	resource_size_t io_per_b, mmio_per_b, mmio_pref_per_b;
> >   	unsigned int normal_bridges = 0, hotplug_bridges = 0;
> >   	struct resource *io_res, *mmio_res, *mmio_pref_res;
> >   	struct pci_dev *dev, *bridge = bus->self;
> > -	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
> >   	io_res = &bridge->resource[PCI_BRIDGE_IO_WINDOW];
> >   	mmio_res = &bridge->resource[PCI_BRIDGE_MEM_WINDOW];
> > @@ -1790,17 +1848,17 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >   	 * The alignment of this bridge is yet to be considered, hence it must
> >   	 * be done now before extending its bridge window.
> >   	 */
> > -	align = pci_resource_alignment(bridge, io_res);
> > -	if (!io_res->parent && align)
> > -		io.start = min(ALIGN(io.start, align), io.end + 1);
> > +	io_align = pci_resource_alignment(bridge, io_res);
> > +	if (!io_res->parent && io_align)
> > +		io.start = min(ALIGN(io.start, io_align), io.end + 1);
> > -	align = pci_resource_alignment(bridge, mmio_res);
> > -	if (!mmio_res->parent && align)
> > -		mmio.start = min(ALIGN(mmio.start, align), mmio.end + 1);
> > +	mmio_align = pci_resource_alignment(bridge, mmio_res);
> > +	if (!mmio_res->parent && mmio_align)
> > +		mmio.start = min(ALIGN(mmio.start, mmio_align), mmio.end + 1);
> > -	align = pci_resource_alignment(bridge, mmio_pref_res);
> > -	if (!mmio_pref_res->parent && align)
> > -		mmio_pref.start = min(ALIGN(mmio_pref.start, align),
> > +	mmio_pref_align = pci_resource_alignment(bridge, mmio_pref_res);
> > +	if (!mmio_pref_res->parent && mmio_pref_align)
> > +		mmio_pref.start = min(ALIGN(mmio_pref.start, mmio_pref_align),
> >   			mmio_pref.end + 1);
> >   	/*
> > @@ -1824,94 +1882,89 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >   			normal_bridges++;
> >   	}
> > -	/*
> > -	 * There is only one bridge on the bus so it gets all available
> > -	 * resources which it can then distribute to the possible hotplug
> > -	 * bridges below.
> > -	 */
> > -	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate)
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, io, mmio, mmio_pref);
> > -		return;
> > -	}
> > -
> > -	if (hotplug_bridges == 0)
> > +	if (!(hotplug_bridges + normal_bridges))
> >   		return;
> >   	/*
> >   	 * Calculate the total amount of extra resource space we can
> > -	 * pass to bridges below this one.  This is basically the
> > -	 * extra space reduced by the minimal required space for the
> > -	 * non-hotplug bridges.
> > +	 * pass to bridges below this one. This is basically the extra
> > +	 * space reduced by the minimal required space for the bridge
> > +	 * windows and device BARs on this bus.
> >   	 */
> > -	for_each_pci_bridge(dev, bus) {
> > -		resource_size_t used_size;
> > -		struct resource *res;
> > -
> > -		if (dev->is_hotplug_bridge)
> > -			continue;
> > -
> > -		/*
> > -		 * Reduce the available resource space by what the
> > -		 * bridge and devices below it occupy.
> > -		 */
> > -		res = &dev->resource[PCI_BRIDGE_IO_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(io.start, align) - io.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			io.start = min(io.start + used_size, io.end + 1);
> > -
> > -		res = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			mmio.start = min(mmio.start + used_size, mmio.end + 1);
> > +	list_for_each_entry(dev, &bus->devices, bus_list)
> > +		reduce_dev_resources(dev, &io, &mmio, &mmio_pref);
> > -		res = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(mmio_pref.start, align) -
> > -			mmio_pref.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			mmio_pref.start = min(mmio_pref.start + used_size,
> > -				mmio_pref.end + 1);
> > +	/*
> > +	 * If there is at least one hotplug bridge on this bus it gets
> > +	 * all the extra resource space that was left after the
> > +	 * reductions above.
> > +	 *
> > +	 * If there are no hotplug bridges the extra resource space is
> > +	 * split between non-hotplug bridges. This is to allow possible
> > +	 * hotplug bridges below them to get the extra space as well.
> > +	 */
> > +	if (hotplug_bridges) {
> > +		io_per_b = div64_ul(resource_size(&io), hotplug_bridges);
> > +		mmio_per_b = div64_ul(resource_size(&mmio), hotplug_bridges);
> > +		mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> > +					   hotplug_bridges);
> > +	} else {
> > +		io_per_b = div64_ul(resource_size(&io), normal_bridges);
> > +		mmio_per_b = div64_ul(resource_size(&mmio), normal_bridges);
> > +		mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> > +					   normal_bridges);
> >   	}
> > -	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> > -	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> > -	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> > -		hotplug_bridges);
> > -
> >   	/*
> > -	 * Go over devices on this bus and distribute the remaining
> > -	 * resource space between hotplug bridges.
> > +	 * Make sure the split resource space is properly aligned for
> > +	 * bridge windows (align it down to avoid going above what is
> > +	 * available).
> >   	 */
> > +	if (io_align)
> > +		io_per_b = ALIGN_DOWN(io_per_b, io_align);
> > +	if (mmio_align)
> > +		mmio_per_b = ALIGN_DOWN(mmio_per_b, mmio_align);
> > +	if (mmio_pref_align)
> > +		mmio_pref_per_b = ALIGN_DOWN(mmio_pref_per_b, mmio_align);
> 
> If I understand it right, you are applying alignment requirements of the
> parent bridge to the windows of its children.  I don't have examples of any
> bridge with different alignment, but shouldn't we better get and use proper
> alignment inside the loop below?

Yes, I agree.

  reply	other threads:[~2023-01-10  9:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-04  9:16 [PATCH v4 0/2] PCI: distribute resources for root buses Mika Westerberg
2023-01-04  9:16 ` [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
2023-01-04 22:48   ` Bjorn Helgaas
2023-01-05  9:12     ` Mika Westerberg
2023-01-05 13:43       ` Mika Westerberg
2023-01-05 17:04         ` Bjorn Helgaas
2023-01-09 11:11           ` Mika Westerberg
2023-01-09 18:27             ` Alexander Motin
2023-01-10 10:07               ` Mika Westerberg
2023-01-09 19:33   ` Alexander Motin
2023-01-10  9:54     ` Mika Westerberg [this message]
2023-01-04  9:16 ` [PATCH v4 2/2] PCI: Distribute available resources for root buses too Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7011Eg89E1cnoSU@black.fi.intel.com \
    --to=mika.westerberg@linux.intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=chris.chiu@canonical.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mav@ixsystems.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=rafael.j.wysocki@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.