From: Bjorn Helgaas <helgaas@kernel.org>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
Kit Chow <kchow@gigaio.com>, Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources
Date: Mon, 17 Jun 2019 08:53:07 -0500 [thread overview]
Message-ID: <20190617135307.GA13533@google.com> (raw)
In-Reply-To: <20190531171216.20532-3-logang@deltatee.com>
On Fri, May 31, 2019 at 11:12:16AM -0600, Logan Gunthorpe wrote:
> One odd quirk of PLX switches is that their upstream bridge port has
> 256K of space allocated behind its BAR0 (most other bridge
> implementations do not report any BAR space).
Somewhat unusual, but completely legal, of course.
If a bridge has memory BARs, AFAIK it is impossible to enable a memory
window without also enabling the BARs, so if we want to use the bridge
at all, we *must* allocate space for its BARs, just like for any other
device.
> The lspci for such device
> looks like:
>
> 04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
> Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
> (prog-if 00 [Normal decode])
> Physical Slot: 1
> Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
> Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
> Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
> I/O behind bridge: 00002000-00003fff
> Memory behind bridge: 90000000-909fffff
> Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff
> Kernel driver in use: pcieport
>
> It's not clear what the purpose of the memory at 0x90a00000 is, and
> currently the kernel never actually uses it for anything. In most cases,
> it's safely ignored and does not cause a problem.
>
> However, when the kernel assigns the resource addresses (with the
> pci=realloc command line parameter, for example) it can inadvertently
> disable the struct resource corresponding to the bar. When this happens,
> lspci will report this memory as ignored:
>
> Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]
>
> This is because the kernel reports a zero start address and zero flags
> in the corresponding sysfs resource file and in /proc/bus/pci/devices.
> Investigation with 'lspci -x', however shows the bios-assigned address
> will still be programmed in the device's BAR registers.
Ugh, yep. It took me a while to trace through this, but "lspci -v" by
default shows the kernel view of addresses, i.e., the pdev->resource[]
values, which it gets from the sysfs "resource" file (resource_show())
or "/proc/bus/pci/devices" (show_device()).
But "lspci -x" shows the config space values (I think "lspci -bv"
should also show these) from the sysfs "config" file
(pci_read_config()).
It's definitely a kernel bug that we lost track of what's in config
space.
> In many cases, this still isn't a problem. Nothing uses the memory,
> so nothing is affected. However, a big problem shows up when an IOMMU
> is in use: the IOMMU will not reserve this space in the IOVA because the
> kernel no longer thinks the range is valid. (See
> dmar_init_reserved_ranges() for the Intel implementation of this.)
>
> Without the proper reserved range, we have a situation where a DMA
> mapping may occasionally allocate an IOVA which the PCI bus will actually
> route to a BAR in the PLX switch. This will result in some random DMA
> writes not actually writing to the RAM they are supposed to, or random
> DMA reads returning all FFs from the PLX BAR when it's supposed to have
> read from RAM.
>
> The problem is caused in pci_assign_unassigned_root_bus_resources().
> When any resource from a bridge device fails to get assigned, the code
> sets the resource's flags to zero. This makes sense for bridge resources,
> as they will be re-enabled later, but for regular BARs, it disables them
> permanently. To fix the problem, we only set the flags to zero for
> bridge resources and treat any other resources like non-bridge devices.
>
> Reported-by: Kit Chow <kchow@gigaio.com>
> Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> ---
> drivers/pci/setup-bus.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 0eb40924169b..7adbd4bedd16 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1784,11 +1784,16 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
> /* restore size and flags */
> list_for_each_entry(fail_res, &fail_head, list) {
> struct resource *res = fail_res->res;
> + int idx;
>
> res->start = fail_res->start;
> res->end = fail_res->end;
> res->flags = fail_res->flags;
> - if (fail_res->dev->subordinate)
> +
> + idx = res - &fail_res->dev->resource[0];
> + if (fail_res->dev->subordinate &&
> + idx >= PCI_BRIDGE_RESOURCES &&
> + idx <= PCI_BRIDGE_RESOURCE_END)
> res->flags = 0;
In my ideal world we wouldn't zap the flags of any resource. I think
we should derive the flags from the device's config space *once*
during enumeration and remember them for the life of the device.
This patch preserves res->flags for bridge BARs just like for any
other device, so I think this is definitely a step in the right
direction.
I'm not sure the "dev->subordinate" test is really correct, though.
I think the original intent of this code was to clear res->flags for
bridge windows under the assumptions that (a) we can identify bridges
by "dev->subordinate" being non-zero, and (b) bridges only have
windows and didn't have BARs.
This patch fixes assumption (b), but I think (a) is false, and we
should fix it as well. One can imagine a bridge device without a
subordinate bus (maybe we ran out of bus numbers), so I don't think we
should test dev->subordinate.
We could test something like pci_is_bridge(), although testing for idx
being in the PCI_BRIDGE_RESOURCES range should be sufficient because I
don't think we use those resource for anything other than windows.
> }
> free_list(&fail_head);
> --
> 2.20.1
>
next prev parent reply other threads:[~2019-06-17 13:53 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-31 17:12 [PATCH v3 0/2] Fix a pair of setup bus bugs Logan Gunthorpe
2019-05-31 17:12 ` [PATCH v3 1/2] PCI: Prevent 64-bit resources from being counted in 32-bit bridge region Logan Gunthorpe
2019-05-31 17:12 ` [PATCH v3 2/2] PCI: Fix disabling of bridge BARs when assigning bus resources Logan Gunthorpe
2019-06-17 13:53 ` Bjorn Helgaas [this message]
2019-06-17 17:32 ` Logan Gunthorpe
2019-06-18 5:27 ` Benjamin Herrenschmidt
2019-06-17 13:49 ` [PATCH v3 0/2] Fix a pair of setup bus bugs Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190617135307.GA13533@google.com \
--to=helgaas@kernel.org \
--cc=kchow@gigaio.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.