* [PATCH 0/2] Bridge window discovery and setup fixes @ 2013-12-06 0:19 Bjorn Helgaas 2013-12-06 0:19 ` [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures Bjorn Helgaas 2013-12-06 0:19 ` [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture Bjorn Helgaas 0 siblings, 2 replies; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-06 0:19 UTC (permalink / raw) To: linux-pci; +Cc: Jason Gunthorpe These are small fixes the way we discover and program PCI-PCI bridge apertures. The first (preventing bus conflicts) was found by Jason Gunthorpe. We wrote 0xf0 to the I/O base/limit registers, which enables the aperture. For arches that emulate the bridge config space, this causes the emulation to open a window we didn't expect. The second keeps us from inadvertently clearing bits in the bridge Secondary Status register while programming the I/O aperture. We don't really use those status bits, but it doesn't seem like a good thing to clear them when we didn't intend to. --- Bjorn Helgaas (2): PCI: Prevent bus conflicts while checking for bridge apertures PCI: Stop clearing bridge Secondary Status when setting up I/O aperture drivers/pci/setup-bus.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures 2013-12-06 0:19 [PATCH 0/2] Bridge window discovery and setup fixes Bjorn Helgaas @ 2013-12-06 0:19 ` Bjorn Helgaas 2013-12-09 19:31 ` Jason Gunthorpe 2013-12-06 0:19 ` [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture Bjorn Helgaas 1 sibling, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-06 0:19 UTC (permalink / raw) To: linux-pci; +Cc: Jason Gunthorpe pci_bridge_check_ranges() determines whether the bridge supports an I/O aperture and a prefetchable memory aperture. Previously, if the I/O aperture was unsupported, disabled, or configured at [io 0x0000-0x0fff], we wrote 0xf0 to PCI_IO_BASE and PCI_IO_LIMIT, which, if the bridge supports it, enables the I/O aperture at [io 0xf000-0xffff]. The enabled aperture may conflict with other devices in the system. Similarly, we wrote 0xfff0 to PCI_PREF_MEMORY_BASE and PCI_PREF_MEMORY_LIMIT, which enables the prefetchable memory aperture at [mem 0xfff00000-0xffffffff], and that may also conflict with other devices. All we need to know is whether the base and limit registers are writable, so we can use values that leave the apertures disabled, e.g., PCI_IO_BASE = 0xf0, PCI_IO_LIMIT = 0xe0, PCI_PREF_MEMORY_BASE = 0xfff0, PCI_PREF_MEMORY_LIMIT = 0xffe0. Writing non-zero values to both the base and limit registers means we detect whether either or both are writable, as we did before. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Based-on-patch-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/setup-bus.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 219a4106480a..80350299a6ea 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -665,21 +665,23 @@ static void pci_bridge_check_ranges(struct pci_bus *bus) pci_read_config_word(bridge, PCI_IO_BASE, &io); if (!io) { - pci_write_config_word(bridge, PCI_IO_BASE, 0xf0f0); + pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0); pci_read_config_word(bridge, PCI_IO_BASE, &io); pci_write_config_word(bridge, PCI_IO_BASE, 0x0); } if (io) b_res[0].flags |= IORESOURCE_IO; + /* DECchip 21050 pass 2 errata: the bridge may miss an address disconnect boundary by one PCI data phase. Workaround: do not use prefetching on this device. */ if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device == 0x0001) return; + pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem); if (!pmem) { pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, - 0xfff0fff0); + 0xffe0fff0); pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem); pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, 0x0); } ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures 2013-12-06 0:19 ` [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures Bjorn Helgaas @ 2013-12-09 19:31 ` Jason Gunthorpe 2013-12-09 20:00 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Jason Gunthorpe @ 2013-12-09 19:31 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci On Thu, Dec 05, 2013 at 05:19:47PM -0700, Bjorn Helgaas wrote: > pci_bridge_check_ranges() determines whether the bridge supports an I/O > aperture and a prefetchable memory aperture. > > Previously, if the I/O aperture was unsupported, disabled, or configured at > [io 0x0000-0x0fff], we wrote 0xf0 to PCI_IO_BASE and PCI_IO_LIMIT, which, > if the bridge supports it, enables the I/O aperture at [io 0xf000-0xffff]. > The enabled aperture may conflict with other devices in the system. > > Similarly, we wrote 0xfff0 to PCI_PREF_MEMORY_BASE and > PCI_PREF_MEMORY_LIMIT, which enables the prefetchable memory aperture at > [mem 0xfff00000-0xffffffff], and that may also conflict with other devices. > > All we need to know is whether the base and limit registers are writable, > so we can use values that leave the apertures disabled, e.g., PCI_IO_BASE = > 0xf0, PCI_IO_LIMIT = 0xe0, PCI_PREF_MEMORY_BASE = 0xfff0, > PCI_PREF_MEMORY_LIMIT = 0xffe0. > > Writing non-zero values to both the base and limit registers means we > detect whether either or both are writable, as we did before. Looks sane to me. The only refinement would be to detect if IO is enabled and use that as a first check. Cheers, Jason ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures 2013-12-09 19:31 ` Jason Gunthorpe @ 2013-12-09 20:00 ` Bjorn Helgaas 2013-12-09 20:15 ` Jason Gunthorpe 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-09 20:00 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: linux-pci@vger.kernel.org On Mon, Dec 9, 2013 at 12:31 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Thu, Dec 05, 2013 at 05:19:47PM -0700, Bjorn Helgaas wrote: >> pci_bridge_check_ranges() determines whether the bridge supports an I/O >> aperture and a prefetchable memory aperture. >> >> Previously, if the I/O aperture was unsupported, disabled, or configured at >> [io 0x0000-0x0fff], we wrote 0xf0 to PCI_IO_BASE and PCI_IO_LIMIT, which, >> if the bridge supports it, enables the I/O aperture at [io 0xf000-0xffff]. >> The enabled aperture may conflict with other devices in the system. >> >> Similarly, we wrote 0xfff0 to PCI_PREF_MEMORY_BASE and >> PCI_PREF_MEMORY_LIMIT, which enables the prefetchable memory aperture at >> [mem 0xfff00000-0xffffffff], and that may also conflict with other devices. >> >> All we need to know is whether the base and limit registers are writable, >> so we can use values that leave the apertures disabled, e.g., PCI_IO_BASE = >> 0xf0, PCI_IO_LIMIT = 0xe0, PCI_PREF_MEMORY_BASE = 0xfff0, >> PCI_PREF_MEMORY_LIMIT = 0xffe0. >> >> Writing non-zero values to both the base and limit registers means we >> detect whether either or both are writable, as we did before. > > Looks sane to me. The only refinement would be to detect if IO is > enabled and use that as a first check. I assume you mean checking PCI_COMMAND_IO, as in this fragment you posted earlier: + pci_read_config_word(bridge, PCI_COMMAND, &command); + if (command & PCI_COMMAND_IO) + b_res[0].flags |= IORESOURCE_IO; + else { + pci_write_config_word(bridge, PCI_IO_BASE, 0x00f0); That would avoid the writes to PCI_IO_BASE in the case where an I/O aperture is already enabled and configured at [io 0x0000-0x0fff], which has the advantage of not temporarily disabling the aperture. I don't think it's 100% spec compliant because the spec allows PCI_COMMAND_IO to be set if the bridge has an enabled I/O BAR but does not support an I/O aperture. I admit I have never seen such a device, and I doubt one exists :) To my mind, it's not worth adding the check, because we shouldn't be sending I/O accesses to devices behind the bridge at this point anyway, so the temporary disable shouldn't be a problem. If we *are* sending accesses to such a device, the current code moves the aperture, so the bridge won't claim them, and we already have a problem. So I don't think my patch will add any problems there, and it can conceivably prevent a conflict with another device at [io 0xf000-0xffff]. Bjorn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures 2013-12-09 20:00 ` Bjorn Helgaas @ 2013-12-09 20:15 ` Jason Gunthorpe 0 siblings, 0 replies; 14+ messages in thread From: Jason Gunthorpe @ 2013-12-09 20:15 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org On Mon, Dec 09, 2013 at 01:00:36PM -0700, Bjorn Helgaas wrote: > I don't think it's 100% spec compliant because the spec allows > PCI_COMMAND_IO to be set if the bridge has an enabled I/O BAR but does > not support an I/O aperture. I admit I have never seen such a device, > and I doubt one exists :) Yes, fair enough! Jason ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-06 0:19 [PATCH 0/2] Bridge window discovery and setup fixes Bjorn Helgaas 2013-12-06 0:19 ` [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures Bjorn Helgaas @ 2013-12-06 0:19 ` Bjorn Helgaas 2013-12-09 19:33 ` Jason Gunthorpe 1 sibling, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-06 0:19 UTC (permalink / raw) To: linux-pci; +Cc: Jason Gunthorpe pci_setup_bridge_io() accessed PCI_IO_BASE and PCI_IO_LIMIT using dword (32-bit) reads and writes, which also access the Secondary Status register. Since the Secondary Status register is in the upper 16 bits of the dword, and we preserved those upper 16 bits, this had the effect of clearing any of the write-1-to-clear bits that happened to be set in the Secondary Status register. That's not what we want, so use word (16-bit) accesses to update only PCI_IO_BASE and PCI_IO_LIMIT. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/setup-bus.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 80350299a6ea..2e344a5581ae 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -538,7 +538,8 @@ static void pci_setup_bridge_io(struct pci_bus *bus) struct pci_bus_region region; unsigned long io_mask; u8 io_base_lo, io_limit_lo; - u32 l, io_upper16; + u16 l; + u32 io_upper16; io_mask = PCI_IO_RANGE_MASK; if (bridge->io_window_1k) @@ -548,11 +549,10 @@ static void pci_setup_bridge_io(struct pci_bus *bus) res = bus->resource[0]; pcibios_resource_to_bus(bridge, ®ion, res); if (res->flags & IORESOURCE_IO) { - pci_read_config_dword(bridge, PCI_IO_BASE, &l); - l &= 0xffff0000; + pci_read_config_word(bridge, PCI_IO_BASE, &l); io_base_lo = (region.start >> 8) & io_mask; io_limit_lo = (region.end >> 8) & io_mask; - l |= ((u32) io_limit_lo << 8) | io_base_lo; + l = ((u16) io_limit_lo << 8) | io_base_lo; /* Set up upper 16 bits of I/O base/limit. */ io_upper16 = (region.end & 0xffff0000) | (region.start >> 16); dev_info(&bridge->dev, " bridge window %pR\n", res); @@ -564,7 +564,7 @@ static void pci_setup_bridge_io(struct pci_bus *bus) /* Temporarily disable the I/O range before updating PCI_IO_BASE. */ pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff); /* Update lower 16 bits of I/O base/limit. */ - pci_write_config_dword(bridge, PCI_IO_BASE, l); + pci_write_config_word(bridge, PCI_IO_BASE, l); /* Update upper 16 bits of I/O base/limit. */ pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, io_upper16); } ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-06 0:19 ` [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture Bjorn Helgaas @ 2013-12-09 19:33 ` Jason Gunthorpe 2013-12-09 20:15 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Jason Gunthorpe @ 2013-12-09 19:33 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci On Thu, Dec 05, 2013 at 05:19:55PM -0700, Bjorn Helgaas wrote: > pci_setup_bridge_io() accessed PCI_IO_BASE and PCI_IO_LIMIT using dword > (32-bit) reads and writes, which also access the Secondary Status register. > Since the Secondary Status register is in the upper 16 bits of the dword, > and we preserved those upper 16 bits, this had the effect of clearing any > of the write-1-to-clear bits that happened to be set in the Secondary > Status register. This is a good catch! > - pci_write_config_dword(bridge, PCI_IO_BASE, l); > + pci_write_config_word(bridge, PCI_IO_BASE, l); But this is a problem :( tegra and mvebu at least do not have HW to do non-32 bit writes, so their implementation of pci_write_config_word does the RMW internally and will still have this same bug. I think you have to keep the 32 bit write here, but zero the write-one-to-clear bits :( Regards, Jason ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-09 19:33 ` Jason Gunthorpe @ 2013-12-09 20:15 ` Bjorn Helgaas 2013-12-09 21:27 ` Jason Gunthorpe 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-09 20:15 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: linux-pci@vger.kernel.org On Mon, Dec 9, 2013 at 12:33 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Thu, Dec 05, 2013 at 05:19:55PM -0700, Bjorn Helgaas wrote: >> pci_setup_bridge_io() accessed PCI_IO_BASE and PCI_IO_LIMIT using dword >> (32-bit) reads and writes, which also access the Secondary Status register. >> Since the Secondary Status register is in the upper 16 bits of the dword, >> and we preserved those upper 16 bits, this had the effect of clearing any >> of the write-1-to-clear bits that happened to be set in the Secondary >> Status register. > > This is a good catch! > >> - pci_write_config_dword(bridge, PCI_IO_BASE, l); >> + pci_write_config_word(bridge, PCI_IO_BASE, l); > > But this is a problem :( > > tegra and mvebu at least do not have HW to do non-32 bit writes, so > their implementation of pci_write_config_word does the RMW internally > and will still have this same bug. > > I think you have to keep the 32 bit write here, but zero the > write-one-to-clear bits :( Is there actually some spec requirement about the access sizes that must be supported by the hardware? If there is something, I'll gladly keep the 32-bit access, but if it's only a tegra/mvebu-specific restriction, then I think it needs to be handled in the PCI config accessors for that hardware. This spec statement (PCI (not PCIe) r3.0, sec 3.2.2.3.4) makes it sound like tegra/mvebu is non-conforming and should deal with this specially: "A function must not restrict the size of the access it supports in Configuration Space. The configuration commands, like other commands, allow data to be accessed using any combination of bytes (including a byte, word, DWORD, or non-contiguous bytes) and multiple data phases in a burst. The target is required to handle any combination of byte enables." I think the tegra/mvebu accessors should be able to use the address and access size to figure out what we're trying to do. If we're doing a WORD write to PCI_IO_BASE, it can turn that into: 32-bit read, clear the upper 16 bits (secondary status), insert the PCI_IO_BASE part, 32-bit write. If we're doing a 16-bit write to PCI_SEC_STATUS, it can turn that into: 32-bit read, insert upper 16 bits (secondary status), leave lower 16 bits alone, 32-bit write. Bjorn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-09 20:15 ` Bjorn Helgaas @ 2013-12-09 21:27 ` Jason Gunthorpe 2013-12-09 21:58 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Jason Gunthorpe @ 2013-12-09 21:27 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org On Mon, Dec 09, 2013 at 01:15:52PM -0700, Bjorn Helgaas wrote: > Is there actually some spec requirement about the access sizes that > must be supported by the hardware? If there is something, I'll gladly > keep the 32-bit access, but if it's only a tegra/mvebu-specific > restriction, then I think it needs to be handled in the PCI config > accessors for that hardware. The host register interface is not specified by any standard. I looked a for a bit and found other drivers with the same problem: arch/arm/mach-cns3xxx/pcie.c arch/arm/mach-iop13xx/pci.c arch/arm/mach-ks8695/pci.c arch/arm/mach-sa1100/pci-nanoengine.c [..] I stopped looking after that point.. > This spec statement (PCI (not PCIe) r3.0, sec 3.2.2.3.4) makes it > sound like tegra/mvebu is non-conforming and should deal with this > specially: > > "A function must not restrict the size of the access it supports in > Configuration Space. The configuration commands, like other commands, > allow data to be accessed using any combination of bytes (including a > byte, word, DWORD, or non-contiguous bytes) and multiple data phases > in a burst. The target is required to handle any combination of byte > enables." This verbage is talking about the target/responder, not the host. Any target must respond to all possible sizes for all possible config registers, including combinations that even x86 can't issue (like a BE pattern of b0110) > I think the tegra/mvebu accessors should be able to use the address > and access size to figure out what we're trying to do. If we're doing > a WORD write to PCI_IO_BASE, it can turn that into: 32-bit read, clear > the upper 16 bits (secondary status), insert the PCI_IO_BASE part, > 32-bit write. If we're doing a 16-bit write to PCI_SEC_STATUS, it can > turn that into: 32-bit read, insert upper 16 bits (secondary status), > leave lower 16 bits alone, 32-bit write. Seems reasonable, but this obviously shouldn't be open coded in every driver.. Maybe pci_ops should gain a read32/write32 and HW that can only do 32 bit access should not define read/write. The core code could provide read/write functions to do the bit mangling for all possibilities? That would be very clear what is going on. Jason ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-09 21:27 ` Jason Gunthorpe @ 2013-12-09 21:58 ` Bjorn Helgaas 2013-12-10 0:10 ` Jason Gunthorpe 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-09 21:58 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: linux-pci@vger.kernel.org On Mon, Dec 9, 2013 at 2:27 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Mon, Dec 09, 2013 at 01:15:52PM -0700, Bjorn Helgaas wrote: > >> Is there actually some spec requirement about the access sizes that >> must be supported by the hardware? If there is something, I'll gladly >> keep the 32-bit access, but if it's only a tegra/mvebu-specific >> restriction, then I think it needs to be handled in the PCI config >> accessors for that hardware. > > The host register interface is not specified by any standard. > > I looked a for a bit and found other drivers with the same problem: > arch/arm/mach-cns3xxx/pcie.c > arch/arm/mach-iop13xx/pci.c > arch/arm/mach-ks8695/pci.c > arch/arm/mach-sa1100/pci-nanoengine.c > [..] > > I stopped looking after that point.. > >> This spec statement (PCI (not PCIe) r3.0, sec 3.2.2.3.4) makes it >> sound like tegra/mvebu is non-conforming and should deal with this >> specially: >> >> "A function must not restrict the size of the access it supports in >> Configuration Space. The configuration commands, like other commands, >> allow data to be accessed using any combination of bytes (including a >> byte, word, DWORD, or non-contiguous bytes) and multiple data phases >> in a burst. The target is required to handle any combination of byte >> enables." > > This verbage is talking about the target/responder, not the host. Any > target must respond to all possible sizes for all possible config > registers, including combinations that even x86 can't issue (like a BE > pattern of b0110) Now I'm confused. From the PCI core's point of view, pci_setup_bridge_io() is talking to a target, namely a PCI-to-PCI bridge. PCI config space accessors are also for talking to targets. I think you're talking about host bridges, and I agree that they are not covered by this spec statement and can have arbitrary register layout and access size restrictions. Some arches set up their host bridges so they appear in PCI config space and have register layouts that look like PCI-to-PCI bridges. In my opinion, those arches then have the responsibility of following all the PCI-to-PCI bridge rules, including access size restrictions, either directly in hardware or in their config accessors. I don't think it makes sense to try to make the core know about this stuff. There are many places that touch bridges, and it would be pretty hard to manage exceptions like this. Bjorn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-09 21:58 ` Bjorn Helgaas @ 2013-12-10 0:10 ` Jason Gunthorpe 2013-12-10 0:18 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Jason Gunthorpe @ 2013-12-10 0:10 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org On Mon, Dec 09, 2013 at 02:58:06PM -0700, Bjorn Helgaas wrote: > >> "A function must not restrict the size of the access it supports in > >> Configuration Space. The configuration commands, like other commands, > >> allow data to be accessed using any combination of bytes (including a > >> byte, word, DWORD, or non-contiguous bytes) and multiple data phases > >> in a burst. The target is required to handle any combination of byte > >> enables." > > > > This verbage is talking about the target/responder, not the host. Any > > target must respond to all possible sizes for all possible config > > registers, including combinations that even x86 can't issue (like a BE > > pattern of b0110) > > Now I'm confused. From the PCI core's point of view, > pci_setup_bridge_io() is talking to a target, namely a PCI-to-PCI > bridge. PCI config space accessors are also for talking to targets. The fundamental issue is that the PCI host drivers I pointed out have hardware that cannot construct a PCI-E ConfigWr TLP (or PCI/PCI-X eqiuv) with any content other than a 32 bit DWORD write. So even though pci_setup_bridge_io is talking to a target, and the target is required to handle all possible ConfigWr TLPs, the host driver (struct pci_ops) can only generate one possibility. The problematic host drivers are all emulating non-32 bit ConfigWr support by doing RMW, and you have observed that RMW is not 100% correct when dealing with write 1 to clear bits. Basically, your fix to pci_setup_bridge_io is fine - but your observation led me to realize that the HW drivers implementing RMW for 8 and 16 bit ops under their struct pci_ops.read function have exactly the same flaw you are fixing here - they will silently wipe out write 1 to clear bits. > Some arches set up their host bridges so they appear in PCI config > space and have register layouts that look like PCI-to-PCI bridges. In > my opinion, those arches then have the responsibility of following all > the PCI-to-PCI bridge rules, including access size restrictions, > either directly in hardware or in their config accessors. Sure, but this isn't the problem :) IMHO, the pci core needs to help host drivers implement a correct pci_ops.write function for the case where the host hardware can only produce 32 bit ConfigWR TLPs. That seems to be common enough and subtle enough it shouldn't be inlined into every driver. Just to be clear, I am talking about implementations like this: static int cns3xxx_pci_write_config(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val) { u32 v; void __iomem *base; u32 mask = (0x1ull << (size * 8)) - 1; int shift = (where % 4) * 8; base = cns3xxx_pci_cfg_base(bus, devfn, where); if (!base) return PCIBIOS_SUCCESSFUL; v = __raw_readl(base); v &= ~(mask << shift); v |= (val & mask) << shift; __raw_writel(v, base); return PCIBIOS_SUCCESSFUL; } static struct pci_ops cns3xxx_pcie_ops = { .read = cns3xxx_pci_read_config, .write = cns3xxx_pci_write_config, }; Which we now know is subtly broken as it will clear write 1 to clear bits when doing sub dword operations. I found 6 copies of this pattern in 5 mins of searching :( Regards, Jason ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-10 0:10 ` Jason Gunthorpe @ 2013-12-10 0:18 ` Bjorn Helgaas 2013-12-10 1:31 ` Jason Gunthorpe 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-10 0:18 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: linux-pci@vger.kernel.org On Mon, Dec 9, 2013 at 5:10 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Mon, Dec 09, 2013 at 02:58:06PM -0700, Bjorn Helgaas wrote: > >> >> "A function must not restrict the size of the access it supports in >> >> Configuration Space. The configuration commands, like other commands, >> >> allow data to be accessed using any combination of bytes (including a >> >> byte, word, DWORD, or non-contiguous bytes) and multiple data phases >> >> in a burst. The target is required to handle any combination of byte >> >> enables." >> > >> > This verbage is talking about the target/responder, not the host. Any >> > target must respond to all possible sizes for all possible config >> > registers, including combinations that even x86 can't issue (like a BE >> > pattern of b0110) >> >> Now I'm confused. From the PCI core's point of view, >> pci_setup_bridge_io() is talking to a target, namely a PCI-to-PCI >> bridge. PCI config space accessors are also for talking to targets. > > The fundamental issue is that the PCI host drivers I pointed out have > hardware that cannot construct a PCI-E ConfigWr TLP (or PCI/PCI-X > eqiuv) with any content other than a 32 bit DWORD write. Yep. > So even though pci_setup_bridge_io is talking to a target, and the > target is required to handle all possible ConfigWr TLPs, the host > driver (struct pci_ops) can only generate one possibility. Yep. > The problematic host drivers are all emulating non-32 bit ConfigWr > support by doing RMW, and you have observed that RMW is not 100% > correct when dealing with write 1 to clear bits. Yep. > Basically, your fix to pci_setup_bridge_io is fine - but your > observation led me to realize that the HW drivers implementing RMW for > 8 and 16 bit ops under their struct pci_ops.read function have exactly > the same flaw you are fixing here - they will silently wipe out write > 1 to clear bits. Ah, OK. I thought you were saying that we couldn't change pci_setup_bridge_io() to use 16-bit accesses because of this problem. >> Some arches set up their host bridges so they appear in PCI config >> space and have register layouts that look like PCI-to-PCI bridges. In >> my opinion, those arches then have the responsibility of following all >> the PCI-to-PCI bridge rules, including access size restrictions, >> either directly in hardware or in their config accessors. > > Sure, but this isn't the problem :) > > IMHO, the pci core needs to help host drivers implement a correct > pci_ops.write function for the case where the host hardware can only > produce 32 bit ConfigWR TLPs. That seems to be common enough and > subtle enough it shouldn't be inlined into every driver. Sure, if somebody can come up with a reasonable way to share this sort of code, that sounds like a good thing. Maybe extending struct pci_ops is the way to do this, but I hope not, because it seems like it will require quite a bit of specific knowledge about which registers have RW1C bits. They're not all at constant offsets because they can be in capabilities, too. > Just to be clear, I am talking about implementations like this: > > static int cns3xxx_pci_write_config(struct pci_bus *bus, unsigned int devfn, > int where, int size, u32 val) > { > u32 v; > void __iomem *base; > u32 mask = (0x1ull << (size * 8)) - 1; > int shift = (where % 4) * 8; > > base = cns3xxx_pci_cfg_base(bus, devfn, where); > if (!base) > return PCIBIOS_SUCCESSFUL; > > v = __raw_readl(base); > > v &= ~(mask << shift); > v |= (val & mask) << shift; > > __raw_writel(v, base); > > return PCIBIOS_SUCCESSFUL; > } > > static struct pci_ops cns3xxx_pcie_ops = { > .read = cns3xxx_pci_read_config, > .write = cns3xxx_pci_write_config, > }; > > Which we now know is subtly broken as it will clear write 1 to clear > bits when doing sub dword operations. > > I found 6 copies of this pattern in 5 mins of searching :( Yes, I bet there are a bunch of them. Bjorn ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-10 0:18 ` Bjorn Helgaas @ 2013-12-10 1:31 ` Jason Gunthorpe 2013-12-10 18:22 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Jason Gunthorpe @ 2013-12-10 1:31 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org On Mon, Dec 09, 2013 at 05:18:30PM -0700, Bjorn Helgaas wrote: > > Basically, your fix to pci_setup_bridge_io is fine - but your > > observation led me to realize that the HW drivers implementing RMW for > > 8 and 16 bit ops under their struct pci_ops.read function have exactly > > the same flaw you are fixing here - they will silently wipe out write > > 1 to clear bits. > > Ah, OK. I thought you were saying that we couldn't change > pci_setup_bridge_io() to use 16-bit accesses because of this problem. No, not really - just that this bug you discovered is broader than just that one place :) > Sure, if somebody can come up with a reasonable way to share this sort > of code, that sounds like a good thing. Maybe extending struct > pci_ops is the way to do this, but I hope not, because it seems like Well, sharing the code is no problem, it is exactly the same for every 32 bit only host driver. But supporting RW1C bits in capability lists seems fairly hard once the context at the call site is lost. Which makes me wonder if supporting sub-dword writes to dwords with RW1C bits makes sense at all :( That was why my initial reaction was to not do sub dword writes if you know it will conflict with a RW1C bit. Jason ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture 2013-12-10 1:31 ` Jason Gunthorpe @ 2013-12-10 18:22 ` Bjorn Helgaas 0 siblings, 0 replies; 14+ messages in thread From: Bjorn Helgaas @ 2013-12-10 18:22 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: linux-pci@vger.kernel.org On Mon, Dec 9, 2013 at 6:31 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Mon, Dec 09, 2013 at 05:18:30PM -0700, Bjorn Helgaas wrote: > >> > Basically, your fix to pci_setup_bridge_io is fine - but your >> > observation led me to realize that the HW drivers implementing RMW for >> > 8 and 16 bit ops under their struct pci_ops.read function have exactly >> > the same flaw you are fixing here - they will silently wipe out write >> > 1 to clear bits. >> >> Ah, OK. I thought you were saying that we couldn't change >> pci_setup_bridge_io() to use 16-bit accesses because of this problem. > > No, not really - just that this bug you discovered is broader than just > that one place :) > >> Sure, if somebody can come up with a reasonable way to share this sort >> of code, that sounds like a good thing. Maybe extending struct >> pci_ops is the way to do this, but I hope not, because it seems like > > Well, sharing the code is no problem, it is exactly the same for every > 32 bit only host driver. > > But supporting RW1C bits in capability lists seems fairly hard once > the context at the call site is lost. Strictly speaking, fixing this only requires knowledge of the hardware register layout, so at least in principle, it can be done without the call site context. It does sound like it might be messy, though. QEMU deals with this issue somehow; maybe there's something there we can copy. > Which makes me wonder if supporting sub-dword writes to dwords with > RW1C bits makes sense at all :( > > That was why my initial reaction was to not do sub dword writes if you > know it will conflict with a RW1C bit. Keeping the dword read/write and clearing the upper 16 bits before the write is an option. But I'm inclined to do the 16-bit accesses, i.e., use the patch as posted, because that makes the issue more visible. If we keep the dword access, we can fix this particular issue, but it does nothing to help fix other similar cases. Bjorn ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-12-10 18:22 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-12-06 0:19 [PATCH 0/2] Bridge window discovery and setup fixes Bjorn Helgaas 2013-12-06 0:19 ` [PATCH 1/2] PCI: Prevent bus conflicts while checking for bridge apertures Bjorn Helgaas 2013-12-09 19:31 ` Jason Gunthorpe 2013-12-09 20:00 ` Bjorn Helgaas 2013-12-09 20:15 ` Jason Gunthorpe 2013-12-06 0:19 ` [PATCH 2/2] PCI: Stop clearing bridge Secondary Status when setting up I/O aperture Bjorn Helgaas 2013-12-09 19:33 ` Jason Gunthorpe 2013-12-09 20:15 ` Bjorn Helgaas 2013-12-09 21:27 ` Jason Gunthorpe 2013-12-09 21:58 ` Bjorn Helgaas 2013-12-10 0:10 ` Jason Gunthorpe 2013-12-10 0:18 ` Bjorn Helgaas 2013-12-10 1:31 ` Jason Gunthorpe 2013-12-10 18:22 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).