From: Sebastian Ott <sebott@linux.vnet.ibm.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Jan Glauber" <jang@linux.vnet.ibm.com>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
heiko.carstens@de.ibm.com, schwidefsky@de.ibm.com,
"Gerald Schäfer" <gerald.schaefer@de.ibm.com>
Subject: Re: [RFC PATCH 01/10] s390/pci: base support
Date: Tue, 18 Dec 2012 19:07:18 +0100 (CET) [thread overview]
Message-ID: <alpine.LFD.2.02.1212181905400.3320@c4eb> (raw)
In-Reply-To: <CAErSpo6pFLiPtQuBGzAo1B4cQOWzJEur_2THT=xJo9c5otp5hQ@mail.gmail.com>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 8539 bytes --]
On Thu, 13 Dec 2012, Bjorn Helgaas wrote:
> On Thu, Dec 13, 2012 at 4:51 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> > On Mon, 2012-12-10 at 14:14 -0700, Bjorn Helgaas wrote:
> >> On Wed, Nov 14, 2012 at 2:41 AM, Jan Glauber <jang@linux.vnet.ibm.com> wrote:
> >> > Add PCI support for s390, (only 64 bit mode is supported by hardware):
> >> > - PCI facility tests
> >> > - PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit
> >> > - map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions
> >> > - pci_iomap implementation
> >> > - memcpy_fromio/toio
> >> > - pci_root_ops using special pcilg/pcistg
> >> > - device, bus and domain allocation
> >> >
> >> > Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
> >>
> >> I think these patches are in -next already, so I just have some
> >> general comments & questions.
> >
> > Yes, since the feedback was manageable we decided to give the patches
> > some exposure in -next and if no one complains we'll just go for the
> > next merge window. BTW, Sebastian & Gerald (on CC:) will continue the
> > work on the PCI code.
> >
> >> My overall impression is that these are exceptionally well done.
> >> They're easy to read, well organized, and well documented. It's a
> >> refreshing change from a lot of the stuff that's posted.
> >
> > Thanks Björn!
> >
> >> As with other arches that run on top of hypervisors, you have
> >> arch-specific code that enumerates PCI devices using hypervisor calls,
> >> and you hook into the PCI core at a lower level than
> >> pci_scan_root_bus(). That is, you call pci_create_root_bus(),
> >> pci_scan_single_device(), pci_bus_add_devices(), etc., directly from
> >> the arch code. This is the typical approach, but it does make more
> >> dependencies between the arch code and the PCI core than I'd like.
> >>
> >> Did you consider hiding any of the hypervisor details behind the PCI
> >> config accessor interface? If that could be done, the overall
> >> structure might be more similar to other architectures.
> >
> > You mean pci_root_ops? I'm not sure I understand how that can be used
> > to hide hipervisor details.
>
> The object of doing this would be to let you use pci_scan_root_bus(),
> so you wouldn't have to call pci_scan_single_device() and
> pci_bus_add_resources() directly. The idea is to make pci_read() and
> pci_write() smart enough that the PCI core can use them as though you
> have a normal PCI implementation. When pci_scan_root_bus() enumerates
> devices on the root bus using pci_scan_child_bus(), it does config
> reads on the VENDOR_ID of bb:00.0, bb:01.0, ..., bb:1f.0. Your
> pci_read() would return error or 0xffffffff data for everything except
> bb:00.0 (I guess it actually already does that).
>
> Then some of the other init, e.g., zpci_enable_device(), could be done
> by the standard pcibios hooks such as pcibios_add_device() and
> pcibios_enable_device(), which would remove some of the PCI grunge
> from zpci_scan_devices() and the s390 hotplug driver.
>
> > One reason why we use the lower level
> > functions is that we need to create the root bus (and its resources)
> > much earlier then the pci_dev. For instance pci_hp_register wants a
> > pci_bus to create the PCI slot and the slot can exist without a pci_dev.
>
> There must be something else going on here; a bus is *always* created
> before any of the pci_devs on the bus.
>
> One thing that looks a little strange is that zpci_list seems to be
> sort of a cross between a list of PCI host bridges and a list of PCI
> devices. Understandable, since you usually have a one-to-one
> correspondence between them, but for your hotplug stuff, you can have
> the host bridge and slot without the pci_dev.
>
> The hotplug slot support is really a function of the host bridge
> support, so it doesn't seem quite right to me to split it into a
> separate module (although that is the way most PCI hotplug drivers are
> currently structured). One reason is that it makes hot-add of a host
> bridge awkward: you have to have the "if (hotplug_ops.create_slot)"
> test in zpci_create_device().
>
> >> The current config accessors only work for dev/fn 00.0 (they fail when
> >> "devfn != ZPCI_DEVFN"). Why is that? It looks like it precludes
> >> multi-function devices and basically prevents you from building an
> >> arbitrary PCI hierarchy.
> >
> > Our hypervisor does not support multi-function devices. In fact the
> > hypervisor will limit the reported PCI devices to a hand-picked
> > selection so we can be sure that there will be no unsupported devices.
> > The PCI hierarchy is hidden by the hipervisor. We only use the PCI
> > domain number, bus and devfn are always zero. So it looks like every
> > function is directly attached to a PCI root complex.
> >
> > That was the reason for the sanity check, but thinking about it I could
> > remove it since although we don't support multi-function devices I
> > think the s390 code should be more agnostic to these limitations.
>
> The config accessor interface should be defined so it works for all
> PCI devices that exist, and fails for devices that do not exist. The
> approach you've taken so far is to prevent the PCI core from even
> attempting access to non-existent devices, which requires you to use
> some lower-level PCI interfaces. The alternative I'm raising as a
> possibility is to allow the PCI core to attempt accesses to
> non-existent devices, but have the accessor be smart enough to use the
> hypervisor or other arch-specific data to determine whether the device
> exists or not, and act accordingly.
>
> Basically the idea is that if we put more smarts in the config
> accessors, we can make the interface between the PCI core and the
> architecture thinner.
>
> >> zpci_map_resources() is very unusual. The struct pci_dev resource[]
> >> table normally contains CPU physical addresses, but
> >> zpci_map_resources() fills it with virtual addresses. I suspect this
> >> has something to do with the "BAR spaces are not disjunctive on s390"
> >> comment. It almost sounds like that's describing host bridges where
> >> the PCI bus address is not equal to the CPU physical address -- in
> >> that case, device A and device B may have the same values in their
> >> BARs, but they can still be distinct if they're under host bridges
> >> that apply different CPU-to-bus address translations.
> >
> > Yeah, you've found it... I've had 3 or 4 tries on different
> > implementations but all failed. If we use the resources as they are we
> > cannot map them to the instructions (and ioremap does not help because
> > there we cannot find out which device the resource belongs to). If we
> > change the BARs on the card MMIO stops to work. I don't know about host
> > bridges - if we would use a host bridge at which point in the
> > translation process would it kick in?
>
> Here's how it works. The PCI host bridge claims regions of the CPU
> physical address space and forwards transactions in those regions to
> the PCI root bus. Some host bridges can apply an offset when
> forwarding, so the address on the bus may be different from the
> address from the CPU. The bus address is what matches a PCI BAR.
>
> You can tell the PCI core about this translation by using
> pci_add_resource_offset() instead of pci_add_resource(). When the PCI
> core reads a BAR, it applies the offset to convert the BAR value into
> a CPU physical address.
>
> For example, let's say you have two host bridges:
>
> PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [mem 0x100000000-0x1ffffffff] (bus
> address [0x00000000-0xffffffff])
> PCI host bridge to bus 0001:00
> pci_bus 0001:00: root bus resource [mem 0x200000000-0x2ffffffff] (bus
> address [0x00000000-0xffffffff])
>
> Both bridges use the same bus address range, and BARs of devices on
> bus 0000:00 can have the same values as those of devices on bus
> 0001:00. But there's no ambiguity because a CPU access to
> 0x1_0000_0000 will be claimed by the first bridge and translated to
> bus address 0x0 on bus 0000:00, while a CPU access to 0x2_0000_0000
> will be claimed by the second bridge and translated to bus address 0x0
> on bus 0001:00.
>
> The pci_dev resources will contain CPU physical addresses, not the BAR
> values themselves. These addresses implicitly identify the host
> bridge (and, in your case, the pci_dev, since you have at most one
> pci_dev per host bridge).
Hi Bjorn,
thanks for the explanation. I'll look into it.
Regards,
Sebastian
next prev parent reply other threads:[~2012-12-18 18:07 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-14 9:41 [RFC PATCH 00/10] s390/pci: PCI support on System z Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 01/10] s390/pci: base support Jan Glauber
2012-12-10 21:14 ` Bjorn Helgaas
2012-12-13 11:51 ` Jan Glauber
2012-12-13 19:34 ` Bjorn Helgaas
2012-12-18 18:07 ` Sebastian Ott [this message]
2012-11-14 9:41 ` [RFC PATCH 02/10] s390/pci: CLP interface Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 03/10] s390/bitops: find leftmost bit instruction support Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 04/10] s390/pci: PCI adapter interrupts for MSI/MSI-X Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 05/10] s390/pci: DMA support Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 06/10] s390/pci: CHSC PCI support for error and availability events Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 07/10] s390/pci: PCI hotplug support via SCLP Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 08/10] s390/pci: s390 specific PCI sysfs attributes Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 09/10] s390/pci: add PCI Kconfig options Jan Glauber
2012-11-14 9:41 ` [RFC PATCH 10/10] vga: compile fix, disable vga for s390 Jan Glauber
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.02.1212181905400.3320@c4eb \
--to=sebott@linux.vnet.ibm.com \
--cc=bhelgaas@google.com \
--cc=gerald.schaefer@de.ibm.com \
--cc=heiko.carstens@de.ibm.com \
--cc=jang@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=schwidefsky@de.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.