qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: aik@ozlabs.ru, qemu-devel@nongnu.org, agraf@suse.de,
	ncmike@ncultra.org, qemu-ppc@nongnu.org,
	tyreld@linux.vnet.ibm.com, bharata.rao@gmail.com,
	nfont@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations
Date: Thu, 26 Feb 2015 14:06:04 -0600	[thread overview]
Message-ID: <20150226200604.31752.68449@loki> (raw)
In-Reply-To: <20150225064026.31752.8222@loki>

Quoting Michael Roth (2015-02-25 00:40:26)
> Quoting Michael Roth (2015-02-24 23:17:24)
> > Quoting David Gibson (2015-02-24 21:11:29)
> > > On Mon, Feb 16, 2015 at 08:27:51AM -0600, Michael Roth wrote:
> > > > This enables hotplug for PHB bridges. Upon hotplug we generate the
> > > 
> > > "PCI Host Bridge bridges" :-p
> > > 
> > > > OF-nodes required by PAPR specification and IEEE 1275-1994
> > > > "PCI Bus Binding to Open Firmware" for the device.
> > > > 
> > > > We associate the corresponding FDT for these nodes with the DrcEntry
> > > > corresponding to the slot, which will be fetched via
> > > > ibm,configure-connector RTAS calls by the guest as described by PAPR
> > > > specification. The FDT is cleaned up in the case of unplug.
> > > > 
> > > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > > ---
> > > >  hw/ppc/spapr_pci.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 371 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > > index 6a3d917..b9af1cd 100644
> > > > --- a/hw/ppc/spapr_pci.c
> > > > +++ b/hw/ppc/spapr_pci.c
> > > > @@ -33,8 +33,11 @@
> > > >  #include <libfdt.h>
> > > >  #include "trace.h"
> > > >  #include "qemu/error-report.h"
> > > > +#include "qapi/qmp/qerror.h"
> > > >  
> > > >  #include "hw/pci/pci_bus.h"
> > > > +#include "hw/ppc/spapr_drc.h"
> > > > +#include "sysemu/device_tree.h"
> > > >  
> > > >  /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
> > > >  #define RTAS_QUERY_FN           0
> > > > @@ -47,7 +50,13 @@
> > > >  #define RTAS_TYPE_MSI           1
> > > >  #define RTAS_TYPE_MSIX          2
> > > >  
> > > > -#include "hw/ppc/spapr_drc.h"
> > > > +#define _FDT(exp) \
> > > > +    do { \
> > > > +        int ret = (exp);                                           \
> > > > +        if (ret < 0) {                                             \
> > > > +            return ret;                                            \
> > > > +        }                                                          \
> > > > +    } while (0)
> > > >  
> > > >  static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
> > > >  {
> > > > @@ -481,6 +490,359 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> > > >      return &phb->iommu_as;
> > > >  }
> > > >  
> > > > +/* Macros to operate with address in OF binding to PCI */
> > > > +#define b_x(x, p, l)    (((x) & ((1<<(l))-1)) << (p))
> > > > +#define b_n(x)          b_x((x), 31, 1) /* 0 if relocatable */
> > > > +#define b_p(x)          b_x((x), 30, 1) /* 1 if prefetchable */
> > > > +#define b_t(x)          b_x((x), 29, 1) /* 1 if the address is aliased */
> > > > +#define b_ss(x)         b_x((x), 24, 2) /* the space code */
> > > > +#define b_bbbbbbbb(x)   b_x((x), 16, 8) /* bus number */
> > > > +#define b_ddddd(x)      b_x((x), 11, 5) /* device number */
> > > > +#define b_fff(x)        b_x((x), 8, 3)  /* function number */
> > > > +#define b_rrrrrrrr(x)   b_x((x), 0, 8)  /* register number */
> > > > +/* for 'reg'/'assigned-addresses' OF properties */
> > > > +#define RESOURCE_CELLS_SIZE 2
> > > > +#define RESOURCE_CELLS_ADDRESS 3
> > > > +
> > > > +typedef struct ResourceFields {
> > > > +    uint32_t phys_hi;
> > > > +    uint32_t phys_mid;
> > > > +    uint32_t phys_lo;
> > > > +    uint32_t size_hi;
> > > > +    uint32_t size_lo;
> > > > +} ResourceFields;
> > > 
> > > Hrm, 5*32-bit ints, that probably needs a ((packed)) on at least some
> > > platforms to be safe.
> > 
> > I seem to remember some rule (C99?) about padding only being used in cases
> > where an n-byte field doesn't naturally fall on an n-byte boundary, but
> > it doesn't hurt to be safe/explicit about what we're expecting.
> > 
> > > 
> > > > +typedef struct ResourceProps {
> > > > +    union {
> > > > +        ResourceFields fields[8];
> > > > +        uint8_t data[sizeof(ResourceFields) * 8];
> > > > +    } reg;
> > > > +    union {
> > > > +        ResourceFields fields[7];
> > > > +        uint8_t data[sizeof(ResourceFields) * 7];
> > > > +    } assigned;
> > > > +    uint32_t reg_len;
> > > > +    uint32_t assigned_len;
> > > > +} ResourceProps;
> > > 
> > > I'm a bit dubious about this union, rather than just casting when you
> > > need the bytestring version.
> > 
> > Yah, I may have gotten a bit carried away there...
> > 
> > > 
> > > > +
> > > > +/* fill in the 'reg'/'assigned-resources' OF properties for
> > > > + * a PCI device. 'reg' describes resource requirements for a
> > > > + * device's IO/MEM regions, 'assigned-addresses' describes the
> > > > + * actual resource assignments.
> > > > + *
> > > > + * the properties are arrays of ('phys-addr', 'size') pairs describing
> > > > + * the addressable regions of the PCI device, where 'phys-addr' is a
> > > > + * RESOURCE_CELLS_ADDRESS-tuple of 32-bit integers corresponding to
> > > > + * (phys.hi, phys.mid, phys.lo), and 'size' is a
> > > > + * RESOURCE_CELLS_SIZE-tuple corresponding to (size.hi, size.lo).
> > > > + * phys.hi = 0xYYXXXXZZ, where:
> > > > + *   0xYY = npt000ss
> > > > + *          |||   |
> > > > + *          |||   +-- space code: 1 if IO region, 2 if MEM region
> > > > + *          ||+------ for non-relocatable IO: 1 if aliased
> > > > + *          ||        for relocatable IO: 1 if below 64KB
> > > > + *          ||        for MEM: 1 if below 1MB
> > > > + *          |+------- 1 if region is prefetchable
> > > > + *          +-------- 1 if region is non-relocatable
> > > > + *   0xXXXX = bbbbbbbb dddddfff, encoding bus, slot, and function
> > > > + *            bits respectively
> > > > + *   0xZZ = rrrrrrrr, the register number of the BAR corresponding
> > > > + *          to the region
> > > > + *
> > > > + * phys.mid and phys.lo correspond respectively to the hi/lo portions
> > > > + * of the actual address of the region.
> > > > + *
> > > > + * how the phys-addr/size values are used differ slightly between
> > > > + * 'reg' and 'assigned-addresses' properties. namely, 'reg' has
> > > > + * an additional description for the config space region of the
> > > > + * device, and in the case of QEMU has n=0 and phys.mid=phys.lo=0
> > > > + * to describe the region as relocatable, with an address-mapping
> > > > + * that corresponds directly to the PHB's address space for the
> > > > + * resource. 'assigned-addresses' always has n=1 set with an absolute
> > > > + * address assigned for the resource. in general, 'assigned-addresses'
> > > > + * won't be populated, since addresses for PCI devices are generally
> > > > + * unmapped initially and left to the guest to assign.
> > > > + *
> > > > + * in accordance with PCI Bus Binding to Open Firmware,
> > > > + * IEEE Std 1275-1994, section 4.1.1, as implemented by PAPR+ v2.7,
> > > > + * Appendix C.
> > > > + */
> > > > +static void populate_resource_props(PCIDevice *d, ResourceProps *rp)
> > > > +{
> > > > +    int bus_num = pci_bus_num(PCI_BUS(qdev_get_parent_bus(DEVICE(d))));
> > > > +    uint32_t dev_id = (b_bbbbbbbb(bus_num) |
> > > > +                       b_ddddd(PCI_SLOT(d->devfn)) |
> > > > +                       b_fff(PCI_FUNC(d->devfn)));
> > > > +    ResourceFields *reg, *assigned;
> > > > +    int i, reg_idx = 0, assigned_idx = 0;
> > > > +
> > > > +    /* config space region */
> > > > +    reg = &rp->reg.fields[reg_idx++];
> > > > +    reg->phys_hi = cpu_to_be32(dev_id);
> > > > +    reg->phys_mid = 0;
> > > > +    reg->phys_lo = 0;
> > > > +    reg->size_hi = 0;
> > > > +    reg->size_lo = 0;
> > > > +
> > > > +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > > > +        if (!d->io_regions[i].size) {
> > > > +            continue;
> > > > +        }
> > > > +
> > > > +        reg = &rp->reg.fields[reg_idx++];
> > > > +
> > > > +        reg->phys_hi = cpu_to_be32(dev_id | b_rrrrrrrr(pci_bar(d, i)));
> > > > +        if (d->io_regions[i].type & PCI_BASE_ADDRESS_SPACE_IO) {
> > > > +            reg->phys_hi |= cpu_to_be32(b_ss(1));
> > > > +        } else {
> > > > +            reg->phys_hi |= cpu_to_be32(b_ss(2));
> > > > +        }
> > > > +        reg->phys_mid = 0;
> > > > +        reg->phys_lo = 0;
> > > > +        reg->size_hi = cpu_to_be32(d->io_regions[i].size >> 32);
> > > > +        reg->size_lo = cpu_to_be32(d->io_regions[i].size);
> > > > +
> > > > +        if (d->io_regions[i].addr != PCI_BAR_UNMAPPED) {
> > > > +            continue;
> > > 
> > > This has inverted sense, doesn't it?  If the BAR *is* unmapped you
> > > want to continue, skipping the assigned-addresses stuff.
> > 
> > Yes, should be inverted :( It ends up not mattering much with the
> > current userspace code, since we rely on PCI rescan which always
> > re-assigns BARs, but definitely needs fixing.
> > 
> > > 
> > > > +        }
> > > > +
> > > > +        assigned = &rp->assigned.fields[assigned_idx++];
> > > > +        assigned->phys_hi = cpu_to_be32(reg->phys_hi | b_n(1));
> > > > +        assigned->phys_mid = cpu_to_be32(d->io_regions[i].addr >> 32);
> > > > +        assigned->phys_lo = cpu_to_be32(d->io_regions[i].addr);
> > > 
> > > Not sure if I understood the comment above properly; should these
> > > addresses actually be translated through an mappings above the PHB?
> > > Not that there are any such translations on sPAPR, but...
> > 
> > The io_regions[i].addr values are relative to the IO/MEM address spaces for
> > the device, which correspond to IO/MEM windows for the PHB. So I think the
> > memory API should handle any translation above that...
> > 
> > Or do you mean because of the n=0/relocatable IO regions described by 'reg'?
> > 
> > IIUC, when n=0, reg->phys_{mid,lo} can be used to encode a starting offset,
> > so a guest can re-assign/re-program an IO region that's already been assigned
> > and described via the correspond fields in 'assigned-addresses', so long as
> > uses an addr above reg->phys{mid,lo}.
> > 
> > So, we could use those reg->phys_{mid,lo} values as an alternative to the
> > PHBs IO/MEM windows (I guess for platforms that don't provide these windows
> > and just expose the full address space?).
> > 
> > But since we have those windows, we end up not needing this, so we always do
> > n=0, reg->phys_mid = reg->phys_lo = 0, so the values in
> > assigned->phys_{mid,lo} end up just being offsets into the IO/MEM windows,
> > which correspond directly to io_regions[n].addr.
> 
> Argh, not sure what I was thinking. io_regions[n].addr is relative to the
> IO/MEM window, so from the guest perspective .addr is indeed a different
> value...
> 
> So maybe reg->phys_mid/reg->phys_lo do in fact need to reflect the window
> offsets so that the relocatable region assignments are offset
> accordingly...
> 
> Will look into it more.

So, maybe my first response wasn't as misguided as I thought. I haven't
found any documentation that lays out the specifics, but the kernel code makes
it fairly clear that the IO/MEM addresses in assigned-addresses are relative
to PHB's IO/MEM windows.

The code basically does this:

  arch/powerpc/kernel/pci_of_scan.c: of_pci_parse_addrs()
    addrs = of_get('assigned-addresses')
    config_offset = addrs[0] && 0xff (phys.hi)
    pci_bar_num = (config_offset - PCI_BASE_ADDRESS_0) / 4
    base = &addrs[1:2] (phys.mid/phys.lo)
    size = &addrs[3:4] (size.hi, size.lo)
    region.start = base
    region.end = base + size -1
    res = &dev->resource[pci_bar_num];
    pcibios_bus_to_resource(dev->bus, res, &region)
  drivers/pci/host-bridge.c:pcibios_bus_to_resource(bus, bar_res, bar_region)
    bridge = pci_host_bridge(bus)
    for window in bridge->windows:
      # skip if window type (IO/MEM), doesn't match device BAR's res)
      bus_region.start = window->res->start - window->offset
      bus_region.end = window->res->end - window->offset
      if bus_region contains device BAR region:
        offset = window->offset
        break
    //take the region, add the window offset, and use that as the resource addr
    res->start = region->start + offset
    res->end = region->end + offset

I tested this with the rpaphp hotplug module, which honors
assigned-addresses (current guest hotplug uses generic pci rescan instead
due to a long-standing issue with rpaphp only being able to handle
1 slot per PHB if there are devices present at module-load time), by adding
some test code to pre-assign the BARs during hotplug (with addresses relative
to PHB's IO/MEM windows) to make sure assigned-addresses values get properly
translated to the correct offset in guest's io/system address space.

Will add a note in the comments.

  reply	other threads:[~2015-02-26 20:07 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-16 14:27 [Qemu-devel] [PATCH resend v5 00/16] spapr: add support for pci hotplug Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 01/16] docs: add sPAPR hotplug/dynamic-reconfiguration documentation Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 02/16] spapr_drc: initial implementation of sPAPRDRConnector device Michael Roth
2015-02-23  6:05   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 03/16] spapr_rtas: add get/set-power-level RTAS interfaces Michael Roth
2015-02-24  6:29   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 04/16] spapr_rtas: add set-indicator RTAS interface Michael Roth
2015-02-24  6:32   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 05/16] spapr_rtas: add get-sensor-state " Michael Roth
2015-02-24  6:33   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 06/16] spapr: add rtas_st_buffer_direct() helper Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 07/16] spapr_rtas: add ibm, configure-connector RTAS interface Michael Roth
2015-02-24  6:40   ` David Gibson
2015-02-24 20:43     ` Michael Roth
2015-02-25  0:48       ` David Gibson
2015-02-26 22:21         ` Michael Roth
2015-02-27  5:31           ` David Gibson
2015-02-27 17:37             ` Michael Roth
2015-03-02  6:25               ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 08/16] spapr_events: re-use EPOW event infrastructure for hotplug events Michael Roth
2015-02-24  6:49   ` David Gibson
2015-02-24 20:04     ` Michael Roth
2015-02-25  0:54       ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 09/16] spapr_events: event-scan RTAS interface Michael Roth
2015-02-24  9:11   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 10/16] spapr_drc: add spapr_drc_populate_dt() Michael Roth
2015-02-24  9:26   ` David Gibson
2015-02-24 19:21     ` Michael Roth
2015-02-26 15:56   ` Nathan Fontenot
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 11/16] spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 12/16] spapr_pci: create DRConnectors for each PCI slot during PHB realize Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 13/16] spapr_pci: populate DRC dt entries for PHBs Michael Roth
2015-02-24  9:29   ` David Gibson
2015-02-24 18:52     ` Michael Roth
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 14/16] pci: make pci_bar useable outside pci.c Michael Roth
2015-02-24 10:20   ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations Michael Roth
2015-02-25  3:11   ` David Gibson
2015-02-25  5:17     ` Michael Roth
2015-02-25  6:40       ` Michael Roth
2015-02-26 20:06         ` Michael Roth [this message]
2015-02-27  5:14           ` David Gibson
2015-02-26  3:49       ` David Gibson
2015-02-16 14:27 ` [Qemu-devel] [PATCH v5 16/16] spapr_pci: emit hotplug add/remove events during hotplug Michael Roth
2015-02-25  3:18   ` David Gibson
  -- strict thread matches above, loose matches on Subject: below --
2015-02-16 14:10 [Qemu-devel] [PATCH v5 00/16] spapr: add support for pci hotplug Michael Roth
2015-02-16 14:10 ` [Qemu-devel] [PATCH v5 15/16] spapr_pci: enable basic hotplug operations Michael Roth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150226200604.31752.68449@loki \
    --to=mdroth@linux.vnet.ibm.com \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=bharata.rao@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=ncmike@ncultra.org \
    --cc=nfont@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=tyreld@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).