From: Alexander Graf <agraf@suse.de>
To: alvise rigo <a.rigo@virtualopensystems.com>
Cc: Rob Herring <rob.herring@linaro.org>,
VirtualOpenSystems Technical Team <tech@virtualopensystems.com>,
Claudio Fontana <claudio.fontana@huawei.com>,
QEMU Developers <qemu-devel@nongnu.org>,
Peter Maydell <peter.maydell@linaro.org>
Subject: Re: [Qemu-devel] [RFC v2 3/4] hw/pci-host: Add a generic PCI host controller for virtual platforms
Date: Tue, 06 Jan 2015 12:18:11 +0100 [thread overview]
Message-ID: <54ABC473.7080602@suse.de> (raw)
In-Reply-To: <CAH47eN2OrN62MdYA8nug7_muEdRvgeQMj9T9tZuJrs36jqkX8g@mail.gmail.com>
On 06.01.15 09:47, alvise rigo wrote:
> Hi,
>
> On Mon, Jan 5, 2015 at 6:13 PM, Alexander Graf <agraf@suse.de> wrote:
>>
>>
>> On 21.11.14 19:07, Alvise Rigo wrote:
>>> Add a generic PCI host controller for virtual platforms, based on the
>>> previous work by Rob Herring:
>>> http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg03482.html
>>>
>>> The controller creates a PCI bus for PCI devices; it is intended to
>>> receive from the platform all the needed information to initiate the
>>> memory regions expected by the PCI system. Due to this dependence, a
>>> header file is used to define the structure that the platform has to
>>> fill with the data needed by the controller. The device provides also a
>>> routine for the device tree node generation, which mostly has to take
>>> care of the interrupt-map property generation. This property is fetched
>>> by the guest operating system to map any PCI interrupt to the interrupt
>>> controller. For the time being, the device expects a GIC v2 to be used
>>> by the guest.
>>> Only mach-virt has been used to test the controller.
>>>
>>> Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com>
>>> ---
>>> hw/pci-host/Makefile.objs | 2 +-
>>> hw/pci-host/generic-pci.c | 280 ++++++++++++++++++++++++++++++++++++++
>>> include/hw/pci-host/generic-pci.h | 74 ++++++++++
>>> 3 files changed, 355 insertions(+), 1 deletion(-)
>>> create mode 100644 hw/pci-host/generic-pci.c
>>> create mode 100644 include/hw/pci-host/generic-pci.h
>>>
>>> diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
>>> index bb65f9c..8ef9fac 100644
>>> --- a/hw/pci-host/Makefile.objs
>>> +++ b/hw/pci-host/Makefile.objs
>>> @@ -1,4 +1,4 @@
>>> -common-obj-y += pam.o
>>> +common-obj-y += pam.o generic-pci.o
>>>
>>> # PPC devices
>>> common-obj-$(CONFIG_PREP_PCI) += prep.o
>>> diff --git a/hw/pci-host/generic-pci.c b/hw/pci-host/generic-pci.c
>>> new file mode 100644
>>> index 0000000..9c90263
>>> --- /dev/null
>>> +++ b/hw/pci-host/generic-pci.c
>>> @@ -0,0 +1,280 @@
>>> +/*
>>> + * Generic PCI host controller
>>> + *
>>> + * Copyright (c) 2014 Linaro, Ltd.
>>> + * Author: Rob Herring <rob.herring@linaro.org>
>>> + *
>>> + * Based on ARM Versatile PCI controller (hw/pci-host/versatile.c):
>>> + * Copyright (c) 2006-2009 CodeSourcery.
>>> + * Written by Paul Brook
>>> + *
>>> + * This code is licensed under the LGPL.
>>> + */
>>> +
>>> +#include "hw/sysbus.h"
>>> +#include "hw/pci-host/generic-pci.h"
>>> +#include "exec/address-spaces.h"
>>> +#include "sysemu/device_tree.h"
>>> +
>>> +static const VMStateDescription pci_generic_host_vmstate = {
>>> + .name = "generic-host-pci",
>>> + .version_id = 1,
>>> + .minimum_version_id = 1,
>>> +};
>>> +
>>> +static void pci_cam_config_write(void *opaque, hwaddr addr,
>>> + uint64_t val, unsigned size)
>>> +{
>>> + PCIGenState *s = opaque;
>>> + pci_data_write(&s->pci_bus, addr, val, size);
>>> +}
>>> +
>>> +static uint64_t pci_cam_config_read(void *opaque, hwaddr addr, unsigned size)
>>> +{
>>> + PCIGenState *s = opaque;
>>> + uint32_t val;
>>> + val = pci_data_read(&s->pci_bus, addr, size);
>>> + return val;
>>> +}
>>> +
>>> +static const MemoryRegionOps pci_vpb_config_ops = {
>>> + .read = pci_cam_config_read,
>>> + .write = pci_cam_config_write,
>>> + .endianness = DEVICE_NATIVE_ENDIAN,
>>
>> Never use NATIVE_ENDIAN unless you're 100% sure it's correct. In this
>> case, please make it LITTLE_ENDIAN.
>
> Agreed.
>
>>
>>> +};
>>> +
>>> +static void pci_generic_set_irq(void *opaque, int irq_num, int level)
>>> +{
>>> + qemu_irq *pic = opaque;
>>> + qemu_set_irq(pic[irq_num], level);
>>> +}
>>> +
>>> +static void pci_generic_host_init(Object *obj)
>>> +{
>>> + PCIHostState *h = PCI_HOST_BRIDGE(obj);
>>> + PCIGenState *s = PCI_GEN(obj);
>>> + GenericPCIHostState *gps = &s->pci_gen;
>>> +
>>> + memory_region_init(&s->pci_io_space, OBJECT(s), "pci_io", 1ULL << 32);
>>
>> Why is IO space that big? It's only 64k usually, no?
>
> This was just to prevent a definition of the io space smaller than the
> actual size of the memory region.
> This could be avoided as you suggested later, by postponing the
> creation of the PCIBus.
>
>>
>>> + memory_region_init(&s->pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32);
>>> +
>>> + pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci",
>>> + &s->pci_mem_space, &s->pci_io_space,
>>> + PCI_DEVFN(0, 0), TYPE_PCIE_BUS);
>>> + h->bus = &s->pci_bus;
>>> +
>>> + object_initialize(gps, sizeof(*gps), TYPE_GENERIC_PCI_HOST);
>>> + qdev_set_parent_bus(DEVICE(gps), BUS(&s->pci_bus));
>>> + gps->devfns = 0;
>>> +}
>>> +
>>> +static int generic_pci_map_irq_fn(PCIDevice *pci_dev, int pin)
>>> +{
>>> + BusState *bus = qdev_get_parent_bus(&pci_dev->qdev);
>>> + PCIBus *pci_bus = PCI_BUS(bus);
>>> + PCIDevice *pdev = pci_bus->devices[PCI_DEVFN(0, 0)];
>>> + GenericPCIHostState *gps = PCI_GEN_HOST(pdev);
>>> +
>>> + return gps->irqmap.devfn_idx_map[PCI_SLOT(pci_dev->devfn)]
>>> + [PCI_FUNC(pci_dev->devfn)];
>>> +}
>>> +
>>> +static void pci_generic_host_realize(DeviceState *dev, Error **errp)
>>> +{
>>> + PCIGenState *s = PCI_GEN(dev);
>>> + SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
>>> + GenericPCIPlatformData *pdata = &s->plat_data;
>>
>> Where does this come from? Why isn't it exposed as qdev parameters?
>
> It comes from the platform (e.g. mach-virt).
> Indeed, I will expose them as qdev properties.
>
>>
>>> + int i;
>>> +
>>> + if (!pdata->addr_map[PCI_CFG].size || !pdata->addr_map[PCI_IO].size ||
>>> + !pdata->addr_map[PCI_MEM].size || !pdata->irqs) {
>>> + hw_error("generic_pci: PCI controller not fully configured.");
>>> + }
>>> +
>>> + for (i = 0; i < pdata->irqs; i++) {
>>> + sysbus_init_irq(sbd, &s->irq[i]);
>>> + }
>>> +
>>> + pci_bus_irqs(&s->pci_bus, pci_generic_set_irq, generic_pci_map_irq_fn,
>>> + s->irq, pdata->irqs);
>>> +
>>> + memory_region_init_io(&s->mem_config, OBJECT(s), &pci_vpb_config_ops, s,
>>> + "pci-config", pdata->addr_map[PCI_CFG].size);
>>> + sysbus_init_mmio(sbd, &s->mem_config);
>>> +
>>> + /* Depending on the available memory of the board using the controller, we
>>> + create a window on both the I/O and mememory space. */
>>
>> meme?
>
> Ack.
>
>>
>>> + memory_region_init_alias(&s->pci_io_window, OBJECT(s), "pci-io-win",
>>> + &s->pci_io_space, 0,
>>> + pdata->addr_map[PCI_IO].size);
>>> + sysbus_init_mmio(sbd, &s->pci_io_window);
>>> +
>>> + memory_region_init_alias(&s->pci_mem_window, OBJECT(s), "pci-mem-win",
>>> + &s->pci_mem_space,
>>> + pdata->addr_map[PCI_MEM].addr,
>>> + pdata->addr_map[PCI_MEM].size);
>>> + sysbus_init_mmio(sbd, &s->pci_mem_window);
>>
>> What if we simply postpone the creation of those regions and the
>> creation of the PCIBus to the realize function? At that point we know
>> the size of the regions.
>
> I'm not sure but I think I've already tried this path and for some
> reason I've abandoned it.
> I will explore it again and I'll let you know.
>
>>
>>> +
>>> + /* TODO Remove once realize propagates to child devices. */
>>> + object_property_set_bool(OBJECT(&s->pci_gen), true, "realized", errp);
>>
>> Is this still necessary? In fact, wouldn't the realize of our PHB and
>> the creation of child devices happen consecutively usually?
>
> Sure, I will fix it.
>
>>
>>> +}
>>> +
>>> +static void pci_generic_host_class_init(ObjectClass *klass, void *data)
>>> +{
>>> + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>>> + DeviceClass *dc = DEVICE_CLASS(klass);
>>> +
>>> + k->vendor_id = PCI_VENDOR_ID_REDHAT;
>>> + k->device_id = 0x1234;
>>
>> Is this a reserved ID?
>
> Not at all. I see now from the documentation (docs/specs/pci-ids.txt)
> that a value within the range 0000 -> 00ff should be used.
> I suppose that eventually we will pick up one of the available?
>
>>
>>> + k->class_id = PCI_CLASS_PROCESSOR_CO;
>>> + /*
>>> + * PCI-facing part of the host bridge, not usable without the
>>> + * host-facing part, which can't be device_add'ed, yet.
>>> + */
>>> + dc->cannot_instantiate_with_device_add_yet = true;
>>> +}
>>> +
>>> +struct dt_irq_mapping {
>>> + int irqs;
>>> + uint32_t gic_phandle;
>>> + int base_irq_num;
>>> + uint64_t *data;
>>> +};
>>> +
>>> +/* */
>>> +#define IRQ_MAP_ENTRY_DESC_SZ 14
>>> +static void gen_int_mapping_fn(PCIBus *b, PCIDevice *d, void *opaque)
>>> +{
>>> + struct dt_irq_mapping *map_data;
>>> + int pin;
>>> +
>>> + PCIDevice *pdev = b->devices[PCI_DEVFN(0, 0)];
>>> + GenericPCIHostState *gps = PCI_GEN_HOST(pdev);
>>> + map_data = (struct dt_irq_mapping *)opaque;
>>> +
>>> + /* Check if the platform has enough IRQs available. */
>>> + if (gps->devfns > map_data->irqs) {
>>> + hw_error("generic_pci: too many PCI devices.");
>>> + }
>>> +
>>> + pin = pci_get_byte(d->config + PCI_INTERRUPT_PIN);
>>> + if (pin) {
>>> + uint64_t *data_ptr = map_data->data + (IRQ_MAP_ENTRY_DESC_SZ * gps->devfns);
>>> + uint8_t pci_slot = PCI_SLOT(d->devfn);
>>> + uint8_t pci_func = PCI_FUNC(d->devfn);
>>> + uint8_t *devfn_idx = &gps->irqmap.devfn_idx_map[pci_slot][0];
>>> + uint8_t *devfn_irq = gps->irqmap.devfn_irq_map;
>>> +
>>> + devfn_idx[pci_func] = gps->devfns;
>>> + devfn_irq[gps->devfns] = map_data->base_irq_num + gps->devfns;
>>> +
>>> + uint64_t buffer[IRQ_MAP_ENTRY_DESC_SZ] =
>>> + {1, (pci_slot << 11)|(pci_func << 8), 2, 0x00000000, 1, pin,
>>> + 1, map_data->gic_phandle, 1, 0x0, 1, devfn_irq[gps->devfns],
>>> + 1, 0x1};
>>> +
>>> + memcpy(data_ptr, buffer, IRQ_MAP_ENTRY_DESC_SZ * sizeof(*buffer));
>>> + gps->devfns++;
>>> + }
>>> +}
>>> +
>>> +/* Generate the "interrup-map" node's data and store it in map_data */
>>> +static void generate_int_mapping(struct dt_irq_mapping *map_data,
>>> + PCIGenState *s)
>>> +{
>>> + pci_for_each_device(s->parent_obj.bus, 0, gen_int_mapping_fn, map_data);
>>
>> The interrupt-map should describe interrupt mappings for every slot, not
>> every device. If you only describe the current state of affairs, hotplug
>> won't work.
>
> Ack, as agreed also with Peter in the other thread.
>
>>
>>> +}
>>> +
>>> +static void generate_dt_node(void *fdt, DeviceState *dev)
>>> +{
>>> + PCIGenState *s = PCI_GEN(dev);
>>> + char *nodename;
>>> + uint32_t gic_phandle;
>>> + uint32_t plat_acells;
>>> + uint32_t plat_scells;
>>> +
>>> + SysBusDevice *busdev;
>>> + busdev = SYS_BUS_DEVICE(dev);
>>> + MemoryRegion *cfg = sysbus_mmio_get_region(busdev, 0);
>>> + MemoryRegion *io = sysbus_mmio_get_region(busdev, 1);
>>> + MemoryRegion *mem = sysbus_mmio_get_region(busdev, 2);
>>> +
>>> + nodename = g_strdup_printf("/pci@%" PRIx64, cfg->addr);
>>> + qemu_fdt_add_subnode(fdt, nodename);
>>> + qemu_fdt_setprop_string(fdt, nodename, "compatible",
>>> + "pci-host-cam-generic");
>>> + qemu_fdt_setprop_string(fdt, nodename, "device_type", "pci");
>>> + qemu_fdt_setprop_cell(fdt, nodename, "#address-cells", 0x3);
>>> + qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0x2);
>>> + qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells", 0x1);
>>> +
>>> + plat_acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells");
>>> + plat_scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells");
>>> + qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", plat_acells, cfg->addr,
>>> + plat_scells, memory_region_size(cfg));
>>> +
>>> + qemu_fdt_setprop_sized_cells(fdt, nodename, "ranges",
>>> + 1, 0x01000000, 2, 0x00000000, /* I/O space */
>>> + 2, io->addr, 2, memory_region_size(io),
>>> + 1, 0x02000000, 2, mem->addr, /* 32bit memory space */
>>> + 2, mem->addr, 2, memory_region_size(mem));
>>> +
>>> + gic_phandle = qemu_fdt_get_phandle(fdt, s->plat_data.gic_node_name);
>>> + /* A mask covering bits [15,8] of phys.high allows to honor the
>>> + function number when resolving a triggered PCI interrupt. */
>>> + qemu_fdt_setprop_sized_cells(fdt, nodename, "interrupt-map-mask",
>>> + 1, 0xff00, 1, 0x0, 1, 0x0, 1, 0x7);
>>> +
>>> + uint64_t *int_mapping_data = g_malloc0(IRQ_MAP_ENTRY_DESC_SZ *
>>> + sizeof(uint64_t) * s->plat_data.irqs);
>>> + struct dt_irq_mapping dt_map_data = {
>>> + .irqs = s->plat_data.irqs,
>>> + .gic_phandle = gic_phandle,
>>> + .base_irq_num = s->plat_data.base_irq,
>>> + .data = int_mapping_data
>>> + };
>>> + /* Generate the interrupt mapping according to the devices attached
>>> + * to the PCI bus of the controller. */
>>> + generate_int_mapping(&dt_map_data, s);
>>> + qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "interrupt-map",
>>> + (s->pci_gen.devfns * IRQ_MAP_ENTRY_DESC_SZ)/2, int_mapping_data);
>>> +
>>> + g_free(int_mapping_data);
>>> + g_free(nodename);
>>> +}
>>> +
>>> +void pci_controller_build_dt_node(void *fdt, DeviceState *dev)
>>> +{
>>> + generate_dt_node(fdt, dev);
>>
>> Device tree generation should *never* be part of device code. It's
>> machine / board specific. If one day someone can convince me that it's
>
> Actually you already convinced me that the device tree generation
> should stay in the board code :)
> I will rework the code accordingly.
Meanwhile I've taken a stab at implementing a generic PCIe PHB instead.
I'm not sure we should really bother with legacy PCI at this point.
Please give it a try and see whether all of the devices that worked for
you with this patch also work with the PCIe version:
https://github.com/agraf/qemu/commit/ff56c2b2f8f50fe165b3febee0ab2d773b26040b
Alex
next prev parent reply other threads:[~2015-01-06 11:18 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-21 18:07 [Qemu-devel] [RFC v2 0/4] Add Generic PCI host device update Alvise Rigo
2014-11-21 18:07 ` [Qemu-devel] [RFC v2 1/4] hw/arm/virt: Allow multiple agents to modify dt Alvise Rigo
2014-11-24 11:47 ` Claudio Fontana
2015-01-05 15:36 ` Peter Maydell
2015-01-05 16:14 ` alvise rigo
2015-01-05 16:41 ` Peter Maydell
2015-01-05 17:35 ` alvise rigo
2015-01-05 18:07 ` Peter Maydell
2015-01-06 9:56 ` alvise rigo
2015-01-06 9:18 ` Eric Auger
2015-01-06 9:29 ` alvise rigo
2015-01-06 9:51 ` Peter Maydell
2015-01-06 10:05 ` Eric Auger
2014-11-21 18:07 ` [Qemu-devel] [RFC v2 2/4] hw/arm/virt: find_machine_info: handle NULL value Alvise Rigo
2015-01-05 15:36 ` Peter Maydell
2015-01-05 16:31 ` alvise rigo
2015-01-05 16:42 ` Peter Maydell
2014-11-21 18:07 ` [Qemu-devel] [RFC v2 3/4] hw/pci-host: Add a generic PCI host controller for virtual platforms Alvise Rigo
2014-11-24 10:34 ` Claudio Fontana
2014-11-24 14:57 ` alvise rigo
2014-11-25 10:20 ` Claudio Fontana
2015-01-05 17:13 ` Alexander Graf
2015-01-06 8:47 ` alvise rigo
2015-01-06 11:18 ` Alexander Graf [this message]
[not found] ` <1416593261-13751-5-git-send-email-a.rigo@virtualopensystems.com>
2014-11-24 10:38 ` [Qemu-devel] [RFC v2 4/4] hw/arm/virt: Add generic-pci device support Claudio Fontana
2014-11-24 10:47 ` alvise rigo
2014-11-24 15:50 ` [Qemu-devel] [RFC v2 0/4] Add Generic PCI host device update Claudio Fontana
2014-11-25 10:28 ` alvise rigo
2015-01-12 16:26 ` Claudio Fontana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54ABC473.7080602@suse.de \
--to=agraf@suse.de \
--cc=a.rigo@virtualopensystems.com \
--cc=claudio.fontana@huawei.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=rob.herring@linaro.org \
--cc=tech@virtualopensystems.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.