All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcel Apfelbaum <marcel@redhat.com>
To: Dotan Barak <dotanb@mellanox.com>,
	Knut Omang <knut.omang@oracle.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	Jan Kiszka <jan.kiszka@web.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v6 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
Date: Thu, 22 Oct 2015 11:54:23 +0300	[thread overview]
Message-ID: <5628A43F.1090104@redhat.com> (raw)
In-Reply-To: <AM3PR05MB14108A638BEA92945933C189B7270@AM3PR05MB1410.eurprd05.prod.outlook.com>

On 10/22/2015 11:51 AM, Dotan Barak wrote:
> I have minor comments, to use the new helper functions.

Hi Dotan,

I think is a good idea, however we can do it as a simple patch on top of it
if no other issues will be found.

Thanks,
Marcel

>
> Thanks
> Dotan
>
>> -----Original Message-----
>> From: Knut Omang [mailto:knut.omang@oracle.com]
>> Sent: Thursday, October 22, 2015 11:02 AM
>> To: qemu-devel@nongnu.org
>> Cc: Paolo Bonzini <pbonzini@redhat.com>; Richard Henderson
>> <rth@twiddle.net>; Eduardo Habkost <ehabkost@redhat.com>; Michael S.
>> Tsirkin <mst@redhat.com>; Alex Williamson <alex.williamson@redhat.com>;
>> Marcel Apfelbaum <marcel@redhat.com>; Jan Kiszka <jan.kiszka@web.de>;
>> Gonglei (Arei) <arei.gonglei@huawei.com>; Dotan Barak
>> <dotanb@mellanox.com>; Richard W.M. Jones <rjones@redhat.com>; Knut
>> Omang <knut.omang@oracle.com>
>> Subject: [PATCH v6 2/4] pcie: Add support for Single Root I/O Virtualization
>> (SR/IOV)
>>
>> This patch provides the building blocks for creating an SR/IOV PCIe Extended
>> Capability header and register/unregister SR/IOV Virtual Functions.
>>
>> Signed-off-by: Knut Omang <knut.omang@oracle.com>
>> ---
>>   hw/pci/Makefile.objs        |   2 +-
>>   hw/pci/pci.c                |  95 +++++++++++----
>>   hw/pci/pcie.c               |   2 +-
>>   hw/pci/pcie_sriov.c         | 277
>> ++++++++++++++++++++++++++++++++++++++++++++
>>   include/hw/pci/pci.h        |  11 +-
>>   include/hw/pci/pcie.h       |   6 +
>>   include/hw/pci/pcie_sriov.h |  67 +++++++++++
>>   include/qemu/typedefs.h     |   2 +
>>   trace-events                |   5 +
>>   9 files changed, 441 insertions(+), 26 deletions(-)  create mode 100644
>> hw/pci/pcie_sriov.c  create mode 100644 include/hw/pci/pcie_sriov.h
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c index b095cfe..3a6cce3 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -771,6 +782,15 @@ static void pci_init_multifunction(PCIBus *bus,
>> PCIDevice *dev, Error **errp)
>>           dev->config[PCI_HEADER_TYPE] |=
>> PCI_HEADER_TYPE_MULTI_FUNCTION;
>>       }
>>
>> +    /* With SR/IOV and ARI, a device at function 0 need not be a multifunction
>> +     * device, as it may just be a VF that ended up with function 0 in
>> +     * the legacy PCI interpretation. Avoid failing in such cases:
>> +     */
>> +    if (pci_is_vf(dev) &&
>> +        dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION)
> Use pcie_sriov_get_pf()
>
>> {
>> +        return;
>> +    }
>> +
>>       /*
>>        * multifunction bit is interpreted in two ways as follows.
>>        *   - all functions must set the bit to 1.
>
>
>> @@ -1060,11 +1081,44 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev,
>> int region_num)
>>       return pci_dev->io_regions[region_num].addr;
>>   }
>>
>> -static pcibus_t pci_bar_address(PCIDevice *d,
>> -				int reg, uint8_t type, pcibus_t size)
>> +
>> +static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
>> +                                        uint8_t type, pcibus_t size) {
>> +    pcibus_t new_addr;
>> +    if (!pci_is_vf(d)) {
>> +        int bar = pci_bar(d, reg);
>> +        if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
>> +            new_addr = pci_get_quad(d->config + bar);
>> +        } else {
>> +            new_addr = pci_get_long(d->config + bar);
>> +        }
>> +    } else {
>> +        PCIDevice *pf = d->exp.sriov_vf.pf;
> Use pcie_sriov_get_pf()
>
>
>> +        uint16_t sriov_cap = pf->exp.sriov_cap;
>> +        int bar = sriov_cap + PCI_SRIOV_BAR + reg * 4;
>> +        uint16_t vf_offset = pci_get_word(pf->config + sriov_cap +
>> PCI_SRIOV_VF_OFFSET);
>> +        uint16_t vf_stride = pci_get_word(pf->config + sriov_cap +
>> PCI_SRIOV_VF_STRIDE);
>> +        uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) /
>> +vf_stride;
> Use pcie_sriov_vf_number()
>
>
>
>> b/hw/pci/pcie_sriov.c new file mode 100644 index 0000000..756bdde
>> --- /dev/null
>> +++ b/hw/pci/pcie_sriov.c
>> @@ -0,0 +1,277 @@
>> +/*
>> + * pcie_sriov.c:
>> + *
>> + * Implementation of SR/IOV emulation support.
>> + *
>> + * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "hw/pci/pci.h"
>> +#include "hw/pci/pcie.h"
>> +#include "hw/pci/pci_bus.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/range.h"
>> +#include "trace.h"
>> +
>> +#define SRIOV_ID(dev) \
>> +    (dev)->name, PCI_SLOT((dev)->devfn), PCI_FUNC((dev)->devfn)
>> +
>> +static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char
>> +*name, uint16_t vf_num); static void unregister_vfs(PCIDevice *dev);
>> +
>> +void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>> +                        const char *vfname, uint16_t vf_dev_id,
>> +                        uint16_t init_vfs, uint16_t total_vfs,
>> +                        uint16_t vf_offset, uint16_t vf_stride) {
>> +    uint8_t *cfg = dev->config + offset;
>> +    uint8_t *wmask;
>> +
>> +    pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
>> +                        offset, PCI_EXT_CAP_SRIOV_SIZEOF);
>> +    dev->exp.sriov_cap = offset;
>> +    dev->exp.sriov_pf.num_vfs = 0;
>> +    dev->exp.sriov_pf.vfname = g_strdup(vfname);
>> +    dev->exp.sriov_pf.vf = NULL;
>> +
>> +    pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
>> +    pci_set_word(cfg + PCI_SRIOV_VF_STRIDE, vf_stride);
>> +
>> +    /* Mandatory page sizes to support.
>> +     * Device implementations can call pcie_sriov_pf_add_sup_pgsize()
>> +     * to set more bits:
>> +     */
>> +    pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, SRIOV_SUP_PGSIZE_MINREQ);
>> +
>> +    /* Default is to use 4K pages, software can modify it
>> +     * to any of the supported bits
>> +     */
>> +    pci_set_word(cfg + PCI_SRIOV_SYS_PGSIZE, 0x1);
>> +
>> +    /* Set up device ID and initial/total number of VFs available */
>> +    pci_set_word(cfg + PCI_SRIOV_VF_DID, vf_dev_id);
>> +    pci_set_word(cfg + PCI_SRIOV_INITIAL_VF, init_vfs);
>> +    pci_set_word(cfg + PCI_SRIOV_TOTAL_VF, total_vfs);
>> +    pci_set_word(cfg + PCI_SRIOV_NUM_VF, 0);
>> +
>> +    /* Write enable control bits */
>> +    wmask = dev->wmask + offset;
>> +    pci_set_word(wmask + PCI_SRIOV_CTRL,
>> +                 PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE |
>> PCI_SRIOV_CTRL_ARI);
>> +    pci_set_word(wmask + PCI_SRIOV_NUM_VF, 0xffff);
>> +    pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
>> +
>> +    qdev_prop_set_bit(&dev->qdev, "multifunction", true); }
>> +
>> +void pcie_sriov_pf_exit(PCIDevice *dev) {
>> +    unregister_vfs(dev);
>> +    g_free((char *)dev->exp.sriov_pf.vfname);
>> +    dev->exp.sriov_pf.vfname = NULL;
>> +}
>> +
>> +void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
>> +                               uint8_t type, dma_addr_t size) {
>> +    uint32_t addr;
>> +    uint64_t wmask;
>> +    uint16_t sriov_cap = dev->exp.sriov_cap;
>> +
>> +    assert(sriov_cap > 0);
>> +    assert(region_num >= 0);
>> +    assert(region_num < PCI_NUM_REGIONS);
>> +    assert(region_num != PCI_ROM_SLOT);
>> +
>> +    wmask = ~(size - 1);
>> +    addr = sriov_cap + PCI_SRIOV_BAR + region_num * 4;
>> +
>> +    pci_set_long(dev->config + addr, type);
>> +    if (!(type & PCI_BASE_ADDRESS_SPACE_IO) &&
>> +        type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
>> +        pci_set_quad(dev->wmask + addr, wmask);
>> +        pci_set_quad(dev->cmask + addr, ~0ULL);
>> +    } else {
>> +        pci_set_long(dev->wmask + addr, wmask & 0xffffffff);
>> +        pci_set_long(dev->cmask + addr, 0xffffffff);
>> +    }
>> +    dev->exp.sriov_pf.vf_bar_type[region_num] = type; }
>> +
>> +void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
>> +                                MemoryRegion *memory) {
>> +    PCIIORegion *r;
>> +    uint8_t type;
>> +    pcibus_t size = memory_region_size(memory);
>> +
>> +    assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */
>> +    assert(region_num >= 0);
>> +    assert(region_num < PCI_NUM_REGIONS);
>> +    type = dev->exp.sriov_vf.pf->exp.sriov_pf.vf_bar_type[region_num];
> Use pcie_sriov_get_pf()
>
>
>
>> +
>> +    if (!is_power_of_2(size)) {
>> +        error_report("%s: PCI region size must be a power"
>> +                     " of two - type=0x%x, size=0x%"FMT_PCIBUS"\n",
>> +                     __func__, type, size);
>> +        exit(1);
>> +    }
>> +
>> +    r = &dev->io_regions[region_num];
>> +    r->memory = memory;
>> +    r->address_space =
>> +        type & PCI_BASE_ADDRESS_SPACE_IO
>> +        ? dev->bus->address_space_io
>> +        : dev->bus->address_space_mem;
>> +    r->size = size;
>> +    r->type = type;
>> +
>> +    r->addr = pci_bar_address(dev, region_num, r->type, r->size);
>> +    if (r->addr != PCI_BAR_UNMAPPED) {
>> +        memory_region_add_subregion_overlap(r->address_space,
>> +                                            r->addr, r->memory, 1);
>> +    }
>> +}
>> +
>> +static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char
>> +*name, uint16_t vf_num) {
>> +    PCIDevice *dev = pci_create(pf->bus, devfn, name);
>> +    dev->exp.sriov_vf.pf = pf;
>> +    dev->exp.sriov_vf.vf_number = vf_num;
>> +    Error *local_err = NULL;
>> +
>> +    object_property_set_bool(OBJECT(&dev->qdev), true, "realized",
>> &local_err);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        return NULL;
>> +    }
>> +
>> +    /* set vid/did according to sr/iov spec - they are not used */
>> +    pci_config_set_vendor_id(dev->config, 0xffff);
>> +    pci_config_set_device_id(dev->config, 0xffff);
>> +    return dev;
>> +}
>> +
>> +static void register_vfs(PCIDevice *dev) {
>> +    uint16_t num_vfs;
>> +    uint16_t i;
>> +    uint16_t sriov_cap = dev->exp.sriov_cap;
>> +    uint16_t vf_offset = pci_get_word(dev->config + sriov_cap +
>> PCI_SRIOV_VF_OFFSET);
>> +    uint16_t vf_stride = pci_get_word(dev->config + sriov_cap +
>> PCI_SRIOV_VF_STRIDE);
>> +    int32_t devfn = dev->devfn + vf_offset;
>> +
>> +    assert(sriov_cap > 0);
>> +    num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
>> +
>> +    dev->exp.sriov_pf.vf = g_malloc(sizeof(PCIDevice *) * num_vfs);
>> +    assert(dev->exp.sriov_pf.vf);
>> +
>> +    trace_sriov_register_vfs(SRIOV_ID(dev), num_vfs);
>> +    for (i = 0; i < num_vfs; i++) {
>> +        dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, dev-
>>> exp.sriov_pf.vfname, i);
>> +        if (!dev->exp.sriov_pf.vf[i]) {
>> +            num_vfs = i;
>> +            break;
>> +        }
>> +        devfn += vf_stride;
>> +    }
>> +    dev->exp.sriov_pf.num_vfs = num_vfs; }
>> +
>> +static void unregister_vfs(PCIDevice *dev) {
>> +    Error *local_err = NULL;
>> +    uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
>> +    uint16_t i;
>> +
>> +    trace_sriov_unregister_vfs(SRIOV_ID(dev), num_vfs);
>> +    for (i = 0; i < num_vfs; i++) {
>> +        object_property_set_bool(OBJECT(&dev->exp.sriov_pf.vf[i]->qdev), false,
>> "realized", &local_err);
>> +        if (local_err) {
>> +            fprintf(stderr, "Failed to unplug: %s\n",
>> +                    error_get_pretty(local_err));
>> +            error_free(local_err);
>> +        }
>> +    }
>> +    g_free(dev->exp.sriov_pf.vf);
>> +    dev->exp.sriov_pf.vf = NULL;
>> +    dev->exp.sriov_pf.num_vfs = 0;
>> +    pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF,
>> +0); }
>> +
>> +void pcie_sriov_config_write(PCIDevice *dev, uint32_t address, uint32_t
>> +val, int len) {
>> +    uint32_t off;
>> +    uint16_t sriov_cap = dev->exp.sriov_cap;
>> +
>> +    if (!sriov_cap || address < sriov_cap) {
>> +        return;
>> +    }
>> +    off = address - sriov_cap;
>> +    if (off >= PCI_EXT_CAP_SRIOV_SIZEOF) {
>> +        return;
>> +    }
>> +
>> +    trace_sriov_config_write(SRIOV_ID(dev), off, val, len);
>> +
>> +    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
>> +        if (dev->exp.sriov_pf.num_vfs) {
>> +            if (!(val & PCI_SRIOV_CTRL_VFE)) {
>> +                unregister_vfs(dev);
>> +            }
>> +        } else {
>> +            if (val & PCI_SRIOV_CTRL_VFE) {
>> +                register_vfs(dev);
>> +            }
>> +        }
>> +    }
>> +}
>> +
>> +
>> +/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs */
>> +void pcie_sriov_pf_disable_vfs(PCIDevice *dev) {
>> +    uint16_t sriov_cap = dev->exp.sriov_cap;
>> +    if (sriov_cap) {
>> +        uint32_t val = pci_get_byte(dev->config + sriov_cap + PCI_SRIOV_CTRL);
>> +        if (val & PCI_SRIOV_CTRL_VFE) {
>> +            val &= ~PCI_SRIOV_CTRL_VFE;
>> +            pcie_sriov_config_write(dev, sriov_cap + PCI_SRIOV_CTRL, val, 1);
>> +        }
>> +    }
>> +}
>> +
>> +/* Add optional supported page sizes to the mask of supported page
>> +sizes */ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t
>> +opt_sup_pgsize) {
>> +    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
>> +    uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
>> +
>> +    uint16_t sup_pgsize = pci_get_word(cfg + PCI_SRIOV_SUP_PGSIZE);
>> +
>> +    sup_pgsize |= opt_sup_pgsize;
>> +
>> +    /* Make sure the new bits are set, and that system page size
>> +     * also can be set to any of the new values according to spec:
>> +     */
>> +    pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, sup_pgsize);
>> +    pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, sup_pgsize); }
>> +
>> +
>> +uint16_t pcie_sriov_vf_number(PCIDevice *dev) {
>> +    assert(pci_is_vf(dev));
>> +    return dev->exp.sriov_vf.vf_number; }
>> +
>> +
>> +PCIDevice *pcie_sriov_get_pf(PCIDevice *dev)
>> +{
>> +    return dev->exp.sriov_vf.pf;
>> +}
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index 551cb3d..2e9d8ba 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -11,8 +11,6 @@
>>   /* PCI includes legacy ISA access.  */
>>   #include "hw/isa/isa.h"
>>
>> -#include "hw/pci/pcie.h"
>> -
>>   /* PCI bus */
>>
>>   #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>> @@ -132,6 +130,7 @@ enum {
>>   #define QEMU_PCI_VGA_IO_HI_SIZE 0x20
>>
>>   #include "hw/pci/pci_regs.h"
>> +#include "hw/pci/pcie.h"
>>
>>   /* PCI HEADER_TYPE */
>>   #define  PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
>> @@ -421,6 +420,9 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *,
>> void *, int);
>>   AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
>>   void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
>>
>> +pcibus_t pci_bar_address(PCIDevice *d,
>> +                         int reg, uint8_t type, pcibus_t size);
>> +
>>   static inline void
>>   pci_set_byte(uint8_t *config, uint8_t val)
>>   {
>> @@ -672,6 +674,11 @@ static inline int pci_is_express(const PCIDevice *d)
>>       return d->cap_present & QEMU_PCI_CAP_EXPRESS;
>>   }
>>
>> +static inline int pci_is_vf(const PCIDevice *d)
>> +{
>> +    return d->exp.sriov_vf.pf != NULL;
> Use pcie_sriov_get_pf()
>> +}
>> +
>>   static inline uint32_t pci_config_size(const PCIDevice *d)
>>   {
>>       return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE :
>> PCI_CONFIG_SPACE_SIZE;

  reply	other threads:[~2015-10-22  8:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-22  8:01 [Qemu-devel] [PATCH v6 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
2015-10-22  8:01 ` [Qemu-devel] [PATCH v6 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
2015-10-22  8:01 ` [Qemu-devel] [PATCH v6 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang
2015-10-22  8:51   ` Dotan Barak
2015-10-22  8:54     ` Marcel Apfelbaum [this message]
2015-10-22  8:01 ` [Qemu-devel] [PATCH v6 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt Knut Omang
2015-10-22  8:01 ` [Qemu-devel] [PATCH v6 4/4] pcie: A few minor fixes (type+code simplify) Knut Omang
2015-10-22  8:27 ` [Qemu-devel] [PATCH v6 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
2015-10-22 13:14 ` Michael S. Tsirkin
2015-10-22 13:47   ` Knut Omang
2015-10-22 14:22     ` Dotan Barak
2015-10-22 14:39       ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5628A43F.1090104@redhat.com \
    --to=marcel@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=dotanb@mellanox.com \
    --cc=ehabkost@redhat.com \
    --cc=jan.kiszka@web.de \
    --cc=knut.omang@oracle.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.