* [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization
@ 2015-10-14 15:51 Knut Omang
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Knut Omang @ 2015-10-14 15:51 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Knut Omang,
Richard W.M. Jones, Alex Williamson, Gonglei (Arei), Jan Kiszka,
Paolo Bonzini, Dotan Barak, Richard Henderson
This patch set implements generic support for SR/IOV as an extension to the
core PCIe functionality, similar to the way other capabilities such as AER
is implemented.
There is no implementation of any device that provides
SR/IOV support included, but I have implemented a test
example which can be found together with this patch set here:
git://github.com/knuto/qemu.git sriov_patches_v5
Testing with the example device was documented here:
http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05110.html
Changes since v4:
- Mostly based on feeback in Marcel Apfelbaum's review:
- The patch with changes to pci_regs.h got eliminated by rebase
- Added some documentation as an additional patch
- Some trivial fixes moved to separate patch
- Modified code to use error and trace functions instead of printfs
Changes since v3:
- Reworked 'pci: Update pci_regs header' to merge kernel version improvements
with the current qemu version instead of copying from the kernel version.
Changes since v2:
- Rebased onto 090d0bfd
- Un-qdev'ified - avoids issues when resetting NUM_VFS
- Fixed handling of vf_offset/vf_stride
Changes since v1:
- Rebased on top of latest master, eliminating prereqs.
- Implement proper support for VF_STRIDE, VF_OFFSET and SUP_PGSIZE
Time better spent fixing it than explaining what the previous
limitations were.
- Added new first patch to fix pci bug related to this
- Split out patch to pci_default_config_write to a separate patch 2
to highlight bug fix.
- Refactored out logic into new source files
hw/pci/pcie_sriov.c include/hw/pci/pcie_sriov.h
similar to pcie_aer.c/h.
- Rename functions and introduce structs to better separate
pf and vf functionality.
- Replaced is_vf member with pci_is_vf() function abstraction
- Fix numerous syntax, whitespace and comment issues
according to Michael's review.
- Fix memory leaks.
- Removed igb example device - a rebased version available
on github instead.
Knut Omang (4):
pci: Make use of the devfn property when registering new devices
pcie: Add support for Single Root I/O Virtualization (SR/IOV)
pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
pcie: A few minor fixes (type+code simplify)
docs/pcie_sriov.txt | 115 +++++++++++++++++++
hw/pci/Makefile.objs | 2 +-
hw/pci/pci.c | 104 +++++++++++++-----
hw/pci/pcie.c | 9 +-
hw/pci/pcie_sriov.c | 263 ++++++++++++++++++++++++++++++++++++++++++++
include/hw/pci/pci.h | 11 +-
include/hw/pci/pcie.h | 6 +
include/hw/pci/pcie_sriov.h | 58 ++++++++++
include/qemu/typedefs.h | 2 +
trace-events | 5 +
10 files changed, 539 insertions(+), 36 deletions(-)
create mode 100644 docs/pcie_sriov.txt
create mode 100644 hw/pci/pcie_sriov.c
create mode 100644 include/hw/pci/pcie_sriov.h
--
2.4.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Qemu-devel] [PATCH v5 1/4] pci: Make use of the devfn property when registering new devices
2015-10-14 15:51 [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
@ 2015-10-14 15:51 ` Knut Omang
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang
` (3 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Knut Omang @ 2015-10-14 15:51 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Knut Omang,
Richard W.M. Jones, Alex Williamson, Gonglei (Arei), Jan Kiszka,
Paolo Bonzini, Dotan Barak, Richard Henderson
Without this, the devfn argument to pci_create_*()
does not affect the assigned devfn.
Needed to support (VF_STRIDE,VF_OFFSET) values other than (1,1)
for SR/IOV.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
---
hw/pci/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b0bf540..b095cfe 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1840,7 +1840,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
bus = PCI_BUS(qdev_get_parent_bus(qdev));
pci_dev = do_pci_register_device(pci_dev, bus,
object_get_typename(OBJECT(qdev)),
- pci_dev->devfn, errp);
+ object_property_get_int(OBJECT(qdev), "addr", NULL), errp);
if (pci_dev == NULL)
return;
--
2.4.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
2015-10-14 15:51 [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
@ 2015-10-14 15:51 ` Knut Omang
2015-10-18 11:02 ` Marcel Apfelbaum
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt Knut Omang
` (2 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Knut Omang @ 2015-10-14 15:51 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Knut Omang,
Richard W.M. Jones, Alex Williamson, Gonglei (Arei), Jan Kiszka,
Paolo Bonzini, Dotan Barak, Richard Henderson
This patch provides the building blocks for creating an SR/IOV
PCIe Extended Capability header and register/unregister
SR/IOV Virtual Functions.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
hw/pci/Makefile.objs | 2 +-
hw/pci/pci.c | 102 ++++++++++++-----
hw/pci/pcie.c | 2 +-
hw/pci/pcie_sriov.c | 263 ++++++++++++++++++++++++++++++++++++++++++++
include/hw/pci/pci.h | 11 +-
include/hw/pci/pcie.h | 6 +
include/hw/pci/pcie_sriov.h | 58 ++++++++++
include/qemu/typedefs.h | 2 +
trace-events | 5 +
9 files changed, 419 insertions(+), 32 deletions(-)
create mode 100644 hw/pci/pcie_sriov.c
create mode 100644 include/hw/pci/pcie_sriov.h
diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
index 9f905e6..2226980 100644
--- a/hw/pci/Makefile.objs
+++ b/hw/pci/Makefile.objs
@@ -3,7 +3,7 @@ common-obj-$(CONFIG_PCI) += msix.o msi.o
common-obj-$(CONFIG_PCI) += shpc.o
common-obj-$(CONFIG_PCI) += slotid_cap.o
common-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
-common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o
+common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o pcie_sriov.o
common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
common-obj-$(CONFIG_ALL) += pci-stub.o
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b095cfe..4fb5fcf 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -153,6 +153,9 @@ int pci_bar(PCIDevice *d, int reg)
{
uint8_t type;
+ /* PCIe virtual functions do not have their own BARs */
+ assert(!pci_is_vf(d));
+
if (reg != PCI_ROM_SLOT)
return PCI_BASE_ADDRESS_0 + reg * 4;
@@ -211,22 +214,13 @@ void pci_device_deassert_intx(PCIDevice *dev)
}
}
-static void pci_do_device_reset(PCIDevice *dev)
+static void pci_reset_regions(PCIDevice *dev)
{
int r;
+ if (pci_is_vf(dev)) {
+ return;
+ }
- pci_device_deassert_intx(dev);
- assert(dev->irq_state == 0);
-
- /* Clear all writable bits */
- pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
- pci_get_word(dev->wmask + PCI_COMMAND) |
- pci_get_word(dev->w1cmask + PCI_COMMAND));
- pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
- pci_get_word(dev->wmask + PCI_STATUS) |
- pci_get_word(dev->w1cmask + PCI_STATUS));
- dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
- dev->config[PCI_INTERRUPT_LINE] = 0x0;
for (r = 0; r < PCI_NUM_REGIONS; ++r) {
PCIIORegion *region = &dev->io_regions[r];
if (!region->size) {
@@ -240,10 +234,6 @@ static void pci_do_device_reset(PCIDevice *dev)
pci_set_long(dev->config + pci_bar(dev, r), region->type);
}
}
- pci_update_mappings(dev);
-
- msi_reset(dev);
- msix_reset(dev);
}
/*
@@ -253,7 +243,23 @@ static void pci_do_device_reset(PCIDevice *dev)
void pci_device_reset(PCIDevice *dev)
{
qdev_reset_all(&dev->qdev);
- pci_do_device_reset(dev);
+ pci_device_deassert_intx(dev);
+ assert(dev->irq_state == 0);
+
+ /* Clear all writable bits */
+ pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
+ pci_get_word(dev->wmask + PCI_COMMAND) |
+ pci_get_word(dev->w1cmask + PCI_COMMAND));
+ pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
+ pci_get_word(dev->wmask + PCI_STATUS) |
+ pci_get_word(dev->w1cmask + PCI_STATUS));
+ dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
+ dev->config[PCI_INTERRUPT_LINE] = 0x0;
+ pci_reset_regions(dev);
+ pci_update_mappings(dev);
+
+ msi_reset(dev);
+ msix_reset(dev);
}
/*
@@ -268,7 +274,7 @@ static void pcibus_reset(BusState *qbus)
for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
if (bus->devices[i]) {
- pci_do_device_reset(bus->devices[i]);
+ pci_device_reset(bus->devices[i]);
}
}
@@ -771,6 +777,15 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp)
dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
}
+ /* With SR/IOV and ARI, a device at function 0 need not be a multifunction
+ * device, as it may just be a VF that ended up with function 0 in
+ * the legacy PCI interpretation. Avoid failing in such cases:
+ */
+ if (pci_is_vf(dev) &&
+ dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+ return;
+ }
+
/*
* multifunction bit is interpreted in two ways as follows.
* - all functions must set the bit to 1.
@@ -962,6 +977,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
uint64_t wmask;
pcibus_t size = memory_region_size(memory);
+ assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */
assert(region_num >= 0);
assert(region_num < PCI_NUM_REGIONS);
if (size & (size-1)) {
@@ -1060,11 +1076,44 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
return pci_dev->io_regions[region_num].addr;
}
-static pcibus_t pci_bar_address(PCIDevice *d,
- int reg, uint8_t type, pcibus_t size)
+
+static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
+ uint8_t type, pcibus_t size)
+{
+ pcibus_t new_addr;
+ if (!pci_is_vf(d)) {
+ int bar = pci_bar(d, reg);
+ if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+ new_addr = pci_get_quad(d->config + bar);
+ } else {
+ new_addr = pci_get_long(d->config + bar);
+ }
+ } else {
+ PCIDevice *pf = d->exp.sriov_vf.pf;
+ uint16_t sriov_cap = pf->exp.sriov_cap;
+ int bar = sriov_cap + PCI_SRIOV_BAR + reg * 4;
+ uint16_t vf_offset = pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
+ uint16_t vf_stride = pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
+ uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride;
+
+ if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+ new_addr = pci_get_quad(pf->config + bar);
+ } else {
+ new_addr = pci_get_long(pf->config + bar);
+ }
+ new_addr += vf_num * size;
+ }
+ if (reg != PCI_ROM_SLOT) {
+ /* Preserve the rom enable bit */
+ new_addr &= ~(size - 1);
+ }
+ return new_addr;
+}
+
+pcibus_t pci_bar_address(PCIDevice *d,
+ int reg, uint8_t type, pcibus_t size)
{
pcibus_t new_addr, last_addr;
- int bar = pci_bar(d, reg);
uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
Object *machine = qdev_get_machine();
ObjectClass *oc = object_get_class(machine);
@@ -1075,7 +1124,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
if (!(cmd & PCI_COMMAND_IO)) {
return PCI_BAR_UNMAPPED;
}
- new_addr = pci_get_long(d->config + bar) & ~(size - 1);
+ new_addr = pci_config_get_bar_addr(d, reg, type, size);
last_addr = new_addr + size - 1;
/* Check if 32 bit BAR wraps around explicitly.
* TODO: make priorities correct and remove this work around.
@@ -1090,11 +1139,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
if (!(cmd & PCI_COMMAND_MEMORY)) {
return PCI_BAR_UNMAPPED;
}
- if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
- new_addr = pci_get_quad(d->config + bar);
- } else {
- new_addr = pci_get_long(d->config + bar);
- }
+ new_addr = pci_config_get_bar_addr(d, reg, type, size);
/* the ROM slot has a specific enable bit */
if (reg == PCI_ROM_SLOT && !(new_addr & PCI_ROM_ADDRESS_ENABLE)) {
return PCI_BAR_UNMAPPED;
@@ -1228,6 +1273,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
msi_write_config(d, addr, val_in, l);
msix_write_config(d, addr, val_in, l);
+ pcie_sriov_config_write(d, addr, val_in, l);
}
/***********************************************************/
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 6e28985..774b9ed 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -253,7 +253,7 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
* Right now, only a device of function = 0 is allowed to be
* hot plugged/unplugged.
*/
- assert(PCI_FUNC(pci_dev->devfn) == 0);
+ assert(PCI_FUNC(pci_dev->devfn) == 0 || pci_is_vf(pci_dev));
pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_PDS);
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
new file mode 100644
index 0000000..25de8d9
--- /dev/null
+++ b/hw/pci/pcie_sriov.c
@@ -0,0 +1,263 @@
+/*
+ * pcie_sriov.c:
+ *
+ * Implementation of SR/IOV emulation support.
+ *
+ * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pci_bus.h"
+#include "qemu/error-report.h"
+#include "qemu/range.h"
+#include "trace.h"
+
+#define SRIOV_ID(dev) \
+ (dev)->name, PCI_SLOT((dev)->devfn), PCI_FUNC((dev)->devfn)
+
+static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name);
+static void unregister_vfs(PCIDevice *dev);
+
+void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+ const char *vfname, uint16_t vf_dev_id,
+ uint16_t init_vfs, uint16_t total_vfs,
+ uint16_t vf_offset, uint16_t vf_stride)
+{
+ uint8_t *cfg = dev->config + offset;
+ uint8_t *wmask;
+
+ pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
+ offset, PCI_EXT_CAP_SRIOV_SIZEOF);
+ dev->exp.sriov_cap = offset;
+ dev->exp.sriov_pf.num_vfs = 0;
+ dev->exp.sriov_pf.vfname = g_strdup(vfname);
+ dev->exp.sriov_pf.vf = NULL;
+
+ pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
+ pci_set_word(cfg + PCI_SRIOV_VF_STRIDE, vf_stride);
+
+ /* Mandatory page sizes to support.
+ * Device implementations can call pcie_sriov_pf_add_sup_pgsize()
+ * to set more bits:
+ */
+ pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, SRIOV_SUP_PGSIZE_MINREQ);
+
+ /* Default is to use 4K pages, software can modify it
+ * to any of the supported bits
+ */
+ pci_set_word(cfg + PCI_SRIOV_SYS_PGSIZE, 0x1);
+
+ /* Set up device ID and initial/total number of VFs available */
+ pci_set_word(cfg + PCI_SRIOV_VF_DID, vf_dev_id);
+ pci_set_word(cfg + PCI_SRIOV_INITIAL_VF, init_vfs);
+ pci_set_word(cfg + PCI_SRIOV_TOTAL_VF, total_vfs);
+ pci_set_word(cfg + PCI_SRIOV_NUM_VF, 0);
+
+ /* Write enable control bits */
+ wmask = dev->wmask + offset;
+ pci_set_word(wmask + PCI_SRIOV_CTRL,
+ PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI);
+ pci_set_word(wmask + PCI_SRIOV_NUM_VF, 0xffff);
+ pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
+
+ qdev_prop_set_bit(&dev->qdev, "multifunction", true);
+}
+
+void pcie_sriov_pf_exit(PCIDevice *dev)
+{
+ unregister_vfs(dev);
+ g_free((char *)dev->exp.sriov_pf.vfname);
+ dev->exp.sriov_pf.vfname = NULL;
+}
+
+void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
+ uint8_t type, dma_addr_t size)
+{
+ uint32_t addr;
+ uint64_t wmask;
+ uint16_t sriov_cap = dev->exp.sriov_cap;
+
+ assert(sriov_cap > 0);
+ assert(region_num >= 0);
+ assert(region_num < PCI_NUM_REGIONS);
+ assert(region_num != PCI_ROM_SLOT);
+
+ wmask = ~(size - 1);
+ addr = sriov_cap + PCI_SRIOV_BAR + region_num * 4;
+
+ pci_set_long(dev->config + addr, type);
+ if (!(type & PCI_BASE_ADDRESS_SPACE_IO) &&
+ type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+ pci_set_quad(dev->wmask + addr, wmask);
+ pci_set_quad(dev->cmask + addr, ~0ULL);
+ } else {
+ pci_set_long(dev->wmask + addr, wmask & 0xffffffff);
+ pci_set_long(dev->cmask + addr, 0xffffffff);
+ }
+ dev->exp.sriov_pf.vf_bar_type[region_num] = type;
+}
+
+void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
+ MemoryRegion *memory)
+{
+ PCIIORegion *r;
+ uint8_t type;
+ pcibus_t size = memory_region_size(memory);
+
+ assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */
+ assert(region_num >= 0);
+ assert(region_num < PCI_NUM_REGIONS);
+ type = dev->exp.sriov_vf.pf->exp.sriov_pf.vf_bar_type[region_num];
+
+ if (!is_power_of_2(size)) {
+ error_report("%s: PCI region size must be a power"
+ " of two - type=0x%x, size=0x%"FMT_PCIBUS"\n",
+ __func__, type, size);
+ exit(1);
+ }
+
+ r = &dev->io_regions[region_num];
+ r->memory = memory;
+ r->address_space =
+ type & PCI_BASE_ADDRESS_SPACE_IO
+ ? dev->bus->address_space_io
+ : dev->bus->address_space_mem;
+ r->size = size;
+ r->type = type;
+
+ r->addr = pci_bar_address(dev, region_num, r->type, r->size);
+ if (r->addr != PCI_BAR_UNMAPPED) {
+ memory_region_add_subregion_overlap(r->address_space,
+ r->addr, r->memory, 1);
+ }
+}
+
+static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name)
+{
+ PCIDevice *dev = pci_create(pf->bus, devfn, name);
+ dev->exp.sriov_vf.pf = pf;
+ Error *local_err = NULL;
+
+ object_property_set_bool(OBJECT(&dev->qdev), true, "realized", &local_err);
+ if (local_err) {
+ error_report_err(local_err);
+ return NULL;
+ }
+
+ /* set vid/did according to sr/iov spec - they are not used */
+ pci_config_set_vendor_id(dev->config, 0xffff);
+ pci_config_set_device_id(dev->config, 0xffff);
+ return dev;
+}
+
+static void register_vfs(PCIDevice *dev)
+{
+ uint16_t num_vfs;
+ uint16_t i;
+ uint16_t sriov_cap = dev->exp.sriov_cap;
+ uint16_t vf_offset = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
+ uint16_t vf_stride = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
+ int32_t devfn = dev->devfn + vf_offset;
+
+ assert(sriov_cap > 0);
+ num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
+
+ dev->exp.sriov_pf.vf = g_malloc(sizeof(PCIDevice *) * num_vfs);
+ assert(dev->exp.sriov_pf.vf);
+
+ trace_sriov_register_vfs(SRIOV_ID(dev), num_vfs);
+ for (i = 0; i < num_vfs; i++) {
+ dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, dev->exp.sriov_pf.vfname);
+ if (!dev->exp.sriov_pf.vf[i]) {
+ num_vfs = i;
+ break;
+ }
+ devfn += vf_stride;
+ }
+ dev->exp.sriov_pf.num_vfs = num_vfs;
+}
+
+static void unregister_vfs(PCIDevice *dev)
+{
+ Error *local_err = NULL;
+ uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
+ uint16_t i;
+
+ trace_sriov_unregister_vfs(SRIOV_ID(dev), num_vfs);
+ for (i = 0; i < num_vfs; i++) {
+ object_property_set_bool(OBJECT(&dev->exp.sriov_pf.vf[i]->qdev), false, "realized", &local_err);
+ if (local_err) {
+ fprintf(stderr, "Failed to unplug: %s\n",
+ error_get_pretty(local_err));
+ error_free(local_err);
+ }
+ }
+ g_free(dev->exp.sriov_pf.vf);
+ dev->exp.sriov_pf.vf = NULL;
+ dev->exp.sriov_pf.num_vfs = 0;
+ pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0);
+}
+
+void pcie_sriov_config_write(PCIDevice *dev, uint32_t address, uint32_t val, int len)
+{
+ uint32_t off;
+ uint16_t sriov_cap = dev->exp.sriov_cap;
+
+ if (!sriov_cap || address < sriov_cap) {
+ return;
+ }
+ off = address - sriov_cap;
+ if (off >= PCI_EXT_CAP_SRIOV_SIZEOF) {
+ return;
+ }
+
+ trace_sriov_config_write(SRIOV_ID(dev), off, val, len);
+
+ if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
+ if (dev->exp.sriov_pf.num_vfs) {
+ if (!(val & PCI_SRIOV_CTRL_VFE)) {
+ unregister_vfs(dev);
+ }
+ } else {
+ if (val & PCI_SRIOV_CTRL_VFE) {
+ register_vfs(dev);
+ }
+ }
+ }
+}
+
+
+/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs */
+void pcie_sriov_pf_disable_vfs(PCIDevice *dev)
+{
+ uint16_t sriov_cap = dev->exp.sriov_cap;
+ if (sriov_cap) {
+ uint32_t val = pci_get_byte(dev->config + sriov_cap + PCI_SRIOV_CTRL);
+ if (val & PCI_SRIOV_CTRL_VFE) {
+ val &= ~PCI_SRIOV_CTRL_VFE;
+ pcie_sriov_config_write(dev, sriov_cap + PCI_SRIOV_CTRL, val, 1);
+ }
+ }
+}
+
+/* Add optional supported page sizes to the mask of supported page sizes */
+void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize)
+{
+ uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+ uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
+
+ uint16_t sup_pgsize = pci_get_word(cfg + PCI_SRIOV_SUP_PGSIZE);
+
+ sup_pgsize |= opt_sup_pgsize;
+
+ /* Make sure the new bits are set, and that system page size
+ * also can be set to any of the new values according to spec:
+ */
+ pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, sup_pgsize);
+ pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, sup_pgsize);
+}
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 551cb3d..2e9d8ba 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -11,8 +11,6 @@
/* PCI includes legacy ISA access. */
#include "hw/isa/isa.h"
-#include "hw/pci/pcie.h"
-
/* PCI bus */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
@@ -132,6 +130,7 @@ enum {
#define QEMU_PCI_VGA_IO_HI_SIZE 0x20
#include "hw/pci/pci_regs.h"
+#include "hw/pci/pcie.h"
/* PCI HEADER_TYPE */
#define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
@@ -421,6 +420,9 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
+pcibus_t pci_bar_address(PCIDevice *d,
+ int reg, uint8_t type, pcibus_t size);
+
static inline void
pci_set_byte(uint8_t *config, uint8_t val)
{
@@ -672,6 +674,11 @@ static inline int pci_is_express(const PCIDevice *d)
return d->cap_present & QEMU_PCI_CAP_EXPRESS;
}
+static inline int pci_is_vf(const PCIDevice *d)
+{
+ return d->exp.sriov_vf.pf != NULL;
+}
+
static inline uint32_t pci_config_size(const PCIDevice *d)
{
return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b48a7a2..b09f79a 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -25,6 +25,7 @@
#include "hw/pci/pci_regs.h"
#include "hw/pci/pcie_regs.h"
#include "hw/pci/pcie_aer.h"
+#include "hw/pci/pcie_sriov.h"
#include "hw/hotplug.h"
typedef enum {
@@ -74,6 +75,11 @@ struct PCIExpressDevice {
/* AER */
uint16_t aer_cap;
PCIEAERLog aer_log;
+
+ /* SR/IOV */
+ uint16_t sriov_cap;
+ PCIESriovPF sriov_pf;
+ PCIESriovVF sriov_vf;
};
#define COMPAT_PROP_PCP "power_controller_present"
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
new file mode 100644
index 0000000..71c2b00
--- /dev/null
+++ b/include/hw/pci/pcie_sriov.h
@@ -0,0 +1,58 @@
+/*
+ * pcie_sriov.h:
+ *
+ * Implementation of SR/IOV emulation support.
+ *
+ * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_PCIE_SRIOV_H
+#define QEMU_PCIE_SRIOV_H
+
+struct PCIESriovPF {
+ uint16_t num_vfs; /* Number of virtual functions created */
+ uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */
+ const char *vfname; /* Reference to the device type used for the VFs */
+ PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
+};
+
+struct PCIESriovVF {
+ PCIDevice *pf; /* Pointer back to owner physical function */
+};
+
+void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+ const char *vfname, uint16_t vf_dev_id,
+ uint16_t init_vfs, uint16_t total_vfs,
+ uint16_t vf_offset, uint16_t vf_stride);
+void pcie_sriov_pf_exit(PCIDevice *dev);
+
+/* Set up a VF bar in the SR/IOV bar area */
+void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
+ uint8_t type, dma_addr_t size);
+
+/* Instantiate a bar for a VF */
+void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
+ MemoryRegion *memory);
+
+/* Default (minimal) page size support values as required by the SR/IOV standard:
+ * 0x553 << 12 = 0x553000 = 4K + 8K + 64K + 256K + 1M + 4M
+ */
+#define SRIOV_SUP_PGSIZE_MINREQ 0x553
+
+/* Optionally add supported page sizes to the mask of supported page sizes
+ * Page size values are interpreted as opt_sup_pgsize << 12.
+ */
+void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
+
+/* SR/IOV capability config write handler */
+void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
+ uint32_t val, int len);
+
+/* Reset SR/IOV VF Enable bit to unregister all VFs */
+void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
+
+#endif /* QEMU_PCIE_SRIOV_H */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index ee1ce1d..0b89316 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -54,6 +54,8 @@ typedef struct PCIDevice PCIDevice;
typedef struct PCIEAERErr PCIEAERErr;
typedef struct PCIEAERLog PCIEAERLog;
typedef struct PCIEAERMsg PCIEAERMsg;
+typedef struct PCIESriovPF PCIESriovPF;
+typedef struct PCIESriovVF PCIESriovVF;
typedef struct PCIEPort PCIEPort;
typedef struct PCIESlot PCIESlot;
typedef struct PCIExpressDevice PCIExpressDevice;
diff --git a/trace-events b/trace-events
index a0ddc6b..fd51c5f 100644
--- a/trace-events
+++ b/trace-events
@@ -1531,6 +1531,11 @@ xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space (ad
pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
+# hw/pci/pcie_sriov.c
+sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
+sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
+sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
+
# hw/vfio/pci.c
vfio_intx_interrupt(const char *name, char line) " (%s) Pin %c"
vfio_intx_eoi(const char *name) " (%s) EOI"
--
2.4.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-14 15:51 [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang
@ 2015-10-14 15:51 ` Knut Omang
2015-10-16 8:36 ` Michael S. Tsirkin
2015-10-18 11:03 ` Marcel Apfelbaum
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 4/4] pcie: A few minor fixes (type+code simplify) Knut Omang
2015-10-18 11:02 ` [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Marcel Apfelbaum
4 siblings, 2 replies; 23+ messages in thread
From: Knut Omang @ 2015-10-14 15:51 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Knut Omang,
Richard W.M. Jones, Alex Williamson, Gonglei (Arei), Jan Kiszka,
Paolo Bonzini, Dotan Barak, Richard Henderson
Add a small intro + minimal documentation for how to
implement SR/IOV support for an emulated device.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
docs/pcie_sriov.txt | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 115 insertions(+)
create mode 100644 docs/pcie_sriov.txt
diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
new file mode 100644
index 0000000..f5e891e
--- /dev/null
+++ b/docs/pcie_sriov.txt
@@ -0,0 +1,115 @@
+PCI SR/IOV EMULATION SUPPORT
+============================
+
+Description
+===========
+SR/IOV (Single Root I/O Virtualization) is an optional extended capability
+of a PCI Express device. It allows a single physical function (PF) to appear as multiple
+virtual functions (VFs) for the main purpose of eliminating software
+overhead in I/O from virtual machines.
+
+Qemu now implements the basic common functionality to enable an emulated device
+to support SR/IOV. Yet no fully implemented devices exists in Qemu, but a
+proof-of-concept hack of the Intel igb can be found here:
+
+git://github.com/knuto/qemu.git sriov_patches_v5
+
+Implementation
+==============
+Implementing emulation of an SR/IOV capable device typically consists of
+implementing support for two types of device classes; the "normal" physical device
+(PF) and the virtual device (VF). From Qemu's perspective, the VFs are just
+like other devices, except that some of their properties are derived from
+the PF.
+
+A virtual function is different from a physical function in that the BAR
+space for all VFs are defined by the BAR registers in the PFs SR/IOV
+capability. All VFs have the same BARs and BAR sizes.
+
+Accesses to these virtual BARs then is computed as
+
+ <VF BAR start> + <VF number> * <BAR sz> + <offset>
+
+From our emulation perspective this means that there is a separate call for
+setting up a BAR for a VF.
+
+1) To enable SR/IOV support in the PF, it must be a PCI Express device so
+ you would need to add a PCI Express capability in the normal PCI
+ capability list. You might also want to add an ARI (Alternative
+ Routing-ID Interpretation) capability to indicate that your device
+ supports functions beyond it's "own" function space (0-7),
+ which is necessary to support more than 7 functions, or
+ if functions extends beyond offset 7 because they are placed at an
+ offset > 1 or have stride > 1.
+
+ ...
+ #include "hw/pci/pcie.h"
+ #include "hw/pci/pcie_sriov.h"
+
+ pci_your_pf_dev_realize( ... )
+ {
+ ...
+ int ret = pcie_endpoint_cap_init(d, 0x70);
+ ...
+ pcie_ari_init(d, 0x100, 1);
+ ...
+
+ /* Add and initialize the SR/IOV capability */
+ pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+ vf_devid, initial_vfs, total_vfs,
+ fun_offset, stride);
+
+ /* Set up individual VF BARs (parameters as for normal BARs) */
+ pcie_sriov_pf_init_vf_bar( ... )
+ ...
+ }
+
+ For cleanup, you simply call:
+
+ pcie_sriov_pf_exit(device);
+
+ which will delete all the virtual functions and associated resources.
+
+2) Similarly in the implementation of the virtual function, you need to
+ make it a PCI Express device and add a similar set of capabilities
+ except for the SR/IOV capability. Then you need to set up the VF BARs as
+ subregions of the PFs SR/IOV VF BARs by calling
+ pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call:
+
+ pci_your_vf_dev_realize( ... )
+ {
+ ...
+ int ret = pcie_endpoint_cap_init(d, 0x60);
+ ...
+ pcie_ari_init(d, 0x100, 1);
+ ...
+ memory_region_init(mr, ... )
+ pcie_sriov_vf_register_bar(d, bar_nr, mr);
+ ...
+ }
+
+Testing on Linux guest
+======================
+The easiest is if your device driver supports sysfs based SR/IOV
+enabling. Support for this was added in kernel v.3.8, so not all drivers
+support it yet.
+
+To enable 4 VFs for a device at 01:00.0:
+
+ modprobe yourdriver
+ echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
+
+You should now see 4 VFs with lspci.
+To turn SR/IOV off again - the standard requires you to turn it off before you can enable
+another VF count, and the emulation enforces this:
+
+ echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
+
+Older drivers typically provide a max_vfs module parameter
+to enable it at load time:
+
+ modprobe yourdriver max_vfs=4
+
+To disable the VFs again then, you simply have to unload the driver:
+
+ rmmod yourdriver
--
2.4.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Qemu-devel] [PATCH v5 4/4] pcie: A few minor fixes (type+code simplify)
2015-10-14 15:51 [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
` (2 preceding siblings ...)
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt Knut Omang
@ 2015-10-14 15:51 ` Knut Omang
2015-10-18 11:03 ` Marcel Apfelbaum
2015-10-18 11:02 ` [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Marcel Apfelbaum
4 siblings, 1 reply; 23+ messages in thread
From: Knut Omang @ 2015-10-14 15:51 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Knut Omang,
Richard W.M. Jones, Alex Williamson, Gonglei (Arei), Jan Kiszka,
Paolo Bonzini, Dotan Barak, Richard Henderson
- Fix comment typo in pcie_cap_slot_write_config
- Simplify code in pcie_cap_slot_hot_unplug_request_cb.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
---
hw/pci/pcie.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 774b9ed..ba49c0f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -265,10 +265,11 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp)
{
uint8_t *exp_cap;
+ PCIDevice *pdev = PCI_DEVICE(hotplug_dev);
- pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
+ pcie_cap_slot_hotplug_common(pdev, dev, &exp_cap, errp);
- pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
+ pcie_cap_slot_push_attention_button(pdev);
}
/* pci express slot for pci express root/downstream port
@@ -408,7 +409,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
}
/*
- * If the slot is polulated, power indicator is off and power
+ * If the slot is populated, power indicator is off and power
* controller is off, it is safe to detach the devices.
*/
if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
--
2.4.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt Knut Omang
@ 2015-10-16 8:36 ` Michael S. Tsirkin
2015-10-16 9:56 ` Knut Omang
2015-10-18 11:03 ` Marcel Apfelbaum
1 sibling, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2015-10-16 8:36 UTC (permalink / raw)
To: Knut Omang
Cc: Eduardo Habkost, qemu-devel, Richard W.M. Jones, Alex Williamson,
Gonglei (Arei), Jan Kiszka, Paolo Bonzini, Dotan Barak,
Richard Henderson
On Wed, Oct 14, 2015 at 05:51:17PM +0200, Knut Omang wrote:
> Add a small intro + minimal documentation for how to
> implement SR/IOV support for an emulated device.
I worry that we won't keep this up to date as
code changes. Could some or all of this go into
comments in relevant headers?
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
> docs/pcie_sriov.txt | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 115 insertions(+)
> create mode 100644 docs/pcie_sriov.txt
>
> diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> new file mode 100644
> index 0000000..f5e891e
> --- /dev/null
> +++ b/docs/pcie_sriov.txt
> @@ -0,0 +1,115 @@
> +PCI SR/IOV EMULATION SUPPORT
> +============================
> +
> +Description
> +===========
> +SR/IOV (Single Root I/O Virtualization) is an optional extended capability
> +of a PCI Express device. It allows a single physical function (PF) to appear as multiple
> +virtual functions (VFs) for the main purpose of eliminating software
> +overhead in I/O from virtual machines.
> +
> +Qemu now implements the basic common functionality to enable an emulated device
> +to support SR/IOV. Yet no fully implemented devices exists in Qemu, but a
> +proof-of-concept hack of the Intel igb can be found here:
> +
> +git://github.com/knuto/qemu.git sriov_patches_v5
That branch does not seem to be there.
I don't think we should put such short-lived links into
repository.
> +
> +Implementation
> +==============
> +Implementing emulation of an SR/IOV capable device typically consists of
> +implementing support for two types of device classes; the "normal" physical device
> +(PF) and the virtual device (VF). From Qemu's perspective, the VFs are just
> +like other devices, except that some of their properties are derived from
> +the PF.
> +
> +A virtual function is different from a physical function in that the BAR
> +space for all VFs are defined by the BAR registers in the PFs SR/IOV
> +capability. All VFs have the same BARs and BAR sizes.
> +
> +Accesses to these virtual BARs then is computed as
> +
> + <VF BAR start> + <VF number> * <BAR sz> + <offset>
> +
> +From our emulation perspective this means that there is a separate call for
> +setting up a BAR for a VF.
> +
> +1) To enable SR/IOV support in the PF, it must be a PCI Express device so
> + you would need to add a PCI Express capability in the normal PCI
> + capability list. You might also want to add an ARI (Alternative
> + Routing-ID Interpretation) capability to indicate that your device
> + supports functions beyond it's "own" function space (0-7),
> + which is necessary to support more than 7 functions, or
> + if functions extends beyond offset 7 because they are placed at an
> + offset > 1 or have stride > 1.
> +
> + ...
> + #include "hw/pci/pcie.h"
> + #include "hw/pci/pcie_sriov.h"
> +
> + pci_your_pf_dev_realize( ... )
> + {
> + ...
> + int ret = pcie_endpoint_cap_init(d, 0x70);
> + ...
> + pcie_ari_init(d, 0x100, 1);
> + ...
> +
> + /* Add and initialize the SR/IOV capability */
> + pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
> + vf_devid, initial_vfs, total_vfs,
> + fun_offset, stride);
> +
> + /* Set up individual VF BARs (parameters as for normal BARs) */
> + pcie_sriov_pf_init_vf_bar( ... )
> + ...
> + }
> +
> + For cleanup, you simply call:
> +
> + pcie_sriov_pf_exit(device);
> +
> + which will delete all the virtual functions and associated resources.
> +
> +2) Similarly in the implementation of the virtual function, you need to
> + make it a PCI Express device and add a similar set of capabilities
> + except for the SR/IOV capability. Then you need to set up the VF BARs as
> + subregions of the PFs SR/IOV VF BARs by calling
> + pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call:
> +
> + pci_your_vf_dev_realize( ... )
> + {
> + ...
> + int ret = pcie_endpoint_cap_init(d, 0x60);
> + ...
> + pcie_ari_init(d, 0x100, 1);
> + ...
> + memory_region_init(mr, ... )
> + pcie_sriov_vf_register_bar(d, bar_nr, mr);
> + ...
> + }
> +
> +Testing on Linux guest
> +======================
> +The easiest is if your device driver supports sysfs based SR/IOV
> +enabling. Support for this was added in kernel v.3.8, so not all drivers
> +support it yet.
> +
> +To enable 4 VFs for a device at 01:00.0:
> +
> + modprobe yourdriver
> + echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> +
> +You should now see 4 VFs with lspci.
> +To turn SR/IOV off again - the standard requires you to turn it off before you can enable
> +another VF count, and the emulation enforces this:
> +
> + echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> +
> +Older drivers typically provide a max_vfs module parameter
> +to enable it at load time:
> +
> + modprobe yourdriver max_vfs=4
> +
> +To disable the VFs again then, you simply have to unload the driver:
> +
> + rmmod yourdriver
> --
> 2.4.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 8:36 ` Michael S. Tsirkin
@ 2015-10-16 9:56 ` Knut Omang
2015-10-16 10:54 ` Michael S. Tsirkin
0 siblings, 1 reply; 23+ messages in thread
From: Knut Omang @ 2015-10-16 9:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eduardo Habkost, Marcel Apfelbaum, qemu-devel, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Fri, 2015-10-16 at 11:36 +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 14, 2015 at 05:51:17PM +0200, Knut Omang wrote:
> > Add a small intro + minimal documentation for how to
> > implement SR/IOV support for an emulated device.
>
> I worry that we won't keep this up to date as
> code changes. Could some or all of this go into
> comments in relevant headers?
Hopefully the documented part is not going to change that much - it is
not accurate code anyway, just pseudo code to aid in developing new
devices using it.
Marcel's idea to write a small doc seemed like a good idea now. We
could always remove it when a full example driver has been written.
>
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > ---
> > docs/pcie_sriov.txt | 115
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 115 insertions(+)
> > create mode 100644 docs/pcie_sriov.txt
> >
> > diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> > new file mode 100644
> > index 0000000..f5e891e
> > --- /dev/null
> > +++ b/docs/pcie_sriov.txt
> > @@ -0,0 +1,115 @@
> > +PCI SR/IOV EMULATION SUPPORT
> > +============================
> > +
> > +Description
> > +===========
> > +SR/IOV (Single Root I/O Virtualization) is an optional extended
> > capability
> > +of a PCI Express device. It allows a single physical function (PF)
> > to appear as multiple
> > +virtual functions (VFs) for the main purpose of eliminating
> > software
> > +overhead in I/O from virtual machines.
> > +
> > +Qemu now implements the basic common functionality to enable an
> > emulated device
> > +to support SR/IOV. Yet no fully implemented devices exists in
> > Qemu, but a
> > +proof-of-concept hack of the Intel igb can be found here:
> > +
> > +git://github.com/knuto/qemu.git sriov_patches_v5
>
> That branch does not seem to be there.
> I don't think we should put such short-lived links into
> repository.
Sorry, I just forgot to push it, it's there now.
I'll make sure it stays valid for as long as the reference in the doc
is there.
Hopefully this is a temporary way to be cleaned up once a working
example has been implemented,
But feel free to just skip this patch, the most important is to get the
generic SR/IOV code in there for others to use.
Thanks,
Knut
> > +
> > +Implementation
> > +==============
> > +Implementing emulation of an SR/IOV capable device typically
> > consists of
> > +implementing support for two types of device classes; the "normal"
> > physical device
> > +(PF) and the virtual device (VF). From Qemu's perspective, the VFs
> > are just
> > +like other devices, except that some of their properties are
> > derived from
> > +the PF.
> > +
> > +A virtual function is different from a physical function in that
> > the BAR
> > +space for all VFs are defined by the BAR registers in the PFs
> > SR/IOV
> > +capability. All VFs have the same BARs and BAR sizes.
> > +
> > +Accesses to these virtual BARs then is computed as
> > +
> > + <VF BAR start> + <VF number> * <BAR sz> + <offset>
> > +
> > +From our emulation perspective this means that there is a separate
> > call for
> > +setting up a BAR for a VF.
> > +
> > +1) To enable SR/IOV support in the PF, it must be a PCI Express
> > device so
> > + you would need to add a PCI Express capability in the normal
> > PCI
> > + capability list. You might also want to add an ARI (Alternative
> > + Routing-ID Interpretation) capability to indicate that your
> > device
> > + supports functions beyond it's "own" function space (0-7),
> > + which is necessary to support more than 7 functions, or
> > + if functions extends beyond offset 7 because they are placed at
> > an
> > + offset > 1 or have stride > 1.
> > +
> > + ...
> > + #include "hw/pci/pcie.h"
> > + #include "hw/pci/pcie_sriov.h"
> > +
> > + pci_your_pf_dev_realize( ... )
> > + {
> > + ...
> > + int ret = pcie_endpoint_cap_init(d, 0x70);
> > + ...
> > + pcie_ari_init(d, 0x100, 1);
> > + ...
> > +
> > + /* Add and initialize the SR/IOV capability */
> > + pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
> > + vf_devid, initial_vfs, total_vfs,
> > + fun_offset, stride);
> > +
> > + /* Set up individual VF BARs (parameters as for normal BARs)
> > */
> > + pcie_sriov_pf_init_vf_bar( ... )
> > + ...
> > + }
> > +
> > + For cleanup, you simply call:
> > +
> > + pcie_sriov_pf_exit(device);
> > +
> > + which will delete all the virtual functions and associated
> > resources.
> > +
> > +2) Similarly in the implementation of the virtual function, you
> > need to
> > + make it a PCI Express device and add a similar set of
> > capabilities
> > + except for the SR/IOV capability. Then you need to set up the
> > VF BARs as
> > + subregions of the PFs SR/IOV VF BARs by calling
> > + pcie_sriov_vf_register_bar() instead of the normal
> > pci_register_bar() call:
> > +
> > + pci_your_vf_dev_realize( ... )
> > + {
> > + ...
> > + int ret = pcie_endpoint_cap_init(d, 0x60);
> > + ...
> > + pcie_ari_init(d, 0x100, 1);
> > + ...
> > + memory_region_init(mr, ... )
> > + pcie_sriov_vf_register_bar(d, bar_nr, mr);
> > + ...
> > + }
> > +
> > +Testing on Linux guest
> > +======================
> > +The easiest is if your device driver supports sysfs based SR/IOV
> > +enabling. Support for this was added in kernel v.3.8, so not all
> > drivers
> > +support it yet.
> > +
> > +To enable 4 VFs for a device at 01:00.0:
> > +
> > + modprobe yourdriver
> > + echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> > +
> > +You should now see 4 VFs with lspci.
> > +To turn SR/IOV off again - the standard requires you to turn it
> > off before you can enable
> > +another VF count, and the emulation enforces this:
> > +
> > + echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> > +
> > +Older drivers typically provide a max_vfs module parameter
> > +to enable it at load time:
> > +
> > + modprobe yourdriver max_vfs=4
> > +
> > +To disable the VFs again then, you simply have to unload the
> > driver:
> > +
> > + rmmod yourdriver
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 9:56 ` Knut Omang
@ 2015-10-16 10:54 ` Michael S. Tsirkin
2015-10-16 11:25 ` Knut Omang
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2015-10-16 10:54 UTC (permalink / raw)
To: Knut Omang
Cc: Eduardo Habkost, Marcel Apfelbaum, qemu-devel, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Fri, Oct 16, 2015 at 11:56:39AM +0200, Knut Omang wrote:
> But feel free to just skip this patch, the most important is to get the
> generic SR/IOV code in there for others to use.
That's my question. Is it for others to use or are you going to use it
youself?
If you plan to e.g. complete igb to the point where Linux can use it,
then I'll merge this.
BTW for igb it might be easier to copy out e1000 code, at least that's what
linux guest drivers did.
Alternatively, if someone on list is interested in using this,
please review and ack.
--
MST
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 10:54 ` Michael S. Tsirkin
@ 2015-10-16 11:25 ` Knut Omang
2015-10-16 11:34 ` Michael S. Tsirkin
2015-10-16 11:27 ` Marcel Apfelbaum
2015-10-16 11:32 ` Richard W.M. Jones
2 siblings, 1 reply; 23+ messages in thread
From: Knut Omang @ 2015-10-16 11:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eduardo Habkost, Marcel Apfelbaum, qemu-devel, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Fri, 2015-10-16 at 13:54 +0300, Michael S. Tsirkin wrote:
> On Fri, Oct 16, 2015 at 11:56:39AM +0200, Knut Omang wrote:
> > But feel free to just skip this patch, the most important is to get
> > the
> > generic SR/IOV code in there for others to use.
>
> That's my question. Is it for others to use or are you going to use
> it youself?
Ideally both - I have had several people approach me for it, hopefully
that will lead to other emulated devices. I use it actively myself but
that code is not in a form that could go into Qemu yet.
> If you plan to e.g. complete igb to the point where Linux can use it,
> then I'll merge this.
I have a student assigned to work on it, so yes, that's definitely a
goal.
> BTW for igb it might be easier to copy out e1000 code, at least
> that's what linux guest drivers did.
Yes, that was more of a exercise for myself of subclassing using QOM..
> Alternatively, if someone on list is interested in using this,
> please review and ack.
Yes, please do :-)
Thanks,
Knut
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 10:54 ` Michael S. Tsirkin
2015-10-16 11:25 ` Knut Omang
@ 2015-10-16 11:27 ` Marcel Apfelbaum
2015-10-16 11:32 ` Richard W.M. Jones
2 siblings, 0 replies; 23+ messages in thread
From: Marcel Apfelbaum @ 2015-10-16 11:27 UTC (permalink / raw)
To: Michael S. Tsirkin, Knut Omang
Cc: Eduardo Habkost, Marcel Apfelbaum, qemu-devel, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On 10/16/2015 01:54 PM, Michael S. Tsirkin wrote:
> On Fri, Oct 16, 2015 at 11:56:39AM +0200, Knut Omang wrote:
>> But feel free to just skip this patch, the most important is to get the
>> generic SR/IOV code in there for others to use.
>
> That's my question. Is it for others to use or are you going to use it
> youself?
>
> If you plan to e.g. complete igb to the point where Linux can use it,
> then I'll merge this.
> BTW for igb it might be easier to copy out e1000 code, at least that's what
> linux guest drivers did.
>
> Alternatively, if someone on list is interested in using this,
> please review and ack.
>
Hi,
Even if nobody is actually going to implement an SR-IOV enabled device
on top of this really soon, having it in QEMU should be handy.
I know at least a hardware vendor that used those bits to emulate their SR-IOV device.
Sadly their code will not come to QEMU, at least not so soon. (I'll ask at least for a "tested-by" tag)
Maybe somebody will want to enable SR-IOV for virtio :)
This is why I proposed to have a "how-to" doc for the first implementation.
The implementer will choose then how to deal with this doc.
I'll go over the code again and add my "Reviewed-by".
Thanks!
Marcel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 10:54 ` Michael S. Tsirkin
2015-10-16 11:25 ` Knut Omang
2015-10-16 11:27 ` Marcel Apfelbaum
@ 2015-10-16 11:32 ` Richard W.M. Jones
2015-10-16 11:39 ` Michael S. Tsirkin
2 siblings, 1 reply; 23+ messages in thread
From: Richard W.M. Jones @ 2015-10-16 11:32 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eduardo Habkost, Marcel Apfelbaum, Knut Omang, qemu-devel,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Fri, Oct 16, 2015 at 01:54:48PM +0300, Michael S. Tsirkin wrote:
> On Fri, Oct 16, 2015 at 11:56:39AM +0200, Knut Omang wrote:
> > But feel free to just skip this patch, the most important is to get the
> > generic SR/IOV code in there for others to use.
>
> That's my question. Is it for others to use or are you going to use it
> youself?
>
> If you plan to e.g. complete igb to the point where Linux can use it,
> then I'll merge this.
> BTW for igb it might be easier to copy out e1000 code, at least that's what
> linux guest drivers did.
>
> Alternatively, if someone on list is interested in using this,
> please review and ack.
An emulated SR-IOV-capable network card would be useful for debugging
OpenStack's SR-IOV drivers.
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 11:25 ` Knut Omang
@ 2015-10-16 11:34 ` Michael S. Tsirkin
2015-10-16 11:39 ` Paolo Bonzini
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2015-10-16 11:34 UTC (permalink / raw)
To: Knut Omang
Cc: Eduardo Habkost, Marcel Apfelbaum, qemu-devel, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Fri, Oct 16, 2015 at 01:25:24PM +0200, Knut Omang wrote:
> > If you plan to e.g. complete igb to the point where Linux can use it,
> > then I'll merge this.
>
> I have a student assigned to work on it, so yes, that's definitely a
> goal.
OK that's fine then. I just don't want this stay there for years on end.
"Like some forgotten corridor cleaned relentlessly by a lost Roomba but
where no user has trodden in years.". https://lwn.net/Articles/633416/
--
MST
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 11:32 ` Richard W.M. Jones
@ 2015-10-16 11:39 ` Michael S. Tsirkin
0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2015-10-16 11:39 UTC (permalink / raw)
To: Richard W.M. Jones
Cc: Eduardo Habkost, Marcel Apfelbaum, Knut Omang, qemu-devel,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Fri, Oct 16, 2015 at 12:32:54PM +0100, Richard W.M. Jones wrote:
> On Fri, Oct 16, 2015 at 01:54:48PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Oct 16, 2015 at 11:56:39AM +0200, Knut Omang wrote:
> > > But feel free to just skip this patch, the most important is to get the
> > > generic SR/IOV code in there for others to use.
> >
> > That's my question. Is it for others to use or are you going to use it
> > youself?
> >
> > If you plan to e.g. complete igb to the point where Linux can use it,
> > then I'll merge this.
> > BTW for igb it might be easier to copy out e1000 code, at least that's what
> > linux guest drivers did.
> >
> > Alternatively, if someone on list is interested in using this,
> > please review and ack.
>
> An emulated SR-IOV-capable network card would be useful for debugging
> OpenStack's SR-IOV drivers.
>
> Rich.
Yes but that's not the question. Is anyone working on this and wants
to use these patches?
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-p2v converts physical machines to virtual machines. Boot with a
> live CD or over the network (PXE) and turn machines into KVM guests.
> http://libguestfs.org/virt-v2v
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-16 11:34 ` Michael S. Tsirkin
@ 2015-10-16 11:39 ` Paolo Bonzini
0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2015-10-16 11:39 UTC (permalink / raw)
To: Michael S. Tsirkin, Knut Omang
Cc: Eduardo Habkost, Marcel Apfelbaum, qemu-devel, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Dotan Barak,
Richard Henderson
On 16/10/2015 13:34, Michael S. Tsirkin wrote:
> On Fri, Oct 16, 2015 at 01:25:24PM +0200, Knut Omang wrote:
>>> If you plan to e.g. complete igb to the point where Linux can use it,
>>> then I'll merge this.
>>
>> I have a student assigned to work on it, so yes, that's definitely a
>> goal.
>
> OK that's fine then. I just don't want this stay there for years on end.
Isn't that what happened for PCIe? :)
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization
2015-10-14 15:51 [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
` (3 preceding siblings ...)
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 4/4] pcie: A few minor fixes (type+code simplify) Knut Omang
@ 2015-10-18 11:02 ` Marcel Apfelbaum
2015-10-18 14:39 ` Knut Omang
4 siblings, 1 reply; 23+ messages in thread
From: Marcel Apfelbaum @ 2015-10-18 11:02 UTC (permalink / raw)
To: Knut Omang, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On 10/14/2015 06:51 PM, Knut Omang wrote:
> This patch set implements generic support for SR/IOV as an extension to the
> core PCIe functionality, similar to the way other capabilities such as AER
> is implemented.
>
> There is no implementation of any device that provides
> SR/IOV support included, but I have implemented a test
> example which can be found together with this patch set here:
>
> git://github.com/knuto/qemu.git sriov_patches_v5
>
> Testing with the example device was documented here:
>
> http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05110.html
>
> Changes since v4:
> - Mostly based on feeback in Marcel Apfelbaum's review:
> - The patch with changes to pci_regs.h got eliminated by rebase
> - Added some documentation as an additional patch
> - Some trivial fixes moved to separate patch
> - Modified code to use error and trace functions instead of printfs
Hi, thanks for posting another version!
Regarding the submission:
Please run ./scripts/checkpatch.pl on your patches, there are some
minor issues there (ms-dos line end...).
This would not prevent the maintainer to take the code, I think
he has some scripts to take care of it, however it may discover
some issues.
Otherwise, I only have some questions on patch 2/4.
I don't think it justifies a re-submission, I just wanted to point to some places
it may be worth to look again.
Thanks,
Marcel
>
> Changes since v3:
> - Reworked 'pci: Update pci_regs header' to merge kernel version improvements
> with the current qemu version instead of copying from the kernel version.
>
> Changes since v2:
> - Rebased onto 090d0bfd
> - Un-qdev'ified - avoids issues when resetting NUM_VFS
> - Fixed handling of vf_offset/vf_stride
>
> Changes since v1:
> - Rebased on top of latest master, eliminating prereqs.
> - Implement proper support for VF_STRIDE, VF_OFFSET and SUP_PGSIZE
> Time better spent fixing it than explaining what the previous
> limitations were.
> - Added new first patch to fix pci bug related to this
> - Split out patch to pci_default_config_write to a separate patch 2
> to highlight bug fix.
> - Refactored out logic into new source files
> hw/pci/pcie_sriov.c include/hw/pci/pcie_sriov.h
> similar to pcie_aer.c/h.
> - Rename functions and introduce structs to better separate
> pf and vf functionality.
> - Replaced is_vf member with pci_is_vf() function abstraction
> - Fix numerous syntax, whitespace and comment issues
> according to Michael's review.
> - Fix memory leaks.
> - Removed igb example device - a rebased version available
> on github instead.
>
> Knut Omang (4):
> pci: Make use of the devfn property when registering new devices
> pcie: Add support for Single Root I/O Virtualization (SR/IOV)
> pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
> pcie: A few minor fixes (type+code simplify)
>
> docs/pcie_sriov.txt | 115 +++++++++++++++++++
> hw/pci/Makefile.objs | 2 +-
> hw/pci/pci.c | 104 +++++++++++++-----
> hw/pci/pcie.c | 9 +-
> hw/pci/pcie_sriov.c | 263 ++++++++++++++++++++++++++++++++++++++++++++
> include/hw/pci/pci.h | 11 +-
> include/hw/pci/pcie.h | 6 +
> include/hw/pci/pcie_sriov.h | 58 ++++++++++
> include/qemu/typedefs.h | 2 +
> trace-events | 5 +
> 10 files changed, 539 insertions(+), 36 deletions(-)
> create mode 100644 docs/pcie_sriov.txt
> create mode 100644 hw/pci/pcie_sriov.c
> create mode 100644 include/hw/pci/pcie_sriov.h
>
> --
> 2.4.3
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang
@ 2015-10-18 11:02 ` Marcel Apfelbaum
2015-10-18 12:26 ` Michael S. Tsirkin
` (3 more replies)
0 siblings, 4 replies; 23+ messages in thread
From: Marcel Apfelbaum @ 2015-10-18 11:02 UTC (permalink / raw)
To: Knut Omang, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On 10/14/2015 06:51 PM, Knut Omang wrote:
> This patch provides the building blocks for creating an SR/IOV
> PCIe Extended Capability header and register/unregister
> SR/IOV Virtual Functions.
>
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
> hw/pci/Makefile.objs | 2 +-
> hw/pci/pci.c | 102 ++++++++++++-----
> hw/pci/pcie.c | 2 +-
> hw/pci/pcie_sriov.c | 263 ++++++++++++++++++++++++++++++++++++++++++++
> include/hw/pci/pci.h | 11 +-
> include/hw/pci/pcie.h | 6 +
> include/hw/pci/pcie_sriov.h | 58 ++++++++++
> include/qemu/typedefs.h | 2 +
> trace-events | 5 +
> 9 files changed, 419 insertions(+), 32 deletions(-)
> create mode 100644 hw/pci/pcie_sriov.c
> create mode 100644 include/hw/pci/pcie_sriov.h
>
> diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
> index 9f905e6..2226980 100644
> --- a/hw/pci/Makefile.objs
> +++ b/hw/pci/Makefile.objs
> @@ -3,7 +3,7 @@ common-obj-$(CONFIG_PCI) += msix.o msi.o
> common-obj-$(CONFIG_PCI) += shpc.o
> common-obj-$(CONFIG_PCI) += slotid_cap.o
> common-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
> -common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o
> +common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o pcie_sriov.o
>
> common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
> common-obj-$(CONFIG_ALL) += pci-stub.o
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index b095cfe..4fb5fcf 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -153,6 +153,9 @@ int pci_bar(PCIDevice *d, int reg)
> {
> uint8_t type;
>
> + /* PCIe virtual functions do not have their own BARs */
> + assert(!pci_is_vf(d));
> +
> if (reg != PCI_ROM_SLOT)
> return PCI_BASE_ADDRESS_0 + reg * 4;
>
> @@ -211,22 +214,13 @@ void pci_device_deassert_intx(PCIDevice *dev)
> }
> }
>
> -static void pci_do_device_reset(PCIDevice *dev)
> +static void pci_reset_regions(PCIDevice *dev)
> {
> int r;
> + if (pci_is_vf(dev)) {
> + return;
> + }
>
> - pci_device_deassert_intx(dev);
> - assert(dev->irq_state == 0);
> -
> - /* Clear all writable bits */
> - pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> - pci_get_word(dev->wmask + PCI_COMMAND) |
> - pci_get_word(dev->w1cmask + PCI_COMMAND));
> - pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> - pci_get_word(dev->wmask + PCI_STATUS) |
> - pci_get_word(dev->w1cmask + PCI_STATUS));
> - dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> - dev->config[PCI_INTERRUPT_LINE] = 0x0;
> for (r = 0; r < PCI_NUM_REGIONS; ++r) {
> PCIIORegion *region = &dev->io_regions[r];
> if (!region->size) {
> @@ -240,10 +234,6 @@ static void pci_do_device_reset(PCIDevice *dev)
> pci_set_long(dev->config + pci_bar(dev, r), region->type);
> }
> }
> - pci_update_mappings(dev);
> -
> - msi_reset(dev);
> - msix_reset(dev);
> }
>
> /*
> @@ -253,7 +243,23 @@ static void pci_do_device_reset(PCIDevice *dev)
> void pci_device_reset(PCIDevice *dev)
> {
> qdev_reset_all(&dev->qdev);
> - pci_do_device_reset(dev);
> + pci_device_deassert_intx(dev);
> + assert(dev->irq_state == 0);
> +
> + /* Clear all writable bits */
> + pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> + pci_get_word(dev->wmask + PCI_COMMAND) |
> + pci_get_word(dev->w1cmask + PCI_COMMAND));
> + pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> + pci_get_word(dev->wmask + PCI_STATUS) |
> + pci_get_word(dev->w1cmask + PCI_STATUS));
> + dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> + dev->config[PCI_INTERRUPT_LINE] = 0x0;
> + pci_reset_regions(dev);
> + pci_update_mappings(dev);
> +
> + msi_reset(dev);
> + msix_reset(dev);
> }
>
> /*
> @@ -268,7 +274,7 @@ static void pcibus_reset(BusState *qbus)
>
> for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
> if (bus->devices[i]) {
> - pci_do_device_reset(bus->devices[i]);
> + pci_device_reset(bus->devices[i]);
Hi,
Here you made the re-factoring:
- replaced pci_do_device_reset to a subset : pci_reset_regions.
- moved all the other code inside pci_device_reset.
but now pcibus_reset will call pci_device_reset that has an extra call to qdev_reset_all.
However pcibus_reset has several calling sites. My question is:
Is this necessary? If yes, I am only thinking how it will affect the other calling sites.
Why did they split it in the first place? Maybe somebody knows.
From what I saw it should be OK.
> }
> }
>
> @@ -771,6 +777,15 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp)
> dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
> }
>
> + /* With SR/IOV and ARI, a device at function 0 need not be a multifunction
> + * device, as it may just be a VF that ended up with function 0 in
> + * the legacy PCI interpretation. Avoid failing in such cases:
> + */
> + if (pci_is_vf(dev) &&
> + dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
> + return;
> + }
> +
> /*
> * multifunction bit is interpreted in two ways as follows.
> * - all functions must set the bit to 1.
> @@ -962,6 +977,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
> uint64_t wmask;
> pcibus_t size = memory_region_size(memory);
>
> + assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */
> assert(region_num >= 0);
> assert(region_num < PCI_NUM_REGIONS);
> if (size & (size-1)) {
> @@ -1060,11 +1076,44 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
> return pci_dev->io_regions[region_num].addr;
> }
>
> -static pcibus_t pci_bar_address(PCIDevice *d,
> - int reg, uint8_t type, pcibus_t size)
> +
> +static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
> + uint8_t type, pcibus_t size)
> +{
> + pcibus_t new_addr;
> + if (!pci_is_vf(d)) {
> + int bar = pci_bar(d, reg);
> + if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> + new_addr = pci_get_quad(d->config + bar);
> + } else {
> + new_addr = pci_get_long(d->config + bar);
> + }
> + } else {
> + PCIDevice *pf = d->exp.sriov_vf.pf;
> + uint16_t sriov_cap = pf->exp.sriov_cap;
> + int bar = sriov_cap + PCI_SRIOV_BAR + reg * 4;
> + uint16_t vf_offset = pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
> + uint16_t vf_stride = pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
> + uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride;
> +
> + if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> + new_addr = pci_get_quad(pf->config + bar);
> + } else {
> + new_addr = pci_get_long(pf->config + bar);
> + }
> + new_addr += vf_num * size;
> + }
> + if (reg != PCI_ROM_SLOT) {
> + /* Preserve the rom enable bit */
> + new_addr &= ~(size - 1);
> + }
> + return new_addr;
> +}
> +
> +pcibus_t pci_bar_address(PCIDevice *d,
> + int reg, uint8_t type, pcibus_t size)
> {
> pcibus_t new_addr, last_addr;
> - int bar = pci_bar(d, reg);
> uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
> Object *machine = qdev_get_machine();
> ObjectClass *oc = object_get_class(machine);
> @@ -1075,7 +1124,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
> if (!(cmd & PCI_COMMAND_IO)) {
> return PCI_BAR_UNMAPPED;
> }
> - new_addr = pci_get_long(d->config + bar) & ~(size - 1);
> + new_addr = pci_config_get_bar_addr(d, reg, type, size);
> last_addr = new_addr + size - 1;
> /* Check if 32 bit BAR wraps around explicitly.
> * TODO: make priorities correct and remove this work around.
> @@ -1090,11 +1139,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
> if (!(cmd & PCI_COMMAND_MEMORY)) {
> return PCI_BAR_UNMAPPED;
> }
> - if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> - new_addr = pci_get_quad(d->config + bar);
> - } else {
> - new_addr = pci_get_long(d->config + bar);
> - }
> + new_addr = pci_config_get_bar_addr(d, reg, type, size);
> /* the ROM slot has a specific enable bit */
> if (reg == PCI_ROM_SLOT && !(new_addr & PCI_ROM_ADDRESS_ENABLE)) {
> return PCI_BAR_UNMAPPED;
> @@ -1228,6 +1273,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
>
> msi_write_config(d, addr, val_in, l);
> msix_write_config(d, addr, val_in, l);
> + pcie_sriov_config_write(d, addr, val_in, l);
> }
>
> /***********************************************************/
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 6e28985..774b9ed 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -253,7 +253,7 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> * Right now, only a device of function = 0 is allowed to be
> * hot plugged/unplugged.
> */
> - assert(PCI_FUNC(pci_dev->devfn) == 0);
> + assert(PCI_FUNC(pci_dev->devfn) == 0 || pci_is_vf(pci_dev));
>
> pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> PCI_EXP_SLTSTA_PDS);
> diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> new file mode 100644
> index 0000000..25de8d9
> --- /dev/null
> +++ b/hw/pci/pcie_sriov.c
> @@ -0,0 +1,263 @@
> +/*
> + * pcie_sriov.c:
> + *
> + * Implementation of SR/IOV emulation support.
> + *
> + * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pcie.h"
> +#include "hw/pci/pci_bus.h"
> +#include "qemu/error-report.h"
> +#include "qemu/range.h"
> +#include "trace.h"
> +
> +#define SRIOV_ID(dev) \
> + (dev)->name, PCI_SLOT((dev)->devfn), PCI_FUNC((dev)->devfn)
This is a little "hacky" but it is used only for tracing and inside a C file.
I am OK with it.
> +
> +static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name);
> +static void unregister_vfs(PCIDevice *dev);
> +
> +void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
> + const char *vfname, uint16_t vf_dev_id,
> + uint16_t init_vfs, uint16_t total_vfs,
> + uint16_t vf_offset, uint16_t vf_stride)
> +{
> + uint8_t *cfg = dev->config + offset;
> + uint8_t *wmask;
> +
> + pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
> + offset, PCI_EXT_CAP_SRIOV_SIZEOF);
> + dev->exp.sriov_cap = offset;
> + dev->exp.sriov_pf.num_vfs = 0;
> + dev->exp.sriov_pf.vfname = g_strdup(vfname);
> + dev->exp.sriov_pf.vf = NULL;
> +
> + pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
> + pci_set_word(cfg + PCI_SRIOV_VF_STRIDE, vf_stride);
> +
> + /* Mandatory page sizes to support.
> + * Device implementations can call pcie_sriov_pf_add_sup_pgsize()
> + * to set more bits:
> + */
> + pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, SRIOV_SUP_PGSIZE_MINREQ);
> +
> + /* Default is to use 4K pages, software can modify it
> + * to any of the supported bits
> + */
> + pci_set_word(cfg + PCI_SRIOV_SYS_PGSIZE, 0x1);
> +
> + /* Set up device ID and initial/total number of VFs available */
> + pci_set_word(cfg + PCI_SRIOV_VF_DID, vf_dev_id);
> + pci_set_word(cfg + PCI_SRIOV_INITIAL_VF, init_vfs);
> + pci_set_word(cfg + PCI_SRIOV_TOTAL_VF, total_vfs);
> + pci_set_word(cfg + PCI_SRIOV_NUM_VF, 0);
> +
> + /* Write enable control bits */
> + wmask = dev->wmask + offset;
> + pci_set_word(wmask + PCI_SRIOV_CTRL,
> + PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI);
> + pci_set_word(wmask + PCI_SRIOV_NUM_VF, 0xffff);
> + pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
> +
> + qdev_prop_set_bit(&dev->qdev, "multifunction", true);
> +}
> +
> +void pcie_sriov_pf_exit(PCIDevice *dev)
> +{
> + unregister_vfs(dev);
> + g_free((char *)dev->exp.sriov_pf.vfname);
> + dev->exp.sriov_pf.vfname = NULL;
> +}
> +
> +void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
> + uint8_t type, dma_addr_t size)
> +{
> + uint32_t addr;
> + uint64_t wmask;
> + uint16_t sriov_cap = dev->exp.sriov_cap;
> +
> + assert(sriov_cap > 0);
> + assert(region_num >= 0);
> + assert(region_num < PCI_NUM_REGIONS);
> + assert(region_num != PCI_ROM_SLOT);
> +
> + wmask = ~(size - 1);
> + addr = sriov_cap + PCI_SRIOV_BAR + region_num * 4;
> +
> + pci_set_long(dev->config + addr, type);
> + if (!(type & PCI_BASE_ADDRESS_SPACE_IO) &&
> + type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> + pci_set_quad(dev->wmask + addr, wmask);
> + pci_set_quad(dev->cmask + addr, ~0ULL);
> + } else {
> + pci_set_long(dev->wmask + addr, wmask & 0xffffffff);
> + pci_set_long(dev->cmask + addr, 0xffffffff);
> + }
> + dev->exp.sriov_pf.vf_bar_type[region_num] = type;
> +}
> +
> +void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
> + MemoryRegion *memory)
> +{
> + PCIIORegion *r;
> + uint8_t type;
> + pcibus_t size = memory_region_size(memory);
> +
> + assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */
> + assert(region_num >= 0);
> + assert(region_num < PCI_NUM_REGIONS);
> + type = dev->exp.sriov_vf.pf->exp.sriov_pf.vf_bar_type[region_num];
> +
> + if (!is_power_of_2(size)) {
> + error_report("%s: PCI region size must be a power"
> + " of two - type=0x%x, size=0x%"FMT_PCIBUS"\n",
> + __func__, type, size);
> + exit(1);
> + }
> +
> + r = &dev->io_regions[region_num];
> + r->memory = memory;
> + r->address_space =
> + type & PCI_BASE_ADDRESS_SPACE_IO
> + ? dev->bus->address_space_io
> + : dev->bus->address_space_mem;
> + r->size = size;
> + r->type = type;
> +
> + r->addr = pci_bar_address(dev, region_num, r->type, r->size);
> + if (r->addr != PCI_BAR_UNMAPPED) {
> + memory_region_add_subregion_overlap(r->address_space,
> + r->addr, r->memory, 1);
> + }
> +}
> +
> +static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name)
> +{
> + PCIDevice *dev = pci_create(pf->bus, devfn, name);
> + dev->exp.sriov_vf.pf = pf;
> + Error *local_err = NULL;
> +
> + object_property_set_bool(OBJECT(&dev->qdev), true, "realized", &local_err);
> + if (local_err) {
> + error_report_err(local_err);
> + return NULL;
> + }
> +
> + /* set vid/did according to sr/iov spec - they are not used */
> + pci_config_set_vendor_id(dev->config, 0xffff);
> + pci_config_set_device_id(dev->config, 0xffff);
> + return dev;
> +}
> +
> +static void register_vfs(PCIDevice *dev)
> +{
> + uint16_t num_vfs;
> + uint16_t i;
> + uint16_t sriov_cap = dev->exp.sriov_cap;
> + uint16_t vf_offset = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
> + uint16_t vf_stride = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
> + int32_t devfn = dev->devfn + vf_offset;
> +
> + assert(sriov_cap > 0);
> + num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> +
> + dev->exp.sriov_pf.vf = g_malloc(sizeof(PCIDevice *) * num_vfs);
> + assert(dev->exp.sriov_pf.vf);
> +
> + trace_sriov_register_vfs(SRIOV_ID(dev), num_vfs);
> + for (i = 0; i < num_vfs; i++) {
> + dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, dev->exp.sriov_pf.vfname);
> + if (!dev->exp.sriov_pf.vf[i]) {
> + num_vfs = i;
> + break;
> + }
> + devfn += vf_stride;
> + }
> + dev->exp.sriov_pf.num_vfs = num_vfs;
> +}
> +
> +static void unregister_vfs(PCIDevice *dev)
> +{
> + Error *local_err = NULL;
> + uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
> + uint16_t i;
> +
> + trace_sriov_unregister_vfs(SRIOV_ID(dev), num_vfs);
> + for (i = 0; i < num_vfs; i++) {
> + object_property_set_bool(OBJECT(&dev->exp.sriov_pf.vf[i]->qdev), false, "realized", &local_err);
> + if (local_err) {
> + fprintf(stderr, "Failed to unplug: %s\n",
> + error_get_pretty(local_err));
> + error_free(local_err);
> + }
> + }
> + g_free(dev->exp.sriov_pf.vf);
> + dev->exp.sriov_pf.vf = NULL;
> + dev->exp.sriov_pf.num_vfs = 0;
> + pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0);
> +}
> +
> +void pcie_sriov_config_write(PCIDevice *dev, uint32_t address, uint32_t val, int len)
> +{
> + uint32_t off;
> + uint16_t sriov_cap = dev->exp.sriov_cap;
> +
> + if (!sriov_cap || address < sriov_cap) {
> + return;
> + }
> + off = address - sriov_cap;
> + if (off >= PCI_EXT_CAP_SRIOV_SIZEOF) {
> + return;
> + }
> +
> + trace_sriov_config_write(SRIOV_ID(dev), off, val, len);
> +
> + if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> + if (dev->exp.sriov_pf.num_vfs) {
> + if (!(val & PCI_SRIOV_CTRL_VFE)) {
> + unregister_vfs(dev);
> + }
> + } else {
> + if (val & PCI_SRIOV_CTRL_VFE) {
> + register_vfs(dev);
> + }
> + }
> + }
> +}
> +
> +
> +/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs */
> +void pcie_sriov_pf_disable_vfs(PCIDevice *dev)
> +{
> + uint16_t sriov_cap = dev->exp.sriov_cap;
> + if (sriov_cap) {
> + uint32_t val = pci_get_byte(dev->config + sriov_cap + PCI_SRIOV_CTRL);
> + if (val & PCI_SRIOV_CTRL_VFE) {
> + val &= ~PCI_SRIOV_CTRL_VFE;
> + pcie_sriov_config_write(dev, sriov_cap + PCI_SRIOV_CTRL, val, 1);
> + }
> + }
> +}
> +
> +/* Add optional supported page sizes to the mask of supported page sizes */
> +void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize)
> +{
> + uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> + uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
> +
> + uint16_t sup_pgsize = pci_get_word(cfg + PCI_SRIOV_SUP_PGSIZE);
> +
> + sup_pgsize |= opt_sup_pgsize;
> +
> + /* Make sure the new bits are set, and that system page size
> + * also can be set to any of the new values according to spec:
> + */
> + pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, sup_pgsize);
> + pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, sup_pgsize);
> +}
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 551cb3d..2e9d8ba 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -11,8 +11,6 @@
> /* PCI includes legacy ISA access. */
> #include "hw/isa/isa.h"
>
> -#include "hw/pci/pcie.h"
> -
> /* PCI bus */
>
> #define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
> @@ -132,6 +130,7 @@ enum {
> #define QEMU_PCI_VGA_IO_HI_SIZE 0x20
>
> #include "hw/pci/pci_regs.h"
> +#include "hw/pci/pcie.h"
>
> /* PCI HEADER_TYPE */
> #define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
> @@ -421,6 +420,9 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
> AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
>
> +pcibus_t pci_bar_address(PCIDevice *d,
> + int reg, uint8_t type, pcibus_t size);
> +
> static inline void
> pci_set_byte(uint8_t *config, uint8_t val)
> {
> @@ -672,6 +674,11 @@ static inline int pci_is_express(const PCIDevice *d)
> return d->cap_present & QEMU_PCI_CAP_EXPRESS;
> }
>
> +static inline int pci_is_vf(const PCIDevice *d)
> +{
> + return d->exp.sriov_vf.pf != NULL;
> +}
> +
> static inline uint32_t pci_config_size(const PCIDevice *d)
> {
> return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
> diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> index b48a7a2..b09f79a 100644
> --- a/include/hw/pci/pcie.h
> +++ b/include/hw/pci/pcie.h
> @@ -25,6 +25,7 @@
> #include "hw/pci/pci_regs.h"
> #include "hw/pci/pcie_regs.h"
> #include "hw/pci/pcie_aer.h"
> +#include "hw/pci/pcie_sriov.h"
> #include "hw/hotplug.h"
>
> typedef enum {
> @@ -74,6 +75,11 @@ struct PCIExpressDevice {
> /* AER */
> uint16_t aer_cap;
> PCIEAERLog aer_log;
> +
> + /* SR/IOV */
> + uint16_t sriov_cap;
> + PCIESriovPF sriov_pf;
> + PCIESriovVF sriov_vf;
> };
>
> #define COMPAT_PROP_PCP "power_controller_present"
> diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
> new file mode 100644
> index 0000000..71c2b00
> --- /dev/null
> +++ b/include/hw/pci/pcie_sriov.h
> @@ -0,0 +1,58 @@
> +/*
> + * pcie_sriov.h:
> + *
> + * Implementation of SR/IOV emulation support.
> + *
> + * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_PCIE_SRIOV_H
> +#define QEMU_PCIE_SRIOV_H
> +
> +struct PCIESriovPF {
> + uint16_t num_vfs; /* Number of virtual functions created */
> + uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */
> + const char *vfname; /* Reference to the device type used for the VFs */
> + PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
> +};
> +
> +struct PCIESriovVF {
> + PCIDevice *pf; /* Pointer back to owner physical function */
> +};
Regarding naming conventions, maybe you it should be PCIESRIOVPF o Ir PCIESRIOVPf,
but this would not justify another version (in my opinion)
> +
> +/* Optionally add supported page sizes to the mask of supported page sizes
> + * Page size values are interpreted as opt_sup_pgsize << 12.
> + */
> +void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
> +
> +/* SR/IOV capability config write handler */
> +void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
> + uint32_t val, int len);
> +
> +/* Reset SR/IOV VF Enable bit to unregister all VFs */
> +void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
> +
> +#endif /* QEMU_PCIE_SRIOV_H */
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index ee1ce1d..0b89316 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -54,6 +54,8 @@ typedef struct PCIDevice PCIDevice;
> typedef struct PCIEAERErr PCIEAERErr;
> typedef struct PCIEAERLog PCIEAERLog;
> typedef struct PCIEAERMsg PCIEAERMsg;
> +typedef struct PCIESriovPF PCIESriovPF;
> +typedef struct PCIESriovVF PCIESriovVF;
> typedef struct PCIEPort PCIEPort;
> typedef struct PCIESlot PCIESlot;
> typedef struct PCIExpressDevice PCIExpressDevice;
> diff --git a/trace-events b/trace-events
> index a0ddc6b..fd51c5f 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1531,6 +1531,11 @@ xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space (ad
> pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
> pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
>
> +# hw/pci/pcie_sriov.c
> +sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
> +sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
> +sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
> +
> # hw/vfio/pci.c
> vfio_intx_interrupt(const char *name, char line) " (%s) Pin %c"
> vfio_intx_eoi(const char *name) " (%s) EOI"
>
Thanks for all the work, besides the questions above, from my point of view:
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Thanks,
Marcel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 4/4] pcie: A few minor fixes (type+code simplify)
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 4/4] pcie: A few minor fixes (type+code simplify) Knut Omang
@ 2015-10-18 11:03 ` Marcel Apfelbaum
0 siblings, 0 replies; 23+ messages in thread
From: Marcel Apfelbaum @ 2015-10-18 11:03 UTC (permalink / raw)
To: Knut Omang, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On 10/14/2015 06:51 PM, Knut Omang wrote:
> - Fix comment typo in pcie_cap_slot_write_config
> - Simplify code in pcie_cap_slot_hot_unplug_request_cb.
>
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
> hw/pci/pcie.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 774b9ed..ba49c0f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -265,10 +265,11 @@ void pcie_cap_slot_hot_unplug_request_cb(HotplugHandler *hotplug_dev,
> DeviceState *dev, Error **errp)
> {
> uint8_t *exp_cap;
> + PCIDevice *pdev = PCI_DEVICE(hotplug_dev);
>
> - pcie_cap_slot_hotplug_common(PCI_DEVICE(hotplug_dev), dev, &exp_cap, errp);
> + pcie_cap_slot_hotplug_common(pdev, dev, &exp_cap, errp);
>
> - pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
> + pcie_cap_slot_push_attention_button(pdev);
> }
>
> /* pci express slot for pci express root/downstream port
> @@ -408,7 +409,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
> }
>
> /*
> - * If the slot is polulated, power indicator is off and power
> + * If the slot is populated, power indicator is off and power
> * controller is off, it is safe to detach the devices.
> */
> if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Thanks,
Marcel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt Knut Omang
2015-10-16 8:36 ` Michael S. Tsirkin
@ 2015-10-18 11:03 ` Marcel Apfelbaum
1 sibling, 0 replies; 23+ messages in thread
From: Marcel Apfelbaum @ 2015-10-18 11:03 UTC (permalink / raw)
To: Knut Omang, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On 10/14/2015 06:51 PM, Knut Omang wrote:
> Add a small intro + minimal documentation for how to
> implement SR/IOV support for an emulated device.
>
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> ---
> docs/pcie_sriov.txt | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 115 insertions(+)
> create mode 100644 docs/pcie_sriov.txt
>
> diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> new file mode 100644
> index 0000000..f5e891e
> --- /dev/null
> +++ b/docs/pcie_sriov.txt
> @@ -0,0 +1,115 @@
> +PCI SR/IOV EMULATION SUPPORT
> +============================
> +
> +Description
> +===========
> +SR/IOV (Single Root I/O Virtualization) is an optional extended capability
> +of a PCI Express device. It allows a single physical function (PF) to appear as multiple
> +virtual functions (VFs) for the main purpose of eliminating software
> +overhead in I/O from virtual machines.
> +
> +Qemu now implements the basic common functionality to enable an emulated device
> +to support SR/IOV. Yet no fully implemented devices exists in Qemu, but a
> +proof-of-concept hack of the Intel igb can be found here:
> +
> +git://github.com/knuto/qemu.git sriov_patches_v5
> +
> +Implementation
> +==============
> +Implementing emulation of an SR/IOV capable device typically consists of
> +implementing support for two types of device classes; the "normal" physical device
> +(PF) and the virtual device (VF). From Qemu's perspective, the VFs are just
> +like other devices, except that some of their properties are derived from
> +the PF.
> +
> +A virtual function is different from a physical function in that the BAR
> +space for all VFs are defined by the BAR registers in the PFs SR/IOV
> +capability. All VFs have the same BARs and BAR sizes.
> +
> +Accesses to these virtual BARs then is computed as
> +
> + <VF BAR start> + <VF number> * <BAR sz> + <offset>
> +
> +From our emulation perspective this means that there is a separate call for
> +setting up a BAR for a VF.
> +
> +1) To enable SR/IOV support in the PF, it must be a PCI Express device so
> + you would need to add a PCI Express capability in the normal PCI
> + capability list. You might also want to add an ARI (Alternative
> + Routing-ID Interpretation) capability to indicate that your device
> + supports functions beyond it's "own" function space (0-7),
> + which is necessary to support more than 7 functions, or
> + if functions extends beyond offset 7 because they are placed at an
> + offset > 1 or have stride > 1.
> +
> + ...
> + #include "hw/pci/pcie.h"
> + #include "hw/pci/pcie_sriov.h"
> +
> + pci_your_pf_dev_realize( ... )
> + {
> + ...
> + int ret = pcie_endpoint_cap_init(d, 0x70);
> + ...
> + pcie_ari_init(d, 0x100, 1);
> + ...
> +
> + /* Add and initialize the SR/IOV capability */
> + pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
> + vf_devid, initial_vfs, total_vfs,
> + fun_offset, stride);
> +
> + /* Set up individual VF BARs (parameters as for normal BARs) */
> + pcie_sriov_pf_init_vf_bar( ... )
> + ...
> + }
> +
> + For cleanup, you simply call:
> +
> + pcie_sriov_pf_exit(device);
> +
> + which will delete all the virtual functions and associated resources.
> +
> +2) Similarly in the implementation of the virtual function, you need to
> + make it a PCI Express device and add a similar set of capabilities
> + except for the SR/IOV capability. Then you need to set up the VF BARs as
> + subregions of the PFs SR/IOV VF BARs by calling
> + pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call:
> +
> + pci_your_vf_dev_realize( ... )
> + {
> + ...
> + int ret = pcie_endpoint_cap_init(d, 0x60);
> + ...
> + pcie_ari_init(d, 0x100, 1);
> + ...
> + memory_region_init(mr, ... )
> + pcie_sriov_vf_register_bar(d, bar_nr, mr);
> + ...
> + }
> +
> +Testing on Linux guest
> +======================
> +The easiest is if your device driver supports sysfs based SR/IOV
> +enabling. Support for this was added in kernel v.3.8, so not all drivers
> +support it yet.
> +
> +To enable 4 VFs for a device at 01:00.0:
> +
> + modprobe yourdriver
> + echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> +
> +You should now see 4 VFs with lspci.
> +To turn SR/IOV off again - the standard requires you to turn it off before you can enable
> +another VF count, and the emulation enforces this:
> +
> + echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> +
> +Older drivers typically provide a max_vfs module parameter
> +to enable it at load time:
> +
> + modprobe yourdriver max_vfs=4
> +
> +To disable the VFs again then, you simply have to unload the driver:
> +
> + rmmod yourdriver
>
This doc should be enough for the first developer who is going to add SR-IOV
to a PCIe device.
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Thanks,
Marcel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
2015-10-18 11:02 ` Marcel Apfelbaum
@ 2015-10-18 12:26 ` Michael S. Tsirkin
2015-10-18 15:00 ` Knut Omang
` (2 subsequent siblings)
3 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2015-10-18 12:26 UTC (permalink / raw)
To: marcel
Cc: Eduardo Habkost, Knut Omang, Richard W.M. Jones, qemu-devel,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Sun, Oct 18, 2015 at 02:02:58PM +0300, Marcel Apfelbaum wrote:
> On 10/14/2015 06:51 PM, Knut Omang wrote:
> >This patch provides the building blocks for creating an SR/IOV
> >PCIe Extended Capability header and register/unregister
> >SR/IOV Virtual Functions.
> >
> >Signed-off-by: Knut Omang <knut.omang@oracle.com>
> >---
> > hw/pci/Makefile.objs | 2 +-
> > hw/pci/pci.c | 102 ++++++++++++-----
> > hw/pci/pcie.c | 2 +-
> > hw/pci/pcie_sriov.c | 263 ++++++++++++++++++++++++++++++++++++++++++++
> > include/hw/pci/pci.h | 11 +-
> > include/hw/pci/pcie.h | 6 +
> > include/hw/pci/pcie_sriov.h | 58 ++++++++++
> > include/qemu/typedefs.h | 2 +
> > trace-events | 5 +
> > 9 files changed, 419 insertions(+), 32 deletions(-)
> > create mode 100644 hw/pci/pcie_sriov.c
> > create mode 100644 include/hw/pci/pcie_sriov.h
> >
> >diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
> >index 9f905e6..2226980 100644
> >--- a/hw/pci/Makefile.objs
> >+++ b/hw/pci/Makefile.objs
> >@@ -3,7 +3,7 @@ common-obj-$(CONFIG_PCI) += msix.o msi.o
> > common-obj-$(CONFIG_PCI) += shpc.o
> > common-obj-$(CONFIG_PCI) += slotid_cap.o
> > common-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
> >-common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o
> >+common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o pcie_sriov.o
> >
> > common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
> > common-obj-$(CONFIG_ALL) += pci-stub.o
> >diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >index b095cfe..4fb5fcf 100644
> >--- a/hw/pci/pci.c
> >+++ b/hw/pci/pci.c
> >@@ -153,6 +153,9 @@ int pci_bar(PCIDevice *d, int reg)
> > {
> > uint8_t type;
> >
> >+ /* PCIe virtual functions do not have their own BARs */
> >+ assert(!pci_is_vf(d));
> >+
> > if (reg != PCI_ROM_SLOT)
> > return PCI_BASE_ADDRESS_0 + reg * 4;
> >
> >@@ -211,22 +214,13 @@ void pci_device_deassert_intx(PCIDevice *dev)
> > }
> > }
> >
> >-static void pci_do_device_reset(PCIDevice *dev)
> >+static void pci_reset_regions(PCIDevice *dev)
> > {
> > int r;
> >+ if (pci_is_vf(dev)) {
> >+ return;
> >+ }
> >
> >- pci_device_deassert_intx(dev);
> >- assert(dev->irq_state == 0);
> >-
> >- /* Clear all writable bits */
> >- pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> >- pci_get_word(dev->wmask + PCI_COMMAND) |
> >- pci_get_word(dev->w1cmask + PCI_COMMAND));
> >- pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> >- pci_get_word(dev->wmask + PCI_STATUS) |
> >- pci_get_word(dev->w1cmask + PCI_STATUS));
> >- dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> >- dev->config[PCI_INTERRUPT_LINE] = 0x0;
> > for (r = 0; r < PCI_NUM_REGIONS; ++r) {
> > PCIIORegion *region = &dev->io_regions[r];
> > if (!region->size) {
> >@@ -240,10 +234,6 @@ static void pci_do_device_reset(PCIDevice *dev)
> > pci_set_long(dev->config + pci_bar(dev, r), region->type);
> > }
> > }
> >- pci_update_mappings(dev);
> >-
> >- msi_reset(dev);
> >- msix_reset(dev);
> > }
> >
> > /*
> >@@ -253,7 +243,23 @@ static void pci_do_device_reset(PCIDevice *dev)
> > void pci_device_reset(PCIDevice *dev)
> > {
> > qdev_reset_all(&dev->qdev);
> >- pci_do_device_reset(dev);
> >+ pci_device_deassert_intx(dev);
> >+ assert(dev->irq_state == 0);
> >+
> >+ /* Clear all writable bits */
> >+ pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> >+ pci_get_word(dev->wmask + PCI_COMMAND) |
> >+ pci_get_word(dev->w1cmask + PCI_COMMAND));
> >+ pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> >+ pci_get_word(dev->wmask + PCI_STATUS) |
> >+ pci_get_word(dev->w1cmask + PCI_STATUS));
> >+ dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> >+ dev->config[PCI_INTERRUPT_LINE] = 0x0;
> >+ pci_reset_regions(dev);
> >+ pci_update_mappings(dev);
> >+
> >+ msi_reset(dev);
> >+ msix_reset(dev);
> > }
> >
> > /*
> >@@ -268,7 +274,7 @@ static void pcibus_reset(BusState *qbus)
> >
> > for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
> > if (bus->devices[i]) {
> >- pci_do_device_reset(bus->devices[i]);
> >+ pci_device_reset(bus->devices[i]);
>
> Hi,
>
> Here you made the re-factoring:
> - replaced pci_do_device_reset to a subset : pci_reset_regions.
> - moved all the other code inside pci_device_reset.
> but now pcibus_reset will call pci_device_reset that has an extra call to qdev_reset_all.
> However pcibus_reset has several calling sites. My question is:
>
> Is this necessary? If yes, I am only thinking how it will affect the other calling sites.
> Why did they split it in the first place? Maybe somebody knows.
> From what I saw it should be OK.
I'd like to see reset refactoring split out to a
separate patch, with some motivation on why
is the change is logic (calling qdev_reset_all
from bus reset) a good idea.
>
> > }
> > }
> >
> >@@ -771,6 +777,15 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp)
> > dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
> > }
> >
> >+ /* With SR/IOV and ARI, a device at function 0 need not be a multifunction
> >+ * device, as it may just be a VF that ended up with function 0 in
> >+ * the legacy PCI interpretation. Avoid failing in such cases:
> >+ */
> >+ if (pci_is_vf(dev) &&
> >+ dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
> >+ return;
> >+ }
> >+
> > /*
> > * multifunction bit is interpreted in two ways as follows.
> > * - all functions must set the bit to 1.
> >@@ -962,6 +977,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
> > uint64_t wmask;
> > pcibus_t size = memory_region_size(memory);
> >
> >+ assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */
> > assert(region_num >= 0);
> > assert(region_num < PCI_NUM_REGIONS);
> > if (size & (size-1)) {
> >@@ -1060,11 +1076,44 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
> > return pci_dev->io_regions[region_num].addr;
> > }
> >
> >-static pcibus_t pci_bar_address(PCIDevice *d,
> >- int reg, uint8_t type, pcibus_t size)
> >+
> >+static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
> >+ uint8_t type, pcibus_t size)
> >+{
> >+ pcibus_t new_addr;
> >+ if (!pci_is_vf(d)) {
> >+ int bar = pci_bar(d, reg);
> >+ if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> >+ new_addr = pci_get_quad(d->config + bar);
> >+ } else {
> >+ new_addr = pci_get_long(d->config + bar);
> >+ }
> >+ } else {
> >+ PCIDevice *pf = d->exp.sriov_vf.pf;
> >+ uint16_t sriov_cap = pf->exp.sriov_cap;
> >+ int bar = sriov_cap + PCI_SRIOV_BAR + reg * 4;
> >+ uint16_t vf_offset = pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
> >+ uint16_t vf_stride = pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
> >+ uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride;
> >+
> >+ if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> >+ new_addr = pci_get_quad(pf->config + bar);
> >+ } else {
> >+ new_addr = pci_get_long(pf->config + bar);
> >+ }
> >+ new_addr += vf_num * size;
> >+ }
> >+ if (reg != PCI_ROM_SLOT) {
> >+ /* Preserve the rom enable bit */
> >+ new_addr &= ~(size - 1);
> >+ }
> >+ return new_addr;
> >+}
> >+
> >+pcibus_t pci_bar_address(PCIDevice *d,
> >+ int reg, uint8_t type, pcibus_t size)
> > {
> > pcibus_t new_addr, last_addr;
> >- int bar = pci_bar(d, reg);
> > uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
> > Object *machine = qdev_get_machine();
> > ObjectClass *oc = object_get_class(machine);
> >@@ -1075,7 +1124,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
> > if (!(cmd & PCI_COMMAND_IO)) {
> > return PCI_BAR_UNMAPPED;
> > }
> >- new_addr = pci_get_long(d->config + bar) & ~(size - 1);
> >+ new_addr = pci_config_get_bar_addr(d, reg, type, size);
> > last_addr = new_addr + size - 1;
> > /* Check if 32 bit BAR wraps around explicitly.
> > * TODO: make priorities correct and remove this work around.
> >@@ -1090,11 +1139,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
> > if (!(cmd & PCI_COMMAND_MEMORY)) {
> > return PCI_BAR_UNMAPPED;
> > }
> >- if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> >- new_addr = pci_get_quad(d->config + bar);
> >- } else {
> >- new_addr = pci_get_long(d->config + bar);
> >- }
> >+ new_addr = pci_config_get_bar_addr(d, reg, type, size);
> > /* the ROM slot has a specific enable bit */
> > if (reg == PCI_ROM_SLOT && !(new_addr & PCI_ROM_ADDRESS_ENABLE)) {
> > return PCI_BAR_UNMAPPED;
> >@@ -1228,6 +1273,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
> >
> > msi_write_config(d, addr, val_in, l);
> > msix_write_config(d, addr, val_in, l);
> >+ pcie_sriov_config_write(d, addr, val_in, l);
> > }
> >
> > /***********************************************************/
> >diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >index 6e28985..774b9ed 100644
> >--- a/hw/pci/pcie.c
> >+++ b/hw/pci/pcie.c
> >@@ -253,7 +253,7 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> > * Right now, only a device of function = 0 is allowed to be
> > * hot plugged/unplugged.
> > */
> >- assert(PCI_FUNC(pci_dev->devfn) == 0);
> >+ assert(PCI_FUNC(pci_dev->devfn) == 0 || pci_is_vf(pci_dev));
> >
> > pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > PCI_EXP_SLTSTA_PDS);
> >diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> >new file mode 100644
> >index 0000000..25de8d9
> >--- /dev/null
> >+++ b/hw/pci/pcie_sriov.c
> >@@ -0,0 +1,263 @@
> >+/*
> >+ * pcie_sriov.c:
> >+ *
> >+ * Implementation of SR/IOV emulation support.
> >+ *
> >+ * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
> >+ *
> >+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >+ * See the COPYING file in the top-level directory.
> >+ *
> >+ */
> >+
> >+#include "hw/pci/pci.h"
> >+#include "hw/pci/pcie.h"
> >+#include "hw/pci/pci_bus.h"
> >+#include "qemu/error-report.h"
> >+#include "qemu/range.h"
> >+#include "trace.h"
> >+
> >+#define SRIOV_ID(dev) \
> >+ (dev)->name, PCI_SLOT((dev)->devfn), PCI_FUNC((dev)->devfn)
>
> This is a little "hacky" but it is used only for tracing and inside a C file.
> I am OK with it.
>
> >+
> >+static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name);
> >+static void unregister_vfs(PCIDevice *dev);
> >+
> >+void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
> >+ const char *vfname, uint16_t vf_dev_id,
> >+ uint16_t init_vfs, uint16_t total_vfs,
> >+ uint16_t vf_offset, uint16_t vf_stride)
> >+{
> >+ uint8_t *cfg = dev->config + offset;
> >+ uint8_t *wmask;
> >+
> >+ pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
> >+ offset, PCI_EXT_CAP_SRIOV_SIZEOF);
> >+ dev->exp.sriov_cap = offset;
> >+ dev->exp.sriov_pf.num_vfs = 0;
> >+ dev->exp.sriov_pf.vfname = g_strdup(vfname);
> >+ dev->exp.sriov_pf.vf = NULL;
> >+
> >+ pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
> >+ pci_set_word(cfg + PCI_SRIOV_VF_STRIDE, vf_stride);
> >+
> >+ /* Mandatory page sizes to support.
> >+ * Device implementations can call pcie_sriov_pf_add_sup_pgsize()
> >+ * to set more bits:
> >+ */
> >+ pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, SRIOV_SUP_PGSIZE_MINREQ);
> >+
> >+ /* Default is to use 4K pages, software can modify it
> >+ * to any of the supported bits
> >+ */
> >+ pci_set_word(cfg + PCI_SRIOV_SYS_PGSIZE, 0x1);
> >+
> >+ /* Set up device ID and initial/total number of VFs available */
> >+ pci_set_word(cfg + PCI_SRIOV_VF_DID, vf_dev_id);
> >+ pci_set_word(cfg + PCI_SRIOV_INITIAL_VF, init_vfs);
> >+ pci_set_word(cfg + PCI_SRIOV_TOTAL_VF, total_vfs);
> >+ pci_set_word(cfg + PCI_SRIOV_NUM_VF, 0);
> >+
> >+ /* Write enable control bits */
> >+ wmask = dev->wmask + offset;
> >+ pci_set_word(wmask + PCI_SRIOV_CTRL,
> >+ PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI);
> >+ pci_set_word(wmask + PCI_SRIOV_NUM_VF, 0xffff);
> >+ pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
> >+
> >+ qdev_prop_set_bit(&dev->qdev, "multifunction", true);
> >+}
> >+
> >+void pcie_sriov_pf_exit(PCIDevice *dev)
> >+{
> >+ unregister_vfs(dev);
> >+ g_free((char *)dev->exp.sriov_pf.vfname);
> >+ dev->exp.sriov_pf.vfname = NULL;
> >+}
> >+
> >+void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
> >+ uint8_t type, dma_addr_t size)
> >+{
> >+ uint32_t addr;
> >+ uint64_t wmask;
> >+ uint16_t sriov_cap = dev->exp.sriov_cap;
> >+
> >+ assert(sriov_cap > 0);
> >+ assert(region_num >= 0);
> >+ assert(region_num < PCI_NUM_REGIONS);
> >+ assert(region_num != PCI_ROM_SLOT);
> >+
> >+ wmask = ~(size - 1);
> >+ addr = sriov_cap + PCI_SRIOV_BAR + region_num * 4;
> >+
> >+ pci_set_long(dev->config + addr, type);
> >+ if (!(type & PCI_BASE_ADDRESS_SPACE_IO) &&
> >+ type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> >+ pci_set_quad(dev->wmask + addr, wmask);
> >+ pci_set_quad(dev->cmask + addr, ~0ULL);
> >+ } else {
> >+ pci_set_long(dev->wmask + addr, wmask & 0xffffffff);
> >+ pci_set_long(dev->cmask + addr, 0xffffffff);
> >+ }
> >+ dev->exp.sriov_pf.vf_bar_type[region_num] = type;
> >+}
> >+
> >+void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
> >+ MemoryRegion *memory)
> >+{
> >+ PCIIORegion *r;
> >+ uint8_t type;
> >+ pcibus_t size = memory_region_size(memory);
> >+
> >+ assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */
> >+ assert(region_num >= 0);
> >+ assert(region_num < PCI_NUM_REGIONS);
> >+ type = dev->exp.sriov_vf.pf->exp.sriov_pf.vf_bar_type[region_num];
> >+
> >+ if (!is_power_of_2(size)) {
> >+ error_report("%s: PCI region size must be a power"
> >+ " of two - type=0x%x, size=0x%"FMT_PCIBUS"\n",
> >+ __func__, type, size);
> >+ exit(1);
> >+ }
> >+
> >+ r = &dev->io_regions[region_num];
> >+ r->memory = memory;
> >+ r->address_space =
> >+ type & PCI_BASE_ADDRESS_SPACE_IO
> >+ ? dev->bus->address_space_io
> >+ : dev->bus->address_space_mem;
> >+ r->size = size;
> >+ r->type = type;
> >+
> >+ r->addr = pci_bar_address(dev, region_num, r->type, r->size);
> >+ if (r->addr != PCI_BAR_UNMAPPED) {
> >+ memory_region_add_subregion_overlap(r->address_space,
> >+ r->addr, r->memory, 1);
> >+ }
> >+}
> >+
> >+static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name)
> >+{
> >+ PCIDevice *dev = pci_create(pf->bus, devfn, name);
> >+ dev->exp.sriov_vf.pf = pf;
> >+ Error *local_err = NULL;
> >+
> >+ object_property_set_bool(OBJECT(&dev->qdev), true, "realized", &local_err);
> >+ if (local_err) {
> >+ error_report_err(local_err);
> >+ return NULL;
> >+ }
> >+
> >+ /* set vid/did according to sr/iov spec - they are not used */
> >+ pci_config_set_vendor_id(dev->config, 0xffff);
> >+ pci_config_set_device_id(dev->config, 0xffff);
> >+ return dev;
> >+}
> >+
> >+static void register_vfs(PCIDevice *dev)
> >+{
> >+ uint16_t num_vfs;
> >+ uint16_t i;
> >+ uint16_t sriov_cap = dev->exp.sriov_cap;
> >+ uint16_t vf_offset = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
> >+ uint16_t vf_stride = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
> >+ int32_t devfn = dev->devfn + vf_offset;
> >+
> >+ assert(sriov_cap > 0);
> >+ num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> >+
> >+ dev->exp.sriov_pf.vf = g_malloc(sizeof(PCIDevice *) * num_vfs);
> >+ assert(dev->exp.sriov_pf.vf);
> >+
> >+ trace_sriov_register_vfs(SRIOV_ID(dev), num_vfs);
> >+ for (i = 0; i < num_vfs; i++) {
> >+ dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, dev->exp.sriov_pf.vfname);
> >+ if (!dev->exp.sriov_pf.vf[i]) {
> >+ num_vfs = i;
> >+ break;
> >+ }
> >+ devfn += vf_stride;
> >+ }
> >+ dev->exp.sriov_pf.num_vfs = num_vfs;
> >+}
> >+
> >+static void unregister_vfs(PCIDevice *dev)
> >+{
> >+ Error *local_err = NULL;
> >+ uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
> >+ uint16_t i;
> >+
> >+ trace_sriov_unregister_vfs(SRIOV_ID(dev), num_vfs);
> >+ for (i = 0; i < num_vfs; i++) {
> >+ object_property_set_bool(OBJECT(&dev->exp.sriov_pf.vf[i]->qdev), false, "realized", &local_err);
> >+ if (local_err) {
> >+ fprintf(stderr, "Failed to unplug: %s\n",
> >+ error_get_pretty(local_err));
> >+ error_free(local_err);
> >+ }
> >+ }
> >+ g_free(dev->exp.sriov_pf.vf);
> >+ dev->exp.sriov_pf.vf = NULL;
> >+ dev->exp.sriov_pf.num_vfs = 0;
> >+ pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0);
> >+}
> >+
> >+void pcie_sriov_config_write(PCIDevice *dev, uint32_t address, uint32_t val, int len)
> >+{
> >+ uint32_t off;
> >+ uint16_t sriov_cap = dev->exp.sriov_cap;
> >+
> >+ if (!sriov_cap || address < sriov_cap) {
> >+ return;
> >+ }
> >+ off = address - sriov_cap;
> >+ if (off >= PCI_EXT_CAP_SRIOV_SIZEOF) {
> >+ return;
> >+ }
> >+
> >+ trace_sriov_config_write(SRIOV_ID(dev), off, val, len);
> >+
> >+ if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> >+ if (dev->exp.sriov_pf.num_vfs) {
> >+ if (!(val & PCI_SRIOV_CTRL_VFE)) {
> >+ unregister_vfs(dev);
> >+ }
> >+ } else {
> >+ if (val & PCI_SRIOV_CTRL_VFE) {
> >+ register_vfs(dev);
> >+ }
> >+ }
> >+ }
> >+}
> >+
> >+
> >+/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs */
> >+void pcie_sriov_pf_disable_vfs(PCIDevice *dev)
> >+{
> >+ uint16_t sriov_cap = dev->exp.sriov_cap;
> >+ if (sriov_cap) {
> >+ uint32_t val = pci_get_byte(dev->config + sriov_cap + PCI_SRIOV_CTRL);
> >+ if (val & PCI_SRIOV_CTRL_VFE) {
> >+ val &= ~PCI_SRIOV_CTRL_VFE;
> >+ pcie_sriov_config_write(dev, sriov_cap + PCI_SRIOV_CTRL, val, 1);
> >+ }
> >+ }
> >+}
> >+
> >+/* Add optional supported page sizes to the mask of supported page sizes */
> >+void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize)
> >+{
> >+ uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> >+ uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
> >+
> >+ uint16_t sup_pgsize = pci_get_word(cfg + PCI_SRIOV_SUP_PGSIZE);
> >+
> >+ sup_pgsize |= opt_sup_pgsize;
> >+
> >+ /* Make sure the new bits are set, and that system page size
> >+ * also can be set to any of the new values according to spec:
> >+ */
> >+ pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, sup_pgsize);
> >+ pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, sup_pgsize);
> >+}
> >diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >index 551cb3d..2e9d8ba 100644
> >--- a/include/hw/pci/pci.h
> >+++ b/include/hw/pci/pci.h
> >@@ -11,8 +11,6 @@
> > /* PCI includes legacy ISA access. */
> > #include "hw/isa/isa.h"
> >
> >-#include "hw/pci/pcie.h"
> >-
> > /* PCI bus */
> >
> > #define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
> >@@ -132,6 +130,7 @@ enum {
> > #define QEMU_PCI_VGA_IO_HI_SIZE 0x20
> >
> > #include "hw/pci/pci_regs.h"
> >+#include "hw/pci/pcie.h"
> >
> > /* PCI HEADER_TYPE */
> > #define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
> >@@ -421,6 +420,9 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
> > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> > void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
> >
> >+pcibus_t pci_bar_address(PCIDevice *d,
> >+ int reg, uint8_t type, pcibus_t size);
> >+
> > static inline void
> > pci_set_byte(uint8_t *config, uint8_t val)
> > {
> >@@ -672,6 +674,11 @@ static inline int pci_is_express(const PCIDevice *d)
> > return d->cap_present & QEMU_PCI_CAP_EXPRESS;
> > }
> >
> >+static inline int pci_is_vf(const PCIDevice *d)
> >+{
> >+ return d->exp.sriov_vf.pf != NULL;
> >+}
> >+
> > static inline uint32_t pci_config_size(const PCIDevice *d)
> > {
> > return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
> >diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> >index b48a7a2..b09f79a 100644
> >--- a/include/hw/pci/pcie.h
> >+++ b/include/hw/pci/pcie.h
> >@@ -25,6 +25,7 @@
> > #include "hw/pci/pci_regs.h"
> > #include "hw/pci/pcie_regs.h"
> > #include "hw/pci/pcie_aer.h"
> >+#include "hw/pci/pcie_sriov.h"
> > #include "hw/hotplug.h"
> >
> > typedef enum {
> >@@ -74,6 +75,11 @@ struct PCIExpressDevice {
> > /* AER */
> > uint16_t aer_cap;
> > PCIEAERLog aer_log;
> >+
> >+ /* SR/IOV */
> >+ uint16_t sriov_cap;
> >+ PCIESriovPF sriov_pf;
> >+ PCIESriovVF sriov_vf;
> > };
> >
> > #define COMPAT_PROP_PCP "power_controller_present"
> >diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
> >new file mode 100644
> >index 0000000..71c2b00
> >--- /dev/null
> >+++ b/include/hw/pci/pcie_sriov.h
> >@@ -0,0 +1,58 @@
> >+/*
> >+ * pcie_sriov.h:
> >+ *
> >+ * Implementation of SR/IOV emulation support.
> >+ *
> >+ * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
> >+ *
> >+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >+ * See the COPYING file in the top-level directory.
> >+ *
> >+ */
> >+
> >+#ifndef QEMU_PCIE_SRIOV_H
> >+#define QEMU_PCIE_SRIOV_H
> >+
> >+struct PCIESriovPF {
> >+ uint16_t num_vfs; /* Number of virtual functions created */
> >+ uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */
> >+ const char *vfname; /* Reference to the device type used for the VFs */
> >+ PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
> >+};
> >+
> >+struct PCIESriovVF {
> >+ PCIDevice *pf; /* Pointer back to owner physical function */
> >+};
>
>
> Regarding naming conventions, maybe you it should be PCIESRIOVPF o Ir PCIESRIOVPf,
> but this would not justify another version (in my opinion)
>
> >+
> >+/* Optionally add supported page sizes to the mask of supported page sizes
> >+ * Page size values are interpreted as opt_sup_pgsize << 12.
> >+ */
> >+void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
> >+
> >+/* SR/IOV capability config write handler */
> >+void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
> >+ uint32_t val, int len);
> >+
> >+/* Reset SR/IOV VF Enable bit to unregister all VFs */
> >+void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
> >+
> >+#endif /* QEMU_PCIE_SRIOV_H */
> >diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> >index ee1ce1d..0b89316 100644
> >--- a/include/qemu/typedefs.h
> >+++ b/include/qemu/typedefs.h
> >@@ -54,6 +54,8 @@ typedef struct PCIDevice PCIDevice;
> > typedef struct PCIEAERErr PCIEAERErr;
> > typedef struct PCIEAERLog PCIEAERLog;
> > typedef struct PCIEAERMsg PCIEAERMsg;
> >+typedef struct PCIESriovPF PCIESriovPF;
> >+typedef struct PCIESriovVF PCIESriovVF;
> > typedef struct PCIEPort PCIEPort;
> > typedef struct PCIESlot PCIESlot;
> > typedef struct PCIExpressDevice PCIExpressDevice;
> >diff --git a/trace-events b/trace-events
> >index a0ddc6b..fd51c5f 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1531,6 +1531,11 @@ xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space (ad
> > pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
> > pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
> >
> >+# hw/pci/pcie_sriov.c
> >+sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
> >+sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
> >+sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
> >+
> > # hw/vfio/pci.c
> > vfio_intx_interrupt(const char *name, char line) " (%s) Pin %c"
> > vfio_intx_eoi(const char *name) " (%s) EOI"
> >
>
> Thanks for all the work, besides the questions above, from my point of view:
>
>
> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
>
> Thanks,
> Marcel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization
2015-10-18 11:02 ` [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Marcel Apfelbaum
@ 2015-10-18 14:39 ` Knut Omang
0 siblings, 0 replies; 23+ messages in thread
From: Knut Omang @ 2015-10-18 14:39 UTC (permalink / raw)
To: marcel, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Sun, 2015-10-18 at 14:02 +0300, Marcel Apfelbaum wrote:
> On 10/14/2015 06:51 PM, Knut Omang wrote:
> > This patch set implements generic support for SR/IOV as an
> > extension to the
> > core PCIe functionality, similar to the way other capabilities such
> > as AER
> > is implemented.
> >
> > There is no implementation of any device that provides
> > SR/IOV support included, but I have implemented a test
> > example which can be found together with this patch set here:
> >
> > git://github.com/knuto/qemu.git sriov_patches_v5
> >
> > Testing with the example device was documented here:
> >
> > http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05110
> > .html
> >
> > Changes since v4:
> > - Mostly based on feeback in Marcel Apfelbaum's review:
> > - The patch with changes to pci_regs.h got eliminated by rebase
> > - Added some documentation as an additional patch
> > - Some trivial fixes moved to separate patch
> > - Modified code to use error and trace functions instead of
> > printfs
>
>
> Hi, thanks for posting another version!
>
> Regarding the submission:
> Please run ./scripts/checkpatch.pl on your patches, there are some
> minor issues there (ms-dos line end...).
Strange, I only see lines > 80 chars errors except that it does not
like/understand the deliberate lack of parentheses in my SRIOV_ID
macro. I can assure you it has not even been close to any tool that
uses ms-dos line ends ;-)
It has only been subject to emacs and git on Linux..
> This would not prevent the maintainer to take the code, I think
> he has some scripts to take care of it, however it may discover
> some issues.
>
> Otherwise, I only have some questions on patch 2/4.
> I don't think it justifies a re-submission, I just wanted to point to
> some places
> it may be worth to look again.
I think you found a bug (introduced in v5) there, unfortunately,
looking at it..
Knut
> Thanks,
> Marcel
>
> >
> > Changes since v3:
> > - Reworked 'pci: Update pci_regs header' to merge kernel version
> > improvements
> > with the current qemu version instead of copying from the
> > kernel version.
> >
> > Changes since v2:
> > - Rebased onto 090d0bfd
> > - Un-qdev'ified - avoids issues when resetting NUM_VFS
> > - Fixed handling of vf_offset/vf_stride
> >
> > Changes since v1:
> > - Rebased on top of latest master, eliminating prereqs.
> > - Implement proper support for VF_STRIDE, VF_OFFSET and
> > SUP_PGSIZE
> > Time better spent fixing it than explaining what the previous
> > limitations were.
> > - Added new first patch to fix pci bug related to this
> > - Split out patch to pci_default_config_write to a separate
> > patch 2
> > to highlight bug fix.
> > - Refactored out logic into new source files
> > hw/pci/pcie_sriov.c include/hw/pci/pcie_sriov.h
> > similar to pcie_aer.c/h.
> > - Rename functions and introduce structs to better separate
> > pf and vf functionality.
> > - Replaced is_vf member with pci_is_vf() function abstraction
> > - Fix numerous syntax, whitespace and comment issues
> > according to Michael's review.
> > - Fix memory leaks.
> > - Removed igb example device - a rebased version available
> > on github instead.
> >
> > Knut Omang (4):
> > pci: Make use of the devfn property when registering new devices
> > pcie: Add support for Single Root I/O Virtualization (SR/IOV)
> > pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
> > pcie: A few minor fixes (type+code simplify)
> >
> > docs/pcie_sriov.txt | 115 +++++++++++++++++++
> > hw/pci/Makefile.objs | 2 +-
> > hw/pci/pci.c | 104 +++++++++++++-----
> > hw/pci/pcie.c | 9 +-
> > hw/pci/pcie_sriov.c | 263
> > ++++++++++++++++++++++++++++++++++++++++++++
> > include/hw/pci/pci.h | 11 +-
> > include/hw/pci/pcie.h | 6 +
> > include/hw/pci/pcie_sriov.h | 58 ++++++++++
> > include/qemu/typedefs.h | 2 +
> > trace-events | 5 +
> > 10 files changed, 539 insertions(+), 36 deletions(-)
> > create mode 100644 docs/pcie_sriov.txt
> > create mode 100644 hw/pci/pcie_sriov.c
> > create mode 100644 include/hw/pci/pcie_sriov.h
> >
> > --
> > 2.4.3
> >
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
2015-10-18 11:02 ` Marcel Apfelbaum
2015-10-18 12:26 ` Michael S. Tsirkin
@ 2015-10-18 15:00 ` Knut Omang
2015-10-18 15:57 ` Knut Omang
2015-10-19 9:00 ` Paolo Bonzini
3 siblings, 0 replies; 23+ messages in thread
From: Knut Omang @ 2015-10-18 15:00 UTC (permalink / raw)
To: marcel, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Sun, 2015-10-18 at 14:02 +0300, Marcel Apfelbaum wrote:
> On 10/14/2015 06:51 PM, Knut Omang wrote:
> > This patch provides the building blocks for creating an SR/IOV
> > PCIe Extended Capability header and register/unregister
> > SR/IOV Virtual Functions.
> >
> > Signed-off-by: Knut Omang <knut.omang@oracle.com>
> > ---
> > hw/pci/Makefile.objs | 2 +-
> > hw/pci/pci.c | 102 ++++++++++++-----
> > hw/pci/pcie.c | 2 +-
> > hw/pci/pcie_sriov.c | 263
> > ++++++++++++++++++++++++++++++++++++++++++++
> > include/hw/pci/pci.h | 11 +-
> > include/hw/pci/pcie.h | 6 +
> > include/hw/pci/pcie_sriov.h | 58 ++++++++++
> > include/qemu/typedefs.h | 2 +
> > trace-events | 5 +
> > 9 files changed, 419 insertions(+), 32 deletions(-)
> > create mode 100644 hw/pci/pcie_sriov.c
> > create mode 100644 include/hw/pci/pcie_sriov.h
> >
> > diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs
> > index 9f905e6..2226980 100644
> > --- a/hw/pci/Makefile.objs
> > +++ b/hw/pci/Makefile.objs
> > @@ -3,7 +3,7 @@ common-obj-$(CONFIG_PCI) += msix.o msi.o
> > common-obj-$(CONFIG_PCI) += shpc.o
> > common-obj-$(CONFIG_PCI) += slotid_cap.o
> > common-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
> > -common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o
> > +common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o
> > pcie_sriov.o
> >
> > common-obj-$(call lnot,$(CONFIG_PCI)) += pci-stub.o
> > common-obj-$(CONFIG_ALL) += pci-stub.o
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index b095cfe..4fb5fcf 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -153,6 +153,9 @@ int pci_bar(PCIDevice *d, int reg)
> > {
> > uint8_t type;
> >
> > + /* PCIe virtual functions do not have their own BARs */
> > + assert(!pci_is_vf(d));
> > +
> > if (reg != PCI_ROM_SLOT)
> > return PCI_BASE_ADDRESS_0 + reg * 4;
> >
> > @@ -211,22 +214,13 @@ void pci_device_deassert_intx(PCIDevice *dev)
> > }
> > }
> >
> > -static void pci_do_device_reset(PCIDevice *dev)
> > +static void pci_reset_regions(PCIDevice *dev)
> > {
> > int r;
> > + if (pci_is_vf(dev)) {
> > + return;
> > + }
> >
> > - pci_device_deassert_intx(dev);
> > - assert(dev->irq_state == 0);
> > -
> > - /* Clear all writable bits */
> > - pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> > - pci_get_word(dev->wmask +
> > PCI_COMMAND) |
> > - pci_get_word(dev->w1cmask +
> > PCI_COMMAND));
> > - pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> > - pci_get_word(dev->wmask +
> > PCI_STATUS) |
> > - pci_get_word(dev->w1cmask +
> > PCI_STATUS));
> > - dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> > - dev->config[PCI_INTERRUPT_LINE] = 0x0;
> > for (r = 0; r < PCI_NUM_REGIONS; ++r) {
> > PCIIORegion *region = &dev->io_regions[r];
> > if (!region->size) {
> > @@ -240,10 +234,6 @@ static void pci_do_device_reset(PCIDevice
> > *dev)
> > pci_set_long(dev->config + pci_bar(dev, r), region
> > ->type);
> > }
> > }
> > - pci_update_mappings(dev);
> > -
> > - msi_reset(dev);
> > - msix_reset(dev);
> > }
> >
> > /*
> > @@ -253,7 +243,23 @@ static void pci_do_device_reset(PCIDevice
> > *dev)
> > void pci_device_reset(PCIDevice *dev)
> > {
> > qdev_reset_all(&dev->qdev);
> > - pci_do_device_reset(dev);
> > + pci_device_deassert_intx(dev);
> > + assert(dev->irq_state == 0);
> > +
> > + /* Clear all writable bits */
> > + pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> > + pci_get_word(dev->wmask +
> > PCI_COMMAND) |
> > + pci_get_word(dev->w1cmask +
> > PCI_COMMAND));
> > + pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> > + pci_get_word(dev->wmask +
> > PCI_STATUS) |
> > + pci_get_word(dev->w1cmask +
> > PCI_STATUS));
> > + dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> > + dev->config[PCI_INTERRUPT_LINE] = 0x0;
> > + pci_reset_regions(dev);
> > + pci_update_mappings(dev);
> > +
> > + msi_reset(dev);
> > + msix_reset(dev);
> > }
> >
> > /*
> > @@ -268,7 +274,7 @@ static void pcibus_reset(BusState *qbus)
> >
> > for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
> > if (bus->devices[i]) {
> > - pci_do_device_reset(bus->devices[i]);
> > + pci_device_reset(bus->devices[i]);
>
> Hi,
>
> Here you made the re-factoring:
> - replaced pci_do_device_reset to a subset : pci_reset_regions.
> - moved all the other code inside pci_device_reset.
> but now pcibus_reset will call pci_device_reset that has an extra
> call to qdev_reset_all.
> However pcibus_reset has several calling sites. My question is:
>
> Is this necessary?
No, I will fix it. I finally realize what has happened here,..
My intention is just to factor out the for loop that resets the BAR
registers, which is not necessary/possible in the VF case. That seemed
cleaner and more readable than wrapping it all in an if ( ... vf .. )
construct.
> If yes, I am only thinking how it will affect the other calling
> sites.
> Why did they split it in the first place? Maybe somebody knows.
Paolo's commit dcc20931 gives the answer to that.
My original patch set and Paolo's commit refactored this code in
slightly different ways which is the root of the problem, some of which
Marcel pointed at in the review of v3 (the bogus dev->irq_state = 0)
> From what I saw it should be OK.
I have several virtual machines running with this code now but probably
just haven't hit the right use case yet, I suppose..
Thanks,
Knut
>
> > }
> > }
> >
> > @@ -771,6 +777,15 @@ static void pci_init_multifunction(PCIBus
> > *bus, PCIDevice *dev, Error **errp)
> > dev->config[PCI_HEADER_TYPE] |=
> > PCI_HEADER_TYPE_MULTI_FUNCTION;
> > }
> >
> > + /* With SR/IOV and ARI, a device at function 0 need not be a
> > multifunction
> > + * device, as it may just be a VF that ended up with function
> > 0 in
> > + * the legacy PCI interpretation. Avoid failing in such cases:
> > + */
> > + if (pci_is_vf(dev) &&
> > + dev->exp.sriov_vf.pf->cap_present &
> > QEMU_PCI_CAP_MULTIFUNCTION) {
> > + return;
> > + }
> > +
> > /*
> > * multifunction bit is interpreted in two ways as follows.
> > * - all functions must set the bit to 1.
> > @@ -962,6 +977,7 @@ void pci_register_bar(PCIDevice *pci_dev, int
> > region_num,
> > uint64_t wmask;
> > pcibus_t size = memory_region_size(memory);
> >
> > + assert(!pci_is_vf(pci_dev)); /* VFs must use
> > pcie_sriov_vf_register_bar */
> > assert(region_num >= 0);
> > assert(region_num < PCI_NUM_REGIONS);
> > if (size & (size-1)) {
> > @@ -1060,11 +1076,44 @@ pcibus_t pci_get_bar_addr(PCIDevice
> > *pci_dev, int region_num)
> > return pci_dev->io_regions[region_num].addr;
> > }
> >
> > -static pcibus_t pci_bar_address(PCIDevice *d,
> > - int reg, uint8_t type, pcibus_t
> > size)
> > +
> > +static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
> > + uint8_t type, pcibus_t
> > size)
> > +{
> > + pcibus_t new_addr;
> > + if (!pci_is_vf(d)) {
> > + int bar = pci_bar(d, reg);
> > + if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > + new_addr = pci_get_quad(d->config + bar);
> > + } else {
> > + new_addr = pci_get_long(d->config + bar);
> > + }
> > + } else {
> > + PCIDevice *pf = d->exp.sriov_vf.pf;
> > + uint16_t sriov_cap = pf->exp.sriov_cap;
> > + int bar = sriov_cap + PCI_SRIOV_BAR + reg * 4;
> > + uint16_t vf_offset = pci_get_word(pf->config + sriov_cap +
> > PCI_SRIOV_VF_OFFSET);
> > + uint16_t vf_stride = pci_get_word(pf->config + sriov_cap +
> > PCI_SRIOV_VF_STRIDE);
> > + uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) /
> > vf_stride;
> > +
> > + if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > + new_addr = pci_get_quad(pf->config + bar);
> > + } else {
> > + new_addr = pci_get_long(pf->config + bar);
> > + }
> > + new_addr += vf_num * size;
> > + }
> > + if (reg != PCI_ROM_SLOT) {
> > + /* Preserve the rom enable bit */
> > + new_addr &= ~(size - 1);
> > + }
> > + return new_addr;
> > +}
> > +
> > +pcibus_t pci_bar_address(PCIDevice *d,
> > + int reg, uint8_t type, pcibus_t size)
> > {
> > pcibus_t new_addr, last_addr;
> > - int bar = pci_bar(d, reg);
> > uint16_t cmd = pci_get_word(d->config + PCI_COMMAND);
> > Object *machine = qdev_get_machine();
> > ObjectClass *oc = object_get_class(machine);
> > @@ -1075,7 +1124,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
> > if (!(cmd & PCI_COMMAND_IO)) {
> > return PCI_BAR_UNMAPPED;
> > }
> > - new_addr = pci_get_long(d->config + bar) & ~(size - 1);
> > + new_addr = pci_config_get_bar_addr(d, reg, type, size);
> > last_addr = new_addr + size - 1;
> > /* Check if 32 bit BAR wraps around explicitly.
> > * TODO: make priorities correct and remove this work
> > around.
> > @@ -1090,11 +1139,7 @@ static pcibus_t pci_bar_address(PCIDevice
> > *d,
> > if (!(cmd & PCI_COMMAND_MEMORY)) {
> > return PCI_BAR_UNMAPPED;
> > }
> > - if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > - new_addr = pci_get_quad(d->config + bar);
> > - } else {
> > - new_addr = pci_get_long(d->config + bar);
> > - }
> > + new_addr = pci_config_get_bar_addr(d, reg, type, size);
> > /* the ROM slot has a specific enable bit */
> > if (reg == PCI_ROM_SLOT && !(new_addr &
> > PCI_ROM_ADDRESS_ENABLE)) {
> > return PCI_BAR_UNMAPPED;
> > @@ -1228,6 +1273,7 @@ void pci_default_write_config(PCIDevice *d,
> > uint32_t addr, uint32_t val_in, int
> >
> > msi_write_config(d, addr, val_in, l);
> > msix_write_config(d, addr, val_in, l);
> > + pcie_sriov_config_write(d, addr, val_in, l);
> > }
> >
> > /***********************************************************/
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 6e28985..774b9ed 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -253,7 +253,7 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler
> > *hotplug_dev, DeviceState *dev,
> > * Right now, only a device of function = 0 is allowed to be
> > * hot plugged/unplugged.
> > */
> > - assert(PCI_FUNC(pci_dev->devfn) == 0);
> > + assert(PCI_FUNC(pci_dev->devfn) == 0 || pci_is_vf(pci_dev));
> >
> > pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> > PCI_EXP_SLTSTA_PDS);
> > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> > new file mode 100644
> > index 0000000..25de8d9
> > --- /dev/null
> > +++ b/hw/pci/pcie_sriov.c
> > @@ -0,0 +1,263 @@
> > +/*
> > + * pcie_sriov.c:
> > + *
> > + * Implementation of SR/IOV emulation support.
> > + *
> > + * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2
> > or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "hw/pci/pci.h"
> > +#include "hw/pci/pcie.h"
> > +#include "hw/pci/pci_bus.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/range.h"
> > +#include "trace.h"
> > +
> > +#define SRIOV_ID(dev) \
> > + (dev)->name, PCI_SLOT((dev)->devfn), PCI_FUNC((dev)->devfn)
>
> This is a little "hacky" but it is used only for tracing and inside a
> C file.
> I am OK with it.
>
> > +
> > +static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char
> > *name);
> > +static void unregister_vfs(PCIDevice *dev);
> > +
> > +void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
> > + const char *vfname, uint16_t vf_dev_id,
> > + uint16_t init_vfs, uint16_t total_vfs,
> > + uint16_t vf_offset, uint16_t vf_stride)
> > +{
> > + uint8_t *cfg = dev->config + offset;
> > + uint8_t *wmask;
> > +
> > + pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
> > + offset, PCI_EXT_CAP_SRIOV_SIZEOF);
> > + dev->exp.sriov_cap = offset;
> > + dev->exp.sriov_pf.num_vfs = 0;
> > + dev->exp.sriov_pf.vfname = g_strdup(vfname);
> > + dev->exp.sriov_pf.vf = NULL;
> > +
> > + pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
> > + pci_set_word(cfg + PCI_SRIOV_VF_STRIDE, vf_stride);
> > +
> > + /* Mandatory page sizes to support.
> > + * Device implementations can call
> > pcie_sriov_pf_add_sup_pgsize()
> > + * to set more bits:
> > + */
> > + pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE,
> > SRIOV_SUP_PGSIZE_MINREQ);
> > +
> > + /* Default is to use 4K pages, software can modify it
> > + * to any of the supported bits
> > + */
> > + pci_set_word(cfg + PCI_SRIOV_SYS_PGSIZE, 0x1);
> > +
> > + /* Set up device ID and initial/total number of VFs available
> > */
> > + pci_set_word(cfg + PCI_SRIOV_VF_DID, vf_dev_id);
> > + pci_set_word(cfg + PCI_SRIOV_INITIAL_VF, init_vfs);
> > + pci_set_word(cfg + PCI_SRIOV_TOTAL_VF, total_vfs);
> > + pci_set_word(cfg + PCI_SRIOV_NUM_VF, 0);
> > +
> > + /* Write enable control bits */
> > + wmask = dev->wmask + offset;
> > + pci_set_word(wmask + PCI_SRIOV_CTRL,
> > + PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE |
> > PCI_SRIOV_CTRL_ARI);
> > + pci_set_word(wmask + PCI_SRIOV_NUM_VF, 0xffff);
> > + pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
> > +
> > + qdev_prop_set_bit(&dev->qdev, "multifunction", true);
> > +}
> > +
> > +void pcie_sriov_pf_exit(PCIDevice *dev)
> > +{
> > + unregister_vfs(dev);
> > + g_free((char *)dev->exp.sriov_pf.vfname);
> > + dev->exp.sriov_pf.vfname = NULL;
> > +}
> > +
> > +void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
> > + uint8_t type, dma_addr_t size)
> > +{
> > + uint32_t addr;
> > + uint64_t wmask;
> > + uint16_t sriov_cap = dev->exp.sriov_cap;
> > +
> > + assert(sriov_cap > 0);
> > + assert(region_num >= 0);
> > + assert(region_num < PCI_NUM_REGIONS);
> > + assert(region_num != PCI_ROM_SLOT);
> > +
> > + wmask = ~(size - 1);
> > + addr = sriov_cap + PCI_SRIOV_BAR + region_num * 4;
> > +
> > + pci_set_long(dev->config + addr, type);
> > + if (!(type & PCI_BASE_ADDRESS_SPACE_IO) &&
> > + type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > + pci_set_quad(dev->wmask + addr, wmask);
> > + pci_set_quad(dev->cmask + addr, ~0ULL);
> > + } else {
> > + pci_set_long(dev->wmask + addr, wmask & 0xffffffff);
> > + pci_set_long(dev->cmask + addr, 0xffffffff);
> > + }
> > + dev->exp.sriov_pf.vf_bar_type[region_num] = type;
> > +}
> > +
> > +void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
> > + MemoryRegion *memory)
> > +{
> > + PCIIORegion *r;
> > + uint8_t type;
> > + pcibus_t size = memory_region_size(memory);
> > +
> > + assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */
> > + assert(region_num >= 0);
> > + assert(region_num < PCI_NUM_REGIONS);
> > + type = dev->exp.sriov_vf.pf
> > ->exp.sriov_pf.vf_bar_type[region_num];
> > +
> > + if (!is_power_of_2(size)) {
> > + error_report("%s: PCI region size must be a power"
> > + " of two - type=0x%x,
> > size=0x%"FMT_PCIBUS"\n",
> > + __func__, type, size);
> > + exit(1);
> > + }
> > +
> > + r = &dev->io_regions[region_num];
> > + r->memory = memory;
> > + r->address_space =
> > + type & PCI_BASE_ADDRESS_SPACE_IO
> > + ? dev->bus->address_space_io
> > + : dev->bus->address_space_mem;
> > + r->size = size;
> > + r->type = type;
> > +
> > + r->addr = pci_bar_address(dev, region_num, r->type, r->size);
> > + if (r->addr != PCI_BAR_UNMAPPED) {
> > + memory_region_add_subregion_overlap(r->address_space,
> > + r->addr, r->memory,
> > 1);
> > + }
> > +}
> > +
> > +static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char
> > *name)
> > +{
> > + PCIDevice *dev = pci_create(pf->bus, devfn, name);
> > + dev->exp.sriov_vf.pf = pf;
> > + Error *local_err = NULL;
> > +
> > + object_property_set_bool(OBJECT(&dev->qdev), true, "realized",
> > &local_err);
> > + if (local_err) {
> > + error_report_err(local_err);
> > + return NULL;
> > + }
> > +
> > + /* set vid/did according to sr/iov spec - they are not used */
> > + pci_config_set_vendor_id(dev->config, 0xffff);
> > + pci_config_set_device_id(dev->config, 0xffff);
> > + return dev;
> > +}
> > +
> > +static void register_vfs(PCIDevice *dev)
> > +{
> > + uint16_t num_vfs;
> > + uint16_t i;
> > + uint16_t sriov_cap = dev->exp.sriov_cap;
> > + uint16_t vf_offset = pci_get_word(dev->config + sriov_cap +
> > PCI_SRIOV_VF_OFFSET);
> > + uint16_t vf_stride = pci_get_word(dev->config + sriov_cap +
> > PCI_SRIOV_VF_STRIDE);
> > + int32_t devfn = dev->devfn + vf_offset;
> > +
> > + assert(sriov_cap > 0);
> > + num_vfs = pci_get_word(dev->config + sriov_cap +
> > PCI_SRIOV_NUM_VF);
> > +
> > + dev->exp.sriov_pf.vf = g_malloc(sizeof(PCIDevice *) *
> > num_vfs);
> > + assert(dev->exp.sriov_pf.vf);
> > +
> > + trace_sriov_register_vfs(SRIOV_ID(dev), num_vfs);
> > + for (i = 0; i < num_vfs; i++) {
> > + dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, dev
> > ->exp.sriov_pf.vfname);
> > + if (!dev->exp.sriov_pf.vf[i]) {
> > + num_vfs = i;
> > + break;
> > + }
> > + devfn += vf_stride;
> > + }
> > + dev->exp.sriov_pf.num_vfs = num_vfs;
> > +}
> > +
> > +static void unregister_vfs(PCIDevice *dev)
> > +{
> > + Error *local_err = NULL;
> > + uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
> > + uint16_t i;
> > +
> > + trace_sriov_unregister_vfs(SRIOV_ID(dev), num_vfs);
> > + for (i = 0; i < num_vfs; i++) {
> > + object_property_set_bool(OBJECT(&dev->exp.sriov_pf.vf[i]
> > ->qdev), false, "realized", &local_err);
> > + if (local_err) {
> > + fprintf(stderr, "Failed to unplug: %s\n",
> > + error_get_pretty(local_err));
> > + error_free(local_err);
> > + }
> > + }
> > + g_free(dev->exp.sriov_pf.vf);
> > + dev->exp.sriov_pf.vf = NULL;
> > + dev->exp.sriov_pf.num_vfs = 0;
> > + pci_set_word(dev->config + dev->exp.sriov_cap +
> > PCI_SRIOV_NUM_VF, 0);
> > +}
> > +
> > +void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
> > uint32_t val, int len)
> > +{
> > + uint32_t off;
> > + uint16_t sriov_cap = dev->exp.sriov_cap;
> > +
> > + if (!sriov_cap || address < sriov_cap) {
> > + return;
> > + }
> > + off = address - sriov_cap;
> > + if (off >= PCI_EXT_CAP_SRIOV_SIZEOF) {
> > + return;
> > + }
> > +
> > + trace_sriov_config_write(SRIOV_ID(dev), off, val, len);
> > +
> > + if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > + if (dev->exp.sriov_pf.num_vfs) {
> > + if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > + unregister_vfs(dev);
> > + }
> > + } else {
> > + if (val & PCI_SRIOV_CTRL_VFE) {
> > + register_vfs(dev);
> > + }
> > + }
> > + }
> > +}
> > +
> > +
> > +/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs
> > */
> > +void pcie_sriov_pf_disable_vfs(PCIDevice *dev)
> > +{
> > + uint16_t sriov_cap = dev->exp.sriov_cap;
> > + if (sriov_cap) {
> > + uint32_t val = pci_get_byte(dev->config + sriov_cap +
> > PCI_SRIOV_CTRL);
> > + if (val & PCI_SRIOV_CTRL_VFE) {
> > + val &= ~PCI_SRIOV_CTRL_VFE;
> > + pcie_sriov_config_write(dev, sriov_cap +
> > PCI_SRIOV_CTRL, val, 1);
> > + }
> > + }
> > +}
> > +
> > +/* Add optional supported page sizes to the mask of supported page
> > sizes */
> > +void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t
> > opt_sup_pgsize)
> > +{
> > + uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> > + uint8_t *wmask = dev->wmask + dev->exp.sriov_cap;
> > +
> > + uint16_t sup_pgsize = pci_get_word(cfg +
> > PCI_SRIOV_SUP_PGSIZE);
> > +
> > + sup_pgsize |= opt_sup_pgsize;
> > +
> > + /* Make sure the new bits are set, and that system page size
> > + * also can be set to any of the new values according to spec:
> > + */
> > + pci_set_word(cfg + PCI_SRIOV_SUP_PGSIZE, sup_pgsize);
> > + pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, sup_pgsize);
> > +}
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index 551cb3d..2e9d8ba 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -11,8 +11,6 @@
> > /* PCI includes legacy ISA access. */
> > #include "hw/isa/isa.h"
> >
> > -#include "hw/pci/pcie.h"
> > -
> > /* PCI bus */
> >
> > #define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func)
> > & 0x07))
> > @@ -132,6 +130,7 @@ enum {
> > #define QEMU_PCI_VGA_IO_HI_SIZE 0x20
> >
> > #include "hw/pci/pci_regs.h"
> > +#include "hw/pci/pcie.h"
> >
> > /* PCI HEADER_TYPE */
> > #define PCI_HEADER_TYPE_MULTI_FUNCTION 0x80
> > @@ -421,6 +420,9 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *,
> > void *, int);
> > AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> > void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
> >
> > +pcibus_t pci_bar_address(PCIDevice *d,
> > + int reg, uint8_t type, pcibus_t size);
> > +
> > static inline void
> > pci_set_byte(uint8_t *config, uint8_t val)
> > {
> > @@ -672,6 +674,11 @@ static inline int pci_is_express(const
> > PCIDevice *d)
> > return d->cap_present & QEMU_PCI_CAP_EXPRESS;
> > }
> >
> > +static inline int pci_is_vf(const PCIDevice *d)
> > +{
> > + return d->exp.sriov_vf.pf != NULL;
> > +}
> > +
> > static inline uint32_t pci_config_size(const PCIDevice *d)
> > {
> > return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE :
> > PCI_CONFIG_SPACE_SIZE;
> > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > index b48a7a2..b09f79a 100644
> > --- a/include/hw/pci/pcie.h
> > +++ b/include/hw/pci/pcie.h
> > @@ -25,6 +25,7 @@
> > #include "hw/pci/pci_regs.h"
> > #include "hw/pci/pcie_regs.h"
> > #include "hw/pci/pcie_aer.h"
> > +#include "hw/pci/pcie_sriov.h"
> > #include "hw/hotplug.h"
> >
> > typedef enum {
> > @@ -74,6 +75,11 @@ struct PCIExpressDevice {
> > /* AER */
> > uint16_t aer_cap;
> > PCIEAERLog aer_log;
> > +
> > + /* SR/IOV */
> > + uint16_t sriov_cap;
> > + PCIESriovPF sriov_pf;
> > + PCIESriovVF sriov_vf;
> > };
> >
> > #define COMPAT_PROP_PCP "power_controller_present"
> > diff --git a/include/hw/pci/pcie_sriov.h
> > b/include/hw/pci/pcie_sriov.h
> > new file mode 100644
> > index 0000000..71c2b00
> > --- /dev/null
> > +++ b/include/hw/pci/pcie_sriov.h
> > @@ -0,0 +1,58 @@
> > +/*
> > + * pcie_sriov.h:
> > + *
> > + * Implementation of SR/IOV emulation support.
> > + *
> > + * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2
> > or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef QEMU_PCIE_SRIOV_H
> > +#define QEMU_PCIE_SRIOV_H
> > +
> > +struct PCIESriovPF {
> > + uint16_t num_vfs; /* Number of virtual functions
> > created */
> > + uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each
> > VF bar */
> > + const char *vfname; /* Reference to the device type
> > used for the VFs */
> > + PCIDevice **vf; /* Pointer to an array of num_vfs
> > VF devices */
> > +};
> > +
> > +struct PCIESriovVF {
> > + PCIDevice *pf; /* Pointer back to owner physical
> > function */
> > +};
>
>
> Regarding naming conventions, maybe you it should be PCIESRIOVPF o Ir
> PCIESRIOVPf,
> but this would not justify another version (in my opinion)
>
> > +
> > +/* Optionally add supported page sizes to the mask of supported
> > page sizes
> > + * Page size values are interpreted as opt_sup_pgsize << 12.
> > + */
> > +void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t
> > opt_sup_pgsize);
> > +
> > +/* SR/IOV capability config write handler */
> > +void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
> > + uint32_t val, int len);
> > +
> > +/* Reset SR/IOV VF Enable bit to unregister all VFs */
> > +void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
> > +
> > +#endif /* QEMU_PCIE_SRIOV_H */
> > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> > index ee1ce1d..0b89316 100644
> > --- a/include/qemu/typedefs.h
> > +++ b/include/qemu/typedefs.h
> > @@ -54,6 +54,8 @@ typedef struct PCIDevice PCIDevice;
> > typedef struct PCIEAERErr PCIEAERErr;
> > typedef struct PCIEAERLog PCIEAERLog;
> > typedef struct PCIEAERMsg PCIEAERMsg;
> > +typedef struct PCIESriovPF PCIESriovPF;
> > +typedef struct PCIESriovVF PCIESriovVF;
> > typedef struct PCIEPort PCIEPort;
> > typedef struct PCIESlot PCIESlot;
> > typedef struct PCIExpressDevice PCIExpressDevice;
> > diff --git a/trace-events b/trace-events
> > index a0ddc6b..fd51c5f 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1531,6 +1531,11 @@ xen_pv_mmio_write(uint64_t addr) "WARNING:
> > write to Xen PV Device MMIO space (ad
> > pci_cfg_read(const char *dev, unsigned devid, unsigned fnid,
> > unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
> > pci_cfg_write(const char *dev, unsigned devid, unsigned fnid,
> > unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
> >
> > +# hw/pci/pcie_sriov.c
> > +sriov_register_vfs(const char *name, int slot, int function, int
> > num_vfs) "%s %02x:%x: creating %d vf devs"
> > +sriov_unregister_vfs(const char *name, int slot, int function, int
> > num_vfs) "%s %02x:%x: Unregistering %d vf devs"
> > +sriov_config_write(const char *name, int slot, int fun, uint32_t
> > offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x
> > val 0x%x len %d"
> > +
> > # hw/vfio/pci.c
> > vfio_intx_interrupt(const char *name, char line) " (%s) Pin %c"
> > vfio_intx_eoi(const char *name) " (%s) EOI"
> >
>
> Thanks for all the work, besides the questions above, from my point
> of view:
>
>
> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
>
> Thanks,
> Marcel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
2015-10-18 11:02 ` Marcel Apfelbaum
2015-10-18 12:26 ` Michael S. Tsirkin
2015-10-18 15:00 ` Knut Omang
@ 2015-10-18 15:57 ` Knut Omang
2015-10-19 9:00 ` Paolo Bonzini
3 siblings, 0 replies; 23+ messages in thread
From: Knut Omang @ 2015-10-18 15:57 UTC (permalink / raw)
To: marcel, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Paolo Bonzini,
Dotan Barak, Richard Henderson
On Sun, 2015-10-18 at 14:02 +0300, Marcel Apfelbaum wrote:
> > +#define SRIOV_ID(dev) \
> > + (dev)->name, PCI_SLOT((dev)->devfn), PCI_FUNC((dev)->devfn)
>
> This is a little "hacky" but it is used only for tracing and inside a
> C file.
> I am OK with it.
For the sake of readability of the trace calls and to avoid the trace
lines from disturbing the code overview too much - I would have had to
split the line of each trace call.
> > +++ b/include/hw/pci/pcie_sriov.h
> > @@ -0,0 +1,58 @@
> > +/*
> > + * pcie_sriov.h:
> > + *
> > + * Implementation of SR/IOV emulation support.
> > + *
> > + * Copyright (c) 2015 Knut Omang <knut.omang@oracle.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2
> > or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef QEMU_PCIE_SRIOV_H
> > +#define QEMU_PCIE_SRIOV_H
> > +
> > +struct PCIESriovPF {
> > + uint16_t num_vfs; /* Number of virtual functions
> > created */
> > + uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each
> > VF bar */
> > + const char *vfname; /* Reference to the device type
> > used for the VFs */
> > + PCIDevice **vf; /* Pointer to an array of num_vfs
> > VF devices */
> > +};
> > +
> > +struct PCIESriovVF {
> > + PCIDevice *pf; /* Pointer back to owner physical
> > function */
> > +};
>
>
> Regarding naming conventions, maybe you it should be PCIESRIOVPF o Ir
> PCIESRIOVPf,
> but this would not justify another version (in my opinion)
Yes, I did think about that, but didn't like the readability of either
of these, still wanted to try to follow conventions - this was the
tradeoff I ended at. SR/IOV is kind of complicated compared to e.g. AER
in that the '/' has to go anyway :-D
> > +
> > +/* Optionally add supported page sizes to the mask of supported
> > page sizes
> > + * Page size values are interpreted as opt_sup_pgsize << 12.
> > + */
> > +void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t
> > opt_sup_pgsize);
> > +
> > +/* SR/IOV capability config write handler */
> > +void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
> > + uint32_t val, int len);
> > +
> > +/* Reset SR/IOV VF Enable bit to unregister all VFs */
> > +void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
> > +
> > +#endif /* QEMU_PCIE_SRIOV_H */
> > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> > index ee1ce1d..0b89316 100644
> > --- a/include/qemu/typedefs.h
> > +++ b/include/qemu/typedefs.h
> > @@ -54,6 +54,8 @@ typedef struct PCIDevice PCIDevice;
> > typedef struct PCIEAERErr PCIEAERErr;
> > typedef struct PCIEAERLog PCIEAERLog;
> > typedef struct PCIEAERMsg PCIEAERMsg;
> > +typedef struct PCIESriovPF PCIESriovPF;
> > +typedef struct PCIESriovVF PCIESriovVF;
> > typedef struct PCIEPort PCIEPort;
> > typedef struct PCIESlot PCIESlot;
> > typedef struct PCIExpressDevice PCIExpressDevice;
> > diff --git a/trace-events b/trace-events
> > index a0ddc6b..fd51c5f 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1531,6 +1531,11 @@ xen_pv_mmio_write(uint64_t addr) "WARNING:
> > write to Xen PV Device MMIO space (ad
> > pci_cfg_read(const char *dev, unsigned devid, unsigned fnid,
> > unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
> > pci_cfg_write(const char *dev, unsigned devid, unsigned fnid,
> > unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
> >
> > +# hw/pci/pcie_sriov.c
> > +sriov_register_vfs(const char *name, int slot, int function, int
> > num_vfs) "%s %02x:%x: creating %d vf devs"
> > +sriov_unregister_vfs(const char *name, int slot, int function, int
> > num_vfs) "%s %02x:%x: Unregistering %d vf devs"
> > +sriov_config_write(const char *name, int slot, int fun, uint32_t
> > offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x
> > val 0x%x len %d"
> > +
> > # hw/vfio/pci.c
> > vfio_intx_interrupt(const char *name, char line) " (%s) Pin %c"
> > vfio_intx_eoi(const char *name) " (%s) EOI"
> >
>
> Thanks for all the work, besides the questions above, from my point
> of view:
>
>
> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Thanks,
Knut
> Thanks,
> Marcel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
2015-10-18 11:02 ` Marcel Apfelbaum
` (2 preceding siblings ...)
2015-10-18 15:57 ` Knut Omang
@ 2015-10-19 9:00 ` Paolo Bonzini
3 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2015-10-19 9:00 UTC (permalink / raw)
To: marcel, Knut Omang, qemu-devel
Cc: Eduardo Habkost, Michael S. Tsirkin, Richard W.M. Jones,
Alex Williamson, Gonglei (Arei), Jan Kiszka, Dotan Barak,
Richard Henderson
On 18/10/2015 13:02, Marcel Apfelbaum wrote:
>
> + pci_device_deassert_intx(dev);
> + assert(dev->irq_state == 0);
> +
> + /* Clear all writable bits */
> + pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
> + pci_get_word(dev->wmask + PCI_COMMAND) |
> + pci_get_word(dev->w1cmask +
> PCI_COMMAND));
> + pci_word_test_and_clear_mask(dev->config + PCI_STATUS,
> + pci_get_word(dev->wmask + PCI_STATUS) |
> + pci_get_word(dev->w1cmask + PCI_STATUS));
> + dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
> + dev->config[PCI_INTERRUPT_LINE] = 0x0;
> + pci_reset_regions(dev);
> + pci_update_mappings(dev);
> +
> + msi_reset(dev);
> + msix_reset(dev);
All this should stay in pci_do_device_reset. Of course it's okay to
split the PF-specific parts to pci_reset_regions.
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2015-10-19 9:01 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-14 15:51 [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Knut Omang
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 1/4] pci: Make use of the devfn property when registering new devices Knut Omang
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 2/4] pcie: Add support for Single Root I/O Virtualization (SR/IOV) Knut Omang
2015-10-18 11:02 ` Marcel Apfelbaum
2015-10-18 12:26 ` Michael S. Tsirkin
2015-10-18 15:00 ` Knut Omang
2015-10-18 15:57 ` Knut Omang
2015-10-19 9:00 ` Paolo Bonzini
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 3/4] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt Knut Omang
2015-10-16 8:36 ` Michael S. Tsirkin
2015-10-16 9:56 ` Knut Omang
2015-10-16 10:54 ` Michael S. Tsirkin
2015-10-16 11:25 ` Knut Omang
2015-10-16 11:34 ` Michael S. Tsirkin
2015-10-16 11:39 ` Paolo Bonzini
2015-10-16 11:27 ` Marcel Apfelbaum
2015-10-16 11:32 ` Richard W.M. Jones
2015-10-16 11:39 ` Michael S. Tsirkin
2015-10-18 11:03 ` Marcel Apfelbaum
2015-10-14 15:51 ` [Qemu-devel] [PATCH v5 4/4] pcie: A few minor fixes (type+code simplify) Knut Omang
2015-10-18 11:03 ` Marcel Apfelbaum
2015-10-18 11:02 ` [Qemu-devel] [PATCH v5 0/4] pcie: Add support for Single Root I/O Virtualization Marcel Apfelbaum
2015-10-18 14:39 ` Knut Omang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).